thermography.classification.dataset package¶
This package contains the source code for loading and handling the dataset needed to train a classifier.
Submodules¶
thermography.classification.dataset.thermo_dataset module¶
class ThermoClass(class_name: str, class_value: int, class_folder: str = None)[source]¶
Bases: object
A class for defining which label is assigned to each class used in the classification step.
Builds a class used for classification with the parameters passed as arguments.
Parameters: - class_name – Human-readable name associated with this class.
- class_value – Numerical value (label) associated with this class.
- class_folder – Folder where the training images associated with this class are stored.
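As a non-authoritative illustration, a list of such classes for the classifier might be assembled as follows (the class names, label values and folder names are placeholders, not values prescribed by the library):

from thermography.classification.dataset.thermo_dataset import ThermoClass

# Placeholder class definitions; names, labels and folders are illustrative only.
working_class = ThermoClass(class_name="working", class_value=0, class_folder="working")
broken_class = ThermoClass(class_name="broken", class_value=1, class_folder="broken")
misdetected_class = ThermoClass(class_name="misdetected", class_value=2, class_folder="misdetected")

class_list = [working_class, broken_class, misdetected_class]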
class ThermoDataset(img_shape: numpy.ndarray, batch_size: int = 32, balance_data: bool = True, normalize_images: bool = True)[source]¶
Bases: object
Dataset class which handles the input images as a dataset.
Example:

dataset = ThermoDataset(img_shape, batch_size)
dataset.load_dataset(root_directory_list, class_list)

train_iterator = dataset.get_train_iterator()
next_train_batch = train_iterator.get_next()
test_iterator = dataset.get_test_iterator()
next_test_batch = test_iterator.get_next()

# Build the computation graph.
...

with tf.Session() as sess:
    for epoch in range(epochs):
        sess.run(train_iterator.initializer)
        sess.run(test_iterator.initializer)

        # Train
        while True:
            try:
                img_batch, label_batch = sess.run(next_train_batch)
            except:
                # Training dataset is terminated
                break
            # Train the model.
            ...

        # Test
        while True:
            try:
                img_batch, label_batch = sess.run(next_test_batch)
            except:
                # Test dataset is terminated
                break
            # Test the model.
Initializes the parameters of the dataset without loading anything.
Parameters: - img_shape – Image shape of the dataset. All images on disk which don’t fulfill this shape are resized accordingly.
- batch_size – Batch size used for training. This parameter influences the size of the batch returned by the dataset iterators (see self.get_train_iterator()).
- balance_data – Boolean flag which determines whether to balance the data on disk. If True, the loaded classes will have the same number of samples (some samples of the majority classes will be discarded).
- normalize_images – Boolean flag indicating whether to normalize each input image (mean: 0, std: 1).
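A minimal construction sketch, assuming img_shape is given as a numpy array of the form [height, width, channels] (the concrete values below are examples only):

import numpy as np
from thermography.classification.dataset.thermo_dataset import ThermoDataset

# Example shape (grayscale) and flags; adapt to the images actually stored on disk.
dataset = ThermoDataset(img_shape=np.array([24, 30, 1]),
                        batch_size=32,
                        balance_data=True,
                        normalize_images=True)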
data_size¶
Returns the size of the dataset, i.e. the total number of images loaded.
get_test_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]¶
Builds and returns an initializable iterator for the test dataset.
get_train_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]¶
Builds and returns an initializable iterator for the training dataset.
get_validation_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]¶
Builds and returns an initializable iterator for the validation dataset.
image_shape¶
Returns the image shape of the dataset.
load_dataset(root_directory_list: list, class_list: typing.List[thermography.classification.dataset.thermo_dataset.ThermoClass], load_all_data: bool = False) → None[source]¶
Loads the dataset from the files contained in the list of root directories.
Parameters: - root_directory_list – List of root directories containing the data. This list can be generated using create_directory_list().
- class_list – List of classes used for classification.
- load_all_data – Boolean flag indicating whether to preload the entire dataset in memory once, or to load the data on the fly whenever a new batch is needed.
Note
Loading the entire dataset into memory can speed up the training process.
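A hedged end-to-end sketch of loading the data, using placeholder class definitions like those shown for ThermoClass above (paths and labels are illustrative, and the import location of create_directory_list is assumed to be this module):

import numpy as np
from thermography.classification.dataset.thermo_dataset import (
    ThermoClass, ThermoDataset, create_directory_list)

# Placeholder classes and dataset root; adapt to the real data layout.
class_list = [ThermoClass("working", 0, "working"),
              ThermoClass("broken", 1, "broken"),
              ThermoClass("misdetected", 2, "misdetected")]
root_directory_list = create_directory_list(root_dir="/path/to/dataset_root")

dataset = ThermoDataset(img_shape=np.array([24, 30, 1]), batch_size=32)
dataset.load_dataset(root_directory_list=root_directory_list,
                     class_list=class_list,
                     load_all_data=False)

print("Loaded {} images: {} train, {} test, {} validation".format(
    dataset.data_size, dataset.train_size, dataset.test_size, dataset.validation_size))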
rgb¶
Returns a boolean indicating whether the dataset has three channels (RGB) or is grayscale.
root_directory_list¶
Returns the list of root directories of the data.
set_train_test_validation_fraction(train_fraction, test_fraction, validation_fraction) → None[source]¶
Sets the train-test-validation fraction of the dataset.
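For instance, an 80/10/10 split could be set as follows (assuming the three fractions are expected to sum to 1.0):

# `dataset` is a ThermoDataset instance as constructed in the sketches above.
dataset.set_train_test_validation_fraction(train_fraction=0.8,
                                           test_fraction=0.1,
                                           validation_fraction=0.1)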
split_fraction¶
Returns the fraction used to split the loaded data into train, test and validation data.
test¶
Returns a reference to the test data.
test_size¶
Returns the size of the testing data, i.e. the total number of images available for testing.
thermo_class_list¶
Returns the classes associated with the dataset.
train¶
Returns a reference to the training data.
train_size¶
Returns the size of the training data, i.e. the total number of images available for training.
validation¶
Returns a reference to the validation data.
validation_size¶
Returns the size of the validation data, i.e. the total number of images available for validation.
create_directory_list(root_dir: str)[source]¶
Creates a list of directories for dataset loading.
Parameters: root_dir – Absolute path to the root directory of the dataset.
Note
The dataset root directory must be of the following form:

root_dir
|__video1
|  |__0-1000
|  |__1000-2000
|__video2
|  |__0-500
|  |__500-1000
|  |__1000-1200
|__video3
   |__0-1000

and each folder ‘xxxx-yyyy’ must contain three directories associated with the classes of the dataset (see ThermoClass.class_folder).
Returns: A list of absolute paths to the class directories containing the dataset images.
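As an illustrative sketch of the mapping from that layout to the returned value (the root path and class-folder names are placeholders):

from thermography.classification.dataset.thermo_dataset import create_directory_list

directories = create_directory_list(root_dir="/data/thermo_dataset")

# With the layout above and class folders such as "working", "broken" and
# "misdetected" inside each 'xxxx-yyyy' folder, the returned list would contain
# absolute paths of the form:
#   /data/thermo_dataset/video1/0-1000/working
#   /data/thermo_dataset/video1/0-1000/broken
#   /data/thermo_dataset/video1/0-1000/misdetected
#   ...
for class_directory in directories:
    print(class_directory)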