thermography.classification.dataset package¶
This package contains the source code for loading and handling the dataset needed to train a classifier.
Submodules¶
thermography.classification.dataset.thermo_dataset module¶
class ThermoClass(class_name: str, class_value: int, class_folder: str = None)[source]¶
Bases: object
A class for defining which label is assigned to each class used in the classification step.
Builds a class used for classification with the parameters passed as arguments.
Parameters: - class_name – Human-readable name associated with this class.
- class_value – Numerical value (label) associated with this class.
- class_folder – Folder where the training images associated with this class are stored.
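As a non-authoritative illustration, a list of such classes for the classifier might be assembled as follows (the class names, label values and folder names are placeholders, not values prescribed by the library):

from thermography.classification.dataset.thermo_dataset import ThermoClass

# Placeholder class definitions; names, labels and folders are illustrative only.
working_class = ThermoClass(class_name="working", class_value=0, class_folder="working")
broken_class = ThermoClass(class_name="broken", class_value=1, class_folder="broken")
misdetected_class = ThermoClass(class_name="misdetected", class_value=2, class_folder="misdetected")

class_list = [working_class, broken_class, misdetected_class]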
class ThermoDataset(img_shape: numpy.ndarray, batch_size: int = 32, balance_data: bool = True, normalize_images: bool = True)[source]¶
Bases: object
Dataset class which handles the input images as a dataset.
Example:

dataset = ThermoDataset(img_shape, batch_size)
dataset.load_dataset(root_directory_list, class_list)

train_iterator = dataset.get_train_iterator()
next_train_batch = train_iterator.get_next()
test_iterator = dataset.get_test_iterator()
next_test_batch = test_iterator.get_next()

# Build the computation graph.
...

with tf.Session() as sess:
    for epoch in range(epochs):
        sess.run(train_iterator.initializer)
        sess.run(test_iterator.initializer)

        # Train
        while True:
            try:
                img_batch, label_batch = sess.run(next_train_batch)
            except:
                # Training dataset is terminated
                break
            # Train the model.
            ...

        # Test
        while True:
            try:
                img_batch, label_batch = sess.run(next_test_batch)
            except:
                # Test dataset is terminated
                break
            # Test the model.
Initializes the parameters of the dataset without loading anything.
Parameters: - img_shape – Image shape of the dataset. All images on disk which don’t fulfill this shape are resized accordingly.
- batch_size – Batch size used for training. This parameter influences the size of the batch returned by the dataset iterators (see self.get_train_iterator()).
- balance_data – Boolean flag which determines whether to balance the data on disk. If True, the loaded classes will have the same number of samples (some samples of the majority classes will be discarded).
- normalize_images – Boolean flag indicating whether to normalize each input image (mean: 0, std: 1).
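A minimal construction sketch, assuming img_shape is given as a numpy array of the form [height, width, channels] (the concrete values below are examples only):

import numpy as np
from thermography.classification.dataset.thermo_dataset import ThermoDataset

# Example shape (grayscale) and flags; adapt to the images actually stored on disk.
dataset = ThermoDataset(img_shape=np.array([24, 30, 1]),
                        batch_size=32,
                        balance_data=True,
                        normalize_images=True)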
data_size¶
Returns the size of the dataset, i.e. the total number of images loaded.
get_test_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]¶
Builds and returns an initializable iterator for the test dataset.
get_train_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]¶
Builds and returns an initializable iterator for the training dataset.
get_validation_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]¶
Builds and returns an initializable iterator for the validation dataset.
image_shape¶
Returns the image shape of the dataset.
load_dataset(root_directory_list: list, class_list: typing.List[thermography.classification.dataset.thermo_dataset.ThermoClass], load_all_data: bool = False) → None[source]¶
Loads the dataset from the files contained in the list of root directories.
Parameters: - root_directory_list – List of root directories containing the data. This list can be generated using create_directory_list().
- class_list – List of classes used for classification.
- load_all_data – Boolean flag indicating whether to preload the entire dataset in memory once, or to load the data on the fly whenever a new batch is needed.
Note
Loading the entire dataset into memory can speed up the training process.
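A hedged end-to-end sketch of loading the data, using placeholder class definitions like those shown for ThermoClass above (paths and labels are illustrative, and the import location of create_directory_list is assumed to be this module):

import numpy as np
from thermography.classification.dataset.thermo_dataset import (
    ThermoClass, ThermoDataset, create_directory_list)

# Placeholder classes and dataset root; adapt to the real data layout.
class_list = [ThermoClass("working", 0, "working"),
              ThermoClass("broken", 1, "broken"),
              ThermoClass("misdetected", 2, "misdetected")]
root_directory_list = create_directory_list(root_dir="/path/to/dataset_root")

dataset = ThermoDataset(img_shape=np.array([24, 30, 1]), batch_size=32)
dataset.load_dataset(root_directory_list=root_directory_list,
                     class_list=class_list,
                     load_all_data=False)

print("Loaded {} images: {} train, {} test, {} validation".format(
    dataset.data_size, dataset.train_size, dataset.test_size, dataset.validation_size))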
rgb¶
Returns a boolean indicating whether the dataset has three channels (RGB) or is grayscale.
root_directory_list¶
Returns the list of root directories of the data.
set_train_test_validation_fraction(train_fraction, test_fraction, validation_fraction) → None[source]¶
Sets the train-test-validation fraction of the dataset.
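For instance, an 80/10/10 split could be set as follows (assuming the three fractions are expected to sum to 1.0):

# `dataset` is a ThermoDataset instance as constructed in the sketches above.
dataset.set_train_test_validation_fraction(train_fraction=0.8,
                                           test_fraction=0.1,
                                           validation_fraction=0.1)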
split_fraction¶
Returns the fraction used to split the loaded data into train, test and validation data.
test¶
Returns a reference to the test data.
test_size¶
Returns the size of the testing data, i.e. the total number of images available for testing.
thermo_class_list¶
Returns the classes associated with the dataset.
train¶
Returns a reference to the training data.
train_size¶
Returns the size of the training data, i.e. the total number of images available for training.
validation¶
Returns a reference to the validation data.
validation_size¶
Returns the size of the validation data, i.e. the total number of images available for validation.
create_directory_list(root_dir: str)[source]¶
Creates a list of directories for dataset loading.
Parameters: root_dir – Absolute path to the root directory of the dataset.
Note
The dataset root directory must be of the following form:

root_dir
|__video1
|  |__0-1000
|  |__1000-2000
|__video2
|  |__0-500
|  |__500-1000
|  |__1000-1200
|__video3
   |__0-1000

and each folder ‘xxxx-yyyy’ must contain three directories associated with the classes of the dataset (see ThermoClass.class_folder).
Returns: A list of absolute paths to the class directories containing the dataset images.
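As an illustrative sketch of the mapping from that layout to the returned value (the root path and class-folder names are placeholders):

from thermography.classification.dataset.thermo_dataset import create_directory_list

directories = create_directory_list(root_dir="/data/thermo_dataset")

# With the layout above and class folders such as "working", "broken" and
# "misdetected" inside each 'xxxx-yyyy' folder, the returned list would contain
# absolute paths of the form:
#   /data/thermo_dataset/video1/0-1000/working
#   /data/thermo_dataset/video1/0-1000/broken
#   /data/thermo_dataset/video1/0-1000/misdetected
#   ...
for class_directory in directories:
    print(class_directory)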