thermography.classification.dataset package

This package contains the source code for loading and handling the dataset needed to train a classifier.

Submodules

thermography.classification.dataset.thermo_dataset module

class ThermoClass(class_name: str, class_value: int, class_folder: str = None)[source]

Bases: object

A class for defining which label is assigned to each class used in the classification step.

Builds a class used for classification with the parameters passed as arguments.

Parameters:
  • class_name – Human-readable name associated with this class.
  • class_value – Numerical value (label) associated with this class.
  • class_folder – Folder where the training images associated with this class are stored.
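
A minimal construction sketch is shown below; the class names, label values and folder names are illustrative assumptions, not definitions shipped with the package.

from thermography.classification.dataset.thermo_dataset import ThermoClass

# Hypothetical three-class setup: each class gets a human-readable name, a
# numeric label and the folder (inside each dataset directory) holding its images.
working_class = ThermoClass("working", 0, class_folder="0-working")
broken_class = ThermoClass("broken", 1, class_folder="1-broken")
misdetected_class = ThermoClass("misdetected", 2, class_folder="2-misdetected")

thermo_class_list = [working_class, broken_class, misdetected_class]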
class ThermoDataset(img_shape: numpy.ndarray, batch_size: int = 32, balance_data: bool = True, normalize_images: bool = True)[source]

Bases: object

Dataset class which loads the input images and exposes them as training, test and validation datasets.

Example:
import tensorflow as tf

dataset = ThermoDataset(img_shape, batch_size)
dataset.load_dataset(root_directory_list, class_list)

train_iterator = dataset.get_train_iterator()
next_train_batch = train_iterator.get_next()
test_iterator = dataset.get_test_iterator()
next_test_batch = test_iterator.get_next()

# Build the computation graph.
...

with tf.Session() as sess:
    for epoch in range(epochs):
        sess.run(train_iterator.initializer)
        sess.run(test_iterator.initializer)

        # Train
        while True:
            try:
                img_batch, label_batch = sess.run(next_train_batch)
            except tf.errors.OutOfRangeError:  # Training dataset is exhausted.
                break
            # Train the model.
            ...

        # Test
        while True:
            try:
                img_batch, label_batch = sess.run(next_test_batch)
            except tf.errors.OutOfRangeError:  # Test dataset is exhausted.
                break
            # Test the model.
            ...

Initializes the parameters of the dataset without loading anything.

Parameters:
  • img_shape – Image shape of the dataset. All images on disk which do not match this shape are resized accordingly.
  • batch_size – Batch size used for training. This parameter influences the size of the batch returned by the dataset iterators (see get_train_iterator()).
  • balance_data – Boolean flag which determines whether to balance the data on disk. If True, the loaded classes will have the same number of samples (some samples of the majority classes will be discarded).
  • normalize_images – Boolean flag indicating whether to normalize each input image (mean: 0, std: 1).
data_size

Returns the size of the dataset, i.e. the total number of images loaded.

get_test_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]

Builds and returns an initializable iterator for the test dataset.

get_train_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]

Builds and returns an initializable iterator for the training dataset.

get_validation_iterator() → tensorflow.python.data.ops.iterator_ops.Iterator[source]

Builds and returns an initializable iterator for the validation dataset.

image_shape

Returns the image shape of the dataset.

load_dataset(root_directory_list: list, class_list: typing.List[thermography.classification.dataset.thermo_dataset.ThermoClass], load_all_data: bool = False) → None[source]

Loads the dataset from the files contained in the list of root directories.

Parameters:
  • root_directory_list – List of root directories containing the data. This list can be generated using create_directory_list().
  • class_list – List of classes used for classification.
  • load_all_data – Boolean flag indicating whether to preload the entire dataset into memory at once, or to load the data on the fly whenever a new batch is needed.

Note

Loading the entire dataset into memory can speed up the training process.
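
A hedged usage sketch for load_dataset() follows; the root directory path, image shape and class definitions are placeholders chosen for illustration, and create_directory_list() is assumed to be importable from the module documented on this page.

import numpy as np

from thermography.classification.dataset.thermo_dataset import (
    ThermoClass, ThermoDataset, create_directory_list)

# Placeholder classes and dataset location; adjust to your own layout.
class_list = [ThermoClass("working", 0, "0-working"),
              ThermoClass("broken", 1, "1-broken"),
              ThermoClass("misdetected", 2, "2-misdetected")]
root_directory_list = create_directory_list("/path/to/dataset/root_dir")

# The image shape below is an arbitrary example value.
dataset = ThermoDataset(img_shape=np.array([96, 120, 1]), batch_size=32)
dataset.load_dataset(root_directory_list=root_directory_list,
                     class_list=class_list,
                     load_all_data=True)  # Preload everything into memory.
dataset.print_info()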

print_info()[source]

Prints the dataset properties.

rgb

Returns a boolean indicating whether the dataset has three channels (RGB) or is grayscale.

root_directory_list

Returns the list of root directories of the data.

set_train_test_validation_fraction(train_fraction, test_fraction, validation_fraction) → None[source]

Sets the train-test-validation fraction of the dataset.
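
A minimal sketch of setting the split, assuming the three fractions are expected to sum to one:

# Use 80% of the loaded samples for training, 10% for testing and 10% for
# validation (the sum-to-one requirement is an assumption made here).
dataset.set_train_test_validation_fraction(train_fraction=0.8,
                                           test_fraction=0.1,
                                           validation_fraction=0.1)
print(dataset.split_fraction)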

split_fraction

Returns the fraction used to split the loaded data into train, test and validation data.

test

Returns a reference to the test data.

test_size

Returns the size of the testing data, i.e. the total number of images available for testing.

thermo_class_list

Returns the classes associated to the dataset.

train

Returns a reference to the training data.

train_size

Returns the size of the training data, i.e. the total number of images available for training.

validation

Returns a reference to the validation data.

validation_size

Returns the size of the validation data, i.e. the total number of images available for validation.

create_directory_list(root_dir: str)[source]

Creates a list of directories for dataset loading.

Parameters:
  • root_dir – Absolute path to the root directory of the dataset.

Note

The dataset root directory must be of the following form:

root_dir
|__video1
|    |__0-1000
|    |__1000-2000
|__video2
|    |__0-500
|    |__500-1000
|    |__1000-1200
|__video3
     |__0-1000

and each folder ‘xxxx-yyyy’ must contain three directories associated with the classes of the dataset (see ThermoClass.class_folder).

Returns: A list of absolute paths to the class directories containing the dataset images.
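
A short usage sketch, assuming the function is importable from the module documented on this page; the root path is a placeholder.

from thermography.classification.dataset.thermo_dataset import create_directory_list

# Collect the absolute paths of all dataset directories under the root.
root_directory_list = create_directory_list("/path/to/root_dir")
for directory in root_directory_list:
    print(directory)

The resulting list can be passed directly as the root_directory_list argument of ThermoDataset.load_dataset().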