Engineer

The Engineer class creates the train and test data. It reads in the processed data and writes out (X, y) pairs. It is also where the nature of the input and output variables is defined.

The (X, y) pairs saved by the engineer are grouped by prediction month, i.e. a single NetCDF file is written for each prediction month, for both the train and test data.
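The grouping described above can be illustrated with a minimal sketch in plain Python. It is not the Engineer's actual implementation: the sample tuples, the `group_by_prediction_month` helper and the `year_month` naming scheme are all hypothetical, and the step of writing each group to a NetCDF file is left out.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical sample: (date of the predicted month, input values X, target y).
Sample = Tuple[date, List[float], float]

samples: List[Sample] = [
    (date(2015, 1, 31), [0.2, 0.4], 0.5),
    (date(2015, 1, 31), [0.1, 0.3], 0.6),
    (date(2015, 2, 28), [0.7, 0.9], 0.8),
]


def group_by_prediction_month(
    samples: List[Sample],
) -> Dict[Tuple[int, int], List[Tuple[List[float], float]]]:
    """Collect the (X, y) pairs that share a (year, month) prediction month."""
    groups = defaultdict(list)
    for target_date, x, y in samples:
        groups[(target_date.year, target_date.month)].append((x, y))
    return dict(groups)


groups = group_by_prediction_month(samples)

# One output per prediction month; the naming scheme is an assumption
# made for this sketch, not the layout the Engineer writes to disk.
names = [f"{year}_{month}" for (year, month) in sorted(groups)]
```

Each key of `groups` corresponds to one file the engineer would write, so the three samples above would end up in two files.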


class src.engineer.Engineer(data_folder: pathlib.Path = PosixPath('data'), process_static: bool = False)

Takes the output of the processors, and turns it into NetCDF files ready to be input into the machine learning models.

Parameters
  • data_folder – The location of the data folder.

  • process_static – Defines whether or not to process the static data.

engineer(test_year: Union[int, List[int]], target_variable: str = 'VHI', pred_months: int = 12, expected_length: Optional[int] = 12) → None

Runs the engineer.

Parameters
  • test_year – The year, or list of years, of data to use as test data. Only data from before min(test_year) will be used for training data.

  • target_variable – The target variable. Must be in one of the processed files. Default: "VHI".

  • pred_months – The number of months to use as input to the model. Default: 12 (a year’s worth of data).

  • expected_length – The expected length of the output sequence (e.g. if the data was processed to weekly timesteps, then we might expect expected_length = 4 * pred_months). If not None, any sequence which does not have this length (e.g. due to missing data) will be skipped. Default: 12.
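The effect of test_year and expected_length can be sketched as follows. This is an illustration of the rules stated above, under the assumption that years are compared as plain integers; `split_years` and `keep_sequence` are hypothetical helpers, not part of the Engineer API.

```python
from typing import Iterable, List, Optional, Sequence, Tuple, Union


def split_years(
    years: Iterable[int], test_year: Union[int, List[int]]
) -> Tuple[List[int], List[int]]:
    """Split years into (train, test) following the test_year rule."""
    test_years = [test_year] if isinstance(test_year, int) else list(test_year)
    cutoff = min(test_years)
    years = list(years)
    # Only years strictly before the earliest test year are used for training.
    train = [y for y in years if y < cutoff]
    # Assumption: only the years listed in test_year become test data.
    test = [y for y in years if y in test_years]
    return train, test


def keep_sequence(sequence: Sequence, expected_length: Optional[int]) -> bool:
    """Return False for sequences that would be skipped due to their length."""
    return expected_length is None or len(sequence) == expected_length


train, test = split_years(range(2010, 2019), test_year=[2015, 2017])
# train -> [2010, 2011, 2012, 2013, 2014]
# test  -> [2015, 2017]

# Weekly timesteps with pred_months = 12 would suggest expected_length = 48.
complete = keep_sequence([0.0] * 48, expected_length=4 * 12)    # True
incomplete = keep_sequence([0.0] * 47, expected_length=4 * 12)  # False
```

In this sketch, 2016 and 2018 fall into neither split: they are not before min(test_year), and they are not listed as test years.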