Engineer
The engineer class creates the train and test data. It reads in the processed data and writes out (X, y) pairs. This class allows the nature of the input and output variables to be defined (for example, which variable is the target and how many months of data are used as input). The (X, y) pairs saved by the engineer are grouped by prediction month (i.e. a single NetCDF file is written for each prediction month, for both the train and test data).
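The grouping by prediction month can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the engineer's actual code: the helper `split_by_prediction_month`, the `(year, month)` keys and the `"{split}/{year}_{month}/data.nc"` file layout are all hypothetical. The train/test rule mirrors the `test_year` parameter of `engineer` (only data from before `min(test_year)` is used for training); dropping later years that are not test years is an additional assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

YearMonth = Tuple[int, int]  # (year, month) of the month being predicted


def split_by_prediction_month(
    target_months: List[YearMonth],
    test_year: Union[int, List[int]],
) -> Dict[str, Dict[YearMonth, str]]:
    """Give every prediction month its own (hypothetical) output file, per split."""
    test_years = [test_year] if isinstance(test_year, int) else list(test_year)
    cutoff = min(test_years)

    groups: Dict[str, Dict[YearMonth, str]] = defaultdict(dict)
    for year, month in target_months:
        if year in test_years:
            split = "test"
        elif year < cutoff:
            split = "train"  # only data before min(test_year) is used for training
        else:
            continue  # assumption: later, non-test years are left out entirely
        # Hypothetical layout: one NetCDF file per prediction month
        groups[split][(year, month)] = f"{split}/{year}_{month}/data.nc"
    return dict(groups)
```

For example, `split_by_prediction_month([(2016, 1), (2018, 3)], test_year=2018)` puts (2016, 1) under `"train"` and (2018, 3) under `"test"`, each mapped to its own file, so train and test pairs never share a file.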
class src.engineer.Engineer(data_folder: pathlib.Path = PosixPath('data'), process_static: bool = False)

Takes the output of the processors and turns it into NetCDF files ready to be input into the machine learning models.
Parameters:
- data_folder – The location of the data folder.
- process_static – Defines whether or not to process the static data.
engineer(test_year: Union[int, List[int]], target_variable: str = 'VHI', pred_months: int = 12, expected_length: Optional[int] = 12) → None

Runs the engineer.
Parameters:
- test_year – Years of data to use as test data. Only data from before min(test_year) will be used as training data.
- target_variable – The target variable. Must be in one of the processed files. Default: "VHI".
- pred_months – The number of months to use as input to the model. Default: 12 (a year’s worth of data).
- expected_length – The expected length of the output sequence (e.g. if the data was processed to weekly timesteps, then we might expect expected_length = 4 * pred_months). If not None, any sequence which does not have this length (e.g. due to missing data) will be skipped. Default: 12.
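To make `pred_months` and `expected_length` concrete, here is a minimal sketch of how input windows could be built and filtered for a single series. It is an illustration under assumptions, not the library's implementation: `make_pairs` and `month_index` are hypothetical helpers, the series is a plain `dict` mapping dates to values of the target variable rather than NetCDF data, and X is assumed to hold every timestep in the `pred_months` calendar months before the target's month.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def month_index(d: date) -> int:
    """Months elapsed since January of year 0, so windows can span year boundaries."""
    return d.year * 12 + d.month - 1


def make_pairs(
    series: Dict[date, float],
    pred_months: int = 12,
    expected_length: Optional[int] = 12,
) -> List[Tuple[date, List[float], float]]:
    """Build (target_date, X, y) triples, skipping windows of the wrong length."""
    items = sorted(series.items())
    pairs = []
    for target, y in items:
        end = month_index(target)  # the target month itself is excluded from X
        start = end - pred_months
        # Collect every timestep falling in the pred_months months before the target
        x = [value for when, value in items if start <= month_index(when) < end]
        if expected_length is not None and len(x) != expected_length:
            continue  # e.g. missing data or too little history: skip this pair
        pairs.append((target, x, y))
    return pairs
```

With monthly data and the defaults, a target month is kept only if all 12 preceding months are present, so a single missing month causes every pair whose window covers it to be skipped rather than padded. For weekly data, the same rule is why `expected_length` would be set to roughly `4 * pred_months`.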