cmapss
The NASA C-MAPSS Turbofan Degradation dataset is a collection of simulated degradation experiments on jet engines. It contains four sub-datasets named FD001, FD002, FD003 and FD004, which differ in operation conditions and possible failure types.
CmapssReader
Bases: AbstractReader
This reader represents the NASA CMAPSS Turbofan Degradation dataset. Each of its four sub-datasets contains a training and a test split. Upon first usage, the training split will be further divided into a development and a validation split. 20% of the original training split is reserved for validation.
The features are provided as sliding windows over each time series in the dataset. The label of a window is the label of its last time step. The RUL labels are capped by a maximum value. The original data contains 24 channels per time step. Following the literature, we omit the constant channels and operation condition channels by default. Therefore, the default channel indices are 4, 5, 6, 9, 10, 11, 13, 14, 15, 16, 17, 19, 22 and 23.
The features are min-max scaled between -1 and 1. The scaler is fitted on the development data only.
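The windowing and label capping described above can be sketched in plain NumPy. This is an illustration only, not the reader's implementation: the helper `window_run`, the random input and the assumption that the RUL counts down to 1 at the last time step are all hypothetical.

```python
import numpy as np


def window_run(features, max_rul=125, window_size=30):
    """Cut one run-to-failure series into sliding windows with capped RUL labels.

    features: array of shape (num_steps, num_channels)
    """
    num_steps = features.shape[0]
    # Assumption: linear RUL that counts down to 1 at the last time step.
    rul = np.arange(num_steps, 0, -1)
    # Cap the labels at the maximum RUL value.
    rul = np.minimum(rul, max_rul)
    # sliding_window_view returns (num_windows, num_channels, window_size).
    windows = np.lib.stride_tricks.sliding_window_view(features, window_size, axis=0)
    # Reorder to (num_windows, window_size, num_channels).
    windows = windows.transpose(0, 2, 1)
    # Each window is labeled with the RUL of its last time step.
    labels = rul[window_size - 1:]
    return windows, labels


# Hypothetical series with 192 time steps and 14 channels.
run = np.random.default_rng(0).normal(size=(192, 14))
windows, labels = window_run(run)
print(windows.shape, labels.shape)
```

A series of length `n` yields `n - window_size + 1` windows, so shorter window sizes produce more samples per series.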
Examples:
Default channels
>>> import rul_datasets
>>> fd1 = rul_datasets.reader.CmapssReader(fd=1, window_size=30)
>>> fd1.prepare_data()
>>> features, labels = fd1.load_split("dev")
>>> features[0].shape
(163, 30, 14)
Custom channels
>>> import rul_datasets
>>> fd1 = rul_datasets.reader.CmapssReader(fd=1, feature_select=[1, 2, 3])
>>> fd1.prepare_data()
>>> features, labels = fd1.load_split("dev")
>>> features[0].shape
(163, 30, 3)
fds: List[int]
property
Indices of available sub-datasets.
__init__(fd, window_size=None, max_rul=125, percent_broken=None, percent_fail_runs=None, feature_select=None, truncate_val=False, operation_condition_aware_scaling=False, truncate_degraded_only=False)
Create a new CMAPSS reader for one of the sub-datasets. The maximum RUL value is set to 125 by default. The 14 feature channels selected by default can be overridden by passing a list of channel indices to feature_select. The default window size is defined per sub-dataset as the minimum time series length in the test set.
The data can be scaled separately for each operation condition, as done by Ragab et al. This only affects FD002 and FD004 because they are the only sub-datasets with multiple operation conditions.
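As a rough illustration of scaling per operation condition, the sketch below min-max scales each channel to [-1, 1] within each condition group. The function `scale_per_condition` and the integer condition ids are hypothetical. Unlike the reader, which fits its scaler on the development data, this sketch fits and transforms the same array for brevity.

```python
import numpy as np


def scale_per_condition(features, conditions):
    """Min-max scale each channel to [-1, 1] separately per operation condition.

    features: (num_steps, num_channels); conditions: (num_steps,) integer ids.
    """
    scaled = np.empty_like(features, dtype=float)
    for cond in np.unique(conditions):
        mask = conditions == cond
        lo = features[mask].min(axis=0)
        hi = features[mask].max(axis=0)
        # Guard against constant channels to avoid division by zero.
        span = np.where(hi > lo, hi - lo, 1.0)
        scaled[mask] = 2.0 * (features[mask] - lo) / span - 1.0
    return scaled


# Hypothetical data: two conditions whose channels live on different levels.
rng = np.random.default_rng(0)
conditions = np.repeat([0, 1], 50)
features = rng.normal(size=(100, 3)) + 100.0 * conditions[:, None]
scaled = scale_per_condition(features, conditions)
```

With a single global scaler, the offset between conditions would dominate the value range. Scaling per condition removes that offset so both groups span the full [-1, 1] interval.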
For more information about using readers, refer to the reader module page.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fd | int | Index of the selected sub-dataset | required |
window_size | Optional[int] | Size of the sliding window. Default defined per sub-dataset. | None |
max_rul | Optional[int] | Maximum RUL value of targets. | 125 |
percent_broken | Optional[float] | The maximum relative degradation per time series. | None |
percent_fail_runs | Optional[Union[float, List[int]]] | The percentage or index list of available time series. | None |
feature_select | Optional[List[int]] | The index list of selected feature channels. | None |
truncate_val | bool | Truncate the validation data with | False |
operation_condition_aware_scaling | bool | Scale data separately for each operation condition. | False |
truncate_degraded_only | bool | Only truncate the degraded part of the data (< max RUL). | False |
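One possible reading of percent_broken is sketched below, assuming it keeps only the leading fraction of each time series so that later degradation stages are never observed. The helper `truncate_run` is hypothetical and ignores truncate_degraded_only. The reader's actual truncation logic may differ.

```python
import numpy as np


def truncate_run(features, targets, percent_broken):
    """Keep only the first `percent_broken` fraction of one time series."""
    # Assumption: the cut-off is the floored fraction of the series length.
    num_steps = int(percent_broken * len(features))
    return features[:num_steps], targets[:num_steps]


# Hypothetical series with 200 time steps and RUL labels capped at 125.
features = np.zeros((200, 14))
targets = np.minimum(np.arange(200, 0, -1), 125)
short_features, short_targets = truncate_run(features, targets, 0.8)
print(short_features.shape, short_targets[-1])
```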
prepare_data()
Prepare the CMAPSS dataset. This function needs to be called before using the dataset for the first time.
The dataset is downloaded from a custom mirror and extracted into the data root directory. The training data is then split into development and validation sets. Afterwards, a scaler is fitted on the development features. Steps that were completed in a previous call are skipped.
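The split and scaler-fitting steps can be pictured with the following NumPy sketch. It assumes the split is made per time series and takes the last 20% of runs for validation. Both choices, along with the random stand-in data, are assumptions for illustration rather than the reader's documented behavior.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-in for the original training split: 10 runs of varying length.
runs = [rng.normal(size=(int(n), 14)) for n in rng.integers(130, 300, size=10)]

# Reserve 20% of the runs for validation (assumption: split by run, last runs).
num_val = int(0.2 * len(runs))
dev_runs, val_runs = runs[:-num_val], runs[-num_val:]

# Fit the min-max statistics on the development data only.
stacked = np.concatenate(dev_runs)
lo, hi = stacked.min(axis=0), stacked.max(axis=0)


def scale(run):
    """Map development-range values to [-1, 1]; other splits may exceed it."""
    return 2.0 * (run - lo) / (hi - lo) - 1.0


dev_scaled = [scale(run) for run in dev_runs]
val_scaled = [scale(run) for run in val_runs]
```

Because the statistics come from the development data alone, validation and test values can fall slightly outside [-1, 1]. This is expected and avoids leaking information from held-out data into preprocessing.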