cmapss

The NASA C-MAPSS Turbofan Degradation dataset is a collection of simulated degradation experiments on jet engines. It contains four sub-datasets named FD1, FD2, FD3 and FD4 (FD001 to FD004 in the original files), which differ in their operation conditions and possible failure types.

CmapssReader

Bases: AbstractReader

This reader represents the NASA CMAPSS Turbofan Degradation dataset. Each of its four sub-datasets contains a training and a test split. Upon first usage, the training split will be further divided into a development and a validation split. 20% of the original training split is reserved for validation.
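The development/validation split can be pictured with a minimal sketch. It is not the reader's actual code: it assumes the split is made over whole runs (one time series per engine) and that the validation runs are taken from the end of the list. `split_dev_val` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dev_val(
    runs: Sequence[T], val_fraction: float = 0.2
) -> Tuple[List[T], List[T]]:
    """Split a list of runs into a development and a validation part.

    Assumption: whole runs are assigned to one side, and the validation
    runs are taken from the end of the list.
    """
    num_val = int(round(len(runs) * val_fraction))
    num_dev = len(runs) - num_val
    return list(runs[:num_dev]), list(runs[num_dev:])


# 100 placeholder run IDs standing in for the time series of a training split
dev_runs, val_runs = split_dev_val(list(range(100)))
print(len(dev_runs), len(val_runs))
```

Splitting by run rather than by window keeps overlapping windows of the same engine from ending up on both sides of the split.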

The features are provided as sliding windows over each time series in the dataset. The label of a window is the label of its last time step. The RUL labels are capped by a maximum value. The original data contains 24 channels per time step. Following the literature, we omit the constant channels and operation condition channels by default. Therefore, the default channel indices are 4, 5, 6, 9, 10, 11, 13, 14, 15, 16, 17, 19, 22 and 23.
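The windowing and labeling described above can be sketched with NumPy. This is an illustration under stated assumptions, not the library's implementation: `window_run` is a hypothetical helper, and the run is assumed to end at failure, so the RUL of its last time step is 0.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_run(features: np.ndarray, window_size: int, max_rul: int = 125):
    """Cut one run of shape (time, channels) into overlapping windows.

    Assumption: the run ends at failure, so the RUL of the last time
    step is 0 and grows by one for every step further back.
    """
    num_steps = features.shape[0]
    # sliding_window_view appends the window axis: (windows, channels, window_size)
    windows = sliding_window_view(features, window_size, axis=0)
    windows = windows.transpose(0, 2, 1)  # -> (windows, window_size, channels)
    # RUL per time step, capped at max_rul
    rul = np.minimum(np.arange(num_steps - 1, -1, -1), max_rul)
    # each window is labeled with the RUL of its last time step
    labels = rul[window_size - 1:]
    return windows, labels


# synthetic run with 192 time steps and 14 channels
run = np.random.default_rng(0).normal(size=(192, 14))
windows, labels = window_run(run, window_size=30)
print(windows.shape, labels.shape)
```

A run of 192 time steps yields 192 - 30 + 1 = 163 windows, so the shapes are `(163, 30, 14)` and `(163,)`.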

The features are min-max scaled between -1 and 1. The scaler is fitted on the development data only.
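As a rough illustration of fitting the scaler on the development data only, the sketch below computes per-channel minima and maxima on a synthetic development array and reuses them for the validation array. The helper names are hypothetical, and the code assumes that no channel is constant (a constant channel would cause a division by zero).

```python
import numpy as np


def fit_min_max(dev_features: np.ndarray):
    """Return the per-channel minimum and maximum of the development data.

    dev_features has shape (time, channels).
    """
    return dev_features.min(axis=0), dev_features.max(axis=0)


def scale(features: np.ndarray, minimum: np.ndarray, maximum: np.ndarray) -> np.ndarray:
    """Map [minimum, maximum] linearly onto [-1, 1] for each channel."""
    return 2.0 * (features - minimum) / (maximum - minimum) - 1.0


rng = np.random.default_rng(0)
dev = rng.uniform(10.0, 20.0, size=(500, 3))
val = rng.uniform(10.0, 20.0, size=(100, 3))

minimum, maximum = fit_min_max(dev)  # statistics come from the dev data only
dev_scaled = scale(dev, minimum, maximum)
val_scaled = scale(val, minimum, maximum)  # may fall slightly outside [-1, 1]
```

Because the statistics come from the development data alone, validation and test values can land slightly outside the target range.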

Examples:

Default channels

>>> import rul_datasets
>>> fd1 = rul_datasets.reader.CmapssReader(fd=1, window_size=30)
>>> fd1.prepare_data()
>>> features, labels = fd1.load_split("dev")
>>> features[0].shape
(163, 30, 14)

Custom channels

>>> import rul_datasets
>>> fd1 = rul_datasets.reader.CmapssReader(fd=1, feature_select=[1, 2, 3])
>>> fd1.prepare_data()
>>> features, labels = fd1.load_split("dev")
>>> features[0].shape
(163, 30, 3)

fds: List[int] property

Indices of available sub-datasets.

__init__(fd, window_size=None, max_rul=125, percent_broken=None, percent_fail_runs=None, feature_select=None, truncate_val=False, operation_condition_aware_scaling=False, truncate_degraded_only=False)

Create a new CMAPSS reader for one of the sub-datasets. The maximum RUL value is set to 125 by default. The 14 feature channels selected by default can be overridden by passing a list of channel indices to feature_select. The default window size is defined per sub-dataset as the minimum time series length in the test set.

The data can be scaled separately for each operation condition, as done by Ragab et al. This only affects FD002 and FD004, because only these sub-datasets have multiple operation conditions.
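The idea of condition-aware scaling can be sketched as follows. This is an assumption-laden illustration, not the reader's implementation: it presumes that each time step already carries an integer condition ID, and `scale_per_condition` is a hypothetical helper.

```python
import numpy as np


def scale_per_condition(features: np.ndarray, conditions: np.ndarray) -> np.ndarray:
    """Min-max scale each channel to [-1, 1] separately per condition.

    features: (time, channels); conditions: (time,) integer condition IDs.
    Assumption: the statistics are computed on the array passed in; a
    real pipeline would fit them on development data and reuse them.
    """
    scaled = np.empty_like(features, dtype=float)
    for cond in np.unique(conditions):
        mask = conditions == cond
        group = features[mask]
        minimum, maximum = group.min(axis=0), group.max(axis=0)
        scaled[mask] = 2.0 * (group - minimum) / (maximum - minimum) - 1.0
    return scaled


rng = np.random.default_rng(0)
conditions = rng.integers(0, 2, size=200)
# condition 1 shifts every sensor reading by a large offset
features = rng.normal(size=(200, 2)) + 100.0 * conditions[:, None]
scaled = scale_per_condition(features, conditions)
```

With a single global scaler, the offset between conditions would dominate the value range. Scaling per condition removes that offset, so the remaining variation within each condition fills the whole range.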

For more information about using readers, refer to the reader module page.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| fd | int | Index of the selected sub-dataset. | required |
| window_size | Optional[int] | Size of the sliding window. Default defined per sub-dataset. | None |
| max_rul | Optional[int] | Maximum RUL value of targets. | 125 |
| percent_broken | Optional[float] | The maximum relative degradation per time series. | None |
| percent_fail_runs | Optional[Union[float, List[int]]] | The percentage or index list of available time series. | None |
| feature_select | Optional[List[int]] | The index list of selected feature channels. | None |
| truncate_val | bool | Truncate the validation data with percent_broken, too. | False |
| operation_condition_aware_scaling | bool | Scale data separately for each operation condition. | False |
| truncate_degraded_only | bool | Only truncate the degraded part of the data (< max RUL). | False |
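To make percent_fail_runs and percent_broken more concrete, here is a minimal sketch of one plausible reading of the two parameters. It is not taken from the library: the helpers `select_runs` and `truncate_run` are hypothetical, and the details marked as assumptions in the comments may differ from the actual behavior.

```python
from typing import List, Sequence, Union


def select_runs(
    runs: Sequence[list], percent_fail_runs: Union[float, List[int], None]
) -> List[list]:
    """Keep a fraction of the runs (float) or the runs at given indices (list)."""
    if percent_fail_runs is None:
        return list(runs)
    if isinstance(percent_fail_runs, float):
        # Assumption: a fraction keeps the first runs in their original order
        return list(runs[: int(percent_fail_runs * len(runs))])
    return [runs[i] for i in percent_fail_runs]


def truncate_run(run: list, percent_broken: float) -> list:
    """Keep only the first share of a run, dropping the steps closest to failure."""
    # Assumption: the cut-off is the rounded-down share of the run length
    return run[: int(percent_broken * len(run))]


# four synthetic runs of different lengths
runs = [list(range(n)) for n in (100, 200, 150, 120)]
subset = select_runs(runs, 0.5)     # first two runs
picked = select_runs(runs, [0, 3])  # runs 0 and 3
truncated = [truncate_run(r, 0.75) for r in subset]
print([len(r) for r in truncated])
```

Under these assumptions, the two runs kept by `select_runs(runs, 0.5)` are cut to 75 and 150 time steps.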

prepare_data()

Prepare the CMAPSS dataset. This function needs to be called before using the dataset for the first time.

The dataset is downloaded from a custom mirror and extracted into the data root directory. The training data is then split into a development and a validation set. Afterwards, a scaler is fitted on the development features. Previously completed steps are skipped.
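The skip-if-done behavior can be illustrated with a small, self-contained sketch. The file names and the `prepare` function are hypothetical stand-ins and say nothing about how the library stores its data; each step simply checks whether its output already exists.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def run_step(marker: Path, step: Callable[[], object], log: List[str]) -> None:
    """Run `step` only if its output file `marker` does not exist yet."""
    if marker.exists():
        log.append(f"skip {marker.name}")
        return
    step()
    log.append(f"run {marker.name}")


def prepare(data_root: Path) -> List[str]:
    """Hypothetical three-step preparation mirroring the description above."""
    log: List[str] = []
    raw = data_root / "raw.txt"        # stands in for the downloaded data
    split = data_root / "split.txt"    # stands in for the dev/val split
    scaler = data_root / "scaler.txt"  # stands in for the fitted scaler
    run_step(raw, lambda: raw.write_text("data"), log)
    run_step(split, lambda: split.write_text("dev/val"), log)
    run_step(scaler, lambda: scaler.write_text("min/max"), log)
    return log


with tempfile.TemporaryDirectory() as tmp:
    first = prepare(Path(tmp))
    second = prepare(Path(tmp))  # every step finds its output and is skipped
```

With this pattern, calling the preparation function at the start of every script is cheap once the files exist.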