ncmapss

The New C-MAPSS Turbofan Degradation dataset is based on the same simulation as C-MAPSS. In contrast to the original dataset, it contains fewer engine units, but each is recorded in more detail and under more realistic operating conditions. Each unit has flight cycles recorded from the healthy state until failure, with one RUL value assigned to each whole cycle. Within a flight cycle, data is recorded at a resolution of 1 Hz. The dataset is split into seven sub-datasets (FD=1 to FD=7) that differ in the number of engine units and the types of failures present.

Note

An eighth sub-dataset exists but is not included here because one of its data files appears to be corrupted. The dataset authors have been contacted about this issue.

NCmapssReader

Bases: AbstractReader

This reader provides access to the New C-MAPSS Turbofan Degradation dataset. Each of its seven sub-datasets contains a default train/val/test split which can be overridden by the run_split_dist argument.

The features are provided as a windowed time series for each unit. Each window represents one flight cycle and is, by default, padded to the length of the longest cycle in the sub-dataset. The window size can be overridden by the window_size argument, which truncates each cycle at the end so that only its first window_size steps are kept. Additionally, the features can be downsampled in time by taking the average of resolution_seconds consecutive time steps. The default channels are the four operating conditions, the 14 physical sensors, and the 14 virtual sensors, in this order.
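The averaging-based downsampling can be illustrated with a minimal NumPy sketch. The helper `downsample_mean` is hypothetical and not part of the reader. In particular, dropping trailing steps that do not fill a complete group is an assumption; the reader may handle the remainder differently.

```python
import numpy as np


def downsample_mean(cycle: np.ndarray, resolution_seconds: int) -> np.ndarray:
    """Average each run of `resolution_seconds` consecutive time steps.

    `cycle` has shape (time_steps, channels). Trailing steps that do not
    fill a complete group are dropped (an assumption of this sketch).
    """
    num_groups = cycle.shape[0] // resolution_seconds
    trimmed = cycle[: num_groups * resolution_seconds]
    grouped = trimmed.reshape(num_groups, resolution_seconds, cycle.shape[1])
    return grouped.mean(axis=1)


# A toy flight cycle with 25 time steps and 2 channels.
cycle = np.arange(50, dtype=float).reshape(25, 2)
downsampled = downsample_mean(cycle, resolution_seconds=10)
```

Here, 25 steps at a resolution of 10 seconds yield two averaged steps, and the last five steps are discarded.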

The features are min-max scaled between zero and one. The scaler is fitted on the development data only. It is refit for each custom run_split_dist when prepare_data is called.
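Fitting the scaler on the development data only means that the minimum and maximum of each channel come from the development split, so other splits may fall slightly outside the target range. The following is a minimal sketch of that idea in plain NumPy, not the reader's actual implementation. The function names `fit_min_max` and `apply_min_max` are hypothetical.

```python
import numpy as np


def fit_min_max(dev_features: np.ndarray):
    """Compute per-channel minimum and maximum over all windows and time steps."""
    flat = dev_features.reshape(-1, dev_features.shape[-1])
    return flat.min(axis=0), flat.max(axis=0)


def apply_min_max(features, minimum, maximum, scaling_range=(0.0, 1.0)):
    """Scale features with statistics that were fitted elsewhere."""
    low, high = scaling_range
    # Guard against constant channels to avoid division by zero.
    span = np.where(maximum > minimum, maximum - minimum, 1.0)
    return (features - minimum) / span * (high - low) + low


# Toy data with shape (windows, time_steps, channels).
dev = np.array([[[0.0, 10.0], [2.0, 20.0]], [[4.0, 30.0], [4.0, 30.0]]])
val = np.array([[[6.0, 0.0]]])

minimum, maximum = fit_min_max(dev)  # fitted on development data only
dev_scaled = apply_min_max(dev, minimum, maximum)
val_scaled = apply_min_max(val, minimum, maximum)  # may leave [0, 1]
```

In this toy case, the validation values lie outside the range seen in the development data and therefore scale to values outside zero and one.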

Examples:

Default channels

>>> reader = NCmapssReader(fd=1)
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 20294, 32)

Physical sensors only

>>> reader = NCmapssReader(fd=1, feature_select=list(range(4, 18)))
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 20294, 14)

Custom split and window size

>>> reader = NCmapssReader(
...     fd=1,
...     run_split_dist={"dev": [0, 1], "val": [2], "test": [3]},
...     window_size=100,  # first 100 steps of each cycle
... )
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 100, 32)

Downsampled features

>>> reader = NCmapssReader(fd=1, resolution_seconds=10)
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape  # window size is automatically adjusted
(100, 2029, 32)

fds: List[int] property

Indices of the available sub-datasets.

__init__(fd, window_size=None, max_rul=65, percent_broken=None, percent_fail_runs=None, feature_select=None, truncate_val=False, run_split_dist=None, truncate_degraded_only=False, resolution_seconds=1, padding_value=0.0, scaling_range=(0, 1))

Create a new reader for the New C-MAPSS dataset. The maximum RUL value is set to 65 by default. The default channels are the four operating conditions, the 14 physical, and 14 virtual sensors in this order.
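The effect of max_rul can be pictured as capping the RUL targets from above. The sketch below assumes that the raw RUL counts down by one per flight cycle and reaches zero at the last cycle; both the helper `cap_rul` and the toy unit are hypothetical, and the reader may compute its targets differently.

```python
import numpy as np


def cap_rul(raw_rul: np.ndarray, max_rul=65) -> np.ndarray:
    """Cap RUL targets at `max_rul`; `None` leaves them unchanged."""
    if max_rul is None:
        return raw_rul
    return np.minimum(raw_rul, max_rul)


# Hypothetical unit with 80 flight cycles, one RUL value per cycle.
raw_rul = np.arange(79, -1, -1)
capped = cap_rul(raw_rul, max_rul=65)
```

With this cap, all early cycles of the unit share the target 65, and the targets only start to decrease once the raw RUL falls below that value.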

By default, the window size is the length of the longest flight cycle in the sub-dataset. Shorter cycles are padded on the left. The default padding value is zero, but it can be overridden, e.g., with -1, to make filtering out padding easier later on.
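Left padding and the later filtering of padded steps can be sketched as follows. The helper `pad_left` is hypothetical and only mirrors the behavior described here. The filtering step additionally assumes that the padding value is unchanged when the features are read back and that no real time step consists entirely of that value.

```python
import numpy as np


def pad_left(cycle: np.ndarray, window_size: int, padding_value: float = 0.0):
    """Truncate a cycle at the end or pad it on the left to `window_size` steps."""
    cycle = cycle[:window_size]  # keep only the first `window_size` steps
    padded = np.full((window_size, cycle.shape[1]), padding_value, dtype=float)
    padded[window_size - len(cycle):] = cycle
    return padded


# A short toy cycle with 3 time steps and 2 channels.
cycle = np.ones((3, 2))
padded = pad_left(cycle, window_size=5, padding_value=-1.0)

# Rows that consist only of the padding value are treated as padding.
is_padding = (padded == -1.0).all(axis=1)
real_steps = padded[~is_padding]
```

A padding value such as -1 lies outside the default scaling range, which is what makes this kind of mask unambiguous.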

The default run_split_dist is the same as in the original dataset, but with the last unit of the original train split designated for validation.

If the features are downsampled in time, the default window size is automatically adjusted to window_size // resolution_seconds. Any manually set window_size needs to take this into account as it is applied after downsampling.
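As a small worked example of this adjustment, the snippet below reproduces the arithmetic for FD=1, whose longest cycle has 20294 steps according to the examples on this page. The helper `adjusted_window_size` is hypothetical and only illustrates the integer division; at 1 Hz, seconds and raw time steps coincide.

```python
def adjusted_window_size(seconds: int, resolution_seconds: int) -> int:
    """Number of time steps left after averaging `resolution_seconds` steps each."""
    return seconds // resolution_seconds


# Default window size for FD=1 when downsampling to 10 seconds.
default_window = adjusted_window_size(20294, resolution_seconds=10)

# To keep the first 1000 seconds of each cycle at the same resolution,
# a manually set window_size has to be given in downsampled steps.
manual_window = adjusted_window_size(1000, resolution_seconds=10)
```

Passing window_size=1000 together with resolution_seconds=10 would therefore cover 10000 seconds of each cycle, not 1000.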

For more information about using readers, refer to the reader module page.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `fd` | `int` | The sub-dataset to use. Must be in [1, 7]. | required |
| `max_rul` | `Optional[int]` | The maximum RUL value. | `65` |
| `percent_broken` | `Optional[float]` | The maximum relative degradation per unit. | `None` |
| `percent_fail_runs` | `Optional[Union[float, List[int]]]` | The percentage or index list of available units. | `None` |
| `feature_select` | `Optional[List[int]]` | The indices of the features to use. | `None` |
| `truncate_val` | `bool` | Truncate the validation data with percent_broken, too. | `False` |
| `run_split_dist` | `Optional[Dict[str, List[int]]]` | The assignment of units to each split. | `None` |
| `truncate_degraded_only` | `bool` | Only truncate the degraded part of the data (< max RUL). | `False` |
| `resolution_seconds` | `int` | The number of consecutive seconds to average over for downsampling. | `1` |
| `padding_value` | `float` | The value to use for padding the flight cycles. | `0.0` |

prepare_data(cache=True)

Prepare the N-C-MAPSS dataset. This function needs to be called before using the dataset for the first time. The dataset is cached for faster loading in the future. This behavior can be disabled to save disk space by setting cache to False.

The dataset is assumed to be present in the data root directory. The training data is then split into development and validation sets. Afterward, a scaler is fitted on the development features unless this was already done.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `cache` | `bool` | Whether to cache the data for faster loading in the future. | `True` |