ncmapss

The New C-MAPSS Turbofan Degradation dataset is based on the same simulation as C-MAPSS. In contrast to the original dataset, it contains fewer engine units, but each of them is recorded in more detail and under more realistic operating conditions. Each unit has flight cycles recorded from the healthy state until failure, with one RUL value assigned to each whole cycle. Inside a flight cycle, data is recorded at a 1 Hz resolution. The dataset is split into seven sub-datasets (FD=1 to FD=7) that differ in the number of engine units and the types of failures present.

Note

An eighth sub-dataset exists but is not included here because one of its data files seems to be corrupted. The dataset authors have already been contacted about this issue.

`NCmapssReader`

Bases: AbstractReader

This reader provides access to the New C-MAPSS Turbofan Degradation dataset. Each of its seven sub-datasets contains a default train/val/test split which can be overridden by the run_split_dist argument.

The features are provided as a windowed time series for each unit. Each window represents one flight cycle and is, by default, padded to the length of the longest cycle in the sub-dataset. The window size can be overridden with the window_size argument, which truncates each cycle at the end. Additionally, the features can be downsampled in time by averaging over resolution_seconds consecutive time steps. The default channels are the four operating conditions, the 14 physical sensors, and the 14 virtual sensors, in this order.
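The downsampling and end-truncation steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reader's actual implementation: the function names `downsample` and `truncate_at_end` are hypothetical, and dropping trailing time steps that do not fill a complete averaging block is an assumption of this sketch.

```python
import numpy as np


def downsample(cycle: np.ndarray, resolution_seconds: int) -> np.ndarray:
    """Average non-overlapping blocks of resolution_seconds time steps.

    cycle has shape (time_steps, channels). Trailing steps that do not fill
    a complete block are dropped (an assumption of this sketch).
    """
    num_blocks = cycle.shape[0] // resolution_seconds
    trimmed = cycle[: num_blocks * resolution_seconds]
    return trimmed.reshape(num_blocks, resolution_seconds, -1).mean(axis=1)


def truncate_at_end(cycle: np.ndarray, window_size: int) -> np.ndarray:
    """Keep only the first window_size time steps of a cycle."""
    return cycle[:window_size]


# A toy flight cycle: 25 seconds of 3 channels recorded at 1 Hz.
cycle = np.arange(25 * 3, dtype=float).reshape(25, 3)
coarse = downsample(cycle, resolution_seconds=10)  # 2 blocks of 10 seconds each
short = truncate_at_end(coarse, window_size=1)     # keep the first coarse step only
print(coarse.shape, short.shape)  # (2, 3) (1, 3)
```

Averaging instead of subsampling keeps information from every second of the flight cycle while reducing the window length by a factor of resolution_seconds.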

The features are min-max scaled between zero and one. The scaler is fitted on the development data only. It is refit for each custom run_split_dist when prepare_data is called.
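A minimal NumPy sketch of fitting the min-max statistics on the development split only and reusing them for another split follows. The helpers `fit_min_max` and `apply_min_max` are hypothetical, the sketch ignores padding, and the `scaling_range` argument only mirrors the reader parameter of the same name. Because the statistics come from the development data alone, validation or test values can fall outside the target range.

```python
import numpy as np


def fit_min_max(dev_features: np.ndarray):
    """Per-channel minimum and maximum over all dev windows and time steps.

    dev_features has shape (num_windows, window_size, num_channels).
    """
    flat = dev_features.reshape(-1, dev_features.shape[-1])
    return flat.min(axis=0), flat.max(axis=0)


def apply_min_max(features, data_min, data_max, scaling_range=(0, 1)):
    """Scale features into scaling_range using statistics from the dev split."""
    low, high = scaling_range
    # Avoid division by zero for channels that are constant in the dev data.
    span = np.where(data_max > data_min, data_max - data_min, 1.0)
    return (features - data_min) / span * (high - low) + low


dev = np.array([[[0.0, 10.0], [2.0, 20.0]]])  # one dev window, 2 steps, 2 channels
test = np.array([[[4.0, 15.0]]])              # one test window, 1 step
data_min, data_max = fit_min_max(dev)         # fitted on dev data only
print(apply_min_max(dev, data_min, data_max).tolist())   # [[[0.0, 0.0], [1.0, 1.0]]]
print(apply_min_max(test, data_min, data_max).tolist())  # [[[2.0, 0.5]]]
```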

Examples:

Default channels

>>> reader = NCmapssReader(fd=1)
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 20294, 32)

Physical sensors only

>>> reader = NCmapssReader(fd=1, feature_select=list(range(4, 18)))
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 20294, 14)

Custom split and window size

>>> reader = NCmapssReader(
...     fd=1,
...     run_split_dist={"dev": [0, 1], "val": [2], "test": [3]},
...     window_size=100,  # first 100 steps of each cycle
... )
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 100, 32)

Downsampled features

>>> reader = NCmapssReader(fd=1, resolution_seconds=10)
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape  # window size is automatically adjusted
(100, 2029, 32)

`fds: List[int]` `property`

Indices of the available sub-datasets.

`__init__(fd, window_size=None, max_rul=65, percent_broken=None, percent_fail_runs=None, feature_select=None, truncate_val=False, run_split_dist=None, truncate_degraded_only=False, resolution_seconds=1, padding_value=0.0, scaling_range=(0, 1))`

Create a new reader for the New C-MAPSS dataset. The maximum RUL value is set to 65 by default. The default channels are the four operating conditions, the 14 physical sensors, and the 14 virtual sensors, in this order.
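Assuming the maximum RUL is applied as a simple cap on the per-cycle RUL labels (the common piece-wise linear RUL target), its effect can be sketched as follows. The helper `cap_rul` and the raw label values are hypothetical and only illustrate the clipping.

```python
import numpy as np


def cap_rul(raw_rul: np.ndarray, max_rul: int = 65) -> np.ndarray:
    """Clip per-cycle RUL labels so that no label exceeds max_rul."""
    return np.minimum(raw_rul, max_rul)


# Hypothetical raw RUL labels of one unit, one value per flight cycle.
raw = np.array([90, 70, 65, 40, 0])
print(cap_rul(raw).tolist())  # [65, 65, 65, 40, 0]
```

All early, healthy cycles share the same label, so a model is not asked to distinguish between degrees of health it cannot observe.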

By default, the window size is the length of the longest flight cycle in the sub-dataset. Shorter cycles are padded on the left. The default padding value is zero, but it can be overridden, e.g., with -1 to make filtering out the padding easier later on.
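Left padding and later filtering out the padded steps could look like the following minimal NumPy sketch. `pad_left` is a hypothetical helper, not part of the reader, and the mask assumes that no real time step consists entirely of the padding value. That assumption is why a sentinel such as -1, which real scaled features are unlikely to take on every channel at once, makes filtering easier than the default zero.

```python
import numpy as np


def pad_left(cycle: np.ndarray, window_size: int, padding_value: float = 0.0) -> np.ndarray:
    """Left-pad a (time_steps, channels) cycle to window_size steps.

    Cycles that are already long enough are truncated at the end instead.
    """
    missing = window_size - cycle.shape[0]
    if missing <= 0:
        return cycle[:window_size]
    padding = np.full((missing, cycle.shape[1]), padding_value, dtype=cycle.dtype)
    return np.concatenate([padding, cycle], axis=0)


short_cycle = np.array([[0.2, 0.4], [0.6, 0.8]])
window = pad_left(short_cycle, window_size=4, padding_value=-1.0)
# A time step counts as padding if every channel equals the padding value.
is_real = ~np.all(window == -1.0, axis=1)
print(is_real.tolist())  # [False, False, True, True]
real_steps = window[is_real]  # recovers short_cycle
```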

The default run_split_dist is the same as in the original dataset, but with the last unit of the original train split designated for validation.

If the features are downsampled in time, the default window size is automatically adjusted to window_size // resolution_seconds. Any manually set window_size needs to take this into account as it is applied after downsampling.
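The window size arithmetic can be checked with plain integer division. The helper names below are hypothetical; the numbers reproduce the FD=1 examples, where the default window of 20294 steps at 1 Hz becomes 2029 steps at a 10-second resolution.

```python
def default_window_size(longest_cycle: int, resolution_seconds: int) -> int:
    """Default window size after downsampling: the longest cycle in coarse steps."""
    return longest_cycle // resolution_seconds


def steps_for_seconds(seconds: int, resolution_seconds: int) -> int:
    """Convert a duration in seconds into downsampled time steps.

    A manually set window_size is applied after downsampling, so it must be
    given in downsampled steps rather than in seconds.
    """
    return seconds // resolution_seconds


print(default_window_size(20294, 10))  # 2029
# To keep the first 100 seconds of each cycle at a 10-second resolution,
# window_size would be 10, e.g. NCmapssReader(fd=1, resolution_seconds=10, window_size=10).
print(steps_for_seconds(100, 10))  # 10
```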

For more information about using readers, refer to the reader module page.

Parameters:

Name	Type	Description	Default
`fd`	`int`	The sub-dataset to use. Must be in `[1, 7]`.	required
`window_size`	`Optional[int]`	The number of time steps per window. Overrides the default of the longest flight cycle.	`None`
`max_rul`	`Optional[int]`	The maximum RUL value.	`65`
`percent_broken`	`Optional[float]`	The maximum relative degradation per unit.	`None`
`percent_fail_runs`	`Optional[Union[float, List[int]]]`	The percentage or index list of available units.	`None`
`feature_select`	`Optional[List[int]]`	The indices of the features to use.	`None`
`truncate_val`	`bool`	Truncate the validation data with `percent_broken`, too.	`False`
`run_split_dist`	`Optional[Dict[str, List[int]]]`	The assignment of units to each split.	`None`
`truncate_degraded_only`	`bool`	Only truncate the degraded part of the data (< max RUL).	`False`
`resolution_seconds`	`int`	The number of consecutive seconds to average over for downsampling.	`1`
`padding_value`	`float`	The value to use for padding the flight cycles.	`0.0`
`scaling_range`	`Tuple[int, int]`	The range to min-max scale the features to.	`(0, 1)`

`prepare_data(cache=True)`

Prepare the N-C-MAPSS dataset. This function needs to be called before using the dataset for the first time. The dataset is cached for faster loading in the future. This behavior can be disabled to save disk space by setting cache to False.

The dataset is assumed to be present in the data root directory. The training data is then split into development and validation sets. Afterward, a scaler is fit on the development features if this has not already been done.

Parameters:

Name	Type	Description	Default
`cache`	`bool`	Whether to cache the data for faster loading in the future.	`True`
