ncmapss
The New C-MAPSS Turbofan Degradation dataset is based on the same simulation as
C-MAPSS. In contrast to the original dataset,
it contains fewer engine units, but each of them is recorded in more detail and under
more realistic operating conditions. Each unit has flight cycles recorded from the
healthy state until failure, with one RUL value assigned to each whole cycle. Inside a
flight cycle, data is recorded at a resolution of 1 Hz. The dataset is split into seven
sub-datasets (FD=1 to FD=7) that differ in the number of engine units and the
types of failures present.
Note
An eighth sub-dataset exists but is not included here because one of its data files seems to be corrupted. The dataset authors have been contacted about this issue.
NCmapssReader
Bases: AbstractReader
This reader provides access to the New C-MAPSS Turbofan Degradation dataset. Each
of its seven sub-datasets contains a default train/val/test split which can be
overridden by the run_split_dist argument.
The features are provided as a windowed time series for each unit. The windows
represent one flight cycle and are, by default, padded to the longest cycle in
the sub-dataset. The window size can be overridden by the window_size argument,
which truncates each cycle at the end. Additionally, the features can be
downsampled in time by taking the average of resolution_seconds consecutive
time steps. The default channels are the four operating conditions,
the 14 physical, and 14 virtual sensors in this order.
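The downsampling described above can be illustrated without the library. The following is a minimal sketch in plain Python, not the reader's actual implementation: the function name downsample is hypothetical, and dropping a trailing incomplete group is an assumption chosen to agree with the floor division seen in the documented shapes.

```python
from typing import List


def downsample(cycle: List[List[float]], resolution_seconds: int) -> List[List[float]]:
    """Average every `resolution_seconds` consecutive time steps per channel.

    Assumption: a trailing group with fewer than `resolution_seconds`
    steps is dropped.
    """
    if resolution_seconds < 1:
        raise ValueError("resolution_seconds must be at least 1")
    num_groups = len(cycle) // resolution_seconds
    downsampled = []
    for group_idx in range(num_groups):
        start = group_idx * resolution_seconds
        group = cycle[start : start + resolution_seconds]
        # zip(*group) iterates over channels; each channel is averaged
        downsampled.append([sum(channel) / len(group) for channel in zip(*group)])
    return downsampled


# Five time steps at 1 Hz with two channels each
cycle = [[0.0, 10.0], [2.0, 10.0], [4.0, 20.0], [6.0, 20.0], [8.0, 30.0]]
print(downsample(cycle, 2))  # [[1.0, 10.0], [5.0, 20.0]]
```

With five steps and a group size of two, the last step has no partner and is discarded in this sketch.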
The features are min-max scaled between zero and one. The scaler is fitted on the
development data only. It is refit for each custom run_split_dist when
prepare_data is called.
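To make the effect of fitting on the development data only more concrete, here is a small sketch of min-max scaling in plain Python. It is an illustration under assumptions, not the reader's scaler: the helpers fit_min_max and scale are hypothetical, and mapping a constant channel to the lower bound is a choice made for this example.

```python
from typing import List, Sequence, Tuple


def fit_min_max(dev_steps: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Compute per-channel minima and maxima on the development data."""
    channels = list(zip(*dev_steps))
    return [min(c) for c in channels], [max(c) for c in channels]


def scale(
    steps: List[List[float]],
    minima: Sequence[float],
    maxima: Sequence[float],
    scaling_range: Tuple[float, float] = (0.0, 1.0),
) -> List[List[float]]:
    """Apply the fitted statistics to any split."""
    low, high = scaling_range
    scaled = []
    for step in steps:
        row = []
        for value, c_min, c_max in zip(step, minima, maxima):
            span = c_max - c_min
            # Assumption: a constant channel is mapped to the lower bound
            unit = (value - c_min) / span if span else 0.0
            row.append(low + unit * (high - low))
        scaled.append(row)
    return scaled


dev = [[0.0, 100.0], [10.0, 300.0]]
val = [[5.0, 400.0]]
minima, maxima = fit_min_max(dev)  # statistics come from dev only
print(scale(dev, minima, maxima))  # [[0.0, 0.0], [1.0, 1.0]]
print(scale(val, minima, maxima))  # [[0.5, 1.5]]
```

Because the statistics come from the development data, values in other splits can fall outside the scaling range, as the second channel of val does here.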
Examples:
Default channels
>>> reader = NCmapssReader(fd=1)
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 20294, 32)
Physical sensors only
>>> reader = NCmapssReader(fd=1, feature_select=list(range(4, 18)))
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 20294, 14)
Custom split and window size
>>> reader = NCmapssReader(
... fd=1,
... run_split_dist={"dev": [0, 1], "val": [2], "test": [3]},
... window_size=100, # first 100 steps of each cycle
... )
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape
(100, 100, 32)
Downsampled features
>>> reader = NCmapssReader(fd=1, resolution_seconds=10)
>>> reader.prepare_data()
>>> features, labels = reader.load_split("dev")
>>> features[0].shape # window size is automatically adjusted
(100, 2029, 32)
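Channel indices for feature_select

The index ranges below follow from the documented channel order (four operating conditions, then 14 physical, then 14 virtual sensors) and from the physical-sensor example above. The constant names are hypothetical and exist only for this sketch; the library is not known to export them.

```python
# Index ranges derived from the documented default channel order.
# The constant names are illustrative, not part of the library.
OPERATING_CONDITIONS = list(range(0, 4))
PHYSICAL_SENSORS = list(range(4, 18))
VIRTUAL_SENSORS = list(range(18, 32))

# Example selection: operating conditions plus virtual sensors
feature_select = OPERATING_CONDITIONS + VIRTUAL_SENSORS
print(len(feature_select))  # 18
```

A list built this way could then be passed as the feature_select argument of the reader.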
fds: List[int]
property
Indices of the available sub-datasets.
__init__(fd, window_size=None, max_rul=65, percent_broken=None, percent_fail_runs=None, feature_select=None, truncate_val=False, run_split_dist=None, truncate_degraded_only=False, resolution_seconds=1, padding_value=0.0, scaling_range=(0, 1))
Create a new reader for the New C-MAPSS dataset. The maximum RUL value is set to 65 by default. The default channels are the four operating conditions, the 14 physical, and 14 virtual sensors in this order.
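A maximum RUL value is commonly applied by capping the targets, so that all cycles far from failure share the same label. The sketch below assumes this capping behavior and that None disables it; both are assumptions for illustration, and the function cap_rul is hypothetical.

```python
from typing import List, Optional


def cap_rul(rul_values: List[int], max_rul: Optional[int] = 65) -> List[int]:
    """Cap RUL targets at `max_rul`.

    Assumption: `None` leaves the targets unchanged.
    """
    if max_rul is None:
        return list(rul_values)
    return [min(rul, max_rul) for rul in rul_values]


# One RUL value per flight cycle of a hypothetical unit with 70 cycles
raw_rul = list(range(69, -1, -1))
capped = cap_rul(raw_rul)
print(capped[:6])  # [65, 65, 65, 65, 65, 64]
```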
By default, the window size is the longest flight cycle in the sub-dataset. Shorter cycles are padded on the left. The default padding value is zero but can be overridden, e.g., with -1 to make filtering for padding easier later on.
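The following sketch shows why a padding value outside the scaled feature range simplifies later filtering. It is a plain Python illustration, not the reader's code: pad_left and strip_padding are hypothetical helpers, and treating a time step as padding only when all of its channels equal the padding value is an assumption.

```python
from typing import List


def pad_left(cycle: List[List[float]], window_size: int, padding_value: float = 0.0) -> List[List[float]]:
    """Prepend padding steps until the cycle has `window_size` steps."""
    missing = window_size - len(cycle)
    if missing < 0:
        raise ValueError("cycle is longer than window_size")
    num_channels = len(cycle[0])
    padding = [[padding_value] * num_channels for _ in range(missing)]
    return padding + cycle


def strip_padding(window: List[List[float]], padding_value: float) -> List[List[float]]:
    """Drop steps in which every channel equals the padding value."""
    return [step for step in window if any(v != padding_value for v in step)]


# A short cycle whose first real step happens to be all zeros after scaling
cycle = [[0.0, 0.0], [0.4, 1.0]]
window = pad_left(cycle, window_size=4, padding_value=-1.0)
print(window)  # [[-1.0, -1.0], [-1.0, -1.0], [0.0, 0.0], [0.4, 1.0]]
print(strip_padding(window, -1.0) == cycle)  # True
```

With a padding value of zero, the all-zero real step in this example would be indistinguishable from padding.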
The default run_split_dist is the same as in the original dataset, but with
the last unit of the original train split designated for validation.
If the features are downsampled in time, the default window size is
automatically adjusted to window_size // resolution_seconds. Any manually
set window_size needs to take this into account as it is applied after
downsampling.
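As a worked example of this adjustment, the sketch below converts a desired window length in seconds into the number of downsampled steps. The helper name steps_after_downsampling is hypothetical; only the floor division comes from the text above.

```python
def steps_after_downsampling(window_seconds: int, resolution_seconds: int) -> int:
    """Number of time steps that remain after averaging over groups."""
    if resolution_seconds < 1:
        raise ValueError("resolution_seconds must be at least 1")
    return window_seconds // resolution_seconds


# Default window of FD=1 (20294 steps at 1 Hz) downsampled by a factor of 10
print(steps_after_downsampling(20294, 10))  # 2029

# To keep the first 1000 seconds of each cycle at a resolution of 10 seconds,
# a manually set window_size would be 100, not 1000
print(steps_after_downsampling(1000, 10))  # 100
```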
For more information about using readers, refer to the reader module page.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fd | int | The sub-dataset to use. Must be in | required |
max_rul | Optional[int] | The maximum RUL value. | 65 |
percent_broken | Optional[float] | The maximum relative degradation per unit. | None |
percent_fail_runs | Optional[Union[float, List[int]]] | The percentage or index list of available units. | None |
feature_select | Optional[List[int]] | The indices of the features to use. | None |
truncate_val | bool | Truncate the validation data with | False |
run_split_dist | Optional[Dict[str, List[int]]] | The assignment of units to each split. | None |
truncate_degraded_only | bool | Only truncate the degraded part of the data (< max RUL). | False |
resolution_seconds | int | The number of consecutive seconds to average over for downsampling. | 1 |
padding_value | float | The value to use for padding the flight cycles. | 0.0 |
prepare_data(cache=True)
Prepare the N-C-MAPSS dataset. This function needs to be called before using
the dataset for the first time. The dataset is cached for faster loading in
the future. This behavior can be disabled to save disk space by setting
cache to False.
The dataset is assumed to be present in the data root directory. The training data is then split into a development and a validation set. Afterward, a scaler is fitted on the development features if this has not been done before.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cache | bool | Whether to cache the data for faster loading in the future. | True |