
core

Basic data modules for experiments involving only a single subset of any RUL dataset.

PairedRulDataset

Bases: IterableDataset

A dataset of sample pairs drawn from the same time series.

The dataset uses the runs exactly as loaded by the passed data modules. Options like degraded_only need to be set there.
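Examples:

A minimal sketch of constructing such a dataset. The constructor arguments shown here (a list of data modules, the split to draw from, the number of pairs, and a minimum distance between pair members) are assumptions based on common usage and are not documented on this page:

>>> import rul_datasets
>>> dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> dm.prepare_data()
>>> dm.setup()
>>> # assumed signature: PairedRulDataset(dms, split, num_samples, min_distance)
>>> paired = rul_datasets.core.PairedRulDataset([dm], "dev", 1000, 1)
>>> sample_pair = next(iter(paired))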

RulDataModule

Bases: LightningDataModule

A data module to provide windowed time series features with RUL targets. It exposes the splits of the underlying dataset for easy usage with PyTorch and PyTorch Lightning.

The data module implements the hparams property used by PyTorch Lightning to save hyperparameters to checkpoints. It retrieves the hyperparameters of its underlying reader and adds the batch size to them.

If you want to extract features from the windows, you can pass the feature_extractor and window_size arguments to the constructor. The feature_extractor is a callable that takes a windowed time series as a numpy array with the shape [num_windows, window_size, num_features] and returns another numpy array. Depending on window_size, the expected output shapes for the feature_extractor are:

  • window_size is None: [num_new_windows, new_window_size, features]
  • window_size is not None: [num_windows, features]

If window_size is set, the extracted features are re-windowed.

Examples:

Default

>>> import rul_datasets
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(cmapss, batch_size=32)

With Feature Extractor

>>> import rul_datasets
>>> import numpy as np
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(
...     cmapss,
...     batch_size=32,
...     feature_extractor=lambda x: np.mean(x, axis=1),
...     window_size=10
... )

Only Degraded Validation and Test Samples

>>> import rul_datasets
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(cmapss, 32, degraded_only=["val", "test"])

data: Dict[str, Tuple[List[np.ndarray], List[np.ndarray]]] property

A dictionary of the training, validation and test splits.

Each split is a tuple of feature and target tensors. The keys are dev (training split), val (validation split) and test (test split).
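For example, after setup was called, the splits can be accessed directly. This is a sketch; the exact array shapes depend on the reader and any feature extractor:

>>> import rul_datasets
>>> dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> dm.prepare_data()
>>> dm.setup()
>>> features, targets = dm.data["dev"]
>>> len(features) == len(targets)  # one feature and one target array per run
True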

fds property

Index list of the available subsets of the underlying dataset, e.g., [1, 2, 3, 4] for CMAPSS.

reader: AbstractReader property

The underlying dataset reader.

__init__(reader, batch_size, feature_extractor=None, window_size=None, degraded_only=None)

Create a new RUL data module from a reader.

This data module exposes a training, validation and test data loader for the underlying dataset. First, prepare_data is called to download and pre-process the dataset. Afterward, setup is called to load all splits into memory.

If a feature_extractor is supplied, the data module extracts new features from each window of the time series. If window_size is None, it is assumed that the extracted features form new windows themselves. If window_size is an int, it is assumed that each window is reduced to a single feature vector and should be re-windowed. The expected output shapes for the feature_extractor are:

  • window_size is None: [num_new_windows, new_window_size, features]
  • window_size is not None: [num_windows, features]

The expected input shape for the feature_extractor is always [num_windows, window_size, features].
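For illustration, a sketch of the window_size is None case: the extractor returns re-windowed data itself, here by downsampling the time axis inside each window. The downsample function is illustrative and not part of this library:

>>> import rul_datasets
>>> def downsample(windows):
...     # halve each window by keeping every second time step:
...     # [num_windows, window_size, features] -> [num_windows, window_size // 2, features]
...     return windows[:, ::2, :]
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(cmapss, batch_size=32, feature_extractor=downsample)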

Parameters:

  • reader (AbstractReader, required): The dataset reader for the desired dataset, e.g., CmapssReader.
  • batch_size (int, required): The size of the batches built by the data loaders.
  • feature_extractor (Optional[Callable], default None): A feature extractor that extracts feature vectors from windows.
  • window_size (Optional[int], default None): The new window size to apply after the feature extractor.
  • degraded_only (Optional[List[Literal['dev', 'val', 'test']]], default None): Whether to load only degraded samples for the 'dev', 'val' or 'test' split.

check_compatibility(other)

Check whether another RulDataModule can be used together with this one.

RulDataModules can be used together in higher-order data modules, e.g., AdaptionDataModule. This function checks whether other is compatible with this data module: the underlying dataset readers must be compatible, and the batch size, feature extractor and window size must match. If anything is incompatible, this function raises a ValueError.

Parameters:

  • other (RulDataModule, required): The RulDataModule to check compatibility with.
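For example, two data modules built with identically configured readers and the same batch size pass the check silently (a sketch):

>>> import rul_datasets
>>> dm_a = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> dm_b = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> dm_a.check_compatibility(dm_b)  # passes silently; raises ValueError if incompatible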

is_mutually_exclusive(other)

Check if the other data module is mutually exclusive to this one. See AbstractReader.is_mutually_exclusive.

Parameters:

  • other (RulDataModule, required): Data module to check exclusivity against.

Returns:

  • bool: Whether both data modules are mutually exclusive.
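A sketch: data modules over different sub-datasets should not share any runs, so they are expected to be mutually exclusive:

>>> import rul_datasets
>>> fd1 = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> fd3 = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=3), 32)
>>> exclusive = fd1.is_mutually_exclusive(fd3)  # expected to be True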

load_split(split, alias=None, degraded_only=None)

Load a split from the underlying reader and apply the feature extractor.

By setting alias, it is possible to load a split aliased as another split, e.g., load the test split and treat it as the dev split. The data of the split is loaded, but all pre-processing steps of alias are carried out.

If degraded_only is set, only degraded samples are loaded. This is only possible if the underlying reader has a max_rul set or norm_rul set to True. This argument takes precedence over the data module's own degraded_only setting.

Parameters:

  • split (str, required): The desired split to load.
  • alias (Optional[str], default None): The split as which the loaded data should be treated.
  • degraded_only (Optional[bool], default None): Whether to only load degraded samples.

Returns:

  • Tuple[List[ndarray], List[ndarray]]: The feature and target tensors of the split's runs.
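For instance, the test split can be loaded with the dev split's pre-processing applied (a sketch):

>>> import rul_datasets
>>> dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> dm.prepare_data()
>>> features, targets = dm.load_split("test", alias="dev")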

prepare_data(*args, **kwargs)

Download and pre-process the underlying data.

This calls the prepare_data function of the underlying reader. All previously completed preparation steps are skipped. It is called automatically by pytorch_lightning and, in distributed mode, executed only by the first process.

Parameters:

  • *args (Any, default ()): Ignored. Only for adhering to the parent class interface.
  • **kwargs (Any, default {}): Ignored. Only for adhering to the parent class interface.

setup(stage=None)

Load all splits as tensors into memory and optionally apply the feature extractor.

The splits are placed inside the data property. If a split is empty, a tuple of empty tensors with the correct number of dimensions is created as a placeholder. This ensures compatibility with higher-order data modules.

If the data module was constructed with a feature_extractor argument, the feature windows are passed to the feature extractor. The resulting new features may be re-windowed.

Parameters:

  • stage (Optional[str], default None): Ignored. Only for adhering to the parent class interface.
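Outside of a Lightning Trainer, both hooks can be called manually before accessing the data (a sketch):

>>> import rul_datasets
>>> dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> dm.prepare_data()  # download and pre-process once
>>> dm.setup()  # load all splits into the data property
>>> sorted(dm.data)
['dev', 'test', 'val']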

test_dataloader(*args, **kwargs)

Create a data loader for the test split.

The data loader is configured to leave the data unshuffled. The pin_memory option is activated to achieve maximum transfer speed to the GPU.

The whole split is held in memory. Therefore, num_workers is set to zero, which means the main process creates the batches.

Parameters:

  • *args (Any, default ()): Ignored. Only for adhering to the parent class interface.
  • **kwargs (Any, default {}): Ignored. Only for adhering to the parent class interface.

Returns:

  • DataLoader: The test data loader.

to_dataset(split, alias=None)

Create a dataset of a split.

This convenience function creates a plain tensor dataset to use outside the rul_datasets library.

The data placed inside the dataset will be from the specified split. If alias is set, the loaded data will be treated as if it came from the alias split. For example, one could load the test data and treat it as if it were the training data. This may be useful for inductive domain adaption.

Parameters:

  • split (str, required): The split to place inside the dataset.
  • alias (Optional[str], default None): The split the loaded data should be treated as.

Returns:

  • RulDataset: A dataset containing the requested split.
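The returned dataset can then be used with a plain PyTorch DataLoader outside this library (a sketch):

>>> import rul_datasets
>>> from torch.utils.data import DataLoader
>>> dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> dm.prepare_data()
>>> dm.setup()
>>> dataset = dm.to_dataset("test", alias="dev")  # test data, dev pre-processing
>>> loader = DataLoader(dataset, batch_size=64, shuffle=True)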

train_dataloader(*args, **kwargs)

Create a data loader for the training split.

The data loader is configured to shuffle the data. The pin_memory option is activated to achieve maximum transfer speed to the GPU. The data loader is also configured to drop the last batch of the data if it would only contain one sample.

The whole split is held in memory. Therefore, num_workers is set to zero, which means the main process creates the batches.

Parameters:

  • *args (Any, default ()): Ignored. Only for adhering to the parent class interface.
  • **kwargs (Any, default {}): Ignored. Only for adhering to the parent class interface.

Returns:

  • DataLoader: The training data loader.
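A sketch of drawing a batch; the exact feature shape depends on the reader and any feature extractor. The validation and test loaders behave analogously, but unshuffled:

>>> import rul_datasets
>>> dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), 32)
>>> dm.prepare_data()
>>> dm.setup()
>>> features, targets = next(iter(dm.train_dataloader()))
>>> features.shape[0]  # the configured batch size
32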

val_dataloader(*args, **kwargs)

Create a data loader for the validation split.

The data loader is configured to leave the data unshuffled. The pin_memory option is activated to achieve maximum transfer speed to the GPU.

The whole split is held in memory. Therefore, num_workers is set to zero, which means the main process creates the batches.

Parameters:

  • *args (Any, default ()): Ignored. Only for adhering to the parent class interface.
  • **kwargs (Any, default {}): Ignored. Only for adhering to the parent class interface.

Returns:

  • DataLoader: The validation data loader.

RulDataset

Bases: Dataset

Internal dataset to hold multiple runs.

Its length is the sum of all runs' lengths.

__init__(features, *targets, copy_tensors=False)

Create a new dataset from multiple runs.

If copy_tensors is True, the tensors are copied to avoid side effects when modifying them. Otherwise, the tensors share memory with the original NumPy arrays to save space.

Parameters:

  • features (List[ndarray], required): The features of each run.
  • *targets (List[ndarray], default ()): The targets of each run.
  • copy_tensors (bool, default False): Whether to copy the tensors or not.
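Examples:

A minimal sketch of constructing the dataset directly from per-run arrays. The import path is assumed from this module's name, and the array shapes are arbitrary:

>>> import numpy as np
>>> from rul_datasets.core import RulDataset  # assumed import path
>>> features = [np.random.randn(100, 30, 14).astype(np.float32),
...             np.random.randn(80, 30, 14).astype(np.float32)]
>>> targets = [np.linspace(100, 1, 100, dtype=np.float32),
...            np.linspace(80, 1, 80, dtype=np.float32)]
>>> dataset = RulDataset(features, targets)
>>> len(dataset)  # the sum of both runs' lengths
180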