# core

Basic data modules for experiments involving only a single subset of any RUL dataset.
## PairedRulDataset

Bases: `IterableDataset`

A dataset of sample pairs drawn from the same time series.

The dataset uses the runs exactly as loaded by the passed data modules. Options like `degraded_only` need to be set there.
## RulDataModule

Bases: `LightningDataModule`

A data module to provide windowed time series features with RUL targets. It exposes the splits of the underlying dataset for easy usage with PyTorch and PyTorch Lightning.

The data module implements the `hparams` property used by PyTorch Lightning to save hyperparameters to checkpoints. It retrieves the hyperparameters of its underlying reader and adds the batch size to them.

If you want to extract features from the windows, you can pass the `feature_extractor` and `window_size` arguments to the constructor. The `feature_extractor` is a callable that takes a windowed time series as a numpy array with the shape `[num_windows, window_size, num_features]` and returns another numpy array. Depending on `window_size`, the expected output shapes for the `feature_extractor` are:

- `window_size is None`: `[num_new_windows, new_window_size, features]`
- `window_size is not None`: `[num_windows, features]`

If `window_size` is set, the extracted features are re-windowed.
Examples:

**Default**

```python
>>> import rul_datasets
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(cmapss, batch_size=32)
```

**With Feature Extractor**

```python
>>> import rul_datasets
>>> import numpy as np
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(
...     cmapss,
...     batch_size=32,
...     feature_extractor=lambda x: np.mean(x, axis=1),
...     window_size=10
... )
```

**Only Degraded Validation and Test Samples**

```python
>>> import rul_datasets
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(cmapss, 32, degraded_only=["val", "test"])
```
### `data: Dict[str, Tuple[List[np.ndarray], List[np.ndarray]]]`

*property*

A dictionary of the training, validation and test splits. Each split is a tuple of feature and target tensors. The keys are `dev` (training split), `val` (validation split) and `test` (test split).
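For illustration, a minimal sketch of inspecting the splits after setup; the exact array shapes depend on the reader's windowing:

```python
import rul_datasets

cmapss = rul_datasets.reader.CmapssReader(fd=1)
dm = rul_datasets.RulDataModule(cmapss, batch_size=32)
dm.prepare_data()  # download and pre-process if necessary
dm.setup()         # populate the `data` property

dev_features, dev_targets = dm.data["dev"]  # one numpy array per run
print(len(dev_features), dev_features[0].shape)
```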
### `fds`

*property*

Index list of the available subsets of the underlying dataset, i.e. `[1, 2, 3, 4]` for `CMAPSS`.
### `reader: AbstractReader`

*property*

The underlying dataset reader.
### `__init__(reader, batch_size, feature_extractor=None, window_size=None, degraded_only=None)`

Create a new RUL data module from a reader.

This data module exposes a training, validation and test data loader for the underlying dataset. First, `prepare_data` is called to download and pre-process the dataset. Afterward, `setup` is called to load all splits into memory.

If a `feature_extractor` is supplied, the data module extracts new features from each window of the time series. If `window_size` is `None`, it is assumed that the extracted features form new windows themselves. If `window_size` is an int, it is assumed that the extracted features are single feature vectors that should be re-windowed. The expected output shapes for the `feature_extractor` are:

- `window_size is None`: `[num_new_windows, new_window_size, features]`
- `window_size is not None`: `[num_windows, features]`

The expected input shape for the `feature_extractor` is always `[num_windows, window_size, features]`.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `reader` | `AbstractReader` | The dataset reader for the desired dataset, e.g., `CmapssReader`. | *required* |
| `batch_size` | `int` | The size of the batches built by the data loaders. | *required* |
| `feature_extractor` | `Optional[Callable]` | A feature extractor that extracts feature vectors from windows. | `None` |
| `window_size` | `Optional[int]` | The new window size to apply after the feature extractor. | `None` |
| `degraded_only` | `Optional[List[Literal['dev', 'val', 'test']]]` | Whether to load only degraded samples for the specified splits. | `None` |
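As a complement to the examples above, a hedged sketch of the `window_size is None` case, where the extractor's output is treated as ready-made windows. The FFT extractor is an illustrative assumption, not part of the library:

```python
import numpy as np
import rul_datasets

def fft_magnitude(windows: np.ndarray) -> np.ndarray:
    # In: [num_windows, window_size, features].
    # Out: [num_windows, window_size // 2 + 1, features], already window-shaped.
    return np.abs(np.fft.rfft(windows, axis=1))

cmapss = rul_datasets.reader.CmapssReader(fd=1)
dm = rul_datasets.RulDataModule(
    cmapss,
    batch_size=32,
    feature_extractor=fft_magnitude,
    window_size=None,  # output already forms windows, so no re-windowing
)
```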
### `check_compatibility(other)`

Check if another RulDataModule is compatible to be used together with this one.

RulDataModules can be used together in higher-order data modules, e.g. AdaptionDataModule. This function checks whether `other` is compatible with this data module. It checks the underlying dataset readers and for matching batch size, feature extractor and window size. If anything is incompatible, this function raises a ValueError.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `other` | `RulDataModule` | The RulDataModule to check compatibility with. | *required* |
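A brief sketch of guarding a higher-order setup with this check; the mismatched batch size is chosen to trigger the error:

```python
import rul_datasets

dm_a = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
dm_b = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=2), batch_size=64)

try:
    dm_a.check_compatibility(dm_b)
except ValueError as exc:
    print(f"Incompatible data modules: {exc}")
```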
### `is_mutually_exclusive(other)`

Check if the other data module is mutually exclusive to this one. See AbstractReader.is_mutually_exclusive.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `other` | `RulDataModule` | Data module to check exclusivity against. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | Whether both data modules are mutually exclusive. |
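For example, modules over different sub-datasets should share no runs; the expected results below are assumptions based on the reader's exclusivity semantics:

```python
import rul_datasets

fd1 = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
fd2 = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=2), batch_size=32)

print(fd1.is_mutually_exclusive(fd2))  # expected True: disjoint sub-datasets
print(fd1.is_mutually_exclusive(fd1))  # expected False: identical data
```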
### `load_split(split, alias=None, degraded_only=None)`

Load a split from the underlying reader and apply the feature extractor.

By setting `alias`, it is possible to load a split aliased as another split, e.g., load the test split and treat it as the dev split. The data of the `split` is loaded, but all pre-processing steps of `alias` are carried out.

If `degraded_only` is set, only degraded samples are loaded. This is only possible if the underlying reader has a `max_rul` set or `norm_rul` is set to `True`. The `degraded_only` argument takes precedence over the `degraded_only` of the data module.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `split` | `str` | The desired split to load. | *required* |
| `alias` | `Optional[str]` | The split as which the loaded data should be treated. | `None` |
| `degraded_only` | `Optional[bool]` | Whether to only load degraded samples. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Tuple[List[ndarray], List[ndarray]]` | The feature and target tensors of the split's runs. |
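A short sketch of aliasing, here loading the test split with the dev split's pre-processing:

```python
import rul_datasets

dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
dm.prepare_data()

# Test runs, but pre-processed as if they were the dev split.
features, targets = dm.load_split("test", alias="dev")
print(len(features), features[0].shape)
```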
### `prepare_data(*args, **kwargs)`

Download and pre-process the underlying data.

This calls the `prepare_data` function of the underlying reader. All previously completed preparation steps are skipped. It is called automatically by `pytorch_lightning` and executed on the first GPU in distributed mode.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Ignored. Only for adhering to parent class interface. | `()` |
| `**kwargs` | `Any` | Ignored. Only for adhering to parent class interface. | `{}` |
### `setup(stage=None)`

Load all splits as tensors into memory and optionally apply the feature extractor.

The splits are placed inside the `data` property. If a split is empty, a tuple of empty tensors with the correct number of dimensions is created as a placeholder. This ensures compatibility with higher-order data modules.

If the data module was constructed with a `feature_extractor` argument, the feature windows are passed to the feature extractor. The resulting new features may be re-windowed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `stage` | `Optional[str]` | Ignored. Only for adhering to parent class interface. | `None` |
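To make the extraction step concrete, a sketch that mirrors the constructor example above; the commented shape is an expectation under CMAPSS defaults, not a guarantee:

```python
import numpy as np
import rul_datasets

dm = rul_datasets.RulDataModule(
    rul_datasets.reader.CmapssReader(fd=1),
    batch_size=32,
    feature_extractor=lambda x: np.mean(x, axis=1),  # one vector per window
    window_size=10,
)
dm.prepare_data()
dm.setup()  # extraction and re-windowing happen here

features, _ = dm.data["dev"]
print(features[0].shape)  # expected [num_new_windows, 10, num_features]
```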
### `test_dataloader(*args, **kwargs)`

Create a data loader for the test split.

The data loader is configured to leave the data unshuffled. The `pin_memory` option is activated to achieve maximum transfer speed to the GPU.

The whole split is held in memory. Therefore, `num_workers` is set to zero, which uses the main process for creating batches.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Ignored. Only for adhering to parent class interface. | `()` |
| `**kwargs` | `Any` | Ignored. Only for adhering to parent class interface. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataLoader` | The test data loader. |
### `to_dataset(split, alias=None)`

Create a dataset of a split.

This convenience function creates a plain tensor dataset to use outside the `rul_datasets` library.

The data placed inside the dataset will be from the specified `split`. If `alias` is set, the loaded data will be treated as if from the `alias` split. For example, one could load the test data and treat it as if it were the training data. This may be useful for inductive domain adaption.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `split` | `str` | The split to place inside the dataset. | *required* |
| `alias` | `Optional[str]` | The split the loaded data should be treated as. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `RulDataset` | A dataset containing the requested split. |
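A sketch of wrapping the returned dataset in a custom PyTorch `DataLoader`:

```python
from torch.utils.data import DataLoader

import rul_datasets

dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
dm.prepare_data()
dm.setup()

dataset = dm.to_dataset("test", alias="dev")  # test data, treated as dev
loader = DataLoader(dataset, batch_size=64, shuffle=True)
```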
### `train_dataloader(*args, **kwargs)`

Create a data loader for the training split.

The data loader is configured to shuffle the data. The `pin_memory` option is activated to achieve maximum transfer speed to the GPU. The data loader is also configured to drop the last batch of the data if it would contain only one sample.

The whole split is held in memory. Therefore, `num_workers` is set to zero, which uses the main process for creating batches.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Ignored. Only for adhering to parent class interface. | `()` |
| `**kwargs` | `Any` | Ignored. Only for adhering to parent class interface. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataLoader` | The training data loader. |
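A minimal sketch of consuming the loader in a manual loop; with a `pytorch_lightning.Trainer`, this is handled automatically:

```python
import rul_datasets

dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
dm.prepare_data()
dm.setup()

for features, targets in dm.train_dataloader():
    print(features.shape, targets.shape)  # torch tensors, batch size 32
    break
```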
### `val_dataloader(*args, **kwargs)`

Create a data loader for the validation split.

The data loader is configured to leave the data unshuffled. The `pin_memory` option is activated to achieve maximum transfer speed to the GPU.

The whole split is held in memory. Therefore, `num_workers` is set to zero, which uses the main process for creating batches.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Ignored. Only for adhering to parent class interface. | `()` |
| `**kwargs` | `Any` | Ignored. Only for adhering to parent class interface. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataLoader` | The validation data loader. |
## RulDataset

Bases: `Dataset`

Internal dataset to hold multiple runs. Its length is the sum of all runs' lengths.

### `__init__(features, *targets, copy_tensors=False)`

Create a new dataset from multiple runs.

If `copy_tensors` is true, the tensors are copied to avoid side effects when modifying them. Otherwise, the tensors use the same memory as the original Numpy arrays to save space.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `features` | `List[ndarray]` | The features of each run. | *required* |
| `targets` | `List[ndarray]` | The targets of each run. | `()` |
| `copy_tensors` | `bool` | Whether to copy the tensors or not. | `False` |