# core

Basic data modules for experiments involving only a single subset of any RUL dataset.
## PairedRulDataset

Bases: `IterableDataset`

A dataset of sample pairs drawn from the same time series.

The dataset uses the runs exactly as loaded by the passed data modules. Options like `degraded_only` need to be set there.
## RulDataModule

Bases: `LightningDataModule`

A data module to provide windowed time series features with RUL targets. It exposes the splits of the underlying dataset for easy usage with PyTorch and PyTorch Lightning.

The data module implements the `hparams` property used by PyTorch Lightning to save hyperparameters to checkpoints. It retrieves the hyperparameters of its underlying reader and adds the batch size to them.

If you want to extract features from the windows, you can pass the `feature_extractor` and `window_size` arguments to the constructor. The `feature_extractor` is a callable that takes a windowed time series as a numpy array with the shape `[num_windows, window_size, num_features]` and returns another numpy array. Depending on `window_size`, the expected output shapes for the `feature_extractor` are:

- `window_size is None`: `[num_new_windows, new_window_size, features]`
- `window_size is not None`: `[num_windows, features]`

If `window_size` is set, the extracted features are re-windowed.
Examples:

**Default**

```python
>>> import rul_datasets
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(cmapss, batch_size=32)
```

**With Feature Extractor**

```python
>>> import rul_datasets
>>> import numpy as np
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(
...     cmapss,
...     batch_size=32,
...     feature_extractor=lambda x: np.mean(x, axis=1),
...     window_size=10
... )
```

**Only Degraded Validation and Test Samples**

```python
>>> import rul_datasets
>>> cmapss = rul_datasets.reader.CmapssReader(fd=1)
>>> dm = rul_datasets.RulDataModule(cmapss, 32, degraded_only=["val", "test"])
```
### `data: Dict[str, Tuple[List[np.ndarray], List[np.ndarray]]]`

*property*

A dictionary of the training, validation and test splits. Each split is a tuple of feature and target tensors. The keys are `dev` (training split), `val` (validation split) and `test` (test split).
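For illustration, a minimal sketch of inspecting the splits after setup; the exact array shapes depend on the reader's windowing:

```python
import rul_datasets

cmapss = rul_datasets.reader.CmapssReader(fd=1)
dm = rul_datasets.RulDataModule(cmapss, batch_size=32)
dm.prepare_data()  # download and pre-process if necessary
dm.setup()         # populate the `data` property

dev_features, dev_targets = dm.data["dev"]  # one numpy array per run
print(len(dev_features), dev_features[0].shape)
```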
### `fds`

*property*

Index list of the available subsets of the underlying dataset, i.e. `[1, 2, 3, 4]` for `CMAPSS`.
### `reader: AbstractReader`

*property*

The underlying dataset reader.
### `__init__(reader, batch_size, feature_extractor=None, window_size=None, degraded_only=None)`

Create a new RUL data module from a reader.

This data module exposes a training, validation and test data loader for the underlying dataset. First, `prepare_data` is called to download and pre-process the dataset. Afterward, `setup` is called to load all splits into memory.

If a `feature_extractor` is supplied, the data module extracts new features from each window of the time series. If `window_size` is `None`, it is assumed that the extracted features form new windows themselves. If `window_size` is an int, it is assumed that the extracted features are single feature vectors that should be re-windowed. The expected output shapes for the `feature_extractor` are:

- `window_size is None`: `[num_new_windows, new_window_size, features]`
- `window_size is not None`: `[num_windows, features]`

The expected input shape for the `feature_extractor` is always `[num_windows, window_size, features]`.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `reader` | `AbstractReader` | The dataset reader for the desired dataset, e.g., `CmapssReader`. | *required* |
| `batch_size` | `int` | The size of the batches built by the data loaders. | *required* |
| `feature_extractor` | `Optional[Callable]` | A feature extractor that extracts feature vectors from windows. | `None` |
| `window_size` | `Optional[int]` | The new window size to apply after the feature extractor. | `None` |
| `degraded_only` | `Optional[List[Literal['dev', 'val', 'test']]]` | Whether to load only degraded samples for the specified splits. | `None` |
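As a complement to the examples above, a hedged sketch of the `window_size is None` case, where the extractor's output is treated as ready-made windows. The FFT extractor is an illustrative assumption, not part of the library:

```python
import numpy as np
import rul_datasets

def fft_magnitude(windows: np.ndarray) -> np.ndarray:
    # In: [num_windows, window_size, features].
    # Out: [num_windows, window_size // 2 + 1, features], already window-shaped.
    return np.abs(np.fft.rfft(windows, axis=1))

cmapss = rul_datasets.reader.CmapssReader(fd=1)
dm = rul_datasets.RulDataModule(
    cmapss,
    batch_size=32,
    feature_extractor=fft_magnitude,
    window_size=None,  # output already forms windows, so no re-windowing
)
```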
### `check_compatibility(other)`

Check if another RulDataModule is compatible to be used together with this one.

RulDataModules can be used together in higher-order data modules, e.g. AdaptionDataModule. This function checks whether `other` is compatible with this data module. It checks the underlying dataset readers and for matching batch size, feature extractor and window size. If anything is incompatible, this function raises a ValueError.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `other` | `RulDataModule` | The RulDataModule to check compatibility with. | *required* |
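A brief sketch of guarding a higher-order setup with this check; the mismatched batch size is chosen to trigger the error:

```python
import rul_datasets

dm_a = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
dm_b = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=2), batch_size=64)

try:
    dm_a.check_compatibility(dm_b)
except ValueError as exc:
    print(f"Incompatible data modules: {exc}")
```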
### `is_mutually_exclusive(other)`

Check if the other data module is mutually exclusive to this one. See AbstractReader.is_mutually_exclusive.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `other` | `RulDataModule` | Data module to check exclusivity against. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | Whether both data modules are mutually exclusive. |
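For example, modules over different sub-datasets should share no runs; the expected results below are assumptions based on the reader's exclusivity semantics:

```python
import rul_datasets

fd1 = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
fd2 = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=2), batch_size=32)

print(fd1.is_mutually_exclusive(fd2))  # expected True: disjoint sub-datasets
print(fd1.is_mutually_exclusive(fd1))  # expected False: identical data
```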
### `load_split(split, alias=None, degraded_only=None)`

Load a split from the underlying reader and apply the feature extractor.

By setting `alias`, it is possible to load a split aliased as another split, e.g., load the test split and treat it as the dev split. The data of the `split` is loaded, but all pre-processing steps of `alias` are carried out.

If `degraded_only` is set, only degraded samples are loaded. This is only possible if the underlying reader has a `max_rul` set or `norm_rul` is set to `True`. The `degraded_only` argument takes precedence over the `degraded_only` of the data module.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `split` | `str` | The desired split to load. | *required* |
| `alias` | `Optional[str]` | The split as which the loaded data should be treated. | `None` |
| `degraded_only` | `Optional[bool]` | Whether to only load degraded samples. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `Tuple[List[ndarray], List[ndarray]]` | The feature and target tensors of the split's runs. |
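A short sketch of aliasing, here loading the test split with the dev split's pre-processing:

```python
import rul_datasets

dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
dm.prepare_data()

# Test runs, but pre-processed as if they were the dev split.
features, targets = dm.load_split("test", alias="dev")
print(len(features), features[0].shape)
```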
### `prepare_data(*args, **kwargs)`

Download and pre-process the underlying data.

This calls the `prepare_data` function of the underlying reader. All previously completed preparation steps are skipped. It is called automatically by `pytorch_lightning` and executed on the first GPU in distributed mode.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Ignored. Only for adhering to parent class interface. | `()` |
| `**kwargs` | `Any` | Ignored. Only for adhering to parent class interface. | `{}` |
### `setup(stage=None)`

Load all splits as tensors into memory and optionally apply the feature extractor.

The splits are placed inside the `data` property. If a split is empty, a tuple of empty tensors with the correct number of dimensions is created as a placeholder. This ensures compatibility with higher-order data modules.

If the data module was constructed with a `feature_extractor` argument, the feature windows are passed to the feature extractor. The resulting new features may be re-windowed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `stage` | `Optional[str]` | Ignored. Only for adhering to parent class interface. | `None` |
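To make the extraction step concrete, a sketch that mirrors the constructor example above; the commented shape is an expectation under CMAPSS defaults, not a guarantee:

```python
import numpy as np
import rul_datasets

dm = rul_datasets.RulDataModule(
    rul_datasets.reader.CmapssReader(fd=1),
    batch_size=32,
    feature_extractor=lambda x: np.mean(x, axis=1),  # one vector per window
    window_size=10,
)
dm.prepare_data()
dm.setup()  # extraction and re-windowing happen here

features, _ = dm.data["dev"]
print(features[0].shape)  # expected [num_new_windows, 10, num_features]
```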
### `test_dataloader(*args, **kwargs)`

Create a data loader for the test split.

The data loader is configured to leave the data unshuffled. The `pin_memory` option is activated to achieve maximum transfer speed to the GPU.

The whole split is held in memory. Therefore, `num_workers` is set to zero, which uses the main process for creating batches.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Ignored. Only for adhering to parent class interface. | `()` |
| `**kwargs` | `Any` | Ignored. Only for adhering to parent class interface. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataLoader` | The test data loader. |
### `to_dataset(split, alias=None)`

Create a dataset of a split.

This convenience function creates a plain tensor dataset to use outside the `rul_datasets` library.

The data placed inside the dataset will be from the specified `split`. If `alias` is set, the loaded data will be treated as if from the `alias` split. For example, one could load the test data and treat it as if it were the training data. This may be useful for inductive domain adaption.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `split` | `str` | The split to place inside the dataset. | *required* |
| `alias` | `Optional[str]` | The split the loaded data should be treated as. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `RulDataset` | A dataset containing the requested split. |
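A sketch of wrapping the returned dataset in a custom PyTorch `DataLoader`:

```python
from torch.utils.data import DataLoader

import rul_datasets

dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
dm.prepare_data()
dm.setup()

dataset = dm.to_dataset("test", alias="dev")  # test data, treated as dev
loader = DataLoader(dataset, batch_size=64, shuffle=True)
```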
### `train_dataloader(*args, **kwargs)`

Create a data loader for the training split.

The data loader is configured to shuffle the data. The `pin_memory` option is activated to achieve maximum transfer speed to the GPU. The data loader is also configured to drop the last batch of the data if it would contain only one sample.

The whole split is held in memory. Therefore, `num_workers` is set to zero, which uses the main process for creating batches.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Ignored. Only for adhering to parent class interface. | `()` |
| `**kwargs` | `Any` | Ignored. Only for adhering to parent class interface. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataLoader` | The training data loader. |
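A minimal sketch of consuming the loader in a manual loop; with a `pytorch_lightning.Trainer`, this is handled automatically:

```python
import rul_datasets

dm = rul_datasets.RulDataModule(rul_datasets.reader.CmapssReader(fd=1), batch_size=32)
dm.prepare_data()
dm.setup()

for features, targets in dm.train_dataloader():
    print(features.shape, targets.shape)  # torch tensors, batch size 32
    break
```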
### `val_dataloader(*args, **kwargs)`

Create a data loader for the validation split.

The data loader is configured to leave the data unshuffled. The `pin_memory` option is activated to achieve maximum transfer speed to the GPU.

The whole split is held in memory. Therefore, `num_workers` is set to zero, which uses the main process for creating batches.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `*args` | `Any` | Ignored. Only for adhering to parent class interface. | `()` |
| `**kwargs` | `Any` | Ignored. Only for adhering to parent class interface. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `DataLoader` | The validation data loader. |
## RulDataset

Bases: `Dataset`

Internal dataset to hold multiple runs. Its length is the sum of all runs' lengths.

### `__init__(features, *targets, copy_tensors=False)`

Create a new dataset from multiple runs.

If `copy_tensors` is true, the tensors are copied to avoid side effects when modifying them. Otherwise, the tensors use the same memory as the original Numpy arrays to save space.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `features` | `List[ndarray]` | The features of each run. | *required* |
| `targets` | `List[ndarray]` | The targets of each run. | `()` |
| `copy_tensors` | `bool` | Whether to copy the tensors or not. | `False` |