
abstract

This module contains the base class for all readers. It is only relevant if you want to extend this package with your own dataset.

AbstractReader

This reader is the abstract base class of all readers.

In case you want to extend this library with a dataset of your own, you should create a subclass of AbstractReader. It defines the public interface that all data modules in this library use. Just inherit from this class, implement the abstract functions, and you should be good to go.

Please consider contributing your work afterward to help the community.

Examples:

>>> import numpy as np
>>> import rul_datasets
>>> class MyReader(rul_datasets.reader.AbstractReader):
...     @property
...     def dataset_name(self):
...         return "my_dataset"
...
...     @property
...     def fds(self):
...         return [1]
...
...     def prepare_data(self):
...         pass
...
...     def default_window_size(self, fd):
...         return 30
...
...     def load_complete_split(self, split, alias):
...         features = [np.random.randn(100, 2, 30) for _ in range(10)]
...         targets = [np.arange(100, 0, -1) for _ in range(10)]
...
...         return features, targets
...
>>> my_reader = MyReader(fd=1)
>>> features, targets = my_reader.load_split("dev")
>>> features[0].shape
(100, 2, 30)

dataset_name: str abstractmethod property

Name of the dataset.

fds: List[int] abstractmethod property

The indices of available sub-datasets.

hparams: Dict[str, Any] property

All information logged by the data modules as hyperparameters in PyTorch Lightning.

__init__(fd, window_size=None, max_rul=None, percent_broken=None, percent_fail_runs=None, truncate_val=False, truncate_degraded_only=False)

Create a new reader. If your reader needs additional input arguments, create your own __init__ function and call this one from within as super().__init__(...).

For more information about using readers refer to the reader module page.

Parameters:

Name Type Description Default
fd int

Index of the selected sub-dataset

required
window_size Optional[int]

Size of the sliding window. If None, the default window size of the sub-dataset is used.

None
max_rul Optional[int]

Maximum RUL value of targets.

None
percent_broken Optional[float]

The maximum relative degradation per time series.

None
percent_fail_runs Optional[Union[float, List[int]]]

The percentage or index list of available time series.

None
truncate_val bool

Truncate the validation data with percent_broken, too.

False
truncate_degraded_only bool

Only truncate the degraded part of the data (< max RUL).

False
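The delegation pattern described for __init__ can be sketched as follows. A minimal stand-in base class with the same constructor signature is used here so the snippet is self-contained; the extra `data_root` argument is purely illustrative.

```python
from typing import Optional


class _BaseReader:
    """Minimal stand-in for AbstractReader's constructor signature."""

    def __init__(
        self,
        fd: int,
        window_size: Optional[int] = None,
        max_rul: Optional[int] = None,
        percent_broken: Optional[float] = None,
        percent_fail_runs=None,
        truncate_val: bool = False,
        truncate_degraded_only: bool = False,
    ) -> None:
        self.fd = fd
        self.window_size = window_size
        self.max_rul = max_rul
        self.percent_broken = percent_broken
        self.percent_fail_runs = percent_fail_runs
        self.truncate_val = truncate_val
        self.truncate_degraded_only = truncate_degraded_only


class MyReader(_BaseReader):
    """Reader with an additional argument of its own."""

    def __init__(self, fd: int, data_root: str = "./data", **kwargs) -> None:
        super().__init__(fd, **kwargs)  # delegate the common arguments
        self.data_root = data_root


reader = MyReader(fd=1, data_root="/tmp/my_dataset", max_rul=125)
```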

check_compatibility(other)

Check if the other reader is compatible with this one.

Compatibility of two readers ensures that training with both will probably succeed and produce valid results. Two readers are considered compatible if they:

  • are both children of AbstractReader

  • have the same window size

  • have the same max_rul

If any of these conditions is not met, the readers are considered misconfigured and a ValueError is raised.

Parameters:

Name Type Description Default
other AbstractReader

Another reader object.

required
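The three conditions above amount to a check like the following. This is an illustrative re-implementation, not the library's actual code; `ReaderStub` stands in for a concrete reader so the snippet is self-contained.

```python
class ReaderStub:
    """Stand-in with the two attributes the compatibility check compares."""

    def __init__(self, window_size: int, max_rul: int) -> None:
        self.window_size = window_size
        self.max_rul = max_rul


def check_compatibility(reader: ReaderStub, other: ReaderStub) -> None:
    # Stands in for the isinstance(other, AbstractReader) check.
    if not isinstance(other, ReaderStub):
        raise ValueError("other is not a reader")
    if reader.window_size != other.window_size:
        raise ValueError("window sizes differ")
    if reader.max_rul != other.max_rul:
        raise ValueError("max_rul values differ")


a = ReaderStub(window_size=30, max_rul=125)
b = ReaderStub(window_size=30, max_rul=125)
check_compatibility(a, b)  # passes silently

c = ReaderStub(window_size=20, max_rul=125)
try:
    check_compatibility(a, c)
except ValueError as exc:
    print(exc)  # window sizes differ
```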

default_window_size(fd) abstractmethod

The default window size of the data set. This may vary from sub-dataset to sub-dataset.

Parameters:

Name Type Description Default
fd int

The index of a sub-dataset.

required

Returns:

Type Description
int

The default window size for the sub-dataset.

get_compatible(fd=None, percent_broken=None, percent_fail_runs=None, truncate_val=None, consolidate_window_size='override')

Create a new reader of the desired sub-dataset that is compatible with this one (see check_compatibility). Useful for domain adaptation.

The values for percent_broken, percent_fail_runs and truncate_val of the new reader can be overridden.

When constructing a compatible reader for another sub-dataset, the window size of this reader will be used to override the default window size of the new reader. This behavior can be changed by setting consolidate_window_size to "min": the window size of this reader and the new one will both be set to the minimum of this reader's window size and the default window size of the new reader. Be aware that this changes the window size of this reader, too. If the new reader should use its default window size, set consolidate_window_size to "none".

Parameters:

Name Type Description Default
fd Optional[int]

The index of the sub-dataset for the new reader.

None
percent_broken Optional[float]

Override this value in the new reader.

None
percent_fail_runs Union[float, List[int], None]

Override this value in the new reader.

None
truncate_val Optional[bool]

Override this value in the new reader.

None
consolidate_window_size Literal['override', 'min', 'none']

How to consolidate the window size of the readers.

'override'

Returns:

Type Description
AbstractReader

A compatible reader with optional overrides.
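The three consolidation modes can be sketched as a small function mapping this reader's window size and the new sub-dataset's default to the resulting pair of sizes. This is an illustration of the described behavior, not the library's implementation.

```python
def consolidate(this_size: int, new_default: int, mode: str = "override"):
    """Return (this_reader_size, new_reader_size) for each mode."""
    if mode == "override":
        return this_size, this_size        # new reader takes this reader's size
    if mode == "min":
        smallest = min(this_size, new_default)
        return smallest, smallest          # both readers shrink to the minimum
    if mode == "none":
        return this_size, new_default      # new reader keeps its default
    raise ValueError(f"unknown mode: {mode}")


print(consolidate(30, 20, "override"))  # (30, 30)
print(consolidate(30, 20, "min"))       # (20, 20)
print(consolidate(30, 20, "none"))      # (30, 20)
```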

get_complement(percent_broken=None, truncate_val=None)

Get a compatible reader that contains all development runs that are not in this reader (see check_compatibility). Useful for semi-supervised learning.

The new reader will contain the development runs that were discarded in this reader due to truncation through percent_fail_runs. If percent_fail_runs was not set or this reader contains all development runs, it returns a reader with an empty development set.

The values for percent_broken and truncate_val of the new reader can be overridden.

Parameters:

Name Type Description Default
percent_broken Optional[float]

Override this value in the new reader.

None
truncate_val Optional[bool]

Override this value in the new reader.

None

Returns:

Type Description
AbstractReader

A compatible reader with all development runs missing in this one.
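The complement semantics can be illustrated on run indices, assuming percent_fail_runs was given as an index list (the real method also handles float percentages). The helper below is hypothetical, named only for this sketch.

```python
def complement_runs(all_runs, used_runs):
    """Development runs discarded by this reader, i.e. the complement.

    `used_runs` of None means the reader contains all development runs,
    so the complement is empty.
    """
    used = set(used_runs) if used_runs is not None else set(all_runs)
    return [run for run in all_runs if run not in used]


all_dev_runs = list(range(10))
print(complement_runs(all_dev_runs, [0, 1, 2]))  # [3, 4, 5, 6, 7, 8, 9]
print(complement_runs(all_dev_runs, None))       # [] -> empty development set
```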

is_mutually_exclusive(other)

Check if this reader is mutually exclusive to another reader.

Two readers are mutually exclusive if:

  • they are not of the same class and therefore do not share a dataset
  • their percent_fail_runs arguments do not overlap (float arguments overlap if they are greater than zero)
  • one of them is empty

Parameters:

Name Type Description Default
other AbstractReader

The reader to check exclusivity against.

required

Returns:

Type Description
bool

Whether the readers are mutually exclusive.
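The percent_fail_runs overlap rule above can be sketched as follows. This is an illustrative function, not the library's implementation: the same-class check on the readers themselves is omitted, and the handling of mixed float/list arguments is an assumption.

```python
def is_mutually_exclusive(runs_a, runs_b) -> bool:
    """Overlap check on two percent_fail_runs values.

    Index lists overlap if they share an index; float percentages
    overlap whenever both are greater than zero; an empty selection
    is exclusive to everything.
    """
    if runs_a == [] or runs_b == []:
        return True  # one of the readers is empty
    if isinstance(runs_a, float) and isinstance(runs_b, float):
        return not (runs_a > 0 and runs_b > 0)
    if isinstance(runs_a, list) and isinstance(runs_b, list):
        return not (set(runs_a) & set(runs_b))
    return False  # mixed float/list: conservatively assume overlap


print(is_mutually_exclusive([0, 1], [2, 3]))  # True
print(is_mutually_exclusive(0.5, 0.5))        # False
```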

load_complete_split(split, alias) abstractmethod

Load a complete split without truncation.

This function should return the features and targets of the desired split. Both should be contained in a list of numpy arrays. Each of the arrays contains one time series. The features should have a shape of [num_windows, window_size, num_channels] and the targets a shape of [num_windows]. The features should be scaled as desired. The targets should be capped by max_rul.

By setting alias, it should be possible to load a split aliased as another split, e.g. load the test split and treat it as the dev split. The data of split should be loaded but all pre-processing steps of alias should be carried out.

This function is used internally in load_split which takes care of truncation.

Parameters:

Name Type Description Default
split str

The name of the split to load.

required
alias str

The split as which the loaded data should be treated.

required

Returns:

Name Type Description
features List[ndarray]

The complete, scaled features of the desired split.

targets List[ndarray]

The capped target values corresponding to the features.

load_split(split, alias=None)

Load a split as tensors and apply truncation to it.

This function loads the scaled features and the targets of a split into memory. Afterwards, truncation is applied if the split is set to dev. The validation set is also truncated with percent_broken if truncate_val is set to True.

By setting alias, it is possible to load a split aliased as another split, e.g. load the test split and treat it as the dev split. The data of split is loaded but all pre-processing steps of alias are carried out.

Parameters:

Name Type Description Default
split str

The desired split to load.

required
alias Optional[str]

The split as which the loaded data should be treated.

None

Returns:

Name Type Description
features List[ndarray]

The scaled, truncated features of the desired split.

targets List[ndarray]

The truncated targets of the desired split.
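The percent_broken truncation can be illustrated as keeping only the first fraction of each run's windows. This is a simplified sketch of the described behavior; it ignores truncate_degraded_only and other details of the real implementation.

```python
import numpy as np


def truncate_runs(features, targets, percent_broken):
    """Keep only the first `percent_broken` fraction of each run."""
    truncated_f, truncated_t = [], []
    for f, t in zip(features, targets):
        cutoff = int(len(f) * percent_broken)
        truncated_f.append(f[:cutoff])
        truncated_t.append(t[:cutoff])
    return truncated_f, truncated_t


features = [np.random.randn(100, 30, 2)]
targets = [np.arange(100, 0, -1)]
trunc_f, trunc_t = truncate_runs(features, targets, percent_broken=0.8)
print(trunc_f[0].shape)  # (80, 30, 2)
```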

prepare_data() abstractmethod

Prepare the data. This function should take care of things that need to be done once before the data can be used. This may include downloading, extracting, or transforming the data, as well as fitting scalers. It is best practice to check whether a preparation step has already been completed to avoid repeating it unnecessarily.
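The check-before-preparing practice can be sketched with a marker file. The marker name and directory layout are illustrative, not conventions of the library.

```python
import os
import tempfile


def prepare_data(data_root: str, marker: str = "PREPARED") -> None:
    """Run one-time preparation only if a completion marker is absent."""
    marker_path = os.path.join(data_root, marker)
    if os.path.exists(marker_path):
        return  # already prepared: skip download/extraction/scaler fitting
    os.makedirs(data_root, exist_ok=True)
    # ... download, extract, and fit scalers here ...
    with open(marker_path, "w") as f:
        f.write("done")


root = tempfile.mkdtemp()
prepare_data(root)  # does the work
prepare_data(root)  # no-op on the second call
```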