abstract
This module contains the base class for all readers. It is only relevant to those who want to extend this package with their own dataset.
AbstractReader

This reader is the abstract base class of all readers.

If you want to extend this library with a dataset of your own, you should create a subclass of `AbstractReader`. It defines the public interface that all data modules in this library use. Simply inherit from this class, implement the abstract functions, and you should be good to go.

Please consider contributing your work afterward to help the community.
Examples:

>>> import numpy as np
>>> import rul_datasets
>>> class MyReader(rul_datasets.reader.AbstractReader):
...     @property
...     def dataset_name(self):
...         return "my_dataset"
...
...     @property
...     def fds(self):
...         return [1]
...
...     def prepare_data(self):
...         pass
...
...     def default_window_size(self, fd):
...         return 30
...
...     def load_complete_split(self, split, alias):
...         features = [np.random.randn(100, 2, 30) for _ in range(10)]
...         targets = [np.arange(100, 0, -1) for _ in range(10)]
...         return features, targets
...
>>> my_reader = MyReader(fd=1)
>>> features, targets = my_reader.load_split("dev")
>>> features[0].shape
(100, 2, 30)
`dataset_name: str` (abstract property)

Name of the dataset.
`fds: List[int]` (abstract property)

The indices of available sub-datasets.
`hparams: Dict[str, Any]` (property)

All information logged by the data modules as hyperparameters in PyTorch Lightning.
`__init__(fd, window_size=None, max_rul=None, percent_broken=None, percent_fail_runs=None, truncate_val=False, truncate_degraded_only=False)`

Create a new reader. If your reader needs additional input arguments, create your own `__init__` function and call this one from within as `super().__init__(...)`.

For more information about using readers, refer to the reader module page.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
fd | int | Index of the selected sub-dataset | required |
window_size | Optional[int] | Size of the sliding window. Defaults to 2560. | None |
max_rul | Optional[int] | Maximum RUL value of targets. | None |
percent_broken | Optional[float] | The maximum relative degradation per time series. | None |
percent_fail_runs | Optional[Union[float, List[int]]] | The percentage or index list of available time series. | None |
truncate_val | bool | Truncate the validation data with percent_broken, too. | False |
truncate_degraded_only | bool | Only truncate the degraded part of the data (< max RUL). | False |
`check_compatibility(other)`

Check if the other reader is compatible with this one.

Compatibility of two readers ensures that training with both will probably succeed and produce valid results. Two readers are considered compatible if they:

- are both children of `AbstractReader`
- have the same window size
- have the same `max_rul`

If any of these conditions is not met, the readers are considered misconfigured and a `ValueError` is raised.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
other | AbstractReader | Another reader object. | required |
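The compatibility rules can be sketched with a small standalone function. This is an illustration of the conditions listed above, not the library's actual implementation; the dict-based readers are stand-ins.

```python
def check_compatibility_sketch(a, b):
    """Sketch of the compatibility check described above (illustration only).

    `a` and `b` stand in for two readers, reduced to the attributes
    the check compares.
    """
    if a["window_size"] != b["window_size"]:
        raise ValueError("window sizes do not match")
    if a["max_rul"] != b["max_rul"]:
        raise ValueError("max_rul values do not match")


source = {"window_size": 30, "max_rul": 125}
target = {"window_size": 30, "max_rul": 125}
check_compatibility_sketch(source, target)  # passes silently
```

Passing a reader with a different window size or `max_rul` would raise a `ValueError` instead.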
`default_window_size(fd)` (abstractmethod)

The default window size of the dataset. This may vary from sub-dataset to sub-dataset.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
fd | int | The index of a sub-dataset. | required |

Returns:

Type | Description |
---|---|
int | The default window size for the sub-dataset. |
`get_compatible(fd=None, percent_broken=None, percent_fail_runs=None, truncate_val=None, consolidate_window_size='override')`

Create a new reader of the desired sub-dataset that is compatible with this one (see check_compatibility). Useful for domain adaptation.

The values for `percent_broken`, `percent_fail_runs` and `truncate_val` of the new reader can be overridden.

When constructing a compatible reader for another sub-dataset, the window size of this reader will be used to override the default window size of the new reader. This behavior can be changed by setting `consolidate_window_size` to `"min"`. In that case, the window size of this reader and the new one will be set to the minimum of this reader's window size and the default window size of the new reader. Please be aware that this will change the window size of this reader, too. If the new reader should use its default window size, set `consolidate_window_size` to `"none"`.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
fd | Optional[int] | The index of the sub-dataset for the new reader. | None |
percent_broken | Optional[float] | Override this value in the new reader. | None |
percent_fail_runs | Union[float, List[int], None] | Override this value in the new reader. | None |
truncate_val | Optional[bool] | Override this value in the new reader. | None |
consolidate_window_size | Literal['override', 'min', 'none'] | How to consolidate the window size of the readers. | 'override' |

Returns:

Type | Description |
---|---|
AbstractReader | A compatible reader with optional overrides. |
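The three window-size consolidation modes can be sketched as a small helper that returns the resulting window sizes of this reader and the new one. The helper and its name are hypothetical; it only mirrors the behavior described above.

```python
def consolidate_window_size(own_size, other_default, mode="override"):
    """Sketch of the consolidation modes described above (illustration only).

    Returns (this reader's resulting window size, new reader's window size).
    """
    if mode == "override":
        # the new reader gets this reader's window size
        return own_size, own_size
    if mode == "min":
        # both readers end up with the minimum of the two sizes;
        # note that this changes this reader's window size, too
        smaller = min(own_size, other_default)
        return smaller, smaller
    if mode == "none":
        # the new reader keeps its own default window size
        return own_size, other_default
    raise ValueError(f"unknown consolidation mode: {mode}")
```

For example, with a current window size of 30 and a new sub-dataset whose default is 20, `"override"` yields (30, 30), `"min"` yields (20, 20), and `"none"` yields (30, 20).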
`get_complement(percent_broken=None, truncate_val=None)`

Get a compatible reader that contains all development runs that are not in this reader (see check_compatibility). Useful for semi-supervised learning.

The new reader will contain the development runs that were discarded in this reader due to truncation through `percent_fail_runs`. If `percent_fail_runs` was not set, or this reader contains all development runs, it returns a reader with an empty development set.

The values for `percent_broken` and `truncate_val` of the new reader can be overridden.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
percent_broken | Optional[float] | Override this value in the new reader. | None |
truncate_val | Optional[bool] | Override this value in the new reader. | None |

Returns:

Type | Description |
---|---|
AbstractReader | A compatible reader with all development runs missing in this one. |
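The idea of the complement can be sketched on a list of run indices. The helper is hypothetical and only illustrates the behavior described above, under the assumption that a float `percent_fail_runs` keeps the first fraction of runs.

```python
def complement_runs(all_runs, percent_fail_runs):
    """Sketch of the complement described above (illustration only).

    Returns the development runs this reader discarded. Assumes a float
    percent_fail_runs keeps the leading fraction of runs.
    """
    if percent_fail_runs is None:
        return []  # this reader kept every run, so the complement is empty
    if isinstance(percent_fail_runs, float):
        kept = all_runs[: int(len(all_runs) * percent_fail_runs)]
    else:
        # a list selects runs by index
        kept = [all_runs[i] for i in percent_fail_runs]
    return [run for run in all_runs if run not in kept]


runs = list(range(10))
leftover = complement_runs(runs, 0.6)  # the 4 runs this reader discarded
```

With ten runs and `percent_fail_runs=0.6`, the reader keeps runs 0-5 and the complement contains runs 6-9.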
`is_mutually_exclusive(other)`

Check if this reader is mutually exclusive to another reader.

Two readers are mutually exclusive if:

- they are not of the same class and therefore do not share a dataset
- their `percent_fail_runs` arguments do not overlap (float arguments overlap if they are both greater than zero)
- one of them is empty
Parameters:

Name | Type | Description | Default |
---|---|---|---|
other | AbstractReader | The reader to check exclusivity against. | required |

Returns:

Type | Description |
---|---|
bool | Whether the readers are mutually exclusive. |
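The overlap rule for `percent_fail_runs` can be sketched in isolation. This hypothetical helper covers only the float/float and list/list cases described above; floats are fractions taken from the front of the run list, so any two positive fractions share at least the first run.

```python
def fail_runs_overlap(a, b):
    """Sketch of the percent_fail_runs overlap rule (illustration only).

    Floats overlap if both are greater than zero; index lists overlap if
    they share an index. Mixed and None arguments are omitted for brevity.
    """
    if isinstance(a, float) and isinstance(b, float):
        return a > 0 and b > 0
    if isinstance(a, list) and isinstance(b, list):
        return bool(set(a) & set(b))
    raise NotImplementedError("mixed or None arguments not covered here")
```

For example, 0.5 and 0.3 overlap (both readers contain the first runs), while the index lists [1, 2] and [3, 4] do not.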
`load_complete_split(split, alias)` (abstractmethod)

Load a complete split without truncation.

This function should return the features and targets of the desired split. Both should be contained in a list of numpy arrays. Each of the arrays contains one time series. The features should have a shape of `[num_windows, window_size, num_channels]` and the targets a shape of `[num_windows]`. The features should be scaled as desired. The targets should be capped by `max_rul`.

By setting `alias`, it should be possible to load a split aliased as another split, e.g. load the test split and treat it as the dev split. The data of `split` should be loaded, but all pre-processing steps of `alias` should be carried out.

This function is used internally in load_split, which takes care of truncation.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
split | str | The name of the split to load. | required |
alias | str | The split as which the loaded data should be treated. | required |

Returns:

Name | Type | Description |
---|---|---|
features | List[ndarray] | The complete, scaled features of the desired split. |
targets | List[ndarray] | The capped target values corresponding to the features. |
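The target-capping requirement above can be sketched in plain Python. This is a dependency-free stand-in for what a real `load_complete_split` implementation would do to its targets, not the library's code.

```python
# Sketch of capping RUL targets at max_rul, as required above.
max_rul = 50
raw_targets = list(range(100, 0, -1))  # RUL decreasing linearly from 100 to 1
capped_targets = [min(t, max_rul) for t in raw_targets]
```

Every value above `max_rul` becomes `max_rul`, producing the typical piecewise-linear RUL curve: a flat plateau at 50 followed by the original linear decline.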
`load_split(split, alias=None)`

Load a split as tensors and apply truncation to it.

This function loads the scaled features and the targets of a split into memory. Afterwards, truncation is applied if the `split` is set to `dev`. The validation set is also truncated with `percent_broken` if `truncate_val` is set to `True`.

By setting `alias`, it is possible to load a split aliased as another split, e.g. load the test split and treat it as the dev split. The data of `split` is loaded, but all pre-processing steps of `alias` are carried out.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
split | str | The desired split to load. | required |
alias | Optional[str] | The split as which the loaded data should be treated. | None |

Returns:

Name | Type | Description |
---|---|---|
features | List[ndarray] | The scaled, truncated features of the desired split. |
targets | List[ndarray] | The truncated targets of the desired split. |
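The `percent_broken` truncation applied here can be sketched on plain lists. The helper is hypothetical: it keeps only the leading fraction of each run, which mirrors the described behavior of discarding the windows closest to failure. The `truncate_degraded_only` variant, which truncates only the degraded part of a run, is omitted.

```python
def truncate_runs(features, percent_broken):
    """Sketch of percent_broken truncation (illustration only).

    Keeps only the first fraction of each run, discarding the windows
    closest to failure.
    """
    truncated = []
    for run in features:
        cutoff = int(len(run) * percent_broken)
        truncated.append(run[:cutoff])
    return truncated


runs = [list(range(100)), list(range(80))]
short = truncate_runs(runs, 0.4)  # 40% of each run remains
```

A run of 100 windows is cut to 40, and a run of 80 windows to 32; the remaining windows never show the final degradation stages.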
`prepare_data()` (abstractmethod)

Prepare the data. This function should take care of things that need to be done once, before the data can be used. This may include downloading, extracting or transforming the data, as well as fitting scalers. It is best practice to check if a preparation step was already completed to avoid repeating it unnecessarily.