adaption
Higher-order data modules to run unsupervised domain adaption experiments.
AdaptionDataset
Bases: Dataset
A torch dataset for unsupervised domain adaption. The dataset takes a labeled source dataset and one or multiple unlabeled target datasets and combines them.
For each label/features pair from the source dataset, a random sample of features is drawn from each target dataset. The datasets are expected to provide a sample as a tuple of tensors. Each target dataset's label is assumed to be the last element of the tuple and is omitted. The dataset's length is determined by the source dataset. This setup can be used to train with common unsupervised domain adaption methods like DAN, DANN or JAN.
Examples:
>>> import torch
>>> import rul_datasets
>>> source = torch.utils.data.TensorDataset(torch.randn(10), torch.randn(10))
>>> target = torch.utils.data.TensorDataset(torch.randn(10), torch.randn(10))
>>> dataset = rul_datasets.adaption.AdaptionDataset(source, target)
>>> source_features, source_label, target_features = dataset[0]
__init__(labeled, *unlabeled, deterministic=False)
Create a new adaption dataset from a labeled source and one or multiple unlabeled target datasets.
By default, a random sample is drawn from each target dataset when a source sample is accessed. This is the recommended setting for training. To deactivate this behavior and fix the pairing of source and target samples, set deterministic to True. This is the recommended setting for evaluation.
Parameters:

Name | Type | Description | Default
---|---|---|---
`labeled` | `Dataset` | The dataset from the labeled domain. | *required*
`*unlabeled` | `Dataset` | The dataset(s) from the unlabeled domain(s). | `()`
`deterministic` | `bool` | Return the same target sample for each source sample. | `False`
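The pairing behavior can be illustrated with a small pure-Python sketch. It does not use the library itself; `pair_samples` and its argument names are illustrative stand-ins for the logic described above.

```python
import random

def pair_samples(labeled, unlabeled_datasets, deterministic=False, seed=42):
    """Sketch: for every (features, label) pair of the labeled dataset, draw
    one sample from each unlabeled dataset and keep only its features."""
    # A fixed seed stands in for the deterministic pairing used at evaluation.
    rng = random.Random(seed) if deterministic else random
    paired = []
    for features, label in labeled:
        # Samples here are (features, label) pairs; the label of each drawn
        # unlabeled sample is dropped.
        drawn = [rng.choice(dataset)[0] for dataset in unlabeled_datasets]
        paired.append((features, label, *drawn))
    return paired

labeled = [(i, i * 10) for i in range(5)]          # (features, label)
unlabeled = [[(100 + i, None) for i in range(5)]]  # labels are ignored
pairs = pair_samples(labeled, unlabeled, deterministic=True)
```

With `deterministic=True`, repeated calls yield the same pairing, mirroring the recommended evaluation setting.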
DomainAdaptionDataModule
Bases: LightningDataModule
A higher-order data module used for unsupervised domain adaption of a labeled source to an unlabeled target domain. The training data of both domains is wrapped in an AdaptionDataset, which provides a random sample of the target domain with each sample of the source domain. It provides the validation and test splits of both domains, and optionally a paired dataset for both.
Examples:
>>> import rul_datasets
>>> fd1 = rul_datasets.CmapssReader(fd=1, window_size=20)
>>> fd2 = rul_datasets.CmapssReader(fd=2, percent_broken=0.8)
>>> source = rul_datasets.RulDataModule(fd1, 32)
>>> target = rul_datasets.RulDataModule(fd2, 32)
>>> dm = rul_datasets.DomainAdaptionDataModule(source, target)
>>> dm.prepare_data()
>>> dm.setup()
>>> train_1_2 = dm.train_dataloader()
>>> val_1, val_2 = dm.val_dataloader()
>>> test_1, test_2 = dm.test_dataloader()
__init__(source, target, paired_val=False, inductive=False)
Create a new domain adaption data module from a source and target RulDataModule. The source domain is considered labeled and the target domain unlabeled.
The source and target data modules are checked for compatibility (see RulDataModule). These checks include that the fd differs between them, as they would otherwise come from the same domain.
Parameters:

Name | Type | Description | Default
---|---|---|---
`source` | `RulDataModule` | The data module of the labeled source domain. | *required*
`target` | `RulDataModule` | The data module of the unlabeled target domain. | *required*
`paired_val` | `bool` | Whether to include paired data in validation. | `False`
`inductive` | `bool` | Whether to use the target test set for training. | `False`
prepare_data(*args, **kwargs)
Download and pre-process the underlying data.
This calls the prepare_data function for the source and target domain. All previously completed preparation steps are skipped. It is called automatically by pytorch_lightning and executed on the first GPU in distributed mode.
Parameters:

Name | Type | Description | Default
---|---|---|---
`*args` | `Any` | Passed down to each data module's `prepare_data`. | `()`
`**kwargs` | `Any` | Passed down to each data module's `prepare_data`. | `{}`
setup(stage=None)
Load source and target domain into memory.
Parameters:

Name | Type | Description | Default
---|---|---|---
`stage` | `Optional[str]` | Passed down to each data module's `setup`. | `None`
test_dataloader(*args, **kwargs)
Create a data loader of the source and target test data.
The data loaders are the return values of source.test_dataloader and target.test_dataloader.

Parameters:

Name | Type | Description | Default
---|---|---|---
`*args` | `Any` | Ignored. Only for adhering to the parent class interface. | `()`
`**kwargs` | `Any` | Ignored. Only for adhering to the parent class interface. | `{}`

Returns:

Type | Description
---|---
`List[DataLoader]` | The source and target test data loaders.
train_dataloader(*args, **kwargs)
Create a data loader of an AdaptionDataset using the source and target domain.
The data loader is configured to shuffle the data. The pin_memory option is activated to achieve maximum transfer speed to the GPU.

Parameters:

Name | Type | Description | Default
---|---|---|---
`*args` | `Any` | Ignored. Only for adhering to the parent class interface. | `()`
`**kwargs` | `Any` | Ignored. Only for adhering to the parent class interface. | `{}`

Returns:

Type | Description
---|---
`DataLoader` | The training data loader.
val_dataloader(*args, **kwargs)
Create a data loader of the source, target and paired validation data.
By default, two data loaders are returned, corresponding to the source and the target validation data. An optional third data loader, built from a PairedRulDataset using both source and target, is returned if paired_val was set to True in the constructor.

Parameters:

Name | Type | Description | Default
---|---|---|---
`*args` | `Any` | Ignored. Only for adhering to the parent class interface. | `()`
`**kwargs` | `Any` | Ignored. Only for adhering to the parent class interface. | `{}`

Returns:

Type | Description
---|---
`List[DataLoader]` | The source, target and an optional paired validation data loader.
LatentAlignDataModule
Bases: DomainAdaptionDataModule
A higher-order data module based on DomainAdaptionDataModule.
It is specifically made to work with the latent space alignment approach by Zhang et al. The training data of both domains is wrapped in an AdaptionDataset which splits the data into healthy and degrading parts. For each sample of degrading source data, a random sample of degrading target data and a random healthy sample from either source or target data is drawn. The number of steps in degradation is supplied for each degrading sample, as well. The data module also provides the validation and test splits of both domains, and optionally a paired dataset for both.
Examples:
>>> import rul_datasets
>>> fd1 = rul_datasets.CmapssReader(fd=1, window_size=20)
>>> fd2 = rul_datasets.CmapssReader(fd=2, percent_broken=0.8)
>>> src = rul_datasets.RulDataModule(fd1, 32)
>>> trg = rul_datasets.RulDataModule(fd2, 32)
>>> dm = rul_datasets.LatentAlignDataModule(src, trg, split_by_max_rul=True)
>>> dm.prepare_data()
>>> dm.setup()
>>> train_1_2 = dm.train_dataloader()
>>> val_1, val_2 = dm.val_dataloader()
>>> test_1, test_2 = dm.test_dataloader()
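The sampling scheme described above can be sketched in pure Python. The function name, argument names, and tuple order below are illustrative assumptions, not the library's actual API; the sketch only shows how each degrading source sample is paired with a degrading target sample and a healthy sample.

```python
import random

def draw_latent_align_samples(source_degraded, target_degraded, healthy, seed=0):
    """Sketch: pair each degrading source sample with a random degrading
    target sample and a random healthy sample. Each degrading sample carries
    its number of steps in degradation."""
    rng = random.Random(seed)
    samples = []
    for src_features, src_steps, src_label in source_degraded:
        trg_features, trg_steps = rng.choice(target_degraded)
        healthy_features = rng.choice(healthy)
        samples.append(
            (healthy_features, src_features, src_steps, src_label,
             trg_features, trg_steps)
        )
    return samples

source_degraded = [(f"s{i}", i, 10 - i) for i in range(3)]  # (features, steps, label)
target_degraded = [(f"t{i}", i) for i in range(3)]          # (features, steps)
healthy = ["h0", "h1"]                                      # healthy features only
samples = draw_latent_align_samples(source_degraded, target_degraded, healthy)
```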
__init__(source, target, paired_val=False, inductive=False, split_by_max_rul=False, split_by_steps=None)
Create a new latent align data module from a source and target RulDataModule. The source domain is considered labeled and the target domain unlabeled.
The source and target data modules are checked for compatibility (see RulDataModule). These checks include that the fd differs between them, as they would otherwise come from the same domain.
The healthy and degrading data can be split either by maximum RUL value or by a number of time steps. See split_healthy for more information.
Parameters:

Name | Type | Description | Default
---|---|---|---
`source` | `RulDataModule` | The data module of the labeled source domain. | *required*
`target` | `RulDataModule` | The data module of the unlabeled target domain. | *required*
`paired_val` | `bool` | Whether to include paired data in validation. | `False`
`inductive` | `bool` | Whether to use the target test set for training. | `False`
`split_by_max_rul` | `bool` | Whether to split healthy and degrading by max RUL value. | `False`
`split_by_steps` | `Optional[int]` | Split the healthy and degrading data after this number of time steps. | `None`
split_healthy(features, targets, by_max_rul=False, by_steps=None)
Split the feature and target time series into healthy and degrading parts and return a dataset of each.
If by_max_rul is set to True, the time steps with the maximum RUL value in each time series are considered healthy. This option is intended for labeled data with piece-wise linear RUL functions. If by_steps is set to an integer, the first by_steps time steps of each series are considered healthy. This option is intended for unlabeled data or data with a linear RUL function.
Exactly one of the two options has to be set; they are mutually exclusive.
Parameters:

Name | Type | Description | Default
---|---|---|---
`features` | `List[ndarray]` | List of feature time series. | *required*
`targets` | `List[ndarray]` | List of target time series. | *required*
`by_max_rul` | `bool` | Whether to split healthy and degrading data by max RUL value. | `False`
`by_steps` | `Optional[int]` | Split healthy and degrading data after this number of time steps. | `None`

Returns:

Name | Type | Description
---|---|---
healthy | `RulDataset` | Dataset of healthy data.
degraded | `RulDataset` | Dataset of degrading data.
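The core splitting logic can be sketched with NumPy. The function below is an illustrative stand-in, not the library's implementation: it returns plain lists of (features, targets) pairs instead of RulDataset objects, but applies the same two splitting rules.

```python
import numpy as np

def split_healthy_sketch(features, targets, by_max_rul=False, by_steps=None):
    """Sketch: per series, either the steps at maximum RUL (piece-wise
    linear labels) or the first `by_steps` steps count as healthy; the
    remainder is degrading."""
    if by_max_rul == (by_steps is not None):
        raise ValueError("Exactly one of by_max_rul or by_steps must be set.")
    healthy, degraded = [], []
    for feats, rul in zip(features, targets):
        if by_max_rul:
            # Steps still at the maximum RUL value count as healthy.
            split_idx = int(np.sum(rul == rul.max()))
        else:
            split_idx = by_steps
        healthy.append((feats[:split_idx], rul[:split_idx]))
        degraded.append((feats[split_idx:], rul[split_idx:]))
    return healthy, degraded

features = [np.arange(6)]
targets = [np.array([3, 3, 3, 2, 1, 0])]  # piece-wise linear RUL
healthy, degraded = split_healthy_sketch(features, targets, by_max_rul=True)
```

Here the first three steps sit at the maximum RUL of 3 and are treated as healthy; the remaining three are degrading.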