
adaption

Higher-order data modules to run unsupervised domain adaption experiments.

AdaptionDataset

Bases: Dataset

A torch dataset for unsupervised domain adaption. The dataset combines a labeled source dataset with one or more unlabeled target datasets.

For each feature/label pair from the source dataset, a random sample of features is drawn from each target dataset. The datasets are expected to provide each sample as a tuple of tensors. The target datasets' labels are assumed to be the last element of the tuple and are omitted. The dataset's length is determined by the source dataset. This setup can be used to train with common unsupervised domain adaption methods like DAN, DANN, or JAN.

Examples:

>>> import torch
>>> import rul_datasets
>>> source = torch.utils.data.TensorDataset(torch.randn(10), torch.randn(10))
>>> target = torch.utils.data.TensorDataset(torch.randn(10), torch.randn(10))
>>> dataset = rul_datasets.adaption.AdaptionDataset(source, target)
>>> source_features, source_label, target_features = dataset[0]

__init__(labeled, *unlabeled, deterministic=False)

Create a new adaption dataset from a labeled source and one or more unlabeled target datasets.

By default, a random sample is drawn from each target dataset when a source sample is accessed. This is the recommended setting for training. To deactivate this behavior and fix the pairing of source and target samples, set deterministic to True. This is the recommended setting for evaluation.

Parameters:

    labeled (Dataset): The dataset from the labeled domain. Required.
    *unlabeled (Dataset): The dataset(s) from the unlabeled domain(s).
    deterministic (bool): Return the same target sample for each source sample. Default: False.
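The effect of deterministic can be sketched in plain Python. The class below is a simplified, hypothetical stand-in for the real AdaptionDataset (lists instead of torch datasets) that only illustrates the pairing behavior:

```python
import random


class PairingSketch:
    """Simplified, hypothetical stand-in for AdaptionDataset's pairing."""

    def __init__(self, labeled, *unlabeled, deterministic=False):
        self.labeled = labeled
        self.unlabeled = unlabeled
        self.deterministic = deterministic
        if deterministic:
            # Fix one target index per source index up front so that
            # repeated accesses return the same pairing.
            self._pairings = [
                [random.randrange(len(ds)) for ds in unlabeled]
                for _ in range(len(labeled))
            ]

    def __len__(self):
        # The length is determined by the source dataset.
        return len(self.labeled)

    def __getitem__(self, idx):
        features, label = self.labeled[idx]
        if self.deterministic:
            target_idxs = self._pairings[idx]
        else:
            target_idxs = [random.randrange(len(ds)) for ds in self.unlabeled]
        # The targets' labels (last tuple element) are omitted.
        target_features = [ds[i][0] for ds, i in zip(self.unlabeled, target_idxs)]
        return (features, label, *target_features)


source = [(i, i * 10) for i in range(5)]    # (features, label) pairs
target = [(100 + i, -1) for i in range(8)]  # label -1 is dropped

dataset = PairingSketch(source, target, deterministic=True)
assert dataset[0] == dataset[0]  # identical pairing on repeated access
```

With deterministic=False, each access would draw a fresh random target sample instead, which is the behavior recommended for training.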

DomainAdaptionDataModule

Bases: LightningDataModule

A higher-order data module used for unsupervised domain adaption of a labeled source to an unlabeled target domain. The training data of both domains is wrapped in an AdaptionDataset which provides a random sample of the target domain with each sample of the source domain. It provides the validation and test splits of both domains, and optionally a paired dataset for both.

Examples:

>>> import rul_datasets
>>> fd1 = rul_datasets.CmapssReader(fd=1, window_size=20)
>>> fd2 = rul_datasets.CmapssReader(fd=2, percent_broken=0.8)
>>> source = rul_datasets.RulDataModule(fd1, 32)
>>> target = rul_datasets.RulDataModule(fd2, 32)
>>> dm = rul_datasets.DomainAdaptionDataModule(source, target)
>>> dm.prepare_data()
>>> dm.setup()
>>> train_1_2 = dm.train_dataloader()
>>> val_1, val_2 = dm.val_dataloader()
>>> test_1, test_2 = dm.test_dataloader()

__init__(source, target, paired_val=False, inductive=False)

Create a new domain adaption data module from a source and target RulDataModule. The source domain is considered labeled and the target domain unlabeled.

The source and target data modules are checked for compatibility (see RulDataModule). These checks include verifying that their fd differs, as they would otherwise come from the same domain.

Parameters:

    source (RulDataModule): The data module of the labeled source domain. Required.
    target (RulDataModule): The data module of the unlabeled target domain. Required.
    paired_val (bool): Whether to include paired data in validation. Default: False.
    inductive (bool): Whether to use the target test set for training. Default: False.

prepare_data(*args, **kwargs)

Download and pre-process the underlying data.

This calls the prepare_data function for source and target domain. All previously completed preparation steps are skipped. It is called automatically by pytorch_lightning and executed on the first GPU in distributed mode.

Parameters:

    *args (Any): Passed down to each data module's prepare_data function.
    **kwargs (Any): Passed down to each data module's prepare_data function.

setup(stage=None)

Load source and target domain into memory.

Parameters:

    stage (Optional[str]): Passed down to each data module's setup function. Default: None.

test_dataloader(*args, **kwargs)

Create a data loader of the source and target test data.

The data loaders are the return values of source.test_dataloader and target.test_dataloader.

Parameters:

    *args (Any): Ignored. Only for adhering to the parent class interface.
    **kwargs (Any): Ignored. Only for adhering to the parent class interface.

Returns:

    List[DataLoader]: The source and target test data loaders.

train_dataloader(*args, **kwargs)

Create a data loader of an AdaptionDataset using source and target domain.

The data loader is configured to shuffle the data. The pin_memory option is activated to achieve maximum transfer speed to the GPU.

Parameters:

    *args (Any): Ignored. Only for adhering to the parent class interface.
    **kwargs (Any): Ignored. Only for adhering to the parent class interface.

Returns:

    DataLoader: The training data loader.

val_dataloader(*args, **kwargs)

Create a data loader of the source, target and paired validation data.

By default, two data loaders are returned, which correspond to the source and the target validation data loader. Optionally, a third data loader of a PairedRulDataset using both source and target is returned if paired_val was set to True in the constructor.

Parameters:

    *args (Any): Ignored. Only for adhering to the parent class interface.
    **kwargs (Any): Ignored. Only for adhering to the parent class interface.

Returns:

    List[DataLoader]: The source, target, and an optional paired validation data loader.

LatentAlignDataModule

Bases: DomainAdaptionDataModule

A higher-order data module based on DomainAdaptionDataModule.

It is specifically made to work with the latent space alignment approach by Zhang et al. The training data of both domains is wrapped in an AdaptionDataset which splits the data into healthy and degrading. For each sample of degrading source data, a random sample of degrading target data and a healthy sample of either source or target data are drawn. The number of steps in degradation is supplied for each degrading sample, as well. The data module also provides the validation and test splits of both domains, and optionally a paired dataset for both.

Examples:

>>> import rul_datasets
>>> fd1 = rul_datasets.CmapssReader(fd=1, window_size=20)
>>> fd2 = rul_datasets.CmapssReader(fd=2, percent_broken=0.8)
>>> src = rul_datasets.RulDataModule(fd1, 32)
>>> trg = rul_datasets.RulDataModule(fd2, 32)
>>> dm = rul_datasets.LatentAlignDataModule(src, trg, split_by_max_rul=True)
>>> dm.prepare_data()
>>> dm.setup()
>>> train_1_2 = dm.train_dataloader()
>>> val_1, val_2 = dm.val_dataloader()
>>> test_1, test_2 = dm.test_dataloader()
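The sample composition described above can be sketched with a hypothetical helper. This is plain Python for illustration, not the library's implementation; the name draw_latent_align_sample and the list layouts are made up:

```python
import random


def draw_latent_align_sample(source_degraded, target_degraded, healthy, idx):
    """Hypothetical sketch: compose one latent-align training sample.

    source_degraded and target_degraded hold (features, steps_in_degradation)
    tuples; healthy pools features from both source and target domains.
    """
    src_features, src_steps = source_degraded[idx]
    # A random degrading target sample is drawn for each source sample ...
    trg_features, trg_steps = random.choice(target_degraded)
    # ... as is a healthy sample from either source or target data.
    healthy_features = random.choice(healthy)
    return healthy_features, src_features, src_steps, trg_features, trg_steps


source_degraded = [("src_feat_%d" % i, i + 1) for i in range(4)]
target_degraded = [("trg_feat_%d" % i, i + 1) for i in range(6)]
healthy = ["healthy_src", "healthy_trg"]

sample = draw_latent_align_sample(source_degraded, target_degraded, healthy, 0)
assert sample[1:3] == ("src_feat_0", 1)
```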

__init__(source, target, paired_val=False, inductive=False, split_by_max_rul=False, split_by_steps=None)

Create a new latent align data module from a source and target RulDataModule. The source domain is considered labeled and the target domain unlabeled.

The source and target data modules are checked for compatibility (see RulDataModule). These checks include verifying that their fd differs, as they would otherwise come from the same domain.

The healthy and degrading data can be split by either maximum RUL value or the number of time steps. See split_healthy for more information.

Parameters:

    source (RulDataModule): The data module of the labeled source domain. Required.
    target (RulDataModule): The data module of the unlabeled target domain. Required.
    paired_val (bool): Whether to include paired data in validation. Default: False.
    inductive (bool): Whether to use the target test set for training. Default: False.
    split_by_max_rul (bool): Whether to split healthy and degrading data by max RUL value. Default: False.
    split_by_steps (Optional[int]): Split the healthy and degrading data after this number of time steps. Default: None.

split_healthy(features, targets, by_max_rul=False, by_steps=None)

Split the feature and target time series into healthy and degrading parts and return a dataset of each.

If by_max_rul is set to True, the time steps with the maximum RUL value in each time series are considered healthy. This option is intended for labeled data with piece-wise linear RUL functions. If by_steps is set to an integer, the first by_steps time steps of each series are considered healthy. This option is intended for unlabeled data or data with a linear RUL function.

One option has to be set and both are mutually exclusive.

Parameters:

    features (List[ndarray]): List of feature time series. Required.
    targets (List[ndarray]): List of target time series. Required.
    by_max_rul (bool): Whether to split healthy and degrading data by max RUL value. Default: False.
    by_steps (Optional[int]): Split healthy and degrading data after this number of time steps. Default: None.

Returns:

    healthy (RulDataset): Dataset of healthy data.
    degraded (RulDataset): Dataset of degrading data.
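The two split modes can be sketched for a single time series. This is a simplified stand-in using plain Python lists instead of numpy arrays and RulDatasets; the helper name split_series is hypothetical:

```python
def split_series(features, rul, by_max_rul=False, by_steps=None):
    """Hypothetical sketch of splitting one series into healthy/degrading."""
    if by_max_rul == (by_steps is not None):
        # Exactly one of the two mutually exclusive options must be set.
        raise ValueError("Set exactly one of by_max_rul or by_steps.")
    if by_max_rul:
        # All steps at the maximum RUL value are considered healthy
        # (a piece-wise linear RUL stays at its cap until degradation onset).
        split = sum(1 for r in rul if r == max(rul))
    else:
        split = by_steps
    healthy = (features[:split], rul[:split])
    degraded = (features[split:], rul[split:])
    return healthy, degraded


rul = [125, 125, 125, 124, 123]  # piece-wise linear RUL, capped at 125
features = [0.1, 0.2, 0.3, 0.4, 0.5]

healthy, degraded = split_series(features, rul, by_max_rul=True)
assert healthy == ([0.1, 0.2, 0.3], [125, 125, 125])

healthy, degraded = split_series(features, rul, by_steps=2)
assert healthy == ([0.1, 0.2], [125, 125])
```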