adaption
Higher-order data modules to run unsupervised domain adaption experiments.
AdaptionDataset
Bases: Dataset
A torch dataset for unsupervised domain adaption. The dataset takes a labeled source dataset and one or multiple unlabeled target datasets and combines them.
For each label/features pair from the source dataset, a random sample of features is drawn from each target dataset. The datasets are expected to provide a sample as a tuple of tensors. Each target dataset's label is assumed to be the last element of the tuple and is omitted. The dataset's length is determined by the source dataset. This setup can be used to train with common unsupervised domain adaption methods like DAN, DANN or JAN.
Examples:
>>> import torch
>>> import rul_datasets
>>> source = torch.utils.data.TensorDataset(torch.randn(10), torch.randn(10))
>>> target = torch.utils.data.TensorDataset(torch.randn(10), torch.randn(10))
>>> dataset = rul_datasets.adaption.AdaptionDataset(source, target)
>>> source_features, source_label, target_features = dataset[0]
__init__(labeled, *unlabeled, deterministic=False)
Create a new adaption dataset from a labeled source and one or multiple unlabeled target datasets.
By default, a random sample is drawn from each target dataset when a source sample is accessed. This is the recommended setting for training. To deactivate this behavior and fix the pairing of source and target samples, set deterministic to True. This is the recommended setting for evaluation.
Parameters:

Name | Type | Description | Default
---|---|---|---
`labeled` | `Dataset` | The dataset from the labeled domain. | *required*
`*unlabeled` | `Dataset` | The dataset(s) from the unlabeled domain(s). | `()`
`deterministic` | `bool` | Return the same target sample for each source sample. | `False`
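The pairing behavior can be illustrated with a small pure-Python sketch. It does not use the library itself; `pair_samples` and its argument names are illustrative stand-ins for the logic described above.

```python
import random

def pair_samples(labeled, unlabeled_datasets, deterministic=False, seed=42):
    """Sketch: for every (features, label) pair of the labeled dataset, draw
    one sample from each unlabeled dataset and keep only its features."""
    # A fixed seed stands in for the deterministic pairing used at evaluation.
    rng = random.Random(seed) if deterministic else random
    paired = []
    for features, label in labeled:
        # Samples here are (features, label) pairs; the label of each drawn
        # unlabeled sample is dropped.
        drawn = [rng.choice(dataset)[0] for dataset in unlabeled_datasets]
        paired.append((features, label, *drawn))
    return paired

labeled = [(i, i * 10) for i in range(5)]          # (features, label)
unlabeled = [[(100 + i, None) for i in range(5)]]  # labels are ignored
pairs = pair_samples(labeled, unlabeled, deterministic=True)
```

With `deterministic=True`, repeated calls yield the same pairing, mirroring the recommended evaluation setting.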
DomainAdaptionDataModule
Bases: LightningDataModule
A higher-order data module used for unsupervised domain adaption of a labeled source to an unlabeled target domain. The training data of both domains is wrapped in an AdaptionDataset, which provides a random sample of the target domain with each sample of the source domain. It provides the validation and test splits of both domains, and optionally a paired dataset for both.
Examples:
>>> import rul_datasets
>>> fd1 = rul_datasets.CmapssReader(fd=1, window_size=20)
>>> fd2 = rul_datasets.CmapssReader(fd=2, percent_broken=0.8)
>>> source = rul_datasets.RulDataModule(fd1, 32)
>>> target = rul_datasets.RulDataModule(fd2, 32)
>>> dm = rul_datasets.DomainAdaptionDataModule(source, target)
>>> dm.prepare_data()
>>> dm.setup()
>>> train_1_2 = dm.train_dataloader()
>>> val_1, val_2 = dm.val_dataloader()
>>> test_1, test_2 = dm.test_dataloader()
__init__(source, target, paired_val=False, inductive=False)
Create a new domain adaption data module from a source and target RulDataModule. The source domain is considered labeled and the target domain unlabeled.
The source and target data modules are checked for compatibility (see RulDataModule). These checks include that the fd differs between them, as they would otherwise come from the same domain.
Parameters:

Name | Type | Description | Default
---|---|---|---
`source` | `RulDataModule` | The data module of the labeled source domain. | *required*
`target` | `RulDataModule` | The data module of the unlabeled target domain. | *required*
`paired_val` | `bool` | Whether to include paired data in validation. | `False`
`inductive` | `bool` | Whether to use the target test set for training. | `False`
prepare_data(*args, **kwargs)
Download and pre-process the underlying data.
This calls the prepare_data function for the source and target domain. All previously completed preparation steps are skipped. It is called automatically by pytorch_lightning and executed on the first GPU in distributed mode.
Parameters:

Name | Type | Description | Default
---|---|---|---
`*args` | `Any` | Passed down to each data module's `prepare_data`. | `()`
`**kwargs` | `Any` | Passed down to each data module's `prepare_data`. | `{}`
setup(stage=None)
Load source and target domain into memory.
Parameters:

Name | Type | Description | Default
---|---|---|---
`stage` | `Optional[str]` | Passed down to each data module's `setup`. | `None`
test_dataloader(*args, **kwargs)
Create a data loader of the source and target test data.
The data loaders are the return values of source.test_dataloader and target.test_dataloader.

Parameters:

Name | Type | Description | Default
---|---|---|---
`*args` | `Any` | Ignored. Only for adhering to the parent class interface. | `()`
`**kwargs` | `Any` | Ignored. Only for adhering to the parent class interface. | `{}`

Returns:

Type | Description
---|---
`List[DataLoader]` | The source and target test data loaders.
train_dataloader(*args, **kwargs)
Create a data loader of an AdaptionDataset using the source and target domain.
The data loader is configured to shuffle the data. The pin_memory option is activated to achieve maximum transfer speed to the GPU.

Parameters:

Name | Type | Description | Default
---|---|---|---
`*args` | `Any` | Ignored. Only for adhering to the parent class interface. | `()`
`**kwargs` | `Any` | Ignored. Only for adhering to the parent class interface. | `{}`

Returns:

Type | Description
---|---
`DataLoader` | The training data loader.
val_dataloader(*args, **kwargs)
Create a data loader of the source, target and paired validation data.
By default, two data loaders are returned, corresponding to the source and the target validation data. An optional third data loader, built from a PairedRulDataset using both source and target, is returned if paired_val was set to True in the constructor.

Parameters:

Name | Type | Description | Default
---|---|---|---
`*args` | `Any` | Ignored. Only for adhering to the parent class interface. | `()`
`**kwargs` | `Any` | Ignored. Only for adhering to the parent class interface. | `{}`

Returns:

Type | Description
---|---
`List[DataLoader]` | The source, target and an optional paired validation data loader.
LatentAlignDataModule
Bases: DomainAdaptionDataModule
A higher-order data module based on DomainAdaptionDataModule.
It is specifically made to work with the latent space alignment approach by Zhang et al. The training data of both domains is wrapped in an AdaptionDataset which splits the data into healthy and degrading parts. For each sample of degrading source data, a random sample of degrading target data and a random healthy sample from either source or target data is drawn. The number of steps in degradation is supplied for each degrading sample, as well. The data module also provides the validation and test splits of both domains, and optionally a paired dataset for both.
Examples:
>>> import rul_datasets
>>> fd1 = rul_datasets.CmapssReader(fd=1, window_size=20)
>>> fd2 = rul_datasets.CmapssReader(fd=2, percent_broken=0.8)
>>> src = rul_datasets.RulDataModule(fd1, 32)
>>> trg = rul_datasets.RulDataModule(fd2, 32)
>>> dm = rul_datasets.LatentAlignDataModule(src, trg, split_by_max_rul=True)
>>> dm.prepare_data()
>>> dm.setup()
>>> train_1_2 = dm.train_dataloader()
>>> val_1, val_2 = dm.val_dataloader()
>>> test_1, test_2 = dm.test_dataloader()
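The sampling scheme described above can be sketched in pure Python. The function name, argument names, and tuple order below are illustrative assumptions, not the library's actual API; the sketch only shows how each degrading source sample is paired with a degrading target sample and a healthy sample.

```python
import random

def draw_latent_align_samples(source_degraded, target_degraded, healthy, seed=0):
    """Sketch: pair each degrading source sample with a random degrading
    target sample and a random healthy sample. Each degrading sample carries
    its number of steps in degradation."""
    rng = random.Random(seed)
    samples = []
    for src_features, src_steps, src_label in source_degraded:
        trg_features, trg_steps = rng.choice(target_degraded)
        healthy_features = rng.choice(healthy)
        samples.append(
            (healthy_features, src_features, src_steps, src_label,
             trg_features, trg_steps)
        )
    return samples

source_degraded = [(f"s{i}", i, 10 - i) for i in range(3)]  # (features, steps, label)
target_degraded = [(f"t{i}", i) for i in range(3)]          # (features, steps)
healthy = ["h0", "h1"]                                      # healthy features only
samples = draw_latent_align_samples(source_degraded, target_degraded, healthy)
```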
__init__(source, target, paired_val=False, inductive=False, split_by_max_rul=False, split_by_steps=None)
Create a new latent align data module from a source and target RulDataModule. The source domain is considered labeled and the target domain unlabeled.
The source and target data modules are checked for compatibility (see RulDataModule). These checks include that the fd differs between them, as they would otherwise come from the same domain.
The healthy and degrading data can be split either by maximum RUL value or by a number of time steps. See split_healthy for more information.
Parameters:

Name | Type | Description | Default
---|---|---|---
`source` | `RulDataModule` | The data module of the labeled source domain. | *required*
`target` | `RulDataModule` | The data module of the unlabeled target domain. | *required*
`paired_val` | `bool` | Whether to include paired data in validation. | `False`
`inductive` | `bool` | Whether to use the target test set for training. | `False`
`split_by_max_rul` | `bool` | Whether to split healthy and degrading by max RUL value. | `False`
`split_by_steps` | `Optional[int]` | Split the healthy and degrading data after this number of time steps. | `None`
split_healthy(features, targets, by_max_rul=False, by_steps=None)
Split the feature and target time series into healthy and degrading parts and return a dataset of each.
If by_max_rul is set to True, the time steps with the maximum RUL value in each time series are considered healthy. This option is intended for labeled data with piece-wise linear RUL functions. If by_steps is set to an integer, the first by_steps time steps of each series are considered healthy. This option is intended for unlabeled data or data with a linear RUL function.
Exactly one of the two options has to be set; they are mutually exclusive.
Parameters:

Name | Type | Description | Default
---|---|---|---
`features` | `List[ndarray]` | List of feature time series. | *required*
`targets` | `List[ndarray]` | List of target time series. | *required*
`by_max_rul` | `bool` | Whether to split healthy and degrading data by max RUL value. | `False`
`by_steps` | `Optional[int]` | Split healthy and degrading data after this number of time steps. | `None`

Returns:

Name | Type | Description
---|---|---
healthy | `RulDataset` | Dataset of healthy data.
degraded | `RulDataset` | Dataset of degrading data.
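The core splitting logic can be sketched with NumPy. The function below is an illustrative stand-in, not the library's implementation: it returns plain lists of (features, targets) pairs instead of RulDataset objects, but applies the same two splitting rules.

```python
import numpy as np

def split_healthy_sketch(features, targets, by_max_rul=False, by_steps=None):
    """Sketch: per series, either the steps at maximum RUL (piece-wise
    linear labels) or the first `by_steps` steps count as healthy; the
    remainder is degrading."""
    if by_max_rul == (by_steps is not None):
        raise ValueError("Exactly one of by_max_rul or by_steps must be set.")
    healthy, degraded = [], []
    for feats, rul in zip(features, targets):
        if by_max_rul:
            # Steps still at the maximum RUL value count as healthy.
            split_idx = int(np.sum(rul == rul.max()))
        else:
            split_idx = by_steps
        healthy.append((feats[:split_idx], rul[:split_idx]))
        degraded.append((feats[split_idx:], rul[split_idx:]))
    return healthy, degraded

features = [np.arange(6)]
targets = [np.array([3, 3, 3, 2, 1, 0])]  # piece-wise linear RUL
healthy, degraded = split_healthy_sketch(features, targets, by_max_rul=True)
```

Here the first three steps sit at the maximum RUL of 3 and are treated as healthy; the remaining three are degrading.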