Latent Alignment Approach¶
import matplotlib.pyplot as plt
import omegaconf
import pytorch_lightning as pl
import rul_datasets
import torch
from torch.utils.data import DataLoader
import rul_adapt
Training Preparations¶
The Latent Alignment approach uses a first-time-to-predict (FTTP) estimation procedure to label the XJTU-SY dataset appropriately. For FTTP estimation, a GAN is trained only on the first few windows of each bearing, which are assumed to be healthy. The discriminator of the GAN is then used to calculate a health indicator for each window. The FTTP is the index of the first window whose health indicator exceeds a threshold, defined as a multiple of the average health indicator of the training data.
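This thresholding rule reduces to a few lines of array code. The following is only a minimal sketch in plain NumPy to illustrate the idea, not the library's implementation; the healthy window count of 15 and the coefficient of 2.5 are assumptions that are discussed further below.

import numpy as np

def fttp_from_health_indicator(hi, healthy_windows=15, coefficient=2.5):
    # threshold: a multiple of the mean health indicator over the assumed-healthy prefix
    hi = np.asarray(hi)
    threshold = coefficient * hi[:healthy_windows].mean()
    # FTTP: index of the first window exceeding the threshold
    exceeding = np.nonzero(hi > threshold)[0]
    return int(exceeding[0]) if exceeding.size else None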
The RUL Adapt library implements this FTTP approach but ships the resulting values in its configurations to make replication easier. The GAN was trained as follows:
# load all five bearings of XJTU-SY's first operating condition as the dev split
reader = rul_datasets.XjtuSyReader(
    1, run_split_dist={"dev": [1, 2, 3, 4, 5], "val": [], "test": []}
)
reader.prepare_data()

# chunked window extraction as in the paper (window size 5, chunk size 256)
chunk_extractor = rul_adapt.approach.latent_align.ChunkWindowExtractor(5, 256)
features, targets = reader.load_split("dev")
features, targets = zip(
*(
chunk_extractor(feature, target)
for feature, target in zip(features, targets)
)
)
# treat the first 15 * 128 steps of each run as healthy data for GAN training
healthy, _ = rul_datasets.adaption.split_healthy(
    features, targets, by_steps=15 * 128
)
healthy_dl = DataLoader(healthy, batch_size=128, shuffle=True)
feature_extractor = rul_adapt.model.CnnExtractor(
2,
[64, 32, 16],
features[0].shape[1],
fc_units=512,
act_func=torch.nn.LeakyReLU,
fc_act_func=torch.nn.LeakyReLU,
)
regressor = rul_adapt.model.FullyConnectedHead(
512, [1], act_func_on_last_layer=False
)
generator = rul_adapt.model.CnnExtractor(
1,
[64, 32, 16, 512, 2],
features[0].shape[1],
[10, 10, 10, 1, 1],
padding=True,
act_func=torch.nn.LeakyReLU,
)
# feature extractor and regressor together form the discriminator of the GAN
approach = rul_adapt.approach.LatentAlignFttpApproach(
    features[0].shape[1], lr=1e-5
)
approach.set_model(feature_extractor, regressor, generator)

trainer = pl.Trainer(max_epochs=1)
trainer.fit(approach, healthy_dl)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name               | Type                  | Params
-------------------------------------------------------------
0 | gan_loss           | BCEWithLogitsLoss     | 0
1 | grl                | GradientReversalLayer | 0
2 | _feature_extractor | CnnExtractor          | 10.4 M
3 | _regressor         | FullyConnectedHead    | 513
4 | _generator         | CnnExtractor          | 36.1 K
-------------------------------------------------------------
10.5 M    Trainable params
0         Non-trainable params
10.5 M    Total params
41.928    Total estimated model params size (MB)
Training: 0it [00:00, ?it/s]
/home/tilman/Programming/rul-adapt/.venv/lib/python3.8/site-packages/torch/nn/modules/conv.py:309: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at ../aten/src/ATen/native/Convolution.cpp:895.)
  return F.conv1d(input, weight, bias, self.stride,
`Trainer.fit` stopped: `max_epochs=1` reached.
Afterward, the discriminator of the GAN is used to calculate the health indicator. We chose 2.5 times the average health indicator of the training data as the threshold because none was given in the paper.
features, _ = reader.load_split("dev")
health_indicator = rul_adapt.approach.latent_align.get_health_indicator(
approach, features[0], 5, 256
)
threshold = 2.5 * health_indicator[:10].mean()
plt.plot(health_indicator)
plt.axhline(threshold, color="red", linestyle="--")
plt.show()
The following function can be used to conveniently calculate the FTTP for a given bearing.
fttp = rul_adapt.approach.latent_align.get_first_time_to_predict(
approach,
features[0],
window_size=5,
chunk_size=256,
healthy_index=15,
threshold_coefficient=2.5
)
print(f"FTTP of Bearing 1-1: {fttp}")
FTTP of Bearing 1-1: 83
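The same function can be applied to every training bearing to obtain the FTTP values that are later passed to XjtuSyReader via first_time_to_predict. A minimal sketch, assuming the dev features of all five bearings are still loaded:

fttps = [
    rul_adapt.approach.latent_align.get_first_time_to_predict(
        approach,
        feature,
        window_size=5,
        chunk_size=256,
        healthy_index=15,
        threshold_coefficient=2.5,
    )
    for feature in features
]
print(fttps)  # one FTTP per bearing of the dev split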
Reproduce original configurations¶
You can reproduce the original experiments of Zhang et al. by using the get_latent_align constructor function.
Known differences to the original paper:
- The healthy state cutoff for FTTP estimation was set to 15 because none was given in the paper.
- The threshold coefficient for FTTP estimation was set to 2.5 because none was given in the paper.
- The generator architecture for FTTP estimation was altered as we found no way to make the one from the paper work.
In this example, we re-create the configuration for adapting CMAPSS FD003 to FD001. Additional kwargs for the trainer, e.g. accelerator="gpu" for training on a GPU, can be passed to the function, too.
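For instance, a GPU run over the configured number of epochs could be requested like this; a sketch only, not executed in this notebook, where devices is a standard pytorch_lightning.Trainer argument:

dm, latent, trainer = rul_adapt.construct.get_latent_align(
    "cmapss", 3, 1, accelerator="gpu", devices=1
)

The next cell performs the actual demonstration run on the CPU with a single epoch.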
pl.seed_everything(42) # make reproducible
dm, latent, trainer = rul_adapt.construct.get_latent_align("cmapss", 3, 1, max_epochs=1)
Global seed set to 42
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
The networks, feature_extractor and regressor, can be accessed as properties of the latent object.
latent.feature_extractor
CnnExtractor(
  (_layers): Sequential(
    (conv_0): Sequential(
      (0): Conv1d(14, 32, kernel_size=(3,), stride=(1,), padding=valid)
      (1): LeakyReLU(negative_slope=0.01)
    )
    (conv_1): Sequential(
      (0): Conv1d(32, 16, kernel_size=(3,), stride=(1,), padding=valid)
      (1): LeakyReLU(negative_slope=0.01)
    )
    (conv_2): Sequential(
      (0): Conv1d(16, 1, kernel_size=(3,), stride=(1,), padding=valid)
      (1): LeakyReLU(negative_slope=0.01)
    )
    (3): Flatten(start_dim=1, end_dim=-1)
    (fc): Sequential(
      (0): Dropout(p=0.5, inplace=False)
      (1): Linear(in_features=24, out_features=256, bias=True)
      (2): LeakyReLU(negative_slope=0.01)
    )
  )
)
Training is done in the PyTorch Lightning fashion. We used the trainer_kwargs to train only one epoch for demonstration purposes.
trainer.fit(latent, dm)
trainer.test(latent, dm)
  | Name               | Type                               | Params
--------------------------------------------------------------------------
0 | train_mse          | MeanSquaredError                   | 0
1 | healthy_align      | HealthyStateAlignmentLoss          | 0
2 | direction_align    | DegradationDirectionAlignmentLoss  | 0
3 | level_align        | DegradationLevelRegularizationLoss | 0
4 | fusion_align       | MaximumMeanDiscrepancyLoss         | 0
5 | evaluator          | AdaptionEvaluator                  | 0
6 | _feature_extractor | CnnExtractor                       | 9.4 K
7 | _regressor         | FullyConnectedHead                 | 257
--------------------------------------------------------------------------
9.6 K     Trainable params
0         Non-trainable params
9.6 K     Total params
0.039     Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
Training: 0it [00:00, ?it/s]
Validation: 0it [00:00, ?it/s]
`Trainer.fit` stopped: `max_epochs=1` reached.
Testing: 0it [00:00, ?it/s]
────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0            DataLoader 1
────────────────────────────────────────────────────────────────────
    test/source/rmse         83.4994888305664
    test/source/score           368967.1875
    test/target/rmse                                  84.54061889648438
    test/target/score                                     340130.5
────────────────────────────────────────────────────────────────────
[{'test/source/rmse/dataloader_idx_0': 83.4994888305664, 'test/source/score/dataloader_idx_0': 368967.1875}, {'test/target/rmse/dataloader_idx_1': 84.54061889648438, 'test/target/score/dataloader_idx_1': 340130.5}]
If you only want to see the hyperparameters, you can use the get_latent_align_config function. This returns an omegaconf.DictConfig which you can modify.
three2one_config = rul_adapt.construct.get_latent_align_config("cmapss", 3, 1)
print(omegaconf.OmegaConf.to_yaml(three2one_config, resolve=True))
dm:
  source:
    _target_: rul_datasets.CmapssReader
    fd: 3
    window_size: 30
  target:
    _target_: rul_datasets.CmapssReader
    fd: 1
    window_size: 30
    percent_broken: 1.0
  kwargs:
    batch_size: 128
  adaption_kwargs:
    inductive: true
    split_by_steps: 80
feature_extractor:
  _convert_: all
  _target_: rul_adapt.model.CnnExtractor
  input_channels: 14
  units:
  - 32
  - 16
  - 1
  seq_len: 30
  fc_units: 256
  fc_dropout: 0.5
  act_func: torch.nn.LeakyReLU
  fc_act_func: torch.nn.LeakyReLU
regressor:
  _convert_: all
  _target_: rul_adapt.model.FullyConnectedHead
  input_channels: 256
  act_func_on_last_layer: false
  units:
  - 1
latent_align:
  _target_: rul_adapt.approach.LatentAlignApproach
  alpha_healthy: 1.0
  alpha_direction: 1.0
  alpha_level: 1.0
  alpha_fusion: 1.0
  labels_as_percentage: true
  lr: 0.0005
trainer:
  _target_: pytorch_lightning.Trainer
  max_epochs: 2000
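Because the returned object is a mutable omegaconf.DictConfig, individual hyperparameters can be adjusted before using it. A small sketch that only touches keys visible in the YAML above:

three2one_config.latent_align.lr = 1e-4  # e.g. lower the learning rate
three2one_config.trainer.max_epochs = 10  # e.g. shorten training
print(three2one_config.latent_align.lr, three2one_config.trainer.max_epochs)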
Run your own experiments¶
You can use the Latent Alignment implementation to run your own experiments with different hyperparameters or on different datasets. Here we adapt from FD001 to FD003 of XJTU-SY, a combination that is missing from the original paper.
source = rul_datasets.XjtuSyReader(
1,
first_time_to_predict=[100] * 5, # dummy values
norm_rul=True,
run_split_dist={"dev": [1, 2, 3, 4, 5], "val": [], "test": []}
)
target = rul_datasets.XjtuSyReader(
3,
percent_broken=0.5,
truncate_degraded_only=True,
first_time_to_predict=[100] * 5, # dummy values
norm_rul=True,
run_split_dist={"dev": [3], "val": [], "test": [3]},
)
extractor = rul_adapt.approach.latent_align.ChunkWindowExtractor(5, 256)
dm = rul_datasets.LatentAlignDataModule(
rul_datasets.RulDataModule(source, 128, extractor),
rul_datasets.RulDataModule(target, 128, extractor),
split_by_max_rul=True,
inductive=True,
)
feature_extractor = rul_adapt.model.CnnExtractor(
2,
[32, 16, 1],
1280,
fc_units=256,
fc_dropout=0.5,
act_func=torch.nn.LeakyReLU,
fc_act_func=torch.nn.LeakyReLU,
)
regressor = rul_adapt.model.FullyConnectedHead(256, [1], act_func_on_last_layer=False)
# the four positional arguments weight the healthy, direction, level and fusion alignment losses
latent = rul_adapt.approach.LatentAlignApproach(1.0, 1.0, 1.0, 1.0, lr=5e-4)
latent.set_model(feature_extractor, regressor)
trainer = pl.Trainer(max_epochs=1)
trainer.fit(latent, dm)
trainer.test(latent, dm)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name               | Type                               | Params
--------------------------------------------------------------------------
0 | train_mse          | MeanSquaredError                   | 0
1 | healthy_align      | HealthyStateAlignmentLoss          | 0
2 | direction_align    | DegradationDirectionAlignmentLoss  | 0
3 | level_align        | DegradationLevelRegularizationLoss | 0
4 | fusion_align       | MaximumMeanDiscrepancyLoss         | 0
5 | evaluator          | AdaptionEvaluator                  | 0
6 | _feature_extractor | CnnExtractor                       | 328 K
7 | _regressor         | FullyConnectedHead                 | 257
--------------------------------------------------------------------------
328 K     Trainable params
0         Non-trainable params
328 K     Total params
1.314     Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
/home/tilman/Programming/rul-adapt/.venv/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:109: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
Training: 0it [00:00, ?it/s]
`Trainer.fit` stopped: `max_epochs=1` reached.
/home/tilman/Programming/rul-adapt/.venv/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:109: UserWarning: Total length of `DataLoader` across ranks is zero. Please make sure this was your intention.
  rank_zero_warn(
Testing: 0it [00:00, ?it/s]
/home/tilman/Programming/rul-adapt/.venv/lib/python3.8/site-packages/lightning_fabric/utilities/data.py:55: UserWarning: `DataLoader` returned 0 length. Please make sure this was your intention.
  rank_zero_warn(
────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0            DataLoader 1
────────────────────────────────────────────────────────────────────
    test/target/rmse                                 0.3762059807777405
    test/target/score                                 1309.618896484375
────────────────────────────────────────────────────────────────────
[{}, {'test/target/rmse/dataloader_idx_1': 0.3762059807777405, 'test/target/score/dataloader_idx_1': 1309.618896484375}]