Unsupervised Transfer Learning Method via Cycle-Flow Adversarial Networks for Transient Fault Detection under Various Operation Conditions

Yang, Xiaoyue; Chen, Long; Feng, Qidong; Yang, Yucheng; Xie, Sen

doi:10.3390/s24154839


by Xiaoyue Yang ¹, Long Chen ¹, Qidong Feng ², Yucheng Yang ¹ and Sen Xie ³,*

¹ School of Rail Transportation, Wuyi University, Jiangmen 529020, China

² CRRC Guangdong Railway Vehicles Co., Ltd., Jiangmen 529100, China

³ Institute of Intelligence Science and Engineering, Shenzhen Polytechnic University, Shenzhen 518060, China

* Author to whom correspondence should be addressed.

Sensors 2024, 24(15), 4839; https://doi.org/10.3390/s24154839

Submission received: 19 May 2024 / Revised: 18 July 2024 / Accepted: 23 July 2024 / Published: 25 July 2024

(This article belongs to the Special Issue Advanced Sensing and Fault Diagnosis for Complex Manufacturing Processes)


Abstract:

The efficient fault detection (FD) of traction control systems (TCSs) is crucial for ensuring the safe operation of high-speed trains. Transient faults (TFs) can arise due to prolonged operation and harsh environmental conditions, often being masked by background noise, particularly during dynamic operating conditions. Moreover, acquiring a sufficient number of samples across the entire scenario presents a challenging task, resulting in imbalanced data for FD. To address these limitations, an unsupervised transfer learning (TL) method via federated Cycle-Flow adversarial networks (CFANs) is proposed to effectively detect TFs under various operating conditions. Firstly, a CFAN is specifically designed for extracting latent features and reconstructing data in the source domain. Subsequently, a transfer learning framework employing federated CFANs collectively adjusts the modified knowledge resulting from domain alterations. Finally, the designed federated CFANs execute transient FD by constructing residuals in the target domain. The efficacy of the proposed methodology is demonstrated through comparative experiments.

Keywords:

fault detection; Cycle-Flow adversarial network (CFAN); transfer learning (TL); traction control system (TCS); various operation conditions

1. Introduction

High-speed trains have emerged as one of the most crucial components within intelligent transportation systems. Traction control systems (TCSs), serving as the core power systems for high-speed trains, are intricately linked to trains’ reliability and safety. However, they also represent a major source of faults in both long-term operation and harsh operating environments. Consequently, fault detection and diagnosis (FDD) has become an active area of research over the past few decades [1,2,3].

Currently, FDD methods for high-speed trains can be broadly categorized into three groups: model-based approaches, signal-based approaches, and data-driven approaches. Despite their accessibility and high efficiency in producing FDD results, establishing model-based methods is challenging due to practical uncertainties and complex designs. Signal-based methods exhibit limited effectiveness in detecting minor symptoms, particularly in dynamic scenarios [4].

In the meantime, due to the widespread deployment of sensors in complex systems, data-driven methods have been extensively advocated for accomplishing fault detection and diagnosis (FDD) tasks by effectively processing a massive volume of data [1,5,6,7]. In [8], the authors proposed a discriminative stacked autoencoder (D-SAE) network based on feature integration boosting for bearing fault diagnosis. This method mitigated the performance degradation and enhanced the generalization ability in various scenarios. Ref. [9] proposed an innovative fault detection (FD) method for bogie. In this study, a Monte Carlo-based perturbation technique is employed to amplify the distinction between unexpected faults and known ones. Consequently, the FD outcome for unexpected faults can be obtained using dropout-based Bayesian deep learning. The authors in [10] proposed a fault diagnosis method for braking friction based on a one-dimensional convolutional neural network (1DCNN) and the GraphSAGE network. This approach effectively addresses the challenge of imbalanced fault samples by considering the correlation between different fault features. In addition, ref. [11] presented an incipient FDD method for running gear systems that leveraged Hellinger distance and slow feature analysis.

The aforementioned FDD methods primarily address permanent faults (PFs) in mechanical components or systems. However, transient faults (TFs), as a type of incipient fault, have the potential to develop into PFs and are responsible for most failures observed in electronic devices such as power electronics, sensors, and traction control units (TCUs) within TCSs.

In the context of complex industrial systems, multiple fault detection and diagnosis (FDD) methods have been developed specifically for transient faults (TFs) [12,13,14,15,16,17,18]. Ref. [12] assesses and demonstrates the ability of a bulk built-in current sensor’s (BBICS) architecture to detect multiple and simultaneous TFs for integrated circuits. Ref. [14] studies fault tolerance in switching reconfigurable nano-crossbar arrays, considering both TFs and PFs. In [15], an innovative ontology-based fault propagation analysis approach (ontologyFPA) is proposed to analyze transient fault propagation effects in networked control systems (NCSs). Ref. [17] presents a TF detection and classification approach in power transmission lines based on graph convolutional neural networks. In [18], an optimal fractional-order method is proposed for TF diagnosis, which suppresses background noise and amplifies the faulty part of the signal. Afterward, kurtosis and the fault duration time are applied to locate the fault component.

However, the methods mentioned above operate under static or single fixed operation conditions and do not address dynamic cases [4]. Different operation conditions may lead to significant distribution differences, which means that an intelligent FDD model trained on data under a certain operation condition is usually not applicable to other operation conditions [19,20]. Traditional deep learning approaches necessitate a plethora of samples from diverse operation conditions for effective model training. Conversely, a TCS typically operates under steady-state conditions, resulting in imbalanced distributions across various operation conditions [21].

The primary challenges in the field of TF detection encompass the following:

TFs exhibit sporadic and stochastic behavior, leading to impermanent damage that disappears unpredictably.
The distribution of samples across different operational conditions is imbalanced, particularly for faulty samples which are significantly underrepresented.
The features of TFs are inherently weak and can easily be overshadowed by background noise, especially in dynamic scenarios.

These characteristics make TFs challenging to detect.

In this context, transfer learning (TL) has been extensively discussed for extracting latent feature information and achieving precise fault detection under dynamic operation conditions. TL aims to enhance the performance of target domains by leveraging the knowledge embedded in diverse but related source domains, thereby reducing the reliance on a substantial amount of target domain data for constructing target learners [21,22].

Several FDD methods with TL have been developed for electrical systems. Ref. [23] proposes an FD method for traction converter faults in traction drive systems. This method consists of a federal neural network based on a variational autoencoder (VAE), which can perform the FD task with performance degradation. The authors of ref. [24] developed a hierarchical method for transformer rectifier unit (TRU) fault diagnosis and a transfer learning-based fault diagnosis method without training new models for different TRUs. In [25], a novel transferrable open-circuit fault diagnosis method is proposed for insulated gate bipolar transistors in three-phase inverters, which can be applied to different systems with the same topology but different parameters. The authors of ref. [26] developed an adversarial-based deep TL model that can detect and classify short-circuit faults in DC microgrids without using historical fault data. Ref. [27] proposes a transfer learning-based fault location method for voltage source convertor-based high-voltage direct current (VSC-HVDC) transmission lines. This method can locate faults with small training datasets. However, executing the task of transient FD for TCS in dynamic operation conditions is still an urgent problem that needs to be solved.

Motivated by the discussions above, we propose a TL strategy to detect the transient faults of TCS under various operation conditions. In the proposed method, a Cycle-Flow adversarial network (CFAN) is first constructed for latent feature extraction and data reconstruction in steady operation conditions. Secondly, a TL framework with the federated CFANs jointly adjusts the changed information caused by varied operation conditions. These two steps learn and preserve knowledge from normal cases. Finally, the designed federated CFANs construct residuals from faulty data for transient FD under dynamic operation conditions.

The contributions of the proposed method are summarized as follows:

A CFAN is proposed for latent variable extraction and data reconstruction, which consists of an invertible flow model and two discriminative networks; the loss function is designed as well. Specifically, bidirectional optimization can enhance the quality of reconstruction while mitigating interference caused by background noise through adversarial training and flexible inference.
The proposed federated CFAN-based TL is divided into two stages. Initially, the first CFAN model is trained using normal data in steady operation conditions. Subsequently, the second CFAN calibrates the changed information caused by varied operation conditions utilizing limited data. In conclusion, the federated CFANs can jointly learn latent knowledge in a steady state and be applied to transient fault detection in various operation conditions.
Simulation experiments are conducted on various transient faults using the normal steady state of TCS as the source domain and the dynamic operation condition as the target domain. The simulation results show that the federated CFAN-based TL method can improve the performance of transient fault detection.

The remainder of this paper is organized as follows: Section 2 states the transient fault detection problems and flow basics. Section 3 details the proposed transfer learning fault detection strategy based on federated CFANs. In Section 4, the experiment results and data sources are briefly described. Finally, the conclusions and prospects are given in Section 5.

2. Background and Preliminaries

2.1. Problem Statement

The schematic diagram of the TCS is shown in Figure 1. The pantograph delivers single-phase AC power from the public grid to the transformer. The rectifier receives a lower voltage $u_{n}$ and current $i_{n}$ from the transformer and converts the single-phase AC into DC voltages ($u_{cd1}$, $u_{cd2}$) stabilized by the DC link. The inverter then outputs three-phase AC voltage/current ($u_{u}/i_{sa}$, $u_{v}/i_{sb}$, $u_{w}/i_{sc}$) to drive the asynchronous traction motors. In addition, the traction control unit (TCU) receives the sensor signals and sends the gate control signals $spwm$ and $svpwm$.

As the service time of high-speed trains increases, irreversible changes will arise in the components of the TCS [1]. TFs caused by these irreversible changes are temporary and do not necessarily cause permanent damage. TFs are usually induced by the internal structural defects and manufacturing processes of active components. Furthermore, noise signals such as electromagnetic interference, spark discharge, lightning strikes, load fluctuation, etc., also contribute to TFs.

For TCU faults, analog signal interference exists in the external communication connections of the TCU. Consider the three-phase current $i_{sa,sb,sc}$, whose fault current is as follows:

i_{sa,sb,sc}^{f} = i_{sa,sb,sc}^{0} + f(p, q, A)

(1)

where $f(p, q, A)$ represents transient pulses described by a double-exponential model; $p$ and $q$ are the time coefficients of the injection signal, which jointly determine the width, rise time, and fall time of the injection pulse; and $A$ is the amplitude coefficient of the injection signal. The control strategy compensates for this fault through closed-loop regulation, which makes it challenging to detect or diagnose using conventional methods.
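As a hedged illustration of Equation (1), the sketch below injects a double-exponential pulse into a sinusoidal phase current. The paper does not give the closed form of $f(p, q, A)$; the form $A(e^{-t/p} - e^{-t/q})$ with $p > q$, the 50 Hz carrier, the injection time `t0`, and every numeric value are assumptions chosen only for illustration.

```python
import math

def double_exp_pulse(t, p, q, A, t0=0.0):
    """Assumed pulse shape: zero before t0, then A*(exp(-dt/p) - exp(-dt/q))."""
    dt = t - t0
    if dt < 0.0:
        return 0.0
    return A * (math.exp(-dt / p) - math.exp(-dt / q))

def faulty_current(t, p, q, A, t0, amplitude=100.0, freq_hz=50.0):
    """Equation (1): nominal current plus the injected transient pulse."""
    nominal = amplitude * math.sin(2.0 * math.pi * freq_hz * t)
    return nominal + double_exp_pulse(t, p, q, A, t0)

# Hypothetical settings: p controls the decay, q the rise, pulse injected at 10 ms.
p, q, A, t0 = 2e-4, 5e-5, 40.0, 0.010
times = [i * 1e-5 for i in range(2000)]          # 20 ms at a 10 microsecond step
pulse = [double_exp_pulse(t, p, q, A, t0) for t in times]
peak_index = max(range(len(pulse)), key=pulse.__getitem__)
print(f"pulse peak {pulse[peak_index]:.2f} at t = {times[peak_index] * 1e3:.2f} ms")
```

Under this assumed form, a smaller `q` shortens the rise time and a smaller `p` shortens the fall time, which matches the roles the text assigns to the two time coefficients.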

Furthermore, sensor faults caused by surges in power, ground wires, and grid-side voltage fluctuations are also causes of TFs. For the $U$-phase current $i_{u}$, the fault current is as follows:

i_{u}^{f}(t) = i_{u}^{0}(t) + δ(t)

(2)

where $i_{u}^{0}$ is the current value in a normal state, and $δ(t)$ is the short-duration pulse value caused by the above factors.

In addition, soft errors of components in the traction control unit can also lead to non-permanent mutations in the output of the sensors. TFs usually appear randomly and disappear in a short period, which results in uncertainty [18].

2.2. Preliminaries of Normalizing Flow

Normalizing flow (NF) is a transformation of a simple probability distribution into a more complex distribution by a sequence of invertible and differentiable mappings, which allows for an exact likelihood calculation [28,29]. Therefore, NF has been widely used in image processing, denoising, and anomaly detection [30,31,32,33]. Suppose $x$ is a high-dimensional random vector with a known probability density function (PDF) $p_{x}(x)$. The latent variable $z$ is typically assumed to follow a specific distribution, usually the multivariate unit Gaussian distribution $N(0, I)$, which compels the model to learn the input data distribution. Assuming $z \sim p_{z}(z)$ and that $x$ and $z$ are both $D$-dimensional, the PDFs of the given data are as follows:

\int_{z} p_{z} (z) d z = \int_{x} p_{x} (x) d x = 1

(3)

|p_{z} (z) \cdot d z| = |p_{x} (x) \cdot d x|

(4)

p_{x} (x) = p_{z} (z) |\frac{d z}{d x}| = p_{z} (z) |\det (\frac{\partial z}{\partial x})|

(5)

\log p_{x} (x) = \log p_{z} (z) + \log |\det (\frac{\partial z}{\partial x})|

(6)

The generation process can be expressed as follows:

z = f (x), x = g (z)

(7)

where $f(\cdot)$ is an invertible function (a bijection) with parameter $θ$ that transforms a random variable $x$ into $z$, and $g(\cdot)$ is the inverse function of $f(\cdot)$, such that, for a given data point $x$, the variable inference is completed by $z = f(x) = g^{-1}(x)$.

Therefore, (5) and (6) can be written as follows:

p_{x}(x) = p_{z}(z) |\det(\frac{\partial f}{\partial x})| = p_{z}(z) |\det J(Z)|

(8)

\log p_{x}(x) = \log p_{z}(z) + \log |\det(\frac{\partial f}{\partial x})| = \log p_{z}(z) + \log |\det J(Z)|

(9)

where $J(Z)$ is the $D \times D$ Jacobian matrix.

As shown in Figure 2, the transformation $g$ molds the PDF $p_{z}(z)$ into $p_{x}(x)$. The absolute Jacobian determinant $|\det J(Z)|$ quantifies the relative volume change in a small neighborhood around $z$ due to $g$ [34].

Based on the above, NF can complete the distribution transformation of any complexity.
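To make Equations (5) to (9) concrete, the following sketch applies the change-of-variables rule to a one-dimensional affine flow $z = f(x) = (x - \mu)/\sigma$ with a standard Gaussian base density and checks the result against the closed-form Gaussian density. The affine map and the values of `mu` and `sigma` are illustrative assumptions, not the model used in this paper.

```python
import math

def log_standard_normal(z):
    """log p_z(z) for the unit Gaussian base density."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def flow_log_density(x, mu, sigma):
    """log p_x(x) = log p_z(f(x)) + log|df/dx| for f(x) = (x - mu) / sigma."""
    z = (x - mu) / sigma
    log_abs_det = -math.log(abs(sigma))      # df/dx = 1 / sigma
    return log_standard_normal(z) + log_abs_det

def gaussian_log_density(x, mu, sigma):
    """Closed-form log N(x; mu, sigma^2), used only as a reference."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))

mu, sigma = 1.5, 0.7
for x in (-1.0, 0.0, 1.5, 3.2):
    assert math.isclose(flow_log_density(x, mu, sigma),
                        gaussian_log_density(x, mu, sigma))
print("change-of-variables density matches the Gaussian reference")
```

The log-determinant term is what keeps the transformed density normalized; dropping it would leave the base density evaluated at $z$ without accounting for the stretching of space by $f$.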

Considering the forward process in (7), a flow-based model $f(\cdot)$ can be fitted by minimizing the Kullback–Leibler (KL) divergence between the target distribution $p_{x}^{*}(x)$ and the model distribution $p_{x}(x)$, which can be expressed as follows:

\begin{array}{ll} L(θ) & = D_{KL}[p_{x}^{*}(x) || p_{x}(x)] \\ & = -E_{p_{x}^{*}(x)}[\log p_{x}(x)] + c \\ & = -E_{p_{x}^{*}(x)}[\log p_{z}(z) + \log |\det J(Z)|] + c \\ & = -E_{p_{x}^{*}(x)}[\log p_{z}(f(x)) + \log |\det J(Z)|] + c \end{array}

(10)

c = -M \cdot \log a

(11)

where $a$ is determined by the discretization level of the data and $M$ is the dimension of $z$. Given data samples $\{x(n)\}_{n=1}^{N}$ from the target distribution, with $z(n) = f(x(n))$, the expectation over the target distribution can be estimated by Monte Carlo as follows:

L(θ) \approx -\frac{1}{N} \sum_{n=1}^{N} [\log p_{z}(z(n)) + \log |\det J(z(n))|] + c

(12)
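The Monte Carlo estimate in Equation (12) can be sketched numerically as follows. The example assumes a one-dimensional affine flow $f(x) = (x - \mu)/\sigma$ with a unit Gaussian base density, draws synthetic samples, and evaluates the loss for two parameter settings; the constant $c$ is omitted because it does not depend on the parameters. The data and the parameter values are hypothetical.

```python
import math
import random

def nll(samples, mu, sigma):
    """Monte Carlo estimate of -E[log p_z(f(x)) + log|det J|], f(x) = (x - mu) / sigma."""
    total = 0.0
    for x in samples:
        z = (x - mu) / sigma
        log_pz = -0.5 * (z * z + math.log(2.0 * math.pi))
        log_abs_det = -math.log(sigma)       # Jacobian of the affine map
        total += log_pz + log_abs_det
    return -total / len(samples)

rng = random.Random(0)
samples = [rng.gauss(2.0, 0.5) for _ in range(5000)]   # synthetic target data

loss_matched = nll(samples, mu=2.0, sigma=0.5)       # parameters close to the data
loss_mismatched = nll(samples, mu=0.0, sigma=1.0)    # identity flow, poor fit
print(f"matched: {loss_matched:.3f}, mismatched: {loss_mismatched:.3f}")
```

The parameters that describe the data well give the lower loss, which is the signal a gradient-based optimizer would follow when fitting the flow.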

3. The Proposed Federated CFAN-Based Transfer Learning Strategy

Motivated by the research on image processing based on NF [35], a federated CFAN-based TL strategy to detect transient faults in TCSs is proposed due to its reversibility and flexibility in modeling various distributions.

In this work, the source domain is represented by $D^{s}$, where $d$ represents data; it denotes the measurements under the steady operation of TCSs. Similarly, the target domain data are represented by $D^{t}$, which denotes the measurements in dynamic operation. There will be some differences in data distribution between the source and target domains. New knowledge can be acquired through reasonable adjustments of previous knowledge. This transfer-learning approach can achieve better FD performance than using only the target domain data [36].

3.1. Principle of CFAN

The framework of the proposed CFAN model is shown in Figure 3. In this work, the forward process of the CFAN is defined as $H$, and the reverse process is expressed as $H^{-1}$. Consider source domain samples $\{d_{s}(k)\}_{k=1}^{M} \in D^{s}$. The goal of the model $H$, with parameters $θ_{1}$, is to learn the latent features of the source domain. The model contains two mapping functions: the forward process $H$ and the reverse process $H^{-1}$. In addition, two adversarial discriminative networks, $D_{f}$ and $D_{r}$, are introduced, where $D_{f}$ aims to differentiate between $d_{s}(k)$ and the generated data $H(d_{s}(k); θ_{1})$. Similarly, $D_{r}$ aims to distinguish between $d_{s}(k)$ and $H^{-1}(d_{s}(k); θ_{1})$. $D_{f}$ encourages $H$ to transform $d_{s}(k)$ into an output (itself) that is indistinguishable from $d_{s}(k)$, and vice versa for $D_{r}$ and $H^{-1}$.

Given a $B$-dimensional input $d_{s}(k)$:

d_{s}(k) = \begin{bmatrix} d_{s}^{1}(k) \\ \vdots \\ d_{s}^{B}(k) \end{bmatrix} \in R^{B}

(13)

$d_{s}(k)$ is split into $d_{s}^{1:b}(k)$ and $d_{s}^{b+1:B}(k)$, which are given as follows:

d_{s}^{1:b}(k) = \begin{bmatrix} d_{s}^{1}(k) \\ \vdots \\ d_{s}^{b}(k) \end{bmatrix} \in R^{b}

(14)

d_{s}^{b+1:B}(k) = \begin{bmatrix} d_{s}^{b+1}(k) \\ \vdots \\ d_{s}^{B}(k) \end{bmatrix} \in R^{B-b}

(15)

The affine coupling layer is presented in Figure 4; the output $h(k)$ of an affine coupling layer follows Equations (16) and (17).

h_{1:b}(k) = d_{s}^{1:b}(k)

(16)

h_{b+1:B}(k) = d_{s}^{b+1:B}(k) ⨀ \exp(s(d_{s}^{1:b}(k))) + t(d_{s}^{1:b}(k))

(17)

Finally, $h_{1:b}(k)$ and $h_{b+1:B}(k)$ are merged into one group $h(k)$.

With $h(k)$ as the input and $x(k)$ as the output, the reverse process can be expressed as follows:

x_{1:b}(k) = h_{1:b}(k) = d_{s}^{1:b}(k)

(18)

x_{b+1:B}(k) = \frac{h_{b+1:B}(k) - t(h_{1:b}(k))}{\exp(s(h_{1:b}(k)))} = \frac{h_{b+1:B}(k) - t(d_{s}^{1:b}(k))}{\exp(s(d_{s}^{1:b}(k)))}

(19)

where $s(\cdot)$ represents the scaling function, $t(\cdot)$ represents the translation function, and ⨀ is the Hadamard (element-wise) product.
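The forward and reverse passes in Equations (16) to (19) can be sketched in a few lines. In this minimal example, `scale_fn` and `shift_fn` are toy stand-ins for $s(\cdot)$ and $t(\cdot)$ (in practice these would be neural networks), and the sample vector with $B = 5$ and $b = 2$ is hypothetical.

```python
import math

def scale_fn(u, out_dim):
    """Toy stand-in for s(.): maps the first block to out_dim log-scales."""
    return [math.tanh(0.3 * (i + 1) * sum(u)) for i in range(out_dim)]

def shift_fn(u, out_dim):
    """Toy stand-in for t(.): maps the first block to out_dim shifts."""
    return [0.5 * (i + 1) * sum(u) for i in range(out_dim)]

def coupling_forward(d, b):
    """Equations (16)-(17): copy d[:b], affinely transform d[b:]."""
    head, tail = d[:b], d[b:]
    s, t = scale_fn(head, len(tail)), shift_fn(head, len(tail))
    out_tail = [x * math.exp(si) + ti for x, si, ti in zip(tail, s, t)]
    log_det = sum(s)                       # sum of the log-scales
    return head + out_tail, log_det

def coupling_inverse(h, b):
    """Equations (18)-(19): recover the input from the layer output."""
    head, tail = h[:b], h[b:]
    s, t = scale_fn(head, len(tail)), shift_fn(head, len(tail))
    in_tail = [(y - ti) / math.exp(si) for y, si, ti in zip(tail, s, t)]
    return head + in_tail

d = [0.2, -1.0, 0.7, 1.5, -0.3]            # hypothetical sample, B = 5, b = 2
h, log_det = coupling_forward(d, b=2)
d_rec = coupling_inverse(h, b=2)
assert all(math.isclose(a, c, abs_tol=1e-12) for a, c in zip(d, d_rec))
```

Because the first block passes through unchanged, the inverse can recompute exactly the same scales and shifts, so inversion never requires inverting `scale_fn` or `shift_fn` themselves.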

Considering the forward process, the Jacobian matrix of the coupling transformation $f(\cdot)$ can be expressed as follows:

\frac{\partial h}{\partial d_{s}(k)^{T}} = \begin{bmatrix} I & 0 \\ \frac{\partial h_{b+1:B}}{\partial d_{s}^{1:b}(k)^{T}} & \mathrm{diag}(\exp(s(d_{s}^{1:b}(k)))) \end{bmatrix}

(20)

The upper left block of the Jacobian matrix is an identity matrix $I$. Since $h_{1:b}(k)$ does not depend on $d_{s}^{b+1:B}(k)$, the upper right block of the Jacobian matrix is a zero matrix, 0. The lower right block of the Jacobian matrix is a diagonal matrix with the diagonal elements $\exp(s(d_{s}^{1:b}(k)))$. The Jacobian is therefore lower triangular, its determinant is the product of its diagonal elements, and the lower left block does not need to be computed. Because the Jacobian of $s(\cdot)$ or $t(\cdot)$ is not necessary for computing the Jacobian determinant of the coupling layer, $s(\cdot)$ and $t(\cdot)$ can be arbitrarily complex for various network designs.
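The determinant of the block-triangular Jacobian in Equation (20) can be worked out explicitly. Writing $s_{j}$ for the $j$-th component of $s(d_{s}^{1:b}(k))$ (a shorthand introduced here for convenience, not notation from the paper), the step is:

```latex
\det\left(\frac{\partial h}{\partial d_{s}(k)^{T}}\right)
  = \det(I) \cdot \prod_{j=1}^{B-b} \exp(s_{j})
  = \exp\Big(\sum_{j=1}^{B-b} s_{j}\Big),
\qquad
\log\left|\det\left(\frac{\partial h}{\partial d_{s}(k)^{T}}\right)\right|
  = \sum_{j=1}^{B-b} s_{j}.
```

The log-determinant needed in the likelihood is thus a plain sum of the scaling outputs, which costs $O(B - b)$ operations rather than the cubic cost of a general determinant.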

Although the coupling layer may be powerful, the distribution is often very complex in practice. Moreover, transforming one complex distribution into another is challenging, and a single transformation is often insufficient. In addition, the forward transformation leaves some components unchanged, with the first $b$ dimensions being identical to the initial data. Figure 5 illustrates the composition of coupling layers in an alternating pattern. This structure allows different parts of the data to pass through different transformation paths. It ensures that the final generated data do not contain components originating from the initial data [30]. Coupling layers are combined as follows:

H = f_{k} ◦ \dots ◦ f_{2} ◦ f_{1}

(21)

Then, its reverse process can be expressed as follows:

H^{-1} = (f_{k} ◦ \dots ◦ f_{2} ◦ f_{1})^{-1} = f_{1}^{-1} ◦ f_{2}^{-1} ◦ \dots ◦ f_{k}^{-1}

(22)
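The ordering of a stacked flow and its inverse can be checked with a small sketch: layers are applied first to last in the forward pass and undone last to first in the reverse pass. The two toy layers below, one transforming the second component and the other the first to mimic the alternating pattern, are assumptions for illustration only.

```python
import math

def layer_a_forward(v):      # keeps v[0], transforms v[1]
    return [v[0], v[1] * math.exp(math.tanh(v[0])) + 0.5 * v[0]]

def layer_a_inverse(v):
    return [v[0], (v[1] - 0.5 * v[0]) / math.exp(math.tanh(v[0]))]

def layer_b_forward(v):      # keeps v[1], transforms v[0] (roles alternated)
    return [v[0] * math.exp(math.tanh(v[1])) - 0.3 * v[1], v[1]]

def layer_b_inverse(v):
    return [(v[0] + 0.3 * v[1]) / math.exp(math.tanh(v[1])), v[1]]

LAYERS = [(layer_a_forward, layer_a_inverse),
          (layer_b_forward, layer_b_inverse)]

def H(v):
    for forward, _ in LAYERS:              # apply f_1 first, then f_2, ...
        v = forward(v)
    return v

def H_inv(v):
    for _, inverse in reversed(LAYERS):    # undo the last layer first
        v = inverse(v)
    return v

x = [0.8, -1.2]
y = H(x)
x_rec = H_inv(y)
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(x, x_rec))
assert y[0] != x[0] and y[1] != x[1]       # both components were transformed
```

After two alternating layers, neither output component is a copy of the input, which is the property the alternating arrangement in Figure 5 is meant to provide.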

As mentioned above, to minimize the error between the input and reconstructed output, the expectation over $p_{D^{s}}(d_{s}(k))$ can be estimated by Monte Carlo as follows:

L(θ) \approx -\frac{1}{M} \sum_{k=1}^{M} [\log p_{d_{s}(k) \sim D^{s}}(H(d_{s}(k); θ_{1})) + \log |\det J_{H}(d_{s}(k))|]

(23)

Similarly, for the reverse process,

L(θ) \approx -\frac{1}{M} \sum_{k=1}^{M} [\log p_{d_{s}(k) \sim D^{s}}(H^{-1}(d_{s}(k); θ_{1})) + \log |\det J_{H^{-1}}(d_{s}(k))|]

(24)

For the forward process, the loss function $loss_{H}$ of the model $H$ in this work can be expressed as follows:

loss_{H} = E_{d_{s}(k) \sim D^{s}}[\log D_{f}(d_{s}(k))] + E_{d_{s}(k) \sim D^{s}}[\log(1 - D_{f}(H(d_{s}(k); θ_{1})))]

(25)

where $D_{f}$ is a discriminative network with parameters $θ_{f}$. The loss function of $D_{f}$ can be expressed as follows:

loss_{D_{f}} = E_{d_{s}(k) \sim D^{s}}[(D_{f}(d_{s}(k)) - 1)^{2}] + E_{d_{s}(k) \sim D^{s}}[(D_{f}(H(d_{s}(k); θ_{1})))^{2}]

(26)

Similarly, the loss function $loss_{H^{-1}}$ of the reverse process can be expressed as follows:

loss_{H^{-1}} = E_{d_{s}(k) \sim D^{s}}[\log D_{r}(d_{s}(k))] + E_{d_{s}(k) \sim D^{s}}[\log(1 - D_{r}(H^{-1}(d_{s}(k); θ_{1})))]

(27)

where $D_{r}$ is a discriminative network with parameters $θ_{r}$. The loss function of $D_{r}$ can be expressed as follows:

loss_{D_{r}} = E_{d_{s}(k) \sim D^{s}}[(D_{r}(d_{s}(k)) - 1)^{2}] + E_{d_{s}(k) \sim D^{s}}[(D_{r}(H^{-1}(d_{s}(k); θ_{1})))^{2}]

(28)

The total loss $L_{total}$ of the proposed CFAN is presented as follows:

L_{total} = loss_{H} + loss_{H^{-1}} + loss_{D_{f}} + loss_{D_{r}}

(29)

The overall optimization objective of the model can be written as follows:

〈θ_{1}^{*}, θ_{f}^{*}, θ_{r}^{*}〉 = \arg \min_{H, H^{-1}} \max_{D_{f}, D_{r}} L_{total}

(30)

In summary, the CFAN can learn knowledge in the source domain by adversarial training, and the trained parameters are $θ_{1}^{*}$.
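The adversarial terms for the forward branch, Equations (25) and (26), can be evaluated on a batch of discriminator outputs as in the following sketch. The scores are made-up numbers in (0, 1) standing in for $D_{f}(d_{s}(k))$ and $D_{f}(H(d_{s}(k); θ_{1}))$; no network is trained here, and the helper names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def loss_h(scores_real, scores_generated):
    """Equation (25): E[log D(d)] + E[log(1 - D(H(d)))]."""
    return (mean([math.log(p) for p in scores_real])
            + mean([math.log(1.0 - p) for p in scores_generated]))

def loss_d(scores_real, scores_generated):
    """Equation (26): E[(D(d) - 1)^2] + E[D(H(d))^2]."""
    return (mean([(p - 1.0) ** 2 for p in scores_real])
            + mean([p ** 2 for p in scores_generated]))

# Hypothetical discriminator outputs for four source samples.
scores_real = [0.9, 0.8, 0.95, 0.85]
scores_fooled = [0.8, 0.9, 0.85, 0.9]      # reconstructions judged as real
scores_caught = [0.1, 0.2, 0.05, 0.15]     # reconstructions judged as fake

print("loss_H, D fooled:    ", round(loss_h(scores_real, scores_fooled), 3))
print("loss_H, D not fooled:", round(loss_h(scores_real, scores_caught), 3))
print("loss_D, D not fooled:", round(loss_d(scores_real, scores_caught), 3))
print("loss_D, D fooled:    ", round(loss_d(scores_real, scores_fooled), 3))
```

The flow-side loss is lower when the discriminator is fooled, while the discriminator-side loss is lower when it separates real samples from reconstructions; the same pattern applies to the reverse branch in Equations (27) and (28).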

The trained $CFAN^{*}$ model can perform FD tasks under steady operation conditions (the training process is detailed in Algorithm 1). However, the distribution discrepancies arising from diverse operation conditions result in a decline in its overall performance. To mitigate this issue, fine-tuning of the model through TL is necessary to attain optimal FD performance.

Algorithm 1: offline

Loop
for number of training iterations do
    Sample \{d_{s}(k)\}_{k=1}^{M} \in D^{s} from dataset D^{s}
    Learning H, H^{-1}:
        For d_{s}(1), compute H(d_{s}(1); θ_{1}), H^{-1}(d_{s}(1); θ_{1})
        Backward propagate loss_{H}, loss_{H^{-1}}, update θ_{1} by Adam optimizer [36]
    Learning D_{f}, D_{r}:
        For d_{s}(1), compute D_{f}(d_{s}(1)), D_{r}(d_{s}(1))
        Backward propagate loss_{D_{f}}, loss_{D_{r}}, update θ_{f}, θ_{r} by Adam optimizer
        For d_{s}(1), compute D_{f}(H(d_{s}(1))), D_{r}(H^{-1}(d_{s}(1)))
        Backward propagate loss_{D_{f}}, loss_{D_{r}}, update θ_{f}, θ_{r} by Adam optimizer
end for
end loop

3.2. Fault Detection with Transfer Learning Based on Federated CFANs

This work aims to establish an FD model under dynamic operation with TL. The first CFAN reflects the information on steady-state operation in the system, which was trained in the previous step. The second CFAN learns the performance changes influenced by domain changes. This design concept involves neural model-aided learning to identify changing and unchanging crucial parameters. The framework of the proposed TL strategy is illustrated in Figure 6.

The data of the target domain $D^{t}$ are input into the trained $CFAN_{1}^{*}$. Because the data distributions of $D^{s}$ and $D^{t}$ differ, its performance will also change. Consider target domain samples $\{d_{t}(n)\}_{n=1}^{L}$ from $D^{t}$; the residual signal $e_{1}(n) \sim E_{1}$ can be expressed as follows:

e_{1}(n) = d_{t}(n) - CFAN_{1}^{*}(d_{t}(n); θ_{1}^{*})

(31)

From the above formula, $e_{1}$ is the path between the source and the target domain, which retains the information when the operation conditions change. $CFAN_{2}$ calibrates the knowledge changes caused by the varied operation conditions. The construction of $CFAN_{2}$ is similar to that of $CFAN_{1}$, with parameters $θ_{2}$. In addition, it also includes the discriminative networks $D_{f2}$ and $D_{r2}$, with parameters $θ_{f2}$ and $θ_{r2}$, respectively. The loss function $loss_{H_{2}}$ of $CFAN_{2}$ can be expressed as follows:

loss_{H_{2}} = E_{d_{t}(n) \sim D^{t}}[\log D_{f2}(e_{1}(n))] + E_{d_{t}(n) \sim D^{t}}[\log(1 - D_{f2}(H_{2}(d_{t}(n); θ_{2})))]

(32)

The loss function $loss_{D_{f2}}$ of $D_{f2}$ can be expressed as follows:

loss_{D_{f2}} = E_{d_{t}(n) \sim D^{t}}[(D_{f2}(e_{1}(n)) - 1)^{2}] + E_{d_{t}(n) \sim D^{t}}[(D_{f2}(H_{2}(d_{t}(n); θ_{2})))^{2}]

(33)

The loss function of the reverse process $H_{2}^{-1}$ can be expressed as follows:

loss_{H_{2}^{-1}} = E_{d_{t}(n) \sim D^{t}}[\log D_{r2}(d_{t}(n))] + E_{e_{1}(n) \sim E_{1}}[\log(1 - D_{r2}(H_{2}^{-1}(e_{1}(n); θ_{2})))]

(34)

The loss function $loss_{D_{r2}}$ of $D_{r2}$ can be expressed as follows:

loss_{D_{r2}} = E_{d_{t}(n) \sim D^{t}}[(D_{r2}(d_{t}(n)) - 1)^{2}] + E_{e_{1}(n) \sim E_{1}}[(D_{r2}(H_{2}^{-1}(e_{1}(n); θ_{2})))^{2}]

(35)

In summary, the total loss $L_{total2}$ of $CFAN_{2}$ is formulated as follows:

L_{total2} = loss_{H_{2}} + loss_{H_{2}^{-1}} + loss_{D_{f2}} + loss_{D_{r2}}

(36)

The overall optimization objective of the proposed TL model is provided as follows:

〈θ_{2}^{*}, θ_{f2}^{*}, θ_{r2}^{*}〉 = \arg \min_{H_{2}, H_{2}^{-1}} \max_{D_{f2}, D_{r2}} L_{total2}

(37)

$CFAN_{2}^{*}$ learns the performance variation of $CFAN_{1}^{*}$ due to varied operation conditions; the training process of $CFAN_{2}^{*}$ is detailed in Algorithm 2. The change information $\hat{e}_{1}(m)$ is obtained using the following formula:

\hat{e}_{1}(m) = CFAN_{2}^{*}(d_{t}(m); θ_{2}^{*})

(38)

Algorithm 2: offline

Loop
for number of training iterations do
    Sample \{d_{t}(n)\}_{n=1}^{L} \in D^{t} from dataset D^{t}
    Learning H_{2}, H_{2}^{-1}:
        For d_{t}(1), compute e_{1}(1) = d_{t}(1) - CFAN_{1}^{*}(d_{t}(1); θ_{1}^{*})
        For d_{t}(1), compute H_{2}(d_{t}(1); θ_{2}), H_{2}^{-1}(e_{1}(1); θ_{2})
        Backward propagate loss_{H_{2}}, loss_{H_{2}^{-1}}, update θ_{2} by Adam optimizer
    Learning D_{f2}, D_{r2}:
        For e_{1}(1), d_{t}(1), compute D_{f2}(e_{1}(1)), D_{r2}(d_{t}(1))
        Backward propagate loss_{D_{f2}}, loss_{D_{r2}}, update θ_{f2}, θ_{r2} by Adam optimizer
        For d_{t}(1), e_{1}(1), compute D_{f2}(H_{2}(d_{t}(1); θ_{2})), D_{r2}(H_{2}^{-1}(e_{1}(1); θ_{2}))
        Backward propagate loss_{D_{f2}}, loss_{D_{r2}}, update θ_{f2}, θ_{r2} by Adam optimizer
end for
end loop

Based on the above analysis, the residual signal $r(m)$ used for the final FD decision is defined as follows:

r(m) = e_{1}(m) - \hat{e}_{1}(m) = d_{t}(m) - CFAN_{1}^{*}(d_{t}(m); θ_{1}^{*}) - CFAN_{2}^{*}(d_{t}(m); θ_{2}^{*})

(39)

According to the final decision signal $r$, $m$ represents the dimension of $r$. The framework of the proposed federated CFANs is depicted in Figure 7.
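The residual in Equation (39) can be sketched with stand-in models. In the example below, `cfan1` and `cfan2` are hypothetical callables standing in for the trained $CFAN_{1}^{*}$ and $CFAN_{2}^{*}$; their bodies (a clipped linear reconstruction) are invented purely so that the arithmetic can run and do not reflect the trained networks.

```python
def clip(x, limit=2.0):
    """Limit a value to the range seen in (hypothetical) normal operation."""
    return max(-limit, min(limit, x))

def cfan1(d):
    """Hypothetical stand-in for CFAN_1*: reconstructs most of the normal signal."""
    return [0.9 * clip(x) for x in d]

def cfan2(d):
    """Hypothetical stand-in for CFAN_2*: estimates the shift-induced residual e_1."""
    return [0.1 * clip(x) for x in d]

def final_residual(d_t):
    e1 = [x - y for x, y in zip(d_t, cfan1(d_t))]    # Equation (31)
    e1_hat = cfan2(d_t)                               # Equation (38)
    return [a - b for a, b in zip(e1, e1_hat)]        # Equation (39)

normal_sample = [1.0, -1.5, 0.5]
faulty_sample = [1.0, 5.0, 0.5]      # transient spike on the second channel

r_normal = final_residual(normal_sample)
r_faulty = final_residual(faulty_sample)
print([round(v, 6) for v in r_normal])
print([round(v, 6) for v in r_faulty])
```

With these stand-ins, normal data are explained jointly by the two models and leave a near-zero residual, whereas the part of a spike that neither model reproduces remains in $r$ and can be thresholded.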

This work utilizes the root mean square (RMS) norm to maintain satisfactory false alarm rates (FARs) in high-dimensional situations. The RMS measures the average energy of a signal $r$ and is defined by the following formula:

J(r(m))_{RMS} = \frac{1}{n} (r(m)^{T} r(m))

(40)

The threshold is set to be

J_{th} = \sup_{\text{fault-free}} J(r)_{RMS}

(41)

Then, the fault detection logic becomes

\begin{cases} J(r(m))_{RMS} \leq J_{th} \Longrightarrow \text{no alarm, fault-free} \\ J(r(m))_{RMS} > J_{th} \Longrightarrow \text{alarm, a fault is detected.} \end{cases}

(42)
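The statistic, threshold, and decision rule in Equations (40) to (42) can be sketched as follows. The example takes $n$ to be the number of elements of $r$ and approximates the supremum by the maximum over a finite set of fault-free residuals; both choices, and all residual values, are assumptions made for illustration.

```python
def rms_statistic(r):
    """Equation (40): J = (1/n) * r^T r, with n taken as the length of r."""
    return sum(v * v for v in r) / len(r)

def fit_threshold(fault_free_residuals):
    """Equation (41): largest J observed on fault-free residuals."""
    return max(rms_statistic(r) for r in fault_free_residuals)

def detect(r, threshold):
    """Equation (42): True raises an alarm, False means fault-free."""
    return rms_statistic(r) > threshold

# Hypothetical fault-free residuals used to set the threshold.
fault_free = [[0.01, -0.02, 0.00],
              [0.03, 0.01, -0.01],
              [-0.02, 0.02, 0.01]]
j_th = fit_threshold(fault_free)

quiet_residual = [0.01, 0.00, -0.01]
spiky_residual = [0.01, 1.50, -0.01]
print("threshold:", j_th)
print("quiet residual alarm:", detect(quiet_residual, j_th))
print("spiky residual alarm:", detect(spiky_residual, j_th))
```

Setting the threshold to the largest fault-free value gives no false alarms on the data used to fit it; on unseen fault-free data the false alarm rate depends on how representative that set is.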

The flowchart of the proposed method is illustrated in Figure 8, comprising an offline training phase and an online fault detection (FD) phase. The first CFAN-based model $CFAN_{1}$ is trained using the normal data $D^{s}$ obtained during steady operation conditions to extract latent variables and reconstruct data. Subsequently, the model $CFAN_{2}$ undergoes federated training based on the dynamic operation condition data $D^{t}$. The trained federated neural networks $CFAN_{1}^{*}$ and $CFAN_{2}^{*}$ enable feature extraction and the reconstruction of the healthy data. Thus, the residual $r$ is calculated using the federated CFANs. Finally, with the FD threshold $J_{th}$ determined by the RMS statistics of the residual $r$, the $J(r(m))_{RMS}$ of the testing data is compared with $J_{th}$ to realize the FD of the TCS.

4. Experiment Results and Analysis

In this section, the data source and experimental platform are briefly described. To verify the effectiveness of the proposed method, FD tasks with different methods were performed on the TCS under dynamic operation conditions. The experimental results are then discussed.

4.1. Data Description

In this case, a TCS is adopted to demonstrate the effectiveness of the proposed FD method. A simulation platform of traction drive control systems named “TDCS-FIB” is presented in [37,38]. TDCS-FIB develops fault injection benchmarks based on simulation models. TDCS-FIB provides a variety of fault injection types for the main components in TCS, which provides reliable data support for fault detection and diagnosis.

To verify the proposed method, a TCS with different TFs is adopted. As depicted in Figure 9, the onboard TCS serves as the experimental system, with its specifications presented in Table 1. The sensor data were collected under traction operation conditions.

In practice, transient faults will lead to abnormal data from multiple sensors. Multi-sensor FD can reduce interference and improve detection efficiency [39,40]. Therefore, multi-sensor data are used to detect transient faults, which include the three-phase current output $[i_{sa}\ i_{sb}\ i_{sc}]$ of an inverter, the voltage output $[u_{cd1}\ u_{cd2}]$ of the upper and lower support capacitors in the DC link, and the transformer secondary voltage and current $[u_{n}\ i_{n}]$. The FD model of the TCS is trained based on the sensor signals as follows:

[i_{sa}\ i_{sb}\ i_{sc}\ u_{cd1}\ u_{cd2}\ u_{n}\ i_{n}] \in D

(43)

where $[i_{sa}\ i_{sb}\ i_{sc}\ u_{cd1}\ u_{cd2}\ u_{n}\ i_{n}] \in D^{s}, D^{t}$. The collected data can be expressed as $D^{f}$ for the transient faults under dynamic operation conditions.

Since the waveforms of the seven groups of sensors tend to be stable after the $1 \times 10^{4}$-th step, $1 \times 10^{3}$ samples in the normal steady state of the TCS are obtained as the source domain training dataset $D^{s}$, and $2 \times 10^{2}$ samples in the dynamic condition and $50$ samples in the steady condition are used as the target domain training dataset $D^{t}$.

The test dataset in the dynamic state contains four transient faults and fault-free scenarios. Each fault scenario contains $5 \times 10^{2}$ samples, and the fault-free scenario contains $1 \times 10^{5}$ samples. The experimental results are evaluated using the fault detection rate (FDR), false alarm rate (FAR), recall, and accuracy rate (ACR), which are defined as follows:

FDR = \frac{TP_{Fx}}{TP_{Fx} + FN_{Fx}}

(44)

FAR = \frac{FP}{FP + TN}

(45)

recall = \frac{TP}{TP + FN}

(46)

ACR = \frac{TP + TN}{TP + FP + TN + FN}

(47)

Fault samples are defined as positive samples and normal samples as negative samples. The number of fault samples correctly detected as faulty is the true positive count ($TP$), and the number of fault samples that are missed, i.e., classified as normal, is the false negative count ($FN$). The number of normal samples correctly classified as normal is the true negative count ($TN$), and the number of normal samples incorrectly flagged as faulty is the false positive count ($FP$). $Fx$ denotes the $x$-th type of fault.
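As a small worked example of these indicators, the sketch below counts the four outcomes for one test scenario and derives the FDR, FAR, and ACR from them. The counts use descriptive names instead of abbreviations, the helper `evaluate` is hypothetical, and the toy labels are invented for illustration. Recall has the same form as the FDR but is pooled over all fault types.

```python
def evaluate(labels, alarms):
    """Compute FDR, FAR, and ACR for one scenario.

    labels: True where the sample is truly faulty.
    alarms: True where the detector raised an alarm.
    Both fault and normal samples must be present.
    """
    pairs = list(zip(labels, alarms))
    faults_detected = sum(1 for y, a in pairs if y and a)
    faults_missed = sum(1 for y, a in pairs if y and not a)
    normals_flagged = sum(1 for y, a in pairs if not y and a)
    normals_passed = sum(1 for y, a in pairs if not y and not a)
    return {
        # Fraction of fault samples that triggered an alarm.
        "FDR": faults_detected / (faults_detected + faults_missed),
        # Fraction of normal samples that triggered an alarm.
        "FAR": normals_flagged / (normals_flagged + normals_passed),
        # Fraction of all samples classified correctly.
        "ACR": (faults_detected + normals_passed) / len(pairs),
    }

# Toy scenario: six normal samples followed by four fault samples.
labels = [False] * 6 + [True] * 4
alarms = [False, False, True, False, False, False, True, True, False, True]
metrics = evaluate(labels, alarms)
```

Here three of the four fault samples are detected (FDR = 0.75), one of the six normal samples raises a false alarm (FAR ≈ 0.167), and eight of the ten samples are classified correctly (ACR = 0.8).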

The proposed model was built with PyTorch 1.13.1. The $\mathrm{CFAN}_{1}$ and $\mathrm{CFAN}_{2}$ models have the same structure and contain four affine coupling layers. $s(\cdot)$ and $t(\cdot)$ each include two fully connected layers with $2 \times 100$ neurons. The two discriminators $D(\cdot)$ use the same fully connected structure with $(200, 100, 50, 1)$ neurons. According to the loss function $L_{total}$ defined in (29) and $L_{total2}$ defined in (35), the best weights and biases can be obtained via the Adam optimizer. The details of the CFANs and methods for comparison are given in Table 2 and Table 3.
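To illustrate why a stack of affine coupling layers is exactly invertible, the following framework-free sketch implements the standard coupling transform $y_1 = x_1$, $y_2 = x_2 \odot \exp(s(x_1)) + t(x_1)$ and its closed-form inverse. It is not the authors' PyTorch model: the input dimension of 7, the single 100-neuron hidden layer, the random untrained weights, and the reversal of the vector between layers are assumptions chosen to keep the example short, and all names are hypothetical.

```python
import math
import random

def make_mlp(n_in, n_hidden, n_out, rng):
    """Small network: ReLU hidden layer, sigmoid output, random weights."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        # Sigmoid written as 0.5 * (1 + tanh(a / 2)) to avoid overflow.
        return [0.5 * (1.0 + math.tanh(0.5 * sum(w * hi for w, hi in zip(row, h))))
                for row in w2]

    return forward

class AffineCoupling:
    """y1 = x1, y2 = x2 * exp(s(x1)) + t(x1), inverted in closed form."""

    def __init__(self, dim, hidden, rng):
        self.split = dim // 2
        self.s = make_mlp(self.split, hidden, dim - self.split, rng)
        self.t = make_mlp(self.split, hidden, dim - self.split, rng)

    def forward(self, x):
        x1, x2 = x[:self.split], x[self.split:]
        s, t = self.s(x1), self.t(x1)
        y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
        return x1 + y2, sum(s)  # output and log|det Jacobian|

    def inverse(self, y):
        y1, y2 = y[:self.split], y[self.split:]
        s, t = self.s(y1), self.t(y1)
        x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
        return y1 + x2

rng = random.Random(0)
layers = [AffineCoupling(dim=7, hidden=100, rng=rng) for _ in range(4)]

def flow_forward(x):
    """Map data x to latent z; reverse the vector after each layer."""
    log_det = 0.0
    for layer in layers:
        x, ld = layer.forward(x)
        x = x[::-1]
        log_det += ld
    return x, log_det

def flow_inverse(z):
    """Undo flow_forward layer by layer, in reverse order."""
    for layer in reversed(layers):
        z = layer.inverse(z[::-1])
    return z

x = [0.3, -1.2, 0.8, 0.1, -0.4, 1.5, 0.0]
z, log_det = flow_forward(x)
x_rec = flow_inverse(z)  # matches x up to floating-point error
```

Because the first half of each input passes through unchanged, $s(\cdot)$ and $t(\cdot)$ can be recomputed from the output during inversion, so neither network ever needs to be inverted itself.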

4.2. Analysis and Discussion

Comparisons between the proposed method and other methods were conducted, encompassing four types of FD tasks and a fault-free detection task for each method. Figure 10, Figure 11, Figure 12 and Figure 13 illustrate the FD results obtained using both the proposed method and VAE (including transfer and non-transfer learning). The traditional VAE refers to the VAE method without TL, while the federated VAE, which incorporates a TL strategy similar to that of our proposed method, is adapted for dynamic operating conditions. In Figure 10a, Figure 11a, Figure 12a and Figure 13a, the blue curve represents the a-phase current waveform $i_{sa}$, and the orange dotted line represents the fault injection time. In panels (b), (c), and (d) of Figure 10, Figure 11, Figure 12 and Figure 13, the blue curve represents the detection result of the respective method, and the red dotted line represents the FD threshold $J_{th}$.

The fault $F_{1}$ is attributed to damage to the shielding layer of communication cables caused by manufacturing processes, overstress, and other contributing factors. The transmission of external pulses in combinational logic circuits induces variations in both the pulse width and amplitude, which leads to TF in the TCS.

The fault $F_{2}$ occurs because the sensor chip pins and wiring are loose or improperly connected. The sensor signal is instantaneously disturbed by vibration, thereby inducing transient fault $F_{2}$.

Transient shock faults $F_{3}$ may arise from improper sensor installation and the degradation of insulating materials triggered by power and ground wire surges.

The occurrence of $F_{4}$ can be attributed to IGBT damage resulting from internal structural defects, manufacturing processes, and other contributing factors. Furthermore, excessive stress induced by high temperatures may lead to gate driver circuit failure, such as TF caused by erroneous pulse control signals originating from the control circuit.

The comparison results of the three methods are illustrated in Figure 14. Table 4 lists the FDR for each fault, and Table 5 lists the FAR, recall, ACR, and average fault detection delay. The proposed method achieves the best overall performance across the FD tasks. Specifically, Figure 14a shows the FDR of the different methods under the four types of faults, and Figure 14b shows their FAR, recall, and ACR. For $F_{1}$, $F_{2}$, and $F_{4}$, the FDR of the other two FD methods is lower than that of the proposed method, while all three methods reach 100% for $F_{3}$. In Figure 14b and Table 5, the proposed method has the lowest FAR (8.1%) and the highest recall (95.5%) and ACR (92.5%). The traditional VAE does not include the TL process and cannot adaptively adjust the changing knowledge based on the target domain data, which causes poor FD performance.
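The average fault detection delay can be understood as the mean time between the fault injection instant and the first alarm that follows it. The sketch below shows one way to compute such a delay from alarm sequences. The helper names, the toy alarm sequences, and the sampling period of $10^{-4}$ s are hypothetical and chosen only for illustration, since the sampling period is not stated in this section.

```python
def detection_delay(alarms, injection_index, sample_period):
    """Time from fault injection to the first alarm at or after it.

    Returns None if the fault is never detected. Alarms raised before
    the injection index are false alarms and are ignored here.
    """
    for m in range(injection_index, len(alarms)):
        if alarms[m]:
            return (m - injection_index) * sample_period
    return None

def average_delay(runs, sample_period):
    """Mean delay over the runs in which the fault was detected."""
    delays = [detection_delay(a, k, sample_period) for a, k in runs]
    detected = [d for d in delays if d is not None]
    return sum(detected) / len(detected) if detected else None

# Each run: (alarm sequence, index at which the fault was injected).
runs = [
    ([False, False, False, True, True], 2),         # detected 1 sample late
    ([False, True, False, False, True, True], 2),   # detected 2 samples late
    ([False, False, False], 1),                     # never detected
]
mean_delay = average_delay(runs, sample_period=1e-4)
```

Undetected runs are excluded from the mean in this sketch; they would instead lower the FDR.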

The data distributions vary across different operation conditions of TCSs, leading to a degradation in FD performance. However, there exists common knowledge among various operation conditions, necessitating the acquisition of knowledge from the steady-state operation of a TCS. As depicted in Figure 10, Figure 11, Figure 12 and Figure 13, due to the proposed TL strategy that leverages prior knowledge and mitigates the impact of operational variations, a federated VAE outperforms a traditional VAE. The proposed TL strategy based on federated CFANs effectively transfers and adapts knowledge between steady-state and dynamic operation conditions while ensuring the accurate extraction of latent variables and data reconstruction. By leveraging the adversarial training and reversibility properties of CFANs, the precise description of data distribution is achieved through bidirectional optimization, resulting in significant performance improvements as demonstrated in Figure 14 and Table 4. Especially for weak TFs (case studies $F_{1}$, $F_{2}$, and $F_{4}$), the proposed method exhibits superior fault detection capabilities under dynamic operating conditions.

In addition, FD experiments were also conducted under steady operating conditions. The test dataset in the steady state also contains fault-free scenarios and four transient faults, which are similar to $F_{1}$, $F_{2}$, $F_{3}$, and $F_{4}$ in dynamic operation conditions. Each fault scenario contains $1 \times 10^{3}$ samples, and the fault-free scenario contains $1 \times 10^{5}$ samples. The performance comparison under dynamic operation conditions is shown in Table 5, whereas the comparison results of the three methods under steady operation conditions are illustrated in Figure 15, Table 6, and Table 7.

As illustrated in Figure 15, Table 6 lists the FDR for each fault under steady operation conditions, and Table 7 shows the FAR, recall, ACR, and average FD delay. It can be concluded that the proposed second CFAN also achieves better performance for the different FD tasks under steady operation conditions, because it has learned the knowledge of steady states from a small amount of healthy data.

The training loss curve is defined by the mean squared error (MSE), which evaluates the reconstruction accuracy. As illustrated in Figure 16a, during the training of the proposed method, the training loss stabilizes at a lower level than that of the other two methods, indicating the superior data reconstruction capabilities of the proposed method. Figure 16b displays the losses of $\mathrm{CFAN}_{2}$ and the second network of the federated VAE. The losses of the two methods converge to similar values, which illustrates that both second-stage networks can be adjusted using the target domain data.

The ROC (receiver operating characteristic) curves of the three methods are shown in Figure 17. The AUC (area under the curve) of the proposed method is 0.953, while the AUC values of the traditional VAE and the federated VAE are 0.826 and 0.907, respectively. The proposed method has the largest area under the curve, indicating superior performance in terms of FD.
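For readers unfamiliar with the AUC, it equals the probability that a randomly chosen fault sample receives a higher detection score than a randomly chosen normal sample, which is the same quantity as the area under the ROC curve obtained by sweeping the threshold. The sketch below computes it directly from this pairwise definition. The scores are invented for illustration and are unrelated to the values reported above, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(scores_normal, scores_fault):
    """AUC as the fraction of (fault, normal) pairs ranked correctly.

    A pair counts 1 if the fault sample scores higher than the normal
    sample and 0.5 if the two scores are tied.
    """
    wins = 0.0
    for f in scores_fault:
        for n in scores_normal:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5
    return wins / (len(scores_fault) * len(scores_normal))

# Toy detection scores (e.g., RMS of the residual) for each class.
scores_normal = [0.1, 0.2, 0.3, 0.4]
scores_fault = [0.35, 0.8, 0.9]
auc = roc_auc(scores_normal, scores_fault)
```

In this toy case 11 of the 12 pairs are ordered correctly, so the AUC is 11/12. The pairwise loop is quadratic in the number of samples, so a rank-based computation would be preferable for datasets of the size used in the experiments.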

Generally, to ensure the security of the system, the TCS typically works in normal states. As a result, faults occur far less often than healthy instances [21]. The proposed unsupervised method learns normal patterns only from fault-free data, which is a feasible solution to the problem of imbalanced data. Therefore, unsupervised learning improves robustness without the cost of labeling. The FD method is not limited to the TCS of trains and may also be applicable to transient FD in other electrical systems.

5. Conclusions

In this work, we present a transient fault detection method under dynamic operation conditions. For latent variable extraction and data reconstruction, a CFAN is established from an invertible flow model and two discriminative networks, and a corresponding loss function is designed. Moreover, adversarial training and bidirectional optimization enhance the reconstruction quality and suppress interference caused by background noise.

Then, an unsupervised transfer learning strategy based on federated CFANs is proposed for transient fault detection under various operation conditions, which is divided into two stages. Initially, the first CFAN model is trained using the normal data in steady operation conditions. Subsequently, the second CFAN calibrates the changed information caused by varied operation conditions utilizing only a few samples. The federated CFANs can jointly learn latent knowledge in steady states and be applied to transient fault detection in various operation conditions.

The effectiveness of the method is verified through comparative experiments with other data-driven fault detection methods.

Several directions are available for future work. The first is to further develop fault diagnosis technology in order to locate faulty components. In addition, the FD method employed in this work is based on the CRH2 type, and data related to high-speed trains with different topological structures have not been explored. Such out-of-distribution (OOD) data, as mentioned in [41], may negatively impact FD performance. Future work will therefore develop fault diagnosis methods for multiple types of high-speed trains.

Author Contributions

Methodology, X.Y.; software, L.C.; validation, Q.F.; writing—original draft preparation, Y.Y.; funding acquisition, S.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (62373260), Shenzhen Science and Technology Program (20231127173014002), Jiangmen Basic and Theoretical Science Research Project (2023JC01021).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available because of the data confidentiality restrictions of CRRC Corporation. Requests to access the datasets should be directed to CRRC Corporation.

Acknowledgments

The authors would like to acknowledge the funding support from Xiaoyue Yang and Sen Xie. We also thank CRRC Guangdong Railway Vehicles Co., Ltd. for the experimental validation and data support.

Conflicts of Interest

Author Qidong Feng was employed by the company CRRC Guangdong Railway Vehicles Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Chen, H.; Jiang, B.; Ding, S.X.; Huang, B. Data-driven fault diagnosis for traction systems in high-speed trains: A survey, challenges, and perspectives. IEEE Trans. Intell. Transp. Syst. 2020, 23, 1700–1716.
2. Yu, M.; Liu, J.; Liu, D.; Chen, H.; Zhang, J. Investigation of aerodynamic effects on the high-speed train exposed to longitudinal and lateral wind velocities. J. Fluids Struct. 2016, 61, 347–361.
3. Cheng, C.; Wang, J.; Chen, H.; Chen, Z.; Luo, H.; Xie, P. A review of intelligent fault diagnosis for high-speed trains: Qualitative approaches. Entropy 2020, 23, 1.
4. Chen, H.; Jiang, B. A review of fault detection and diagnosis for the traction system in high-speed trains. IEEE Trans. Intell. Transp. Syst. 2019, 21, 450–465.
5. Feng, J.; Xu, J.; Liao, W.; Liu, Y. Review on the traction system sensor technology of a rail transit train. Sensors 2017, 17, 1356.
6. Kaffash, S.; Nguyen, A.T.; Zhu, J. Big data algorithms and applications in intelligent transportation system: A review and bibliometric analysis. Int. J. Prod. Econ. 2021, 231, 107868.
7. Zhong, K.; Wang, J.; Xu, S.; Cheng, C.; Chen, H. Overview of fault prognosis for traction systems in high-speed trains: A deep learning perspective. Eng. Appl. Artif. Intell. 2023, 126, 106845.
8. Liu, S.; He, J.; Chen, Z.; Chen, D.; Chen, Y. Discriminative Stacked Auto-encoder: Feature-Integration Boosting for Bearing Fault Diagnosis. IEEE Sens. J. 2023, 23, 27549–27558.
9. Wu, Y.; Jin, W.; Li, Y.; Sun, Z.; Ren, J. Detecting unexpected faults of high-speed train bogie based on bayesian deep learning. IEEE Trans. Veh. Technol. 2020, 70, 158–172.
10. Zhang, M.; Li, X.; Xiang, Z.; Mo, J.; Xu, S. Diagnosis of brake friction faults in high-speed trains based on 1DCNN and GraphSAGE under data imbalance. Measurement 2023, 207, 112378.
11. Cheng, C.; Liu, M.; Chen, H.; Xie, P.; Zhou, Y. Slow feature analysis-aided detection and diagnosis of incipient faults for running gear systems of high-speed trains. ISA Trans. 2022, 125, 415–425.
12. Viera, R.; Dutertre, J.-M.; Flottes, M.-L.; Potin, O.; Di Natale, G.; Rouzeyre, B.; Bastos, R.P. Assessing body built-in current sensors for detection of multiple transient faults. Microelectron. Reliab. 2018, 88, 128–134.
13. de Paiva Leite, T.F.; Fesquet, L.; Bastos, R.P. A body built-in cell for detecting transient faults and dynamically biasing subcircuits of integrated systems. Microelectron. Reliab. 2018, 88, 122–127.
14. Tunali, O.; Altun, M. Permanent and transient fault tolerance for reconfigurable nano-crossbar arrays. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2016, 36, 747–760.
15. Zhou, C.; Huang, X.; Naixue, X.; Qin, Y.; Huang, S. A class of general transient faults propagation analysis for networked control systems. IEEE Trans. Syst. Man Cybern. Syst. 2015, 45, 647–661.
16. Cai, B.; Liu, Y.; Xie, M. A dynamic-Bayesian-network-based fault diagnosis methodology considering transient and intermittent faults. IEEE Trans. Autom. Sci. Eng. 2016, 14, 276–285.
17. Tong, H.; Qiu, R.C.; Zhang, D.; Yang, H.; Ding, Q.; Shi, X. Detection and classification of transmission line transient faults based on graph convolutional neural network. CSEE J. Power Energy Syst. 2021, 7, 456–471.
18. Yang, X.; Yang, C.; Yang, C.; Peng, T.; Chen, Z.; Wu, Z.; Gui, W. Transient fault diagnosis for traction control system based on optimal fractional-order method. ISA Trans. 2020, 102, 365–375.
19. Ding, S.X. Data-driven design of monitoring and diagnosis systems for dynamic processes: A review of subspace technique based schemes and some recent results. J. Process Control 2014, 24, 431–449.
20. Yang, B.; Lei, Y.; Jia, F.; Xing, S. An intelligent fault diagnosis approach based on transfer learning from laboratory bearings to locomotive bearings. Mech. Syst. Signal Process. 2019, 122, 692–706.
21. Li, W.; Huang, R.; Li, J.; Liao, Y.; Chen, Z.; He, G.; Yan, R.; Gryllias, K. A perspective survey on deep transfer learning for fault diagnosis in industrial scenarios: Theories, applications and challenges. Mech. Syst. Signal Process. 2022, 167, 108487.
22. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
23. Cheng, C.; Li, X.; Xie, P.; Yang, X. Transfer Learning-aided Fault Detection for Traction Drive Systems of High-Speed Trains. IEEE Trans. Artif. Intell. 2022, 4, 689–697.
24. Chen, S.; Ge, H.; Li, H.; Sun, Y.; Qian, X. Hierarchical deep convolution neural networks based on transfer learning for transformer rectifier unit fault diagnosis. Measurement 2021, 167, 108257.
25. Xia, Y.; Xu, Y. A transferrable data-driven method for IGBT open-circuit fault diagnosis in three-phase inverters. IEEE Trans. Power Electron. 2021, 36, 13478–13488.
26. Wang, T.; Zhang, C.; Hao, Z.; Monti, A.; Ponci, F. Data-driven fault detection and isolation in DC microgrids without prior fault data: A transfer learning approach. Appl. Energy 2023, 336, 120708.
27. Shang, B.; Luo, G.; Li, M.; Liu, Y.; Hei, J. Transfer learning-based fault location with small datasets in VSC-HVDC. Int. J. Electr. Power Energy Syst. 2023, 151, 109131.
28. Kobyzev, I.; Prince, S.J.; Brubaker, M.A. Normalizing flows: An introduction and review of current methods. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3964–3979.
29. Bond-Taylor, S.; Leach, A.; Long, Y.; Willcocks, C.G. Deep generative modelling: A comparative review of vaes, gans, normalizing flows, energy-based and autoregressive models. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 40, 7327–7347.
30. Dinh, L.; Krueger, D.; Bengio, Y. Nice: Non-linear independent components estimation. arXiv 2014, arXiv:1410.8516.
31. Horvat, C.; Pfister, J.P. Denoising normalizing flow. Adv. Neural Inf. Process. Syst. 2021, 34, 9099–9111.
32. Gudovskiy, D.; Ishizaka, S.; Kozuka, K. Cflow-ad: Real-time unsupervised anomaly detection with localization via conditional normalizing flows. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 98–107.
33. Kingma, D.P.; Dhariwal, P. Glow: Generative flow with invertible 1x1 convolutions. Adv. Neural Inf. Process. Syst. 2018, 31. arXiv:1807.03039.
34. Papamakarios, G.; Nalisnick, E.; Rezende, D.J.; Mohamed, S.; Lakshminarayanan, B. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 2021, 22, 2617–2680.
35. Grover, A.; Dhar, M.; Ermon, S. Flow-gan: Combining maximum likelihood and adversarial learning in generative models. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; p. 32, arXiv:1705.08868.
36. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
37. Cheng, C.; Qiao, X.; Teng, W.; Gao, M.; Zhang, B.; Yin, X.; Luo, H. Principal component analysis and belief-rule-base aided health monitoring method for running gears of high-speed train. Sci. China Inf. Sci. 2020, 63, 199202.
38. Yang, X.; Qiao, X.; Cheng, C.; Zhong, K.; Chen, H. A Tutorial on Hardware-Implemented Fault Injection and Online Fault Diagnosis for High-Speed Trains. Sensors 2021, 21, 5957.
39. Yang, C.; Yang, C.; Peng, T.; Yang, X.; Gui, W. A Fault-Injection Strategy for Traction Drive Control Systems. IEEE Trans. Ind. Electron. 2017, 64, 5719–5727.
40. Yang, X.; Yang, C.; Peng, T.; Chen, Z.; Liu, B.; Gui, W. Hardware-in-the-loop fault injection for traction control system. IEEE J. Emerg. Sel. Top. Power Electron. 2018, 6, 696–706.
41. Kirichenko, P.; Izmailov, P.; Wilson, A.G. Why normalizing flows fail to detect out-of-distribution data. Adv. Neural Inf. Process. Syst. 2020, 33, 20578–20589.

Figure 1. Circuit topology of TCS in high-speed trains.

Figure 2. Change in variables.

Figure 3. The framework of the CFAN model.

Figure 4. The structure of a coupling layer.

Figure 5. Coupling layer stacking.

Figure 6. The federated CFAN-based TL strategy.

Figure 7. The structure of federated CFAN-based transfer learning.

Figure 8. The overall flowchart of the proposed FD method.

Figure 9. The onboard TCS in high-speed trains. (a) Traction control unit. (b) Main circuit of TCS.

Figure 10. The current waveform and FD results for $F_{1}$: (a) the traction motor a-phase current waveform $i_{sa}$ of the $F_{1}$ fault; (b) the FD result obtained through the proposed method; (c) the FD result obtained through the traditional VAE; (d) the FD result obtained through the federated VAE.

Figure 11. The current waveform and FD results for $F_{2}$: (a) the traction motor a-phase current waveform $i_{sa}$ of the $F_{2}$ fault; (b) the FD result obtained through the proposed method; (c) the FD result obtained through the traditional VAE; (d) the FD result obtained through the federated VAE.

Figure 12. The current waveform and FD results for $F_{3}$: (a) the traction motor a-phase current waveform $i_{sa}$ of the $F_{3}$ fault; (b) the FD result obtained through the proposed method; (c) the FD result obtained through the traditional VAE; (d) the FD result obtained through the federated VAE.

Figure 13. The current waveform and FD results for $F_{4}$: (a) the traction motor a-phase current waveform $i_{sa}$ of the $F_{4}$ fault; (b) the FD result obtained through the proposed method; (c) the FD result obtained through the traditional VAE; (d) the FD result obtained through the federated VAE.

Figure 14. The comparison of results among different methods: (a) the FDR of four types of transient faults; (b) the FAR, recall, and ACR of three methods.

Figure 15. The comparison results among different methods in steady operation conditions: (a) the FDRs of three methods; (b) the FAR, recall, and ACR of three methods.

Figure 16. Loss of three methods. (a) Comparison of three methods for first network loss; (b) second network loss comparison of proposed method and federated VAE.

Figure 17. ROC-AUC of three methods.

Table 1. Specifications of experimental system under normal case.

Parameter	Parameter Description	Value
$R_{s}$	Stator’s resistance	0.114 Ω
$R_{r}$	Rotor’s resistance	0.146 Ω
$L_{m}$	Magnetizing inductance	32.747 H
$P_{r}$	Rated power of traction motor	300 kW
$n_{p}$	Motor pole pairs	2
$u_{d}$	Voltage of DC link	1500 V, 2600 V
$R_{n}$	Leakage resistance on line side	0.2 Ω
$L_{n}$	Leakage inductance on line side	0.002 H
$C_{d1}, C_{d2}$	The filter capacitors of DC link	0.016 F
$R_{d1}, R_{d2}$	The filter resistances of DC link	6000 Ω
$i_{u,v,w}$	Three-phase currents	±103 A
$V_{max}$	Speed	196 km/h

Table 2. Configuration of federated CFAN models.

Parameter Setting	Structure of $s(\cdot)$	Structure of $t(\cdot)$	Structure of $D(\cdot)$
The $\mathrm{CFAN}_{1}$ model	(100,100)	(100,100)	(200,100,50,1)
The $\mathrm{CFAN}_{2}$ model	(100,100)	(100,100)	(200,100,50,1)
Initial Learning rates	0.01	0.01	0.01
Activation functions	Relu, Sigmoid	Relu, Sigmoid	Relu, Relu, Relu, Sigmoid

Table 3. Configuration of methods for comparison.

Models for Comparison	Parameter Setting	Structure of Encoder	Structure of Decoder
The traditional VAE	Model	(200,50,10)	(10,50,200)
	Initial learning rates	0.01	0.01
	Activation functions	Relu	Relu
The federated VAE	First model	(200,50,10)	(10,50,200)
	Second model	(200,50,10)	(10,50,200)
	Initial learning rates	0.01	0.01
	Activation functions	Relu	Relu

Table 4. Detection results of different methods.

Faults	Methods	FDR
$F_{1}$	The proposed method	94.3%
	The traditional VAE	88.7%
	The federated VAE	93.1%
$F_{2}$	The proposed method	95.2%
	The traditional VAE	89.7%
	The federated VAE	91.0%
$F_{3}$	The proposed method	100%
	The traditional VAE	100%
	The federated VAE	100%
$F_{4}$	The proposed method	92.7%
	The traditional VAE	86.2%
	The federated VAE	88.4%

Table 5. Performance comparison of different methods.

Methods	FAR	Recall	ACR	Average FD Delay
The proposed method	8.1%	95.5%	92.5%	0.00133 s
The traditional VAE	36.5%	91.2%	68.1%	0.00142 s
The federated VAE	12.3%	93.1%	88.6%	0.00136 s

Table 6. Detection results of different methods in steady operation conditions.

Faults	Methods	FDR
$F_{1}$	The proposed method	94.2%
	The traditional VAE	93.5%
	The federated VAE	93.7%
$F_{2}$	The proposed method	96.2%
	The traditional VAE	93.2%
	The federated VAE	91.5%
$F_{3}$	The proposed method	100%
	The traditional VAE	100%
	The federated VAE	100%
$F_{4}$	The proposed method	93.5%
	The traditional VAE	89.2%
	The federated VAE	91.6%

Table 7. Performance comparison of different methods in steady operation conditions.

Methods	FAR	Recall	ACR	Average FD Delay
The proposed method	6.8%	96.0%	94.1%	0.00131 s
The traditional VAE	32.4%	94.0%	75.1%	0.00135 s
The federated VAE	8.5%	94.2%	92.3%	0.00132 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
