CN115080654B

CN115080654B - A projection-based visualization method for abnormal component inspection

Info

Publication number: CN115080654B
Application number: CN202210599858.9A
Authority: CN
Inventors: 朱敏; 杨啸; 高雯雯; 李长林
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2022-05-30
Filing date: 2022-05-30
Publication date: 2025-03-21
Anticipated expiration: 2042-05-30
Also published as: CN115080654A

Abstract

The present invention discloses a projection-based visualization method for abnormal component inspection. First, aviation gas turbofan engine data are preprocessed through key steps such as clustering, dimensionality reduction, and anomaly detection; then the data attributes are visually mapped to channels such as position, transparency, and color. A component projection view uses scatter plots and contour plots to present an overview of the distribution of the training data, and a special glyph is designed for abnormal component instances in this view so that similar patterns in the abnormal data can be found. A data inspection view encodes, in its upper part, the time sequence of operating conditions under which a component works, using five colors and the lines connecting them, and shows the actual values of the monitored parameters as area charts in its lower part. The invention helps predictive maintenance personnel discover anomalies in the data intuitively and effectively and analyze their causes and patterns, thereby guiding the deployment of subsequent components and extending their service life.

Description

Projection-based abnormal component inspection visualization method

Technical Field

The invention relates to the technical field of information visualization and visual analysis, in particular to a projection-based abnormal component inspection visualization method.

Background

With the advent of Industry 4.0, modern industrial systems are trending toward large-scale, automated, and complex designs. As system complexity increases, the failure of key equipment can cause huge economic losses and even casualties. Taking aircraft as an example, the working environment is uncertain, so an engine may exhibit problems such as deviations in exhaust gas temperature or rotor speed, or surge; that is, its subsystems or components enter abnormal states. Finding abnormal components in massive data and analyzing their abnormal patterns and causes is a central concern of predictive maintenance personnel.

In existing aviation systems, abnormal component inspection methods often run independently as black boxes integrated into an alarm system. Analysts obtain only the model's output and find it difficult to carry out anomaly discovery and attribution analysis on instance-level data. In artificial intelligence and data mining, many anomaly detection methods have been proposed in recent years. However, they focus on improving model accuracy and do not explain the causes of anomalies from the perspective of the data. Statistical analysis methods, for their part, suffer from poorly interpretable results and complex operation.

Information visualization and visual analytics offer predictive maintenance personnel a new solution. Analysts can explore the rules and patterns behind the data through the intuitiveness of visual presentation and flexible interaction between views. Information visualization converts data into perceivable graphics, symbols, colors, and the like, which improves the efficiency of data recognition and conveys useful information. Visual analytics, in turn, combines human and machine intelligence. An interactive visual interface brings human knowledge and experience into the analysis workflow, helping analysts complete data analysis, reasoning, and decision-making comprehensively and intuitively.

Visual analytics has proved effective in supporting data insight for anomaly discovery. In abnormal component identification specifically, existing visual analytics work can identify abnormal data and raise alarms in time. However, it considers only the comparison between abnormal and normal data and does not explore or attribute similar patterns within the abnormal data. Such similar patterns are valuable for diagnosing similar faults, screening abnormal components, and building high-quality datasets.

Disclosure of Invention

In view of the above problems, the present invention provides a projection-based abnormal component inspection visualization method. The method helps predictive maintenance personnel quickly find abnormal components in the data. It also supports the discovery of abnormal patterns and the attribution of anomalies for individual component instances, and it can guide the deployment of subsequent components to extend their service life. The specific technical scheme is as follows:

A projection-based abnormal component inspection visualization method comprises the following steps:

S1, data acquisition and processing

After the aviation gas turbofan engine data are obtained, they are cleaned and processed into structured data that can be used directly in the visual views;

S2, visual mapping

The data obtained in step S1 are visually mapped through visual channels:

a component projection view is designed that uses a scatter plot and a contour plot to present an overview of the data distribution, in which the proximity of points represents the similarity of component degradation patterns;

the actual data of the monitored variables are displayed as area charts stacked vertically so that analysts can discover relationships among the variables;

S3, visual layout and implementation

The visual modules mapped in step S2 are laid out and implemented:

in the component projection view, each component is mapped to a point in the scatter plot and the relative positions between points are preserved; each glyph consists of an improved radar chart in the center, which records the operating conditions of the component, surrounded by several superimposed rings;

in the data inspection view, a timeline layout is adopted: a dot-line chart at the top of the view shows the sequence of operating conditions in the current time period, and a timeline module of area charts is placed at the bottom of the view;

S4, interactive design

In the scatter plot view, hovering the mouse over a data point highlights it. The user can zoom in to inspect a local area of the scatter plot in detail. Drawing a closed irregular shape (a lasso) over the scatter points replaces the enclosed points with glyphs for in-depth exploration of the data. A slider controls the time-slice length encoded by each ring of the glyph.

Further, in step S1, the data processing includes:

(1) Operating condition identification:

The operating states are divided according to three operating settings, namely altitude, Mach number, and sea-level temperature, and the operating conditions are clustered with k-means;

(2) Data dimension reduction:

Each component is characterized by multidimensional time-series data:

$$X^k = [t_1, t_2, \ldots, t_i, \ldots, t_T], \quad k = 1, 2, \ldots, n, \quad t_i \in \mathbb{R}^M$$

where n is the number of components, M is the number of selected features, T is the life-cycle length, t_i is the vector of feature values at time i, and k is the component index;

The data are aggregated with a sliding time window to obtain:

where w is the size of the time window and x_{w_i} is the aggregated value of each feature in window w_i;

The resulting vectors x_{w_i} are projected into one-dimensional space with t-distributed stochastic neighbor embedding (t-SNE), so that each component is represented as:

$$X^k \in \mathbb{R}^{T-w+1}, \quad k = 1, 2, \ldots, n$$

The one-dimensional degradation curves obtained in this way are then reduced to two-dimensional space, again with t-SNE, so that X^k is expressed as:

$$X^k \in \mathbb{R}^2, \quad k = 1, 2, \ldots, n$$

where n is the number of components;

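(3) Outlier calculation:

Anomalies are defined using the Pearson correlation coefficient. For the two time series of the s-th monitored variable of the components numbered k_i and k_j, the correlation is defined as follows:

$$r_s(k_i, k_j) = \frac{\sum_{t=1}^{TS} \left(x_{s,t}^{k_i} - \bar{x}_s^{k_i}\right)\left(x_{s,t}^{k_j} - \bar{x}_s^{k_j}\right)}{\sqrt{\sum_{t=1}^{TS} \left(x_{s,t}^{k_i} - \bar{x}_s^{k_i}\right)^2}\,\sqrt{\sum_{t=1}^{TS} \left(x_{s,t}^{k_j} - \bar{x}_s^{k_j}\right)^2}}$$

where x_{s,t}^{k_i} is the value of the s-th monitored variable of component k_i at time point t; x_{s,t}^{k_j} is the value of the s-th monitored variable of component k_j at time point t; TS is the length of the current time interval; and \bar{x}_s^{k_i} and \bar{x}_s^{k_j} are the average values of the respective series over the time interval TS;

The outlier is defined as:

where n is the number of components.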

Further, in step S2, the visual mapping specifically comprises:

S21, shape, position, and color are mapped for the scatter plot in the component projection view, where the position of each point encodes the dimension-reduction result of its component;

S22, color and area are mapped for the contour plot in the component projection view: the blank regions between contour lines are filled with a specific color whose depth encodes the density of components projected into the local area, and the area of the petals encodes the number of behaviors;

S23, shape, size, and color are mapped for the improved radar chart of the special glyph in the component projection view, where the colors of the points encode the different operating condition categories;

S24, position, arc, and color are mapped for the outer rings of the special glyph in the component projection view, where different ring positions encode different monitoring indicators;

S25, color and position are mapped in the data inspection view: six symbols encode six operating conditions, the lines connecting dots of different colors represent the temporal order in which the component works under different operating conditions, and highly correlated monitored variables are joined on the left side of the area charts by cubic Bezier curves.

Further, the visual layout scheme of the component projection view specifically includes:

S3a, the scatter plot and contour lines are mapped onto the screen according to the screen width and height allocated by the system, and the distribution density of components in different regions is encoded by the density and transparency changes of the contour lines, completing the basic layout of the view;

S3b, for the component glyph view, the range of each time slice and the monitored variables to be displayed are selected in a control panel, and the glyph size is calculated from the number of monitored variables;

S3c, once the size is determined, the glyphs are spread apart by a collision detection algorithm based on a force-directed layout; this resolves the visual occlusion caused by data that are dense in local areas and sparse elsewhere;

S3d, the number of runs of each component under the six operating conditions is counted, and the distance from each point of the improved radar chart to the center is determined by a linear scale;

S3e, for the superimposed ring chart, the rings are drawn from the inside outward in the order in which the monitored variables were selected, each ring is cut into fragments according to the time-slice length, and each fragment is drawn according to the abnormal value of its time slice.

Further, in S3a, encoding the distribution density of components in different regions by the density and transparency changes of the contour lines specifically includes:

S3a1: two-dimensional scalar field computation

1a) For a given set of data points (x_1, x_2, …, x_n; y_1, y_2, …, y_n), determine the minimum and maximum observations, i.e., (x_1, y_1) and (x_n, y_n), where n is the number of components;

1b) Estimate the grid bin width h to be used for the two-dimensional grid division. This gives the grid boundary points (a_0, a_1, …, a_Ve) and (b_0, b_1, …, b_He) of the data, where a_{ve+1} - a_{ve} = h, b_{he+1} - b_{he} = h, ve = 0, 1, …;

The grid bin width h is calculated as follows:

where IQR(x) is the difference between the upper and lower quartiles of the sample and n is the number of components;

1c) Count the number of scatter points in each grid cell, i.e., (g_{0,0}, g_{0,1}, …, g_{Ve,He});

S3a2: contour calculation

2a) Determine an initial threshold and compare the value of each grid cell with it. Cells above the threshold are marked 1 and all others 0, giving a binary image;

2b) Construct a binary index from the four corner values of each cell: scanning the corners clockwise from the top-left corner (highest bit) to the lowest bit, a four-bit index is generated with bitwise OR operations;

2c) Use the cell index to access a prebuilt lookup table that lists the edges needed to represent the cell;

2d) Apply linear interpolation between the raw data values to find the exact position of the contour along the cell edges;

S3a3: scale mapping and color mapping

The contour fill color follows a color curve that transitions smoothly from white to dark, and the system represents area density as a value from 0 to 1.

Further, in step S3c, the glyphs are spread apart as follows:

S3c1, obtain the maximum and minimum x and y coordinates of all data selected with the lasso. Map the difference between the maximum and minimum to the width and height of the canvas, recompute the canvas coordinates of the data with a linear scale, and reproject the data points onto a new canvas;

S3c2, treat the data points on the canvas as point particles with the same mass and radius, and set them in motion by applying different force models;

S3c3, add a collision force to each particle and use Verlet integration to compute the position and velocity of each particle after a time Δt;

S3c4, iterate repeatedly until the particles separate from one another and stabilize. The positions at which their velocity reaches 0 are the final positions of the data points.
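Steps S3c1 to S3c4 can be illustrated with a small particle simulation. The sketch below is not the patented algorithm, and several parts of it are assumed: the force model (a linear repulsion proportional to the overlap of two glyphs), unit mass, a damping factor that lets the motion die out, and all parameter defaults. The function names are hypothetical.

```python
import math


def rescale_to_canvas(points, width, height):
    """S3c1: stretch the lasso-selected points linearly over the whole canvas."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    dx = (max(xs) - min(xs)) or 1.0  # guard against a degenerate selection
    dy = (max(ys) - min(ys)) or 1.0
    return [((x - min(xs)) / dx * width, (y - min(ys)) / dy * height)
            for x, y in points]


def collide_verlet(points, radius, dt=1.0, stiffness=0.25, damping=0.5,
                   max_iter=1000, tol=1e-3):
    """S3c2-S3c4: equal-mass, equal-radius particles pushed apart by a pairwise
    collision force and integrated with damped position Verlet until they stop.

    ASSUMPTIONS: linear overlap repulsion, unit mass, and velocity damping.
    """
    pos = [list(p) for p in points]
    prev = [list(p) for p in points]  # prev == pos: particles start at rest
    min_dist = 2 * radius
    for _ in range(max_iter):
        # S3c3: accumulate collision forces between overlapping particles.
        force = [[0.0, 0.0] for _ in pos]
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy)
                if d >= min_dist:
                    continue
                # Unit direction from i to j; coincident points split along x.
                ux, uy = (dx / d, dy / d) if d > 0 else (1.0, 0.0)
                f = stiffness * (min_dist - d)
                force[i][0] -= f * ux
                force[i][1] -= f * uy
                force[j][0] += f * ux
                force[j][1] += f * uy
        # Damped Verlet step (unit mass): x' = x + damping * (x - x_prev) + F * dt^2
        max_step = 0.0
        for p, q, f in zip(pos, prev, force):
            nx = p[0] + damping * (p[0] - q[0]) + f[0] * dt * dt
            ny = p[1] + damping * (p[1] - q[1]) + f[1] * dt * dt
            max_step = max(max_step, math.hypot(nx - p[0], ny - p[1]))
            q[0], q[1] = p[0], p[1]
            p[0], p[1] = nx, ny
        # S3c4: stop once every particle's speed is (close to) zero.
        if max_step / dt < tol:
            break
    return [tuple(p) for p in pos]
```

A typical call chains the two steps: `collide_verlet(rescale_to_canvas(selected, w, h), glyph_radius)`, where `glyph_radius` would come from the glyph size computed in S3b.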

Further, S3d specifically comprises:

S3d1, map the run count under each operating condition to the distance cRadius from the center of the radar chart along the corresponding axis, calculated as follows:

where the quantities involved are the maximum number of runs under operating condition o; sum_o, the number of runs of the current component under operating condition o; radarRadius, the radius of the radar chart; and O, the total number of operating condition categories;

S3d2, compute the X-axis coordinate xpos_o and the Y-axis coordinate ypos_o of each point from its distance to the center, as follows:

S3d3, the size of the center circle encodes the life-cycle length of the component, and the size of the dots on the radar axes encodes the run count under each operating condition, consistent with their distance from the center. This secondary encoding is calculated as follows:

where rul_min is the shortest component life cycle, rul_max is the longest component life cycle, r is the maximum radius of the central circle, R is the maximum radius of the circles on the coordinate axes, rulRadius is the radius of the central circle, and dotArea_o is the distance from the point representing operating condition o to the center;

S3d4, map the abnormal value of each monitored variable's subsequence to the length of an arc. The angle subtended by the arc is calculated as follows:

where RULLen is the length of the component's degradation life cycle, and the remaining term represents the length of the s-th monitored variable of component k within the time interval TS;

S3d5, the distance from the ring representing the s-th monitored variable to the center is:

$$\mathit{rRadius}_s = \mathit{rDis} + (\mathit{rGap} + \mathit{rBandWidth}) \cdot (s - 1), \quad s = 1, 2, 3, \ldots, M$$

where M is the number of selected features, rDis is the inner ring radius, rGap is the spacing between rings, and rBandWidth is the width of each ring.
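Several formulas in S3d1 to S3d4 did not survive extraction, and only the ring radius in S3d5 is stated explicitly. The sketch below implements S3d5 as written. It fills the other steps with linear mappings that are assumptions, each marked in the comments: evenly spaced radar axes, a cRadius proportional to sum_o over the maximum run count, a linear life-cycle-to-radius scale, and arc angles proportional to slice length over RULLen. All function names are hypothetical.

```python
import math


def radar_points(counts, max_counts, radar_radius):
    """S3d1-S3d2: one vertex (xpos_o, ypos_o) per operating condition.

    ASSUMPTIONS: cRadius_o = radarRadius * sum_o / max_o (linear scale), and the
    O axes are evenly spaced starting from angle 0.
    """
    O = len(counts)
    pts = []
    for o, (sum_o, max_o) in enumerate(zip(counts, max_counts)):
        c_radius = radar_radius * sum_o / max_o if max_o else 0.0
        theta = 2 * math.pi * o / O
        pts.append((c_radius * math.cos(theta), c_radius * math.sin(theta)))
    return pts


def center_radius(rul, rul_min, rul_max, r):
    """S3d3 (ASSUMED linear): life-cycle length -> central circle radius in [0, r]."""
    if rul_max == rul_min:
        return r
    return r * (rul - rul_min) / (rul_max - rul_min)


def ring_radius(s, r_dis, r_gap, r_band_width):
    """S3d5 as stated: rRadius_s = rDis + (rGap + rBandWidth) * (s - 1), s = 1..M."""
    return r_dis + (r_gap + r_band_width) * (s - 1)


def ring_fragments(slice_lengths, rul_len):
    """S3d4/S3e (ASSUMED): a slice of length L spans 2*pi*L/RULLen radians.

    Returns consecutive (start_angle, end_angle) pairs. Each fragment would then be
    coloured by the abnormal value of its time slice.
    """
    arcs, start = [], 0.0
    for length in slice_lengths:
        end = start + 2 * math.pi * length / rul_len
        arcs.append((start, end))
        start = end
    return arcs
```

Connecting the `radar_points` output end to end gives the irregular hexagon of the InnerArea. `ring_radius` and `ring_fragments` together place the OuterArea rings from the inside outward.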

Further, in step S3, the data inspection view is laid out and implemented as follows:

S31, determine the displayed life-cycle range and design the area charts. Each pair of monitored variables that the calculation identifies as highly correlated is connected with a cubic Bezier curve;

S32, determine the color for each operating condition and the display form of the dots, including their layout positions and how the connecting lines are drawn. Then lay out and implement the dot-line chart.

The beneficial effects of the invention are as follows:

1) The invention overcomes the shortcomings of existing methods in analyzing the abnormal data patterns of components. Traditional alarm systems often cannot reveal the specific causes of data anomalies. With this visualization method, abnormal data can be found intuitively through dimension-reduction projection, and a special glyph is designed to locate the causes of anomalies in an abnormal data subset. Juxtaposing these glyphs displays abnormal patterns clearly and intuitively.

2) The invention overcomes the limitations of traditional systems, which only allow inspection of raw data, impose a heavy cognitive load, and make anomalies hard to locate. The method subdivides component operating conditions into six types. Users can therefore directly inspect the sequence of operating conditions under which a component worked and explain component anomalies caused by abnormal condition sequences. In addition, the cubic Bezier links help users find highly correlated monitoring indicators, which makes anomalies easier to locate.

Drawings

FIG. 1 is a schematic view of the overall flow chart of the projection-based abnormal component inspection visualization method of the present invention.

FIG. 2 is a schematic diagram of a visual coding and visualization layout of a projection view of a component in the present invention.

FIG. 3 is a schematic diagram of the visual encoding and visual layout of the contour map in the present invention.

Fig. 4 is a schematic diagram of visual coding and visual layout of glyphs in the present invention.

Fig. 5 is a schematic diagram showing the implementation effect and overall layout of the data inspection view in the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and specific data.

The invention combines effective information visualization with a multi-view linkage strategy and flexible interaction. This enables multi-angle analysis of anomalies in aviation gas turbofan engine data and helps predictive maintenance personnel find anomalies in the data and analyze their patterns and causes. The technical scheme comprises data acquisition and processing, visual mapping, visual layout and implementation, and interaction design. The specific steps are as follows:

Step one, data acquisition and processing

Useful information is screened from the aviation gas turbofan engine data provided by NASA (the U.S. National Aeronautics and Space Administration) and then stored.

1. Data acquisition: the FD004 subset of the data is used. It contains run-to-failure data for 249 engines, from initial operation to complete failure. The longest engine life cycle is 543 and the shortest is 128. Each record has 26 dimensions. Dimensions 1-2 are the engine number and the time point. Dimensions 3-5 are the operating conditions, namely altitude, Mach number, and sea-level temperature. Dimensions 6-26 are the engine monitoring indicators: fan inlet total temperature, low-pressure compressor outlet temperature, high-pressure compressor outlet temperature, low-pressure turbine outlet temperature, fan inlet pressure, bypass-duct pressure, high-pressure compressor outlet pressure, physical fan speed, physical core speed, engine pressure ratio, high-pressure compressor outlet static pressure, ratio of fuel flow to that static pressure, corrected fan speed, corrected core speed, bypass ratio, burner fuel-air ratio, bleed enthalpy, demanded fan speed, demanded corrected fan speed, high-pressure turbine coolant bleed, and low-pressure turbine coolant bleed.
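The column layout above can be read with a short parser. This sketch assumes the raw file is whitespace-separated with one row per engine cycle, as in the public C-MAPSS text files. The column labels and function names are hypothetical and are not part of the dataset.

```python
# Hypothetical column labels for the 26 FD004 columns described above:
# 2 identifiers, 3 operating conditions and 21 monitoring indicators.
ID_COLUMNS = ["unit", "cycle"]
CONDITION_COLUMNS = ["altitude", "mach", "sea_level_temp"]
SENSOR_COLUMNS = [f"sensor_{i}" for i in range(1, 22)]
COLUMNS = ID_COLUMNS + CONDITION_COLUMNS + SENSOR_COLUMNS


def parse_row(line: str) -> dict:
    """Split one whitespace-separated FD004 row into named numeric fields."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    row = {name: float(v) for name, v in zip(COLUMNS, values)}
    # Engine number and time point (cycle) are integers.
    row["unit"] = int(row["unit"])
    row["cycle"] = int(row["cycle"])
    return row


def group_by_unit(lines):
    """Collect rows per engine, sorted by cycle: one time series per component."""
    engines = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = parse_row(line)
        engines.setdefault(row["unit"], []).append(row)
    for rows in engines.values():
        rows.sort(key=lambda r: r["cycle"])
    return engines
```

Passing an open file handle to `group_by_unit` yields one ordered series per engine. The series length equals that engine's life cycle, which for FD004 falls between 128 and 543.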

2. Data processing comprises operating condition identification and data dimension reduction.

(1) Operating condition identification: the engine works under different operating conditions, which can be divided according to three operating settings: altitude, Mach number, and sea-level temperature. The operating conditions are clustered with k-means. The algorithm proceeds as follows:

1) Specify k, the number of categories into which the data are divided.
2) Randomly select k points from the data as the initial centroids of the clusters.
3) Measure the distance between each data point and each centroid with a chosen distance metric, and assign each point to the nearest centroid.
4) Recompute the centroid of each cluster.
5) Stop iterating if any of three conditions holds: the centroids of the newly formed clusters no longer change, every point stays in the same cluster, or the maximum number of iterations is reached. Otherwise, repeat steps 3 to 5.

Experiments showed that k = 6 works best, so the engine's operating states are divided into operating conditions 1 through 6.
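The clustering loop described above can be sketched in plain Python. This is an illustration, not the patent's implementation. It assumes Euclidean distance, a fixed random seed, and input points given as tuples of operating settings; the function name `kmeans` is hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means following steps 1-5 above; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # step 2: random initial centroids
    labels = [None] * len(points)
    for _ in range(max_iter):  # stop condition: maximum number of iterations
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # stop condition: assignments no longer change
            break
        labels = new_labels
        # Step 4: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For the engine data, `points` would be the per-cycle (altitude, Mach number, sea-level temperature) triples with `k=6`. Settings on very different numeric scales may need normalizing first, a detail the text does not specify.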

(2) Data dimension reduction: each component can be characterized by multidimensional time-series data:

$$X^k = [t_1, t_2, \ldots, t_i, \ldots, t_T], \quad k = 1, 2, \ldots, n, \quad t_i \in \mathbb{R}^M$$

where n is the number of components, M is the number of selected features, T is the life-cycle length, t_i is the vector of feature values at time i, and k is the component index.

To reduce the amount of computation and the bias caused by inaccurate data acquisition, the data are aggregated with a sliding time window:

where w is the size of the time window and x_{w_i} is the aggregated value of each feature in window w_i;
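The aggregation formula is missing from the text. The sketch below assumes that each window is summarized by the per-feature mean, which is a common choice but is not confirmed by the source. Under that assumption, a life cycle of length T yields T - w + 1 window vectors, which matches the dimension R^{T-w+1} given for the one-dimensional projection. `sliding_window_mean` is a hypothetical helper name.

```python
def sliding_window_mean(series, w):
    """Aggregate a multivariate series [t_1..t_T] into T-w+1 window vectors.

    ASSUMPTION: each window is summarised by the per-feature mean; the original
    text does not state which aggregate is used.
    """
    T = len(series)
    if not 1 <= w <= T:
        raise ValueError("window size must satisfy 1 <= w <= T")
    M = len(series[0])  # number of selected features
    windows = []
    for i in range(T - w + 1):
        window = series[i:i + w]
        windows.append([sum(t[m] for t in window) / w for m in range(M)])
    return windows
```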

The resulting vectors x_{w_i} are embedded with t-distributed stochastic neighbor embedding (t-SNE). The algorithm steps are as follows:

a) The similarity between vectors is described by Euclidean distance and expressed as a conditional probability:

where σ_{w_i} is the variance.

b) The similarity between points in the low-dimensional space onto which the data are projected is computed and again expressed as a conditional probability:

where y_{w_i} is the position of the data point in the low-dimensional space.

c) The Kullback-Leibler divergence, commonly used to measure the distance between two probability distributions, serves as the cost, and the sum of K-L divergences is minimized:

where n is the number of components.

d) To alleviate the crowding of data points in the low-dimensional space, the similarity between points in that space is computed with a Student t-distribution:

where y_{w_i} is the position of the data point in the low-dimensional space.
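The equations for steps a) to d) were lost in extraction. For reference, the standard t-SNE formulation is reproduced below in this section's notation. It is an assumption that the omitted equations match it exactly. Here N denotes the number of window vectors being embedded.

```latex
% a) conditional similarity in the input space (Gaussian kernel)
p_{j|i} = \frac{\exp\left(-\lVert x_{w_i} - x_{w_j} \rVert^2 / 2\sigma_{w_i}^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_{w_i} - x_{w_k} \rVert^2 / 2\sigma_{w_i}^2\right)}

% b) conditional similarity in the low-dimensional space
q_{j|i} = \frac{\exp\left(-\lVert y_{w_i} - y_{w_j} \rVert^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert y_{w_i} - y_{w_k} \rVert^2\right)}

% c) cost: sum of Kullback-Leibler divergences over all points
C = \sum_{i} \mathrm{KL}(P_i \,\Vert\, Q_i) = \sum_{i} \sum_{j} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}

% d) t-SNE: symmetric input affinities and heavy-tailed Student-t output affinities
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}, \qquad
q_{ij} = \frac{\left(1 + \lVert y_{w_i} - y_{w_j} \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_{w_k} - y_{w_l} \rVert^2\right)^{-1}}, \qquad
C = \sum_{i} \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

In practice, σ_{w_i} is chosen separately for each point by a binary search so that the conditional distribution P_i reaches a user-set perplexity. The heavy tail of the Student t-distribution in d) is what relieves the crowding problem mentioned above.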

Through the above steps, each vector x_{w_i} is projected into one-dimensional space, and the component can then be represented as:

$$X^k \in \mathbb{R}^{T-w+1}, \quad k = 1, 2, \ldots, n$$

The one-dimensional time series obtained in this way are then reduced to two-dimensional space with t-SNE, so that X^k can be expressed as:

$$X^k \in \mathbb{R}^2, \quad k = 1, 2, \ldots, n$$

where n is the number of components.

(3) Outlier calculation: anomalies are defined with the Pearson correlation coefficient. For the two time series of the s-th monitored variable of the components numbered k_i and k_j, the correlation is defined as follows:

where the outlier is defined as:

where n is the number of components.
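The Pearson correlation used for the outlier calculation can be computed directly. The outlier formula itself is not reproduced in the text. `outlier_scores` below is therefore a hypothetical illustration of one possible Pearson-based score over n components (one minus the mean correlation with the other n - 1 components), not the patent's definition.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length series over the interval TS."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant series: correlation undefined, treated as 0 here
    return cov / math.sqrt(var_a * var_b)


def outlier_scores(series_by_component):
    """HYPOTHETICAL score: 1 - mean correlation with the other n-1 components.

    series_by_component holds the s-th monitored variable of each of the n
    components over the same time interval TS.
    """
    n = len(series_by_component)
    scores = []
    for i in range(n):
        others = [pearson(series_by_component[i], series_by_component[j])
                  for j in range(n) if j != i]
        scores.append(1.0 - sum(others) / (n - 1))
    return scores
```

Under this illustrative score, a component whose series runs against most of the others scores close to 2, and one that tracks them scores close to 0.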

Step two, visual mapping

After data acquisition and processing, visual mapping schemes are designed for the component projection view (shown in Fig. 2) and the data inspection view (shown in Fig. 5) of the invention.

1. Component projection view

(1) Scatter diagram

Shape, position, and color are mapped for the scatter plot in the component projection view. The position of each point encodes the dimension-reduction result of its component, the color encodes the component's clustering result, and the transparency of the point encodes the component's life-cycle length. Specifically:

Position: each point in the scatter plot represents one component. Its two-dimensional coordinates in the view represent the component's dimension-reduction result, and the distance between points represents the similarity between the components' degradation trajectories;

Color: the components are clustered into 6 classes according to their degradation trajectories, and each class is labeled with a different color. Points of the same color gather together, which indicates a good clustering result;

Transparency: the transparency of a point encodes the life-cycle length of its component. The longer the life cycle, the lower the transparency.
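The three scatter-plot encodings above can be sketched as one mapping step. The palette, the minimum opacity, and the function names are illustrative assumptions. The direction of the opacity scale follows the text: a longer life cycle gives a more opaque point.

```python
def linear_scale(domain, rng):
    """Return a function mapping [d0, d1] linearly onto [r0, r1]."""
    (d0, d1), (r0, r1) = domain, rng
    span = (d1 - d0) or 1.0  # avoid division by zero for a degenerate domain
    return lambda v: r0 + (v - d0) / span * (r1 - r0)


# Hypothetical 6-colour palette for the six degradation-trajectory clusters.
CLUSTER_COLORS = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#a65628"]


def encode_points(components, width, height, min_opacity=0.2):
    """Map each component to screen position, cluster colour and opacity.

    components: dicts with 'xy' (2-D t-SNE result), 'cluster' (0-5) and
    'life' (life-cycle length). Longer life -> more opaque (less transparent).
    """
    xs = [c["xy"][0] for c in components]
    ys = [c["xy"][1] for c in components]
    lives = [c["life"] for c in components]
    sx = linear_scale((min(xs), max(xs)), (0, width))
    sy = linear_scale((min(ys), max(ys)), (height, 0))  # screen y grows downward
    so = linear_scale((min(lives), max(lives)), (min_opacity, 1.0))
    return [
        {"x": sx(c["xy"][0]), "y": sy(c["xy"][1]),
         "color": CLUSTER_COLORS[c["cluster"]], "opacity": so(c["life"])}
        for c in components
    ]
```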

(2) Contour map

Color is mapped for the contour map in the component projection view. The blank regions between contour lines are filled with a specific color, and the darkness of the fill encodes the density of components projected into the local area.

The area of the petals encodes the number of behaviors. Specifically:

Contour color: the system fills the blank regions between contour lines with green. The denser the projected components in a local area, the darker the fill, which helps analysts understand the actual distribution of the data;

Area: the larger the area, the higher the count; the smaller the area, the lower the count.

(3) Special glyph

This part adopts a radial layout with two parts, InnerArea and OuterArea, as shown in Fig. 4(a). In InnerArea, the number of runs under each operating condition is mapped with an improved radar chart. In OuterArea, superimposed ring charts represent the degree of abnormality of the different monitoring indicators over various time periods.

InnerArea

Shape, size, and color are mapped for the improved radar chart of the special glyph in the component projection view. The colors of the points encode the operating condition categories, and the size of the center circle encodes the life-cycle length of the component. The shape of the radar chart encodes the component's run counts under the six operating conditions, so the operating history of components during degradation can be compared by comparing the shapes of their hexagons. Specifically:

Color: 6 colors encode the different operating condition categories.

Size: the size of the center circle encodes the life-cycle length of the component.

Shape: six axes are arranged in mirror symmetry, and the run count under each of the six operating conditions is encoded by the distance between the point on its axis and the center. Connecting the points on the axes end to end forms an irregular hexagon, and comparing the hexagons gives a rough comparison of how components operated during degradation.

OuterArea

Position, arc, and color are mapped for the outer rings of the special glyph in the component projection view. Different ring positions encode different monitoring indicators, the arc encodes the length of a time slice, and the ring color encodes the degree of abnormality in that time slice. Specifically:

Position: OuterArea contains several rings, each representing one monitoring indicator. In Fig. 4(a), the four rings from inside to outside show the monitored data for fan inlet total temperature, high-pressure compressor temperature, low-pressure turbine temperature, and fan inlet pressure, respectively. The number of rings and the indicators they represent can be selected by the user.

Arc: the monitored data are sliced in time, with the time-slice length selected by the user, and the arc encodes the time-slice length. In Fig. 4(a), the degradation period is divided into five time slices;

Color: each ring fragment has its own color, which encodes the degree of abnormality of that time slice.

2. Data inspection view

Color and position are mapped in the data inspection view. Six symbols encode the six operating conditions, and the lines connecting dots of different colors represent the temporal order in which the component works under different operating conditions. Highly correlated monitored variables are joined on the left side of the area charts by cubic Bezier curves. Specifically:

(1) Dot-line chart

Color: six clearly contrasting hues (red, orange, yellow, green, cyan, and blue) encode the six operating conditions.

Length: the length of a rectangle encodes the number of runs of the component under an operating condition.
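The text does not say whether a rectangle's length counts consecutive cycles or the total cycles under a condition. The sketch assumes consecutive runs and uses run-length encoding of the per-cycle operating-condition sequence, so that each run becomes one colored rectangle and consecutive runs are joined by connecting lines in time order. `condition_runs` is a hypothetical name.

```python
from itertools import groupby


def condition_runs(conditions):
    """Collapse a per-cycle operating-condition sequence into runs.

    Each run maps to one rectangle in the dot-line chart: `condition` picks the
    colour, `start` its position on the time axis, and `length` (number of
    consecutive cycles) its length.
    """
    runs, start = [], 0
    for cond, group in groupby(conditions):
        length = sum(1 for _ in group)
        runs.append({"condition": cond, "start": start, "length": length})
        start += length
    return runs
```

With this encoding, an abnormal condition sequence shows up as an unusual order or rhythm of rectangles rather than as a single outlying value.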

(2) Area chart

Height: the height of the area chart encodes the magnitude of the actual data.

Connecting line: the two ends of a red connecting line are highly correlated monitored variables. When the Pearson coefficient between two variables is greater than 0.5, the system automatically connects them.
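The automatic linking rule and the cubic Bezier connector can be sketched together. Only the 0.5 threshold comes from the text. The control-point placement (`bulge` pixels to the left of the charts), the sampling density, and the helper names are illustrative assumptions. The test uses the signed coefficient, as the text states, so strongly negative correlations are not linked.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant series: treated as uncorrelated here
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(variables, threshold=0.5):
    """Pairs of monitored variables whose Pearson coefficient exceeds `threshold`."""
    names = list(variables)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(variables[a], variables[b])
            if r > threshold:
                pairs.append((a, b, r))
    return pairs


def bezier_link(y_a, y_b, x_left, bulge=40.0, steps=16):
    """Sample a cubic Bezier joining the left edges of two stacked area charts.

    ASSUMPTION: both control points sit `bulge` pixels left of the charts, so the
    curve bows out into the margin between the baselines y_a and y_b.
    """
    p0, p1 = (x_left, y_a), (x_left - bulge, y_a)
    p2, p3 = (x_left - bulge, y_b), (x_left, y_b)
    points = []
    for k in range(steps + 1):
        t = k / steps
        u = 1 - t
        # B(t) = u^3 P0 + 3 u^2 t P1 + 3 u t^2 P2 + t^3 P3
        points.append(tuple(
            u ** 3 * c0 + 3 * u * u * t * c1 + 3 * u * t * t * c2 + t ** 3 * c3
            for c0, c1, c2, c3 in zip(p0, p1, p2, p3)
        ))
    return points
```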

Step three, visual layout and realization

1. Component projection view visualization layout and implementation

When the number of the components is too large, the scattered points in the scattered points can be blocked, so that an analyst cannot effectively acquire the data density in the current area. By mapping the density of the line and the change of the transparency in the contour map (shown in fig. 3) to the distribution density of the components in the areas at different positions, a user can clearly find dense or sparse areas in the data, and the method has important significance for finding the outlier islands in the data. The implementation method is as follows:

(1) Two-dimensional scalar field computation

A) For a given set of data points $(x_1, x_2, \ldots, x_n; y_1, y_2, \ldots, y_n)$, determine the minimum and maximum observations, i.e., $(x_1, y_1)$ and $(x_n, y_n)$, where n is the number of components.

B) Estimate the grid spacing (bin width) h to be used for the two-dimensional grid division, which gives the grid division points $(a_0, a_1, \ldots, a_{Ve})$ and $(b_0, b_1, \ldots, b_{He})$ of the data, where $a_{ve+1} - a_{ve} = h$ and $b_{he+1} - b_{he} = h$ for $ve = 0, 1, \ldots, Ve-1$ and $he = 0, 1, \ldots, He-1$. h is calculated as follows:

where IQR(x) is the difference between the upper and lower quartiles of the sample, and n is the number of components.

C) Count the number of scatter points falling in each grid cell, i.e., $(g_{0,0}, g_{0,1}, \ldots, g_{Ve,He})$.
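The formula for h is not reproduced in this text. Because it is defined in terms of IQR(x) and n, the sketch below assumes the Freedman-Diaconis rule, h = 2 · IQR(x) · n^(-1/3), computes the IQR from the x coordinates only, and applies the same spacing h to both axes, as steps A) to C) describe. It is an illustration under those assumptions, not the method's exact implementation; the function names are hypothetical.

```python
import math
from statistics import quantiles


def fd_bin_width(values):
    """Assumed form of h: Freedman-Diaconis width 2 * IQR * n^(-1/3)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return 2.0 * (q3 - q1) * len(values) ** (-1.0 / 3.0)


def grid_counts(xs, ys):
    """Bin scatter points into an h-by-h grid and count the points in each cell."""
    h = fd_bin_width(xs)
    if h <= 0:
        raise ValueError("IQR is zero; choose h by another rule")
    x0, y0 = min(xs), min(ys)
    # Number of cells along each axis; max(1, ...) keeps a degenerate axis usable.
    cols = max(1, math.ceil((max(xs) - x0) / h))
    rows = max(1, math.ceil((max(ys) - y0) / h))
    counts = [[0] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Clamp so that points on the upper boundary fall into the last cell.
        c = min(int((x - x0) / h), cols - 1)
        r = min(int((y - y0) / h), rows - 1)
        counts[r][c] += 1
    return h, counts
```

The resulting count grid plays the role of the scalar field g that the contour calculation below thresholds.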

(2) Contour line calculation

A) Determine an initial threshold and compare the data value of each grid cell with it: cells greater than the threshold are marked 1 and all others 0, yielding a binary image.

B) Treat each 2×2 block of pixels in the binary image as a contour cell. A binary index is built from the four values at the cell corners: scanning clockwise around the cell and combining the bits with a bitwise OR, from the most significant bit at the upper-left corner to the least significant, yields a four-bit index.

C) The cell index is used to access a pre-built lookup table of 16 entries listing the edges needed to represent each cell.

D) Linear interpolation between the original data values gives the exact position of the contour along the cell edges. These four steps produce the contour line for one threshold; repeating them with num% as the density interval produces num contour lines.
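Steps A) to D) describe the classic marching squares algorithm. A minimal pure-Python sketch for a single threshold follows. It assumes that grid corner (r, c) sits at position (x = c, y = r), and it resolves the two ambiguous saddle cases (indices 5 and 10) by keeping the high corners separate, which is one of the two usual conventions and an assumption here. Calling it once per density threshold gives the set of contour lines; the function names are hypothetical.

```python
# Lookup table: edges crossed by the contour in each of the 16 cell cases.
# N/E/S/W are the top/right/bottom/left edges of a 2x2 cell.
# Saddle cases 5 and 10 keep the high corners separate (assumed convention).
LOOKUP = {
    0: [], 1: [("W", "S")], 2: [("S", "E")], 3: [("W", "E")],
    4: [("N", "E")], 5: [("W", "S"), ("N", "E")], 6: [("N", "S")], 7: [("W", "N")],
    8: [("W", "N")], 9: [("N", "S")], 10: [("W", "N"), ("S", "E")], 11: [("N", "E")],
    12: [("W", "E")], 13: [("S", "E")], 14: [("W", "S")], 15: [],
}
# The corner pair at the two ends of each edge.
EDGE_CORNERS = {"N": ("tl", "tr"), "E": ("tr", "br"), "S": ("bl", "br"), "W": ("tl", "bl")}


def cell_index(tl, tr, br, bl):
    """Four-bit index, clockwise from upper-left (highest bit) to lower-left (lowest bit)."""
    return (tl << 3) | (tr << 2) | (br << 1) | bl


def _crossing(corners, edge, threshold):
    """Linearly interpolate where the threshold is crossed along one cell edge."""
    (p0, v0), (p1, v1) = (corners[k] for k in EDGE_CORNERS[edge])
    t = (threshold - v0) / (v1 - v0)  # v0 != v1: exactly one end lies above the threshold
    return (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))


def marching_squares(field, threshold):
    """Return contour segments ((x0, y0), (x1, y1)) of `field` at one threshold."""
    segments = []
    for r in range(len(field) - 1):
        for c in range(len(field[0]) - 1):
            corners = {
                "tl": ((c, r), field[r][c]),
                "tr": ((c + 1, r), field[r][c + 1]),
                "br": ((c + 1, r + 1), field[r + 1][c + 1]),
                "bl": ((c, r + 1), field[r + 1][c]),
            }
            # Binary image values for this cell: 1 above the threshold, otherwise 0.
            bits = {k: int(v > threshold) for k, (_, v) in corners.items()}
            idx = cell_index(bits["tl"], bits["tr"], bits["br"], bits["bl"])
            for e0, e1 in LOOKUP[idx]:
                segments.append((_crossing(corners, e0, threshold),
                                 _crossing(corners, e1, threshold)))
    return segments
```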

(3) Scale mapping and color mapping

According to the screen width and height allocated by the system, the scatter plot and the contour lines are mapped onto the screen with a linear scale, completing the basic layout of the view.

Contour fill color: the contour colors transition smoothly from white to dark green. The system expresses the area density as a value from 0 to 1, and the final color is then obtained by the following formula:

where hexColor is the hexadecimal RGB color value, data_i is the density of the current area, data_min is the minimum density, and data_max is the maximum density.
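The color formula itself is not reproduced here. One plausible reading, sketched below as an assumption, normalizes the density to [0, 1] with data_min and data_max and linearly interpolates each RGB channel from white to dark green. The endpoint #006400 (the CSS named color darkgreen) and the function name are illustrative choices, not taken from the method.

```python
def density_to_hex(density, d_min, d_max, low="#ffffff", high="#006400"):
    """Assumed mapping: linear RGB blend from `low` (white) to `high` (dark green)."""
    t = 0.0 if d_max == d_min else (density - d_min) / (d_max - d_min)
    t = min(max(t, 0.0), 1.0)  # normalized density in [0, 1]
    lo = [int(low[i:i + 2], 16) for i in (1, 3, 5)]
    hi = [int(high[i:i + 2], 16) for i in (1, 3, 5)]
    rgb = [round(a + (b - a) * t) for a, b in zip(lo, hi)]  # per-channel interpolation
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

Interpolating in RGB keeps the sketch short; an implementation could equally interpolate in a perceptual color space for a more even-looking ramp.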

After the user selects a set of components interactively, the view preserves the relative positions between them and expands each into the special glyph shown in the right half of FIG. 2. Inspecting the glyphs reveals the time periods and the monitored variables that become abnormal during the component's degradation, and judging the similarity between glyphs reveals the degradation patterns of abnormal components. The glyph is implemented as follows:

(1) Spatial layout implementation

After the abnormal components to be analyzed are selected, the canvas has to be regenerated from the lassoed area to display the glyphs of the component set. Because the shape of the user's lasso is not known in advance, an area that is dense locally but sparse further out causes severe visual occlusion in the view, as shown in FIG. 2. A collision detection algorithm based on force-directed layout is therefore implemented to keep glyphs from intersecting, as follows:

A) Obtain the maximum and minimum x and y coordinates of all data selected by the lasso, map the difference between the maximum and minimum to the width and height of the canvas, recompute the canvas coordinates of the data with a linear scale, and reproject the data points onto the new canvas.

B) Treat the data points on the canvas as point particles with the same mass and radius, and set the particles in motion by applying different force models.

C) Apply a collision force to each particle and use Verlet integration to compute its position and velocity after time Δt.

D) Iterate repeatedly until the velocity of every particle reaches 0; the particle positions at that point are the final positions of the data points, with the particles separated from one another and the layout stable. The relative positions between the points are well preserved, and the glyphs no longer cross or overlap.
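Steps A) to D) can be sketched as a small particle simulation. The version below is a hypothetical, simplified illustration: only a pairwise collision force is applied (no other force models), the force is proportional to the overlap between two particles of equal radius, and a damped position-Verlet step x_new = x + damping · (x − x_prev) + a · Δt² advances the particles until the largest speed drops below a small tolerance, which stands in for "velocity 0". Parameter names and defaults are assumptions. In a browser implementation, the d3-force module offers a comparable collision force driven by a velocity Verlet integrator.

```python
import math


def relax_layout(points, radius, dt=1.0, damping=0.5, strength=0.5,
                 max_iter=500, tol=1e-6):
    """Hypothetical sketch: push apart overlapping equal-radius particles (damped Verlet).

    points: (x, y) canvas positions after the lasso rescaling step.
    Returns the stabilized positions as a list of [x, y].
    """
    pos = [list(p) for p in points]
    prev = [list(p) for p in points]  # prev == pos means zero initial velocity
    min_dist = 2 * radius
    for _ in range(max_iter):
        acc = [[0.0, 0.0] for _ in pos]
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy)
                if d >= min_dist:
                    continue  # no contact, no collision force
                # Unit direction from i to j (an arbitrary axis if the points coincide).
                ux, uy = (dx / d, dy / d) if d > 1e-12 else (1.0, 0.0)
                f = strength * (min_dist - d)  # force grows with the overlap
                acc[i][0] -= f * ux
                acc[i][1] -= f * uy
                acc[j][0] += f * ux
                acc[j][1] += f * uy
        max_speed = 0.0
        for k, (p, q, a) in enumerate(zip(pos, prev, acc)):
            new = [p[m] + damping * (p[m] - q[m]) + a[m] * dt * dt for m in (0, 1)]
            max_speed = max(max_speed, math.hypot(new[0] - p[0], new[1] - p[1]) / dt)
            prev[k], pos[k] = p, new
        if max_speed < tol:  # particles have effectively stopped moving
            break
    return pos
```

Because each push acts along the line joining two centers, the left-to-right and top-to-bottom order of the points is kept, which matches the goal of preserving relative positions.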

(2) Glyph design implementation

For a single component, the data include the operation statistics of the component under each working condition during degradation, defined as follows:

$D = \{ I_o \mid o \in [1, O] \}$

where O is the number of working condition types, and $I_o$ is the number of operations under working condition o.

As shown in FIG. 4(b), the operation statistics under each working condition are mapped to the distance cRadius from the center of the radar chart along the corresponding axis, calculated as follows:

where the first two terms are the maximum and the minimum number of operations under working condition o; sum_o is the number of operations of the current component under working condition o; radarRadius is the radius of the radar chart; and O is the total number of working condition categories.

The X coordinate xpos_o and Y coordinate ypos_o of each point are calculated from its distance to the center, as follows:

The size of the center circle maps the life cycle length of the component. The size of each point on a radar axis maps the number of operations under that working condition, consistent with the point's distance from the center; this is a secondary (redundant) encoding, calculated as follows:

where rul_min is the shortest life cycle among the components, rul_max is the longest life cycle, R is the maximum radius of the center circle, r is the maximum radius of the coordinate-axis circle, rulRadius is the radius of the center circle, and dotArea_o is the distance from the point representing working condition o to the center.
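The formulas for cRadius, the point coordinates and the center-circle radius are not reproduced in this text. The sketch below is an assumption consistent with the listed symbols: cRadius is a linear scale from the [minimum, maximum] operation counts onto [0, radarRadius], the O axes are spaced evenly at angles 2π(o − 1)/O starting straight up and running clockwise in screen coordinates (y grows downward), and rulRadius uses the same linear scale from [rul_min, rul_max] onto [0, R]. The function names are hypothetical.

```python
import math


def linear_scale(value, d_min, d_max, r_max):
    """Map value from [d_min, d_max] linearly onto [0, r_max]; a degenerate domain maps to 0."""
    if d_max == d_min:
        return 0.0
    return (value - d_min) / (d_max - d_min) * r_max


def radar_points(sums, mins, maxs, radar_radius):
    """Return (xpos_o, ypos_o) relative to the glyph center for each working condition.

    sums[o], mins[o], maxs[o]: this component's operation count and the minimum and
    maximum counts for working condition o (0-based here, i.e. o - 1 in the text).
    """
    O = len(sums)
    points = []
    for o, (s, lo, hi) in enumerate(zip(sums, mins, maxs)):
        c_radius = linear_scale(s, lo, hi, radar_radius)  # assumed form of cRadius
        angle = 2 * math.pi * o / O  # axis o, starting straight up, clockwise
        # Screen coordinates: y grows downward, so "up" is the negative y direction.
        points.append((c_radius * math.sin(angle), -c_radius * math.cos(angle)))
    return points
```

Under the same assumption, the center circle would be sized with rul_radius = linear_scale(rul, rul_min, rul_max, R).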

The subsequence outlier value of each monitored variable is mapped to the length of an arc, and the angle occupied by the arc is calculated as follows:

where RULLen is the life cycle length of the component's degradation, and the other term represents the length of the s-th monitored variable of component k within the time interval TS;

The distance from the center of the ring representing the s-th monitored variable is:

$\mathit{rRadius}_s = \mathit{rDis} + (\mathit{rGap} + \mathit{rBandWidth}) \cdot (s - 1), \quad s = 1, 2, 3, \ldots, M$

where M is the number of selected features, rDis is the inner ring radius, rGap is the spacing between rings, and rBandWidth is the ring width.
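The ring-radius formula above translates directly into code. The arc helper next to it is an assumption: it takes consecutive, non-overlapping time slices and gives each ring segment an angle of 2π · length / RULLen, so that the segments of one ring together cover the full degradation period. With overlapping slices, which the time-slicing interaction allows, the segments would need a different placement. Function names are hypothetical.

```python
import math


def ring_radii(M, r_dis, r_gap, r_band_width):
    """rRadius_s = rDis + (rGap + rBandWidth) * (s - 1) for s = 1..M."""
    return [r_dis + (r_gap + r_band_width) * (s - 1) for s in range(1, M + 1)]


def slice_arcs(slice_lengths, rul_len):
    """Assumed arc layout: consecutive segments, each spanning 2*pi*length/RULLen radians."""
    arcs, start = [], 0.0
    for length in slice_lengths:
        end = start + 2 * math.pi * length / rul_len
        arcs.append((start, end))
        start = end
    return arcs
```

For example, three monitored variables with an inner radius of 40, a gap of 2 and a band width of 6 give rings at radii 40, 48 and 56.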

2. Data inspection view visualization layout and implementation:

Connecting lines: as shown in FIG. 5, highly correlated variables are marked with red connecting lines on the left side of the area chart. When the Pearson coefficient is greater than 0.5, the system automatically draws a cubic Bézier curve connecting the corresponding variables. Drawing a cubic Bézier curve requires four points: a start point spoint, an end point epoint, and two control points cpoint1 and cpoint2; the drawn result is shown in FIG. 4. The coordinates of the two control points are determined as follows:

where spoint_x and spoint_y are the x- and y-coordinates of the start point, and epoint_x and epoint_y are the x- and y-coordinates of the end point.
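The control-point formula is not reproduced in this text. The sketch below evaluates a cubic Bézier curve from spoint, cpoint1, cpoint2 and epoint in the standard Bernstein form, and pairs it with a hypothetical control-point rule that pushes both control points to the left of the area charts by an amount proportional to the vertical gap between the two variables (the factor k is an assumption). In an SVG-based implementation, the same four points would form a path of the form M spoint C cpoint1 cpoint2 epoint.

```python
def bezier_point(p0, p1, p2, p3, t):
    """Point at parameter t in [0, 1] on the cubic Bezier curve p0 -> p3 with controls p1, p2."""
    u = 1 - t
    return tuple(u ** 3 * a + 3 * u ** 2 * t * b + 3 * u * t ** 2 * c + t ** 3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def left_controls(spoint, epoint, k=0.5):
    """Hypothetical control points bulging to the left, proportional to the vertical gap."""
    bulge = k * abs(epoint[1] - spoint[1])
    return (spoint[0] - bulge, spoint[1]), (epoint[0] - bulge, epoint[1])


def sample_curve(spoint, epoint, steps=20):
    """Sample steps + 1 points along the connector between two area charts."""
    cpoint1, cpoint2 = left_controls(spoint, epoint)
    return [bezier_point(spoint, cpoint1, cpoint2, epoint, i / steps) for i in range(steps + 1)]
```

Scaling the bulge with the vertical gap keeps connectors between distant charts clearly separated from those between neighboring charts.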

Step four, interactive design

The component projection view supports highlighting, zooming, lasso selection, time slicing and reconfiguration:

Highlighting: in the scatter plot view, to help the user quickly grasp the basic information of a component, the system highlights the data point under the analyst's mouse in color and shows a tooltip with the component's number, life cycle length, operation statistics per working condition and other information.

Zooming: many data points may overlap one another and cause visual clutter. Zooming lets the user examine a local area in detail and also makes subsequent selection of data points easier.

Lasso: after finding an outlier or point of interest, users often need to examine that portion of the data. The method supports drawing a closed irregular shape on the scatter plot; when the user completes the lasso, every data point lying inside the shape is selected, and its corresponding glyph is shown in a pop-up view.

Time slicing: analysts control the extent of the time slices with the slider above the view. For greater flexibility, the user may select overlapping time slices of unequal length to explore the correlation of the component's monitored variables during degradation.

Reconfiguration: the visualization method lets the user switch between the scatter plot and the contour map via tabs. When component life cycles are long, the transparency of the points may become too low, so that points hide one another. Likewise, uneven data distribution or an excessive data volume can severely occlude parts of the view; switching to the contour map through reconfiguration then relieves the visual clutter.
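Deciding which data points lie inside the lasso is a point-in-polygon test. A minimal sketch using even-odd ray casting, a standard technique assumed here rather than taken from the method, treats the lasso as a closed list of vertices; the function names are hypothetical. In a D3-based front end, d3.polygonContains performs the same test.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: True if (x, y) lies inside the closed polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the lasso
        # The edge straddles the horizontal line through y...
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # ...and the ray from (x, y) toward +x crosses it.
            if x < x_cross:
                inside = not inside
    return inside


def lasso_select(points, polygon):
    """Indices of the scatter points enclosed by the lasso polygon."""
    return [i for i, (x, y) in enumerate(points) if point_in_polygon(x, y, polygon)]
```

Ray casting handles the concave, self-drawn shapes a lasso typically produces, which a simple bounding-box test would not.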

Claims

1. A projection-based abnormal component inspection visualization method, characterized by comprising the following steps:

S1: Data acquisition and processing

After the aviation gas turbofan engine data are obtained, they are cleaned, processed, and organized into structured data that can be used directly in the visual views;

S2: Visual Mapping

Visually map the data obtained in step S1 through visual channels:

Design a component projection view that presents an overview of the data distribution with scatter plots and contour plots, where proximity indicates similarity of component degradation patterns, and use a special glyph, a radar chart nested within multiple layers of rings, to compare similar patterns in the abnormal data;

Design a data inspection view that uses horizontal bar charts to map the number of times components operate under different working conditions and uses area charts to display the real data of the monitored variables, stacking the area charts vertically to help analysts discover relationships between the variables;

S3: Visualization layout and implementation

Lay out and implement the visual modules mapped in S2:

In the component projection view, each component is mapped to a point in the scatter plot, and the glyphs preserve the relative positions between the points. Each glyph has an improved radar chart in the middle that records the operation of the component, surrounded by multiple superimposed rings;

In the data inspection view, a timeline layout is adopted: the point-line chart at the top of the view shows the operating sequence in the current time period, and the timeline module with area charts is at the bottom of the view;

S4: Interaction Design

In the scatter plot view, hovering the mouse over a data point highlights it, and the user can zoom in and out to view a local area of the scatter plot in detail; drawing a closed irregular shape on the scatter plot replaces the enclosed points with glyphs for in-depth exploration; and the length of the time slice encoded by each ring of the glyph is controlled by the slider;

In step S2, the visualization mapping is specifically as follows:

S21: Map the shape, position and color of the scatter plot in the component projection view: the position of each point maps the dimensionality reduction result of the component, its color maps the clustering result of the component, and its transparency maps the life cycle length of the component.

S22: Perform color and area mapping on the contour map in the component projection view: use specific colors to fill the blank spaces between the contour lines; use the depth of the filling color to map the density of the projected components in the local area; use the area of the petals to map the number of behaviors;

S23: Map the shape, size and color of the improved radar chart in the special glyph of the component projection view: the color of the points maps the working condition categories; the size of the center circle maps the life cycle length of the component; and the shape of the radar chart encodes the number of operations of the component under the six working conditions, so that the operation of components during degradation can be compared through the shapes of the hexagons;

S24: Map the position, arc and color of the improved radar chart of special glyphs in the component projection view: different ring positions encode different monitoring indicators; the arc encodes the length of the time slice; the color of the ring encodes the abnormality of the time slice;

S25: Color and position mapping of the data inspection view: six symbols map the six working conditions; lines between dots of different colors represent the time-sequence relationship of the component operating under different working conditions; highly correlated monitored variables are connected by cubic Bézier curves on the left side of the area chart;

The visualization layout scheme of the component projection view is specifically as follows:

S3a: For scatter plots and contour plots, according to the screen width and height allocated by the system, the scatter plots and contour lines are mapped onto the screen in a linear scale. The density and transparency of the lines in the contour plots are used to map the distribution density of components in different location areas, thus completing the basic layout of the view.

S3b: For the component glyph view, select each slice range and the monitoring variables to be displayed in the control bar, and calculate the glyph size according to the number of monitoring variables;

S3c: After the size is determined, the glyphs are dispersed by the collision detection algorithm based on force-directed layout, which solves the visual occlusion caused by data that are dense in local areas and sparse in distant areas;

S3d: Count the number of operations of each component under six working conditions, and determine the distance from each point to the center point in the improved radar chart through a linear scale;

S3e: For the superimposed ring chart, the rings are drawn from the inside to the outside according to the selection order of the monitoring variables, and the rings are cut into multiple segments according to the length of the time slice. The drawing of each segment is completed according to the size of the outlier in each time slice.

2. The projection-based abnormal component inspection visualization method according to claim 1, characterized in that in step S1, data processing comprises:

(1) Working condition identification:

The working conditions are divided according to three operating conditions (altitude, Mach number, and sea-level temperature) and clustered using k-means.
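The working-condition clustering can be sketched with a plain Lloyd's k-means. Because altitude, Mach number and temperature are on very different scales, the sketch standardizes each column first; that step, the random initialization with a fixed seed, and the function names are assumptions rather than details given in the claim. With six working conditions, k would be 6.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Assumed preprocessing: z-score each column so altitude does not dominate."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # constant column: divide by 1
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on (altitude, Mach, temperature) tuples; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers
```

The resulting labels would serve as the working-condition index o used by the radar chart and the point-line chart.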

(2) Data Dimensionality Reduction:

The component is represented by multi-dimensional time series data, that is:

$X^k = [t_1, t_2, \ldots, t_i, \ldots, t_T], \quad k = 1, 2, \ldots, n, \quad t_i \in \mathbb{R}^M$

Where: n is the number of components; M is the number of selected features; T is the life cycle length; $t_i$ is the value of each feature at time i; k is the component number;

Aggregate the data using a sliding time window to obtain:

Where: w is the time window size; $x_{w_i}$ is the aggregated value of each feature in window $w_i$;

For each resulting $x_{w_i}$, use t-distributed stochastic neighbor embedding (t-SNE) to project the $x_{w_i}$ vector into one-dimensional space; the component is then represented as:

$X^k \in \mathbb{R}^{T-w+1}, \quad k = 1, 2, \ldots, n$

The degradation curve obtained after this dimensionality reduction is reduced again to two-dimensional space with t-SNE, and $X^k$ is expressed as:

$X^k \in \mathbb{R}^{2}, \quad k = 1, 2, \ldots, n$

Where: n is the number of components;
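The aggregation formula is not reproduced here. Since the reduced representation has T − w + 1 entries, the window must slide one time step at a time; the sketch below additionally assumes that each window is aggregated by the per-feature mean, and its function name is hypothetical. The resulting window vectors would then be passed to a t-SNE implementation (for example, scikit-learn's sklearn.manifold.TSNE with n_components=1) to obtain the one-dimensional degradation curve.

```python
def sliding_window_mean(series, w):
    """Aggregate a T x M multivariate series into T - w + 1 window vectors.

    Assumption: each window x_{w_i} is the per-feature mean of w consecutive rows.
    """
    T = len(series)
    if not 1 <= w <= T:
        raise ValueError("window size w must satisfy 1 <= w <= T")
    M = len(series[0])
    windows = []
    for i in range(T - w + 1):  # stride 1, giving T - w + 1 windows
        window = series[i:i + w]
        windows.append([sum(row[m] for row in window) / w for m in range(M)])
    return windows
```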

(3) Calculation of outliers:

Anomalies are defined using the Pearson coefficient. For the two time series of the s-th monitored variable of the components numbered $k_i$ and $k_j$, their correlation is defined as follows:

Where: the first two terms are the values of the s-th monitored variable of the components numbered $k_i$ and $k_j$ at time point t; TS is the length of the current time interval; the mean terms are the average values of these series within the time interval TS;

The outliers are defined as:

Where: n is the number of components.
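The correlation and outlier formulas are not reproduced in this text. The sketch below implements the Pearson coefficient over the values of one monitored variable within the interval TS, and then uses a hypothetical outlier score, one minus the mean correlation of a component with the other n − 1 components, so that a component whose series disagrees with its peers scores higher. Both the score and the function names are assumptions; at least two components are required.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length series within the interval TS."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant series: correlation undefined, treated as 0
    return cov / sqrt(var_a * var_b)


def outlier_scores(series_by_component):
    """Hypothetical score: 1 - mean correlation with every other component (range 0..2)."""
    n = len(series_by_component)
    scores = []
    for i, a in enumerate(series_by_component):
        rs = [pearson(a, b) for j, b in enumerate(series_by_component) if j != i]
        scores.append(1 - sum(rs) / (n - 1))
    return scores
```

Computing the score per monitored variable and per time slice would give the values that color the ring segments of the glyph.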

3. The method for visualizing abnormal component inspection based on projection according to claim 1 is characterized in that the component distribution density in different position areas is mapped by the changes in the density and transparency of the lines in the contour map in S3a, specifically comprising:

S3a1: 2D Scalar Field Computation

1a) For a given set of data points $(x_1, x_2, \ldots, x_n; y_1, y_2, \ldots, y_n)$, determine the minimum and maximum observed values, i.e., $(x_1, y_1)$ and $(x_n, y_n)$, where n is the number of components;

1b) Estimate the grid spacing h to be used for two-dimensional grid division, so as to obtain the grid division points $(a_0, a_1, \ldots, a_{Ve})$ and $(b_0, b_1, \ldots, b_{He})$ of the data, where $a_{ve+1} - a_{ve} = h$, $b_{he+1} - b_{he} = h$, $ve = 0, 1, \ldots, Ve-1$, $he = 0, 1, \ldots, He-1$; Ve is the number of grid lines in the vertical direction; He is the number of grid lines in the horizontal direction;

The grid spacing h is calculated as follows:

Where: IQR(x) is the difference between the upper quartile and the lower quartile of the sample; n is the number of components;

1c) Count the frequency of scatter points in each grid area, i.e., $(g_{0,0}, g_{0,1}, \ldots, g_{Ve,He})$;

S3a2: Contour calculation

2a) Determine an initial threshold, compare the data value in each grid cell with the threshold, mark cells with values greater than the threshold as 1 and cells with values less than the threshold as 0, thereby obtaining a binary image;

2b) Treat a 2×2 pixel block in the binary image as a contour unit; construct a binary index from the four values at the cell corners by scanning clockwise around the cell and applying a bitwise OR, from the highest bit at the upper-left corner to the lowest bit, to generate a four-bit index;

2c) Use the cell index to access a pre-built lookup table listing the edges needed to represent the cell;

2d) Apply linear interpolation between the original data values to find the exact position of the contour line along the cell edge;

S3a3: Scale Mapping and Color Mapping

Contour fill color: The contour color curve smoothly transitions from white to dark; the system represents area density as a value from 0 to 1.

4. The projection-based abnormal component inspection visualization method according to claim 1, characterized in that, in said S3c, the specific method of dispersing the glyphs is:

S3c1: Get the maximum and minimum xy coordinates of all data selected by the lasso, map the difference between the maximum and minimum values to the width and height of the canvas, recalculate the real coordinates of the data in the canvas according to the linear scale, and reproject the data points to the new canvas;

S3c2: Assume the data points on the canvas to be particles with the same mass and radius, and make the particles move by adding different mechanical models;

S3c3: Add a collision force to each particle and use Verlet integration to calculate the position and velocity of the particle after time Δt;

S3c4: Repeat the iterations until the velocity of each particle finally reaches 0; the particle positions at that point are the final positions of the data points, with the particles separated from one another and the layout stable.

5. The projection-based abnormal component inspection visualization method according to claim 1, characterized in that S3d is specifically:

S3d1: Map the running statistics under each working condition to the distance cRadius from the center of the radar chart to the axis. The calculation process is as follows:

Where: the first two terms are the maximum and the minimum number of operations under working condition o; sum_o is the number of operations of the current component under working condition o; radarRadius is the radius of the radar chart; O is the total number of working condition categories;

S3d2: Calculate the X-coordinate xpos _o and Y-coordinate ypos _o of the point in the figure based on the distance from the point to the center of the circle, as follows:

S3d3: The size of the center circle maps the life cycle length of the component. The size of each point on a radar axis maps the number of operations under that working condition, consistent with its distance from the center; this is a secondary encoding, calculated as follows:

In the formula: rul _min is the shortest life cycle of the component; rul _max is the longest life cycle of the component; R is the maximum radius of the center circle; r is the maximum radius of the coordinate axis circle; rulRadius is the radius of the center circle; dotArea _o is the distance from the point represented by working condition o to the center of the circle;

S3d4: The subsequence outliers of each monitoring variable are mapped to the length of the arc, and the angle occupied by the arc is calculated as follows:

Where: RULLen is the life cycle length of the component's degradation; the other term represents the length of the s-th monitored variable of component k in the time interval TS;

S3d5: The distance from the center of the ring representing the s-th monitored variable is:

$\mathit{rRadius}_s = \mathit{rDis} + (\mathit{rGap} + \mathit{rBandWidth}) \cdot (s - 1), \quad s = 1, 2, 3, \ldots, M$

Where: M is the number of selected features; rDis is the inner ring radius; rGap is the interval distance between rings; rBandWidth is the width of the ring.

6. The projection-based abnormal component inspection visualization method according to claim 1 is characterized in that, in step S3, the specific process of visualization layout and implementation of the data inspection view is as follows:

S31: Determine the life cycle range to be displayed, design an area chart, and connect two highly correlated monitoring variables using a cubic Bezier curve according to calculations;

S32: Determine the color corresponding to each working condition, determine the display form of each dot, including the layout position and the connection line drawing method, and layout and implement the point-line diagram.