Disclosure of Invention
In view of the problems in the prior art, the invention provides a depression evaluation system based on eye movement and facial expression. By extracting effective eye movement and expression features and analysing them jointly through feature fusion, the system establishes the association between eye movement, expression and depressive disorder, and thereby provides a non-invasive, objectively quantified means of depression evaluation.
The technical scheme of the invention is as follows:
1. A depression evaluation system based on eye movement and facial expression, characterized by comprising an emotion stimulation module, an expression acquisition module, an eye movement acquisition module, an eye movement feature extraction module, an expression feature extraction module, a machine learning classification module and an automatic evaluation module; the expression acquisition module acquires expression information while the subject views the different emotion stimulation pictures output by the emotion stimulation module; the eye movement acquisition module acquires eye movement information while the subject views the different emotion stimulation pictures output by the emotion stimulation module; the eye movement feature extraction module extracts eye movement features from the acquired eye movement images, and the expression feature extraction module extracts expression features from the acquired expression images; the machine learning classification module performs feature fusion and machine learning classification; and the automatic evaluation module evaluates the degree of depression of the subject according to the machine learning classification result.
2. The eye movement acquisition module comprises a foreground camera and eye movement cameras. The foreground camera is placed at the middle of the subject's forehead and captures the subject's field of view, with a resolution of 1080p and a sampling rate of 30 fps. The eye movement cameras are placed to the left and right of the subject's cheeks and capture pupil images of the left and right eyes; because a higher frame rate is required here, their resolution is 120x120 and their sampling rate is 200 fps.
3. The eye movement acquisition module comprises a glasses frame that holds the foreground camera and the eye movement cameras. The glasses frame is 3D-printed from polyurethane and comprises a foreground camera bracket, a lengthened nose pad and eye movement camera brackets. The foreground camera bracket sits above the eyebrows, with the foreground camera fixed at its center, and rests on the nose through the lengthened nose pad. The eye movement camera brackets are connected to the left and right sides of the foreground camera bracket, and each joint has an arc-shaped structure through which the temples of ordinary glasses can pass. An eye movement camera is fixed at the end of each eye movement camera bracket, and each bracket is rotated outwards by a certain angle so that the eye movement camera does not occlude the cheek and photographs the pupil obliquely upwards; the brackets can be extended and rotated to suit subjects with different face shapes.
4. The expression acquisition module comprises an expression acquisition camera placed at a suitable position in front of the subject so as to capture the subject's complete face region; a compass c1000 e camera is used, with a resolution of 4096x2160 and a sampling rate of 60 fps.
5. The emotion stimulation module comprises picture materials capable of giving positive, neutral and negative different emotion stimulations to the testee and audio materials with the same emotion as that of the picture stimulations.
6. The eye movement feature extraction module extracts eye movement features from the acquired pupil images. The eye movement data are first denoised with Gaussian filtering; a Canny edge detection operator and Hough circle detection are then used to extract the pupil center coordinates and pupil radius in each image, from which the eye movement trajectory and the change in pupil size are calculated, together with the subject's gaze region and gaze time.
7. The expression feature extraction module extracts TOFS features from the acquired expression images: based on the MDMO features of the optical flow field, a sliding window cuts the video stream of the whole sequence into a plurality of video segments, and expression features are extracted to obtain a 41-dimensional feature vector. Specifically, a convolutional neural network (CNN) first locates the face region, then 66 facial feature points and 36 regions of interest (ROIs) of the face are calculated, and finally the TOFS features are calculated.
8. The machine learning classification module performs feature fusion: the two extracted feature vectors, the eye movement features and the expression features, are fused by multi-modal parallel feature fusion, in which the two vectors are combined into a single complex vector spanning a complex vector space, and the dimension of that space is then reduced by principal component analysis.
9. The machine learning classification module trains a classifier: the collected eye movement data and expression data are labeled according to whether the subject is a depressed patient; the data and their labels are then used as training data, a decision tree performs the classification, and a classifier model is built and trained.
10. The automatic evaluation module performs automatic evaluation: eye movement data and expression data are collected from a subject whose depression status is unknown, features are extracted and fused, and the resulting features are input into the trained classifier model; the classifier model automatically evaluates from the input features whether the subject has a depressive tendency and outputs the depression classification, the result being either normal or depressed.
The invention has the technical effects that:
The depression evaluation system based on eye movement and facial expression provided by the invention extracts effective eye movement and expression features and analyses them jointly through feature fusion, thereby establishing the association between eye movement, expression and depressive disorder and providing a non-invasive, objectively quantified means of depression evaluation.
The invention improves the hardware: the glasses frame structure is redesigned so that important facial regions are not occluded while eye movement and expression data are collected, and so that the many subjects who wear glasses can be accommodated. In addition, the classification algorithm combines the processing of eye movement and expression data, the calculation and extraction of their features, and multi-modal analysis, and a classification model is obtained through a machine learning classification algorithm. The trained classification model is applied in an automatic depression evaluation system whose detection process is visualized and convenient to operate; the subject's degree of depression is evaluated by a wearable, non-invasive method, which enables early evaluation.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the overall framework of the system of the present invention. A depression evaluation system based on eye movement and facial expression comprises an emotion stimulation module, an expression acquisition module, an eye movement acquisition module, an eye movement feature extraction module, an expression feature extraction module, a machine learning classification module and an automatic evaluation module. The expression acquisition module acquires expression information while the subject views the different emotion stimulation pictures output by the emotion stimulation module; the eye movement acquisition module acquires eye movement information while the subject views the same pictures; the eye movement feature extraction module extracts eye movement features from the acquired eye movement images, and the expression feature extraction module extracts expression features from the acquired expression images; the machine learning classification module performs feature fusion and machine learning classification; and the automatic evaluation module evaluates the degree of depression of the subject according to the machine learning classification result.
The eye movement acquisition module comprises a foreground camera and eye movement cameras. The foreground camera is placed at the middle of the subject's forehead and captures the subject's field of view, with a resolution of 1080p and a sampling rate of 30 fps. The eye movement cameras are placed to the left and right of the subject's cheeks and capture pupil images of the left and right eyes; because a higher frame rate is required here, their resolution is 120x120 and their sampling rate is 200 fps. The expression acquisition module comprises an expression acquisition camera placed at a suitable position in front of the subject so as to capture the subject's complete face region; a compass c1000 e camera is used, with a resolution of 4096x2160 and a sampling rate of 60 fps.
The embodiment of the invention uses a 3D-printed glasses frame to hold the foreground camera and the eye movement cameras. Figs. 2a, 2b, 2c and 2d are, respectively, a perspective view, a front view, a side view and a top view of the 3D-printed glasses frame of the embodiment. The glasses frame is 3D-printed from polyurethane and comprises a foreground camera bracket 1, a lengthened nose pad 2 and eye movement camera brackets 6. The foreground camera bracket 1 sits above the eyebrows, with the foreground camera 3 fixed at its center, and rests on the nose through the lengthened nose pad 2. The eye movement camera brackets 6 are connected to the left and right sides of the foreground camera bracket 1, and each joint between an eye movement camera bracket 6 and the foreground camera bracket 1 has an arc-shaped structure 5 through which the temples of ordinary glasses can pass. An eye movement camera 4 is fixed at the end of each eye movement camera bracket 6, and each bracket is rotated outwards by a certain angle so that the eye movement camera does not occlude the cheek and photographs the pupil obliquely upwards; the brackets can be extended and rotated to suit subjects with different face shapes.
Figs. 3a and 3b are, respectively, a front view and a side view of the 3D-printed glasses frame of the embodiment as worn. The design of the 3D-printed frame improves on two aspects: it does not occlude the facial regions of interest for expression analysis, and it accommodates near-sighted subjects. The frame body is made of polyurethane, so it can be kept thin while remaining tough. The frame was redesigned for experimental data acquisition. Because the acquired data include both eye movements and expressions, the frame must occlude the face as little as possible. Experiments show that occluding the forehead and nose has little influence on expression acquisition, whereas key parts such as the eyebrows, eyes and mouth should be left unoccluded wherever possible. The frame is therefore shifted upwards: the frame as a whole is raised so that the front of the frame sits above the eyebrows, the nose pad is lengthened and concentrated in front of the nose, and a tough, compact material is used to reduce occlusion elsewhere. In addition, to accommodate the growing number of people who wear glasses for myopia, the height of the temples is raised appropriately.
Meanwhile, an arc-shaped notch is added to the camera brackets on both sides near the subject's temples, so that the temples of the subject's own glasses can pass through it and near-sighted subjects can take part conveniently. In addition, each eye movement camera bracket is rotated outwards by 15 degrees so that the camera photographing the pupil does not occlude the cheek, and the camera is oriented to photograph the pupil obliquely upwards, which keeps the face unoccluded while still ensuring pupil image acquisition.
Fig. 4 is a schematic diagram of the system workflow of the present invention. Before the experiment starts, the experimental environment is made comfortable and relatively quiet, so that interference from the external environment is reduced and noise sources are eliminated. Once the experimental scene meets these requirements, synchronized eye movement and expression acquisition begins. During acquisition the subject is given picture stimuli of different emotions, accompanied by audio stimuli of the same emotion as each picture; the stimulus pictures are divided into positive, neutral and negative. After the raw eye movement and expression data are obtained, they are preprocessed, the eye movement features and expression features are extracted, and the features are fused at the feature level to obtain the effective features. A decision tree, a machine learning classification algorithm, is then used for classification: a classifier model is built and trained until the values it predicts fit the true values, so that it performs well on the actual evaluation task, and the classifier is thereby obtained. The automatic evaluation system built on the trained classifier evaluates the subject's degree of depression automatically: a subject whose depression status is unknown goes through the same data acquisition and feature extraction process, the calculated features are input into the automatic evaluation system, and the classifier outputs the depression classification from the input features, the result being either normal or depressed.
The emotion stimulation module is designed as follows:
1) Nine-point positioning: yellow marks appear at nine points of the screen (center, top, bottom, left, right, top left, bottom right, bottom left and top right), and the subject looks at each mark for pupil positioning calibration.
2) Continuous picture stimulation: the picture material comes from the International Affective Picture System (IAPS) and comprises pictures that give the subject positive, neutral and negative emotional stimuli, together with audio material of the same emotion as each picture. The experimental paradigm presents neutral, positive, neutral and negative picture groups in sequence, repeated twice; each group shows 5 pictures of the same attribute, each picture is shown for 5 s, and there is a 5 s interval between groups.
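The presentation timing of the picture paradigm can be laid out as a simple schedule. The following is a minimal sketch, not the system's actual implementation: all names are hypothetical, and it assumes that "repeated twice" means the four-group sequence (neutral, positive, neutral, negative) is shown two times in a row with a 5 s gap between every pair of consecutive groups.

```python
# Hypothetical constants; the timings follow the paradigm described in the text.
GROUP_ORDER = ["neutral", "positive", "neutral", "negative"]
REPEATS = 2             # assumption: the four-group sequence is shown twice
PICTURES_PER_GROUP = 5
PICTURE_SECONDS = 5
GAP_SECONDS = 5         # interval between consecutive picture groups


def build_schedule():
    """Return (events, total_seconds); each event is (start_second, emotion, picture_index)."""
    events = []
    t = 0
    for g, emotion in enumerate(GROUP_ORDER * REPEATS):
        if g > 0:
            t += GAP_SECONDS          # pause before every group except the first
        for p in range(PICTURES_PER_GROUP):
            events.append((t, emotion, p))
            t += PICTURE_SECONDS
    return events, t


schedule, total_seconds = build_schedule()
```

Under these assumptions the schedule contains 40 picture events; a different reading of the repetition rule would only require changing `REPEATS` or `GROUP_ORDER`.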
Once data collection is ready, the screen starts playing the video stimulation paradigm, and at the same time the eye movement acquisition module and the expression acquisition module start working, collecting eye movement images and expression images of the subject while the subject views the different emotion stimulation pictures. Both modules record image data; the subject's eye movement data and expression data are recorded and saved in video format when the video stimulation paradigm ends. The eye movement and expression data are labeled according to whether the subject is a depressed patient, and the data together with these labels are used as training data. Whether a subject is a depressed patient is determined by a hospital doctor using conventional diagnostic methods such as clinical interview.
The eye movement feature extraction module extracts eye movement features from the acquired pupil images. The eye movement data are first denoised with Gaussian filtering; a Canny edge detection operator and Hough circle detection are then used to extract the pupil center coordinates and pupil radius in each image, from which the eye movement trajectory and the change in pupil size are calculated, together with the subject's gaze region and gaze time. The expression feature extraction module extracts TOFS features from the acquired expression images: based on optical flow field features, a sliding window cuts the video stream of the whole sequence into a plurality of video segments, and expression features are extracted to obtain a 41-dimensional feature vector. Specifically, a convolutional neural network (CNN) first locates the face region, then 66 facial feature points and 36 regions of interest (ROIs) of the face are calculated, and finally the TOFS features are calculated.
Eye movement feature extraction operates on the captured pupil images: the module uses the Canny edge detection operator and Hough circle detection to extract the pupil radius and pupil center coordinates in each image, from which the eye movement trajectory and the change in pupil size are calculated. The detailed steps are as follows:
Step 1: Canny edge detection operator
(1) Gaussian filtering
The Canny edge detection algorithm is sensitive to noise, so the image is first smoothed by convolving it with a Gaussian filter, which reduces the influence of noise on the edge detection result. The Gaussian filter kernel of size (2k+1) x (2k+1) is generated by:
$$H_{ij} = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{(i-(k+1))^2 + (j-(k+1))^2}{2\sigma^2}\right), \quad 1 \le i, j \le 2k+1$$
where $\sigma$ is the standard deviation of the Gaussian.
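As an illustration of the kernel generation, here is a minimal sketch that assumes the standard Gaussian form H_ij = exp(-((i-(k+1))^2 + (j-(k+1))^2) / (2σ^2)) / (2πσ^2) for 1 ≤ i, j ≤ 2k+1. The function name is hypothetical, and the final normalisation to unit sum is an added convenience rather than something stated in the text.

```python
import math


def gaussian_kernel(k, sigma):
    """Build a (2k+1) x (2k+1) Gaussian kernel as a list of rows."""
    size = 2 * k + 1
    kernel = []
    for i in range(1, size + 1):
        row = []
        for j in range(1, size + 1):
            # Squared distance of cell (i, j) from the kernel centre (k+1, k+1).
            d2 = (i - (k + 1)) ** 2 + (j - (k + 1)) ** 2
            row.append(math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2))
        kernel.append(row)
    # Assumption: normalise so the weights sum to 1 and smoothing keeps brightness.
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]
```

Calling `gaussian_kernel(1, 1.0)` gives a 3 x 3 kernel whose largest weight is at the centre; larger `sigma` spreads the weight more evenly and smooths more strongly.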
(2) Calculating gradient intensity and direction
The defining feature of an edge is a sharp change in gray value, and this change is described by the gradient. Each pixel has 8 neighbors, so gradients can be considered in four directions: horizontal, vertical and the two diagonals; the Canny algorithm accordingly uses four operators to detect horizontal, vertical and diagonal edges in the image. The operators compute the gradient by image convolution: two templates are convolved with the original image to obtain the difference maps $G_x$ and $G_y$ along the x and y axes, and the gradient magnitude G and direction θ at each point are then calculated as:
$$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(G_y / G_x\right)$$
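The gradient step can be sketched as follows. The text does not say which two templates are used, so this sketch assumes the common 3 x 3 Sobel templates; the function names are hypothetical. It also uses `atan2` instead of a plain arctangent of G_y / G_x so that G_x = 0 does not cause a division by zero, and it applies the templates as a correlation (no kernel flip), which only affects the sign convention.

```python
import math

# Assumption: Sobel templates; the source only says "two templates".
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_3x3(img, kernel, r, c):
    """Weighted sum of the 3 x 3 neighbourhood centred on pixel (r, c)."""
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def gradient(img):
    """Return (magnitude, direction) maps; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = apply_3x3(img, SOBEL_X, r, c)
            gy = apply_3x3(img, SOBEL_Y, r, c)
            mag[r][c] = math.hypot(gx, gy)   # G = sqrt(Gx^2 + Gy^2)
            ang[r][c] = math.atan2(gy, gx)   # theta, in radians
    return mag, ang
```

For an image with a vertical step edge, the magnitude is large along the step and the direction is close to 0, that is, the gradient points across the edge.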
(3) Non-maximum suppression
After the gradient calculation is performed on the image in the previous step, the edge extracted based on the gradient value is still very blurred. Therefore, an edge-thinning algorithm, non-maximum suppression, is required, which serves to compare the gradient intensity of the current pixel with two pixels along the positive and negative gradient directions. If the gradient intensity of the current pixel is maximum compared to the other two pixels, the pixel point remains as an edge point, otherwise the pixel point will be suppressed. This allows for a more accurate identification of the actual edges of the image.
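A minimal sketch of non-maximum suppression is given below. It assumes the gradient magnitude and direction are available as lists of rows, with the direction in radians measured from the x axis (columns) towards the y axis (rows), and it quantises the direction into four sectors, which is a common simplification and not a detail taken from the text. The function name is hypothetical.

```python
import math


def non_max_suppression(mag, ang):
    """Keep a pixel only if it is the largest of the three pixels along its gradient."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Quantise the direction to 0, 45, 90 or 135 degrees.
            deg = math.degrees(ang[r][c]) % 180
            if deg < 22.5 or deg >= 157.5:
                dr, dc = 0, 1        # horizontal gradient: compare left and right
            elif deg < 67.5:
                dr, dc = 1, 1        # diagonal (down-right / up-left)
            elif deg < 112.5:
                dr, dc = 1, 0        # vertical gradient: compare above and below
            else:
                dr, dc = 1, -1       # diagonal (down-left / up-right)
            m = mag[r][c]
            if m >= mag[r + dr][c + dc] and m >= mag[r - dr][c - dc]:
                out[r][c] = m        # local maximum along the gradient: keep it
    return out
```

Pixels that are not the maximum along their gradient direction are set to zero, which thins a blurred edge response down to a narrow ridge.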
(4) Dual threshold detection
Although non-maximum suppression algorithms can detect actual edges more accurately, the presence of noise and color variations has an impact on the detection results. To address these spurious responses, it is necessary to filter edge pixels with weak gradient values and preserve edge pixels with high gradient values, i.e., when the gradient value is above a set threshold, the pixel point can be considered a strong edge pixel; conversely, below the threshold point, the point is considered a weak edge pixel point, which is suppressed in subsequent detection.
(5) Suppressing isolated low threshold points
The strong edge pixels detected in the previous step are accepted as edges. A weak edge pixel, however, may be part of an actual edge or may be an error caused by noise or color change. Therefore, to filter out noise while preserving the actual edges, each weak edge pixel is examined together with its 8 neighboring pixels: as long as at least one neighbor is a strong edge pixel, the weak edge pixel is kept as a true edge.
Step 2: Hough circle detection
The Hough transform detects a curve by exploiting the duality between the points on the curve and the parameters of the curve. It is widely used to detect certain analytical curves in gray scale images, in particular straight lines, circles and parabolas.
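The thresholding and isolated-point suppression can be sketched together. This sketch assumes two thresholds, `low` and `high`, as the heading "dual threshold" suggests: pixels at or above `high` are strong, pixels between `low` and `high` are weak, and everything below `low` is discarded. The function name and the threshold values used by a caller are hypothetical.

```python
def hysteresis(mag, low, high):
    """Return a boolean edge map: strong pixels plus weak pixels touching a strong one."""
    h, w = len(mag), len(mag[0])
    strong = [[mag[r][c] >= high for c in range(w)] for r in range(h)]
    edges = [row[:] for row in strong]
    for r in range(h):
        for c in range(w):
            if strong[r][c] or mag[r][c] < low:
                continue                      # already an edge, or too faint to consider
            # Weak pixel: keep it if any of its 8 neighbours is a strong edge pixel.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < h and 0 <= cc < w and strong[rr][cc]:
                        edges[r][c] = True
    return edges
```

This single pass matches the rule as described (a weak pixel needs a strong direct neighbor); some Canny implementations additionally propagate along chains of weak pixels, which this sketch does not do.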
When there is a circle in the image, its boundary must belong to the edges of the image. In the x-y coordinate system, the general equation of a circle is:
$$(x-a)^2 + (y-b)^2 = r^2$$
Converting from the x-y coordinate system to the a-b coordinate system, the equation is written as $(a-x)^2 + (b-y)^2 = r^2$. A point on the circular boundary in the x-y coordinate system then corresponds to a circle in the a-b coordinate system. The circular boundary in the x-y coordinate system contains countless points, which correspond to countless circles in the a-b coordinate system. Because all of these boundary points lie at the same distance r from the center (a, b), the circles in the a-b coordinate system intersect at one point, and that intersection is the center (a, b). The number of circles passing through each candidate intersection is counted, the center coordinates are obtained by taking each local maximum, and the radius of the intersecting circles gives the value of r.
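The voting idea can be sketched with a small accumulator over (a, b, r). This is an illustrative brute-force version, assuming integer pixel coordinates, a short list of candidate radii and a single circle (the pupil) per image; the function name and parameters are hypothetical, and practical implementations are considerably more efficient.

```python
import math
from collections import Counter


def hough_circle(edge_points, radii, n_angles=360):
    """Return the (a, b, r) cell that receives votes from the most edge points."""
    votes = Counter()
    for x, y in edge_points:
        cells = set()
        for r in radii:
            # In (a, b) space, the edge point (x, y) maps to a circle of radius r.
            for k in range(n_angles):
                t = 2 * math.pi * k / n_angles
                cells.add((round(x - r * math.cos(t)), round(y - r * math.sin(t)), r))
        votes.update(cells)          # each edge point votes at most once per cell
    (a, b, r), _ = votes.most_common(1)[0]
    return a, b, r
```

Given edge points sampled from a circle, the cell with the most votes is the circle's center and radius, because only there do the circles drawn around all edge points intersect.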
Calculating a fixation point, a first target interest point and a fixation time as characteristic values of eye movement through the change of the pupil center point; the radius of the pupil serves as a direct indicator of pupil constriction. The above eye movement locus and pupil variation are taken as eye movement characteristics.
The whole video sequence is traversed with a sliding window, and the changes in pupil center coordinates and pupil radius between each frame and the next are computed:
$$dx_i = |x_i - x_{i+1}|, \quad i = 0, 1, 2, \ldots$$
$$dy_i = |y_i - y_{i+1}|, \quad i = 0, 1, 2, \ldots$$
$$dr_i = |r_i - r_{i+1}|, \quad i = 0, 1, 2, \ldots$$
If $dr_i$ is smaller than the set threshold, the eye movement type of the two frames is taken to be fixation; if the following $dr_{i+1}$ is still smaller than the threshold, the eye is considered to remain in fixation, and so on until $dr_n$ exceeds the threshold, at which point the duration of the current fixation is counted. Finally, the first fixation time ($t_f$) and the average fixation time are calculated.
dx, dy and dr are then averaged in turn, finally giving a five-dimensional feature vector.
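The frame-difference and fixation logic can be sketched as below. The exact composition of the five-dimensional vector is not recoverable from the text, so this sketch assumes it consists of the means of dx, dy and dr plus the first fixation time and the average fixation time; that layout, the function name and the parameters are all assumptions made for illustration.

```python
def eye_movement_features(samples, fps, threshold):
    """samples: one (x, y, r) pupil measurement per frame.

    Returns [mean dx, mean dy, mean dr, first fixation time, mean fixation time];
    this five-value layout is an assumption, not taken from the source.
    """
    pairs = list(zip(samples, samples[1:]))
    dx = [abs(a[0] - b[0]) for a, b in pairs]
    dy = [abs(a[1] - b[1]) for a, b in pairs]
    dr = [abs(a[2] - b[2]) for a, b in pairs]

    # Consecutive frame pairs with dr below the threshold form one fixation.
    fixations, run = [], 0
    for d in dr:
        if d < threshold:
            run += 1
        else:
            if run:
                fixations.append(run / fps)   # duration in seconds
            run = 0
    if run:
        fixations.append(run / fps)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    first_fixation = fixations[0] if fixations else 0.0
    return [mean(dx), mean(dy), mean(dr), first_fixation, mean(fixations)]
```

With the 200 fps eye movement cameras described earlier, `fps` would be 200, so each below-threshold frame pair contributes 5 ms to the current fixation.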
Expression feature extraction extracts TOFS features from the acquired expression images: based on optical flow field features and a sliding window, expression features are extracted over the whole sequence to obtain a 41-dimensional feature vector. The detailed steps are as follows:
Fig. 5 is a schematic flow chart of TOFS feature extraction. The expression feature extraction module uses a CNN to identify the face region, then calculates 68 facial key points, 36 regions of interest and the TOFS features. The TOFS feature calculation is based on the MDMO (Main Directional Mean Optical-flow) feature combined with a sliding-window feature selection algorithm. Optical flow features give good expression recognition accuracy on short video sequences, but the video sequences in this experiment are long; to ensure the accuracy and robustness of the recognition algorithm, the feature vector of each sliding window per unit time is extracted on top of the optical flow features, yielding the 41-dimensional TOFS feature.
The MDMO feature is based on optical flow, and the data may be a video segment or a picture sequence. For an image sequence $(f_1, f_2, \ldots, f_m)$, the feature follows the Facial Action Coding System: 68 facial key points are used to divide the face region of each frame into 36 regions of interest (ROIs), and the optical flow between frames is calculated. For each frame $f_i$, $i > 1$, the optical flow vectors in each ROI $R_i^k$, $k = 1, 2, \ldots, 36$, are divided into bins of 8 directions; the bin containing the most optical flow vectors is selected, and the main direction of optical flow is the mean of all optical flow vectors in that bin. The optical flow vectors are expressed in polar coordinates $(\rho_i, \theta_i)$, where $\rho_i$ and $\theta_i$ are the magnitude and direction of the optical flow. To eliminate the influence of differences between the frames of different video segments, the features are normalized to obtain the final feature:
wherein:
we represent the 72-dimensional feature as:
where α is an adjustable parameter; following the experimental results reported in the paper on this optical flow feature, its value is set to 0.9.
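The main-direction step for a single ROI (binning the flow vectors into 8 directions and averaging the fullest bin) can be sketched as follows. It assumes the optical flow inside the ROI is already available as a list of (u, v) vectors; the function name is hypothetical, and the normalisation and the α weighting are not included because their formulas are not given in the text.

```python
import math


def main_direction(flow_vectors, n_bins=8):
    """Return (rho, theta) of the mean flow vector in the most populated direction bin."""
    if not flow_vectors:
        return 0.0, 0.0
    width = 2 * math.pi / n_bins
    bins = [[] for _ in range(n_bins)]
    for u, v in flow_vectors:
        theta = math.atan2(v, u) % (2 * math.pi)        # direction in [0, 2*pi)
        bins[int(theta // width) % n_bins].append((u, v))
    biggest = max(bins, key=len)                        # bin holding the most vectors
    mean_u = sum(u for u, _ in biggest) / len(biggest)
    mean_v = sum(v for _, v in biggest) / len(biggest)
    # Polar form of the main-direction vector: magnitude rho and angle theta.
    return math.hypot(mean_u, mean_v), math.atan2(mean_v, mean_u) % (2 * math.pi)
```

Applying this to each of the 36 ROIs gives 36 (ρ, θ) pairs per frame, which is consistent with the 72-dimensional feature mentioned in the text.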
Time-frequency domain statistical features of the sliding window: each video segment collected in the experiment lasts about 2 minutes and records the subject's emotional changes during the test. Experiments showed that if the optical flow feature is computed over the entire video, the distinctness of the emotional changes is partly averaged out. Moreover, the subject's facial expression does not change noticeably in some time periods. A sliding window algorithm is therefore used to find the key video segments so that the optical flow feature is extracted more effectively. For each picture sequence $(f_1, f_2, \ldots, f_m)$, let the number of frames in the sliding window be n and the subset of the picture sequence obtained by the sliding window be $\gamma$, whose number of frames is denoted $frame_\gamma$; the sliding window may be described as:
wherein, the value relation of i and n is as follows:
The sliding window algorithm gives the optical flow change features of the subject in each short time period. To make better use of this information, the time-frequency domain statistical features of the sliding windows are extracted: mean $\mu$, variance $s$, standard deviation $\sigma$, skewness $\gamma_1$ and kurtosis $K_r$. For the sliding-window optical flow feature sequence of each subject, the time-frequency domain statistical features are as follows:
Skewness measures the direction and degree of asymmetry of a statistical distribution; it is used here to measure the symmetry of the optical flow features over the whole process:
$$\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{E[(X-\mu)^3]}{\left(E[(X-\mu)^2]\right)^{3/2}}$$
where X denotes the sliding-window feature values, $\kappa_2$ and $\kappa_3$ denote the second- and third-order central moments respectively, and E denotes the averaging (expectation) operation.
Kurtosis is the normalized fourth-order central moment; the kurtosis of the sliding-window data is extracted to measure how the optical flow is distributed:
$$K_r = \frac{E[(X-\mu)^4]}{\left(E[(X-\mu)^2]\right)^{2}}$$
We combine these statistics with the optical flow features to construct the final 41-dimensional TOFS feature:
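The sliding-window statistics named above (mean, variance, standard deviation, skewness and kurtosis) can be sketched as follows. The window definition and stride are assumptions, since the window formula is not given in the text; the moments are population moments (dividing by the number of values), and kurtosis is computed as the fourth central moment divided by the squared variance, without subtracting 3. Function names are hypothetical.

```python
import math


def sliding_windows(seq, n, step=1):
    """Split seq into overlapping windows of n items (assumed stride: step)."""
    return [seq[i:i + n] for i in range(0, len(seq) - n + 1, step)]


def window_statistics(values):
    """Mean, variance, standard deviation, skewness and kurtosis of one feature series."""
    m = len(values)
    mu = sum(values) / m
    k2 = sum((v - mu) ** 2 for v in values) / m     # second central moment (variance)
    k3 = sum((v - mu) ** 3 for v in values) / m     # third central moment
    k4 = sum((v - mu) ** 4 for v in values) / m     # fourth central moment
    return {
        "mean": mu,
        "variance": k2,
        "std": math.sqrt(k2),
        "skewness": k3 / k2 ** 1.5 if k2 > 0 else 0.0,
        "kurtosis": k4 / k2 ** 2 if k2 > 0 else 0.0,
    }
```

In use, an optical flow feature would be computed for each window returned by `sliding_windows`, and `window_statistics` would then summarise how that feature varies across the windows of one recording.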
The machine learning classification module comprises the steps of feature fusion and of building and training the classifier by machine learning. In feature fusion, the two feature vectors obtained in the steps above, the eye movement features and the expression features, are combined by parallel feature fusion into a single complex vector spanning a complex vector space, and the dimension of this space is then reduced by principal component analysis (PCA), which comprises the following steps:
(1) Arrange the original data column by column into a matrix X with n rows (feature dimensions) and m columns (samples);
(2) Zero-center each row of X, that is, subtract the mean of that row;
(3) Compute the covariance matrix;
(4) Compute the eigenvalues of the covariance matrix and the corresponding eigenvectors r;
(5) Arrange the eigenvectors as the rows of a matrix, from top to bottom in descending order of their eigenvalues, and take the first k rows to form the matrix P; Y = PX is then the data reduced to k dimensions.
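The fusion and PCA steps above can be sketched with NumPy. This is an illustration under stated assumptions: the two real feature vectors are combined as `eye + 1j * expr`, the shorter vector is zero-padded to the length of the longer one (a common convention in parallel fusion, not stated in the text), and the covariance matrix is formed with the conjugate transpose so that it also works for complex data. Function names are hypothetical.

```python
import numpy as np


def parallel_fuse(eye, expr):
    """Combine two real feature vectors into one complex vector: eye + i * expr."""
    eye = np.asarray(eye, dtype=float)
    expr = np.asarray(expr, dtype=float)
    d = max(len(eye), len(expr))
    # Assumption: zero-pad the shorter vector so both have the same length.
    a = np.pad(eye, (0, d - len(eye)))
    b = np.pad(expr, (0, d - len(expr)))
    return a + 1j * b


def pca_reduce(X, k):
    """X has n rows (features) and m columns (samples). Returns (P, Y) with Y = P @ Xc."""
    Xc = X - X.mean(axis=1, keepdims=True)          # step (2): zero-center each row
    C = Xc @ Xc.conj().T / X.shape[1]               # step (3): covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)            # step (4): C is Hermitian
    order = np.argsort(eigvals)[::-1][:k]           # step (5): largest eigenvalues first
    P = eigvecs[:, order].conj().T                  # rows of P are the top-k eigenvectors
    return P, P @ Xc
```

In use, each subject's fused vector from `parallel_fuse` would form one column of X, and the columns of Y would be the reduced features passed on to the classifier.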
Training a classifier: after the collected eye movement data and expression data have been labeled according to whether the subject is a depressed patient, the data together with these labels are used as training data. A decision tree performs classification on the extracted effective feature matrix, and a classifier model is built and trained to obtain a classification model with high accuracy.
The automatic evaluation module applies the classifier model obtained by the machine learning classification module. For a subject without a manual diagnosis, eye movement data and expression data are acquired and the features are extracted and fused in the same way as described above; the resulting effective features are input into the trained classifier, which evaluates the degree of depression and outputs the subject's depression classification, the result being either normal or depressed.
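Decision tree training and prediction can be illustrated with a very small tree built from scratch. The text does not specify the splitting criterion or tree depth, so this sketch assumes Gini impurity and a fixed maximum depth; all function names are hypothetical, and any feature values passed to it in an example are made-up toy numbers, not measured data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (impurity, feature, threshold) of the best split, or None if none exists."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Return a leaf label, or a node (feature, threshold, left_subtree, right_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

Here `rows` would hold one fused, dimension-reduced feature vector per subject and `labels` the corresponding "normal" or "depressed" diagnoses; `predict` then plays the role of the automatic evaluation step for a new subject.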