CN112244873B

CN112244873B - Electroencephalogram space-time feature learning and emotion classification method based on hybrid neural network

Info

Publication number: CN112244873B
Application number: CN202011057296.2A
Authority: CN
Inventors: 陈景霞; 闵重丹; 郝为; 张鹏伟
Original assignee: Shaanxi University of Science and Technology
Current assignee: Shaanxi University of Science and Technology
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2024-07-16
Anticipated expiration: 2040-09-29
Also published as: CN112244873A

Abstract

The invention discloses an electroencephalogram (EEG) space-time feature learning and emotion classification method based on a hybrid neural network, comprising the following steps: collecting EEG signals from a plurality of channels; extracting power spectral density (PSD) features from the multi-channel EEG signals; converting the PSD features into a two-dimensional mesh matrix sequence; dividing that sequence into a plurality of segments P_j; establishing a CASC_CNN_LSTM model and a CASC_CNN_CNN model; jointly extracting deep spatial and temporal features of the EEG signals from each segment P_j with the CASC_CNN_LSTM model and inputting them into the softmax layer of that model for emotion category prediction; and jointly extracting deeper-level deep spatial features of the EEG signals from each segment P_j with the CASC_CNN_CNN model and inputting them into the softmax layer of that model for emotion category prediction. The method makes emotion classification more accurate.

Description

Electroencephalogram space-time feature learning and emotion classification method based on hybrid neural network

Technical Field

The invention belongs to the technical field of deep learning application, and particularly relates to an electroencephalogram space-time feature learning and emotion classification method based on a hybrid neural network.

Background

Emotions play a vital role in human life: positive emotions can improve the efficiency of daily work, while negative emotions can impair decision-making, attention and so on. With the development of artificial intelligence technology, emotion recognition has become a research hotspot in affective computing and pattern recognition.

First, EEG signals have a very low signal-to-noise ratio and are contaminated by many kinds of noise. Second, usually only the EEG activity associated with a particular brain activity is of interest, yet that activity is hard to separate from the background signal. Therefore, to identify and extract the portions of the EEG signal related to a particular brain activity or emotion, sophisticated EEG analysis and processing techniques are required that take into account both the spatial and the temporal correlations of the signal.

EEG emotion recognition typically faces two major technical challenges: how to extract more discriminative emotion features from the EEG signal, and how to develop a more effective computational model for recognizing those features. Although many EEG emotion recognition methods have emerged in recent years, some important problems still need to be studied to improve recognition performance further: first, how to select and extract more effective EEG features from the raw EEG signals and represent them so that they exhibit clearer space-time correlation and discriminability; and second, how to construct an effective machine learning model that mines deeper emotion-related features from the input EEG features and thereby improves emotion recognition capability.

Disclosure of Invention

To solve the problems in the prior art, the invention provides an EEG space-time feature learning and emotion classification method based on a hybrid neural network. It addresses how to select and extract more effective EEG features from the raw EEG signals and represent them so that they exhibit clearer space-time correlation and discriminability, and how to construct an effective machine learning model that mines deeper emotion-related features from the input EEG features, thereby improving emotion recognition capability.

In order to achieve the above purpose, the present invention provides the following technical solutions: an electroencephalogram space-time feature learning and emotion classification method based on a hybrid neural network comprises the following steps:

Step 1: collecting EEG signals from a plurality of channels;

Step 2: extracting power spectral density (PSD) features from the multi-channel EEG signals;

Step 3: converting the vector sequence of PSD features into a two-dimensional mesh matrix sequence;

Step 4: dividing the two-dimensional mesh matrix sequence into a plurality of segments P_j by applying a sliding window;

Step 5: establishing a CASC_CNN_LSTM model and a CASC_CNN_CNN model; jointly extracting deep space-time features of the EEG signals from each segment P_j with the CASC_CNN_LSTM model, and inputting those features into the softmax layer of the CASC_CNN_LSTM model for emotion category prediction;

jointly extracting deeper-level deep spatial features of the EEG signals from each segment P_j with the CASC_CNN_CNN model, and inputting those features into the softmax layer of the CASC_CNN_CNN model for emotion category prediction.

Further, the specific steps of the step 1 are as follows:

extracting EEG signals of a plurality of channels from the DEAP dataset and downsampling them to 128 Hz;

The DEAP dataset was obtained as follows: multiple trials were carried out on a plurality of subjects, and after the EEG signals of each trial were collected, each subject rated the trial in terms of arousal, valence, liking, dominance and familiarity using continuous values from 1 to 9, representing each index from weak to strong.

Further, step 1 also includes preprocessing the acquired EEG signals: the data are filtered with a 4-45 Hz band-pass filter, and electro-oculogram (EOG) interference is removed by a blind source separation technique.
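The band-pass part of this preprocessing can be illustrated with a minimal sketch. It assumes NumPy and uses an idealized FFT mask that zeroes every frequency bin outside 4-45 Hz; this is a stand-in chosen for brevity, not the filter design used by the invention, and the function name `bandpass_fft` and the synthetic test signal are hypothetical. Blind source separation for EOG removal is not shown.

```python
import numpy as np

def bandpass_fft(signal, fs, low_hz=4.0, high_hz=45.0):
    """Idealized band-pass: zero every FFT bin outside [low_hz, high_hz]."""
    n = signal.shape[-1]
    spectrum = np.fft.rfft(signal, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=n, axis=-1)

fs = 128                                # sampling rate after downsampling (from the text)
t = np.arange(fs * 4) / fs              # 4 s of synthetic data
drift = np.sin(2 * np.pi * 1.0 * t)     # 1 Hz component, outside the pass band
alpha = np.sin(2 * np.pi * 10.0 * t)    # 10 Hz component, inside the pass band
filtered = bandpass_fft(drift + alpha, fs)
```

In this synthetic case the 1 Hz drift is removed and the 10 Hz component is retained. A practical pipeline would more likely use a designed IIR or FIR filter, which avoids the ringing of a brick-wall mask.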

Further, the specific steps of the step 2 are as follows:

Based on the multi-channel EEG signals extracted in step 1, the EEG signal of each trial is divided into non-overlapping segments, giving a plurality of samples per trial and hence a total number of samples per subject; each sample comprises a plurality of sampling points, and each sampling point comprises the data of the channels extracted in step 1. These are the RAW features;

The segmented EEG signals are normalized channel by channel to obtain the time-domain NORM features of each subject; PSD features are then extracted from the NORM features on each channel of a single sample with a fast Fourier transform, sliding a Hamming window without overlap, over the 4-45 Hz frequency band;

Finally, for the RAW, NORM and PSD features, the emotion ratings of each trial on valence and arousal, which lie in the range 1-9, are divided into two classes using the median value 5 as the threshold: a rating greater than 5 denotes the high class (positive index) and is represented by 1; a rating less than or equal to 5 denotes the low class (negative index) and is represented by 0. The RAW, NORM and PSD features and their corresponding labels are then balanced.
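The channel-wise normalization and PSD extraction of step 2 can be sketched as follows, assuming NumPy. Two points are assumptions rather than statements of the method: the normalization is taken to be a per-channel Z-score (the text only says "normalized by channel"), and the PSD is an averaged periodogram over non-overlapping Hamming windows. The function names are hypothetical. With a 0.5 s window at 128 Hz this sketch yields 21 frequency bins in the 4-45 Hz band; it does not attempt to reproduce the 64 PSD features per channel reported in the embodiment.

```python
import numpy as np

def zscore_per_channel(segment):
    """Normalize a (samples, channels) segment channel by channel."""
    mean = segment.mean(axis=0, keepdims=True)
    std = segment.std(axis=0, keepdims=True)
    return (segment - mean) / np.where(std == 0, 1.0, std)

def psd_hamming(segment, fs=128, win_sec=0.5, band=(4.0, 45.0)):
    """Averaged periodogram over non-overlapping Hamming windows.

    segment: (samples, channels). Returns (band_freqs, psd), where psd has
    shape (n_band_bins, channels).
    """
    win_len = int(win_sec * fs)
    window = np.hamming(win_len)
    n_windows = segment.shape[0] // win_len
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    acc = np.zeros((freqs.size, segment.shape[1]))
    for w in range(n_windows):
        chunk = segment[w * win_len:(w + 1) * win_len] * window[:, None]
        acc += np.abs(np.fft.rfft(chunk, axis=0)) ** 2
    # Density scaling; the one-sided doubling factor is omitted for brevity.
    psd = acc / (n_windows * fs * np.sum(window ** 2))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[in_band], psd[in_band]

rng = np.random.default_rng(0)
t = np.arange(128) / 128                       # one 1 s sample at 128 Hz
segment = np.stack([np.sin(2 * np.pi * 10 * t),
                    rng.standard_normal(128)], axis=1)   # 2 synthetic channels
norm = zscore_per_channel(segment)
freqs, psd = psd_hamming(norm)
```

For the synthetic 10 Hz channel, the largest PSD value falls in the 10 Hz bin, which is a quick sanity check on the windowing and band selection.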

Further, the specific steps of the step 3 are as follows:

The EEG signal at time point t is a one-dimensional data vector X_t = [x_t^1, x_t^2, ..., x_t^n], where n is the total number of channels of the acquisition system and x_t^n is the reading of the n-th electrode channel at time point t. According to the spatial positions of the electrodes of the EEG acquisition system, the one-dimensional EEG vector sequence [X_t, X_{t+1}, ..., X_{t+N-1}] in the observation period [t, t+N-1] of each trial is converted into a two-dimensional matrix sequence [Y_t, Y_{t+1}, ..., Y_{t+N-1}], and the non-zero data in each two-dimensional matrix are normalized by the Z-score algorithm.
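A minimal sketch of this conversion, assuming NumPy, is given below. The electrode coordinates form a toy 5-electrode layout on a 3×3 grid and are purely hypothetical; a real system would list all of its channels according to its own cap layout. The sketch treats the occupied cells as the "non-zero data" to be Z-scored, which is also an assumption.

```python
import numpy as np

# Hypothetical (row, col) positions for a toy 5-electrode cap.
ELECTRODE_POS = {0: (0, 1), 1: (1, 0), 2: (1, 1), 3: (1, 2), 4: (2, 1)}
GRID_SHAPE = (3, 3)

def to_mesh(x_t, positions=ELECTRODE_POS, shape=GRID_SHAPE):
    """Place the n readings of X_t on a 2-D grid Y_t.

    Cells without an electrode stay 0; occupied cells are Z-score normalized.
    """
    values = np.asarray(x_t, dtype=float)
    rows = [positions[ch][0] for ch in range(values.size)]
    cols = [positions[ch][1] for ch in range(values.size)]
    std = values.std()
    y_t = np.zeros(shape)
    y_t[rows, cols] = (values - values.mean()) / (std if std > 0 else 1.0)
    return y_t

def sequence_to_mesh(x_seq):
    """Convert [X_t, ..., X_{t+N-1}] of shape (N, n) into shape (N, h, w)."""
    return np.stack([to_mesh(x_t) for x_t in x_seq])

x_seq = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                  [2.0, 2.0, 4.0, 6.0, 6.0]])   # N = 2 time points, n = 5 channels
y_seq = sequence_to_mesh(x_seq)                 # shape (2, 3, 3)
```

The number of matrices equals the number of input vectors, so the temporal axis is preserved while each frame gains a spatial layout.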

Further, the specific steps of the step 4 are as follows:

The two-dimensional mesh matrix sequence is divided into segments P_j with a sliding window. Each segment P_j has a fixed length, and adjacent segments do not overlap: P_j = [Y_t, Y_{t+1}, ..., Y_{t+s-1}], where s is the window size, i.e. the number of sampling points; j = 1, 2, ..., q; and q is the number of EEG segments P_j into which the observation period is divided.
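The non-overlapping windowing can be sketched in a few lines, assuming NumPy. The function name `split_segments` is hypothetical, and dropping trailing frames that do not fill a complete window is an assumption of this sketch.

```python
import numpy as np

def split_segments(mesh_seq, s):
    """Cut an (N, h, w) mesh sequence into q = N // s non-overlapping
    segments P_j, returned with shape (q, s, h, w)."""
    n_frames, h, w = mesh_seq.shape
    q = n_frames // s
    return mesh_seq[:q * s].reshape(q, s, h, w)

mesh_seq = np.arange(10 * 2 * 2, dtype=float).reshape(10, 2, 2)  # N = 10 toy frames
segments = split_segments(mesh_seq, s=4)                         # q = 2 segments
```

Here segment j starts at frame j·s (counting from zero), so consecutive segments share no frames.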

Further, in step 5, the CASC_CNN_LSTM model is constructed, deep space-time features of the EEG signals are jointly extracted from each segment P_j, and the extracted features are input into a softmax layer for emotion classification prediction, with the following specific steps:

constructing the CASC_CNN_LSTM model, inputting each segment P_j into a 2D-CNN network, and obtaining deep spatial features through learning;

Then the sequence of deep spatial features is input into a bidirectional LSTM model, each propagation direction of which contains s LSTM units. The hidden state of an LSTM unit at the current time point t is denoted h_t, and h_{t-1} denotes the hidden state at the previous time point t-1; information from the previous time point in the same layer is passed to the current time point and so on, influencing the final output. The hidden state of each LSTM unit is taken as its output: the forward LSTM units output the hidden state sequence [h_t, h_{t+1}, ..., h_{t+s-1}], and the backward LSTM units output [h'_{t+s-1}, ..., h'_{t+1}, h'_t]. From each direction, the output at the last processed time point, obtained after the LSTM has seen every time point in the window, is taken, namely h_{t+s-1} and h'_t. These two are concatenated along the feature dimension and recorded as H_j, the temporal feature learned by the whole LSTM network, which is passed to the following fully connected layer; a softmax layer after the fully connected layer generates the final probability prediction for each emotion category.

Further, in step 5, the CASC_CNN_CNN model is constructed; it extracts deeper-level deep spatial features of the EEG signals from each segment P_j, and the extracted features are input into a softmax layer for emotion classification prediction, with the following specific steps:

constructing the CASC_CNN_CNN model; inputting each segment P_j into a CNN network and obtaining, through learning, the sequence Q_j of deep spatial feature vectors of the segment; arranging the sequence Q_j into matrix form; further extracting deeper deep spatial features of Q_j with a CNN II network, where CNN II is another CNN network; sending these deeper features to the following fully connected layer; and using a softmax layer after the fully connected layer to generate the final probability prediction for each emotion category.

Compared with the prior art, the invention at least comprises the following beneficial effects:

The invention provides an EEG space-time feature learning and emotion classification method based on a hybrid neural network. PSD features are extracted from multi-channel EEG signals, the one-dimensional chain-like PSD features are converted into a two-dimensional mesh matrix sequence, and a sliding window divides that sequence into time segments of equal length. In this way more effective EEG features are selected and extracted from the raw EEG signals and represented so that they exhibit clearer space-time correlation and discriminability. The two-dimensional mesh PSD features are input into the CASC_CNN_LSTM and CASC_CNN_CNN models, which mine deeper emotion-related features and achieve classification accuracies of 93.15% and 92.37%, respectively; both hybrid models outperform several baseline models and recent state-of-the-art methods. Compared with previous studies, the models require less preprocessing of the raw data and are therefore better suited to real-time applications such as brain-computer interfaces (BCI).

Drawings

FIG. 1 is a flow chart of processing raw EEG signals provided by the present invention;

FIG. 2 is a diagram of the two-dimensional mesh matrix sequence obtained by converting a one-dimensional EEG sequence according to the present invention;

FIG. 3 is a structural diagram of the CASC_CNN_LSTM model provided by the invention;

FIG. 4 is a structural diagram of the CASC_CNN_CNN model provided by the invention;

FIG. 5 is a table comparing the binary emotion classification accuracy on valence of the baseline models and the cascaded hybrid models.

Detailed Description

The invention is further described below with reference to the drawings and the detailed description.

As shown in FIG. 1, the invention provides an EEG space-time feature learning and emotion classification method based on a hybrid neural network. Working with the raw EEG signals of the large public DEAP dataset, it proposes a new EEG feature representation and, on that basis, two new hybrid deep neural network models that learn and extract more discriminative deep space-time correlation features and perform subject-dependent binary emotion classification with better accuracy than existing methods. The method comprises the following steps:

Step 1: collecting EEG signals from a plurality of channels and preprocessing them; the EEG emotion classification experiments and the verification of model performance are carried out on the large-scale EEG emotion dataset DEAP;

Step 2: extracting power spectral density (PSD) features from the preprocessed multi-channel EEG signals of the DEAP dataset;

Step 3: performing feature conversion on the PSD features, converting them from a one-dimensional vector sequence into a two-dimensional mesh matrix sequence;

Step 4: applying a sliding window to divide the two-dimensional mesh matrix sequence into a plurality of segments P_j;

Step 5: constructing a cascaded convolutional-recurrent neural network (CASC_CNN_LSTM) model and a cascaded convolutional-convolutional neural network (CASC_CNN_CNN) model; jointly extracting deep space-time features of the EEG signals from each segment P_j with the CASC_CNN_LSTM model; jointly extracting deeper-level deep spatial features of the EEG signals from each segment P_j with the CASC_CNN_CNN model; and inputting the features extracted by each model into its corresponding softmax layer for emotion category prediction.

Specifically, as shown in the left half of FIG. 1, the specific steps of step 1 are as follows:

EEG-based affective brain-computer interface systems typically use a portable, wearable multi-channel electrode cap to collect EEG signals; the sensors on the cap capture fluctuations in the scalp current of the subject's brain while the subject watches a stimulus video;

In this embodiment, the experiments use the EEG signals of the large public DEAP dataset. The dataset records the physiological signals (EEG, electrocardiogram, electromyogram and others) of 32 subjects, induced by watching 40 music videos of about 1 minute each with different emotional tendencies. After each video, the subject rated it in terms of arousal, valence, liking, dominance and familiarity using continuous values from 1 to 9; from small to large, the ratings represent each index from negative to positive or from weak to strong. The 40 stimulus videos contain 20 high-valence/arousal stimuli and 20 low-valence/arousal stimuli;

In this embodiment, the EEG signals of 32 channels are extracted and downsampled to 128 Hz. To eliminate DC noise, power-line noise and other artifacts, the data are filtered with a 4-45 Hz band-pass filter, and EOG interference is then removed by blind source separation. The resulting EEG signal of each trial has a total duration of 63 seconds: 3 seconds of resting state before viewing followed by 60 seconds of video viewing.

Further, the specific steps of step 2 are as follows:

From each 63-second EEG signal, the invention extracts the 60-second video-induced EEG sequence for further analysis. To correct for stimulus-independent changes of the signal over time, the EEG signal of the 3 seconds before video viewing is taken as the baseline and removed from the 60-second trial signal, leaving the stimulus-related variation. Each sequence is divided into non-overlapping segments with a window length of 1 second, giving 60 segments per trial; for the 40 trials of each subject, the total number of EEG segments (also called samples) is therefore 40 × 60 = 2400. Each segment contains 128 sampling points (i.e. the window size is 128), and each sampling point contains the data of 32 EEG channels. These are referred to as the RAW features, with dimensions 2400 × 128 × 32. The segmented EEG data are normalized channel by channel to obtain the time-domain NORM features of each subject. Using a fast Fourier transform over the 4-45 Hz band, 64 PSD features are extracted from the NORM features on each channel of each 1 s EEG segment by sliding a 0.5 s Hamming window without overlap; the PSD features extracted from the 40 trials of each subject have total dimensions 2400 × 64 × 32.
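The baseline removal and 1-second segmentation described above can be sketched as follows, assuming NumPy and synthetic data in place of DEAP recordings. The text does not say exactly how the baseline is removed; this sketch assumes the common choice of averaging the three 1 s pre-stimulus segments and subtracting that average from every 1 s stimulus segment. The function name is hypothetical.

```python
import numpy as np

FS = 128           # Hz, from the text
BASELINE_SEC = 3   # pre-stimulus resting state
TRIAL_SEC = 60     # video viewing

def remove_baseline_and_segment(trial, fs=FS):
    """trial: (63 * fs, channels). Returns (60, fs, channels).

    Assumption: the baseline is the mean of the three 1 s pre-stimulus
    segments, subtracted sample by sample from each 1 s stimulus segment.
    """
    channels = trial.shape[1]
    baseline = trial[:BASELINE_SEC * fs].reshape(BASELINE_SEC, fs, channels).mean(axis=0)
    stimulus = trial[BASELINE_SEC * fs:].reshape(TRIAL_SEC, fs, channels)
    return stimulus - baseline

rng = np.random.default_rng(0)
# Synthetic stand-in for one subject: 40 trials, 63 s at 128 Hz, 32 channels.
trials = rng.standard_normal((40, 63 * FS, 32), dtype=np.float32)
raw = np.concatenate([remove_baseline_and_segment(tr) for tr in trials])
print(raw.shape)   # (2400, 128, 32)
```

The printed shape matches the 2400 × 128 × 32 RAW feature dimensions given in the text, since 40 trials × 60 segments = 2400 samples.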

Then, based on each video's emotion ratings in the range 1-9, the ratings on valence and arousal are divided into two classes with the median value 5 as the threshold. When a binary classification problem is solved on a given dimension, a rating greater than 5 denotes the high class (positive index) and is represented by 1; a rating less than or equal to 5 denotes the low class (negative index) and is represented by 0. The data and labels are then balanced so that the two classes contain the same number of EEG samples and labels; here the data are the RAW, NORM and PSD features, and the labels are the corresponding ratings of those features.
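The thresholding and class balancing can be sketched as follows, assuming NumPy. The text does not name the balancing technique, so random undersampling of the larger class is an assumption here, and the ratings and feature array are made-up illustrative values.

```python
import numpy as np

def binarize(ratings, threshold=5.0):
    """Ratings in 1-9: greater than threshold -> 1 (high), otherwise 0 (low)."""
    return (np.asarray(ratings) > threshold).astype(int)

def balance(features, labels, seed=0):
    """Randomly undersample the larger class so both classes are equal in size.

    Undersampling is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    idx0 = np.flatnonzero(labels == 0)
    idx1 = np.flatnonzero(labels == 1)
    n = min(idx0.size, idx1.size)
    keep = np.sort(np.concatenate([rng.choice(idx0, n, replace=False),
                                   rng.choice(idx1, n, replace=False)]))
    return features[keep], labels[keep]

valence = np.array([7.1, 5.0, 3.2, 8.4, 6.0, 2.5, 9.0, 5.5])  # illustrative ratings
labels = binarize(valence)                # [1, 0, 0, 1, 1, 0, 1, 1]
features = np.arange(16).reshape(8, 2)    # stand-in for RAW/NORM/PSD samples
bal_x, bal_y = balance(features, labels)  # 3 samples per class remain
```

Note that a rating of exactly 5.0 falls in the low class, matching the "less than or equal to 5" rule in the text.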

Further, step 3 and step 4 are shown in the right half of FIG. 1, where the EEG electrode map shows the distribution of electrode positions on a cap commonly used in BCI systems; because different BCI systems record different numbers of EEG channels, their electrode distributions differ. The sensor readings acquired by the EEG acquisition system form an EEG time series at a given sampling frequency. Typically, the raw EEG signal acquired at time point t is represented by a one-dimensional data vector X_t = [x_t^1, x_t^2, ..., x_t^n], where n is the total number of channels of the acquisition system and x_t^n is the reading of the n-th electrode channel at the t-th time point. For the observation period [t, t+N-1] there are N such one-dimensional data vectors, each containing n elements, one for the reading of each electrode on the cap.

It can be seen from the EEG scalp electrode map that each electrode is physically adjacent to several other electrodes and measures the EEG signal of one region of the brain, and that different brain regions correspond to different brain activities. A one-dimensional chain-like EEG data vector can only represent the correlation between two adjacent electrode positions. Step 3 therefore converts the one-dimensional chain-like vector sequence of EEG PSD features into a two-dimensional mesh matrix sequence whose matrix structure corresponds to the brain regions of the EEG electrode positions, and thus better represents the spatial correlation among the EEG signals of several physically adjacent electrodes. Specifically:

Based on the spatial positions of the electrodes of the EEG acquisition system, the invention converts the 32-channel one-dimensional EEG data vector X_t into the two-dimensional mesh matrix Y_t shown in FIG. 2, where t denotes a specific time point; unused electrode positions are set to 0 and play no role in the neural network. Through this conversion, the one-dimensional EEG vector sequence [X_t, X_{t+1}, ..., X_{t+N-1}] over the observation period [t, t+N-1] becomes the two-dimensional matrix sequence [Y_t, Y_{t+1}, ..., Y_{t+N-1}]; the number of two-dimensional mesh matrices is still N. The non-zero data in each two-dimensional matrix are normalized by the Z-score algorithm. The resulting two-dimensional mesh matrix sequence thus contains both temporal information and the spatial information of brain activity at each time point.

The specific steps of step 4 are as follows: as shown in the last step of FIG. 1, a sliding window divides the two-dimensional mesh matrix sequence into individual segments P_j; each segment has a fixed length (the window size), and adjacent segments do not overlap. Specifically, P_j = [Y_t, Y_{t+1}, ..., Y_{t+s-1}], where s is the window size, i.e. the number of sampling points; j = 1, 2, ..., q; and q is the number of EEG sample segments into which the observation period is divided.

Further, the CASC_CNN_LSTM model in step 5 is implemented as follows. The CASC_CNN_LSTM model is constructed as shown in FIG. 3. Its input is the two-dimensional mesh matrix sequence preprocessed as described above (e.g. a sample P_j), a three-dimensional data structure containing spatial and temporal information. First, a CNN network extracts deep spatial features of the EEG data from each two-dimensional mesh matrix; the extracted sequence of deep spatial features is then input into an LSTM network, which further extracts temporal features of the EEG data. Finally, a fully connected layer receives the output of the LSTM at the last time point, and the resulting deep space-time features are input into a softmax layer for the final emotion category prediction. Specifically:

To extract the deep spatial features of each two-dimensional mesh matrix, a deep two-dimensional CNN network, shown in FIG. 3, is used for spatial feature learning.

First, the two-dimensional mesh matrix sequence of each subject's EEG features is obtained, and the cascaded hybrid CASC_CNN_LSTM model is constructed as shown in FIG. 3. The j-th EEG segment input to the model is the two-dimensional mesh matrix sequence P_j = [Y_t, Y_{t+1}, ..., Y_{t+s-1}] ∈ R^{s×h×w}, which contains s two-dimensional mesh matrices Y_k (k = t, t+1,...

Each mesh matrix is input into a 2D-CNN network, which learns the corresponding deep spatial feature representation Z_k (k = t, t+1, ..., t+s-1): Z_k = CNN_2D(Y_k), Z_k ∈ R^l. Z_k is a one-dimensional feature vector containing l elements, so the input EEG matrix sequence is converted into a sequence of deep spatial feature vectors: Casc-CNN: P_j → Q_j, where Q_j = [Z_t, Z_{t+1}, ..., Z_{t+s-1}] ∈ R^{s×l}. The 2D-CNN model comprises 4 convolution layers, which use 16, 32, 64 and 128 convolution kernels of size 3×3, respectively, and perform unpadded convolution; each layer uses the ReLU activation function, and the network is trained with the Adam optimizer at a learning rate of 0.0005. After learning, the first convolution layer yields 16 feature maps of size 7×7, the second 32 feature maps of size 5×5, the third 64 feature maps of size 3×3, and the fourth 128 feature maps of size 1×1; a fully connected layer with 128 neurons then converts the 128 feature maps into the final deep spatial feature representation Z_k ∈ R^128.
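The feature-map sizes quoted above follow from the rule that an unpadded, stride-1 convolution with a k×k kernel shrinks each side by k - 1. The short sketch below traces this arithmetic. The 9×9 input grid is not stated in this passage; it is inferred here from the 7×7 maps reported after the first layer, and the function names are hypothetical.

```python
def valid_conv_size(size, kernel=3):
    """Output side length of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def trace_cnn(input_size, filters=(16, 32, 64, 128), kernel=3):
    """Return [(n_maps, side), ...] after each convolution layer."""
    shapes, side = [], input_size
    for n_maps in filters:
        side = valid_conv_size(side, kernel)
        shapes.append((n_maps, side))
    return shapes

# 9x9 input inferred from the 7x7 maps reported after the first layer.
shapes = trace_cnn(9)
print(shapes)   # [(16, 7), (32, 5), (64, 3), (128, 1)]
```

After the fourth layer each of the 128 maps is a single value, so the final fully connected layer with 128 neurons receives a 128-element input.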

Then the deep spatial feature sequence Q_j is input into the single-layer bidirectional LSTM model shown in FIG. 3 to further compute the time-domain correlation features of the EEG segment. Through forward and backward sequence propagation, the bidirectional LSTM can capture the relationship between the EEG at a given time point and the EEG both before and after it, which makes the extracted features more objective and accurate. Each propagation direction of the model contains s LSTM units. The hidden state of an LSTM unit at the current time point t is denoted h_t, and h_{t-1} denotes the hidden state at the previous time point t-1; information from the previous time point in the same layer is passed to the current time point and so on, influencing the final output. The invention takes the hidden state of each LSTM unit as its output: the forward LSTM units produce the hidden state sequence [h_t, h_{t+1}, ..., h_{t+s-1}] and the backward LSTM units produce [h'_{t+s-1}, ..., h'_{t+1}, h'_t]. Since the quantity of interest is the emotion class of the brain over the whole sample period, the output at the last processed time point, obtained after the LSTM has seen every time point in the window, is taken from each direction, namely h_{t+s-1} and h'_t. These two are concatenated along the feature dimension, denoted H_j, and fed into the next fully connected layer as the temporal feature learned by the whole LSTM network, as shown in the final stage of FIG. 3.

The temporal feature H_j of the EEG mesh matrix sequence P_j is thus expressed as: Casc-LSTM: h_{t+s-1}, h'_t = RNN_lstm(Q_j), H_j = [h_{t+s-1}, h'_t] ∈ R^{2i}, where i is the size of the hidden state of one LSTM unit. Finally, a softmax layer after the fully connected layer generates the final probability prediction for each emotion category: Softmax: C_j = SoftMax(H_j), C_j ∈ R^k, where k is the number of emotion categories the model is to identify.
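To make the construction of H_j concrete, the following sketch runs a hand-written LSTM cell over a feature sequence in both directions and concatenates the two final hidden states, assuming NumPy. It is a forward pass with random, untrained weights, not the trained model of the invention. The gate ordering, the weight initialization, the sizes s = 8 and i = 32, and the use of a single linear layer in place of the fully connected layer before softmax are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(seq, W, U, b):
    """Run one LSTM direction over seq (s, l); return the final hidden state.

    W: (4i, l), U: (4i, i), b: (4i,), gates stacked as
    [input, forget, candidate, output].
    """
    i = U.shape[1]
    h, c = np.zeros(i), np.zeros(i)
    for z in seq:
        gates = W @ z + U @ h + b
        in_g, f_g = sigmoid(gates[:i]), sigmoid(gates[i:2 * i])
        cand, out_g = np.tanh(gates[2 * i:3 * i]), sigmoid(gates[3 * i:])
        c = f_g * c + in_g * cand
        h = out_g * np.tanh(c)
    return h

def init_params(rng, l, i):
    return (0.1 * rng.standard_normal((4 * i, l)),
            0.1 * rng.standard_normal((4 * i, i)),
            np.zeros(4 * i))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
s, l, i, k = 8, 128, 32, 2           # window, feature size, hidden size, classes
Q_j = rng.standard_normal((s, l))    # stand-in for the CNN feature sequence
fwd, bwd = init_params(rng, l, i), init_params(rng, l, i)

h_fwd = lstm_last_hidden(Q_j, *fwd)          # h_{t+s-1}
h_bwd = lstm_last_hidden(Q_j[::-1], *bwd)    # h'_t
H_j = np.concatenate([h_fwd, h_bwd])         # shape (2i,)

W_fc = 0.1 * rng.standard_normal((k, 2 * i))  # stand-in for the FC layer
C_j = softmax(W_fc @ H_j)                     # class probabilities, shape (k,)
```

Reversing the sequence for the backward direction means its final state corresponds to the first time point t, which is why h'_t is the one concatenated with h_{t+s-1}.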

In summary, the processing of the one-dimensional EEG PSD feature sequence by the feature conversion and the CASC_CNN_LSTM model can be described as: Input(s×n) - Trans(s×h×w) - Conv(s×h×w×16) - Conv(s×h×w×32) - Conv(s×h×w×64) - Conv(s×h×w×128) - FC(l) - LSTM(s×2i) - FC(l) - softmax(k), where Input(s×n) denotes a one-dimensional EEG PSD feature sequence with segment size s and n channels; Trans(s×h×w) denotes the conversion of the one-dimensional PSD feature sequence into a sequence of s mesh matrices of size h×w; Conv(s×h×w×m) denotes a convolution layer that learns m feature maps from the mesh matrices; FC(l) denotes a fully connected layer with l neurons; LSTM(s×2i) denotes hidden layers with s LSTM units in each of the forward and backward propagation directions, each learning a hidden state of size i; and softmax(k) denotes the softmax layer that predicts k emotion categories.

In step 5, the CASC_CNN_CNN model is constructed; it further extracts deeper deep spatial features of the EEG signals from each segment P_j and inputs them into a softmax layer for emotion classification prediction, broadly as follows. The CASC_CNN_CNN model is constructed as shown in FIG. 4. Its input is the same as that of CASC_CNN_LSTM: a three-dimensional data tensor containing spatial and temporal information. A CNN network first extracts deep spatial features from the input; the extracted deep spatial features are then rearranged in time order and input into a second CNN network, which extracts deeper-level deep spatial features on top of those extracted the first time. Finally, a fully connected layer receives the output of the second CNN network, and the resulting feature vector is input into a softmax layer for the final emotion category prediction;

Specifically, as shown in FIG. 4, the CASC_CNN_CNN model is constructed. The two-dimensional mesh matrix sequence P_j is input into a CNN network (CNN I) composed of four convolution layers and a fully connected layer and converted into a sequence Q_j of deep spatial feature vectors, i.e. Q_j = [Z_t, Z_{t+1}, ..., Z_{t+s-1}] ∈ R^{s×l}. Q_j is arranged into an s×l matrix, where s is the length of the vector sequence and l is the number of elements in each vector, and a second network, CNN II (simply another CNN network), extracts deeper deep spatial features from it. CNN II consists of two convolution layers, two pooling layers and a fully connected layer; each pooling layer follows one of the convolution layers, and the output of the second pooling layer is flattened into a vector and connected to a fully connected layer with 512 neurons. Finally, the fully connected layer is connected to a softmax layer that generates the probability prediction for each emotion category. The two convolution layers use 32 and 64 convolution kernels of size 3×3, respectively, and perform padded convolution; the two pooling layers use max-pooling filters of size 2×2 with a stride of 2 to downsample the convolution results. The convolution layers and the fully connected layer use the ReLU activation function, and the network is trained with the Adam optimizer at a learning rate of 0.0001.
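The size of the vector that reaches the 512-neuron fully connected layer can be traced with simple arithmetic: padded 3×3 convolutions leave the height and width unchanged, and each 2×2, stride-2 max pooling halves them. In the sketch below, l = 128 follows from Z_k ∈ R^128 in the text, while s = 8 is only an illustrative window size; floor division for odd sizes is an assumption, and the function name is hypothetical.

```python
def trace_cnn2(s, l, filters=(32, 64), pool=2):
    """Trace CNN II feature-map shapes for an s x l input matrix.

    Returns ([(n_maps, height, width), ...] after each conv+pool stage,
    flattened size after the last stage).
    """
    h, w, shapes = s, l, []
    for n_maps in filters:
        # Padded convolution keeps (h, w); pooling halves both sides.
        h, w = h // pool, w // pool
        shapes.append((n_maps, h, w))
    n_maps, h, w = shapes[-1]
    return shapes, n_maps * h * w

shapes, flat = trace_cnn2(s=8, l=128)   # s = 8 is illustrative only
print(shapes, flat)   # [(32, 4, 64), (64, 2, 32)] 4096
```

With these illustrative sizes, the flattened output of the second pooling layer has 4096 elements, which the fully connected layer reduces to 512.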

In summary, the processing of the one-dimensional EEG PSD feature sequence by the feature conversion and the CASC_CNN_CNN model can be described as: Input(s×n) - Trans(s×h×w) - Conv(s×h×w×16) - Conv(s×h×w×32) - Conv(s×h×w×64) - Conv(s×h×w×128) - FC(l) - Cat(b) - Trans(s×h×w) - Conv(s×h×w×32) - Pooling(max,2) - Conv(s×h×w×64) - Pooling(max,2) - FC(l) - softmax(k), where Input(s×n) denotes a one-dimensional PSD feature sequence with segment size s and n channels; Trans(s×h×w) denotes the conversion of a one-dimensional sequence into a sequence of s mesh matrices of size h×w; Conv(s×h×w×m) denotes a convolution layer that learns m feature maps from the mesh matrices; Cat(b) denotes the concatenation of b vectors in time order; Pooling(max,2) denotes a max-pooling layer with a stride of 2 and a 2×2 kernel; FC(l) denotes a fully connected layer with l neurons; and softmax(k) denotes the softmax layer that predicts k emotion categories.

According to the method, the two-dimensional mesh PSD features are taken as the input of the CASC_CNN_LSTM and CASC_CNN_CNN models, which obtain emotion classification accuracies of 93.15% and 92.37%, respectively. Both hybrid models outperform several reference models and the latest methods in classification performance, indicating that the deep spatial and temporal information learned by the method is critical for improving EEG emotion classification performance.

Finally, it should be noted that the above examples are only specific embodiments of the present invention and are not intended to limit its scope; the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that any person skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. An electroencephalogram space-time feature learning and emotion classification method based on a hybrid neural network, characterized by comprising the following steps:

Step 1: collecting electroencephalogram signals of a plurality of channels;

Jointly extracting deeper-level deep spatial features of the electroencephalogram signal from each segment P_j through the CASC_CNN_CNN model; and inputting the deeper spatial features extracted by the CASC_CNN_CNN model into the softmax layer corresponding to the CASC_CNN_CNN model for emotion category prediction;

In step 5, the CASC_CNN_LSTM model is constructed, deep space-time features of the electroencephalogram signal are jointly extracted from each segment P_j, and the extracted deep space-time features are input into a softmax layer for emotion category prediction, with the following specific steps:

Then the sequence representing deep spatial features is input into a bidirectional LSTM model, each propagation direction of which comprises s LSTM units. The hidden state of each LSTM unit at the current time point t is denoted h_t, and h_{t-1} denotes the hidden state at the previous time point t-1; the information of the previous time point in the same layer is passed to the current time point, and so on, thereby affecting the final output. The hidden state of each LSTM unit is taken as the output of that unit, so that the forward LSTM units output the hidden state sequence [h_t, h_{t+1}, ..., h_{t+s-1}] and the reverse LSTM units output the hidden state sequence [h'_{t+s-1}, ..., h'_{t+1}, h'_t]. After the LSTM units have learned from all time points in the whole window, the outputs of the last time point in each of the two directions, h_{t+s-1} and h'_t, are taken; h_{t+s-1} and h'_t are concatenated along the dimension describing the size of the feature vector and recorded as H_j, the temporal feature learned by the whole LSTM network. H_j is passed to the following fully connected layer, after which the final softmax class probability prediction is generated;

In step 5, the CASC_CNN_CNN model is constructed, the CASC_CNN_CNN model extracts deeper-level deep spatial features of the electroencephalogram signal from each segment P_j, and the extracted deep spatial features are input into a softmax layer for emotion category prediction, with the following specific steps:

2. The electroencephalogram space-time feature learning and emotion classification method based on a hybrid neural network according to claim 1, characterized in that the specific steps of step 1 are as follows:

3. The electroencephalogram space-time feature learning and emotion classification method based on a hybrid neural network according to claim 2, wherein step 1 further comprises preprocessing the acquired electroencephalogram signals, the preprocessing comprising: filtering the data with a 4-45 Hz band-pass filter, and removing electro-oculogram interference using a blind source separation technique.

4. The electroencephalogram space-time feature learning and emotion classification method based on a hybrid neural network according to claim 2, wherein the specific steps of step 2 are as follows:

5. The electroencephalogram space-time feature learning and emotion classification method based on a hybrid neural network according to claim 1, wherein the specific steps of step 3 are as follows:

6. The electroencephalogram space-time feature learning and emotion classification method based on a hybrid neural network according to claim 1, wherein the specific steps of step 4 are as follows: