CN119888818B - Emotion recognition method and system based on human micro-movements - Google Patents

Emotion recognition method and system based on human micro-movements

Info

Publication number
CN119888818B
Authority
CN
China
Prior art keywords
micro
motion
target
features
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411967118.1A
Other languages
Chinese (zh)
Other versions
CN119888818A (en)
Inventor
余初然
何召锋
茹一伟
吴惠甲
杨胡江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202411967118.1A
Publication of CN119888818A
Application granted
Publication of CN119888818B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/54 Extraction of image or video features relating to texture
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

This disclosure provides a method and system for emotion recognition based on human micro-movements, belonging to the field of micro-movement recognition technology. The method comprises: performing feature fusion based on target video features, target skeleton features, and target amplitude features to obtain comprehensive features of the target; obtaining micro-movement features of the target based on the comprehensive features; and classifying the micro-movement features to perform emotion recognition on the target. The method and system for emotion recognition based on human micro-movements provided by this disclosure can improve the accuracy of micro-movement recognition.

Description

Emotion recognition method and system based on human micro-actions
Technical Field
The disclosure belongs to the technical field of micro-motion recognition, and more particularly relates to a method and a system for emotion recognition based on human micro-motion.
Background
With the rapid development of computer vision technology, human action recognition (HAR) has become an important research direction in this field. Traditional HAR tasks focus mainly on large-scale actions in video, such as walking and jumping. In recent years, however, micro-action recognition (MAR) has begun to receive increasing attention as a new research direction. Micro-actions are subtle, unconscious actions, typically manifested as brief body changes, gestures or facial expressions, that reflect an individual's emotion, mental state or intent. The challenge of the MAR task is that micro-actions are typically small in amplitude, fast in execution, and often occur intermittently across different body parts, so conventional HAR models have difficulty capturing them accurately.
Disclosure of Invention
The invention aims to provide a human micro-motion based emotion recognition method and system so as to improve accuracy of micro-motion recognition.
In a first aspect of an embodiment of the present disclosure, there is provided a method for emotion recognition based on human micro-actions, including:
performing feature fusion based on the target video features, the target skeleton features and the target amplitude features of the recognition target to obtain comprehensive features of the recognition target;
obtaining micro-action features of the recognition target based on the comprehensive features; and
classifying based on the micro-action features to perform emotion recognition on the recognition target.
In a second aspect of embodiments of the present disclosure, there is provided a human micro-action based emotion recognition system, including:
a feature fusion module, configured to perform feature fusion based on the target video features, the target skeleton features and the target amplitude features of the recognition target to obtain comprehensive features of the recognition target;
a micro-action recognition module, configured to obtain micro-action features of the recognition target based on the comprehensive features; and
an emotion recognition module, configured to classify based on the micro-action features and perform emotion recognition on the recognition target.
In a third aspect of the disclosed embodiments, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and running on the processor, where the processor implements the steps of the human micro-action based emotion recognition method described above when the computer program is executed.
In a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program, which when executed by a processor, implements the steps of the human micro-action based emotion recognition method described above.
The emotion recognition method and system based on human micro-actions have the beneficial effects that the accurate capture of the comprehensive features of the recognition targets is realized by fusing the target video features, the target skeleton features and the target amplitude features. Firstly, the complementarity of different modal information is fully utilized by integrating the target video features, the target skeleton features and the target amplitude features to carry out fusion, the action details of the identification target can be comprehensively and accurately captured, the limitation of single features is avoided, and the accuracy and the reliability of micro-motion identification are greatly improved. And secondly, the micro-action features are further extracted based on the comprehensive features, so that focusing on key features of the micro-actions is facilitated, differences among the micro-actions are highlighted, and a more distinguishing basis is provided for subsequent classification. Finally, efficient emotion recognition is performed based on the micro-motion features, and recognition accuracy and reliability are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required for the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 is a flowchart of a human micro-action based emotion recognition method according to an embodiment of the present disclosure;
Fig. 2 is a schematic view of a joint-point motion amplitude image provided in an embodiment of the present disclosure;
Fig. 3 is a block diagram of a human micro-action based emotion recognition system according to an embodiment of the present disclosure;
Fig. 4 is a schematic block diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings.
Referring to fig. 1, fig. 1 is a flowchart illustrating a human micro-motion based emotion recognition method according to an embodiment of the disclosure, where the method includes:
S101: Perform feature fusion based on the target video features, the target skeleton features and the target amplitude features of the recognition target to obtain the comprehensive features of the recognition target.
In this embodiment, the target video features may include visual information in the video, such as dynamic changes in motion, representation of color, texture, etc. over time, extracted from the original video data by video processing techniques or pre-trained models.
The target skeleton features refer to the positions and connections of the human body's joint points. Skeleton features reflect the posture and motion trajectory of the human body and help capture the basic framework of its posture and movement; they can be extracted from the video by a keypoint detection algorithm.
The target amplitude features mainly reflect the amplitude of the motion, that is, the displacement amplitude of the joint points between different frames, and thus convey the motion intensity and timing information of a micro-action. Rich amplitude features can be obtained by calculating the displacement amplitude of each joint point between adjacent frames and encoding it in image form. The target amplitude features help capture the subtle changes and rapid onset of micro-actions.
In this embodiment, the target video feature, the target skeleton feature and the target amplitude feature are fused to obtain the comprehensive feature of the recognition target.
This embodiment may fuse the features by concatenating the feature vectors, by weighted summation, or by the attention mechanism of a deep learning model (such as a Transformer), which automatically learns the associations and importance weights among the different features, yielding a high-dimensional feature representation that integrates video, skeleton and amplitude information.
Illustratively, the comprehensive features of the recognition target may be obtained by fusion with a Transformer model.
Suppose we have a video in which a person is performing a hand micro-action. After feature extraction, we obtain the target video features $F_v$, the target skeleton features $F_s$ and the target amplitude features $F_a$. Then $F_v$, $F_s$ and $F_a$ are concatenated into a multimodal vector $X = [F_v; F_s; F_a]$, which is input into the Transformer model. The self-attention mechanism of the Transformer can automatically focus on the associations between the features of different modalities. For example, when a micro-action of hand tension is recognized, the model can discover the close connection among the tensed state of the hand muscles in the video features, the characteristic angle changes of the finger joints in the skeleton features, and the rapid, tiny amplitude changes in the motion amplitude features. After multi-layer processing, comprehensive features that accurately represent the hand micro-action are obtained and used for subsequent tasks such as micro-action recognition and emotion analysis.
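The fusion step described above can be sketched as follows. This is a minimal illustration, not the patented model: it assumes each modality has already been reduced to one feature vector of a common dimension, stacks the three vectors as a token sequence, and applies a single-head scaled dot-product self-attention with identity projections (a trained Transformer would use learned query, key and value matrices and several heads). The feature values and the mean-pooling at the end are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), to keep the sketch dependency-free.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                        for c in range(d)])
    return outputs, weights

# Hypothetical per-modality feature vectors (values are made up).
video_feat = [0.9, 0.1, 0.3, 0.5]
skeleton_feat = [0.2, 0.8, 0.4, 0.1]
amplitude_feat = [0.7, 0.2, 0.6, 0.3]

# Stack the three modalities as one multimodal token sequence.
tokens = [video_feat, skeleton_feat, amplitude_feat]
fused_tokens, attn = self_attention(tokens)

# Mean-pool the attended tokens into one comprehensive feature vector.
fused = [sum(col) / len(col) for col in zip(*fused_tokens)]
```

Each row of `attn` shows how strongly one modality attends to the others, which is the cross-modal association the paragraph refers to.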
S102: Obtain micro-action features of the recognition target based on the comprehensive features.
In this embodiment, the micro-action features of the recognition target can be obtained through multi-layer processing by the Transformer model.
Illustratively, each Transformer layer comprises a multi-head self-attention mechanism, residual connections, layer normalization and a feed-forward neural network. Through a stack of N such layers, the input features are progressively refined and abstracted, and the micro-action features of the recognition target are gradually extracted. Each layer mines deeper information from the features on the basis of the previous layer, and the final output $F_{micro}$ accurately reflects the micro-action characteristics of the recognition target, supporting subsequent tasks such as classification and emotion recognition. The micro-action features $F_{micro}$ of the recognition target are expressed as:
$$F_{micro} = \mathrm{Transformer}_N(X)$$
where $\mathrm{Transformer}_N$ denotes the combination of N Transformer layers and $X$ denotes the multimodal vector.
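For reference, one layer of such a stack can be written out as below. The text names the components of a layer but not their order, so the post-normalization arrangement of the original Transformer is assumed here; the symbols $H^{(l)}$, $Z^{(l)}$, $Q$, $K$, $V$, $d_k$ and $W^O$ are introduced only for this illustration.

```latex
\begin{aligned}
H^{(0)} &= X \\
Z^{(l)} &= \mathrm{LayerNorm}\!\left(H^{(l-1)} + \mathrm{MultiHead}\!\left(H^{(l-1)}\right)\right) \\
H^{(l)} &= \mathrm{LayerNorm}\!\left(Z^{(l)} + \mathrm{FFN}\!\left(Z^{(l)}\right)\right), \qquad l = 1, \dots, N \\
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{MultiHead}(H) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
\end{aligned}
```

Under this assumption, the output of the last layer, $H^{(N)}$, plays the role of the micro-action features.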
S103: Classify based on the micro-action features to perform emotion recognition on the recognition target.
In this embodiment, classification is performed on the extracted micro-action features to achieve emotion recognition of the recognition target. For classification, an emotion classification model can be established in advance and trained on a large amount of labeled data so that it learns the mapping between different micro-action features and emotion categories. In practical application, the extracted micro-action features are input into the trained model, which compares the input features with the learned patterns and outputs the most probable emotion category, thereby determining the emotional state of the recognition target.
From the above, the present embodiment achieves accurate capturing of the comprehensive characteristics of the recognition target by fusing the target video characteristics, the target skeleton characteristics and the target amplitude characteristics. Firstly, the complementarity of different modal information is fully utilized by integrating the target video features, the target skeleton features and the target amplitude features to carry out fusion, the action details of the identification target can be comprehensively and accurately captured, the limitation of single features is avoided, and the accuracy and the reliability of micro-motion identification are greatly improved. And secondly, the micro-action features are further extracted based on the comprehensive features, so that focusing on key features of the micro-actions is facilitated, differences among the micro-actions are highlighted, and a more distinguishing basis is provided for subsequent classification. Finally, efficient emotion recognition is performed based on the micro-motion features, and recognition accuracy and reliability are improved.
In one embodiment of the present disclosure, the method further comprises:
performing feature extraction on a target video of the recognition target based on a convolutional neural network to obtain image features of each frame of image;
obtaining the target video features based on the image features of each frame of image.
In this embodiment, each frame of the target video of the recognition target is taken as a processing unit and input into a pre-trained convolutional neural network $\mathrm{CNN}$.
The pre-trained convolutional neural network has been trained in advance on a large amount of image data and has strong image feature extraction capability. For example, it can identify basic visual elements such as edges, textures and shapes in an image, and progressively abstract and combine them through structures such as convolutional layers and pooling layers to form higher-level image feature representations.
When processing the image frame $I_t$ at time t of the video, the $\mathrm{CNN}$ performs convolution operations on different regions of the image according to its convolution kernel parameters and network structure to extract local features, then reduces dimensionality through the pooling layers, which lowers the amount of computation while retaining key information, and finally outputs the high-dimensional image feature $f_t$.
The image feature of each frame is expressed as:
$$f_t = \mathrm{CNN}(I_t)$$
In this embodiment, the above operation is performed on every frame of the video to obtain a time series of image features $\{f_1, f_2, \ldots, f_T\}$, where $T$ is the number of frames. This set of image features forms the target video features. It contains not only the static visual information of each frame but also, because the features are extracted in temporal order, the changes of the action along the time dimension, and it provides important basic data for subsequent tasks such as micro-action recognition and emotion analysis.
As can be seen from the above, this embodiment extracts features through a convolutional neural network, so that detailed information in each frame, such as object shape and texture, can be accurately captured, providing rich material for subsequent analysis. In addition, the target video features are formed by integrating the image features of all frames, which preserves the key information of each single frame and incorporates the dynamic changes of the video, so that the resulting video features are more comprehensive and representative and improve the accuracy and reliability of subsequent micro-action recognition and related analysis.
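The per-frame pipeline of convolution, pooling and flattening can be illustrated with a toy example. This is only a sketch under stated assumptions: a single hand-set 3x3 vertical-edge kernel stands in for the learned filters of a pre-trained network, the frames are tiny synthetic 6x6 grayscale images, and the function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation of a grayscale image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; reduces each spatial dimension by `size`."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def frame_feature(frame, kernel):
    """Convolve, pool, then flatten into one feature vector for the frame."""
    pooled = max_pool(conv2d(frame, kernel))
    return [v for row in pooled for v in row]

# Hand-set vertical-edge filter; a stand-in for learned CNN weights.
EDGE_KERNEL = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

def make_frame(edge_col, size=6):
    """Synthetic frame: dark on the left, bright from column `edge_col` onward."""
    return [[1.0 if c >= edge_col else 0.0 for c in range(size)]
            for _ in range(size)]

# Three frames in which the bright region's edge moves to the right over time.
video = [make_frame(2), make_frame(3), make_frame(4)]

# Time-ordered list of per-frame features, i.e. the video feature sequence.
video_features = [frame_feature(frame, EDGE_KERNEL) for frame in video]
```

Because the features are kept in frame order, the shift of the edge response from one frame to the next is visible in `video_features`, which is the temporal information the paragraph describes.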
In one embodiment of the present disclosure, the method further comprises:
extracting joint point data of the recognition target based on the target video;
generating two-dimensional skeleton data of the recognition target based on the joint point data;
analyzing the two-dimensional skeleton data based on a depth estimation algorithm to obtain three-dimensional skeleton data of the recognition target;
performing feature extraction on the three-dimensional skeleton data to obtain the target skeleton features of the recognition target.
In this embodiment, MediaPipe may be used to extract the joint points of the human body from the target video and generate two-dimensional skeleton data (2D skeleton), which identifies the positions of the human joints in each video frame and determines the two-dimensional coordinates of each joint point, thereby constructing the basic structural framework of the human body. It is expressed as:
$$S_{2D} = \{(x_i, y_i)\}_{i=1}^{J}$$
where $(x_i, y_i)$ are the two-dimensional coordinates of the $i$-th joint point in the image frame and $J$ is the total number of joint points.
Then the two-dimensional skeleton data is processed by a depth estimation algorithm based on a convolutional neural network, which predicts the depth information of each joint point and thereby generates three-dimensional skeleton data, expressed as:
$$S_{3D} = \{(x_i, y_i, z_i)\}_{i=1}^{J}$$
where $(x_i, y_i, z_i)$ are the three-dimensional coordinates of the $i$-th joint point and $z_i$ is the depth information.
In this embodiment, the target video may be a conventional monocular RGB video. Combining the RGB video with the depth information yields RGB+D skeleton data (i.e., three-dimensional skeleton data), expressed as:
$$S_{RGB+D} = (V_{RGB}, D)$$
where $V_{RGB}$ is the RGB video data and $D$ is the estimated depth information. The resulting $S_{RGB+D}$ representation provides richer spatial and depth information, which helps capture the details of micro-actions.
In this embodiment, based on the RGB+D skeleton data extracted with MediaPipe, the skeleton data of each frame may be processed using a deep neural network (e.g., a Transformer or an MLP network). The deep neural network can learn the spatial structure of the three-dimensional skeleton data and how it changes over time, and thereby extract the target skeleton features of the recognition target. The target skeleton features reflect the posture changes and joint movement patterns of the human body during a micro-action and provide key basic data for micro-action recognition, emotion analysis and the like.
Illustratively, based on the RGB+D skeleton data extracted with MediaPipe, the RGB+D skeleton data of each frame is processed through a deep neural network (such as a Transformer or a simple MLP network) to obtain the target skeleton features:
$$F_s^t = \mathrm{DNN}(S_{3D}^t)$$
where $S_{3D}^t$ is the three-dimensional skeleton data of the frame at time t and $F_s^t$ is the extracted spatial structure feature.
As can be seen from the above, by extracting the joint point data and generating two-dimensional and three-dimensional skeleton data, this embodiment can build a precise body structure model of the recognition target and clearly present the basic framework of the action. Combining a depth estimation algorithm adds spatial depth information, giving a more three-dimensional understanding of the micro-action. Further extracting the target skeleton features focuses on the key joint changes of the action and effectively captures the posture transitions of fine movements, which provides accurate information for subsequent micro-action recognition and emotion analysis and improves the accuracy and reliability of the overall recognition.
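The step from two-dimensional to three-dimensional skeleton data amounts to attaching one estimated depth value to each joint. The sketch below shows only that data flow; `placeholder_depth` is a hypothetical stand-in for the trained depth estimation network, which is not specified here, and the type aliases and function names are illustrative.

```python
from typing import Callable, List, Tuple

Joint2D = Tuple[float, float]
Joint3D = Tuple[float, float, float]

def lift_to_3d(skeleton_2d: List[Joint2D],
               estimate_depth: Callable[[List[Joint2D]], List[float]]) -> List[Joint3D]:
    """Attach one estimated depth value z to each (x, y) joint."""
    depths = estimate_depth(skeleton_2d)
    if len(depths) != len(skeleton_2d):
        raise ValueError("one depth value is required per joint")
    return [(x, y, z) for (x, y), z in zip(skeleton_2d, depths)]

def placeholder_depth(skeleton_2d: List[Joint2D]) -> List[float]:
    """Hypothetical stand-in for a depth network: returns a constant depth."""
    return [1.0 for _ in skeleton_2d]

# Two joints in normalized image coordinates (made-up values).
skeleton_2d = [(0.10, 0.20), (0.30, 0.40)]
skeleton_3d = lift_to_3d(skeleton_2d, placeholder_depth)
```

Passing the estimator in as a callable keeps the lifting step independent of whichever depth model is actually used.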
In one embodiment of the present disclosure, the method further comprises:
calculating the displacement amplitude of the joint point data between two adjacent frames in the target video;
generating a motion amplitude image of the recognition target based on the displacement amplitude;
performing feature extraction on the motion amplitude image to obtain the target amplitude features.
In this embodiment, for the target video, the motion of the joint points is measured by calculating the displacement amplitude of the joint point data between two adjacent frames. The displacement of each joint point is calculated at different moments so as to reflect the dynamic characteristics of the action. The displacement amplitude information is then encoded in image form, with the motion amplitude of each joint point written into the corresponding image, to generate a motion amplitude image. This image integrates the motion intensity of all joint points over time and becomes a key data source for micro-action recognition.
Features may then be extracted from the motion amplitude image using a convolutional neural network or a Transformer model, which mines the hidden patterns and regularities in the image and converts the information in the motion amplitude image into more representative target amplitude features. These features capture the key information of micro-actions in terms of amplitude change; combined with other features (such as video features and skeleton features), they improve the ability to recognize micro-actions and support subsequent analysis and application.
As shown in fig. 2, for example, to capture the spatiotemporal dynamics of micro-actions, the motion amplitude image can be built from 33 key joint points detected with MediaPipe Pose (nose, left inner eye corner, left eye, left outer eye corner, right inner eye corner, right eye, right outer eye corner, left ear, center of the lips, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left palm root, right palm root, left thumb joint, right thumb joint, left index finger joint, right index finger joint, upper body center (crotch), left hip, right hip, left knee, right knee, left ankle, right ankle, left heel, right heel, left toe, right toe). For frame t of the video, the displacement amplitude of each joint point between two adjacent frames is calculated as:
$$A_i^t = \sqrt{(x_i^t - x_i^{t-1})^2 + (y_i^t - y_i^{t-1})^2 + (z_i^t - z_i^{t-1})^2}$$
where $x_i^t$ denotes the abscissa of the $i$-th keypoint at frame t and $x_i^{t-1}$ its abscissa at frame (t-1); $y_i^t$ denotes the ordinate of the $i$-th keypoint at frame t and $y_i^{t-1}$ its ordinate at frame (t-1); $z_i^t$ denotes the depth coordinate of the $i$-th keypoint at frame t and $z_i^{t-1}$ its depth coordinate at frame (t-1).
The motion amplitude of each joint point is then encoded in image form, expressed as:
$$M_t = [A_1^t, A_2^t, \ldots, A_J^t]$$
where $A_i^t$ is the motion amplitude of the $i$-th joint point at frame $t$ and $J$ is the total number of joint points. The motion amplitude information of all time steps is combined to generate a two-dimensional image $M$ as the motion amplitude image of the video. These amplitude images provide timing information about the intensity of the motion and are an important feature for micro-action recognition.
For the joint-point motion amplitude image $M_t$ of each frame, feature extraction is performed using a convolutional neural network (CNN) or a Transformer model to obtain the target amplitude feature of each frame:
$$F_a^t = \mathrm{CNN}(M_t)$$
where $M_t$ is the joint-point motion amplitude image of each frame and $F_a^t$ is the extracted target amplitude feature of each frame.
As can be seen from the above, by calculating the displacement amplitude of the joint points and generating the motion amplitude image, this embodiment presents the dynamic change of a micro-action along the time dimension in an intuitive form and provides a clear basis of amplitude change for subsequent analysis. Further extracting the target amplitude features captures the intensity and rhythm of the motion; combining these features with the features of the other modalities strengthens the overall perception of micro-actions, improves the accuracy and completeness of recognition, and provides more reliable data for applications of micro-action recognition.
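The amplitude computation described in this embodiment can be sketched in a few lines. This is a minimal illustration under assumptions: the displacement amplitude is taken as the Euclidean distance between a joint's 3D positions in adjacent frames, the image is laid out with one row per frame transition and one column per joint, and the 8-bit scaling is a hypothetical encoding choice. The joint coordinates are made up.

```python
import math

def displacement_amplitudes(prev_joints, curr_joints):
    """Euclidean displacement of every joint between two adjacent frames."""
    return [math.dist(p, c) for p, c in zip(prev_joints, curr_joints)]

def amplitude_image(frames):
    """Stack per-transition amplitudes: rows are time steps, columns are joints."""
    return [displacement_amplitudes(frames[t - 1], frames[t])
            for t in range(1, len(frames))]

def to_uint8(image):
    """Scale amplitudes to 0..255 so the matrix can be stored as a grayscale image."""
    peak = max(max(row) for row in image)
    if peak == 0:
        return [[0 for _ in row] for row in image]
    return [[round(255 * v / peak) for v in row] for row in image]

# Three frames, two joints each, as (x, y, z) coordinates (made-up values).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)],
    [(3.0, 4.0, 12.0), (1.0, 2.0, 1.0)],
]

amp = amplitude_image(frames)   # 2 transitions x 2 joints
gray = to_uint8(amp)            # same shape, integer pixel values
```

A video with T frames yields T-1 rows, since the first frame has no predecessor; the resulting matrix can then be fed to a CNN or Transformer as described above.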
In one embodiment of the present disclosure, classifying based on the micro-action features includes:
classifying the micro-actions based on a first formula;
the first formula is: $y = \mathrm{softmax}(W F_{micro} + b)$;
where $y$ is the classification output for the micro-action features, $W$ is a weight matrix, $F_{micro}$ is the micro-action features of the recognition target, $b$ is a bias term, and $\mathrm{softmax}$ is a probability distribution function.
In this embodiment, the micro-action features $F_{micro}$ of the recognition target are input into the classification model. The weight matrix $W$ applies a linear transformation to the micro-action features; by learning the strength of association between the different micro-action features and each category, it adjusts how the features are weighted across categories. The bias term $b$ shifts the result of the transformation so that it fits the data better.
After $W F_{micro} + b$ is computed, the result is converted into a probability distribution by the softmax function. The softmax function normalizes the output values to between 0 and 1 so that the probabilities of all categories sum to 1. The resulting $y$ represents the probability that the micro-action features belong to each category, so the category to which the micro-action most likely belongs can be determined from these probabilities, which completes the classification of the micro-action.
From the above, the present embodiment can convert the micro-motion feature into a clear classification probability distribution by combining the micro-motion feature with the weight matrix and the bias term and processing the micro-motion feature by the softmax function. This allows the classification process of micro-actions to be standardized and easily understood, enabling efficient determination of the category to which the micro-action belongs.
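The classification step described above, a linear transformation by a weight matrix and bias term followed by softmax, can be sketched in plain Python as follows. The function names, the 3-class setup, and all numeric values are hypothetical and chosen only to illustrate the computation; in practice the weight matrix and bias term would be learned during training.

```python
import math

def softmax(scores):
    """Numerically stable softmax: outputs lie in (0, 1) and sum to 1."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(weights, feature, bias):
    """Compute softmax(W x + b); return (probabilities, index of most likely class)."""
    logits = [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    return probs, max(range(len(probs)), key=probs.__getitem__)

# Hypothetical 3-class classifier over a 2-dimensional micro-motion feature.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, -1.0]
x = [2.0, 0.5]
probs, label = classify(W, x, b)
```

Subtracting the maximum score before exponentiation leaves the result unchanged while avoiding overflow for large scores.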
In one embodiment of the present disclosure, further comprising:
updating the micro-motion feature according to the weight of the micro-motion feature to obtain a target micro-motion feature;
Classifying based on the target micro-motion characteristics, and carrying out emotion recognition on the recognition target.
In this embodiment, the adjustment and update operations are performed on the weights of the micro-motion features.
For example, during training the weights are changed dynamically according to factors such as the feature distribution of the samples and the difference between the samples and their true labels, so that the parts of the micro-motion features that are more critical to emotion recognition are emphasized and possible noise or less relevant information is suppressed, thereby obtaining the target micro-motion features.
The weight-updated target micro-motion features are then input into the classification model for emotion recognition. Because the target micro-motion features have undergone weight optimization, they can be matched and compared more effectively with the existing emotion category model during classification, which improves the accuracy and reliability of emotion recognition so that the emotion judgment for the recognition target better reflects the actual situation.
In one embodiment of the present disclosure, updating the micro-motion feature with the weight of the micro-motion feature includes:
Updating the micro-motion feature based on the weight of the micro-motion feature by the second formula;
the second formula is: $L = \sum_{i} w_i \cdot L_i$;
where $L$ denotes the weighted loss function, $w_i$ denotes the weight of the i-th micro-motion feature sample, and $L_i$ denotes the cross-entropy loss of the i-th micro-motion feature sample.
In this embodiment, during training for micro-motion recognition, the local and fine-grained nature of micro-motions makes it difficult for a conventional uniform-weight training method to focus on the key information. By introducing the weighted loss function of the second formula, the weight of each micro-motion feature sample is determined dynamically according to joint motion amplitude and time sensitivity. Samples with larger motion amplitude and higher time criticality are given a higher weight $w_i$, so that the corresponding cross-entropy loss $L_i$ accounts for a larger share of the total loss function $L$.
In the back propagation process, the model can adjust the parameters according to the weighted loss function, so as to update the weight of the micro-motion characteristic. The model is more prone to learn micro-motion features which have larger influence on the recognition result in the learning process, and the weight distribution of the micro-motion features is gradually optimized, so that the recognition capability of the model on micro-motions is improved, the motion and fine motion changes of a key region can be more accurately captured, and noise and interference of irrelevant information are reduced.
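To make the effect on back propagation explicit, write the weighted loss as the sum over $N$ samples of $w_i L_i$, where $w_i$ is the weight and $L_i$ the cross-entropy loss of the i-th sample, and treat the weights as constants with respect to the model parameters $\theta$. The notation and the constant-weight treatment are assumptions of this sketch; the disclosure does not state whether gradients flow through the weights. By linearity of the gradient:

```latex
\nabla_\theta L \;=\; \nabla_\theta \sum_{i=1}^{N} w_i \, L_i \;=\; \sum_{i=1}^{N} w_i \, \nabla_\theta L_i
```

Under this reading, a sample with weight 2 contributes twice the gradient of an otherwise identical sample with weight 1, which is how highly weighted samples come to dominate the parameter updates.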
In micro-motion recognition, the motion amplitude and time sensitivity of the joint points carry key information. Joint points with larger motion amplitudes often play an important role in a micro-motion, since they may mark the start, a turning point, or a critical change point of the motion, and should therefore be given higher weights. In the time dimension, the joint motion at certain time points is closely related to the occurrence of key actions, so the joint point data at those time points have higher time sensitivity, deserve particular attention, and should have their weights increased accordingly.
With this dynamic weight calculation method based on joint motion amplitude and time sensitivity, the model can be guided to focus on the motion of key regions during training and recognition. For example, when the loss function is calculated, the weight adjustment makes the model attach more importance to the sample data corresponding to the key regions and strengthens the learning of this important information during parameter updates. This improves the model's understanding of micro-motions and its recognition accuracy, so that the detailed features and dynamic changes of micro-motions are captured better and interference from irrelevant or secondary information is reduced.
From the above, this embodiment dynamically adjusts the weight $w_i$ according to the different conditions of the samples, enabling the model to focus on the more critical micro-motion feature samples. Samples that have a greater impact on the recognition results are given higher weights to strengthen the learning of important information during training. This helps improve the recognition precision of the model, reduces interference from unimportant or noisy samples, lets the model capture the nuances of micro-motions more efficiently, and improves the performance of the micro-motion recognition system as a whole.
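A minimal sketch of such a weighted loss is given below: each sample's cross-entropy loss is multiplied by a per-sample weight and the products are summed. The disclosure states only that the weight depends on joint motion amplitude and time sensitivity; the linear rule in `sample_weight`, its parameters `alpha` and `beta`, and the example values are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy loss of one sample given predicted class probabilities."""
    return -math.log(probs[label])

def sample_weight(amplitude, time_sensitivity, alpha=1.0, beta=1.0):
    """Hypothetical weighting rule: grows with motion amplitude and time sensitivity."""
    return 1.0 + alpha * amplitude + beta * time_sensitivity

def weighted_loss(batch):
    """Sum of w_i * L_i over (probs, label, amplitude, time_sensitivity) tuples."""
    return sum(
        sample_weight(amp, sens) * cross_entropy(probs, label)
        for probs, label, amp, sens in batch
    )

batch = [
    ([0.5, 0.5], 0, 1.0, 1.0),  # large amplitude, time-critical: weight 3.0
    ([0.5, 0.5], 1, 0.0, 0.0),  # small amplitude, not critical: weight 1.0
]
loss = weighted_loss(batch)
```

Both samples have the same cross-entropy loss here, so the first sample accounts for three quarters of the total loss purely because of its weight.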
Corresponding to the human micro-action based emotion recognition method of the above embodiment, fig. 3 is a block diagram of a human micro-action based emotion recognition system according to an embodiment of the present disclosure. For ease of illustration, only portions relevant to embodiments of the present disclosure are shown. Referring to fig. 3, the emotion recognition system 20 based on human micro-actions includes a feature fusion module 21, a micro-action recognition module 22, and an emotion recognition module 23.
The feature fusion module 21 is configured to perform feature fusion based on the target video feature, the target skeleton feature and the target amplitude feature of the recognition target, so as to obtain a comprehensive feature of the recognition target;
A micro-motion recognition module 22 for obtaining micro-motion features of the recognition target based on the integrated features;
the emotion recognition module 23 is configured to classify based on the micro-motion features and perform emotion recognition on the recognition target.
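The data flow between the three modules can be sketched as a simple composition. In this sketch, feature fusion is shown as plain concatenation and the two recognition modules are placeholder callables; the concatenation, the class and function names, and the emotion labels "tense" and "calm" are all assumptions for illustration, not the models of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]

def fuse(video: Sequence[float], skeleton: Sequence[float],
         amplitude: Sequence[float]) -> Vector:
    """Stand-in for feature fusion module 21: concatenation (an assumption)."""
    return [*video, *skeleton, *amplitude]

@dataclass
class EmotionRecognitionSystem:
    recognize_micro_motion: Callable[[Vector], Vector]  # stands in for module 22
    classify_emotion: Callable[[Vector], str]           # stands in for module 23

    def run(self, video, skeleton, amplitude) -> str:
        combined = fuse(video, skeleton, amplitude)
        micro = self.recognize_micro_motion(combined)
        return self.classify_emotion(micro)

# Placeholder callables; a real system would use trained models here.
system = EmotionRecognitionSystem(
    recognize_micro_motion=lambda f: [sum(f) / len(f)],
    classify_emotion=lambda m: "tense" if m[0] > 0.5 else "calm",
)
result = system.run([0.9, 0.8], [0.7], [0.6])
```

Keeping each module behind a callable mirrors the logical partitioning of the system and lets any one stage be replaced without touching the others.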
In one embodiment of the present disclosure, the feature fusion module 21 is specifically configured to:
Performing feature extraction on a target video of an identification target based on a convolutional neural network to obtain image features of each frame of image;
And obtaining the target video characteristic based on the image characteristic of each frame of image.
In one embodiment of the present disclosure, the feature fusion module 21 is specifically configured to:
Extracting joint point data of an identification target based on a target video;
Generating two-dimensional skeleton data of the identification target based on the joint point data;
Analyzing the two-dimensional skeleton data based on a depth estimation algorithm to obtain three-dimensional skeleton data of the recognition target;
and extracting the characteristics of the three-dimensional skeleton data to obtain the target skeleton characteristics of the recognition target.
In one embodiment of the present disclosure, the feature fusion module 21 is specifically configured to:
Calculating the displacement amplitude of the joint point data between two adjacent frames in the target video;
generating a motion amplitude image of the recognition target based on the displacement amplitude;
and extracting the characteristics of the motion amplitude image to obtain the target amplitude characteristics.
In one embodiment of the present disclosure, emotion recognition module 23 is specifically configured to:
classifying the micro-actions based on a first formula;
The first formula is: $y = \mathrm{softmax}(W x + b)$;
where $y$ denotes the classification output for the micro-motion features, $W$ denotes the weight matrix, $x$ denotes the micro-motion features of the recognition target, $b$ denotes the bias term, and $\mathrm{softmax}$ denotes the probability distribution function.
In one embodiment of the present disclosure, the human micro-action based emotion recognition system 20 further includes an update module, specifically configured to:
updating the micro-motion feature by the weight of the micro-motion feature to obtain a target micro-motion feature;
and classifying the target micro-action features, and carrying out emotion recognition on the recognition target.
In one embodiment of the present disclosure, the update module is specifically further configured to:
updating the micro-motion feature based on a second formula for the weight of the micro-motion feature;
The second formula is: $L = \sum_{i} w_i \cdot L_i$;
where $L$ denotes the weighted loss function, $w_i$ denotes the weight of the i-th micro-motion feature sample, and $L_i$ denotes the cross-entropy loss of the i-th micro-motion feature sample.
Referring to fig. 4, fig. 4 is a schematic block diagram of an electronic device according to an embodiment of the disclosure. The electronic device 300 in this embodiment as shown in fig. 4 may include one or more processors 301, one or more input devices 302, one or more output devices 303, and one or more memories 304. The processor 301, the input device 302, the output device 303, and the memory 304 communicate with each other via a communication bus 305. The memory 304 is used to store a computer program comprising program instructions. The processor 301 is configured to execute program instructions stored in the memory 304. Wherein the processor 301 is configured to invoke program instructions to perform the functions of the modules in the system embodiments described above, such as the functions of the modules 21-23 shown in fig. 3.
It should be appreciated that, in the disclosed embodiments, the processor 301 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The input device 302 may include a touch pad, a fingerprint collection sensor (for collecting fingerprint information of a user and direction information of a fingerprint), a microphone, etc., and the output device 303 may include a display (LCD, etc.), a speaker, etc.
The memory 304 may include read only memory and random access memory and provides instructions and data to the processor 301. A portion of memory 304 may also include non-volatile random access memory. For example, the memory 304 may also store information of device type.
In a specific implementation, the processor 301, the input device 302, and the output device 303 described in the embodiments of the present disclosure may perform the implementation manners described in the first embodiment and the second embodiment of the emotion recognition method based on human micro-actions provided in the embodiments of the present disclosure, and may also perform the implementation manners of the electronic device described in the embodiments of the present disclosure, which are not described herein again.
In another embodiment of the disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, implement all or part of the procedures in the method embodiments described above. These procedures may also be carried out by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable storage medium and, when executed by the processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or system capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
The computer-readable storage medium may be an internal storage unit of the electronic device of any of the foregoing embodiments, such as a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk provided on the electronic device, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like. Further, the computer-readable storage medium may include both internal storage units and external storage devices of the electronic device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic device and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed electronic device and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the partitioning of elements is merely a logical functional partitioning, and there may be additional partitioning in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces or units, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of the present disclosure.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a specific embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto, and any equivalent modifications or substitutions will be apparent to those skilled in the art within the scope of the present disclosure, and these modifications or substitutions should be covered in the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (9)

1. A human micro-motion based emotion recognition method, comprising:
Performing feature fusion based on the target video features, the target skeleton features and the target amplitude features of the recognition target to obtain comprehensive features of the recognition target;
Obtaining micro-motion characteristics of the identification target based on the comprehensive characteristics;
updating the micro-motion features according to the weights of micro-motion feature samples to obtain target micro-motion features, wherein a weighted loss function is introduced, the weight of each micro-motion feature sample is dynamically determined according to the motion amplitude and the time sensitivity of the joint points, samples with larger motion amplitude and higher time criticality are given higher weights so that their corresponding cross-entropy losses account for a larger proportion of the total loss function, and the model adjusts its own parameters according to the weighted loss function to update the micro-motion features;
and classifying the target micro-action features, and carrying out emotion recognition on the recognition target.
2. The human micro-action based emotion recognition method of claim 1, further comprising:
Performing feature extraction on a target video of an identification target based on a convolutional neural network to obtain image features of each frame of image;
The target video characteristic is obtained based on the image characteristic of each frame of image.
3. The human micro-action based emotion recognition method of claim 1, further comprising:
Extracting joint point data of an identification target based on a target video;
generating two-dimensional skeleton data of an identification target based on the joint point data;
analyzing the two-dimensional skeleton data based on a depth estimation algorithm to obtain three-dimensional skeleton data of an identification target;
And extracting the characteristics of the three-dimensional skeleton data to obtain target skeleton characteristics of the recognition target.
4. The human micro-action based emotion recognition method of claim 3, further comprising:
Calculating the displacement amplitude of the joint point data between two adjacent frames in the target video;
generating a motion amplitude image of the recognition target based on the displacement amplitude;
and extracting the characteristics of the motion amplitude image to obtain the target amplitude characteristics.
5. The human micro-motion based emotion recognition method of claim 1, wherein said classifying based on said micro-motion features comprises:
Classifying the micro-actions based on a first formula;
the first formula is: $y = \mathrm{softmax}(W x + b)$;
where $y$ denotes the classification output for the micro-motion features, $W$ denotes the weight matrix, $x$ denotes the micro-motion features of the recognition target, $b$ denotes the bias term, and $\mathrm{softmax}$ denotes the probability distribution function.
6. The human micro-motion based emotion recognition method of claim 1, wherein updating the micro-motion features according to weights of the micro-motion feature samples comprises:
updating the micro-motion feature based on a second formula;
The second formula is: $L = \sum_{i} w_i \cdot L_i$;
where $L$ denotes the weighted loss function, $w_i$ denotes the weight of the i-th micro-motion feature sample, and $L_i$ denotes the cross-entropy loss of the i-th micro-motion feature sample.
7. An emotion recognition system based on human micro-actions, comprising:
the feature fusion module is used for carrying out feature fusion based on the target video features, the target skeleton features and the target amplitude features of the recognition target to obtain the comprehensive features of the recognition target;
The micro-motion recognition module is used for obtaining micro-motion characteristics of a recognition target based on the comprehensive characteristics;
The updating module is used for updating the micro-motion features according to the weights of micro-motion feature samples to obtain target micro-motion features, wherein a weighted loss function is introduced, the weight of each micro-motion feature sample is dynamically determined according to the motion amplitude and the time sensitivity of the joint points, samples with larger motion amplitude and higher time criticality are given higher weights so that their corresponding cross-entropy losses account for a larger proportion of the total loss function, and the model adjusts its own parameters according to the weighted loss function to update the micro-motion features;
the emotion recognition module is used for classifying based on the target micro-motion features and performing emotion recognition on the recognition target.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
CN202411967118.1A 2024-12-30 2024-12-30 Emotion recognition method and system based on human micro-movements Active CN119888818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411967118.1A CN119888818B (en) 2024-12-30 2024-12-30 Emotion recognition method and system based on human micro-movements

Publications (2)

Publication Number Publication Date
CN119888818A CN119888818A (en) 2025-04-25
CN119888818B true CN119888818B (en) 2025-10-28

Family

ID=95434106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411967118.1A Active CN119888818B (en) 2024-12-30 2024-12-30 Emotion recognition method and system based on human micro-movements

Country Status (1)

Country Link
CN (1) CN119888818B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117152510A (en) * 2023-08-25 2023-12-01 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) An emotion recognition method and system using multi-cue joint learning
CN117475493A (en) * 2023-11-07 2024-01-30 中移(苏州)软件技术有限公司 Emotion recognition method and device, electronic equipment, chip and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9489570B2 (en) * 2013-12-31 2016-11-08 Konica Minolta Laboratory U.S.A., Inc. Method and system for emotion and behavior recognition
CN119007270B (en) * 2024-10-18 2025-02-11 宝鸡文理学院 A micro-expression recognition method based on multimodal fusion
CN119152562B (en) * 2024-11-18 2025-03-11 华南理工大学 Micro-expression recognition method and system based on dual-feature fusion

Also Published As

Publication number Publication date
CN119888818A (en) 2025-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant