CN117523670A - Gait recognition method and system based on multi-modal feature fusion - Google Patents
Gait recognition method and system based on multi-modal feature fusion
- Publication number
- CN117523670A (application CN202311551406.4A)
- Authority
- CN
- China
- Prior art keywords
- target object
- gait
- features
- face
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
- G06V40/25—Recognition of walking or running movements, e.g. gait recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Molecular Biology (AREA)
- Human Computer Interaction (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Social Psychology (AREA)
- Psychiatry (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a gait recognition method and system based on multi-modal feature fusion. The method comprises: acquiring a video stream containing a target object; extracting a gait sequence diagram of the target object from the video stream; selecting a face optimal frame and a pedestrian optimal frame from the gait sequence diagram of the target object; extracting face features of the target object from the face optimal frame; extracting human body features of the target object from the pedestrian optimal frame; extracting gait features of the target object from the gait sequence diagram; performing weighted fusion of the face features, the human body features and the gait features of the target object to obtain multi-modal fusion features of the target object; and performing identity recognition on the target object according to its multi-modal fusion features to obtain an identity recognition result. The method improves the accuracy of identity recognition.
Description
Technical Field
The invention relates to the technical field of identity recognition, and in particular to a gait recognition method and system based on multi-modal feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Gait is the change in a person's posture during walking. Gait recognition is a biometric technology that identifies a natural person from the physiological and behavioral characteristics contained in gait. Its basic goal is to determine the identity of a pedestrian by capturing a video of the pedestrian walking normally and comparing it with the walking videos stored in a database. Because pedestrians differ in muscle strength, tendon and bone length, bone density, center of gravity and so on, a person can be uniquely identified from these differences. Gait recognition can be realized either by building a human motion model from these characteristics or by extracting features directly from the human silhouette.
Gait recognition has achieved some success under long-distance and uncontrolled conditions, but in practical applications it is subject to many constraints, so its performance is still unsatisfactory. Current gait recognition techniques rely primarily on gait-cycle features such as step length, pace and gait rhythm. Although these features reflect the basic patterns of human walking, they are strongly affected in real environments by complex backgrounds, severe occlusion, differing camera viewing angles, and lighting that is too bright or too dark, which lowers recognition accuracy. Gait recognition is also limited by data samples and algorithms: improving the technique requires a large number of training samples, and when the population is highly diverse these samples are difficult to acquire and process, which further lowers the accuracy of gait recognition.
Disclosure of Invention
To solve the above problems, the invention provides a gait recognition method and a gait recognition system based on multi-modal feature fusion, which improve the accuracy of identity recognition.
To achieve the above purpose, the invention adopts the following technical solutions:
In a first aspect, a gait recognition method based on multi-modal feature fusion is provided, including:
acquiring a video stream containing a target object;
extracting a gait sequence diagram of the target object from the video stream;
selecting a face optimal frame and a pedestrian optimal frame from the gait sequence diagram of the target object;
extracting face features of the target object from the face optimal frame;
extracting human body features of the target object from the pedestrian optimal frame;
extracting gait features of the target object from the gait sequence diagram;
performing weighted fusion of the face features, the human body features and the gait features of the target object to obtain multi-modal fusion features of the target object;
and performing identity recognition on the target object according to the multi-modal fusion features of the target object to obtain an identity recognition result of the target object.
In a second aspect, a gait recognition system based on multi-modal feature fusion is provided, including:
a video acquisition module, configured to acquire a video stream containing a target object;
a gait sequence diagram acquisition module, configured to extract a gait sequence diagram of the target object from the video stream;
an optimal frame determining module, configured to select a face optimal frame and a pedestrian optimal frame from the gait sequence diagram of the target object;
a feature extraction module, configured to extract face features of the target object from the face optimal frame, extract human body features of the target object from the pedestrian optimal frame, and extract gait features of the target object from the gait sequence diagram;
a feature fusion module, configured to perform weighted fusion of the face features, the human body features and the gait features of the target object to obtain multi-modal fusion features of the target object;
and an identity recognition module, configured to perform identity recognition on the target object according to the multi-modal fusion features of the target object to obtain an identity recognition result of the target object.
In a third aspect, an electronic device is provided that includes a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are executed by the processor, the steps of the gait recognition method based on multi-modal feature fusion are performed.
In a fourth aspect, a computer-readable storage medium is provided for storing computer instructions that, when executed by a processor, perform the steps of the gait recognition method based on multi-modal feature fusion.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention fuses the gait features, the human body features and the face features into multi-modal fusion features and identifies the target object from these fusion features, which improves the accuracy of identity recognition of the target object.
2. The invention determines the face optimal frame and the pedestrian optimal frame from the gait sequence diagram and extracts the face features and the human body features from the face optimal frame and the pedestrian optimal frame respectively, which ensures the accuracy of the extracted face features and human body features. When the face features, the human body features and the gait features are then fused and used to identify the target object, the accuracy of the identity recognition is ensured.
3. The invention evaluates quality scores for the face features, the human body features and the gait features, determines the weight of each feature from its quality score, and performs weighted fusion of the three features according to these weights. The resulting multi-modal fusion features take feature quality into account, so features with low quality scores do not receive excessive weight, and the accuracy of the recognition result is further improved when the multi-modal fusion features are used to identify the target object.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application.
FIG. 1 is a flow chart of the method disclosed in Example 1;
FIG. 2 is a flow chart of the quality judgment in Example 1.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
Example 1
Current gait recognition techniques rely primarily on gait-cycle features such as step length, pace and gait rhythm. Although these features reflect the basic patterns of human walking, they are strongly affected in real environments by complex backgrounds, severe occlusion, differing camera viewing angles, and lighting that is too bright or too dark, which results in poor recognition accuracy. Gait recognition is also limited by data samples and algorithms: improving the technique requires a large number of training samples, and when the population is highly diverse these samples are difficult to acquire and process.
In addition, current gait recognition techniques ignore the influence of other biological features such as height, weight and age. These features affect how a person walks: for example, height and weight affect step length and stride frequency, while age and health affect gait rhythm and pace. Because current techniques do not take these effects into account, their recognition accuracy is lower.
To solve the above technical problems, this embodiment discloses a gait recognition method based on multi-modal feature fusion which, as shown in FIG. 1, includes:
S1: A video stream containing a target object is acquired.
In this embodiment, the video stream containing the target object is acquired by a camera or a similar device.
The acquired video stream may contain one or more target objects.
S2: a gait sequence diagram of the target object is extracted from the video stream.
When a plurality of target objects are contained in the video stream, a gait sequence diagram of each target object is extracted from the video stream.
The process for extracting the gait sequence diagram of the target object from the video stream comprises the following steps: detecting and tracking a target object in a video stream to obtain an image sequence of the target object; and obtaining a gait sequence diagram of the target object according to the image sequence of the target object.
Specifically, each video frame of the video stream that contains a target object x_i is preprocessed (resizing, normalization, channel-order adjustment and the like) to obtain a preprocessed image.
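The preprocessing step can be illustrated with a minimal NumPy sketch. The patent does not state the input size, the channel order of the camera frames, or the normalization range; the 640x640 size, the BGR input order and scaling to [0, 1] below are assumptions chosen only for illustration, and `preprocess_frame` is a hypothetical helper name.

```python
import numpy as np

def preprocess_frame(frame_bgr, size=(640, 640)):
    """Resize, reorder channels, and normalize one video frame.

    frame_bgr: uint8 array of shape (H, W, 3), assumed to be in BGR order.
    size: (height, width) of the model input; 640x640 is only an example.
    Returns a float32 array of shape (3, height, width) with values in [0, 1].
    """
    h, w = frame_bgr.shape[:2]
    out_h, out_w = size
    # Nearest-neighbour resize by index lookup (keeps the sketch dependency-free).
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    resized = frame_bgr[rows[:, None], cols[None, :]]
    rgb = resized[..., ::-1]                  # channel order: BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))        # layout: HWC -> CHW
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0
```

A production pipeline would more likely use an image library with bilinear interpolation and aspect-ratio-preserving padding; the index-based resize is used here only so the example runs with NumPy alone.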
A target detection model performs target detection on the preprocessed images and determines the target object in each preprocessed image.
Preferably, the target detection model takes the preprocessed image as input and the identified target object as output, and is built on the YOLOv7 network.
The preprocessed image is input into the target detection model. The model first extracts semantic information and spatial information from the preprocessed image through a convolutional neural network, then fuses features from different layers to obtain a more comprehensive and accurate feature representation. Convolutional and fully connected layers classify and localize target objects on the fused feature map. The detection results are then post-processed, including non-maximum suppression (NMS) and confidence-threshold screening, and the detection result for the target object x_i is finally output.
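The post-processing stage (confidence-threshold screening followed by non-maximum suppression) can be sketched as follows. This is a generic greedy NMS, not the exact routine of any particular YOLOv7 implementation; the function names and the threshold values 0.25 and 0.45 are assumptions for illustration.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def filter_detections(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Return indices of detections kept after thresholding and greedy NMS."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Step 1: drop low-confidence detections.
    idx = np.flatnonzero(scores >= conf_thresh)
    # Step 2: visit the survivors from highest to lowest score.
    order = idx[np.argsort(-scores[idx])]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        # Suppress boxes that overlap the current best box too strongly.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thresh]
    return keep
```

For example, with boxes `[[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]` and scores `[0.9, 0.8, 0.7]`, the second box overlaps the first with an IoU of about 0.68 and is suppressed, so indices 0 and 2 are kept.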
After the target object x_i is detected, target tracking is performed with the DeepSORT method: a unique ID number is generated for each detected target object, a deep learning model extracts its feature vector, and target matching follows. Appearance features are extracted for the detected target object x_i, similarity scores are computed against all target objects tracked in the previous frame, and the one with the highest score is selected as the matching target. For a matching target whose similarity score exceeds a threshold, a Kalman filter predicts its position in the next frame and updates its state information (position, speed, etc.). DeepSORT can still assign a target its original ID when it reappears after being occluded, which improves the tracking accuracy for moving targets. Detecting and tracking the target object in the video stream in this way yields an image sequence of the target object.
The human-shaped region of the target object is determined in the image sequence of the target object, and the gait sequence diagram of the target object is obtained from this region.
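The matching and prediction steps described above can be sketched in simplified form. This is not the DeepSORT implementation (which also uses motion gating and a global assignment step); it only illustrates picking the tracked target with the highest appearance similarity, applying a threshold, and running the prediction step of a constant-velocity Kalman filter. The cosine similarity measure, the state layout `[x, y, vx, vy]`, the threshold of 0.6 and the noise value are all assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two appearance feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_track(det_feature, track_features, threshold=0.6):
    """Pick the tracked target most similar to a new detection.

    track_features: dict mapping track ID -> appearance vector from the
    previous frame. Returns (track_id, score); track_id is None when the
    best score does not reach the threshold.
    """
    best_id, best_score = None, -1.0
    for track_id, feature in track_features.items():
        score = cosine_similarity(det_feature, feature)
        if score > best_score:
            best_id, best_score = track_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score

def kalman_predict(state, cov, dt=1.0, process_noise=1e-2):
    """Prediction step of a constant-velocity Kalman filter.

    state: [x, y, vx, vy]; cov: 4x4 state covariance.
    """
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Q = process_noise * np.eye(4)
    return F @ state, F @ cov @ F.T + Q
```

A detection that matches no existing track would be given a new ID; that bookkeeping, and the Kalman update step that corrects the prediction with the new measurement, are omitted to keep the sketch short.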
S3: and selecting a human face optimal frame and a pedestrian optimal frame from the gait sequence diagram of the target object.
Obtaining a target object x i After gait sequence diagram of (2), selecting the target object x from the current sequence i The frame with the best face and the best human quality is used as the face optimal frame and the pedestrian optimal frame.
Preferably, a maximum motion stabilization method is adopted, and a human face optimal frame and a pedestrian optimal frame are selected from a gait sequence diagram of a target object. Specific:
at the target object x i In the target tracking process of (2), the target object x is subjected to i Tracking the human face area and the human body area respectively, calculating the motion speed and the motion direction of the human face area and the human body area in different frames, and calculating the motion speed difference and the motion direction variation of each frame and the adjacent frames respectively for the human face area and the human body area; the motion speed difference and the direction variation are weighted and summed to obtain the motion variation of each frame and the adjacent frames, a frame with the minimum motion variation with the adjacent frames is selected as the frame with the most stable motion, and the frame with the most stable motion is taken as the optimal frame; the optimal frame determined according to the human face area is a human face optimal frame, and the optimal frame determined according to the human body area is a pedestrian optimal frame.
The maximum motion stabilization method can avoid the condition that the target moves rapidly or the motion direction changes, and improves the tracking robustness.
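The maximum motion stabilization selection can be sketched as below, assuming each tracked region is summarized by the center of its bounding box in every frame. The equal weights of 0.5 and the helper name `select_optimal_frame` are assumptions; the patent does not give the weighting values. The same function would be called once with the face-region centers and once with the body-region centers.

```python
import numpy as np

def select_optimal_frame(centers, w_speed=0.5, w_dir=0.5):
    """Return the index of the frame whose motion is most stable.

    centers: sequence of (x, y) region centers, one per frame (at least 3).
    The score of an interior frame is the weighted sum of the change in
    speed and the change in direction between the motion into that frame
    and the motion out of it.
    """
    centers = np.asarray(centers, dtype=float)
    if len(centers) < 3:
        raise ValueError("need at least three frames")
    vel = np.diff(centers, axis=0)                  # motion between frames
    speed = np.linalg.norm(vel, axis=1)
    direction = np.arctan2(vel[:, 1], vel[:, 0])
    d_speed = np.abs(np.diff(speed))                # speed difference
    # Direction change wrapped into [0, pi] so that 359 deg vs 1 deg is small.
    d_dir = np.abs(np.angle(np.exp(1j * np.diff(direction))))
    variation = w_speed * d_speed + w_dir * d_dir
    # variation[k] belongs to frame k + 1 (the first and last frames have
    # only one neighbour and are not scored in this sketch).
    return int(np.argmin(variation)) + 1
```

Speed is measured in pixels per frame and direction in radians, so in practice the two weights also have to compensate for the different scales of the two terms.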
S4: extracting face features of a target object from the face optimal frame; extracting human body characteristics of a target object from the pedestrian optimal frame; gait characteristics of the target object are extracted from the gait sequence diagram.
In this embodiment, after the face optimal frame and the pedestrian optimal frame of the target object are selected, face features and human body features are extracted, and these features assist in completing the gait recognition of the target object.
When the face features and the human body features are extracted, the face region of the target object in the face optimal frame and the human body region of the target object in the pedestrian optimal frame are determined; the face feature f_i1 of the target object is extracted from the face region, and the human body feature f_i2 of the target object is extracted from the human body region.
Preferably, a face feature extraction network extracts the face features from the face region, and a human body feature extraction network extracts the human body features from the human body region. Both networks are built as deep neural networks and are obtained by training on the corresponding training data.
The face feature f_i1 includes the sex, age, ethnicity and the like of the target object.
The human body feature f_i2 includes the height, weight, shooting angle and the like of the target object.
In addition, this embodiment uses a gait feature extraction network to extract the gait feature f_i3 from the gait sequence diagram of the target object x_i.
The face feature extraction network, the human body feature extraction network and the gait feature extraction network are all built as deep convolutional neural networks and are obtained by training on the corresponding training data.
The gait feature f_i3 includes stride, stride frequency and gait cycle characteristics.
S5: and carrying out weighted fusion on the face features, the human body features and the gait features of the target object to obtain the multi-modal fusion features of the target object.
Because the characteristics of the face, the human body, the gait and the like generated by the pedestrian target in the walking process can be interfered by various factors, the three characteristics are subjected to quality evaluation before the characteristics of the face, the human body and the gait are extracted. The unsupervised quality judgment algorithm designed by the method combines corresponding face, human body and gait feature extraction models, and provides correct quality scores for input images from the recognition angle instead of the visual quality angle by receiving a face optimal frame, a pedestrian optimal frame and a gait sequence diagram which need to be evaluated.
The method comprises the steps of firstly determining quality scores of each of face features, human body features and gait features; and determining the weight of the corresponding feature according to the quality score of each feature, and carrying out weighted fusion on the three features.
Preferably, the quality score of each feature is mapped into a range from 0 to 1, and the weight of each feature is obtained.
The process of determining each feature quality score is:
As shown in FIG. 2, the input I is the vector fed into the model after an image has undergone preprocessing and similar operations; here it is the face optimal frame, the pedestrian optimal frame or the gait sequence diagram. The output O(I) is the feature produced by the feature extraction network M: face features when M is the face feature extraction network, human body features when M is the human body feature extraction network, and gait features when M is the gait feature extraction network.
The feature extraction network M comprises a plurality of feature extraction sub-networks. M_i denotes a sub-network of M, obtained from M by randomly dropping out some of its neurons, and o_i denotes the feature vector output by the corresponding sub-network. For the feature vectors output by all the sub-networks, the Euclidean distance d(o_i, o_j) between every pair of features is calculated. Once all the pairwise distances have been calculated, the quality score is computed by the following formula:
[Equation for the quality score q(O(I)); not recoverable from the extracted text]
where q(O(I)) is the quality score of the output feature O(I), σ is the sigmoid activation function, and m is the number of sub-networks of the feature extraction network; the sigmoid activation function maps q into the interval from 0 to 1.
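The extracted text does not include the formula itself, so the following sketch assumes one form that is consistent with the definitions given: q = 2σ(−(2/m²) Σ_{i<j} d(o_i, o_j)). Since the sum of distances is non-negative, the sigmoid of its negative lies in (0, 0.5], and the factor 2 (an assumption) stretches the score to (0, 1]. The sub-networks are imitated here by applying random dropout masks to the input of a single linear layer; `stochastic_embeddings`, the layer itself, the dropout rate and the unit-length normalization are all hypothetical stand-ins for the real feature extraction network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_embeddings(x, weight, m=10, drop_p=0.5, seed=0):
    """Outputs o_1..o_m of m dropout sub-networks of a toy linear extractor.

    x: input vector; weight: matrix of a stand-in linear layer (assumption).
    Each pass zeroes a random subset of inputs, imitating a sub-network M_i.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(m):
        mask = rng.random(x.shape) >= drop_p
        o = weight @ (x * mask)
        norm = np.linalg.norm(o)
        outputs.append(o / norm if norm > 0 else o)
    return np.stack(outputs)

def quality_score(embeddings):
    """Assumed form: q = 2 * sigmoid(-(2 / m^2) * sum_{i<j} d(o_i, o_j))."""
    e = np.asarray(embeddings, dtype=float)
    m = len(e)
    # Pairwise Euclidean distances between all sub-network outputs.
    dists = np.linalg.norm(e[:, None, :] - e[None, :, :], axis=-1)
    pair_sum = dists[np.triu_indices(m, k=1)].sum()
    return float(2.0 * sigmoid(-2.0 / m**2 * pair_sum))
```

The intuition is that a high-quality input yields nearly the same feature vector regardless of which neurons are dropped, so the pairwise distances are small and the score approaches 1; an unreliable input produces scattered outputs and a lower score.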
In this way, quality evaluation is performed separately on the face features, the human body features and the gait features to obtain their quality scores, and the quality scores are mapped into the interval from 0 to 1 to obtain the weights of the face features, the human body features and the gait features, denoted q_1, q_2 and q_3 respectively.
The weights q_1, q_2 and q_3 are used to perform weighted fusion of the face feature f_i1, the human body feature f_i2 and the gait feature f_i3 acquired in S4, giving the multi-modal fusion feature f_i of the target object:
[Equation for the fusion feature f_i; not recoverable from the extracted text]
where f_i denotes the multi-modal fusion feature extracted by the method for the target object x_i, and q_1, q_2 and q_3 denote the weights of the face feature f_i1, the human body feature f_i2 and the gait feature f_i3, respectively.
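Because the fusion formula is missing from the extracted text, the sketch below assumes one common realization of weighted fusion: each feature vector is normalized to unit length, scaled by its quality weight, and the results are concatenated. A weighted sum would be an equally plausible reading if the three features share one dimension; the choice of concatenation, the normalization and the name `fuse_features` are assumptions.

```python
import numpy as np

def fuse_features(features, weights):
    """Weighted concatenation of modality features (assumed fusion form).

    features: [f_i1, f_i2, f_i3], vectors that may differ in length.
    weights:  [q_1, q_2, q_3], quality weights in the interval [0, 1].
    """
    parts = []
    for f, q in zip(features, weights):
        f = np.asarray(f, dtype=float)
        norm = np.linalg.norm(f)
        # Unit-normalize so that the weight alone controls each modality's
        # contribution to distances in the fused space.
        parts.append(q * (f / norm if norm > 0 else f))
    return np.concatenate(parts)
```

With this form, a modality whose quality weight is close to 0 contributes almost nothing to the distance between two fused features, which matches the stated goal that low-quality features should not dominate.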
S6: and carrying out identity recognition on the target object according to the multi-mode fusion characteristics of the target object to obtain an identity recognition result of the target object.
When the multi-mode fusion characteristics of the target object obtained through S5 calculation are used for carrying out identity recognition, the face characteristics, the body characteristics and the gait characteristics are comprehensively considered, so that the accuracy of the identity recognition is effectively improved.
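The patent does not spell out how the fused feature is compared with enrolled identities. A minimal sketch, assuming a gallery of previously enrolled fusion features and a cosine-similarity nearest-neighbour rule with a rejection threshold (all of which are assumptions, as is the name `identify`), could look like this:

```python
import numpy as np

def identify(query, gallery, threshold=0.5):
    """Match a fused feature against enrolled identities.

    query: multi-modal fusion feature of the target object.
    gallery: dict mapping identity -> enrolled fusion feature of equal length.
    Returns (identity, score); identity is None if no enrolled feature is
    similar enough.
    """
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    best_id, best_score = None, -1.0
    for identity, feature in gallery.items():
        g = np.asarray(feature, dtype=float)
        score = float(q @ g / np.linalg.norm(g))   # cosine similarity
        if score > best_score:
            best_id, best_score = identity, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score
```

The threshold controls the trade-off between false accepts and false rejects and would need to be tuned on validation data for any real deployment.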
Example 2
In this embodiment, a gait recognition system based on multi-modal feature fusion is disclosed, comprising:
a video acquisition module, configured to acquire a video stream containing a target object;
a gait sequence diagram acquisition module, configured to extract a gait sequence diagram of the target object from the video stream;
an optimal frame determining module, configured to select a face optimal frame and a pedestrian optimal frame from the gait sequence diagram of the target object;
a feature extraction module, configured to extract face features of the target object from the face optimal frame, extract human body features of the target object from the pedestrian optimal frame, and extract gait features of the target object from the gait sequence diagram;
a feature fusion module, configured to perform weighted fusion of the face features, the human body features and the gait features of the target object to obtain multi-modal fusion features of the target object;
and an identity recognition module, configured to perform identity recognition on the target object according to the multi-modal fusion features of the target object to obtain an identity recognition result of the target object.
Example 3
In this embodiment, an electronic device is disclosed that includes a memory, a processor, and computer instructions stored in the memory and run on the processor; when the computer instructions are executed by the processor, the steps of the gait recognition method based on multi-modal feature fusion disclosed in Example 1 are performed.
Example 4
In this embodiment, a computer-readable storage medium is disclosed for storing computer instructions that, when executed by a processor, perform the steps of the gait recognition method based on multi-modal feature fusion disclosed in Example 1.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the specific embodiments without departing from the spirit and scope of the invention, and that all such modifications and substitutions are intended to be covered by the claims.
Claims (10)
1. A gait recognition method based on multi-modal feature fusion, characterized by comprising the following steps:
acquiring a video stream containing a target object;
extracting a gait sequence diagram of the target object from the video stream;
selecting a face optimal frame and a pedestrian optimal frame from the gait sequence diagram of the target object;
extracting face features of the target object from the face optimal frame;
extracting human body features of the target object from the pedestrian optimal frame;
extracting gait features of the target object from the gait sequence diagram;
performing weighted fusion of the face features, human body features and gait features of the target object to obtain multi-modal fusion features of the target object; and
identifying the target object according to its multi-modal fusion features to obtain an identity recognition result for the target object.
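The final identification step can be illustrated with a nearest-template search. This is a minimal sketch under stated assumptions: the text here does not say how the fused feature is compared with enrolled identities, so cosine similarity, the `gallery` dictionary of enrolled templates, and the `threshold` value of 0.8 are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(
    fused: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed acceptance threshold, not from the patent
) -> Tuple[Optional[str], float]:
    """Return (identity, score) of the best-matching template, or (None, score) if below threshold."""
    best_id: Optional[str] = None
    best_score = -1.0
    for identity, template in gallery.items():
        score = cosine_similarity(fused, template)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score
```

Returning `None` below the threshold lets the caller treat an unknown person differently from a confident match.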
2. The gait recognition method based on multi-modal feature fusion of claim 1, wherein the target object in the video stream is detected and tracked to obtain an image sequence of the target object, and the gait sequence diagram of the target object is obtained from that image sequence.
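Detection followed by tracking can be illustrated with a greedy overlap tracker. This is a sketch under assumptions: the claim does not name a tracker, so linking boxes by intersection-over-union (IoU) is a swapped-in technique, the per-frame detections are assumed to come from some external person detector, and `track_target`, `initial_box` and `min_iou` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track_target(
    detections_per_frame: Sequence[Sequence[Box]],
    initial_box: Box,
    min_iou: float = 0.3,  # assumed linking threshold
) -> List[Optional[Box]]:
    """Follow one target: in each frame keep the detection that best overlaps the last known box."""
    track: List[Optional[Box]] = []
    last = initial_box
    for boxes in detections_per_frame:
        best = max(boxes, key=lambda b: iou(last, b), default=None)
        if best is not None and iou(last, best) >= min_iou:
            track.append(best)
            last = best
        else:
            track.append(None)  # target missed in this frame
    return track
```

Cropping each frame to its tracked box would then give the image sequence of the target object from which the gait sequence diagram is built.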
3. The gait recognition method based on multi-modal feature fusion of claim 1, wherein a face region of the target object in the face optimal frame and a human body region of the target object in the pedestrian optimal frame are determined;
the face features of the target object are extracted from the face region of the target object; and
the human body features of the target object are extracted from the human body region of the target object.
4. The gait recognition method based on multi-modal feature fusion of claim 1, wherein the face optimal frame and the pedestrian optimal frame are selected from the gait sequence diagram of the target object by using a maximum motion stabilization method.
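The claim does not define the maximum motion stabilization method at this point, so the following is only one plausible reading, labeled as an assumption: the optimal frame is taken to be the frame in which the tracked region moves least relative to its neighbouring frames. The function name `most_stable_frame` and the use of box centres as the motion measure are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def most_stable_frame(centres: Sequence[Point]) -> int:
    """Index of the frame whose region centre moves least relative to adjacent frames.

    `centres` holds one (x, y) centre per frame, e.g. of the face box when
    choosing the face optimal frame, or of the body box for the pedestrian one.
    """
    if not centres:
        raise ValueError("need at least one frame")
    if len(centres) == 1:
        return 0

    def motion(i: int) -> float:
        # Mean displacement to the previous and next frames that exist.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(centres)]
        return sum(math.dist(centres[i], centres[j]) for j in neighbours) / len(neighbours)

    return min(range(len(centres)), key=motion)
```

Under this reading the same routine would be run twice, once on face-region centres and once on body-region centres, so the two optimal frames need not coincide.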
5. The gait recognition method based on multi-modal feature fusion of claim 1, wherein a quality score is determined for each of the face features, human body features and gait features; a weight for each feature is determined according to its quality score; and the three features are fused by weighting.
6. The gait recognition method based on multi-modal feature fusion of claim 5, wherein the quality score of each feature is mapped into the interval from 0 to 1 to obtain the weight of that feature.
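Quality-driven weighting can be sketched as follows. The claims state only that each quality score is mapped into the interval from 0 to 1 and used as a weight; the min-max mapping with clamping, the normalisation of the three weights to sum to 1, and fusion by weighted concatenation are assumptions made for this sketch, as are the names `score_to_weight`, `weighted_fusion` and `MODALITIES`.

```python
from typing import Dict, List, Sequence

MODALITIES = ("face", "body", "gait")  # fixed order of the three feature types


def score_to_weight(score: float, lo: float, hi: float) -> float:
    """Min-max map a raw quality score into [0, 1], clamping out-of-range values."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def weighted_fusion(
    features: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Scale each modality by its normalised weight and concatenate the results."""
    total = sum(weights[name] for name in MODALITIES)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    fused: List[float] = []
    for name in MODALITIES:
        w = weights[name] / total  # assumed normalisation so the weights sum to 1
        fused.extend(w * value for value in features[name])
    return fused
```

With this scheme a blurred face crop that receives a low quality score contributes little to the fused vector, while the body and gait features dominate.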
7. The gait recognition method based on multi-modal feature fusion of claim 1, wherein the gait features comprise stride, stride frequency and gait cycle;
the face features comprise the gender, age and ethnicity of the target object; and
the human body features comprise the height, weight and shooting angle of the target object.
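The three gait features named above can be illustrated with a small calculation. This is a sketch under assumptions: the claim does not say how the quantities are measured or in what units, so the inputs (heel-strike times and positions of the same foot), the definition of stride frequency as strides per second, and the function name `gait_parameters` are hypothetical.

```python
from typing import Dict, Sequence


def gait_parameters(
    strike_times: Sequence[float],      # seconds, successive heel strikes of the SAME foot
    strike_positions: Sequence[float],  # metres along the walking direction at those strikes
) -> Dict[str, float]:
    """Average gait cycle (s), stride length (m) and stride frequency (strides/s)."""
    if len(strike_times) != len(strike_positions) or len(strike_times) < 2:
        raise ValueError("need at least two heel strikes with matching positions")
    strides = len(strike_times) - 1
    # One gait cycle spans two consecutive heel strikes of the same foot.
    gait_cycle = (strike_times[-1] - strike_times[0]) / strides
    if gait_cycle <= 0:
        raise ValueError("strike times must be increasing")
    stride = abs(strike_positions[-1] - strike_positions[0]) / strides
    return {
        "gait_cycle": gait_cycle,
        "stride": stride,
        "stride_frequency": 1.0 / gait_cycle,
    }
```

For example, heel strikes at 0, 1, 2 and 3 seconds covering 4.5 metres give a gait cycle of 1 second, a stride of 1.5 metres and a stride frequency of 1 stride per second.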
8. A gait recognition system based on multi-modal feature fusion, comprising:
a video acquisition module, configured to acquire a video stream containing a target object;
a gait sequence diagram acquisition module, configured to extract a gait sequence diagram of the target object from the video stream;
an optimal frame determining module, configured to select a face optimal frame and a pedestrian optimal frame from the gait sequence diagram of the target object;
a feature extraction module, configured to extract face features of the target object from the face optimal frame, human body features of the target object from the pedestrian optimal frame, and gait features of the target object from the gait sequence diagram;
a feature fusion module, configured to perform weighted fusion of the face features, human body features and gait features of the target object to obtain multi-modal fusion features of the target object; and
an identity recognition module, configured to identify the target object according to its multi-modal fusion features to obtain an identity recognition result for the target object.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the gait recognition method based on multi-modal feature fusion as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the gait recognition method based on multi-modal feature fusion as claimed in any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311551406.4A CN117523670A (en) | 2023-11-20 | 2023-11-20 | Gait recognition method and system based on multi-modal feature fusion |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311551406.4A CN117523670A (en) | 2023-11-20 | 2023-11-20 | Gait recognition method and system based on multi-modal feature fusion |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117523670A true CN117523670A (en) | 2024-02-06 |
Family
ID=89760414
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311551406.4A Pending CN117523670A (en) | 2023-11-20 | 2023-11-20 | Gait recognition method and system based on multi-modal feature fusion |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117523670A (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118840789A (en) * | 2024-06-06 | 2024-10-25 | 北京永泰安达科技有限公司 | Object recognition method and device based on multi-modal characteristics |
| CN120279599A (en) * | 2025-04-17 | 2025-07-08 | 山东大学 | Automatic identification method and application of obesity patient based on gait analysis |
| CN120279599B (en) * | 2025-04-17 | 2025-10-03 | 山东大学 | Automatic identification method and application of obesity patient based on gait analysis |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Nadeem et al. | Automatic human posture estimation for sport activity recognition with robust body parts detection and entropy markov model | |
| Pervaiz et al. | Artificial neural network for human object interaction system over Aerial images | |
| US11854306B1 (en) | Fitness action recognition model, method of training model, and method of recognizing fitness action | |
| CN114067358A (en) | Human body posture recognition method and system based on key point detection technology | |
| CN107944431A (en) | A kind of intelligent identification Method based on motion change | |
| CN113378649A (en) | Identity, position and action recognition method, system, electronic equipment and storage medium | |
| CN113516005B (en) | Dance action evaluation system based on deep learning and gesture estimation | |
| CN119169701B (en) | Three-dimensional motion capturing and intelligent analyzing system and method based on monocular camera | |
| Hasan et al. | Multi-level feature fusion for robust pose-based gait recognition using RNN | |
| CN111144165B (en) | Gait information identification method, system and storage medium | |
| CN117523670A (en) | Gait recognition method and system based on multi-modal feature fusion | |
| US12208309B2 (en) | Method and device for recommending golf-related contents, and non-transitory computer-readable recording medium | |
| Tsai et al. | Enhancing accuracy of human action Recognition System using Skeleton Point correction method | |
| Dong et al. | An improved deep neural network method for an athlete's human motion posture recognition | |
| Debalaxmi et al. | Analyzing yoga pose recognition: A comparison of MediaPipe and YOLO keypoint detection with ensemble techniques | |
| Yan et al. | Human-object interaction recognition using multitask neural network | |
| Zhu et al. | Dance Action Recognition and Pose Estimation Based on Deep Convolutional Neural Network. | |
| CN119762384A (en) | Data analysis system for upper limb function assessment | |
| Hanzla et al. | Robust human pose estimation and action recognition over multi-level perceptron | |
| Krzeszowski et al. | Gait recognition based on marker-less 3D motion capture | |
| Zaidi et al. | Mae Mai Muay Thai Style Classification in Movement Appling Long-Term Recurrent Convolution Networks | |
| Waheed et al. | An automated human action recognition and classification framework using deep learning | |
| Xu | RETRACTED ARTICLE: Optical image enhancement based on convolutional neural networks for key point detection in swimming posture analysis | |
| KR20220142673A (en) | LSTM-based behavior recognition method using human joint coordinate system | |
| Batool et al. | Fundamental recognition of ADL assessments using machine learning engineering |
Legal Events
| Code | Title | Description | Date |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||