CN111814846A

CN111814846A - Training method and recognition method of attribute recognition model and related equipment

Info

Publication number: CN111814846A
Application number: CN202010564682.4A
Authority: CN
Inventors: 李禹�; 唐邦杰; 潘华东; 殷俊; 张兴明
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2020-06-19
Filing date: 2020-06-19
Publication date: 2020-10-23
Anticipated expiration: 2040-06-19
Also published as: CN111814846B

Abstract

The application discloses a training method and a recognition method for an attribute recognition model, an electronic device, and a storage medium. The training method comprises the following steps: acquiring a target sample set; counting the real categories of each attribute in the target sample set to obtain the distribution of samples over the categories of each attribute; recognizing the target sample set with the attribute recognition model to obtain an attribute recognition result for each sample image; determining a loss value of the attribute recognition model based on the category difference between the real category and the predicted category of each attribute and on the distribution of samples over the categories of each attribute; adjusting the parameters of the attribute recognition model based on the loss value; and repeating the above process until a preset stop-training condition is met. In this way, the recognition results of the attribute recognition model can be made more accurate.

Description

Training method and recognition method of attribute recognition model and related equipment

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a training method, an identification method, an electronic device, and a storage medium for an attribute identification model.

Background

As Artificial Intelligence (AI) technology matures, it is being used for attribute recognition in more and more scenarios. Specifically, an attribute recognition model can recognize attributes from an image of a target object. For example, the gender, clothing, and hair of a pedestrian can be recognized from an image of the pedestrian, and the color and model of a vehicle can be recognized from an image of the vehicle.

However, the recognition results obtained by existing attribute recognition models are not accurate enough.

Disclosure of Invention

The application provides a training method for an attribute recognition model, an electronic device, and a storage medium, which solve the prior-art problem that the recognition results obtained by existing attribute recognition models are not accurate enough.

In order to solve the above technical problem, one technical solution adopted by the application is to provide a training method for an attribute recognition model, comprising: acquiring a target sample set, wherein the target sample set comprises sample images of a plurality of target objects, and each sample image is labeled with the real category of at least one attribute of the target object; counting the real categories of each attribute in the target sample set to obtain the distribution of samples over the categories of each attribute; recognizing the target sample set with the attribute recognition model to obtain an attribute recognition result for each sample image, wherein the attribute recognition result comprises a predicted category for each attribute of the target object; determining a loss value of the attribute recognition model based on the category difference between the real category and the predicted category of each attribute and on the distribution of samples over the categories of each attribute, wherein the more balanced the distribution of samples over the categories of an attribute, the smaller the influence of that attribute's category difference on the loss value; adjusting the parameters of the attribute recognition model based on the loss value; and repeating the above process until a preset stop-training condition is met.

In order to solve the above technical problem, another technical solution adopted by the present application is an attribute recognition method, comprising: acquiring an image to be recognized; and recognizing the image to be recognized with an attribute recognition model to obtain an attribute recognition result of the image, wherein the attribute recognition model is trained by the above training method.

In order to solve the above technical problem, another technical solution adopted by the present application is an electronic device comprising a processor and a memory coupled to the processor. The memory stores program instructions, and the processor executes the program instructions stored in the memory to implement the above method.

In order to solve the above technical problem, another technical solution adopted by the present application is a storage medium storing program instructions that, when executed, implement the above method.

Compared with the prior art, the beneficial effects of the application are as follows. The distribution of category samples of each attribute is obtained from the real categories of the attributes of the sample images in the target sample set. This distribution is the distribution of the number of sample images over the categories of each attribute. The loss value of the attribute recognition model is then determined from the category difference between the real categories of the attributes and the predicted categories obtained by the attribute recognition model. A more uniform distribution of category samples for an attribute indicates greater uncertainty in that attribute's category. The degree to which the category difference influences the loss value can therefore be set according to the attribute's category sample distribution. As a result, the loss produced by the attribute recognition model is dynamically adjusted based on the sample distribution, the trained model generalizes better to real scenes, and its recognition results are more accurate.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a training method for an attribute recognition model according to the present application;

FIG. 2 is a schematic view of a detailed flow of S110 in FIG. 1;

FIG. 3 is a detailed flowchart of S113 in FIG. 2;

FIG. 4 is a schematic diagram of the detailed process of S1133 in FIG. 3;

FIG. 5 is a schematic view of a detailed flow chart of S120 in FIG. 1;

FIG. 6 is a schematic view of a detailed flow chart of S140 in FIG. 1;

FIG. 7 is a schematic flowchart of an embodiment of an attribute identification method of the present application;

FIG. 8 is a schematic structural diagram of an embodiment of an electronic device of the present application;

FIG. 9 is a schematic structural diagram of an embodiment of a storage medium according to the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. All directional indications (such as up, down, left, right, front, and rear … …) in the embodiments of the present application are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indication is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps is not limited to only those steps recited, but may alternatively include other steps not recited, or may alternatively include other steps inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

FIG. 1 is a schematic flowchart of an embodiment of the training method for an attribute recognition model according to the present application. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the flow sequence shown in FIG. 1. As shown in FIG. 1, this embodiment includes:

S110: Obtain a target sample set.

The target sample set comprises a plurality of sample images containing the target object, and the sample images are marked with real categories corresponding to at least one attribute of the target object.

Sample images of a target object can be acquired from a video, and the target object may be a pedestrian or a movable object such as a vehicle. Each sample image may contain one target object, and the target sample set includes at least one sample image of each target object. A target object has at least one attribute, each attribute has at least two categories, and each sample image of a target object is labeled with the real category of each attribute. For example, a pedestrian has a "gender" attribute with the categories "male" and "female". When the pedestrian is female, the "gender" attribute of the pedestrian's sample image is labeled with the real category "female". It will be appreciated that in other embodiments a sample image may contain two or more target objects, in which case it may be regarded as a sample image of any of the target objects it contains.

In the present embodiment, the sample image of the target object included in the target sample set may also be referred to as a sample in the target sample set.

Referring to fig. 2, S110 may include:

S111: Acquire multiple frames of images from a video.

The video can be acquired by front-end equipment, and the front-end equipment can be a camera, electronic equipment with a shooting function and the like. Multiple frames of images can be acquired from a video at a specified frame rate.
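The text does not say how frames are picked at the specified frame rate. The sketch below assumes evenly spaced sampling in time and computes which frame indices to keep. `sample_frame_indices` is a hypothetical helper, not part of the patent.

```python
import math
from typing import List


def sample_frame_indices(num_frames: int, source_fps: float, target_fps: float) -> List[int]:
    """Indices of the frames kept when a video recorded at source_fps is sampled at target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        # Sampling at or above the source rate keeps every frame.
        return list(range(num_frames))
    step = source_fps / target_fps  # source frames between consecutive samples (> 1)
    count = math.ceil(num_frames / step)
    # int() floors each ideal timestamp; the filter guards against float round-off at the end.
    return [i for i in (int(k * step) for k in range(count)) if i < num_frames]
```

For a 25 fps video sampled at 5 fps, this keeps every fifth frame.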

S112: Perform target recognition on each frame to obtain the target objects in each frame.

Target identification can be carried out on each frame of image based on a target detection algorithm, and coordinates of all target objects in each frame of image are obtained. The target objects in different frame images may or may not be the same. For example, when the target object is a pedestrian, the coordinates of all pedestrians in each frame of image can be identified based on a pedestrian detection algorithm.

S113: Obtain a sample image of each target object based on the multiple frames.

A method for obtaining the sample image of each target object is described below. Referring to FIG. 3, S113 may specifically include:

S1131: Acquire the track information of the target object across the multiple frames.

The track information of the current target object can be acquired based on the coordinates of the current target object in different frame images. For example, the coordinates of the current target object in different frame images can be associated through a target tracking algorithm, so as to obtain the track information of the current target object.
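The patent leaves the tracking algorithm open. As an illustration only, the sketch below swaps in simple greedy IoU (intersection-over-union) matching against each track's most recent box. The names `iou` and `build_tracks` and the 0.3 threshold are assumptions. Each resulting track maps frame indices to boxes, so the frames listed in a track are exactly the frames containing that object, which is what S1132 needs.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_tracks(detections: List[List[Box]], iou_threshold: float = 0.3) -> Dict[int, Dict[int, Box]]:
    """Greedily link per-frame detections into tracks: track_id -> {frame_index: box}."""
    tracks: Dict[int, Dict[int, Box]] = {}
    last_box: Dict[int, Box] = {}  # most recent box of every track seen so far
    next_id = 0
    for frame_idx, boxes in enumerate(detections):
        unmatched = set(last_box)  # tracks still free to match in this frame
        for box in boxes:
            best_id: Optional[int] = None
            best_iou = iou_threshold
            for tid in unmatched:
                score = iou(box, last_box[tid])
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # no sufficient overlap with an existing track: start a new one
                best_id = next_id
                next_id += 1
                tracks[best_id] = {}
            else:
                unmatched.discard(best_id)
            tracks[best_id][frame_idx] = box
            last_box[best_id] = box
    return tracks
```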

S1132: Obtain at least one frame containing the target object based on the track information of the target object.

Based on the track information of the current target object, the frames containing the current target object can be obtained from the multiple frames. A frame containing the current target object may also contain other target objects. For example, if the current target object is pedestrian a, a frame containing pedestrian a may also contain pedestrian b.

S1133: Obtain a sample image of the target object based on the frames containing the target object.

Referring to fig. 4, S1133 may include:

S11331: Cut out the region corresponding to the target object from each frame containing the target object to obtain target object sub-images.

The region corresponding to the current target object can be extracted from the image containing the current target object by utilizing a matting algorithm to serve as a sub-image of the current target object, namely a target object sub-image.
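The sketch below assumes the detector outputs axis-aligned boxes and that the matting step amounts to cutting out the box. Under those assumptions, a target object sub-image can be obtained with plain NumPy slicing. `crop_target` is a hypothetical name.

```python
import numpy as np


def crop_target(frame: np.ndarray, box) -> np.ndarray:
    """Cut an (x1, y1, x2, y2) box out of an H x W (x C) frame, clipping it to the frame."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box does not overlap the frame")
    # Copy so the sub-image does not keep the whole frame alive through a view.
    return frame[y1:y2, x1:x2].copy()
```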

S11332: Calculate the quality score of each target object sub-image.

The quality score may be at least one of a sharpness score, an occlusion score, or a completeness score of the target object sub-image.
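The patent does not define how the sharpness score is computed. One common choice, used here purely as an assumed stand-in, is the variance of the Laplacian of a grayscale sub-image: blurred images carry little high-frequency energy and therefore have a low variance. Occlusion and completeness scores would need separate models and are not sketched.

```python
import numpy as np


def sharpness_score(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian of a 2-D grayscale image (higher = sharper)."""
    g = gray.astype(np.float64)
    if g.ndim != 2 or min(g.shape) < 3:
        raise ValueError("expected a 2-D image of at least 3 x 3 pixels")
    # Discrete Laplacian evaluated on interior pixels only (no padding).
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4.0 * g[1:-1, 1:-1]
    return float(lap.var())
```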

S11333: Select a sample image of the target object from the target object sub-images based on the quality scores.

The sample image of the current target object is selected based on the quality score of each of its target object sub-images.

The target object sub-image with the highest quality score may be taken as the sample image of the target object. Of course, in other embodiments, a quality score threshold may be preset, and every target object sub-image whose quality score exceeds the threshold may be used as a sample image of the target object. Alternatively, when the target object is a pedestrian, the sample images of the current pedestrian can be selected by combining the pedestrian's pose in each target object sub-image with the quality scores of the sub-images. Specifically, for each pose, the sub-image of the current pedestrian with the highest quality score may be selected as a sample image of that pedestrian.
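The three selection strategies described above can be sketched as follows. The pose labels are hypothetical placeholders, since the text does not say how pose is estimated or which poses are distinguished.

```python
from typing import Dict, List, Sequence


def select_best(scores: Sequence[float]) -> int:
    """Index of the sub-image with the highest quality score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def select_above(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of all sub-images whose quality score exceeds the preset threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def select_best_per_pose(scores: Sequence[float], poses: Sequence[str]) -> Dict[str, int]:
    """For each pose label, the index of the highest-scoring sub-image in that pose."""
    best: Dict[str, int] = {}
    for i, (s, pose) in enumerate(zip(scores, poses)):
        if pose not in best or s > scores[best[pose]]:
            best[pose] = i
    return best
```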

S120: Count the real categories of each attribute in the target sample set to obtain the distribution of samples over the categories of each attribute.

In a specific embodiment, the distribution of category samples of each attribute may be the distribution of the number of sample images over the categories of that attribute. The specific steps are as follows.

Referring to FIG. 5, S120 may include:

S121: Count the number of samples of each real category of each attribute in the target sample set.

An attribute of the target sample set is an attribute possessed by the sample images in the target sample set. The number of samples of each real category of an attribute is the number of sample images labeled with that real category for that attribute.

Again taking a pedestrian as the target object, suppose the target sample set has two attributes, "gender" and "hair". The "gender" attribute has the categories "male" and "female", and the "hair" attribute has the categories "long" and "short". It is then necessary to count four numbers: the sample images whose "gender" category is "female", those whose "gender" category is "male", those whose "hair" category is "long", and those whose "hair" category is "short".
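Step S121 amounts to a per-attribute histogram. The sketch below assumes each sample's labels are stored as a dict from attribute name to real category, which is a hypothetical layout. A sample lacking an attribute simply does not contribute to that attribute's counts.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def count_categories(labels: List[Dict[str, str]]) -> Dict[str, Counter]:
    """Count, for every attribute, how many sample images carry each real category.

    labels[i] maps attribute name -> real category labeled on sample image i.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sample in labels:
        for attribute, category in sample.items():
            counts[attribute][category] += 1
    return dict(counts)
```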

S122: Determine the Gini index of each attribute from the number of samples of each real category and the total number of samples of the attribute.

The sum of the numbers of sample images of the real categories of the current attribute is the total number of samples of that attribute. The number of samples of each real category of the current attribute and the total number of samples of that attribute can be used to determine its Gini index.

For example, suppose the target sample set has a "gender" attribute. The number of samples whose category is "female" plus the number whose category is "male" is the total number of samples of the "gender" attribute. These counts can be used to determine the Gini index of the "gender" attribute. The specific determination method is as follows:

In one embodiment, the Gini index of each attribute may be determined from the proportion of sample images of each real category among the sample images having the current attribute. Specifically, the Gini index of each attribute can be obtained by the following formula:

$$\mathrm{Gini}(D_m) = 1 - \sum_{k=1}^{K} \left( \frac{C_{km}}{|D_m|} \right)^2$$

In this formula:
- $D_m$ denotes the set of sample images having the m-th attribute, and $|D_m|$ denotes the number of samples in $D_m$.
- $K$ denotes the total number of real categories of the m-th attribute.
- $C_{km}$ denotes the number of samples in $D_m$ labeled with the k-th real category.

For example, suppose an attribute of the sample images in the target sample set has K categories and the proportion of the k-th category is $p_k$. The Gini index of the attribute is then:

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$

When the attribute has 2 categories, that is, K = 2, and the proportion of sample images of the first category is p, the Gini index of the attribute is:

$$\mathrm{Gini}(p) = 2p(1 - p)$$

The larger the Gini index of an attribute, the more balanced the numbers of sample images across its categories. In other words, the distribution of its category samples is more balanced, and the uncertainty of the category of that attribute is greater.
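The Gini index of S122 can be computed directly from the per-category counts. The sketch below assumes the counts for one attribute are given as a mapping from category to $C_{km}$.

```python
from typing import Mapping


def gini_index(category_counts: Mapping[str, int]) -> float:
    """Gini(D_m) = 1 - sum_k (C_km / |D_m|)^2 for a single attribute m."""
    total = sum(category_counts.values())  # |D_m|
    if total == 0:
        raise ValueError("attribute has no labeled samples")
    return 1.0 - sum((c / total) ** 2 for c in category_counts.values())
```

A 90/10 split gives about 0.18, compared with 0.5 for a 50/50 split, in line with 2p(1-p) for two categories.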

The present application does not limit the statistical method for the distribution of the category samples of each attribute. In other embodiments, the distribution may be counted in other ways, in which case the Gini index of each attribute may be calculated differently.

S130: Recognize the target sample set with the attribute recognition model to obtain an attribute recognition result for each sample image.

The attribute identification model can identify each sample image in the target sample set and output the identification result of each sample image. The attribute identification result may include a prediction category corresponding to each attribute of the target object, and may further include a prediction category confidence corresponding to each attribute.
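The text does not specify the model's output format. The sketch below assumes the model outputs a probability vector over the categories of each attribute. Under that assumption, the prediction category and its confidence can be read off with an argmax. `decode_result` and the input layout are hypothetical.

```python
from typing import Dict, Sequence, Tuple

import numpy as np


def decode_result(
    probs: Dict[str, np.ndarray], categories: Dict[str, Sequence[str]]
) -> Dict[str, Tuple[str, float]]:
    """Map each attribute to (prediction category, prediction category confidence)."""
    result = {}
    for attribute, p in probs.items():
        k = int(np.argmax(p))  # index of the most probable category
        result[attribute] = (categories[attribute][k], float(p[k]))
    return result
```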

S140: Determine the loss value of the attribute recognition model based on the category difference between the real category and the predicted category of each attribute and on the distribution of the category samples of each attribute.

Optionally, the more balanced the distribution of the category samples of an attribute, the smaller the influence of that attribute's category difference on the loss value. Specifically, a more uniform distribution of category samples means that the category of the attribute is harder to determine. The category difference of that attribute therefore has less influence on the loss value.

In a specific embodiment, the loss value of the attribute identification model may be determined based on the class difference of each attribute, the confidence of the predicted class, and the distribution of samples of each class. Referring to fig. 6, a particular method may include:

S141: Obtain the weight of each attribute from the distribution of its category samples.

Optionally, the more balanced the distribution of the category samples of an attribute, the smaller the weight of that attribute. Specifically, a balanced distribution means that it is difficult to determine the category of that attribute for the sample images of the target sample set. The weight of the attribute is therefore set smaller, which makes the subsequently calculated loss value more effective.

Optionally, the ratio of a preset index to the Gini index of each attribute is used as the weight of that attribute, where the preset index is greater than or equal to 1. Thus, the larger the Gini index of an attribute, the smaller its calculated weight, which weakens the influence on the loss value of attributes whose categories are highly uncertain. For example, the weight of the m-th attribute may be:

$$W_m = \frac{\lambda}{\mathrm{Gini}(D_m)}$$

where $\lambda$ is the preset index and $\lambda \geq 1$.
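The weight is a single ratio. The small `eps` floor in this sketch is an added assumption, not from the source. It keeps the weight finite for an attribute whose samples all share one category (Gini index 0), where the ratio would otherwise be undefined.

```python
def attribute_weight(gini: float, lam: float = 1.0, eps: float = 1e-6) -> float:
    """W_m = lambda / Gini(D_m), where lam is the preset index (>= 1).

    eps is an added safeguard (not from the source) for a single-category attribute.
    """
    if lam < 1:
        raise ValueError("the preset index lambda must be >= 1")
    return lam / max(gini, eps)
```

A balanced binary attribute (Gini 0.5) gets weight 2 with λ = 1, while a 90/10 attribute (Gini about 0.18) gets a larger weight.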

S142: Perform a cross-entropy loss calculation using the category difference, prediction category confidence, and weight of each attribute to obtain the loss value of the attribute recognition model.

Specifically, the loss value L of the attribute recognition model can be obtained by the following formulas. If the attribute has two categories, the formula is:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} W_m \left[ y_{im} \log p_{im} + (1 - y_{im}) \log (1 - p_{im}) \right]$$

If the attribute has more than two categories, the formula is:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} W_m \sum_{k=1}^{K} y_{im}^{(k)} \log p_{im}^{(k)}$$

In these formulas:
- N denotes the number of sample images in the target sample set, and M denotes the number of attributes.
- $W_m$ denotes the weight of the m-th attribute.
- $y_{im}$ denotes the category difference of the m-th attribute of the i-th sample image. It is 1 if the real category of the m-th attribute of the i-th sample image is the same as the preset category, and 0 otherwise.
- $p_{im}$ denotes the confidence of the m-th attribute of the i-th sample image.
- In the multi-category formula, K is the number of categories of the m-th attribute, and $y_{im}^{(k)}$ and $p_{im}^{(k)}$ are $y_{im}$ and $p_{im}$ taken with the k-th category as the preset category.
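The following NumPy sketch implements the two cross-entropy cases. It assumes a 1/N normalization over sample images and one-hot category differences in the multi-category case. The function names and array layouts are hypothetical, and confidences are clipped away from 0 and 1 so the logarithms stay finite.

```python
from typing import Sequence

import numpy as np


def weighted_binary_ce(y: np.ndarray, p: np.ndarray, w: np.ndarray, eps: float = 1e-7) -> float:
    """Two-category case. y: (N, M) 0/1 category differences, p: (N, M) confidences, w: (M,) weights."""
    p = np.clip(p, eps, 1.0 - eps)  # keep log() finite
    terms = y * np.log(p) + (1.0 - y) * np.log(1.0 - p)  # (N, M)
    return float(-(terms * w).sum() / y.shape[0])  # W_m broadcast over samples, divided by N


def weighted_multiclass_ce(
    y_onehot: Sequence[np.ndarray], p: Sequence[np.ndarray], w: np.ndarray, eps: float = 1e-7
) -> float:
    """Multi-category case. y_onehot[m], p[m]: (N, K_m) arrays for attribute m; w: (M,) weights."""
    n = y_onehot[0].shape[0]
    total = 0.0
    for m, (y_m, p_m) in enumerate(zip(y_onehot, p)):
        total += w[m] * float((y_m * np.log(np.clip(p_m, eps, 1.0))).sum())
    return -total / n
```

For a two-category attribute, feeding one-hot labels to `weighted_multiclass_ce` gives the same value as `weighted_binary_ce`.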

S150: Determine whether a preset stop-training condition is met.

The preset stop-training condition may be that the loss value is smaller than a preset threshold, that the number of training iterations reaches a preset number, or the like.

If not, execute S160; if so, execute S170.

S160: Adjust the parameters of the attribute recognition model based on the loss value.

The parameters of the attribute identification model are continuously adjusted through the loss value of the identification result of the attribute identification model, so that the attribute identification model can be continuously optimized, and the attribute identification result of the attribute identification model is more accurate.

After the step is executed, the process jumps to S130 to repeat the above process until the preset condition for stopping training is satisfied.
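The loop from S130 to S170 can be illustrated end to end with a deliberately tiny stand-in: a per-attribute logistic regression trained by plain gradient descent on the weighted binary cross-entropy. Everything here is a hypothetical sketch, including the model, learning rate, thresholds, and the name `train`. The patent's model is an attribute recognition network, and its optimizer is not specified. The stop check covers both conditions named in S150: a loss threshold and an iteration cap.

```python
import numpy as np


def train(x, y, w, lr=0.5, loss_threshold=0.05, max_iters=1000):
    """Toy version of S130-S170 with a per-attribute logistic model standing in for the network.

    x: (N, D) features, y: (N, M) 0/1 labels, w: (M,) Gini-based attribute weights.
    Returns (parameters, last loss, iterations run).
    """
    if max_iters < 1:
        raise ValueError("max_iters must be >= 1")
    n, d = x.shape
    theta = np.zeros((d, y.shape[1]))
    for it in range(1, max_iters + 1):
        p = 1.0 / (1.0 + np.exp(-(x @ theta)))  # S130: confidences, shape (N, M)
        pc = np.clip(p, 1e-7, 1.0 - 1e-7)
        loss = float(-(w * (y * np.log(pc) + (1 - y) * np.log(1 - pc))).sum() / n)  # S140
        if loss < loss_threshold:  # S150: stop condition met, go to S170
            break
        grad = x.T @ ((p - y) * w) / n  # gradient of the weighted binary cross-entropy
        theta -= lr * grad  # S160: adjust parameters, then loop back to S130
    return theta, loss, it
```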

S170: Stop training.

By implementing this embodiment, the distribution of category samples of each attribute is obtained from the real categories of the attributes of the sample images in the target sample set. This distribution is the distribution of the number of sample images over the categories of each attribute. The loss value of the attribute recognition model is determined from the category difference between the predicted categories obtained by the attribute recognition model and the real categories of the attributes of the sample images. A more balanced category sample distribution indicates greater uncertainty in the attribute's category. The degree to which the category difference influences the loss value can therefore be determined from the attribute's category sample distribution. The loss produced by the attribute recognition model is thereby dynamically adjusted based on the sample distribution, and the trained model generalizes better to real scenes, so its recognition results are more accurate.

The attribute recognition model obtained by the above training method can be used to perform attribute recognition on images. FIG. 7 is a schematic flowchart of an embodiment of the attribute recognition method of the present application. It should be noted that, provided substantially the same result is obtained, this embodiment is not limited to the flow sequence shown in FIG. 7. As shown in FIG. 7, this embodiment includes:

S210: Acquire an image to be recognized.

Please refer to the previous embodiment for the method for acquiring the image to be recognized, which is not repeated here.

S220: Recognize the image to be recognized with the attribute recognition model to obtain the attribute recognition result of the image.

For a detailed description of the identification process, reference is made to the previous embodiments, which are not repeated here.

The attribute recognition model can be obtained by training the above-mentioned training method of the attribute recognition model.

FIG. 8 is a schematic structural diagram of an embodiment of the electronic device of the present application. As shown in FIG. 8, the electronic device includes a processor 310 and a memory 320 coupled to the processor 310.

Wherein the memory 320 stores program instructions for implementing the method of any of the embodiments described above; the processor 310 is configured to execute program instructions stored by the memory 320 to implement the steps of the above-described method embodiments. The processor 310 may also be referred to as a Central Processing Unit (CPU). The processor 310 may be an integrated circuit chip having signal processing capabilities. The processor 310 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

FIG. 9 is a schematic structural diagram of an embodiment of the storage medium of the present application. The storage medium 400 of this embodiment stores program instructions 410, which, when executed, implement the methods provided by the above embodiments of the present application. The program instructions 410 may form a program file stored in the storage medium 400 in the form of a software product. This enables a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium 400 includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. It also includes terminal devices such as a computer, a server, a mobile phone, or a tablet.

The above embodiments are merely examples and are not intended to limit the scope of patent protection of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

1. A training method of an attribute recognition model is characterized by comprising the following steps:

acquiring a target sample set, wherein the target sample set comprises sample images of a plurality of target objects, and the sample images are marked with real categories corresponding to at least one attribute of the target objects;

counting real categories corresponding to the attributes in the target sample set to obtain the distribution condition of the samples of the categories of the attributes;

identifying the target sample set by using the attribute identification model to obtain an attribute identification result of each sample image, wherein the attribute identification result comprises a prediction category of each attribute of the target object;

determining a loss value of the attribute identification model based on a category difference between a real category and a prediction category of each attribute and a distribution condition of each category sample of each attribute, wherein the more uniform the distribution of each category sample of the attribute is, the less the influence of the category difference corresponding to the attribute on the loss value is;

adjusting parameters of the attribute identification model based on the loss values;

the above process is repeatedly executed until the preset condition of stopping training is met.

2. The method of claim 1,

the attribute identification result further comprises a confidence degree of a prediction category of each attribute, and the determining of the loss value of the attribute identification model based on the difference between the real category and the prediction category of each attribute and the distribution condition of each category sample of each attribute comprises:

determining the loss value of the attribute identification model based on the category difference of each attribute, the confidence of the prediction category, and the distribution condition of each category sample of each attribute.

3. The method of claim 2,

determining the loss value of the attribute identification model based on the category difference of each attribute, the prediction category confidence, and the distribution condition of each category sample of each attribute, including:

respectively obtaining weights corresponding to the attributes by utilizing the distribution condition of each type of sample of each attribute, wherein the more balanced the distribution of each type of sample of the attributes is, the smaller the weight corresponding to the attributes is;

performing a cross-entropy loss calculation using the category difference, the prediction category confidence, and the weight of each attribute to obtain the loss value of the attribute identification model.
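One way to read the weighted cross-entropy of claim 3 is sketched below. It assumes the category difference acts as a one-hot indicator of the real category and that the confidences of each attribute form a probability distribution over its categories; the function name and argument layout are illustrative only. A binary attribute is covered by passing the distribution `[1 - p, p]`.

```python
import math
from typing import List, Sequence


def weighted_attribute_ce(
    probs: List[List[List[float]]],  # probs[i][m]: predicted distribution over categories of attribute m
    labels: List[List[int]],         # labels[i][m]: index of the real category of attribute m
    weights: Sequence[float],        # W_m, one weight per attribute
    eps: float = 1e-12,              # avoids log(0) for a zero-confidence real category
) -> float:
    """Mean over sample images of the weighted sum of per-attribute cross-entropies."""
    n = len(probs)
    total = 0.0
    for sample_probs, sample_labels in zip(probs, labels):
        for m, (dist, k) in enumerate(zip(sample_probs, sample_labels)):
            # One-hot target: only the confidence assigned to the real category contributes.
            total -= weights[m] * math.log(max(dist[k], eps))
    return total / n
```

Because each attribute's term is scaled by its own W_m, an attribute with a skewed category distribution (larger W_m) contributes more to the loss than a balanced one making the same mistake.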

4. The method of claim 3,

obtaining the loss value of the attribute identification model by performing the cross-entropy loss calculation using the category difference, the prediction category confidence, and the weight of each attribute comprises:

obtaining the loss value L of the attribute identification model using the following formulas, wherein:

if the attribute has two categories, the formula is:

if the attribute has more than two categories, the formula is:

wherein N represents the number of sample images in the target sample set, M represents the number of attributes, W_m represents the weight of the m-th attribute, y_im represents the category difference of the m-th attribute of the i-th sample image, and p_im represents the prediction category confidence of the m-th attribute of the i-th sample image.
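The two formulas referred to in claim 4 are not reproduced in this text. A standard attribute-weighted cross-entropy consistent with the symbols N, M, W_m, y_im and p_im would read as follows. This is an assumed reconstruction, not the published formula. In particular, the 1/N normalization is an assumption, as are the reading of y_im as a 0/1 indicator of the real category and, in the multi-category case, the extra category index k.

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M} W_m\left[y_{im}\log p_{im} + \left(1-y_{im}\right)\log\left(1-p_{im}\right)\right]$$

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M} W_m\sum_{k} y_{imk}\log p_{imk}$$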

5. The method of claim 3,

counting the real categories of each attribute in the target sample set to obtain the distribution of the category samples of each attribute comprises:

counting, for each attribute, the number of samples of each real category in the target sample set;

determining a Gini index for each attribute using the number of samples of each real category and the total number of samples for that attribute;

obtaining the weight corresponding to each attribute from the distribution of the category samples of that attribute comprises:

taking, for each attribute, the ratio of a preset index to the Gini index of that attribute as the weight corresponding to the attribute, wherein the preset index is greater than or equal to 1.
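Claims 5 and 6 can be sketched together: count the real categories of each attribute, compute a Gini index, and divide the preset index by it. The sketch assumes the standard Gini impurity, 1 minus the sum of squared category proportions, since the formula of claim 6 is not shown in this text. The `eps` guard for an attribute whose samples all share one category (Gini index 0) is an addition not described in the claims, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def gini_index(labels: Sequence[Hashable]) -> float:
    """Gini(D_m) = 1 - sum_k (C_km / |D_m|)^2 over the real categories of one attribute."""
    total = len(labels)       # |D_m|, assumed non-empty
    counts = Counter(labels)  # C_km for each real category k
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def attribute_weights(
    labels_per_attr: Dict[str, List[Hashable]],  # attribute -> real category of each sample having it
    preset: float = 1.0,                         # the preset index, required to be >= 1
    eps: float = 1e-6,                           # assumed guard for single-category attributes
) -> Dict[str, float]:
    if preset < 1:
        raise ValueError("preset index must be >= 1")
    # W_m = preset / Gini(D_m): a balanced attribute has a larger Gini index, hence a smaller weight.
    return {
        name: preset / max(gini_index(labels), eps)
        for name, labels in labels_per_attr.items()
    }
```

For example, `attribute_weights({"gender": ["m", "f", "m", "f"], "hat": [1, 0, 0, 0]})` gives the balanced `gender` attribute a smaller weight than the skewed `hat` attribute, matching the requirement of claim 3.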

6. The method of claim 5,

determining a Gini index for each attribute using the number of samples of each real category and the total number of samples for that attribute comprises:

obtaining the Gini index of each attribute using the following formula:

wherein D_m represents the set of sample images in the target sample set that have the m-th attribute, |D_m| denotes the number of samples in D_m, K represents the total number of real categories corresponding to the m-th attribute, and C_km represents the number of samples in D_m belonging to the k-th real category.
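The formula of claim 6 is not reproduced in this text. Assuming it is the standard Gini index over the symbols just defined, it and the weight of claim 5 (with the preset index written as α ≥ 1) would read:

$$\mathrm{Gini}(D_m) = 1 - \sum_{k=1}^{K}\left(\frac{C_{km}}{|D_m|}\right)^{2}, \qquad W_m = \frac{\alpha}{\mathrm{Gini}(D_m)}$$

As a worked example, an attribute whose samples split 3:1 between two categories has Gini(D_m) = 1 - (0.75² + 0.25²) = 0.375, while a balanced 1:1 split gives 0.5. With α = 1 the weights are about 2.67 and 2.0, so the imbalanced attribute receives the larger weight.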

7. The method of claim 1,

the target object is a pedestrian; and/or

the obtaining a target sample set comprises:

acquiring multiple frames of images from a video;

performing target recognition on each frame to obtain the target objects in that frame;

obtaining a sample image of each target object based on the multiple frames.

8. The method of claim 7,

the obtaining of a sample image of each target object based on the multiple frames comprises:

for each of the target objects:

acquiring trajectory information of the target object across the multiple frames;

obtaining, based on the trajectory information of the target object, at least one frame containing the target object;

obtaining a sample image of the target object based on the at least one frame containing the target object.
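A minimal sketch of the grouping step in claim 8, assuming an upstream tracker has already assigned a track identifier to every per-frame detection. The `Detection` record and the function name are hypothetical; the claims do not specify how trajectory information is represented.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Detection(NamedTuple):
    frame_idx: int                   # index of the frame in the video
    track_id: int                    # identity assigned by a tracker (assumed upstream step)
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels


def frames_per_target(detections: List[Detection]) -> Dict[int, List[Detection]]:
    """Group per-frame detections into one trajectory per target object, ordered by frame."""
    tracks: Dict[int, List[Detection]] = defaultdict(list)
    for det in detections:
        tracks[det.track_id].append(det)
    return {tid: sorted(dets, key=lambda d: d.frame_idx) for tid, dets in tracks.items()}
```

Each returned trajectory lists the frames (and boxes) in which one target object appears, which is the input the sample-image selection of claim 9 needs.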

9. The method of claim 8,

the obtaining a sample image of the target object based on the image containing the target object comprises:

cropping the region corresponding to the target object from each image containing the target object to obtain target object sub-images;

calculating a quality score for each target object sub-image;

selecting the sample image of the target object from the target object sub-images based on the quality scores.
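Claim 9 does not say how the quality score is computed. The sketch below therefore uses a hypothetical stand-in, the pixel-intensity variance of a grayscale crop (flat, washed-out crops score low), and represents images as plain nested lists. Any other scoring function, such as a sharpness or pose-completeness measure, can be passed in through `score`.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # grayscale frame as rows of pixel values (assumed format)
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), right and bottom edges exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the target object's region out of a frame to form a sub-image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def contrast_score(sub: Image) -> float:
    """Hypothetical quality score: variance of pixel intensities in the sub-image."""
    pixels = [p for row in sub for p in row]
    if not pixels:
        return 0.0
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def select_sample(
    frames: Sequence[Image],
    boxes: Sequence[Box],
    score: Callable[[Image], float] = contrast_score,
) -> Image:
    """Crop the target from every frame containing it and keep the highest-scoring crop."""
    subs = [crop(img, box) for img, box in zip(frames, boxes)]
    return max(subs, key=score)
```

Selecting a single best crop per trajectory keeps near-duplicate frames of the same target out of the sample set, which also keeps the category counts of claim 5 from being inflated by one long track.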

10. An attribute identification method, comprising:

acquiring an image to be identified;

identifying the image to be identified by using an attribute identification model to obtain an attribute identification result of the image to be identified;

wherein the attribute recognition model is trained by the method of any one of claims 1 to 9.

11. An electronic device comprising a processor and a memory coupled to the processor, wherein,

the memory stores program instructions;

the processor is configured to execute the program instructions stored by the memory to implement the method of any of claims 1-10.

12. A storage medium, characterized in that the storage medium stores program instructions which, when executed, implement the method of any one of claims 1-10.