CN110334354B - Chinese relation extraction method - Google Patents
- Publication number
- CN110334354B (grant) · CN201910626307.5A (application)
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- level
- hidden state
- state vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroids
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a Chinese relation extraction method comprising the following steps. S1, data preprocessing: pre-train multi-granularity information on the text of the input data to extract distributed vectors at three levels: character, word, and word sense. S2, feature encoding: taking a bidirectional long short-term memory network as the basic framework, obtain character hidden state vectors and word hidden state vectors from the three levels of distributed vectors, and further obtain the final character-level hidden state vectors. S3, relation classification: learn from the final character-level hidden state vectors and fuse them into a sentence-level hidden state vector using a character-level attention mechanism. The method effectively resolves word segmentation ambiguity and polysemy, markedly improves the model's performance on the relation extraction task, and increases the accuracy and robustness of Chinese relation extraction.
Description
Technical Field
The invention relates to the technical field of computer applications, and in particular to a Chinese relation extraction method.
Background
Natural language processing is a subfield of artificial intelligence and an interdisciplinary area between computer science and computational linguistics. Relation extraction is one of the fundamental tasks in natural language processing: given a sentence and well-labeled entities (generally nouns), the goal is to accurately identify the relationship between the entities. Relation extraction can be used to construct large-scale knowledge graphs; a knowledge graph is a semantic network composed of concepts, entities, entity attributes and entity relations, and is a structured representation of the real world. Large-scale knowledge graphs provide comprehensive, structured external knowledge for artificial intelligence systems, enabling more powerful applications.
Traditional relation extraction approaches have a basic problem: they typically rely on manually engineered features, so a model runs effectively only on a small, specific dataset, which limits the development of the relation extraction field. This dependence on manual features also gives traditional relation extraction techniques poor robustness and extensibility, so models cannot generalize across different datasets and corpora.
In recent years, relation extraction based on deep learning has made great progress, and these methods have many advantages over traditional ones. First, owing to the use of neural networks, such models learn the semantic features of text automatically, avoiding features hand-designed for specific data, reducing labor cost and achieving better results; the neural network model provides an end-to-end solution that minimizes human involvement. Neural models are also more robust, as they can learn mappings from varied features to outputs for ever-changing natural language.
However, even deep learning models face unsolved problems. For languages without natural separators, such as Chinese, mainstream methods operate at either the character level or the word level. Character-level methods feed the input sequence into the model character by character, which makes it hard for the model to learn word-level features in the semantic space, so information is insufficient and the accuracy of the relation extraction task drops. Word-level methods first segment the input sequence with a word segmentation tool and then feed the segmented words into the model; although this considers word-level information, relying on an external segmenter easily introduces word segmentation ambiguity, so the external tool's errors propagate through the whole model and limit the relation extraction task. Moreover, neither character-level nor word-level models consider polysemy: representing each word with a single word vector cannot handle polysemous words, which lowers the model's upper bound.
Disclosure of Invention
The invention provides a Chinese relation extraction method aimed at solving the problems of word segmentation ambiguity and polysemy in prior-art Chinese relation extraction.
In order to solve the above problems, the technical solution adopted by the present invention is as follows:
A Chinese relation extraction method comprises the following steps. S1, data preprocessing: pre-train multi-granularity information on the text of the input data to extract distributed vectors at three levels: character, word, and word sense. S2, feature encoding: taking a bidirectional long short-term memory network as the basic framework, obtain character hidden state vectors and word hidden state vectors from the three levels of distributed vectors, and further obtain the final character-level hidden state vectors. S3, relation classification: learn from the final character-level hidden state vectors and fuse them into a sentence-level hidden state vector using a character-level attention mechanism.
Preferably, extracting the character-level distributed vectors comprises extracting a character vector and a position vector. The character vector: for the character-level sequence s = {c_1, ..., c_M} of the text of the given input data, where M is the number of characters, each character c_i is mapped by the word2vec method to a character vector x_i^c ∈ R^{d_c}, where c_i denotes the i-th character, x_i^c is the character vector of the i-th character, R is the real space, and d_c is the dimension of the character vector. The position vector represents the relative positions p_i^1 and p_i^2 of the character c_i to the two entities P_1 and P_2, where p_i^1 is computed as:

p_i^1 = i - b_1, if i < b_1; 0, if b_1 ≤ i ≤ e_1; i - e_1, if i > e_1

where b_1 and e_1 are the start and end positions of the first entity P_1, and p_i^2 is computed in the same way for the second entity. p_i^1 and p_i^2 are converted into the corresponding position vectors x_i^{p1} and x_i^{p2} ∈ R^{d_p}, which represent the position features of the character-level sequence, where d_p denotes the dimension of a position vector.

The final character-level distributed vector is the concatenation of the character vector and the two position vectors, i.e. x_i = [x_i^c; x_i^{p1}; x_i^{p2}], so that x_i ∈ R^d with d = d_c + 2*d_p, where d is the total dimension after the character vector and the position vectors are concatenated.
Preferably, extracting the word-level distributed vectors comprises: for the character-level sequence s = {c_1, ..., c_M} of the text of the given input data and its word-level sequence s = {w_1, ..., w_N}, denoting a word by its start position b and end position e, i.e. w_{b,e}; the word w_{b,e} is converted by the word2vec method into a word-level distributed vector x_{b,e}^w.
Preferably, the sense set Sense(w_{b,e}) of each word w_{b,e} is obtained from the external semantic knowledge base HowNet, and each sense sen_k in the sense set is converted into a sense-level distributed vector x_{b,e}^{sen_k}, i.e. x_{b,e}^{sen} = {x_{b,e}^{sen_1}, ..., x_{b,e}^{sen_K}}, where K is the number of senses of the word w_{b,e}.
Preferably, step S2 comprises: S21: with characters as the basic unit, directly input the character-level sequence of the text of the input data into the bidirectional long short-term memory network to obtain the character hidden state vectors; S22: for each character, obtain through the external semantic knowledge base HowNet all sense vectors of the words of the word-level sequence of the input data that end with that character, input the sense vectors into the bidirectional long short-term memory network to compute sense-level hidden state vectors, and fuse all sense-level hidden state vectors by weighted summation to obtain the word hidden state vectors; S23: compute the weights of the character and the words with a gate unit, and fuse the character hidden state vector and the word hidden state vectors into the final character-level hidden state vector by weighted summation.
Preferably, step S21 includes: the j-th character of the character-level sequence of the text is input into the bidirectional long short-term memory network and computed as follows:

i_j = σ(W_i x_j + U_i h_{j-1} + b_i)
f_j = σ(W_f x_j + U_f h_{j-1} + b_f)
o_j = σ(W_o x_j + U_o h_{j-1} + b_o)
c̃_j = tanh(W_c x_j + U_c h_{j-1} + b_c)
c_j = f_j ⊙ c_{j-1} + i_j ⊙ c̃_j
h_j = o_j ⊙ tanh(c_j)

where i is the input gate, controlling which information is stored; f is the forgetting gate, controlling which information is forgotten; o is the output gate, controlling which information is output; c is the cell unit; W, U and b are parameters to be learned in the bidirectional long short-term memory network; σ is the sigmoid function and ⊙ denotes elementwise multiplication; h denotes the hidden state vector, determined by the hidden state at the previous moment and the data input at the current moment.
Preferably, in step S22, for a word w_{b,e} starting at subscript b and ending at subscript e, with word representation x_{b,e}^w, the cell unit c_{b,e}^w of the word input into the bidirectional long short-term memory network is computed as:

i_{b,e}^w = σ(W_i^w x_{b,e}^w + U_i^w h_b + b_i^w)
f_{b,e}^w = σ(W_f^w x_{b,e}^w + U_f^w h_b + b_f^w)
c̃_{b,e}^w = tanh(W_c^w x_{b,e}^w + U_c^w h_b + b_c^w)
c_{b,e}^w = f_{b,e}^w ⊙ c_b + i_{b,e}^w ⊙ c̃_{b,e}^w

For the k-th sense of the word w_{b,e}, with representation vector x_{b,e}^{sen_k}, the sense-level cell unit c_{b,e}^{sen_k} is computed in the same form:

i_{b,e}^{sen_k} = σ(W_i^s x_{b,e}^{sen_k} + U_i^s h_b + b_i^s)
f_{b,e}^{sen_k} = σ(W_f^s x_{b,e}^{sen_k} + U_f^s h_b + b_f^s)
c̃_{b,e}^{sen_k} = tanh(W_c^s x_{b,e}^{sen_k} + U_c^s h_b + b_c^s)
c_{b,e}^{sen_k} = f_{b,e}^{sen_k} ⊙ c_b + i_{b,e}^{sen_k} ⊙ c̃_{b,e}^{sen_k}

An additional gate mechanism is introduced to control the contribution of each sense:

g_{b,e}^{sen_k} = σ(W^g x_{b,e}^{sen_k} + U^g c_{b,e}^{sen_k} + b^g)

The word cell state fusing the multiple senses is then the weighted sum:

c_{b,e}^w = Σ_k α_{b,e}^{sen_k} ⊙ c_{b,e}^{sen_k}, with α_{b,e}^{sen_k} = exp(g_{b,e}^{sen_k}) / Σ_{k'} exp(g_{b,e}^{sen_{k'}})

For character c_e, the cell state fuses the character's own candidate cell with all word cells ending at e:

c_e = Σ_{b ∈ B(e)} α_{b,e} ⊙ c_{b,e}^w + α_e ⊙ c̃_e

where B(e) is the set of start positions of the dictionary words ending at e, and α_{b,e} and α_e are normalized representations of the gate structure, computed as:

g_{b,e} = σ(W^l x_e + U^l c_{b,e}^w + b^l)
α_{b,e} = exp(g_{b,e}) / (exp(i_e) + Σ_{b' ∈ B(e)} exp(g_{b',e}))
α_e = exp(i_e) / (exp(i_e) + Σ_{b' ∈ B(e)} exp(g_{b',e}))

The cell unit of each character thus fuses information at the word and sense levels, and the final hidden state vector of the character is obtained as:

h_e = o_e ⊙ tanh(c_e)

The final character hidden state vectors are fed into the classifier, which synthesizes the corresponding sentence-level feature representation, computed as follows:
H = tanh(h)
α = softmax(w^T H)
h* = h α^T

Then h* is sent to a softmax classification layer, which computes the probability distribution over the classes:

o = W h* + b
p(y|s) = softmax(o)

For T training data, the whole training process is optimized with the following cross-entropy loss function:

J(θ) = - Σ_{i=1}^{T} log p(y_i | s_i, θ)

where d_h is the dimension of a hidden state variable, M is the length of the input sequence, h ∈ R^{d_h × M}, R is the real space, T denotes the transpose, w is a parameter to be learned, α is the weight vector of h, W ∈ R^{Y × d_h} is the transformation matrix, b ∈ R^Y is a bias vector, Y denotes the total number of classes, p(y) denotes the probability of predicting a certain class, and θ denotes all parameters of the whole model that need to be trained.
Preferably, a dropout mechanism is adopted in the training process, and each neuron of the bidirectional long short-term memory network is closed with a probability of 50% during training.
The invention also provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth in any one of the above.
The beneficial effects of the invention are as follows: a Chinese relation extraction method is provided in which the text of the input data undergoes multi-granularity pre-training to extract distributed vectors at the character, word and word sense levels, so semantic features are learned automatically and manual involvement is greatly reduced; the method effectively resolves word segmentation ambiguity and polysemy, markedly improves the model's performance on the relation extraction task, and increases the accuracy and robustness of Chinese relation extraction.
Drawings
FIG. 1 is a diagram illustrating a method for extracting Chinese relationships according to an embodiment of the present invention.
FIG. 2 is a flow chart illustrating a Chinese relationship extraction method according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. The connection may be for fixing or for circuit connection.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings to facilitate the description of the embodiments of the invention and to simplify the description, and are not intended to indicate or imply that the device or element so referred to must have a particular orientation, be constructed in a particular orientation, and be constructed in a particular manner of operation, and are not to be construed as limiting the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Example 1
As shown in FIG. 1, the present invention provides a Chinese relationship extraction method. The method comprises the following steps:
S1: data preprocessing: pre-train multi-granularity information on the text of the input data to extract distributed vectors at three levels: character, word, and word sense;

S2: feature encoding: taking a bidirectional long short-term memory network as the basic framework, obtain character hidden state vectors and word hidden state vectors from the three levels of distributed vectors, and further obtain the final character-level hidden state vectors;

S3: relation classification: learn from the final character-level hidden state vectors and fuse them into a sentence-level hidden state vector using a character-level attention mechanism.
In the data preprocessing step, the text of the input data undergoes multi-granularity pre-training to extract distributed vectors at the character, word and word sense levels. Traditional pre-training usually represents each word with a single word vector; in the invention, a distributed sense vector is generated for every sense of every word.
The feature encoding step implements a lattice long short-term memory network with multi-level paths, so that semantic information from multiple levels is used to learn character-level hidden state variables, which can be regarded as features automatically extracted from the data. The hidden variables learned in the feature encoding step are input into the relation classification step, where a gated attention mechanism automatically assigns weights to and fuses the hidden state sequence; noise is filtered out during the weighted fusion while salient features are retained, and the final classification output yields a more accurate relation type.
The invention comprises two stages: a training stage and a prediction stage. In the training stage, an initial model is defined and its parameters are randomly initialized; data with relation class labels are continuously fed to the model, which keeps learning from the training data and updating its parameters. The cross entropy between the model's predicted output and the correct answer serves as the loss function measuring prediction quality; when the loss value stabilizes, the model has converged, training ends, and a trained relation extractor is obtained. In the prediction stage, the data to be predicted are fed directly into the trained relation extractor to obtain the predicted entity relations.
Data preprocessing:
In this step, the main purpose is to convert the text of the input data into distributed vectors that the computer can read and manipulate and that carry implicit semantic information. So that subsequent modules can use the multi-granularity information of characters, words and word senses in the text, this step learns vector representations for all three linguistic granularities.
For character vectors, the technique trains with the commonly used word2vec algorithm on a large-scale corpus to obtain an implicit feature representation of each character. This representation exploits the contextual information of the character within the large-scale corpus and can fully reflect the character's syntactic and semantic information.
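As a concrete illustration, character vectors of this kind can be trained with gensim's word2vec implementation by treating each character as a token. This is a minimal sketch under that assumption; the toy corpus and hyperparameters are placeholders rather than the patent's actual training setup:

```python
from gensim.models import Word2Vec

# Toy corpus: each "sentence" is a list of characters, so word2vec learns
# character-level vectors from character co-occurrence (gensim >= 4.0 API).
corpus = [list("达尔文研究所有的杜鹃"), list("杜鹃花在春天开放")]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,  # d_c, the character-vector dimension
    window=5,
    min_count=1,
    sg=1,            # skip-gram variant of word2vec
)
char_vec = model.wv["杜"]  # distributed vector of one character
```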
For word vectors, the training likewise uses the word2vec algorithm, except that character vectors take the characters of the text as the basic unit, whereas word vectors are trained with words as the basic unit after the text has been automatically segmented by a word segmentation tool. In this scheme, however, each word corresponds to only one fixed word vector and polysemy is ignored, so the invention chooses to vectorize word senses rather than words.
For sense vectors, since one cannot tell directly from the surface form whether a word is polysemous or which sense applies in each case, word senses are modeled with the external semantic knowledge base HowNet. In HowNet, the senses and sememes (the smallest units of meaning) of each word are explicitly annotated by hand; through these annotations the senses of each word are obtained and sense vectors are trained. In this way a word can be represented by multiple sense vectors and input into subsequent modules, so that during training the model can dynamically select the most appropriate sense of the current word in the current sentence, helping it capture deeper semantic information and features.
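To make the sense-level representation concrete, the sketch below assumes the HowNet sense inventory has already been exported into a plain Python dict; the SENSES table and the randomly initialized vectors are made-up stand-ins for the hand-annotated senses and pre-trained sense vectors the method actually uses:

```python
import numpy as np

# Hypothetical excerpt of a HowNet-style sense inventory: an ambiguous word
# maps to one entry per sense, so it receives several sense vectors.
SENSES = {
    "杜鹃": ["azalea (flower)", "cuckoo (bird)"],
}

rng = np.random.default_rng(0)
d_sen = 50  # sense-vector dimension (illustrative)

# One distributed vector per sense; in the method these come from pre-training,
# here they are randomly initialized purely for illustration.
sense_vectors = {
    (word, k): rng.standard_normal(d_sen)
    for word, senses in SENSES.items()
    for k in range(len(senses))
}

# A word is then represented by the list of all its sense vectors.
word_as_senses = [sense_vectors[("杜鹃", k)] for k in range(len(SENSES["杜鹃"]))]
```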
Feature encoding:
a neural network structure capable of effectively utilizing multi-granularity information features is realized in the feature coding step. Compared with a traditional Recurrent Neural Network (RNN) model, the LSTM can more flexibly and effectively process context information, store important information in input, forget invalid information in input and avoid the problems of gradient disappearance and gradient explosion easily encountered by a deep neural network. However, the traditional LSTM model cannot solve the problems of word segmentation ambiguity and polysemous word ambiguity of Chinese relation extraction, so that the invention carries out a series of improvements.
First, to avoid error propagation from a word segmentation tool, the invention takes characters as the basic unit: each sentence is fed in directly as a character-level sequence, which is input into a bidirectional LSTM unit to obtain its hidden state vectors. Then, so that word-level information is considered at the same time during encoding, for each character in a sentence the invention adds every word of the sentence that ends with that character into the LSTM's unit computation. For example, in the sentence "达尔文研究所有的杜鹃" ("Darwin studied all the dujuan"), "杜鹃" (dujuan) is a word ending with the character "鹃". All words ending with the current character are likewise fed into another bidirectional LSTM unit, and word-level hidden states are computed. Finally, a gate unit computes the weights of the character and the words, and the character and word hidden states are fused by weighted summation to obtain the final hidden state vector of the current character, which contains information at both the character and word levels.
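The "all words ending with the current character" construction can be illustrated with a simple lexicon scan; the tiny LEXICON below is a stand-in for the large external dictionary the method matches against (a production system would use a trie for efficiency):

```python
# Toy stand-in for the external dictionary of candidate words.
LEXICON = {"杜鹃", "杜鹃花", "研究", "达尔文"}
MAX_WORD_LEN = 4

def words_ending_at(sentence: str, e: int):
    """Return (b, e, word) for every lexicon word that ends at character index e."""
    matches = []
    for b in range(max(0, e - MAX_WORD_LEN + 1), e + 1):
        candidate = sentence[b : e + 1]
        if len(candidate) > 1 and candidate in LEXICON:
            matches.append((b, e, candidate))
    return matches

sent = "达尔文研究所有的杜鹃花"
print(words_ending_at(sent, 9))   # [(8, 9, '杜鹃')]: '杜鹃' ends at index 9
print(words_ending_at(sent, 10))  # [(8, 10, '杜鹃花')]: '杜鹃花' ends at index 10
```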
Although this method combines character and word information and effectively avoids the influence of segmentation errors on the model, it does not yet account for ambiguous words in sentences. In the example above, "杜鹃" (dujuan) is an ambiguous word with two distinct senses, "azalea" (the flower) and "cuckoo" (the bird). Going a step further, the invention therefore incorporates the senses of each word into the hidden state computation. Specifically, for each word ending with the current character, HowNet is first queried for all of the word's sense vectors; these are input into a bidirectional LSTM unit, just like the word vector, to compute sense-level hidden states; finally, all sense-level states are fused by weighted summation into the word's hidden state. Compared with deriving the word hidden state directly from a single word vector, this dynamically fuses and selects the most suitable sense. Once the word hidden states are obtained, the character and word hidden states are fused as before to produce the final hidden state vector of the current character.
Relation classification:
the step inputs the learnt sentence feature representation into a classifier to obtain a predicted relation category label. In the last module, the encoder learns the feature representation (hidden state vector) of each word, but since the relationship class is extracted in units of sentences, it is necessary to merge the feature representations of all word levels into corresponding sentence feature representations. The invention introduces a gated attention mechanism to automatically assign weights to the feature representation of the level words, and then performs weighted summation on all the words based on the weights to obtain the final sentence feature representation. The intuitive meaning of this method is that in a sentence, the degree of importance of each word is different, noise words or common words such as "of" and "of" should be given smaller weight, and keywords such as words in entities and words in verbs should correspond to higher attention, so that the sentence representation obtained by fusion is more accurate.
After the sentence feature vector is obtained by fusion, it is fed into a fully connected layer and mapped to a new vector whose dimension equals the total number of relation classes. This vector is then normalized by a softmax function, so that each dimension holds a probability value in the interval [0, 1], representing the probability that the sentence belongs to the relation class of that dimension. In the training stage, the model's loss function is defined as the cross entropy between the normalized vector and the one-hot indicator vector of the correct relation class, and the model parameters are updated by gradient descent; in the prediction stage, the predicted relation class is the one corresponding to the dimension with the largest probability in the normalized vector.
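A minimal numpy rendering of this classification layer follows: a fully connected map to class scores, softmax normalization, cross entropy against the one-hot correct class for training, and argmax for prediction; all dimensions and values are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d_h, n_classes = 8, 5  # hidden size and number of relation categories

h_star = rng.standard_normal(d_h)          # fused sentence feature vector
W = rng.standard_normal((n_classes, d_h))  # fully connected layer weights
b = np.zeros(n_classes)                    # bias vector

probs = softmax(W @ h_star + b)  # each dimension: probability of one relation

# Training stage: cross entropy between probs and the one-hot correct class.
gold = 2
loss = -np.log(probs[gold])

# Prediction stage: the class whose dimension holds the largest probability.
predicted = int(np.argmax(probs))
```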
Example 2
As shown in FIG. 2, this embodiment adopts the Chinese relation extraction method provided by the invention. The task is defined as: given a sentence s and two specified entities in it, determine the relation between the two entities in sentence s. For example, given the sentence "达尔文研究所有的杜鹃" and the named entities "达尔文" (Darwin) and "杜鹃" (dujuan), the goal is to determine what relation "达尔文" and "杜鹃" hold in that sentence.
Step 1, data preprocessing:
1.1 Character-level representation
For a given input sequence s = {c_1, ..., c_M} with M characters, the invention uses the word2vec method to map each character c_i (taking the i-th as an example) to a character vector x_i^c ∈ R^{d_c}, where x_i^c is the character vector of the i-th character, R is the real space, and d_c is the dimension of the character vector. In addition to character vectors, the technique employs position vectors to represent the relative position of a character to the two entities. Specifically, for the i-th character c_i, its relative positions to the two given entities are expressed as p_i^1 and p_i^2, where p_i^1 is computed as:

p_i^1 = i - b_1, if i < b_1; 0, if b_1 ≤ i ≤ e_1; i - e_1, if i > e_1

Here b_1 and e_1 are the start and end positions of the first entity, and p_i^2 is computed in almost exactly the same way. In this way, p_i^1 and p_i^2 are converted into corresponding position vectors x_i^{p1} and x_i^{p2} ∈ R^{d_p}, representing the position features of the character-level sequence, where d_p is the dimension of the position vector.
In one embodiment of the invention, the input is defined as a sentence and two entities specified in it; in fact, if a sentence has multiple entities, relation extraction is performed for every pair of entities, and the output is the relation of the two currently specified entities in the sentence.
Thus, for the i-th character c_i of the input, the final representation is the concatenation of the character vector and the two position vectors, i.e. x_i = [x_i^c; x_i^{p1}; x_i^{p2}], with x_i ∈ R^d and d = d_c + 2*d_p, where d is the total dimension after concatenation. The character representation of the input sequence thus becomes x = {x_1, ..., x_M}, which is then sent to the subsequent encoding step.
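The position features and the concatenation can be sketched as follows; the embedding tables are randomly initialized stand-ins for the pre-trained ones:

```python
import numpy as np

def relative_position(i: int, b: int, e: int) -> int:
    """Signed offset of character i relative to an entity spanning [b, e]."""
    if i < b:
        return i - b
    if i > e:
        return i - e
    return 0  # inside the entity

rng = np.random.default_rng(0)
d_c, d_p, max_len = 50, 5, 20
char_emb = rng.standard_normal((5000, d_c))        # character embedding table
pos_emb = rng.standard_normal((2 * max_len, d_p))  # position embedding table

def char_input(char_id: int, i: int, ent1: tuple, ent2: tuple) -> np.ndarray:
    """x_i = [character vector ; position vector to P1 ; position vector to P2]."""
    p1 = relative_position(i, *ent1) + max_len     # shift into table range
    p2 = relative_position(i, *ent2) + max_len
    return np.concatenate([char_emb[char_id], pos_emb[p1], pos_emb[p2]])

x = char_input(char_id=42, i=3, ent1=(0, 2), ent2=(8, 9))
assert x.shape == (d_c + 2 * d_p,)  # d = d_c + 2*d_p
```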
1.2 Word-level representation:
although the input to the model is a word-level sequence, to obtain word-level features, the present inventionThe invention performs word-level representation learning on all possible candidate words in the sentence. For an input sequence s, it can not only be expressed as s = { c =, for example 1 ,...,c M A word-level sequence of the form, which can also be expressed as s = { w = } 1 ,...,w M The word-level sequence of. In this section, the present invention uses a starting position b and an ending position e to denote a word, i.e., w b,e . Still by the word2vec method, the word w b,e Can be converted into word vectors
For each word, the invention obtains its sense set Sense(w_{b,e}) from HowNet; then every sense sen_k in the set is converted by the Skip-Gram method into a sense vector x_{b,e}^{sen_k} that represents that sense alone. A word can thus be represented by a set of sense vectors (assuming it has K senses), i.e. x_{b,e}^{sen} = {x_{b,e}^{sen_1}, ..., x_{b,e}^{sen_K}}.
The word sense representation vector will be used in the training of the encoder module so that the model can dynamically utilize word sense information.
Step 2, feature encoding:
a common word-level LSTM unit consists essentially of three gates: an input gate i for controlling which information is to be stored; a forgetting gate f for controlling which information is to be forgotten; an output gate o controls which information is to be output. For the jth word, the LSTM unit is calculated as follows:
where c denotes a cell unit which stores information of the sequence from the start to the current position. h represents a hidden state vector which is determined by the hidden state at the previous moment and the input at the current moment. U and b are the parameters to be learned in LSTM.
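In numpy, one step of this cell computation reads as below; the weights are random for illustration and the four gate blocks are stacked into single W, U, b arrays only for compactness:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_j, h_prev, c_prev, W, U, b):
    """One LSTM step; gates i/f/o and the candidate cell are stacked in W, U, b."""
    d = h_prev.size
    z = W @ x_j + U @ h_prev + b          # shape (4*d,)
    i = sigmoid(z[0 * d : 1 * d])         # input gate: which information to store
    f = sigmoid(z[1 * d : 2 * d])         # forgetting gate: which to forget
    o = sigmoid(z[2 * d : 3 * d])         # output gate: which to output
    c_tilde = np.tanh(z[3 * d : 4 * d])   # candidate cell content
    c = f * c_prev + i * c_tilde          # new cell unit
    h = o * np.tanh(c)                    # new hidden state vector
    return h, c

rng = np.random.default_rng(0)
d_in, d = 60, 8
W = rng.standard_normal((4 * d, d_in))
U = rng.standard_normal((4 * d, d))
b = np.zeros(4 * d)
h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d), np.zeros(d), W, U, b)
```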
For a word w_{b,e} starting with subscript b and ending with subscript e, whose word representation is x_{b,e}^w, the cell unit c_{b,e}^w of the word in the Lattice LSTM is computed as:

i_{b,e}^w = σ(W_i^w x_{b,e}^w + U_i^w h_b + b_i^w)
f_{b,e}^w = σ(W_f^w x_{b,e}^w + U_f^w h_b + b_f^w)
c̃_{b,e}^w = tanh(W_c^w x_{b,e}^w + U_c^w h_b + b_c^w)
c_{b,e}^w = f_{b,e}^w ⊙ c_b + i_{b,e}^w ⊙ c̃_{b,e}^w

That is, for a character c_b, the invention first finds all words that begin with it and match the external dictionary, and then computes the cell units c^w for those words. On this basis, the invention extends the computation to the sense level: each sense of each word is assigned an additional LSTM unit. As mentioned in the representation learning module, the k-th sense of the word w_{b,e} has the representation vector x_{b,e}^{sen_k}, so the sense-level cell unit c_{b,e}^{sen_k} is computed as:

i_{b,e}^{sen_k} = σ(W_i^s x_{b,e}^{sen_k} + U_i^s h_b + b_i^s)
f_{b,e}^{sen_k} = σ(W_f^s x_{b,e}^{sen_k} + U_f^s h_b + b_f^s)
c̃_{b,e}^{sen_k} = tanh(W_c^s x_{b,e}^{sen_k} + U_c^s h_b + b_c^s)
c_{b,e}^{sen_k} = f_{b,e}^{sen_k} ⊙ c_b + i_{b,e}^{sen_k} ⊙ c̃_{b,e}^{sen_k}

All sense cell units are then fused into one word cell unit, so that the model takes polysemy into account. To compute the fused word cell unit c_{b,e}^w, an additional gate mechanism is introduced to control the contribution of each sense:

g_{b,e}^{sen_k} = σ(W^g x_{b,e}^{sen_k} + U^g c_{b,e}^{sen_k} + b^g)

and the word cell state fusing the multiple senses is the weighted sum:

c_{b,e}^w = Σ_k α_{b,e}^{sen_k} ⊙ c_{b,e}^{sen_k}, with α_{b,e}^{sen_k} = exp(g_{b,e}^{sen_k}) / Σ_{k'} exp(g_{b,e}^{sen_{k'}})

Through the above computation, for each word w_{b,e} the model obtains a cell state c_{b,e}^w fused with multi-sense information.

Then, for character c_e, the invention fuses the information of every word ending with c_e to obtain a brand-new character-level cell state:

c_e = Σ_{b ∈ B(e)} α_{b,e} ⊙ c_{b,e}^w + α_e ⊙ c̃_e

where B(e) is the set of start positions of the dictionary words ending at e, and α_{b,e} and α_e are normalized representations of the gate structure, computed as:

g_{b,e} = σ(W^l x_e + U^l c_{b,e}^w + b^l)
α_{b,e} = exp(g_{b,e}) / (exp(i_e) + Σ_{b' ∈ B(e)} exp(g_{b',e}))
α_e = exp(i_e) / (exp(i_e) + Σ_{b' ∈ B(e)} exp(g_{b',e}))

After this computation, the cell unit of each character fuses information at the word and sense levels, and the final hidden state vector of the character is computed as:

h_e = o_e ⊙ tanh(c_e)

The final character hidden state vectors are fed into the classifier, which synthesizes a sentence-level representation and then obtains the probability distribution over the answers.
Step 3, relation classification:
after the hidden state vector h of each word is learned, the invention adopts a word-level attention mechanism to fuse the hidden states of the word levels into a hidden state of a sentence level
Where d is h Is the dimension of the hidden state variable, and M is the length of the input sequence. h is a total of * Is a weighted sum of automatically assigned weights:
H = tanh(h)
α = softmax(w^T H)
h* = h α^T
where T denotes the transpose, w is a parameter to be learned, α is the weight vector of h, and H is h transformed by the tanh function; tanh maps the value of each dimension of h into the range [-1, 1], which effectively alleviates problems such as exploding gradients during training.
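In numpy, the attention pooling reads as follows (dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, M = 8, 12                     # hidden size, input sequence length

h = rng.standard_normal((d_h, M))  # hidden state vector of every position
w = rng.standard_normal(d_h)       # learned attention parameter

H = np.tanh(h)                     # H = tanh(h), each value squashed to [-1, 1]
scores = w @ H                     # w^T H: one score per position
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights
h_star = h @ alpha                 # h* = h alpha^T: sentence-level vector

assert h_star.shape == (d_h,)
```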
Then h* is sent to a softmax classification layer, which computes the probability distribution over the classes:

o = W h* + b
p(y|s) = softmax(o)

Here W ∈ R^{Y × d_h} is the transformation matrix and b ∈ R^Y is a bias vector; Y denotes the total number of classes and p(y) denotes the probability of predicting a certain class.
For T training data, the whole training process is optimized with the following cross-entropy loss function:

J(θ) = - Σ_{i=1}^{T} log p(y_i | s_i, θ)

where θ denotes all parameters of the whole model that need to be trained. Meanwhile, to prevent overfitting, a dropout mechanism is adopted during training: each neuron is closed with a probability of 50% in every training step (i.e., half of the hidden units do not participate in the computation of each training pass), while in the testing phase all trained neurons participate in the computation.
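The 50% dropout described here corresponds to the usual inverted-dropout formulation, sketched below: random masking with rescaling at training time, identity at test time:

```python
import numpy as np

def dropout(h, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors by 1/(1-p), so the test phase uses all neurons as-is."""
    if not training:
        return h                     # test phase: all trained neurons participate
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p  # keep each neuron with probability 1-p
    return h * mask / (1.0 - p)

h = np.ones(8)
print(dropout(h, rng=np.random.default_rng(0)))  # about half zeros, rest 2.0
```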
All or part of the flow of the method of the embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor to instruct related hardware to implement the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention is not to be considered limited to these descriptions. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all of them are considered to fall within the scope of protection of the invention.
Claims (6)
1. A Chinese relation extraction method is characterized by comprising the following steps:
S1: data preprocessing: pre-training multi-granularity information on the text of input data to extract distributed vectors at three levels: character, word, and word sense;
extracting the character-level distributed vectors comprises extracting a character vector and a position vector;

the character vector is: for the character-level sequence s = {c_1, ..., c_M} of the text of the given input data, where M is the number of characters, each character c_i is mapped by the word2vec method to a character vector x_i^c ∈ R^{d_c}, wherein c_i denotes the i-th character, x_i^c is the character vector of the i-th character, R is the real space, and d_c is the dimension of the character vector;

the position vector represents the relative positions p_i^1 and p_i^2 of the character c_i to the two entities P_1 and P_2, wherein p_i^1 is computed as:

p_i^1 = i - b_1, if i < b_1; 0, if b_1 ≤ i ≤ e_1; i - e_1, if i > e_1

wherein b_1 and e_1 are the start and end positions of the first entity P_1, and p_i^2 is computed in the same way; p_i^1 and p_i^2 are converted into the corresponding position vectors x_i^{p1} and x_i^{p2} ∈ R^{d_p}, which represent the position features of the character-level sequence, wherein d_p denotes the dimension of a position vector;

the final character-level distributed vector is the concatenation of the character vector and the two position vectors, i.e.:

x_i = [x_i^c; x_i^{p1}; x_i^{p2}]

so that x_i ∈ R^d, wherein d = d_c + 2*d_p is the total dimension after the character vector and the position vectors are concatenated;

extracting the word-level distributed vectors comprises:

for the character-level sequence s = {c_1, ..., c_M} of the text of the given input data and its word-level sequence s = {w_1, ..., w_N}, denoting a word by its start position b and end position e, i.e. w_{b,e}, and converting the word w_{b,e} by the word2vec method into a word-level distributed vector x_{b,e}^w;

obtaining the sense set Sense(w_{b,e}) of each word w_{b,e} from the external semantic knowledge base HowNet, and converting each sense sen_k in the sense set into a sense-level distributed vector x_{b,e}^{sen_k}, i.e.:

x_{b,e}^{sen} = {x_{b,e}^{sen_1}, ..., x_{b,e}^{sen_K}}

wherein K is the number of senses of the word w_{b,e};
S2: feature encoding: taking a bidirectional long short-term memory network as the basic framework, obtaining character hidden state vectors and word hidden state vectors through the distributed vectors of the three levels of character, word and word sense, and further obtaining the final character-level hidden state vectors;

wherein step S2 comprises:

S21: with characters as the basic unit, directly inputting the character-level sequence of the text of the input data into the bidirectional long short-term memory network to obtain the character hidden state vectors;

S22: obtaining, through the external semantic knowledge base HowNet, all sense vectors of the words of the word-level sequence of the input data that end with each character, inputting the sense vectors into the bidirectional long short-term memory network to compute sense-level hidden state vectors, and fusing all the sense-level hidden state vectors by weighted summation to obtain the word hidden state vectors;

S23: computing the weights of the character and the words with a gate unit, and fusing the character hidden state vector and the word hidden state vectors into the final character-level hidden state vector by weighted summation;

S3: relation classification: learning from the final character-level hidden state vectors, and fusing the character-level hidden state vectors into a sentence-level hidden state vector by a character-level attention mechanism.
2. The Chinese relation extraction method according to claim 1, wherein step S21 comprises: the j-th character of the character-level sequence of the text is input into the bidirectional long short-term memory network and computed as follows:

i_j = σ(W_i x_j + U_i h_{j-1} + b_i)
f_j = σ(W_f x_j + U_f h_{j-1} + b_f)
o_j = σ(W_o x_j + U_o h_{j-1} + b_o)
c̃_j = tanh(W_c x_j + U_c h_{j-1} + b_c)
c_j = f_j ⊙ c_{j-1} + i_j ⊙ c̃_j
h_j = o_j ⊙ tanh(c_j)

where i is the input gate, controlling which information is stored; f is the forgetting gate, controlling which information is forgotten; o is the output gate, controlling which information is output; c is the cell unit; W, U and b are parameters to be learned in the bidirectional long short-term memory network; σ is the sigmoid function, ⊙ denotes elementwise multiplication; and h denotes the hidden state vector, determined by the hidden state at the previous moment and the data input at the current moment.
3. The Chinese relation extraction method according to claim 2, wherein in step S22, for a word w_{b,e} starting at subscript b and ending at subscript e, with word representation x_{b,e}^w, the cell unit c_{b,e}^w of the word input into the bidirectional long short-term memory network is computed as:

i_{b,e}^w = σ(W_i^w x_{b,e}^w + U_i^w h_b + b_i^w)
f_{b,e}^w = σ(W_f^w x_{b,e}^w + U_f^w h_b + b_f^w)
c̃_{b,e}^w = tanh(W_c^w x_{b,e}^w + U_c^w h_b + b_c^w)
c_{b,e}^w = f_{b,e}^w ⊙ c_b + i_{b,e}^w ⊙ c̃_{b,e}^w

for the k-th sense of the word w_{b,e}, with representation vector x_{b,e}^{sen_k}, the sense-level cell unit c_{b,e}^{sen_k} is computed as:

i_{b,e}^{sen_k} = σ(W_i^s x_{b,e}^{sen_k} + U_i^s h_b + b_i^s)
f_{b,e}^{sen_k} = σ(W_f^s x_{b,e}^{sen_k} + U_f^s h_b + b_f^s)
c̃_{b,e}^{sen_k} = tanh(W_c^s x_{b,e}^{sen_k} + U_c^s h_b + b_c^s)
c_{b,e}^{sen_k} = f_{b,e}^{sen_k} ⊙ c_b + i_{b,e}^{sen_k} ⊙ c̃_{b,e}^{sen_k}

an additional gate mechanism is introduced to control the contribution of each sense:

g_{b,e}^{sen_k} = σ(W^g x_{b,e}^{sen_k} + U^g c_{b,e}^{sen_k} + b^g)

the word cell state fusing the multiple senses is computed as the weighted sum:

c_{b,e}^w = Σ_k α_{b,e}^{sen_k} ⊙ c_{b,e}^{sen_k}, with α_{b,e}^{sen_k} = exp(g_{b,e}^{sen_k}) / Σ_{k'} exp(g_{b,e}^{sen_{k'}})

for character c_e, the cell state is computed as:

c_e = Σ_{b ∈ B(e)} α_{b,e} ⊙ c_{b,e}^w + α_e ⊙ c̃_e

wherein B(e) is the set of start positions of the dictionary words ending at e, and α_{b,e} and α_e are normalized representations of the gate structure, computed as:

g_{b,e} = σ(W^l x_e + U^l c_{b,e}^w + b^l)
α_{b,e} = exp(g_{b,e}) / (exp(i_e) + Σ_{b' ∈ B(e)} exp(g_{b',e}))
α_e = exp(i_e) / (exp(i_e) + Σ_{b' ∈ B(e)} exp(g_{b',e}))

the cell unit of each character thus fuses information at the word and sense levels, and the final hidden state vector of the character is obtained as:

h_e = o_e ⊙ tanh(c_e)

the final character hidden state vectors are fed into the classifier, which synthesizes the corresponding sentence-level feature representation.
4. The Chinese relation extraction method according to claim 3, wherein the sentence-level hidden state vector h* is computed as follows:

H = tanh(h)
α = softmax(w^T H)
h* = h α^T

then h* is sent to a softmax classification layer, which computes the probability distribution over the classes:

o = W h* + b
p(y|s) = softmax(o)

for T training data, the whole training process is optimized with the following cross-entropy loss function:

J(θ) = - Σ_{i=1}^{T} log p(y_i | s_i, θ)

wherein d_h is the dimension of the hidden state variable, M is the length of the input sequence, h ∈ R^{d_h × M}, R is the real space, T denotes the transpose, w is a parameter to be learned, α is the weight vector of h, W ∈ R^{Y × d_h} is the transformation matrix, b ∈ R^Y is a bias vector, Y denotes the total number of classes, p(y) denotes the probability of predicting a certain class, and θ denotes all parameters of the whole model that need to be trained.
5. The Chinese relation extraction method according to claim 4, wherein a dropout mechanism is adopted in the training process, and each neuron of the bidirectional long short-term memory network is closed with a probability of 50% during training.
6. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910626307.5A CN110334354B (en) | 2019-07-11 | 2019-07-11 | Chinese relation extraction method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910626307.5A CN110334354B (en) | 2019-07-11 | 2019-07-11 | Chinese relation extraction method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110334354A CN110334354A (en) | 2019-10-15 |
| CN110334354B true CN110334354B (en) | 2022-12-09 |
Family
ID=68146526
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910626307.5A Active CN110334354B (en) | 2019-07-11 | 2019-07-11 | Chinese relation extraction method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110334354B (en) |
Families Citing this family (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112948535B (en) * | 2019-12-10 | 2022-06-14 | 复旦大学 | Method and device for extracting knowledge triples of text and storage medium |
| CN111160017B (en) * | 2019-12-12 | 2021-09-03 | 中电金信软件有限公司 | Keyword extraction method, phonetics scoring method and phonetics recommendation method |
| CN111291556B (en) * | 2019-12-17 | 2021-10-26 | 东华大学 | Chinese entity relation extraction method based on character and word feature fusion of entity meaning item |
| CN111061843B (en) * | 2019-12-26 | 2023-08-25 | 武汉大学 | Knowledge-graph-guided false news detection method |
| CN111274394B (en) * | 2020-01-16 | 2022-10-25 | 重庆邮电大学 | An entity relationship extraction method, device, device and storage medium |
| CN111428505B (en) * | 2020-01-17 | 2021-05-04 | 北京理工大学 | An Entity Relationship Extraction Method Based on Recognition Features of Trigger Words |
| CN111274794B (en) * | 2020-01-19 | 2022-03-18 | 浙江大学 | Synonym expansion method based on transmission |
| CN111709240A (en) * | 2020-05-14 | 2020-09-25 | 腾讯科技(武汉)有限公司 | Entity relationship extraction method, device, device and storage medium thereof |
| CN111783418B (en) * | 2020-06-09 | 2024-04-05 | 北京北大软件工程股份有限公司 | Chinese word meaning representation learning method and device |
| CN111859978B (en) * | 2020-06-11 | 2023-06-20 | 南京邮电大学 | Deep learning-based emotion text generation method |
| CN111680510B (en) * | 2020-07-07 | 2021-10-15 | 腾讯科技(深圳)有限公司 | Text processing method and device, computer equipment and storage medium |
| CN112015891B (en) * | 2020-07-17 | 2025-01-14 | 山东师范大学 | Method and system for classifying messages on online government inquiry platforms based on deep neural networks |
| CN112380872B (en) * | 2020-11-27 | 2023-11-24 | 深圳市慧择时代科技有限公司 | Method and device for determining emotion tendencies of target entity |
| CN112560487A (en) * | 2020-12-04 | 2021-03-26 | 中国电子科技集团公司第十五研究所 | Entity relationship extraction method and system based on domestic equipment |
| CN112883153B (en) * | 2021-01-28 | 2023-06-23 | 北京联合大学 | Relation Classification Method and Device Based on Information Enhanced BERT |
| CN113239663B (en) * | 2021-03-23 | 2022-07-12 | 国家计算机网络与信息安全管理中心 | Multi-meaning word Chinese entity relation identification method based on Hopkinson |
| CN112883194B (en) * | 2021-04-06 | 2024-02-20 | 讯飞医疗科技股份有限公司 | A symptom information extraction method, device, equipment and storage medium |
| CN113051371B (en) * | 2021-04-12 | 2023-02-07 | 平安国际智慧城市科技股份有限公司 | Chinese machine reading understanding method and device, electronic equipment and storage medium |
| CN113326676B (en) * | 2021-04-19 | 2024-09-20 | 北京快确信息科技有限公司 | Method for establishing deep learning model for structuring financial text into form |
| CN113392648B (en) * | 2021-06-02 | 2022-10-18 | 北京三快在线科技有限公司 | Entity relationship acquisition method and device |
| CN114372125A (en) * | 2021-12-03 | 2022-04-19 | 北京北明数科信息技术有限公司 | Government affair knowledge base construction method, system, equipment and medium based on knowledge graph |
| CN114579695A (en) * | 2022-01-20 | 2022-06-03 | 杭州量知数据科技有限公司 | Event extraction method, device, equipment and storage medium |
| CN115169326B (en) * | 2022-04-15 | 2024-07-19 | 长河信息股份有限公司 | Chinese relation extraction method, device, terminal and storage medium |
| CN115034302B (en) * | 2022-06-07 | 2023-04-11 | 四川大学 | Relation extraction method, device, equipment and medium for optimizing information fusion strategy |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080275694A1 (en) * | 2007-05-04 | 2008-11-06 | Expert System S.P.A. | Method and system for automatically extracting relations between concepts included in text |
| CN108733792A (en) * | 2018-05-14 | 2018-11-02 | 北京大学深圳研究生院 | A kind of entity relation extraction method |
- 2019-07-11 CN CN201910626307.5A patent/CN110334354B/en active Active
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080275694A1 (en) * | 2007-05-04 | 2008-11-06 | Expert System S.P.A. | Method and system for automatically extracting relations between concepts included in text |
| CN108733792A (en) * | 2018-05-14 | 2018-11-02 | 北京大学深圳研究生院 | A kind of entity relation extraction method |
Non-Patent Citations (2)
| Title |
|---|
| Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification; Peng Zhou et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 2016-08-07; pp. 207-212 * |
| Chinese Entity Relation Extraction Method Based on Deep Learning; Sun Ziyang et al.; Computer Engineering; 2018-09-30; Vol. 44, No. 9; pp. 164-170 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110334354A (en) | 2019-10-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110334354B (en) | Chinese relation extraction method | |
| Yao et al. | An improved LSTM structure for natural language processing | |
| CN114510570B (en) | Intention classification method, device and computer equipment based on small sample corpus | |
| CN111078836B (en) | Machine reading understanding method, system and device based on external knowledge enhancement | |
| WO2023024412A1 (en) | Visual question answering method and apparatus based on deep learning model, and medium and device | |
| CN109726389B (en) | A Chinese missing pronoun completion method based on common sense and reasoning | |
| CN111611810B (en) | A polyphone pronunciation disambiguation device and method | |
| CN110096711B (en) | Natural language semantic matching method for sequence global attention and local dynamic attention | |
| CN115859164B (en) | Building entity identification and classification method and system based on prompt | |
| CN110609891A (en) | A Visual Dialogue Generation Method Based on Context-Aware Graph Neural Network | |
| CN109214006B (en) | A Natural Language Inference Method for Image Enhanced Hierarchical Semantic Representation | |
| CN110532558B (en) | Multi-intention recognition method and system based on sentence structure deep parsing | |
| CN117033602A (en) | A method of constructing a multi-modal user mental perception question and answer model | |
| CN107358948A (en) | Language in-put relevance detection method based on attention model | |
| CN112541356A (en) | Method and system for recognizing biomedical named entities | |
| CN114492441A (en) | BilSTM-BiDAF named entity identification method based on machine reading understanding | |
| CN111428525A (en) | Implicit discourse relation identification method and system and readable storage medium | |
| CN107590127A (en) | A kind of exam pool knowledge point automatic marking method and system | |
| CN114692615B (en) | Small sample intention recognition method for small languages | |
| CN110162789A (en) | A kind of vocabulary sign method and device based on the Chinese phonetic alphabet | |
| CN118227791A (en) | A method for predicting learning outcomes of MOOC learners based on multi-level enhanced contrastive learning | |
| US11941360B2 (en) | Acronym definition network | |
| CN110019795A (en) | The training method and system of sensitive word detection model | |
| US20240354638A1 (en) | Named Entity Recognition System based on Enhanced Label Embedding and Curriculum Learning | |
| CN115906846B (en) | A document-level named entity recognition method based on hierarchical feature fusion of two graphs |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |























































































































































