CN110334354B - Chinese relation extraction method - Google Patents


Info

Publication number
CN110334354B
Authority
CN
China
Prior art keywords: word, vector, level, hidden state, state vector
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number
CN201910626307.5A
Other languages
Chinese (zh)
Other versions
CN110334354A (en)
Inventor
丁宁
李自然
郑海涛
刘知远
沈颖
Current Assignee
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date
Filing date
Publication date
Application filed by Shenzhen Graduate School Tsinghua University
Priority to CN201910626307.5A
Publication of CN110334354A
Application granted
Publication of CN110334354B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods


Abstract

The invention provides a Chinese relation extraction method, which comprises the following steps. S1, data preprocessing: the text of the input data is pre-trained with multi-granularity information to extract the distributed vectors of the three levels of characters, words and word senses in the text. S2, feature coding: taking a bidirectional long short-term memory network as the basic framework, the hidden state vectors of the characters and the hidden state vectors of the words are obtained from the distributed vectors of the three levels of characters, words and word senses, and the final character-level hidden state vectors are further obtained. S3, relation classification: the final character-level hidden state vectors are learned and fused into a sentence-level hidden state vector by a character-level attention mechanism. The method effectively solves the problems of word-segmentation ambiguity and polysemy, greatly improves the performance of the model on the relation extraction task, and improves the accuracy and robustness of Chinese relation extraction.

Description

Chinese relation extraction method
Technical Field
The invention relates to the technical field of computer application, in particular to a Chinese relation extraction method.
Background
Natural language processing is a sub-discipline of artificial intelligence and an interdisciplinary field of computer science and computational linguistics. Relation extraction is one of the basic tasks of natural language processing: its goal is to accurately find the relationship between entities (generally nouns) for a given sentence with well-labeled entities. Relation extraction technology can be used to construct large-scale knowledge graphs; a knowledge graph is a semantic network consisting of concepts, entities, entity attributes and entity relations, and is a structured representation of the real world. Constructing large-scale knowledge graphs can provide comprehensive and structured external knowledge for artificial intelligence systems, thereby enabling more powerful applications.
Traditional relation extraction approaches have certain problems. They often rely on manually designed features, so the models run effectively only on small, specific data sets, which limits the development of the relation extraction field.
Meanwhile, because of this dependence on manual features, traditional relation extraction techniques have poor robustness and extensibility, so the models cannot generalize across different data sets and corpora.
In recent years, relation extraction based on deep learning has advanced greatly, and these methods have many advantages over traditional ones. First, thanks to the application of neural networks, such models automatically learn the semantic features of the text; this avoids hand-designing features for specific data, reduces labor costs and achieves better results. The neural network model provides an end-to-end solution that minimizes human involvement. Meanwhile, neural-network-based models are also more robust and can learn mappings from varied features to outputs for ever-changing natural language.
However, even deep learning models face some unsolved problems. For languages without natural separators, such as Chinese, current mainstream methods operate at either the character level or the word level. The former feeds the input sequence into the model character by character; this makes it difficult for the model to learn word-level features in the semantic space, leading to insufficient information and lowering the accuracy of the relation extraction task. The latter first segments the input sequence with a word segmentation tool and then feeds the segmented words into the model; although this takes word-level information into account, relying on an external segmentation tool easily produces segmentation ambiguity, so errors of the external tool propagate through the whole model and limit the relation extraction task. In addition, whether at the character level or the word level, such models do not consider polysemy: each word is represented by only one word vector, a strategy that cannot handle polysemous words and lowers the upper bound of the model.
Disclosure of Invention
The invention provides a Chinese relation extraction method, aiming at solving the problems of word segmentation ambiguity and polysemous word ambiguity in Chinese relation extraction in the prior art.
In order to solve the above problems, the technical solution adopted by the present invention is as follows:
a Chinese relation extraction method comprises the following steps: S1: data preprocessing: pre-training multi-granularity information on the text of the input data to extract the distributed vectors of the three levels of characters, words and word senses in the text; S2: feature coding: taking a bidirectional long short-term memory network as the basic framework, obtaining the hidden state vectors of the characters and the hidden state vectors of the words from the distributed vectors of the three levels of characters, words and word senses, and further obtaining the final character-level hidden state vectors; S3: relation classification: learning the final character-level hidden state vectors, and fusing them into a sentence-level hidden state vector by a character-level attention mechanism.
Preferably, extracting the character-level distributed vectors comprises extracting a character vector and a position vector. The character vector is obtained as follows: for the character-level sequence $s=\{c_1,\dots,c_M\}$ of the text of the given input data, where $M$ is the number of characters, each character $c_i$ is mapped by the word2vec method into a character vector

$$x_i^c = e^c(c_i) \in \mathbb{R}^{d_c}$$

where $c_i$ denotes the $i$-th character, $x_i^c$ is the character vector of the $i$-th character, $\mathbb{R}$ is the real number space, and $d_c$ is the dimension of the character vector. The position vectors represent the relative positions $p_i^1$ and $p_i^2$ of the character $c_i$ to the two entities $P_1$ and $P_2$, where $p_i^1$ is calculated as:

$$p_i^1 = \begin{cases} i-b_1, & i<b_1 \\ 0, & b_1 \le i \le e_1 \\ i-e_1, & i>e_1 \end{cases}$$

where $b_1$ and $e_1$ are the start and end positions of the first entity $P_1$, and $p_i^2$ is calculated in the same way with respect to the second entity. $p_i^1$ and $p_i^2$ are converted into the corresponding position vectors $x_i^{p_1}$ and $x_i^{p_2} \in \mathbb{R}^{d_p}$, used to represent the position features of the character-level sequence, where $d_p$ denotes the dimension of a position vector.

The final representation of the character-level distributed vector is the concatenation of the character vector and the two position vectors, i.e.:

$$x_i = [x_i^c; x_i^{p_1}; x_i^{p_2}] \in \mathbb{R}^{d}, \qquad d = d_c + 2 d_p$$

where $d$ is the total dimension after the character vector and the position vectors are concatenated. The representation of the character-level sequence of the text of the input data then becomes $x = \{x_1, \dots, x_M\}$.
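A minimal sketch of the character-level representation described above: the piecewise relative-position rule and the concatenation giving $d = d_c + 2d_p$. The random embedding tables and all dimension sizes below are illustrative stand-ins for the pre-trained word2vec vectors, not values from the patent.

```python
import numpy as np

def relative_position(i, b, e):
    """Relative position of character index i to an entity spanning [b, e]:
    negative before the span, 0 inside it, positive after it."""
    if i < b:
        return i - b
    if i > e:
        return i - e
    return 0

def char_representation(char_vec, p1, p2, pos_table):
    """Concatenate the character vector with the two position embeddings,
    so the result has dimension d = d_c + 2 * d_p."""
    return np.concatenate([char_vec, pos_table[p1], pos_table[p2]])

# Toy example: sentence of length 6, entity 1 at [1, 2], entity 2 at [4, 5].
d_c, d_p, M = 8, 3, 6
rng = np.random.default_rng(0)
pos_table = {p: rng.normal(size=d_p) for p in range(-M, M + 1)}
x3 = char_representation(rng.normal(size=d_c),
                         relative_position(3, 1, 2),
                         relative_position(3, 4, 5),
                         pos_table)
assert x3.shape == (d_c + 2 * d_p,)
```

In a real model the position embeddings, like the character vectors, would be learned or pre-trained rather than drawn at random.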
Preferably, extracting the word-level distributed vector comprises: for the character-level sequence $s=\{c_1,\dots,c_M\}$ of the text of the given input data and the corresponding word-level sequence $s=\{w_1,\dots,w_N\}$, a word is denoted by its start position $b$ and end position $e$, i.e. $w_{b,e}$. The word $w_{b,e}$ is converted by the word2vec method into a word-level distributed vector

$$x_{b,e}^{w} = e^w(w_{b,e})$$

Preferably, the sense set $\mathrm{Sense}(w_{b,e})$ of each word $w_{b,e}$ is obtained from the external semantic knowledge base HowNet, and each sense $sen_{b,e}^{k}$ in the sense set is converted into a sense-level distributed vector $x_{b,e}^{sen_k}$, i.e.

$$x_{b,e}^{sen} = \{x_{b,e}^{sen_1}, \dots, x_{b,e}^{sen_K}\}$$

where $K$ is the number of senses of the word $w_{b,e}$.
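The sense set can be illustrated with a toy lookup. `TOY_HOWNET` and `sense_vectors` below are hypothetical stand-ins for a real HowNet query and for pre-trained sense embeddings; only the shape of the result (one vector per sense, K vectors in total) reflects the scheme above.

```python
import numpy as np

# Hypothetical stand-in for a HowNet lookup: each word maps to its sense ids.
TOY_HOWNET = {"杜鹃": ["azalea (flower)", "cuckoo (bird)"]}

def sense_vectors(word, sense_emb, d_sen=4, seed=0):
    """Return the K sense-level vectors of a word, one per sense listed in
    the knowledge base; unseen senses get a (toy) random embedding."""
    rng = np.random.default_rng(seed)
    return [sense_emb.setdefault(s, rng.normal(size=d_sen))
            for s in TOY_HOWNET.get(word, [])]

emb = {}
vecs = sense_vectors("杜鹃", emb)
assert len(vecs) == 2   # K = 2 senses for this ambiguous word
```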
Preferably, step S2 comprises: S21: inputting the character-level sequence of the text of the input data directly, with characters as the basic unit, into the bidirectional long short-term memory network to obtain the hidden state vectors of the characters; S22: obtaining, through the external semantic knowledge base HowNet, all the sense vectors of the words of the word-level sequence of the input data that end with each character, inputting the sense vectors into the bidirectional long short-term memory network to calculate the sense-level states, and fusing all the sense-level states by weighted summation to obtain the word states; S23: calculating the weights of the character and the words with a gate unit, and fusing the state of the character and the states of the words by weighted summation into the final hidden state vector of the character.
Preferably, step S21 includes: the $j$-th character of the character-level sequence of the text is input into the bidirectional long short-term memory network with the following calculation:

$$i_j^c = \sigma(W_i x_j^c + U_i h_{j-1}^c + b_i)$$
$$f_j^c = \sigma(W_f x_j^c + U_f h_{j-1}^c + b_f)$$
$$o_j^c = \sigma(W_o x_j^c + U_o h_{j-1}^c + b_o)$$
$$\widetilde{c}_j^c = \tanh(W_c x_j^c + U_c h_{j-1}^c + b_c)$$
$$c_j^c = f_j^c \odot c_{j-1}^c + i_j^c \odot \widetilde{c}_j^c$$
$$h_j^c = o_j^c \odot \tanh(c_j^c)$$

where $i$ is the input gate controlling which information is stored; $f$ is the forget gate controlling which information is forgotten; $o$ is the output gate controlling which information is output; $c$ is the cell state; $W$, $U$ and $b$ are parameters to be learned in the bidirectional long short-term memory network; $\sigma$ is the sigmoid function; $\odot$ denotes element-wise multiplication; and $h$ denotes the hidden state vector, determined by the hidden state at the previous moment and the data input at the current moment.
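The six equations of one character-level LSTM step can be sketched as follows. Random matrices stand in for the learned parameters $W$, $U$, $b$, and only one direction is shown; the full model would also run the sequence in reverse and concatenate both hidden states.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the parameters of the four gates."""
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # (4d,) pre-activations
    i = sigmoid(z[0*d:1*d])               # input gate
    f = sigmoid(z[1*d:2*d])               # forget gate
    o = sigmoid(z[2*d:3*d])               # output gate
    c_tilde = np.tanh(z[3*d:4*d])         # candidate cell state
    c = f * c_prev + i * c_tilde          # new cell state
    h = o * np.tanh(c)                    # new hidden state
    return h, c

rng = np.random.default_rng(1)
d_in, d_h = 5, 4
W = rng.normal(size=(4 * d_h, d_in))
U = rng.normal(size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
assert h.shape == (d_h,) and np.all(np.abs(h) < 1.0)
```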
Preferably, in step S22, for a word $w_{b,e}$ starting at subscript $b$ and ending at subscript $e$ and represented by the vector $x_{b,e}^{w}$, the cell state of the word input into the bidirectional long short-term memory network is calculated as:

$$i_{b,e}^{w} = \sigma(W_i x_{b,e}^{w} + U_i h_b^c + b_i)$$
$$f_{b,e}^{w} = \sigma(W_f x_{b,e}^{w} + U_f h_b^c + b_f)$$
$$\widetilde{c}_{b,e}^{w} = \tanh(W_c x_{b,e}^{w} + U_c h_b^c + b_c)$$
$$c_{b,e}^{w} = f_{b,e}^{w} \odot c_b^c + i_{b,e}^{w} \odot \widetilde{c}_{b,e}^{w}$$

For the $k$-th sense of the word $w_{b,e}$, represented by the vector $x_{b,e}^{sen_k}$, the sense-level cell state $c_{b,e}^{sen_k}$ is calculated analogously:

$$i_{b,e}^{sen_k} = \sigma(W_i x_{b,e}^{sen_k} + U_i h_b^c + b_i)$$
$$f_{b,e}^{sen_k} = \sigma(W_f x_{b,e}^{sen_k} + U_f h_b^c + b_f)$$
$$\widetilde{c}_{b,e}^{sen_k} = \tanh(W_c x_{b,e}^{sen_k} + U_c h_b^c + b_c)$$
$$c_{b,e}^{sen_k} = f_{b,e}^{sen_k} \odot c_b^c + i_{b,e}^{sen_k} \odot \widetilde{c}_{b,e}^{sen_k}$$

An additional gate mechanism is introduced to control the contribution of each sense:

$$g_{b,e}^{sen_k} = \sigma(W_g x_{b,e}^{sen_k} + U_g c_{b,e}^{sen_k} + b_g)$$

The word cell state fusing the multiple senses is then obtained by normalizing these gates and taking the weighted sum, so that all sense-level cell states are fused into one word cell state $c_{b,e}^{w}$:

$$\beta_{b,e}^{k} = \frac{\exp\big(g_{b,e}^{sen_k}\big)}{\sum_{k'=1}^{K}\exp\big(g_{b,e}^{sen_{k'}}\big)}, \qquad c_{b,e}^{w} = \sum_{k=1}^{K} \beta_{b,e}^{k} \odot c_{b,e}^{sen_k}$$

For the character $c_e$, the cell states of all words ending at position $e$ are fused with the character's own candidate cell state:

$$c_e^c = \sum_{b \in \{b' \mid w_{b',e} \in \mathbb{D}\}} \alpha_{b,e}^{c} \odot c_{b,e}^{w} + \alpha_e^{c} \odot \widetilde{c}_e^{c}$$

where $\mathbb{D}$ denotes the lexicon, an extra gate $g_{b,e}^{c} = \sigma(W_l x_e^c + U_l c_{b,e}^{w} + b_l)$ is computed for each such word, and $\alpha_{b,e}^{c}$ and $\alpha_e^{c}$ are the normalized representations of the gate structure, calculated as:

$$\alpha_{b,e}^{c} = \frac{\exp\big(g_{b,e}^{c}\big)}{\exp\big(i_e^{c}\big) + \sum_{b'}\exp\big(g_{b',e}^{c}\big)}, \qquad \alpha_e^{c} = \frac{\exp\big(i_e^{c}\big)}{\exp\big(i_e^{c}\big) + \sum_{b'}\exp\big(g_{b',e}^{c}\big)}$$

The cell state corresponding to each character thus fuses the information of the word and sense levels, and the final hidden state vector of the character is obtained as:

$$h_e^c = o_e^c \odot \tanh(c_e^c)$$

The final hidden state vectors of the characters are fed into the classifier, which synthesizes the corresponding sentence-level feature representation.
Preferably, the sentence-level hidden state vector $h^{*} \in \mathbb{R}^{d_h}$ is calculated from the matrix $h \in \mathbb{R}^{d_h \times M}$ of character-level hidden state vectors as follows:

$$H = \tanh(h)$$
$$\alpha = \mathrm{softmax}(w^{T} H)$$
$$h^{*} = h\,\alpha^{T}$$

Then $h^{*}$ is sent to a softmax classification layer, which calculates the probability distribution over the classes:

$$o = W h^{*} + b$$
$$p(y \mid s) = \mathrm{softmax}(o)$$

For $T$ training examples, the whole training process is optimized with the following cross-entropy loss function:

$$J(\theta) = -\sum_{i=1}^{T} \log p\,(y_i \mid s_i, \theta)$$

where $d_h$ is the dimension of a hidden state vector, $M$ is the length of the input sequence, $\mathbb{R}$ is the real number space, the superscript $T$ denotes the transpose, $w$ is a parameter to be learned, $\alpha$ is the attention weight vector of $h$, $W \in \mathbb{R}^{Y \times d_h}$ is the transfer matrix, $b \in \mathbb{R}^{Y}$ is a bias vector, $Y$ denotes the total number of classes, $p(y)$ denotes the probability of predicting a certain class, and $\theta$ denotes all parameters to be trained in the whole model.
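The attention and classification formulas above can be sketched directly; random values stand in for the learned parameters $w$, $W$ and the encoder outputs.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sentence_attention(h, w):
    """H = tanh(h); alpha = softmax(w^T H); h* = h alpha^T."""
    H = np.tanh(h)                 # (d_h, M)
    alpha = softmax(w @ H)         # (M,) attention weights over characters
    return h @ alpha               # (d_h,) sentence vector

def classify(h_star, W, b):
    """o = W h* + b, then a softmax over the Y relation classes."""
    return softmax(W @ h_star + b)

rng = np.random.default_rng(3)
d_h, M, Y = 6, 5, 3
h = rng.normal(size=(d_h, M))      # character-level hidden states
p = classify(sentence_attention(h, rng.normal(size=d_h)),
             rng.normal(size=(Y, d_h)), np.zeros(Y))
assert p.shape == (Y,) and abs(float(p.sum()) - 1.0) < 1e-9
```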
Preferably, a dropout mechanism is adopted during training: each neuron of the bidirectional long short-term memory network is dropped with a probability of 50% in the training process.
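A minimal sketch of this mechanism in its common "inverted dropout" form; the patent does not specify rescaling, so the 1/(1-p) factor here is an implementation choice, not a claim of the patent.

```python
import numpy as np

def dropout(h, p=0.5, rng=None, train=True):
    """Zero each unit with probability p during training and rescale the
    survivors by 1/(1-p); identity at prediction time."""
    if not train:
        return h
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p   # True = neuron kept
    return h * mask / (1.0 - p)

rng = np.random.default_rng(4)
h = np.ones(1000)
out = dropout(h, 0.5, rng)
# Kept units are rescaled to 2.0, dropped units are 0.0.
assert set(np.unique(out)).issubset({0.0, 2.0})
```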
The invention also provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth in any one of the above.
The invention has the following beneficial effects: a Chinese relation extraction method is provided in which the text of the input data is pre-trained with multi-granularity information to extract the distributed vectors of the three levels of characters, words and word senses, so that semantic features are learned automatically and manual involvement is greatly reduced. The method effectively solves the problems of word-segmentation ambiguity and polysemy, greatly improves the performance of the model on the relation extraction task, and improves the accuracy and robustness of Chinese relation extraction.
Drawings
FIG. 1 is a diagram illustrating a method for extracting Chinese relationships according to an embodiment of the present invention.
FIG. 2 is a flow chart illustrating a Chinese relationship extraction method according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. The connection may be for fixing or for circuit connection.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings to facilitate the description of the embodiments of the invention and to simplify the description, and are not intended to indicate or imply that the device or element so referred to must have a particular orientation, be constructed in a particular orientation, and be constructed in a particular manner of operation, and are not to be construed as limiting the invention.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or to implicitly indicate the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Example 1
As shown in FIG. 1, the present invention provides a Chinese relation extraction method. The method comprises the following steps:
S1: data preprocessing: pre-training multi-granularity information on the text of the input data to extract the distributed vectors of the three levels of characters, words and word senses in the text;
S2: feature coding: taking a bidirectional long short-term memory network as the basic framework, obtaining the hidden state vectors of the characters and of the words from the distributed vectors of the three levels of characters, words and word senses, and further obtaining the final character-level hidden state vectors;
S3: relation classification: learning the final character-level hidden state vectors, and fusing them into a sentence-level hidden state vector by a character-level attention mechanism.
The data preprocessing step pre-trains multi-granularity information on the text of the input data to extract the distributed vectors of the three levels of characters, words and word senses in the text. Traditional pre-training usually represents each word with only one word vector; in the invention, a distributed sense vector is generated for each sense of each word.
The feature coding step implements a lattice long short-term memory network with multi-level paths, so that semantic information of multiple levels is effectively used to learn the character-level hidden state variables, which can be regarded as features automatically extracted from the data. The hidden variables learned in the feature coding step are input into the relation classification step, where a gated attention mechanism is introduced to automatically assign and fuse weights over the hidden state sequence; noise information is filtered out in the weighted fusion while salient feature information is retained, and the final classification outputs a more accurate relation type.
The invention comprises two stages: a training stage and a prediction stage. In the training stage, an initial model is defined and its parameters are initialized randomly; data with relation class labels are continuously fed into the model, which keeps learning from the training data and updating its parameters. Meanwhile, the cross entropy between the model's predicted output and the correct answer serves as the loss function measuring the prediction quality; when the loss value stabilizes, the model has converged, training ends, and a trained relation extractor is obtained. In the prediction stage, the data to be predicted are input directly into the trained relation extractor to obtain the corresponding predicted entity relations.
Data preprocessing:
The main purpose of this step is to convert the text of the input data into distributed vectors that the computer can read and manipulate and that contain implicit semantic information. Meanwhile, so that subsequent modules can use the multi-granularity information of characters, words and word senses in the text, this step learns vector representations for all three language granularities.
For character vectors, the technique trains the commonly used word2vec algorithm on a large-scale corpus to obtain an implicit feature representation of each character. This representation exploits the contexts of the characters in the corpus and fully embodies their syntactic and semantic information.
For word vectors, the training method is in principle the same word2vec algorithm, except that character vectors are trained with the characters of the text as the basic unit, whereas word vectors are trained with words as the basic unit after the text is automatically segmented by a word segmentation tool. However, in this way each word corresponds to only one fixed word vector and polysemy is ignored, so the invention chooses to represent word senses rather than words as vectors.
For the sense vectors: since it is impossible to tell directly from the surface form whether a word is polysemous and which sense applies in each case, word senses are modeled with the external semantic knowledge base HowNet. In HowNet, the various senses of each word and their sememes (the smallest units of meaning) are explicitly annotated by hand; through these annotations the senses of each word are obtained and sense vectors are trained. In this way a word can be represented by multiple sense vectors and input into the subsequent modules, so that during training the model can dynamically select the most appropriate sense of the current word in the current sentence, helping the model capture the deeper semantic information and features in the sentence.
Feature coding:
The feature coding step implements a neural network structure that can effectively exploit multi-granularity information features. Compared with a traditional recurrent neural network (RNN), an LSTM handles context information more flexibly and effectively, storing the important information in the input, forgetting the invalid information, and avoiding the vanishing- and exploding-gradient problems that deep neural networks easily encounter. However, a traditional LSTM model cannot solve the segmentation-ambiguity and polysemy problems of Chinese relation extraction, so the invention makes a series of improvements.
Firstly, to avoid error propagation from a word segmentation tool, the invention takes characters as the basic unit: each sentence is fed directly into a bidirectional LSTM unit as a character-level sequence to obtain its hidden state vectors. Then, so that word-level information is also considered during encoding, for each character in a sentence, all words in the sentence that end with that character are added to the LSTM cell computation. For example, in the sentence "达尔文研究所有杜鹃" ("Darwin studied all the dujuan"), "杜鹃" is a word ending with the character "鹃". All words ending with the current character are then also fed into another bidirectional LSTM unit, and the word-level hidden states are computed. Finally, a gate unit computes the weights of the character and the words, and the hidden states of the character and the words are fused by weighted summation to obtain the final hidden state vector of the current character, which contains information of both the character and word levels.
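Enumerating the candidate words $w_{b,e}$ that end at a given character, as described above, can be sketched with a toy lexicon; the lexicon contents below are illustrative.

```python
def words_ending_at(sentence, lexicon, e):
    """All lexicon words in `sentence` that end at character index e,
    i.e. the candidates w_{b,e} fed into the word-level LSTM path."""
    return [(b, sentence[b:e + 1])
            for b in range(e + 1)
            if sentence[b:e + 1] in lexicon]

lexicon = {"杜鹃", "研究", "研究所", "所有"}
sent = "达尔文研究所有杜鹃"
assert words_ending_at(sent, lexicon, 8) == [(7, "杜鹃")]
assert words_ending_at(sent, lexicon, 5) == [(3, "研究所")]
```

Note how the two segmentations of the example sentence surface naturally: both "研究所" (ending at index 5) and "所有" (ending at index 6) are candidates, and the gated fusion decides how much each contributes.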
Although this method combines character and word information and effectively avoids the influence of segmentation errors on the model, it does not consider the presence of ambiguous words in the sentence. For example, in the sentence above, "杜鹃" is an ambiguous word with two distinct senses: "azalea" (a flower) and "cuckoo" (a bird). Therefore, going still further, the invention incorporates the senses of each word into the computation of the hidden state. Specifically, for each word ending with the current character, HowNet is first queried to obtain all sense vectors of the word; these sense vectors are then input, like the word vector, into a bidirectional LSTM unit to compute sense-level hidden states; finally, all sense-level states are fused by weighted summation to obtain the hidden state of the word. Compared with deriving the word state directly from a single word vector, this approach dynamically fuses and selects the most suitable sense to form the word state. After the word states are obtained, they are fused with the character states in the same way as before to obtain the final hidden state vector of the current character.
Relation classification:
This step inputs the learned sentence feature representation into a classifier to obtain the predicted relation class label. In the previous module the encoder learned a feature representation (hidden state vector) for each character, but since relation classes are extracted in units of sentences, the character-level feature representations must be merged into a corresponding sentence feature representation. The invention introduces a gated attention mechanism to automatically assign a weight to each character-level feature representation, and then takes the weighted sum over all characters based on these weights to obtain the final sentence representation. The intuition of this method is that the characters of a sentence differ in importance: noise words and common function words should receive smaller weights, while keywords, such as the words inside entities and the main verbs, deserve higher attention, so that the fused sentence representation is more accurate.
After the sentence feature vector is obtained by fusion, it is input into a fully connected layer and mapped to a new vector whose dimension is the total number of relation classes. The new vector is then normalized by a softmax function, so that each dimension holds a probability value in the interval from 0 to 1, representing the probability that the sentence belongs to the relation class corresponding to that dimension. In the training stage, the loss function of the model is defined as the cross entropy between the normalized vector and the one-hot indicator vector of the correct relation class, and the parameters of the model are updated by gradient descent; in the prediction stage, the predicted relation class is the one corresponding to the dimension with the largest probability value in the normalized vector.
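The softmax normalization, the cross-entropy loss against the one-hot indicator, and the argmax prediction described above can be sketched as:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, gold):
    """Cross entropy between the normalized class vector and the one-hot
    indicator of the correct relation class."""
    p = softmax(logits)
    one_hot = np.zeros_like(p)
    one_hot[gold] = 1.0
    return -float(np.sum(one_hot * np.log(p)))

logits = np.array([2.0, 0.5, -1.0])       # toy fully-connected-layer output
loss = cross_entropy(logits, 0)           # training stage: loss vs gold class 0
pred = int(np.argmax(softmax(logits)))    # prediction stage: argmax class
assert pred == 0 and loss > 0
```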
Example 2
As shown in FIG. 2, this embodiment adopts the Chinese relation extraction method provided by the invention. The task is defined as: given a sentence s and two specified entities in the sentence, determine what relation holds between the two entities in s. For example, given the sentence "达尔文研究所有杜鹃" ("Darwin studied all the dujuan") and the named entities "达尔文" (Darwin) and "杜鹃" (dujuan), the goal is to determine what relation holds between them in that sentence.
Step 1, data preprocessing:
1.1 word level representation
For a given input sequence s = {c_1, ..., c_M} with M characters, the invention uses the word2vec method to map each character c_i (taking the i-th as an example) into a character vector x_i^c ∈ R^{d_c}, where x_i^c is the character vector of the i-th character, R is the real-number space, and d_c is the dimension of the character vector. In addition to character vectors, this technique employs position vectors to represent the relative position of a character to the two entities. Specifically, for the i-th character c_i, its relative positions to the two given entities are denoted p_i^1 and p_i^2, where p_i^1 is calculated as follows:

    p_i^1 = i - b_1,  if i < b_1
    p_i^1 = 0,        if b_1 ≤ i ≤ e_1
    p_i^1 = i - e_1,  if i > e_1

Here b_1 and e_1 are the start and end positions of the first entity, and p_i^2 is calculated in almost exactly the same way with respect to the second entity. In this way, p_i^1 and p_i^2 are converted into corresponding position vectors x_i^{p1} and x_i^{p2} ∈ R^{d_p}, which indicate the position features of the character-level sequence, where d_p denotes the dimension of the position vectors.
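The piecewise position feature above can be sketched in a few lines of Python; the function names and the 1-indexed, inclusive entity spans are illustrative assumptions, not part of the patent.

```python
# Sketch of the relative-position feature: for the i-th character and an
# entity spanning [b, e] (1-indexed, inclusive), the feature is
# i - b before the entity, 0 inside it, and i - e after it.

def relative_position(i, b, e):
    """Relative position of character i with respect to the entity span [b, e]."""
    if i < b:
        return i - b
    if i > e:
        return i - e
    return 0

def char_position_features(length, span1, span2):
    """Two position features per character, one for each entity span."""
    (b1, e1), (b2, e2) = span1, span2
    return [(relative_position(i, b1, e1), relative_position(i, b2, e2))
            for i in range(1, length + 1)]
```

For a sentence of length 4 with the first entity spanning characters 1-2 and the second occupying character 4, the first character gets the pair (0, -3): it lies inside entity one and three characters before entity two.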
In one embodiment of the invention, the input is defined as a sentence and two entities specified therein; in fact, if a sentence contains multiple entities, relation extraction is performed on each pair of entities, and the output is the relation between the two currently specified entities within the sentence.
Thus, for the i-th character c_i of the input, the final representation is the concatenation of the ordinary character vector and the two position vectors, i.e.

    x_i = [x_i^c ; x_i^{p1} ; x_i^{p2}] ∈ R^d

where d = d_c + 2*d_p is the total dimension after the character vector and the position vectors are spliced. The representation of the input sequence then becomes x = {x_1, ..., x_M}, which is fed into the subsequent encoding step.
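A minimal sketch of the splicing step; the toy stand-in vectors below are illustrative (a real system would look them up in trained embedding tables):

```python
# Each character's final input is the concatenation of its word2vec
# character vector (dimension d_c) with the two position vectors
# (dimension d_p each), giving total dimension d = d_c + 2*d_p.

def splice(char_vec, pos1_vec, pos2_vec):
    return list(char_vec) + list(pos1_vec) + list(pos2_vec)

d_c, d_p = 4, 2
char_vec = [0.1] * d_c   # stand-in for a word2vec character vector
pos1_vec = [0.2] * d_p   # stand-in for the entity-1 position embedding
pos2_vec = [0.3] * d_p   # stand-in for the entity-2 position embedding

x_i = splice(char_vec, pos1_vec, pos2_vec)
assert len(x_i) == d_c + 2 * d_p   # d = 8
```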
1.2 Word-level representation:
Although the input to the model is a character-level sequence, in order to obtain word-level features, the invention performs word-level representation learning on all possible candidate words in the sentence. An input sequence s can be expressed not only as a character-level sequence s = {c_1, ..., c_M} but also as a word-level sequence s = {w_1, ..., w_N}. In this section, a word is denoted by its start position b and end position e, i.e. w_{b,e}. Again by the word2vec method, the word w_{b,e} can be converted into a word vector x_{b,e}^w. For each word, the invention obtains its sense set Sense(w_{b,e}) from HowNet; then each sense sen_{b,e}^k in the set is converted by the Skip-Gram method into a sense vector x_{b,e}^{sen,k} that represents that sense alone. Thus a word may be represented by a set of sense vectors (assuming it has K senses), i.e.

    {x_{b,e}^{sen,1}, ..., x_{b,e}^{sen,K}}

These sense representation vectors are used in the training of the encoder module, so that the model can dynamically exploit sense information.
Step 2, feature coding:
a standard character-level LSTM unit mainly consists of three gates: an input gate i, which controls which information is to be stored; a forget gate f, which controls which information is to be forgotten; and an output gate o, which controls which information is to be output. For the j-th character, the LSTM unit is computed as follows:

    i_j^c = σ(W_i x_j^c + U_i h_{j-1}^c + b_i)
    f_j^c = σ(W_f x_j^c + U_f h_{j-1}^c + b_f)
    o_j^c = σ(W_o x_j^c + U_o h_{j-1}^c + b_o)
    ĉ_j^c = tanh(W_c x_j^c + U_c h_{j-1}^c + b_c)
    c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ ĉ_j^c
    h_j^c = o_j^c ⊙ tanh(c_j^c)

where σ is the sigmoid function and ⊙ denotes element-wise multiplication; c denotes the cell unit, which stores the information of the sequence from the start to the current position; h denotes the hidden state vector, which is determined by the hidden state at the previous moment and the input at the current moment; and W, U and b are the parameters to be learned in the LSTM.
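The six formulas above can be sketched with numpy, packing the four gate blocks into single W, U, b parameters; the dimensions and random initialisation are illustrative.

```python
# One step of a standard LSTM cell: gates i, f, o and the candidate cell
# are computed from the packed parameters W (4d x d_in), U (4d x d),
# b (4d,), then the cell state and hidden state are updated.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all four pre-activations, shape (4d,)
    i = sigmoid(z[0:d])                 # input gate
    f = sigmoid(z[d:2*d])               # forget gate
    o = sigmoid(z[2*d:3*d])             # output gate
    c_tilde = np.tanh(z[3*d:4*d])       # candidate cell
    c = f * c_prev + i * c_tilde        # new cell unit
    h = o * np.tanh(c)                  # new hidden state
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 5, 3
W = rng.standard_normal((4 * d_h, d_in)) * 0.1
U = rng.standard_normal((4 * d_h, d_h)) * 0.1
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```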
For a word w_{b,e} starting at subscript b and ending at subscript e, whose word representation is x_{b,e}^w, the cell unit c_{b,e}^w of the word in Lattice LSTM is calculated as follows:

    i_{b,e}^w = σ(W_i x_{b,e}^w + U_i h_b^c + b_i)
    f_{b,e}^w = σ(W_f x_{b,e}^w + U_f h_b^c + b_f)
    ĉ_{b,e}^w = tanh(W_c x_{b,e}^w + U_c h_b^c + b_c)
    c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ ĉ_{b,e}^w
That is, for a character c_b, the invention first finds all words that begin at c_b and match the external dictionary, and then calculates the cell units c^w for these words. On this basis, the invention extends the calculation to the sense level: each sense of each word is assigned an additional LSTM unit for calculation. As mentioned in the representation-learning module, the k-th sense of the word w_{b,e} has the representation vector x_{b,e}^{sen,k}. Thus, the sense-level cell unit c_{b,e}^{sen,k} is calculated as follows:

    i_{b,e}^{sen,k} = σ(W_i x_{b,e}^{sen,k} + U_i h_b^c + b_i)
    f_{b,e}^{sen,k} = σ(W_f x_{b,e}^{sen,k} + U_f h_b^c + b_f)
    ĉ_{b,e}^{sen,k} = tanh(W_c x_{b,e}^{sen,k} + U_c h_b^c + b_c)
    c_{b,e}^{sen,k} = f_{b,e}^{sen,k} ⊙ c_b^c + i_{b,e}^{sen,k} ⊙ ĉ_{b,e}^{sen,k}
All the sense cell units are then fused into one word cell unit, so that the model takes word-sense ambiguity into account. The word cell unit after fusing the multiple senses is c_{b,e}^w. To calculate c_{b,e}^w, an additional gate mechanism is introduced to control the contribution of each piece of sense information:

    g_{b,e}^{sen,k} = σ(W_g x_{b,e}^{sen,k} + U_g c_{b,e}^{sen,k} + b_g)

The word cell state fusing the multiple pieces of sense information is then calculated as follows:

    α_{b,e}^{sen,k} = exp(g_{b,e}^{sen,k}) / Σ_{k'=1}^{K} exp(g_{b,e}^{sen,k'})
    c_{b,e}^w = Σ_{k=1}^{K} α_{b,e}^{sen,k} ⊙ c_{b,e}^{sen,k}
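A sketch of the sense-fusion step, under the assumption stated above that the gate activations are softmax-normalised over the K senses before the weighted sum; the input tensors are illustrative.

```python
# Fuse K sense-level cell states into one word-level cell state:
# per-sense gate activations are softmax-normalised (per dimension,
# over the K senses) and used as weights in an element-wise sum.
import numpy as np

def fuse_senses(sense_cells, gate_logits):
    """sense_cells: (K, d) sense-level cells; gate_logits: (K, d) gate activations."""
    e = np.exp(gate_logits - gate_logits.max(axis=0, keepdims=True))
    alpha = e / e.sum(axis=0, keepdims=True)   # normalised weights over K senses
    return (alpha * sense_cells).sum(axis=0)   # fused word cell, shape (d,)

K, d = 3, 4
cells = np.ones((K, d))       # illustrative sense cells
logits = np.zeros((K, d))     # equal gates -> uniform weights
fused = fuse_senses(cells, logits)
```

With equal gate activations the weights are uniform, so the fused cell is just the average of the sense cells.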
Through the above calculation, for each word w_{b,e} the model can calculate a cell state c_{b,e}^w fused with multi-sense information. Then, for a character c_e, the invention fuses the information of every word ending at c_e to obtain a brand-new character-level cell state. The calculation is as follows:

    i_{b,e}^l = σ(W_l x_e^c + U_l c_{b,e}^w + b_l)
    c_e^c = Σ_b α_{b,e}^w ⊙ c_{b,e}^w + α_e^c ⊙ ĉ_e^c

where α_{b,e}^w and α_e^c are normalized representations of the gate structure, calculated as follows:

    α_{b,e}^w = exp(i_{b,e}^l) / ( exp(i_e^c) + Σ_{b'} exp(i_{b',e}^l) )
    α_e^c = exp(i_e^c) / ( exp(i_e^c) + Σ_{b'} exp(i_{b',e}^l) )

After the above calculation, the cell unit corresponding to each character fuses the information of the word and sense levels, and the final hidden state vector of the character is then calculated:

    h_e^c = o_e^c ⊙ tanh(c_e^c)
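The character-level fusion can be sketched the same way: the character's own input gate and the link gates of all words ending at that character are normalised jointly, the cell states are merged by weighted sum, and the output gate produces the final hidden state. All tensors here are illustrative stand-ins.

```python
# Merge the character's candidate cell with the cell states of all words
# ending at this character, using jointly softmax-normalised gates, then
# compute the final hidden state h = o * tanh(c).
import numpy as np

def fuse_char_and_words(char_gate, char_cand, word_gates, word_cells, o_gate):
    """char_gate/char_cand/o_gate: (d,); word_gates/word_cells: (W, d)."""
    gates = np.vstack([char_gate[None, :], word_gates])   # (W+1, d)
    e = np.exp(gates - gates.max(axis=0, keepdims=True))
    alpha = e / e.sum(axis=0, keepdims=True)              # normalised gate weights
    cells = np.vstack([char_cand[None, :], word_cells])
    c = (alpha * cells).sum(axis=0)                       # fused cell state
    h = o_gate * np.tanh(c)                               # final hidden state
    return h, c

d, n_words = 4, 2
h, c = fuse_char_and_words(np.zeros(d), np.ones(d),
                           np.zeros((n_words, d)), np.ones((n_words, d)),
                           np.full(d, 0.5))
```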
the final hidden state vectors of the characters are fed into the classifier, which synthesizes a sentence-level representation and then obtains the probability distribution over the relation categories.
And step 3, relation classification:
after the hidden state vector h of each word is learned, the invention adopts a word-level attention mechanism to fuse the word-level hidden states into a sentence-level hidden state h* ∈ R^{d_h}, where d_h is the dimension of the hidden state vector and M is the length of the input sequence. h* is a weighted sum with automatically assigned weights:

    H = tanh(h)
    α = softmax(w^T H)
    h* = h α^T

where T denotes the transpose, w is a parameter to be learned, α is the weight vector over H, and H is the value of h transformed by the tanh function; the tanh function maps the value of each dimension of h into the range [-1, 1], which effectively alleviates problems such as gradient explosion during training.
h* is then sent to a softmax classification layer, which calculates the probability distribution over the categories:

    o = W h* + b
    p(y|s) = softmax(o)

Here W ∈ R^{Y×d_h} is the transformation matrix and b ∈ R^Y is a bias vector; Y denotes the total number of categories and p(y|s) denotes the probability of predicting a certain category.
For T training examples, the entire training process is optimized with the following cross-entropy loss function:

    J(θ) = - Σ_{i=1}^{T} log p(y_i | s_i, θ)

Here θ denotes all parameters of the entire model that need to be trained. Meanwhile, in order to prevent overfitting, a dropout mechanism is adopted during training: each neuron is switched off with 50% probability (i.e., half of the hidden-layer nodes do not participate in the calculation in each training pass); in the test phase, all trained neurons participate in the calculation.
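The three attention formulas translate directly into numpy; here h is the d_h × M matrix of word-level hidden states and w is the learned parameter (both randomly filled for illustration).

```python
# Word-level attention pooling: H = tanh(h), alpha = softmax(w^T H),
# h* = h alpha^T -> one sentence vector of dimension d_h.
import numpy as np

def attention_pool(h, w):
    H = np.tanh(h)                 # (d_h, M)
    scores = w @ H                 # (M,) unnormalised attention scores
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()            # attention weights over the M positions
    return h @ alpha               # sentence vector h*, shape (d_h,)

d_h, M = 3, 5
rng = np.random.default_rng(1)
h = rng.standard_normal((d_h, M))
w = rng.standard_normal(d_h)
s = attention_pool(h, w)
```

With w = 0 every position gets equal weight and the pooled vector reduces to the mean of the hidden states, which is a convenient sanity check.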
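A sketch of the classification layer and its cross-entropy loss; the shapes and the zero-initialised parameters are illustrative.

```python
# o = W h* + b, p = softmax(o), and the per-example cross-entropy loss
# -log p[y] for the gold relation label y.
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def cross_entropy(h_star, W, b, y):
    p = softmax(W @ h_star + b)    # probability over the Y relation categories
    return -np.log(p[y])

Y, d_h = 4, 3
W = np.zeros((Y, d_h))             # illustrative untrained parameters
b = np.zeros(Y)
loss = cross_entropy(np.ones(d_h), W, b, y=0)
```

With zero parameters the distribution is uniform over the Y = 4 categories, so the loss equals log 4.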
All or part of the flow of the methods of the embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor to instruct the related hardware to implement the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be considered limited to these descriptions. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications may be made without departing from the spirit of the invention, and all such substitutions or modifications shall be deemed to fall within the protection scope of the invention.

Claims (6)

1. A Chinese relation extraction method is characterized by comprising the following steps:
s1: data preprocessing: pre-training multi-granularity information on a text of input data to extract three levels of distributed vectors of characters, words and word senses in the text;
extracting the word-level distributed vectors comprises extracting word vectors and position vectors;
the word vector is: for the word-level sequence s = {c_1, ..., c_M} of the text of the given input data, having M characters, each character c_i is mapped into a word vector x_i^c ∈ R^{d_c} by the word2vec method, wherein c_i represents the i-th character, x_i^c is the word vector of the i-th character, R is the real-number space, and d_c is the dimension of the word vector;
the position vectors represent the relative positions p_i^1 and p_i^2 between the character c_i and two entities P_1 and P_2, wherein p_i^1 is calculated as follows:

    p_i^1 = i - b_1,  if i < b_1
    p_i^1 = 0,        if b_1 ≤ i ≤ e_1
    p_i^1 = i - e_1,  if i > e_1

wherein b_1 and e_1 are the start and end positions of the first entity P_1, and the calculation of p_i^2 is the same as that of p_i^1; p_i^1 and p_i^2 are converted into corresponding position vectors x_i^{p1} ∈ R^{d_p} and x_i^{p2} ∈ R^{d_p} for representing the position features of the word-level sequence, wherein d_p represents the dimension of the position vectors;
the final representation of the word-level distributed vector is the concatenation of the word vector and the two position vectors, namely:

    x_i = [x_i^c ; x_i^{p1} ; x_i^{p2}] ∈ R^d

wherein d = d_c + 2*d_p is the total dimension after the word vector and the position vectors are spliced;
at this time, the representation of the word-level sequence of the text of the input data becomes x = {x_1, ..., x_M};
Extracting the word-level distributed vectors includes:
for the word-level sequence s = {c_1, ..., c_M} and the word-level sequence s = {w_1, ..., w_M} of the text of the given input data, a word is represented by its start position b and end position e, i.e. w_{b,e}; the word w_{b,e} is converted into a word-level distributed vector x_{b,e}^w by the word2vec method;
the sense set Sense(w_{b,e}) of each word w_{b,e} is obtained from the external semantic knowledge base HowNet, and each sense sen_{b,e}^k in the sense set is converted into a sense-level distributed vector x_{b,e}^{sen,k}, namely

    {x_{b,e}^{sen,1}, ..., x_{b,e}^{sen,K}}

wherein K is the number of senses of the word w_{b,e};
s2: feature encoding: taking a bidirectional long-short-term memory network as the basic framework, obtaining a hidden state vector of each character and a hidden state vector of each word from the distributed vectors of the three levels of character, word and sense, and further obtaining the final hidden state vector of the word level;
wherein, step S2 includes:
s21: directly inputting the character level sequence of the text of the input data into the bidirectional long-time and short-time memory network by taking characters as basic units to obtain the hidden state vector of the characters;
s22: acquiring, through the external semantic knowledge base HowNet, all sense vectors of the words of the word-level sequence of the input data that end with each character, inputting the sense vectors into the bidirectional long-short-term memory network to calculate sense-level hidden state vectors, and fusing all the sense-level hidden state vectors by a weighted summation method to obtain the word hidden state vector;
s23: calculating the weights of the character and the word by using a gate unit, and fusing the hidden state vector of the character and the hidden state vector of the word into a final hidden state vector of the character by a weighted sum method;
s3: and (4) relation classification: and learning the final hidden state vector of the word level, and fusing the hidden state vector of the word level into a hidden state vector of a sentence level by adopting the attention mechanism of the word level.
2. The Chinese relation extraction method according to claim 1, wherein step S21 comprises: the calculation process of inputting the j-th character of the word-level sequence of the text into the bidirectional long-short-term memory network is as follows:

    i_j^c = σ(W_i x_j^c + U_i h_{j-1}^c + b_i)
    f_j^c = σ(W_f x_j^c + U_f h_{j-1}^c + b_f)
    o_j^c = σ(W_o x_j^c + U_o h_{j-1}^c + b_o)
    ĉ_j^c = tanh(W_c x_j^c + U_c h_{j-1}^c + b_c)
    c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ ĉ_j^c
    h_j^c = o_j^c ⊙ tanh(c_j^c)

wherein i is an input gate for controlling which information is stored; f is a forget gate for controlling which information is to be forgotten; o is an output gate for controlling which information is to be output; σ is the sigmoid function and ⊙ denotes element-wise multiplication; c is the cell unit; W, U and b are parameters to be learned in the bidirectional long-short-term memory network; and h represents the hidden state vector, which is determined by the hidden state at the previous moment and the data input at the current moment.
3. The Chinese relation extraction method according to claim 2, wherein in step S22, for a word w_{b,e} starting with subscript b and ending with subscript e, the word is represented as x_{b,e}^w; the cell unit c_{b,e}^w of the word input into the bidirectional long-short-term memory network is calculated as follows:

    i_{b,e}^w = σ(W_i x_{b,e}^w + U_i h_b^c + b_i)
    f_{b,e}^w = σ(W_f x_{b,e}^w + U_f h_b^c + b_f)
    ĉ_{b,e}^w = tanh(W_c x_{b,e}^w + U_c h_b^c + b_c)
    c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ ĉ_{b,e}^w

for the k-th sense of the word w_{b,e}, whose representation vector is x_{b,e}^{sen,k}, the sense-level cell unit c_{b,e}^{sen,k} is calculated as follows:

    i_{b,e}^{sen,k} = σ(W_i x_{b,e}^{sen,k} + U_i h_b^c + b_i)
    f_{b,e}^{sen,k} = σ(W_f x_{b,e}^{sen,k} + U_f h_b^c + b_f)
    ĉ_{b,e}^{sen,k} = tanh(W_c x_{b,e}^{sen,k} + U_c h_b^c + b_c)
    c_{b,e}^{sen,k} = f_{b,e}^{sen,k} ⊙ c_b^c + i_{b,e}^{sen,k} ⊙ ĉ_{b,e}^{sen,k}

an additional gate mechanism is introduced to control the contribution of each piece of sense information:

    g_{b,e}^{sen,k} = σ(W_g x_{b,e}^{sen,k} + U_g c_{b,e}^{sen,k} + b_g)

the word cell state fusing a plurality of pieces of sense information is calculated as follows:

    α_{b,e}^{sen,k} = exp(g_{b,e}^{sen,k}) / Σ_{k'=1}^{K} exp(g_{b,e}^{sen,k'})
    c_{b,e}^w = Σ_{k=1}^{K} α_{b,e}^{sen,k} ⊙ c_{b,e}^{sen,k}

all the sense cell units are thus fused into one word cell state c_{b,e}^w; for the character c_e, the calculation is as follows:

    i_{b,e}^l = σ(W_l x_e^c + U_l c_{b,e}^w + b_l)
    c_e^c = Σ_b α_{b,e}^w ⊙ c_{b,e}^w + α_e^c ⊙ ĉ_e^c

wherein α_{b,e}^w and α_e^c are normalized representations of the gate structure, calculated as follows:

    α_{b,e}^w = exp(i_{b,e}^l) / ( exp(i_e^c) + Σ_{b'} exp(i_{b',e}^l) )
    α_e^c = exp(i_e^c) / ( exp(i_e^c) + Σ_{b'} exp(i_{b',e}^l) )

the cell unit corresponding to each character thus fuses the information of the word and sense levels, and the final hidden state vector of the character is obtained:

    h_e^c = o_e^c ⊙ tanh(c_e^c)

the final hidden state vector is fed into the classifier, which synthesizes a corresponding sentence-level feature representation.
4. The Chinese relation extraction method according to claim 3, wherein the sentence-level hidden state vector h* ∈ R^{d_h} is calculated as follows:

    H = tanh(h)
    α = softmax(w^T H)
    h* = h α^T

h* is then sent to a softmax classification layer, which calculates the probability distribution over the categories:

    o = W h* + b
    p(y|s) = softmax(o)

for T training data, the entire training process is optimized by the following cross-entropy loss function:

    J(θ) = - Σ_{i=1}^{T} log p(y_i | s_i, θ)

wherein d_h is the dimension of the hidden state vector, M is the length of the input sequence, R is the real-number space, T denotes the transpose, w is a parameter to be learned, α is the weight vector of h, W ∈ R^{Y×d_h} is the transformation matrix, b ∈ R^Y is a bias vector, Y represents the total number of all categories, p(y|s) represents the probability of predicting a certain category, and θ represents all parameters that need to be trained in the whole model.
5. The Chinese relation extraction method according to claim 4, wherein a dropout mechanism is adopted in the training process, and each neuron of the bidirectional long-short-term memory network is switched off with a probability of 50% during training.
6. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN201910626307.5A 2019-07-11 2019-07-11 Chinese relation extraction method Active CN110334354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910626307.5A CN110334354B (en) 2019-07-11 2019-07-11 Chinese relation extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910626307.5A CN110334354B (en) 2019-07-11 2019-07-11 Chinese relation extraction method

Publications (2)

Publication Number Publication Date
CN110334354A CN110334354A (en) 2019-10-15
CN110334354B true CN110334354B (en) 2022-12-09

Family

ID=68146526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910626307.5A Active CN110334354B (en) 2019-07-11 2019-07-11 Chinese relation extraction method

Country Status (1)

Country Link
CN (1) CN110334354B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948535B (en) * 2019-12-10 2022-06-14 复旦大学 Method and device for extracting knowledge triples of text and storage medium
CN111160017B (en) * 2019-12-12 2021-09-03 中电金信软件有限公司 Keyword extraction method, phonetics scoring method and phonetics recommendation method
CN111291556B (en) * 2019-12-17 2021-10-26 东华大学 Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN111061843B (en) * 2019-12-26 2023-08-25 武汉大学 Knowledge-graph-guided false news detection method
CN111274394B (en) * 2020-01-16 2022-10-25 重庆邮电大学 An entity relationship extraction method, device, device and storage medium
CN111428505B (en) * 2020-01-17 2021-05-04 北京理工大学 An Entity Relationship Extraction Method Based on Recognition Features of Trigger Words
CN111274794B (en) * 2020-01-19 2022-03-18 浙江大学 Synonym expansion method based on transmission
CN111709240A (en) * 2020-05-14 2020-09-25 腾讯科技(武汉)有限公司 Entity relationship extraction method, device, device and storage medium thereof
CN111783418B (en) * 2020-06-09 2024-04-05 北京北大软件工程股份有限公司 Chinese word meaning representation learning method and device
CN111859978B (en) * 2020-06-11 2023-06-20 南京邮电大学 Deep learning-based emotion text generation method
CN111680510B (en) * 2020-07-07 2021-10-15 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium
CN112015891B (en) * 2020-07-17 2025-01-14 山东师范大学 Method and system for classifying messages on online government inquiry platforms based on deep neural networks
CN112380872B (en) * 2020-11-27 2023-11-24 深圳市慧择时代科技有限公司 Method and device for determining emotion tendencies of target entity
CN112560487A (en) * 2020-12-04 2021-03-26 中国电子科技集团公司第十五研究所 Entity relationship extraction method and system based on domestic equipment
CN112883153B (en) * 2021-01-28 2023-06-23 北京联合大学 Relation Classification Method and Device Based on Information Enhanced BERT
CN113239663B (en) * 2021-03-23 2022-07-12 国家计算机网络与信息安全管理中心 Multi-meaning word Chinese entity relation identification method based on Hopkinson
CN112883194B (en) * 2021-04-06 2024-02-20 讯飞医疗科技股份有限公司 A symptom information extraction method, device, equipment and storage medium
CN113051371B (en) * 2021-04-12 2023-02-07 平安国际智慧城市科技股份有限公司 Chinese machine reading understanding method and device, electronic equipment and storage medium
CN113326676B (en) * 2021-04-19 2024-09-20 北京快确信息科技有限公司 Method for establishing deep learning model for structuring financial text into form
CN113392648B (en) * 2021-06-02 2022-10-18 北京三快在线科技有限公司 Entity relationship acquisition method and device
CN114372125A (en) * 2021-12-03 2022-04-19 北京北明数科信息技术有限公司 Government affair knowledge base construction method, system, equipment and medium based on knowledge graph
CN114579695A (en) * 2022-01-20 2022-06-03 杭州量知数据科技有限公司 Event extraction method, device, equipment and storage medium
CN115169326B (en) * 2022-04-15 2024-07-19 长河信息股份有限公司 Chinese relation extraction method, device, terminal and storage medium
CN115034302B (en) * 2022-06-07 2023-04-11 四川大学 Relation extraction method, device, equipment and medium for optimizing information fusion strategy

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080275694A1 (en) * 2007-05-04 2008-11-06 Expert System S.P.A. Method and system for automatically extracting relations between concepts included in text
CN108733792A (en) * 2018-05-14 2018-11-02 北京大学深圳研究生院 A kind of entity relation extraction method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080275694A1 (en) * 2007-05-04 2008-11-06 Expert System S.P.A. Method and system for automatically extracting relations between concepts included in text
CN108733792A (en) * 2018-05-14 2018-11-02 北京大学深圳研究生院 A kind of entity relation extraction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification; Peng Zhou et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 7 August 2016; pp. 207-212 *
Chinese Entity Relation Extraction Method Based on Deep Learning; Sun Ziyang et al.; Computer Engineering; 30 September 2018; Vol. 44, No. 9; pp. 164-170 *

Also Published As

Publication number Publication date
CN110334354A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN110334354B (en) Chinese relation extraction method
Yao et al. An improved LSTM structure for natural language processing
CN114510570B (en) Intention classification method, device and computer equipment based on small sample corpus
CN111078836B (en) Machine reading understanding method, system and device based on external knowledge enhancement
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
CN109726389B (en) A Chinese missing pronoun completion method based on common sense and reasoning
CN111611810B (en) A polyphone pronunciation disambiguation device and method
CN110096711B (en) Natural language semantic matching method for sequence global attention and local dynamic attention
CN115859164B (en) Building entity identification and classification method and system based on prompt
CN110609891A (en) A Visual Dialogue Generation Method Based on Context-Aware Graph Neural Network
CN109214006B (en) A Natural Language Inference Method for Image Enhanced Hierarchical Semantic Representation
CN110532558B (en) Multi-intention recognition method and system based on sentence structure deep parsing
CN117033602A (en) A method of constructing a multi-modal user mental perception question and answer model
CN107358948A (en) Language in-put relevance detection method based on attention model
CN112541356A (en) Method and system for recognizing biomedical named entities
CN114492441A (en) BilSTM-BiDAF named entity identification method based on machine reading understanding
CN111428525A (en) Implicit discourse relation identification method and system and readable storage medium
CN107590127A (en) A kind of exam pool knowledge point automatic marking method and system
CN114692615B (en) Small sample intention recognition method for small languages
CN110162789A (en) A kind of vocabulary sign method and device based on the Chinese phonetic alphabet
CN118227791A (en) A method for predicting learning outcomes of MOOC learners based on multi-level enhanced contrastive learning
US11941360B2 (en) Acronym definition network
CN110019795A (en) The training method and system of sensitive word detection model
US20240354638A1 (en) Named Entity Recognition System based on Enhanced Label Embedding and Curriculum Learning
CN115906846B (en) A document-level named entity recognition method based on hierarchical feature fusion of two graphs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant