CN110334354B - Chinese relation extraction method - Google Patents


Info

Publication number
CN110334354B
Authority
CN
China
Prior art keywords: word, vector, level, hidden state, state vector
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number
CN201910626307.5A
Other languages
Chinese (zh)
Other versions
CN110334354A (en)
Inventor
丁宁
李自然
郑海涛
刘知远
沈颖
Current Assignee
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date
Filing date
Publication date
Application filed by Shenzhen Graduate School Tsinghua University
Priority to CN201910626307.5A
Publication of CN110334354A
Application granted
Publication of CN110334354B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods


Abstract

The invention provides a Chinese relation extraction method, which comprises the following steps. S1, data preprocessing: the text of the input data is pre-trained with multi-granularity information to extract the distributed vectors of the three levels of characters, words and word senses in the text. S2, feature coding: taking a bidirectional long short-term memory network as the basic framework, the hidden state vectors of the characters and the hidden state vectors of the words are obtained from the distributed vectors of the three levels of characters, words and word senses, and the final character-level hidden state vectors are further obtained. S3, relation classification: the final character-level hidden state vectors are learned and fused into a sentence-level hidden state vector by a character-level attention mechanism. The method effectively solves the problems of word-segmentation ambiguity and polysemy, greatly improves the performance of the model on the relation extraction task, and improves the accuracy and robustness of Chinese relation extraction.

Description

Chinese relation extraction method
Technical Field
The invention relates to the technical field of computer application, in particular to a Chinese relation extraction method.
Background
Natural language processing is a sub-discipline of artificial intelligence and an interdisciplinary field of computer science and computational linguistics. Relation extraction is one of the basic tasks of natural language processing: its goal is to accurately find the relationship between entities (generally nouns) for a given sentence with well-labeled entities. Relation extraction technology can be used to construct large-scale knowledge graphs; a knowledge graph is a semantic network consisting of concepts, entities, entity attributes and entity relations, and is a structured representation of the real world. Constructing large-scale knowledge graphs can provide comprehensive and structured external knowledge for artificial intelligence systems, thereby enabling more powerful applications.
Traditional relation extraction approaches have certain problems. They often rely on manually designed features, so the models run effectively only on small, specific data sets, which limits the development of the relation extraction field.
Meanwhile, because of this dependence on manual features, traditional relation extraction techniques have poor robustness and extensibility, so the models cannot generalize across different data sets and corpora.
In recent years, relation extraction based on deep learning has advanced greatly, and these methods have many advantages over traditional ones. First, thanks to the application of neural networks, such models automatically learn the semantic features of the text; this avoids hand-designing features for specific data, reduces labor costs and achieves better results. The neural network model provides an end-to-end solution that minimizes human involvement. Meanwhile, neural-network-based models are also more robust and can learn mappings from varied features to outputs for ever-changing natural language.
However, even deep learning models face some unsolved problems. For languages without natural separators, such as Chinese, current mainstream methods operate at either the character level or the word level. The former feeds the input sequence into the model character by character; this makes it difficult for the model to learn word-level features in the semantic space, leading to insufficient information and lowering the accuracy of the relation extraction task. The latter first segments the input sequence with a word segmentation tool and then feeds the segmented words into the model; although this takes word-level information into account, relying on an external segmentation tool easily produces segmentation ambiguity, so errors of the external tool propagate through the whole model and limit the relation extraction task. In addition, whether at the character level or the word level, such models do not consider polysemy: each word is represented by only one word vector, a strategy that cannot handle polysemous words and lowers the upper bound of the model.
Disclosure of Invention
The invention provides a Chinese relation extraction method, aiming at solving the problems of word segmentation ambiguity and polysemous word ambiguity in Chinese relation extraction in the prior art.
In order to solve the above problems, the technical solution adopted by the present invention is as follows:
a Chinese relation extraction method comprises the following steps: S1: data preprocessing: pre-training multi-granularity information on the text of the input data to extract the distributed vectors of the three levels of characters, words and word senses in the text; S2: feature coding: taking a bidirectional long short-term memory network as the basic framework, obtaining the hidden state vectors of the characters and the hidden state vectors of the words from the distributed vectors of the three levels of characters, words and word senses, and further obtaining the final character-level hidden state vectors; S3: relation classification: learning the final character-level hidden state vectors, and fusing them into a sentence-level hidden state vector by a character-level attention mechanism.
Preferably, extracting the character-level distributed vectors comprises extracting a character vector and a position vector. The character vector is obtained as follows: for the character-level sequence $s=\{c_1,\dots,c_M\}$ of the text of the given input data, where $M$ is the number of characters, each character $c_i$ is mapped by the word2vec method into a character vector

$$x_i^c = e^c(c_i) \in \mathbb{R}^{d_c}$$

where $c_i$ denotes the $i$-th character, $x_i^c$ is the character vector of the $i$-th character, $\mathbb{R}$ is the real number space, and $d_c$ is the dimension of the character vector. The position vectors represent the relative positions $p_i^1$ and $p_i^2$ of the character $c_i$ to the two entities $P_1$ and $P_2$, where $p_i^1$ is calculated as:

$$p_i^1 = \begin{cases} i-b_1, & i<b_1 \\ 0, & b_1 \le i \le e_1 \\ i-e_1, & i>e_1 \end{cases}$$

where $b_1$ and $e_1$ are the start and end positions of the first entity $P_1$, and $p_i^2$ is calculated in the same way with respect to the second entity. $p_i^1$ and $p_i^2$ are converted into the corresponding position vectors $x_i^{p_1}$ and $x_i^{p_2} \in \mathbb{R}^{d_p}$, used to represent the position features of the character-level sequence, where $d_p$ denotes the dimension of a position vector.

The final representation of the character-level distributed vector is the concatenation of the character vector and the two position vectors, i.e.:

$$x_i = [x_i^c; x_i^{p_1}; x_i^{p_2}] \in \mathbb{R}^{d}, \qquad d = d_c + 2 d_p$$

where $d$ is the total dimension after the character vector and the position vectors are concatenated. The representation of the character-level sequence of the text of the input data then becomes $x = \{x_1, \dots, x_M\}$.
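A minimal sketch of the character-level representation described above: the piecewise relative-position rule and the concatenation giving $d = d_c + 2d_p$. The random embedding tables and all dimension sizes below are illustrative stand-ins for the pre-trained word2vec vectors, not values from the patent.

```python
import numpy as np

def relative_position(i, b, e):
    """Relative position of character index i to an entity spanning [b, e]:
    negative before the span, 0 inside it, positive after it."""
    if i < b:
        return i - b
    if i > e:
        return i - e
    return 0

def char_representation(char_vec, p1, p2, pos_table):
    """Concatenate the character vector with the two position embeddings,
    so the result has dimension d = d_c + 2 * d_p."""
    return np.concatenate([char_vec, pos_table[p1], pos_table[p2]])

# Toy example: sentence of length 6, entity 1 at [1, 2], entity 2 at [4, 5].
d_c, d_p, M = 8, 3, 6
rng = np.random.default_rng(0)
pos_table = {p: rng.normal(size=d_p) for p in range(-M, M + 1)}
x3 = char_representation(rng.normal(size=d_c),
                         relative_position(3, 1, 2),
                         relative_position(3, 4, 5),
                         pos_table)
assert x3.shape == (d_c + 2 * d_p,)
```

In a real model the position embeddings, like the character vectors, would be learned or pre-trained rather than drawn at random.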
Preferably, extracting the word-level distributed vector comprises: for the character-level sequence $s=\{c_1,\dots,c_M\}$ of the text of the given input data and the corresponding word-level sequence $s=\{w_1,\dots,w_N\}$, a word is denoted by its start position $b$ and end position $e$, i.e. $w_{b,e}$. The word $w_{b,e}$ is converted by the word2vec method into a word-level distributed vector

$$x_{b,e}^{w} = e^w(w_{b,e})$$

Preferably, the sense set $\mathrm{Sense}(w_{b,e})$ of each word $w_{b,e}$ is obtained from the external semantic knowledge base HowNet, and each sense $sen_{b,e}^{k}$ in the sense set is converted into a sense-level distributed vector $x_{b,e}^{sen_k}$, i.e.

$$x_{b,e}^{sen} = \{x_{b,e}^{sen_1}, \dots, x_{b,e}^{sen_K}\}$$

where $K$ is the number of senses of the word $w_{b,e}$.
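The sense set can be illustrated with a toy lookup. `TOY_HOWNET` and `sense_vectors` below are hypothetical stand-ins for a real HowNet query and for pre-trained sense embeddings; only the shape of the result (one vector per sense, K vectors in total) reflects the scheme above.

```python
import numpy as np

# Hypothetical stand-in for a HowNet lookup: each word maps to its sense ids.
TOY_HOWNET = {"杜鹃": ["azalea (flower)", "cuckoo (bird)"]}

def sense_vectors(word, sense_emb, d_sen=4, seed=0):
    """Return the K sense-level vectors of a word, one per sense listed in
    the knowledge base; unseen senses get a (toy) random embedding."""
    rng = np.random.default_rng(seed)
    return [sense_emb.setdefault(s, rng.normal(size=d_sen))
            for s in TOY_HOWNET.get(word, [])]

emb = {}
vecs = sense_vectors("杜鹃", emb)
assert len(vecs) == 2   # K = 2 senses for this ambiguous word
```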
Preferably, step S2 comprises: S21: inputting the character-level sequence of the text of the input data directly, with characters as the basic unit, into the bidirectional long short-term memory network to obtain the hidden state vectors of the characters; S22: obtaining, through the external semantic knowledge base HowNet, all the sense vectors of the words of the word-level sequence of the input data that end with each character, inputting the sense vectors into the bidirectional long short-term memory network to calculate the sense-level states, and fusing all the sense-level states by weighted summation to obtain the word states; S23: calculating the weights of the character and the words with a gate unit, and fusing the state of the character and the states of the words by weighted summation into the final hidden state vector of the character.
Preferably, step S21 includes: the $j$-th character of the character-level sequence of the text is input into the bidirectional long short-term memory network with the following calculation:

$$i_j^c = \sigma(W_i x_j^c + U_i h_{j-1}^c + b_i)$$
$$f_j^c = \sigma(W_f x_j^c + U_f h_{j-1}^c + b_f)$$
$$o_j^c = \sigma(W_o x_j^c + U_o h_{j-1}^c + b_o)$$
$$\widetilde{c}_j^c = \tanh(W_c x_j^c + U_c h_{j-1}^c + b_c)$$
$$c_j^c = f_j^c \odot c_{j-1}^c + i_j^c \odot \widetilde{c}_j^c$$
$$h_j^c = o_j^c \odot \tanh(c_j^c)$$

where $i$ is the input gate controlling which information is stored; $f$ is the forget gate controlling which information is forgotten; $o$ is the output gate controlling which information is output; $c$ is the cell state; $W$, $U$ and $b$ are parameters to be learned in the bidirectional long short-term memory network; $\sigma$ is the sigmoid function; $\odot$ denotes element-wise multiplication; and $h$ denotes the hidden state vector, determined by the hidden state at the previous moment and the data input at the current moment.
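The six equations of one character-level LSTM step can be sketched as follows. Random matrices stand in for the learned parameters $W$, $U$, $b$, and only one direction is shown; the full model would also run the sequence in reverse and concatenate both hidden states.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the parameters of the four gates."""
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # (4d,) pre-activations
    i = sigmoid(z[0*d:1*d])               # input gate
    f = sigmoid(z[1*d:2*d])               # forget gate
    o = sigmoid(z[2*d:3*d])               # output gate
    c_tilde = np.tanh(z[3*d:4*d])         # candidate cell state
    c = f * c_prev + i * c_tilde          # new cell state
    h = o * np.tanh(c)                    # new hidden state
    return h, c

rng = np.random.default_rng(1)
d_in, d_h = 5, 4
W = rng.normal(size=(4 * d_h, d_in))
U = rng.normal(size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
assert h.shape == (d_h,) and np.all(np.abs(h) < 1.0)
```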
Preferably, in step S22, for a word $w_{b,e}$ starting at subscript $b$ and ending at subscript $e$ and represented by the vector $x_{b,e}^{w}$, the cell state of the word input into the bidirectional long short-term memory network is calculated as:

$$i_{b,e}^{w} = \sigma(W_i x_{b,e}^{w} + U_i h_b^c + b_i)$$
$$f_{b,e}^{w} = \sigma(W_f x_{b,e}^{w} + U_f h_b^c + b_f)$$
$$\widetilde{c}_{b,e}^{w} = \tanh(W_c x_{b,e}^{w} + U_c h_b^c + b_c)$$
$$c_{b,e}^{w} = f_{b,e}^{w} \odot c_b^c + i_{b,e}^{w} \odot \widetilde{c}_{b,e}^{w}$$

For the $k$-th sense of the word $w_{b,e}$, represented by the vector $x_{b,e}^{sen_k}$, the sense-level cell state $c_{b,e}^{sen_k}$ is calculated analogously:

$$i_{b,e}^{sen_k} = \sigma(W_i x_{b,e}^{sen_k} + U_i h_b^c + b_i)$$
$$f_{b,e}^{sen_k} = \sigma(W_f x_{b,e}^{sen_k} + U_f h_b^c + b_f)$$
$$\widetilde{c}_{b,e}^{sen_k} = \tanh(W_c x_{b,e}^{sen_k} + U_c h_b^c + b_c)$$
$$c_{b,e}^{sen_k} = f_{b,e}^{sen_k} \odot c_b^c + i_{b,e}^{sen_k} \odot \widetilde{c}_{b,e}^{sen_k}$$

An additional gate mechanism is introduced to control the contribution of each sense:

$$g_{b,e}^{sen_k} = \sigma(W_g x_{b,e}^{sen_k} + U_g c_{b,e}^{sen_k} + b_g)$$

The word cell state fusing the multiple senses is then obtained by normalizing these gates and taking the weighted sum, so that all sense-level cell states are fused into one word cell state $c_{b,e}^{w}$:

$$\beta_{b,e}^{k} = \frac{\exp\big(g_{b,e}^{sen_k}\big)}{\sum_{k'=1}^{K}\exp\big(g_{b,e}^{sen_{k'}}\big)}, \qquad c_{b,e}^{w} = \sum_{k=1}^{K} \beta_{b,e}^{k} \odot c_{b,e}^{sen_k}$$

For the character $c_e$, the cell states of all words ending at position $e$ are fused with the character's own candidate cell state:

$$c_e^c = \sum_{b \in \{b' \mid w_{b',e} \in \mathbb{D}\}} \alpha_{b,e}^{c} \odot c_{b,e}^{w} + \alpha_e^{c} \odot \widetilde{c}_e^{c}$$

where $\mathbb{D}$ denotes the lexicon, an extra gate $g_{b,e}^{c} = \sigma(W_l x_e^c + U_l c_{b,e}^{w} + b_l)$ is computed for each such word, and $\alpha_{b,e}^{c}$ and $\alpha_e^{c}$ are the normalized representations of the gate structure, calculated as:

$$\alpha_{b,e}^{c} = \frac{\exp\big(g_{b,e}^{c}\big)}{\exp\big(i_e^{c}\big) + \sum_{b'}\exp\big(g_{b',e}^{c}\big)}, \qquad \alpha_e^{c} = \frac{\exp\big(i_e^{c}\big)}{\exp\big(i_e^{c}\big) + \sum_{b'}\exp\big(g_{b',e}^{c}\big)}$$

The cell state corresponding to each character thus fuses the information of the word and sense levels, and the final hidden state vector of the character is obtained as:

$$h_e^c = o_e^c \odot \tanh(c_e^c)$$

The final hidden state vectors of the characters are fed into the classifier, which synthesizes the corresponding sentence-level feature representation.
Preferably, the sentence-level hidden state vector $h^{*} \in \mathbb{R}^{d_h}$ is calculated from the matrix $h \in \mathbb{R}^{d_h \times M}$ of character-level hidden state vectors as follows:

$$H = \tanh(h)$$
$$\alpha = \mathrm{softmax}(w^{T} H)$$
$$h^{*} = h\,\alpha^{T}$$

Then $h^{*}$ is sent to a softmax classification layer, which calculates the probability distribution over the classes:

$$o = W h^{*} + b$$
$$p(y \mid s) = \mathrm{softmax}(o)$$

For $T$ training examples, the whole training process is optimized with the following cross-entropy loss function:

$$J(\theta) = -\sum_{i=1}^{T} \log p\,(y_i \mid s_i, \theta)$$

where $d_h$ is the dimension of a hidden state vector, $M$ is the length of the input sequence, $\mathbb{R}$ is the real number space, the superscript $T$ denotes the transpose, $w$ is a parameter to be learned, $\alpha$ is the attention weight vector of $h$, $W \in \mathbb{R}^{Y \times d_h}$ is the transfer matrix, $b \in \mathbb{R}^{Y}$ is a bias vector, $Y$ denotes the total number of classes, $p(y)$ denotes the probability of predicting a certain class, and $\theta$ denotes all parameters to be trained in the whole model.
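The attention and classification formulas above can be sketched directly; random values stand in for the learned parameters $w$, $W$ and the encoder outputs.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sentence_attention(h, w):
    """H = tanh(h); alpha = softmax(w^T H); h* = h alpha^T."""
    H = np.tanh(h)                 # (d_h, M)
    alpha = softmax(w @ H)         # (M,) attention weights over characters
    return h @ alpha               # (d_h,) sentence vector

def classify(h_star, W, b):
    """o = W h* + b, then a softmax over the Y relation classes."""
    return softmax(W @ h_star + b)

rng = np.random.default_rng(3)
d_h, M, Y = 6, 5, 3
h = rng.normal(size=(d_h, M))      # character-level hidden states
p = classify(sentence_attention(h, rng.normal(size=d_h)),
             rng.normal(size=(Y, d_h)), np.zeros(Y))
assert p.shape == (Y,) and abs(float(p.sum()) - 1.0) < 1e-9
```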
Preferably, a dropout mechanism is adopted during training: each neuron of the bidirectional long short-term memory network is dropped with a probability of 50% in the training process.
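A minimal sketch of this mechanism in its common "inverted dropout" form; the patent does not specify rescaling, so the 1/(1-p) factor here is an implementation choice, not a claim of the patent.

```python
import numpy as np

def dropout(h, p=0.5, rng=None, train=True):
    """Zero each unit with probability p during training and rescale the
    survivors by 1/(1-p); identity at prediction time."""
    if not train:
        return h
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p   # True = neuron kept
    return h * mask / (1.0 - p)

rng = np.random.default_rng(4)
h = np.ones(1000)
out = dropout(h, 0.5, rng)
# Kept units are rescaled to 2.0, dropped units are 0.0.
assert set(np.unique(out)).issubset({0.0, 2.0})
```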
The invention also provides a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth in any one of the above.
The invention has the following beneficial effects: a Chinese relation extraction method is provided in which the text of the input data is pre-trained with multi-granularity information to extract the distributed vectors of the three levels of characters, words and word senses, so that semantic features are learned automatically and manual involvement is greatly reduced. The method effectively solves the problems of word-segmentation ambiguity and polysemy, greatly improves the performance of the model on the relation extraction task, and improves the accuracy and robustness of Chinese relation extraction.
Drawings
FIG. 1 is a diagram illustrating a method for extracting Chinese relationships according to an embodiment of the present invention.
FIG. 2 is a flow chart illustrating a Chinese relationship extraction method according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. The connection may be for fixing or for circuit connection.
It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings to facilitate the description of the embodiments of the invention and to simplify the description, and are not intended to indicate or imply that the device or element so referred to must have a particular orientation, be constructed in a particular orientation, and be constructed in a particular manner of operation, and are not to be construed as limiting the invention.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or to implicitly indicate the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Example 1
As shown in FIG. 1, the present invention provides a Chinese relation extraction method. The method comprises the following steps:
S1: data preprocessing: pre-training multi-granularity information on the text of the input data to extract the distributed vectors of the three levels of characters, words and word senses in the text;
S2: feature coding: taking a bidirectional long short-term memory network as the basic framework, obtaining the hidden state vectors of the characters and of the words from the distributed vectors of the three levels of characters, words and word senses, and further obtaining the final character-level hidden state vectors;
S3: relation classification: learning the final character-level hidden state vectors, and fusing them into a sentence-level hidden state vector by a character-level attention mechanism.
The data preprocessing step pre-trains multi-granularity information on the text of the input data to extract the distributed vectors of the three levels of characters, words and word senses in the text. Traditional pre-training usually represents each word with only one word vector; in the invention, a distributed sense vector is generated for each sense of each word.
The feature coding step implements a lattice long short-term memory network with multi-level paths, so that semantic information of multiple levels is effectively used to learn the character-level hidden state variables, which can be regarded as features automatically extracted from the data. The hidden variables learned in the feature coding step are input into the relation classification step, where a gated attention mechanism is introduced to automatically assign and fuse weights over the hidden state sequence; noise information is filtered out in the weighted fusion while salient feature information is retained, and the final classification outputs a more accurate relation type.
The invention comprises two stages: a training stage and a prediction stage. In the training stage, an initial model is defined and its parameters are initialized randomly; data with relation class labels are continuously fed into the model, which keeps learning from the training data and updating its parameters. Meanwhile, the cross entropy between the model's predicted output and the correct answer serves as the loss function measuring the prediction quality; when the loss value stabilizes, the model has converged, training ends, and a trained relation extractor is obtained. In the prediction stage, the data to be predicted are input directly into the trained relation extractor to obtain the corresponding predicted entity relations.
Data preprocessing:
The main purpose of this step is to convert the text of the input data into distributed vectors that the computer can read and manipulate and that contain implicit semantic information. Meanwhile, so that subsequent modules can use the multi-granularity information of characters, words and word senses in the text, this step learns vector representations for all three language granularities.
For character vectors, the technique trains the commonly used word2vec algorithm on a large-scale corpus to obtain an implicit feature representation of each character. This representation exploits the contexts of the characters in the corpus and fully embodies their syntactic and semantic information.
For word vectors, the training method is in principle the same word2vec algorithm, except that character vectors are trained with the characters of the text as the basic unit, whereas word vectors are trained with words as the basic unit after the text is automatically segmented by a word segmentation tool. However, in this way each word corresponds to only one fixed word vector and polysemy is ignored, so the invention chooses to represent word senses rather than words as vectors.
For the sense vectors: since it is impossible to tell directly from the surface form whether a word is polysemous and which sense applies in each case, word senses are modeled with the external semantic knowledge base HowNet. In HowNet, the various senses of each word and their sememes (the smallest units of meaning) are explicitly annotated by hand; through these annotations the senses of each word are obtained and sense vectors are trained. In this way a word can be represented by multiple sense vectors and input into the subsequent modules, so that during training the model can dynamically select the most appropriate sense of the current word in the current sentence, helping the model capture the deeper semantic information and features in the sentence.
Feature coding:
The feature coding step implements a neural network structure that can effectively exploit multi-granularity information features. Compared with a traditional recurrent neural network (RNN), an LSTM handles context information more flexibly and effectively, storing the important information in the input, forgetting the invalid information, and avoiding the vanishing- and exploding-gradient problems that deep neural networks easily encounter. However, a traditional LSTM model cannot solve the segmentation-ambiguity and polysemy problems of Chinese relation extraction, so the invention makes a series of improvements.
Firstly, to avoid error propagation from a word segmentation tool, the invention takes characters as the basic unit: each sentence is fed directly into a bidirectional LSTM unit as a character-level sequence to obtain its hidden state vectors. Then, so that word-level information is also considered during encoding, for each character in a sentence, all words in the sentence that end with that character are added to the LSTM cell computation. For example, in the sentence "达尔文研究所有杜鹃" ("Darwin studied all the dujuan"), "杜鹃" is a word ending with the character "鹃". All words ending with the current character are then also fed into another bidirectional LSTM unit, and the word-level hidden states are computed. Finally, a gate unit computes the weights of the character and the words, and the hidden states of the character and the words are fused by weighted summation to obtain the final hidden state vector of the current character, which contains information of both the character and word levels.
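Enumerating the candidate words $w_{b,e}$ that end at a given character, as described above, can be sketched with a toy lexicon; the lexicon contents below are illustrative.

```python
def words_ending_at(sentence, lexicon, e):
    """All lexicon words in `sentence` that end at character index e,
    i.e. the candidates w_{b,e} fed into the word-level LSTM path."""
    return [(b, sentence[b:e + 1])
            for b in range(e + 1)
            if sentence[b:e + 1] in lexicon]

lexicon = {"杜鹃", "研究", "研究所", "所有"}
sent = "达尔文研究所有杜鹃"
assert words_ending_at(sent, lexicon, 8) == [(7, "杜鹃")]
assert words_ending_at(sent, lexicon, 5) == [(3, "研究所")]
```

Note how the two segmentations of the example sentence surface naturally: both "研究所" (ending at index 5) and "所有" (ending at index 6) are candidates, and the gated fusion decides how much each contributes.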
Although this method combines character and word information and effectively avoids the influence of segmentation errors on the model, it does not consider the presence of ambiguous words in the sentence. For example, in the sentence above, "杜鹃" is an ambiguous word with two distinct senses: "azalea" (a flower) and "cuckoo" (a bird). Therefore, going still further, the invention incorporates the senses of each word into the computation of the hidden state. Specifically, for each word ending with the current character, HowNet is first queried to obtain all sense vectors of the word; these sense vectors are then input, like the word vector, into a bidirectional LSTM unit to compute sense-level hidden states; finally, all sense-level states are fused by weighted summation to obtain the hidden state of the word. Compared with deriving the word state directly from a single word vector, this approach dynamically fuses and selects the most suitable sense to form the word state. After the word states are obtained, they are fused with the character states in the same way as before to obtain the final hidden state vector of the current character.
Relation classification:
This step inputs the learned sentence feature representation into a classifier to obtain the predicted relation class label. In the previous module the encoder learned a feature representation (hidden state vector) for each character, but since relation classes are extracted in units of sentences, the character-level feature representations must be merged into a corresponding sentence feature representation. The invention introduces a gated attention mechanism to automatically assign a weight to each character-level feature representation, and then takes the weighted sum over all characters based on these weights to obtain the final sentence representation. The intuition of this method is that the characters of a sentence differ in importance: noise words and common function words should receive smaller weights, while keywords, such as the words inside entities and the main verbs, deserve higher attention, so that the fused sentence representation is more accurate.
After the sentence feature vector is obtained by fusion, it is input into a fully connected layer and mapped to a new vector whose dimension is the total number of relation classes. The new vector is then normalized by a softmax function, so that each dimension holds a probability value in the interval from 0 to 1, representing the probability that the sentence belongs to the relation class corresponding to that dimension. In the training stage, the loss function of the model is defined as the cross entropy between the normalized vector and the one-hot indicator vector of the correct relation class, and the parameters of the model are updated by gradient descent; in the prediction stage, the predicted relation class is the one corresponding to the dimension with the largest probability value in the normalized vector.
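The softmax normalization, the cross-entropy loss against the one-hot indicator, and the argmax prediction described above can be sketched as:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, gold):
    """Cross entropy between the normalized class vector and the one-hot
    indicator of the correct relation class."""
    p = softmax(logits)
    one_hot = np.zeros_like(p)
    one_hot[gold] = 1.0
    return -float(np.sum(one_hot * np.log(p)))

logits = np.array([2.0, 0.5, -1.0])       # toy fully-connected-layer output
loss = cross_entropy(logits, 0)           # training stage: loss vs gold class 0
pred = int(np.argmax(softmax(logits)))    # prediction stage: argmax class
assert pred == 0 and loss > 0
```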
Example 2
As shown in FIG. 2, this embodiment adopts the Chinese relation extraction method provided by the invention. The task is defined as: given a sentence s and two specified entities in the sentence, determine what relation holds between the two entities in s. For example, given the sentence "达尔文研究所有杜鹃" ("Darwin studied all the dujuan") and the named entities "达尔文" (Darwin) and "杜鹃" (dujuan), the goal is to determine what relation holds between them in that sentence.
Step 1, data preprocessing:
1.1 word level representation
For a given input sequence s = {c_1, ..., c_M} with M characters, the invention uses the word2vec method to map each character c_i (taking the i-th as an example) into a character vector x_i^c ∈ R^{d_c}, where x_i^c is the character vector of the i-th character, R is the real-number space, and d_c is the dimension of the character vector. In addition to character vectors, this technique employs position vectors to represent the relative position of a character to the two entities. Specifically, for the i-th character c_i, its relative positions to the two given entities are denoted p_i^1 and p_i^2, where p_i^1 is calculated as follows:

    p_i^1 = i - b_1,  if i < b_1
    p_i^1 = 0,        if b_1 ≤ i ≤ e_1
    p_i^1 = i - e_1,  if i > e_1

Here b_1 and e_1 are the start and end positions of the first entity, and p_i^2 is calculated in almost exactly the same way with respect to the second entity. In this way, p_i^1 and p_i^2 are converted into corresponding position vectors x_i^{p1} and x_i^{p2} ∈ R^{d_p}, which indicate the position features of the character-level sequence, where d_p denotes the dimension of the position vectors.
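The piecewise position feature above can be sketched in a few lines of Python; the function names and the 1-indexed, inclusive entity spans are illustrative assumptions, not part of the patent.

```python
# Sketch of the relative-position feature: for the i-th character and an
# entity spanning [b, e] (1-indexed, inclusive), the feature is
# i - b before the entity, 0 inside it, and i - e after it.

def relative_position(i, b, e):
    """Relative position of character i with respect to the entity span [b, e]."""
    if i < b:
        return i - b
    if i > e:
        return i - e
    return 0

def char_position_features(length, span1, span2):
    """Two position features per character, one for each entity span."""
    (b1, e1), (b2, e2) = span1, span2
    return [(relative_position(i, b1, e1), relative_position(i, b2, e2))
            for i in range(1, length + 1)]
```

For a sentence of length 4 with the first entity spanning characters 1-2 and the second occupying character 4, the first character gets the pair (0, -3): it lies inside entity one and three characters before entity two.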
In one embodiment of the invention, the input is defined as a sentence and two entities specified therein; in fact, if a sentence contains multiple entities, relation extraction is performed on each pair of entities, and the output is the relation between the two currently specified entities within the sentence.
Thus, for the i-th character c_i of the input, the final representation is the concatenation of the ordinary character vector and the two position vectors, i.e.

    x_i = [x_i^c ; x_i^{p1} ; x_i^{p2}] ∈ R^d

where d = d_c + 2*d_p is the total dimension after the character vector and the position vectors are spliced. The representation of the input sequence then becomes x = {x_1, ..., x_M}, which is fed into the subsequent encoding step.
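A minimal sketch of the splicing step; the toy stand-in vectors below are illustrative (a real system would look them up in trained embedding tables):

```python
# Each character's final input is the concatenation of its word2vec
# character vector (dimension d_c) with the two position vectors
# (dimension d_p each), giving total dimension d = d_c + 2*d_p.

def splice(char_vec, pos1_vec, pos2_vec):
    return list(char_vec) + list(pos1_vec) + list(pos2_vec)

d_c, d_p = 4, 2
char_vec = [0.1] * d_c   # stand-in for a word2vec character vector
pos1_vec = [0.2] * d_p   # stand-in for the entity-1 position embedding
pos2_vec = [0.3] * d_p   # stand-in for the entity-2 position embedding

x_i = splice(char_vec, pos1_vec, pos2_vec)
assert len(x_i) == d_c + 2 * d_p   # d = 8
```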
1.2 Word-level representation:
Although the input to the model is a character-level sequence, in order to obtain word-level features, the invention performs word-level representation learning on all possible candidate words in the sentence. An input sequence s can be expressed not only as a character-level sequence s = {c_1, ..., c_M} but also as a word-level sequence s = {w_1, ..., w_N}. In this section, a word is denoted by its start position b and end position e, i.e. w_{b,e}. Again by the word2vec method, the word w_{b,e} can be converted into a word vector x_{b,e}^w. For each word, the invention obtains its sense set Sense(w_{b,e}) from HowNet; then each sense sen_{b,e}^k in the set is converted by the Skip-Gram method into a sense vector x_{b,e}^{sen,k} that represents that sense alone. Thus a word may be represented by a set of sense vectors (assuming it has K senses), i.e.

    {x_{b,e}^{sen,1}, ..., x_{b,e}^{sen,K}}

These sense representation vectors are used in the training of the encoder module, so that the model can dynamically exploit sense information.
Step 2, feature coding:
a standard character-level LSTM unit mainly consists of three gates: an input gate i, which controls which information is to be stored; a forget gate f, which controls which information is to be forgotten; and an output gate o, which controls which information is to be output. For the j-th character, the LSTM unit is computed as follows:

    i_j^c = σ(W_i x_j^c + U_i h_{j-1}^c + b_i)
    f_j^c = σ(W_f x_j^c + U_f h_{j-1}^c + b_f)
    o_j^c = σ(W_o x_j^c + U_o h_{j-1}^c + b_o)
    ĉ_j^c = tanh(W_c x_j^c + U_c h_{j-1}^c + b_c)
    c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ ĉ_j^c
    h_j^c = o_j^c ⊙ tanh(c_j^c)

where σ is the sigmoid function and ⊙ denotes element-wise multiplication; c denotes the cell unit, which stores the information of the sequence from the start to the current position; h denotes the hidden state vector, which is determined by the hidden state at the previous moment and the input at the current moment; and W, U and b are the parameters to be learned in the LSTM.
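The six formulas above can be sketched with numpy, packing the four gate blocks into single W, U, b parameters; the dimensions and random initialisation are illustrative.

```python
# One step of a standard LSTM cell: gates i, f, o and the candidate cell
# are computed from the packed parameters W (4d x d_in), U (4d x d),
# b (4d,), then the cell state and hidden state are updated.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all four pre-activations, shape (4d,)
    i = sigmoid(z[0:d])                 # input gate
    f = sigmoid(z[d:2*d])               # forget gate
    o = sigmoid(z[2*d:3*d])             # output gate
    c_tilde = np.tanh(z[3*d:4*d])       # candidate cell
    c = f * c_prev + i * c_tilde        # new cell unit
    h = o * np.tanh(c)                  # new hidden state
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 5, 3
W = rng.standard_normal((4 * d_h, d_in)) * 0.1
U = rng.standard_normal((4 * d_h, d_h)) * 0.1
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```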
For a word w_{b,e} starting at subscript b and ending at subscript e, whose word representation is x_{b,e}^w, the cell unit c_{b,e}^w of the word in Lattice LSTM is calculated as follows:

    i_{b,e}^w = σ(W_i x_{b,e}^w + U_i h_b^c + b_i)
    f_{b,e}^w = σ(W_f x_{b,e}^w + U_f h_b^c + b_f)
    ĉ_{b,e}^w = tanh(W_c x_{b,e}^w + U_c h_b^c + b_c)
    c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ ĉ_{b,e}^w
That is, for a character c_b, the invention first finds all words that begin at c_b and match the external dictionary, and then calculates the cell units c^w for these words. On this basis, the invention extends the calculation to the sense level: each sense of each word is assigned an additional LSTM unit for calculation. As mentioned in the representation-learning module, the k-th sense of the word w_{b,e} has the representation vector x_{b,e}^{sen,k}. Thus, the sense-level cell unit c_{b,e}^{sen,k} is calculated as follows:

    i_{b,e}^{sen,k} = σ(W_i x_{b,e}^{sen,k} + U_i h_b^c + b_i)
    f_{b,e}^{sen,k} = σ(W_f x_{b,e}^{sen,k} + U_f h_b^c + b_f)
    ĉ_{b,e}^{sen,k} = tanh(W_c x_{b,e}^{sen,k} + U_c h_b^c + b_c)
    c_{b,e}^{sen,k} = f_{b,e}^{sen,k} ⊙ c_b^c + i_{b,e}^{sen,k} ⊙ ĉ_{b,e}^{sen,k}
All the sense cell units are then fused into one word cell unit, so that the model takes word-sense ambiguity into account. The word cell unit after fusing the multiple senses is c_{b,e}^w. To calculate c_{b,e}^w, an additional gate mechanism is introduced to control the contribution of each piece of sense information:

    g_{b,e}^{sen,k} = σ(W_g x_{b,e}^{sen,k} + U_g c_{b,e}^{sen,k} + b_g)

The word cell state fusing the multiple pieces of sense information is then calculated as follows:

    α_{b,e}^{sen,k} = exp(g_{b,e}^{sen,k}) / Σ_{k'=1}^{K} exp(g_{b,e}^{sen,k'})
    c_{b,e}^w = Σ_{k=1}^{K} α_{b,e}^{sen,k} ⊙ c_{b,e}^{sen,k}
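A sketch of the sense-fusion step, under the assumption stated above that the gate activations are softmax-normalised over the K senses before the weighted sum; the input tensors are illustrative.

```python
# Fuse K sense-level cell states into one word-level cell state:
# per-sense gate activations are softmax-normalised (per dimension,
# over the K senses) and used as weights in an element-wise sum.
import numpy as np

def fuse_senses(sense_cells, gate_logits):
    """sense_cells: (K, d) sense-level cells; gate_logits: (K, d) gate activations."""
    e = np.exp(gate_logits - gate_logits.max(axis=0, keepdims=True))
    alpha = e / e.sum(axis=0, keepdims=True)   # normalised weights over K senses
    return (alpha * sense_cells).sum(axis=0)   # fused word cell, shape (d,)

K, d = 3, 4
cells = np.ones((K, d))       # illustrative sense cells
logits = np.zeros((K, d))     # equal gates -> uniform weights
fused = fuse_senses(cells, logits)
```

With equal gate activations the weights are uniform, so the fused cell is just the average of the sense cells.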
Through the above calculation, for each word w_{b,e} the model can calculate a cell state c_{b,e}^w fused with multi-sense information. Then, for a character c_e, the invention fuses the information of every word ending at c_e to obtain a brand-new character-level cell state. The calculation is as follows:

    i_{b,e}^l = σ(W_l x_e^c + U_l c_{b,e}^w + b_l)
    c_e^c = Σ_b α_{b,e}^w ⊙ c_{b,e}^w + α_e^c ⊙ ĉ_e^c

where α_{b,e}^w and α_e^c are normalized representations of the gate structure, calculated as follows:

    α_{b,e}^w = exp(i_{b,e}^l) / ( exp(i_e^c) + Σ_{b'} exp(i_{b',e}^l) )
    α_e^c = exp(i_e^c) / ( exp(i_e^c) + Σ_{b'} exp(i_{b',e}^l) )

After the above calculation, the cell unit corresponding to each character fuses the information of the word and sense levels, and the final hidden state vector of the character is then calculated:

    h_e^c = o_e^c ⊙ tanh(c_e^c)
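The character-level fusion can be sketched the same way: the character's own input gate and the link gates of all words ending at that character are normalised jointly, the cell states are merged by weighted sum, and the output gate produces the final hidden state. All tensors here are illustrative stand-ins.

```python
# Merge the character's candidate cell with the cell states of all words
# ending at this character, using jointly softmax-normalised gates, then
# compute the final hidden state h = o * tanh(c).
import numpy as np

def fuse_char_and_words(char_gate, char_cand, word_gates, word_cells, o_gate):
    """char_gate/char_cand/o_gate: (d,); word_gates/word_cells: (W, d)."""
    gates = np.vstack([char_gate[None, :], word_gates])   # (W+1, d)
    e = np.exp(gates - gates.max(axis=0, keepdims=True))
    alpha = e / e.sum(axis=0, keepdims=True)              # normalised gate weights
    cells = np.vstack([char_cand[None, :], word_cells])
    c = (alpha * cells).sum(axis=0)                       # fused cell state
    h = o_gate * np.tanh(c)                               # final hidden state
    return h, c

d, n_words = 4, 2
h, c = fuse_char_and_words(np.zeros(d), np.ones(d),
                           np.zeros((n_words, d)), np.ones((n_words, d)),
                           np.full(d, 0.5))
```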
the final hidden state vectors of the characters are fed into the classifier, which synthesizes a sentence-level representation and then obtains the probability distribution over the relation categories.
And step 3, relation classification:
after the hidden state vector h of each word is learned, the invention adopts a word-level attention mechanism to fuse the word-level hidden states into a sentence-level hidden state h* ∈ R^{d_h}, where d_h is the dimension of the hidden state vector and M is the length of the input sequence. h* is a weighted sum with automatically assigned weights:

    H = tanh(h)
    α = softmax(w^T H)
    h* = h α^T

where T denotes the transpose, w is a parameter to be learned, α is the weight vector over H, and H is the value of h transformed by the tanh function; the tanh function maps the value of each dimension of h into the range [-1, 1], which effectively alleviates problems such as gradient explosion during training.
h* is then sent to a softmax classification layer, which calculates the probability distribution over the categories:

    o = W h* + b
    p(y|s) = softmax(o)

Here W ∈ R^{Y×d_h} is the transformation matrix and b ∈ R^Y is a bias vector; Y denotes the total number of categories and p(y|s) denotes the probability of predicting a certain category.
For T training examples, the entire training process is optimized with the following cross-entropy loss function:

    J(θ) = - Σ_{i=1}^{T} log p(y_i | s_i, θ)

Here θ denotes all parameters of the entire model that need to be trained. Meanwhile, in order to prevent overfitting, a dropout mechanism is adopted during training: each neuron is switched off with 50% probability (i.e., half of the hidden-layer nodes do not participate in the calculation in each training pass); in the test phase, all trained neurons participate in the calculation.
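The three attention formulas translate directly into numpy; here h is the d_h × M matrix of word-level hidden states and w is the learned parameter (both randomly filled for illustration).

```python
# Word-level attention pooling: H = tanh(h), alpha = softmax(w^T H),
# h* = h alpha^T -> one sentence vector of dimension d_h.
import numpy as np

def attention_pool(h, w):
    H = np.tanh(h)                 # (d_h, M)
    scores = w @ H                 # (M,) unnormalised attention scores
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()            # attention weights over the M positions
    return h @ alpha               # sentence vector h*, shape (d_h,)

d_h, M = 3, 5
rng = np.random.default_rng(1)
h = rng.standard_normal((d_h, M))
w = rng.standard_normal(d_h)
s = attention_pool(h, w)
```

With w = 0 every position gets equal weight and the pooled vector reduces to the mean of the hidden states, which is a convenient sanity check.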
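A sketch of the classification layer and its cross-entropy loss; the shapes and the zero-initialised parameters are illustrative.

```python
# o = W h* + b, p = softmax(o), and the per-example cross-entropy loss
# -log p[y] for the gold relation label y.
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def cross_entropy(h_star, W, b, y):
    p = softmax(W @ h_star + b)    # probability over the Y relation categories
    return -np.log(p[y])

Y, d_h = 4, 3
W = np.zeros((Y, d_h))             # illustrative untrained parameters
b = np.zeros(Y)
loss = cross_entropy(np.ones(d_h), W, b, y=0)
```

With zero parameters the distribution is uniform over the Y = 4 categories, so the loss equals log 4.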
All or part of the flow of the methods of the embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor to instruct the related hardware to implement the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be considered limited to these descriptions. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications may be made without departing from the spirit of the invention, and all such substitutions or modifications shall be deemed to fall within the protection scope of the invention.

Claims (6)

1. A Chinese relation extraction method is characterized by comprising the following steps:
s1: data preprocessing: pre-training multi-granularity information on a text of input data to extract three levels of distributed vectors of characters, words and word senses in the text;
extracting the word-level distributed vectors comprises extracting word vectors and position vectors;
the word vector is: for the word-level sequence s = {c_1, ..., c_M} of the text of the given input data, having M characters, each character c_i is mapped into a word vector x_i^c ∈ R^{d_c} by the word2vec method, wherein c_i represents the i-th character, x_i^c is the word vector of the i-th character, R is the real-number space, and d_c is the dimension of the word vector;
the position vectors represent the relative positions p_i^1 and p_i^2 between the character c_i and two entities P_1 and P_2, wherein p_i^1 is calculated as follows:

    p_i^1 = i - b_1,  if i < b_1
    p_i^1 = 0,        if b_1 ≤ i ≤ e_1
    p_i^1 = i - e_1,  if i > e_1

wherein b_1 and e_1 are the start and end positions of the first entity P_1, and the calculation of p_i^2 is the same as that of p_i^1; p_i^1 and p_i^2 are converted into corresponding position vectors x_i^{p1} ∈ R^{d_p} and x_i^{p2} ∈ R^{d_p} for representing the position features of the word-level sequence, wherein d_p represents the dimension of the position vectors;
the final representation of the word-level distributed vector is the concatenation of the word vector and the two position vectors, namely:

    x_i = [x_i^c ; x_i^{p1} ; x_i^{p2}] ∈ R^d

wherein d = d_c + 2*d_p is the total dimension after the word vector and the position vectors are spliced;
at this time, the representation of the word-level sequence of the text of the input data becomes x = {x_1, ..., x_M};
Extracting the word-level distributed vectors includes:
for the word-level sequence s = {c_1, ..., c_M} and the word-level sequence s = {w_1, ..., w_M} of the text of the given input data, a word is represented by its start position b and end position e, i.e. w_{b,e}; the word w_{b,e} is converted into a word-level distributed vector x_{b,e}^w by the word2vec method;
the sense set Sense(w_{b,e}) of each word w_{b,e} is obtained from the external semantic knowledge base HowNet, and each sense sen_{b,e}^k in the sense set is converted into a sense-level distributed vector x_{b,e}^{sen,k}, namely

    {x_{b,e}^{sen,1}, ..., x_{b,e}^{sen,K}}

wherein K is the number of senses of the word w_{b,e};
s2: feature encoding: taking a bidirectional long-short-term memory network as the basic framework, obtaining a hidden state vector of each character and a hidden state vector of each word from the distributed vectors of the three levels of character, word and sense, and further obtaining the final hidden state vector of the word level;
wherein, step S2 includes:
s21: directly inputting the character level sequence of the text of the input data into the bidirectional long-time and short-time memory network by taking characters as basic units to obtain the hidden state vector of the characters;
s22: acquiring, through the external semantic knowledge base HowNet, all sense vectors of the words of the word-level sequence of the input data that end with each character, inputting the sense vectors into the bidirectional long-short-term memory network to calculate sense-level hidden state vectors, and fusing all the sense-level hidden state vectors by a weighted summation method to obtain the word hidden state vector;
s23: calculating the weights of the character and the word by using a gate unit, and fusing the hidden state vector of the character and the hidden state vector of the word into a final hidden state vector of the character by a weighted sum method;
s3: and (4) relation classification: and learning the final hidden state vector of the word level, and fusing the hidden state vector of the word level into a hidden state vector of a sentence level by adopting the attention mechanism of the word level.
2. The Chinese relation extraction method according to claim 1, wherein step S21 comprises: the calculation process of inputting the j-th character of the word-level sequence of the text into the bidirectional long-short-term memory network is as follows:

    i_j^c = σ(W_i x_j^c + U_i h_{j-1}^c + b_i)
    f_j^c = σ(W_f x_j^c + U_f h_{j-1}^c + b_f)
    o_j^c = σ(W_o x_j^c + U_o h_{j-1}^c + b_o)
    ĉ_j^c = tanh(W_c x_j^c + U_c h_{j-1}^c + b_c)
    c_j^c = f_j^c ⊙ c_{j-1}^c + i_j^c ⊙ ĉ_j^c
    h_j^c = o_j^c ⊙ tanh(c_j^c)

wherein i is an input gate for controlling which information is stored; f is a forget gate for controlling which information is to be forgotten; o is an output gate for controlling which information is to be output; σ is the sigmoid function and ⊙ denotes element-wise multiplication; c is the cell unit; W, U and b are parameters to be learned in the bidirectional long-short-term memory network; and h represents the hidden state vector, which is determined by the hidden state at the previous moment and the data input at the current moment.
3. The Chinese relation extraction method according to claim 2, wherein in step S22, for a word w_{b,e} starting with subscript b and ending with subscript e, the word is represented as x_{b,e}^w; the cell unit c_{b,e}^w of the word input into the bidirectional long-short-term memory network is calculated as follows:

    i_{b,e}^w = σ(W_i x_{b,e}^w + U_i h_b^c + b_i)
    f_{b,e}^w = σ(W_f x_{b,e}^w + U_f h_b^c + b_f)
    ĉ_{b,e}^w = tanh(W_c x_{b,e}^w + U_c h_b^c + b_c)
    c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ ĉ_{b,e}^w

for the k-th sense of the word w_{b,e}, whose representation vector is x_{b,e}^{sen,k}, the sense-level cell unit c_{b,e}^{sen,k} is calculated as follows:

    i_{b,e}^{sen,k} = σ(W_i x_{b,e}^{sen,k} + U_i h_b^c + b_i)
    f_{b,e}^{sen,k} = σ(W_f x_{b,e}^{sen,k} + U_f h_b^c + b_f)
    ĉ_{b,e}^{sen,k} = tanh(W_c x_{b,e}^{sen,k} + U_c h_b^c + b_c)
    c_{b,e}^{sen,k} = f_{b,e}^{sen,k} ⊙ c_b^c + i_{b,e}^{sen,k} ⊙ ĉ_{b,e}^{sen,k}

an additional gate mechanism is introduced to control the contribution of each piece of sense information:

    g_{b,e}^{sen,k} = σ(W_g x_{b,e}^{sen,k} + U_g c_{b,e}^{sen,k} + b_g)

the word cell state fusing a plurality of pieces of sense information is calculated as follows:

    α_{b,e}^{sen,k} = exp(g_{b,e}^{sen,k}) / Σ_{k'=1}^{K} exp(g_{b,e}^{sen,k'})
    c_{b,e}^w = Σ_{k=1}^{K} α_{b,e}^{sen,k} ⊙ c_{b,e}^{sen,k}

all the sense cell units are thus fused into one word cell state c_{b,e}^w; for the character c_e, the calculation is as follows:

    i_{b,e}^l = σ(W_l x_e^c + U_l c_{b,e}^w + b_l)
    c_e^c = Σ_b α_{b,e}^w ⊙ c_{b,e}^w + α_e^c ⊙ ĉ_e^c

wherein α_{b,e}^w and α_e^c are normalized representations of the gate structure, calculated as follows:

    α_{b,e}^w = exp(i_{b,e}^l) / ( exp(i_e^c) + Σ_{b'} exp(i_{b',e}^l) )
    α_e^c = exp(i_e^c) / ( exp(i_e^c) + Σ_{b'} exp(i_{b',e}^l) )

the cell unit corresponding to each character thus fuses the information of the word and sense levels, and the final hidden state vector of the character is obtained:

    h_e^c = o_e^c ⊙ tanh(c_e^c)

the final hidden state vector is fed into the classifier, which synthesizes a corresponding sentence-level feature representation.
4. The Chinese relation extraction method according to claim 3, wherein the sentence-level hidden state vector h* ∈ R^{d_h} is calculated as follows:

    H = tanh(h)
    α = softmax(w^T H)
    h* = h α^T

h* is then sent to a softmax classification layer, which calculates the probability distribution over the categories:

    o = W h* + b
    p(y|s) = softmax(o)

for T training data, the entire training process is optimized by the following cross-entropy loss function:

    J(θ) = - Σ_{i=1}^{T} log p(y_i | s_i, θ)

wherein d_h is the dimension of the hidden state vector, M is the length of the input sequence, R is the real-number space, T denotes the transpose, w is a parameter to be learned, α is the weight vector of h, W ∈ R^{Y×d_h} is the transformation matrix, b ∈ R^Y is a bias vector, Y represents the total number of all categories, p(y|s) represents the probability of predicting a certain category, and θ represents all parameters that need to be trained in the whole model.
5. The Chinese relation extraction method according to claim 4, wherein a dropout mechanism is adopted in the training process, and each neuron of the bidirectional long-short-term memory network is switched off with a probability of 50% during training.
6. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN201910626307.5A 2019-07-11 2019-07-11 Chinese relation extraction method Active CN110334354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910626307.5A CN110334354B (en) 2019-07-11 2019-07-11 Chinese relation extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910626307.5A CN110334354B (en) 2019-07-11 2019-07-11 Chinese relation extraction method

Publications (2)

Publication Number Publication Date
CN110334354A CN110334354A (en) 2019-10-15
CN110334354B true CN110334354B (en) 2022-12-09

Family

ID=68146526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910626307.5A Active CN110334354B (en) 2019-07-11 2019-07-11 Chinese relation extraction method

Country Status (1)

Country Link
CN (1) CN110334354B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948535B (en) * 2019-12-10 2022-06-14 复旦大学 Method and device for extracting knowledge triples of text and storage medium
CN111160017B (en) * 2019-12-12 2021-09-03 中电金信软件有限公司 Keyword extraction method, phonetics scoring method and phonetics recommendation method
CN111291556B (en) * 2019-12-17 2021-10-26 东华大学 Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN111061843B (en) * 2019-12-26 2023-08-25 武汉大学 Knowledge-graph-guided false news detection method
CN111274394B (en) * 2020-01-16 2022-10-25 重庆邮电大学 An entity relationship extraction method, device, device and storage medium
CN111428505B (en) * 2020-01-17 2021-05-04 北京理工大学 An Entity Relationship Extraction Method Based on Recognition Features of Trigger Words
CN111274794B (en) * 2020-01-19 2022-03-18 浙江大学 Synonym expansion method based on transmission
CN111709240A (en) * 2020-05-14 2020-09-25 腾讯科技(武汉)有限公司 Entity relationship extraction method, device, device and storage medium thereof
CN111783418B (en) * 2020-06-09 2024-04-05 北京北大软件工程股份有限公司 Chinese word meaning representation learning method and device
CN111859978B (en) * 2020-06-11 2023-06-20 南京邮电大学 Deep learning-based emotion text generation method
CN111680510B (en) * 2020-07-07 2021-10-15 腾讯科技(深圳)有限公司 Text processing method and device, computer equipment and storage medium
CN112015891B (en) * 2020-07-17 2025-01-14 山东师范大学 Method and system for classifying messages on online government inquiry platforms based on deep neural networks
CN112380872B (en) * 2020-11-27 2023-11-24 深圳市慧择时代科技有限公司 Method and device for determining emotion tendencies of target entity
CN112560487A (en) * 2020-12-04 2021-03-26 中国电子科技集团公司第十五研究所 Entity relationship extraction method and system based on domestic equipment
CN112883153B (en) * 2021-01-28 2023-06-23 北京联合大学 Relation Classification Method and Device Based on Information Enhanced BERT
CN113239663B (en) * 2021-03-23 2022-07-12 国家计算机网络与信息安全管理中心 Multi-meaning word Chinese entity relation identification method based on Hopkinson
CN112883194B (en) * 2021-04-06 2024-02-20 讯飞医疗科技股份有限公司 A symptom information extraction method, device, equipment and storage medium
CN113051371B (en) * 2021-04-12 2023-02-07 平安国际智慧城市科技股份有限公司 Chinese machine reading understanding method and device, electronic equipment and storage medium
CN113326676B (en) * 2021-04-19 2024-09-20 北京快确信息科技有限公司 Method for establishing deep learning model for structuring financial text into form
CN113392648B (en) * 2021-06-02 2022-10-18 北京三快在线科技有限公司 Entity relationship acquisition method and device
CN114372125A (en) * 2021-12-03 2022-04-19 北京北明数科信息技术有限公司 Government affair knowledge base construction method, system, equipment and medium based on knowledge graph
CN114579695A (en) * 2022-01-20 2022-06-03 杭州量知数据科技有限公司 Event extraction method, device, equipment and storage medium
CN115169326B (en) * 2022-04-15 2024-07-19 长河信息股份有限公司 Chinese relation extraction method, device, terminal and storage medium
CN115034302B (en) * 2022-06-07 2023-04-11 四川大学 Relation extraction method, device, equipment and medium for optimizing information fusion strategy

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080275694A1 (en) * 2007-05-04 2008-11-06 Expert System S.P.A. Method and system for automatically extracting relations between concepts included in text
CN108733792A (en) * 2018-05-14 2018-11-02 北京大学深圳研究生院 A kind of entity relation extraction method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080275694A1 (en) * 2007-05-04 2008-11-06 Expert System S.P.A. Method and system for automatically extracting relations between concepts included in text
CN108733792A (en) * 2018-05-14 2018-11-02 北京大学深圳研究生院 A kind of entity relation extraction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification; Peng Zhou et al.; Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics; 7 August 2016; pp. 207-212 *
Chinese Entity Relation Extraction Method Based on Deep Learning; Sun Ziyang et al.; Computer Engineering; 30 September 2018; Vol. 44, No. 9; pp. 164-170 *

Also Published As

Publication number Publication date
CN110334354A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN110334354B (en) Chinese relation extraction method
Yao et al. An improved LSTM structure for natural language processing
CN114510570B (en) Intention classification method, device and computer equipment based on small sample corpus
CN111078836B (en) Machine reading understanding method, system and device based on external knowledge enhancement
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
CN109726389B (en) A Chinese missing pronoun completion method based on common sense and reasoning
CN111611810B (en) A polyphone pronunciation disambiguation device and method
CN110096711B (en) Natural language semantic matching method for sequence global attention and local dynamic attention
CN115859164B (en) Building entity identification and classification method and system based on prompt
CN110609891A (en) A Visual Dialogue Generation Method Based on Context-Aware Graph Neural Network
CN109214006B (en) A Natural Language Inference Method for Image Enhanced Hierarchical Semantic Representation
CN110532558B (en) Multi-intention recognition method and system based on sentence structure deep parsing
CN117033602A (en) A method of constructing a multi-modal user mental perception question and answer model
CN107358948A (en) Language in-put relevance detection method based on attention model
CN112541356A (en) Method and system for recognizing biomedical named entities
CN114492441A (en) BilSTM-BiDAF named entity identification method based on machine reading understanding
CN111428525A (en) Implicit discourse relation identification method and system and readable storage medium
CN107590127A (en) A kind of exam pool knowledge point automatic marking method and system
CN114692615B (en) Small sample intention recognition method for small languages
CN110162789A (en) A kind of vocabulary sign method and device based on the Chinese phonetic alphabet
CN118227791A (en) A method for predicting learning outcomes of MOOC learners based on multi-level enhanced contrastive learning
US11941360B2 (en) Acronym definition network
CN110019795A (en) The training method and system of sensitive word detection model
US20240354638A1 (en) Named Entity Recognition System based on Enhanced Label Embedding and Curriculum Learning
CN115906846B (en) A document-level named entity recognition method based on hierarchical feature fusion of two graphs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant