CN110321432A - Textual event information extracting method, electronic device and non-volatile memory medium - Google Patents
Textual event information extracting method, electronic device and non-volatile memory medium
- Publication number
- CN110321432A (CN201910548427.8A)
- Authority
- CN
- China
- Prior art keywords
- text
- rule
- word segmentation
- event information
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of information processing. To address the low accuracy of existing event information extraction schemes, a first aspect of the invention provides a textual event information extraction method comprising: segmenting the text into words, converting the segmented words into word vectors, and inputting the word vectors into a neural network model that outputs entities; based on information types defined by text format features, organizing the segmented words and entities in each text block into a structured text block according to pattern rules defined by a grammar; and performing event information extraction on the structured text blocks, in which the grammar-defined pattern rules are used to extract keywords and the keywords are output to a result template. An event extraction model is thus configured by combining neural-network deep learning with rules, which enables accurate extraction of textual event information.
Description
Technical field
The present invention relates to the technical field of information processing, in particular to event information extraction in text mining research, and specifically to a textual event information extraction method, an electronic device and a non-volatile storage medium.
Background art
Event information extraction is one of the most challenging tasks in text mining research. It aims to use a computer to automatically extract events of specific types, together with their elements, from text. As a key technology of information processing, it is widely applied in information retrieval, automatic question answering, automatic summarization, data mining, text mining and other fields.
Current research and experiments on event information extraction fall into three main classes: (1) rule-based text event extraction, typical systems of this kind being ExDisco and GenPAM; (2) text event extraction based on trigger word detection, whose core is detecting trigger words and determining event arguments and their roles, a trigger word being a word that expresses the central meaning of a class of events well, for example "appointment" or "resignation" in a position-change event; (3) text information extraction based on probabilistic statistical models, for example using a hidden Markov model to extract all fields of the header information of computer science papers.
Although many studies apply statistical models to information extraction, in those studies the data fields to be extracted can all be regarded as one compact sequence. Event statements in text often lack this property: the data fields to be extracted are scattered and sparse, and some lie at a considerable distance from the center of the event statement (which can be taken as the position of the trigger word). Accuracy therefore leaves room for improvement.
Summary of the invention
To address the low accuracy of existing event information extraction schemes, the present invention provides a textual event information extraction method, an electronic device and a non-volatile storage medium, which configure an event extraction model by combining neural-network deep learning with rules and thereby achieve accurate extraction of textual event information.
To achieve the above object, the technical solution provided by the invention includes:
A first aspect of the invention provides a textual event information extraction method, characterized in that the method comprises:
Preprocessing the text, the preprocessing including segmenting the text into words, converting the segmented words into word vectors, inputting the word vectors into a neural network model, and outputting entities through the neural network model;
Dividing the text into text blocks and performing text block classification extraction, which includes: based on information types defined by text format features, organizing the segmented words and entities in each text block into a structured text block according to the pattern rules defined by a grammar;
Performing event information extraction on the structured text blocks in the text, which includes using the grammar-defined pattern rules to extract keywords and outputting the keywords to the result template corresponding to the event information extraction.
In a preferred implementation of the embodiment of the invention, when the text is a resume, the blocks obtained by dividing the resume include a first text block for basic information, a second text block for education experience, a third text block for work experience, a fourth text block for training experience, a fifth text block for certificates, and a sixth text block for job intention; the format features include the information features corresponding to basic information, education experience, work experience, training experience, certificates and job intention respectively; and the grammar-defined pattern rules include judgment rules defined according to the lexical analysis, syntactic analysis and semantic analysis of compiler principles.
In a preferred implementation of the embodiment of the invention, segmenting the text into words includes: organizing the core dictionary as a double-array trie; handling overlapping segmentation ambiguity with a method that combines rules and statistics; and recognizing out-of-vocabulary words with a method based on conditional random fields.
In a preferred implementation of the embodiment of the invention, the deep neural network model includes an Embedding layer, a bidirectional RNN layer and a CRF layer; the Embedding layer converts the segmented words into word vectors, which are fed in sequence into the bidirectional RNN layer to obtain the probability distribution over segmentation labels, and this probability distribution is fed into the CRF layer to obtain the entity tag sequence corresponding to the entities.
In a preferred implementation of the embodiment of the invention, the pattern rules are modifiable, and the rule-configured information extraction model can be configured separately for different application scenarios; generic (category) information is provided in the pattern rules; and the text preprocessing further includes in-line context rule analysis, in which the results of word segmentation and entity recognition are corrected with a predetermined rule correction method and ambiguous generics are identified again.
In a preferred implementation of the embodiment of the invention, the pattern rules include simplification-and-merging rules, and complex long rules are placed first with simple short rules after them.
The embodiment of the present invention is preferably carried out in mode, and the participle and entity by the text block is according to corresponding
Pattern rules, the text block after being organized into structuring include: to judge whether continuous row meets specifically using pattern rules sequence
Mode, and after completion text meets corresponding AD HOC, the pattern rules result that Jiang Gehang is matched is stored in character string
In the Multidimensional numerical of type.
The embodiment of the present invention is preferably carried out in mode, and the configuration information of the rule extracts model using rule description language
Speech NPRDL carries out expression writing, and NPRDL language is using BNF form;And the description language is based on complex characteristic
The means of collection describe the grammatical and semantic information of vocabulary, while using the dynamic described based on set of complex features in dynamic analysis
Attribute list describes.
A second aspect of the invention further provides an electronic device, characterized by comprising:
A memory;
A processor; and
A computer program;
Wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method provided in any implementation of the first aspect.
A third aspect of the invention further provides a non-volatile storage medium on which a computer program is stored, characterized in that the program, when executed, implements the steps of the method provided in any implementation of the first aspect.
With the above technical solution provided by the invention, the following beneficial effects can be obtained:
1. A neural network model is used to segment the text and recognize entities, so the basic elements of the text are obtained quickly. Pattern rules are then used to extract classification information for the text blocks: the segmented words and entities in each block are organized into a structured text block according to the grammar-defined pattern rules, so that the text information is structured as the grammatical expressions required by a computer language, in a form better suited to information extraction. Event extraction is then performed on the structured blocks, which effectively addresses the problems of scattered, sparse data and of fields that lie far from the center of the event statement, so that textual event information is extracted accurately. Entity recognition is performed with a deep neural network model, which improves recognition.
2. In a preferred implementation, the pattern rules contain generic information, so an external user-defined thesaurus can be introduced, which makes the pattern rules more convenient to use.
3. The pattern rules are modifiable at their base; for example, they are written in the rule description language NPRDL, which uses BNF form, so the information extraction model can be configured quickly and flexibly for different application scenarios.
4. Complex long rules are placed first and simple short rules after them. Because rule matching proceeds from front to back, this avoids the situation in which an earlier rule matches first and a later rule is never reached, causing a matching failure.
Other features and advantages of the invention are set out in the description that follows, and in part will be apparent from the description or understood by practicing the technical solution of the invention. The objects and other advantages of the invention can be realized and obtained through the structures and/or processes particularly pointed out in the description, the claims and the drawings.
Brief description of the drawings
Fig. 1 is a flowchart of a textual event information extraction method provided by an embodiment of the invention.
Fig. 2 is a flowchart of text preprocessing in a textual event information extraction method provided by an embodiment of the invention.
Fig. 3 is a flowchart of text block classification extraction in a textual event information extraction method provided by an embodiment of the invention.
Fig. 4 is a flowchart of event information extraction in a textual event information extraction method provided by an embodiment of the invention.
Fig. 5 is a structural block diagram of a textual event information extraction apparatus provided by an embodiment of the invention.
Fig. 6 is a structural block diagram of an electronic device provided by an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawings and examples, so that the way the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented. These specific descriptions are intended only to help those of ordinary skill in the art understand the invention more easily and clearly, and do not limit the invention; provided no conflict arises, the embodiments of the invention and the features of the embodiments may be combined with one another, and the resulting technical solutions all fall within the scope of protection of the invention.
In addition, the steps shown in the flowcharts of the drawings may be executed in a control system, such as one running a set of controller-executable instructions, and, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
The technical solution of the invention is described in detail below through the drawings and specific embodiments:
Embodiment
To address the difficulty of model building and the low accuracy of existing event information extraction schemes, this embodiment proposes an event extraction method and apparatus that first preprocess the text with a neural network, then divide the text into blocks, and extract the corresponding event information from the different text blocks.
As shown in Fig. 1, this embodiment provides a textual event information extraction method comprising:
S110: preprocess the text. The preprocessing includes segmenting the text into words, converting the segmented words into word vectors, inputting the word vectors into a neural network model, and outputting entities through the neural network model.
Before word segmentation and entity recognition, the text preprocessing in this embodiment also includes text format alignment: runs of consecutive spaces are reduced to a single space; a line containing several "keyword" + ":" + "slot information" structures is split so that each structure occupies its own line; all full-width characters in the text are converted to half-width characters; and all uppercase English letters in the text are converted to lowercase.
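The format alignment step can be illustrated with a minimal Python sketch. The function names `to_half_width` and `align_format` and the keyword list are hypothetical, and detecting the "keyword: slot information" structures with a fixed keyword list is an assumption of this sketch, not something the description specifies.

```python
import re

def to_half_width(text):
    """Convert full-width characters to their half-width equivalents."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:               # ideographic (full-width) space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:   # full-width ASCII variants
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)

def align_format(line, keywords):
    """Normalize one line and split it into one 'keyword: slot' item per line.

    `keywords` is an assumed, caller-supplied list of slot keywords.
    """
    line = to_half_width(line).lower()
    line = re.sub(r" {2,}", " ", line).strip()   # keep one of consecutive spaces
    if not keywords:
        return [line]
    alternation = "|".join(re.escape(k.lower()) for k in keywords)
    # Split (without consuming text) in front of every "keyword:" occurrence.
    pieces = re.split(rf"(?=\b(?:{alternation})\s*:)", line)
    return [p.strip() for p in pieces if p.strip()]
```

The split relies on `re.split` accepting zero-width matches, which requires Python 3.7 or later. The `\b` word boundary is a convenience for space-separated keywords; Chinese keywords that directly follow other Chinese characters would need a different boundary test.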
In a preferred implementation of this embodiment, segmenting the text into words includes: organizing the core dictionary as a double-array trie; handling overlapping segmentation ambiguity with a method that combines rules and statistics; and recognizing out-of-vocabulary words with a method based on conditional random fields.
As shown in Fig. 2, in a preferred implementation of this embodiment, the deep neural network model includes an Embedding layer, a bidirectional RNN layer and a CRF layer; the Embedding layer converts the segmented words into word vectors, which are fed in sequence into the bidirectional RNN layer to obtain the probability distribution over segmentation labels, and this probability distribution is fed into the CRF layer to obtain the entity tag sequence corresponding to the entities.
That is, in the preferred implementation of this embodiment, word segmentation combines dictionary-based segmentation with statistical segmentation:
1) For the organization of the core dictionary, a double-array trie is used, taking into account the time efficiency of dictionary lookup, the space efficiency of storage, and the statistical regularities of Chinese. The double-array trie (Double Array Trie) is a simple and effective implementation of the trie, made up of two integer arrays, base[] and check[]. For an array index i, if base[i] and check[i] are both 0, the position is empty; if base[i] is negative, the state is a word; check[i] records the previous state of the state, so that for a transition from state i on input a, t = base[i] + a and check[t] = i.
2) Ambiguity resolution and out-of-vocabulary words are the two main difficulties of Chinese word segmentation; overlapping segmentation ambiguity is handled with a method that combines rules and statistics.
3) Out-of-vocabulary words are recognized with a method based on conditional random fields.
4) Part-of-speech tagging uses a method based on the hidden Markov model.
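The base[]/check[] relationship in item 1) can be made concrete with a small Python sketch. It is a naive construction written for illustration only: the character codes, the use of index 1 as the root, the sentinel in check[ROOT], and the first-fit search for a free base value are all assumptions of this sketch, and the names `build_double_array` and `contains` are hypothetical.

```python
from collections import deque

ROOT = 1  # index 0 stays unused so that check == 0 can mean "empty slot"

def build_double_array(words):
    """Build base[]/check[] so that t = |base[s]| + code and check[t] == s."""
    chars = sorted({ch for w in words for ch in w})
    codes = {ch: i + 1 for i, ch in enumerate(chars)}  # assumed codes >= 1
    trie = {}                      # ordinary dict trie, converted below
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node[""] = {}              # end-of-word marker
    base, check = [0, 0], [0, ROOT]  # check[ROOT] = ROOT marks the root as used
    queue = deque([(ROOT, trie)])
    while queue:
        state, node = queue.popleft()
        kids = [(codes[ch], child) for ch, child in node.items() if ch]
        b = 1
        while True:                # first-fit search for a free base value
            while len(base) < b + len(codes) + 1:
                base.append(0)
                check.append(0)
            if all(check[b + c] == 0 for c, _ in kids):
                break
            b += 1
        for c, child in kids:
            check[b + c] = state   # the child state remembers its parent
            queue.append((b + c, child))
        base[state] = -b if "" in node else b  # negative base marks a word
    return base, check, codes

def contains(word, base, check, codes):
    """Follow transitions t = |base[s]| + code, verifying check[t] == s."""
    state = ROOT
    for ch in word:
        code = codes.get(ch)
        if code is None:
            return False
        t = abs(base[state]) + code
        if t >= len(check) or check[t] != state:
            return False
        state = t
    return base[state] < 0
```

Lookup touches one pair of array cells per character, which is the time-efficiency property the description appeals to. Production implementations use more careful slot allocation than the first-fit loop shown here.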
Specifically, entity recognition identifies and marks all dates, times, telephone numbers, personal names, place names, organization names and the like that appear in the text (an entity can thus also be regarded as a segmented word that has undergone a particular kind of processing). Entity recognition uses a deep-learning method and treats the task as a classification problem based on sequence labeling. The deep neural network used consists mainly of an Embedding layer (chiefly word vectors, character vectors and some additional features), a bidirectional RNN layer, a tanh hidden layer and a final CRF layer; the RNN here is usually an LSTM or GRU. As shown in Fig. 2, text preprocessing specifically includes:
1) The text is split into single characters, English words, numbers and punctuation marks, and word vectors are built in the Embedding layer.
2) The word vectors are fed into a bidirectional LSTM model, which outputs a segmentation label sequence for the input sequence.
3) A softmax function is applied at the output of the LSTM to obtain the probability distribution over segmentation labels.
4) The probability distribution over the label sequence is fed into a CRF model to obtain the optimal entity tag sequence. "Optimal" here only indicates that the entity tag sequence obtained from the CRF model is good; it is not a specific limitation.
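Step 4) is commonly realized with Viterbi decoding over per-position tag scores and tag-to-tag transition scores. The following Python sketch shows that decoding step only; it assumes the emission scores stand in for the (log-)probabilities produced by the bidirectional LSTM and softmax, and the function name `viterbi` and the score layout are choices made for this illustration rather than details given in the description.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][k]   : score of tag k at position t (e.g. log-probabilities)
    transitions[j][k] : score of moving from tag j to tag k
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score of a path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        best_prev, new_score = [], []
        for k in range(n_tags):
            # Best previous tag for arriving at tag k.
            j = max(range(n_tags), key=lambda p: score[p] + transitions[p][k])
            best_prev.append(j)
            new_score.append(score[j] + transitions[j][k] + emit[k])
        backpointers.append(best_prev)
        score = new_score
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda k: score[k])]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return path[::-1]
```

With tags ordered as O, B, I, a strongly negative transition score for O followed by I lets the decoder reject a sequence that a purely position-by-position argmax would pick, which is the reason for placing a CRF layer after the softmax output.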
In the preferred implementation of this embodiment, text preprocessing also includes in-line context rule analysis: the results of word segmentation and entity recognition are corrected with a predetermined rule correction method, and ambiguous generics are identified again. The generic information can be used by the pattern rules described below.
Because current word segmentation and entity recognition cannot reach 100% accuracy, the segmentation results need to be corrected with a predetermined rule correction method and ambiguous generics identified again. The predetermined rule correction method may be implemented, for example, with an n-gram model or user-defined grammar rules that disambiguate the Chinese word segmentation.
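One way to read the n-gram option is as a scorer that chooses among candidate segmentations of an ambiguous span. The Python sketch below uses an add-one smoothed bigram model; the smoothing choice, the function names and the count tables are assumptions made for illustration, and the counts in the usage note are invented toy values, not data from the description.

```python
import math

def bigram_logprob(tokens, bigram_counts, unigram_counts, vocab_size):
    """Add-one smoothed bigram log-probability of a token sequence."""
    padded = ["<s>"] + list(tokens)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigram_counts.get((prev, cur), 0) + 1
        denominator = unigram_counts.get(prev, 0) + vocab_size
        total += math.log(numerator / denominator)
    return total

def pick_segmentation(candidates, bigram_counts, unigram_counts, vocab_size):
    """Return the candidate segmentation with the highest bigram score."""
    return max(
        candidates,
        key=lambda toks: bigram_logprob(toks, bigram_counts, unigram_counts, vocab_size),
    )
```

For the overlapping ambiguity in "结合成分子", the two candidates would be ["结合", "成", "分子"] and ["结", "合成", "分子"]; with counts that favor the first reading, `pick_segmentation` returns it. Candidates with different token counts are compared without length normalization here, which a fuller implementation may want to address.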
In this embodiment, the main task of named entity recognition is to identify and classify the proper names in the text, such as personal names and place names, and the meaningful numeric phrases, such as times and dates.
It should be noted that the text referred to in this embodiment includes, but is not limited to, text whose content is in plain-text format or text that contains editable text; the form of the text includes, but is not limited to, web page formats and document formats stored on a server.
S120: divide the text into text blocks and perform text block classification extraction, which includes: based on information types defined by text format features, organizing the segmented words and entities in each text block into a structured text block according to the grammar-defined pattern rules.
In a preferred implementation of this embodiment, when the text is a resume, the blocks obtained by dividing the resume include a first text block for basic information, a second text block for education experience, a third text block for work experience, a fourth text block for training experience, a fifth text block for certificates, and a sixth text block for job intention; the format features include the information features corresponding to basic information, education experience, work experience, training experience, certificates and job intention respectively; and the grammar-defined pattern rules include judgment rules defined according to the lexical analysis, syntactic analysis and semantic analysis of compiler principles.
In a preferred implementation of this embodiment, the pattern rules are modifiable, and the rule-configured information extraction model can be configured separately for different application scenarios; generic information is provided in the pattern rules; and text preprocessing also includes in-line context rule analysis, in which the results of word segmentation and entity recognition are corrected with a predetermined rule correction method and ambiguous generics are identified again. Broadly, the pattern rules can be understood as adjusting the rules of the event information extraction model according to the pattern (or application scenario), or as applying different rules to different scenarios. A rule in this embodiment may be a conditional expression, or a table mapping inputs to outputs, and so on; the pattern may be determined from the application scenario of the text itself, or by automatically recognizing keywords in the text. This embodiment imposes no limitation here, and all of these implementations fall within its scope of protection.
In a preferred implementation of this embodiment, the pattern rules include simplification-and-merging rules, and complex long rules are placed first with simple short rules after them. Because rule matching proceeds from front to back, this avoids the situation in which an earlier rule matches first, the later rule is never traversed, and matching fails.
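The effect of rule order under first-match-wins evaluation can be shown with a short Python sketch. The representation of a rule as a name plus a list of required generics, and the helper names `order_rules` and `first_match`, are assumptions made for this illustration.

```python
def order_rules(rules):
    """Place rules with more required generics first (stable for ties)."""
    return sorted(rules, key=lambda rule: len(rule[1]), reverse=True)

def first_match(line_generics, rules):
    """Return the name of the first rule whose generics all occur in the line."""
    present = set(line_generics)
    for name, required in rules:
        if set(required) <= present:
            return name
    return None

# Hypothetical rules: a short one that is a subset of a long one.
rules = [
    ("has_date", ["date"]),
    ("work_entry", ["date", "org", "title"]),
]
line = ["date", "org", "title"]
unordered_result = first_match(line, rules)              # the short rule wins
ordered_result = first_match(line, order_rules(rules))   # the long rule wins
```

In the unordered list the short rule shadows the long one, so the more specific `work_entry` rule is never reached; sorting long rules to the front removes that shadowing.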
In a preferred implementation of this embodiment, organizing the segmented words and entities in the text block into a structured text block according to the corresponding pattern rules includes: applying the pattern rules in order to judge whether consecutive lines satisfy a specific pattern, and, once the text has been matched against the corresponding specific patterns, storing the pattern rule results matched by each line in a multidimensional array of string type. This is explained in more detail below with reference to Fig. 3.
Text block classification extraction is built on modifiable pattern rules, which are rules defined to judge information types from text format features; a pattern is the set of format features of a kind of information. The pattern rules can be derived from the defined grammar, and their interpretation can borrow the methods of lexical analysis, syntactic analysis and semantic analysis from compiler principles.
In the preferred implementation of this embodiment, the pattern rules contain generic information, so an external customizable thesaurus can be introduced, which makes the pattern rules more convenient to use.
In this embodiment, a pattern rule is formally defined as:
<MODE name>: <generic 1> [&, |, -, ^, ()] <generic 2> [&, |, -, ^, ()] ... [&, |, -, ^, ()] <generic n>
The part to the right of the ":" is an expression built according to the defined grammar from AND (&), OR (|), NOT (-), exclusive OR (^) and parentheses (()).
Block rules are used to judge in order whether consecutive lines satisfy a specific pattern.
For example: x_y_0_S_1_z; x_y_1_M_n_z; x_y_2_E_1_N -> SUCCESS. Here x_y_0_S_1_z, x_y_1_M_n_z and x_y_2_E_1_N are the category names within the block rule. A rule is composed of word generics and operators, where & is logical AND, | is logical OR and - is logical NOT, as follows:
x_y_0_S_1_z: "key1_&loc-key0"
This rule matches as follows: if the key1_ and loc generics appear in a line of text and the key0 generic does not, the line satisfies the x_y_0_S_1_z category pattern.
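A rule expression such as "key1_&loc-key0" can be evaluated against the set of generics found in a line. The Python sketch below is one possible reading: it assumes all binary operators have equal precedence and associate left to right, treats "-" after an operand as "and not" and "-" before an operand as unary NOT, and uses parentheses for grouping. The description does not state precedence, so these are assumptions, and the names `tokenize` and `evaluate` are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|^()\-])")
BINARY = {"&", "|", "^", "-"}

def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos}: {expr[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(expr, present):
    """Evaluate a rule expression; `present` is the set of generics in a line."""
    tokens = tokenize(expr)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "-":                      # unary NOT
            return not operand()
        if tok == "(":
            value = chain()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if not re.fullmatch(r"[A-Za-z0-9_]+", tok):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in present               # a generic is true if it occurs

    def chain():
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] in BINARY:
            op = tokens[pos]
            pos += 1
            rhs = operand()
            if op == "&":
                value = value and rhs
            elif op == "|":
                value = value or rhs
            elif op == "^":
                value = value != rhs
            else:                           # binary '-' read as "and not"
                value = value and not rhs
        return value

    result = chain()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

Under these assumptions, `evaluate("key1_&loc-key0", {"key1_", "loc"})` is true and becomes false once "key0" is added to the set, matching the reading of the example rule given above.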
The parts of a category name have the following meanings:
x is the matching priority number of each information category, starting from (0 ...) 1; the smaller the priority number, the higher the priority;
y is the index of the rule, starting from (0 ...) 0; it also indicates the matching priority within the same information category;
The 1st field after x_y_ is the serial number of the rule sub-condition, increasing from 0; to limit complexity, the serial number is restricted to < 5;
The 2nd field is the relative position of the rule sub-condition: 'S' is start, 'M' is middle, 'E' is end;
The 3rd field is the number of lines the rule sub-condition is limited to matching: a natural number from 1 to 9, or n for no limit;
z is the information category with which the line is to be marked, starting from (0 ...) 1; N is the default event information category.
If line k satisfies x_y_0_S_1_z, lines k+1 to k+1+n satisfy x_y_1_M_n_z, and line k+1+n+1 satisfies x_y_2_E_1_N, then lines k to k+1+n+1 are matched successfully, and lines k to k+1+n are marked with information category 1.
The values of x, y and z must lie within their predefined ranges; otherwise the program stops blocking.
For convenience in writing rule expressions, "SYS_TRUE" is defined to denote logical true.
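The six-field category name can be split into typed fields with a few lines of Python. The sketch below is illustrative: the class and function names are hypothetical, the concrete name "1_0_0_S_1_2" is an invented instance of the x_y_0_S_1_z form, and no range check is applied to x, y or z because the description leaves their predefined ranges unspecified.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CategoryName:
    priority: int               # x: matching priority of the information category
    rule_index: int             # y: index of the rule
    sub_index: int              # serial number of the sub-condition, must be < 5
    position: str               # 'S' (start), 'M' (middle) or 'E' (end)
    line_limit: Optional[int]   # 1..9, or None when the field is 'n' (no limit)
    info_category: str          # z, or 'N' for the default category

def parse_category_name(name: str) -> CategoryName:
    parts = name.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {name!r}")
    x, y, k, position, limit, z = parts
    if not (x.isdecimal() and y.isdecimal() and k.isdecimal()):
        raise ValueError(f"x, y and the serial number must be integers: {name!r}")
    if int(k) >= 5:
        raise ValueError(f"sub-condition serial number must be < 5: {name!r}")
    if position not in ("S", "M", "E"):
        raise ValueError(f"position must be S, M or E: {name!r}")
    if limit == "n":
        line_limit = None
    elif len(limit) == 1 and limit in "123456789":
        line_limit = int(limit)
    else:
        raise ValueError(f"line limit must be 1..9 or n: {name!r}")
    if not (z == "N" or z.isdecimal()):
        raise ValueError(f"category must be a number or N: {name!r}")
    return CategoryName(int(x), int(y), int(k), position, line_limit, z)
```

For instance, `parse_category_name("1_0_1_M_n_2")` yields a middle sub-condition with no line limit that marks lines with category "2".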
Simplification-and-merging rules can resolve overlap and redundancy among rules; in the pattern rules, complex long rules should be placed first and simple short rules after them. Specifically, as shown in Fig. 3, text block classification extraction proceeds as follows:
S121: start text block classification extraction.
S122: pattern classification. According to the preset pattern rules, judge which pattern rule the current line of text can satisfy, i.e. match the part of the pattern rule to the right of the ":".
S123: judge whether the text has ended or the line matches no rule, i.e. whether all lines of the current text have been processed or the current line matches no rule; if so, execute step S124; otherwise return to S122.
S124: analyze the lines of text against the patterns they matched, and store the part of the pattern rule to the left of the ":" (that is, the x_y_0_S_1_z classification result) in the three-dimensional array Cn[a][b][c]. Index a corresponds to the 1st field x of the category (the matching priority number of each information category), index b to the 2nd field y (the rule index), and index c to the 3rd field (the serial number of the rule sub-condition). Cn[a][b][c] holds the last three fields of the rule condition: Cn[a][b][c][0] stores the relative position of the rule sub-condition, Cn[a][b][c][1] stores the limit on the number of matched lines, and Cn[a][b][c][2] stores the information category with which the line is to be marked. This step completes rule matching for all lines of the text and stores the category results in Cn[a][b][c].
S125: judge whether Cn[a][b][c] marks the end of the text: Cn[a][b][c] is read starting from n = 0, with n incremented until the end of the text; if Cn[a][b][c] is NULL, the text has ended and the flow jumps to S129.
S126: judge the value of Cn[a][b][c][0]: if it is "E", rule matching for this category has ended, and the flow jumps to S128; if it is "S" or "M", rule matching for this category is at its start or in the middle, and the flow jumps to S127.
S127: judge whether the value of b has reached its bound: if it has, matching for this category has ended and the line is not classified, and line n+1 is processed directly. If b has not reached its bound, b+1 and c+1 are taken to continue with the subsequent rule of the category, and the flow jumps to S126 to continue judging.
S128: assign the category result to the lines: when multiple lines from "S" to "E" are matched successfully, or a single "E" line is matched successfully, these lines are merged and the block category is marked as Cn[a][b][c][2]. Then n+1 is taken to process the next line.
S129: matching ends.
S130, perform event information extraction processing on the structured text blocks of each category in the text; the event information extraction processing includes labelling keywords using the corresponding pattern rules defined by the grammar, and outputting the keywords into the result template corresponding to the event information extraction.
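The bookkeeping of S124 to S128 can be illustrated with a short Python sketch. It assumes that the left-hand part of a rule always has the six underscore-separated fields described in S124, that a line tagged S opens a block, M continues it, and E closes it, and that unmatched lines inside an open block are kept in that block. The names `RuleKey`, `parse_rule_key`, and `merge_blocks` are hypothetical, and the logic is a simplification rather than the procedure of this embodiment.

```python
from typing import List, NamedTuple, Optional, Tuple


class RuleKey(NamedTuple):
    priority: int    # x: priority number of the information category
    rule_index: int  # y: rule index
    sub_index: int   # serial number of the rule sub-condition
    position: str    # relative position: "S", "M" or "E"
    line_limit: str  # limit on the number of matched lines, e.g. "1" or "n"
    category: str    # information category used to label the line


def parse_rule_key(key: str) -> RuleKey:
    """Split a left-hand part such as '1_00_0_S_1_1' into its six fields."""
    x, y, sub, position, limit, category = key.split("_")
    if position not in {"S", "M", "E"}:
        raise ValueError(f"unexpected position marker: {position!r}")
    return RuleKey(int(x), int(y), int(sub), position, limit, category)


def merge_blocks(
    tagged: List[Tuple[str, Optional[str]]]
) -> List[Tuple[str, List[str]]]:
    """Merge lines from an S line through the next E line into one labelled block.

    `tagged` pairs each text line with the rule key it matched, or None.
    A lone E line forms a single-line block.
    """
    blocks: List[Tuple[str, List[str]]] = []
    pending: List[str] = []
    for text, key in tagged:
        if key is None:
            if pending:  # assumption: keep unmatched lines of an open block
                pending.append(text)
            continue
        parsed = parse_rule_key(key)
        if parsed.position == "S":
            pending = [text]
        elif parsed.position == "M":
            pending.append(text)
        else:  # "E": close the block and label it with its category
            blocks.append((parsed.category, pending + [text]))
            pending = []
    return blocks
```

Under these assumptions, a resume whose first three lines are tagged S, unmatched, and E yields one block of three lines labelled with the category of the E rule.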
Therefore, the text event information extraction method provided in this embodiment uses a neural network model to perform word segmentation and entity recognition on the text, so that the basic elements of the text can be obtained quickly. Combined with pattern rules, it extracts text block classification information from the text: the words and entities in each text block are arranged into structured text blocks according to the corresponding pattern rules defined by the grammar. The text information is thereby structured as grammar expressions in the form required by a computer language, which is more conducive to information extraction. Event extraction is then performed on the basis of this paragraph-level structuring, which effectively addresses the problems that data in the extraction domain is scattered and sparse and lies far from the centre of the event statement, so that this kind of text event information can be extracted accurately. In addition, entity recognition uses a deep neural network model, which improves the recognition result.
In a preferred implementation of this embodiment, the rule configuration information extraction model is written in the rule description language NPRDL, which uses BNF form; the description language uses complex feature sets to describe the grammatical and semantic information of words, and in dynamic analysis it uses dynamic attribute lists described with complex feature sets.
Therefore, in a preferred implementation of this embodiment, the pattern rules are modifiable. For example, they are written in the rule description language NPRDL, which uses BNF form, so the information extraction model can be configured flexibly and quickly for different application scenarios. In addition, complex long rules are placed first and simple short rules after them. Because rule matching proceeds from front to back, this ordering avoids the technical problem in which an earlier rule matches first and a later rule is never traversed, causing a matching failure.
Specifically, the keywords of each event category are extracted from a text block by using the format features of each kind of information, mainly its organizational features, and identifying them with rules. For example, one usable pattern for identifying a "school" is "place" + "school" (or "university", "institute", etc.), as in "Peking University"; one usable pattern for identifying an "organization" is "works at" (or "holds a post at", etc.) + other information + "company" (or "group", "bureau", etc.), as in "works at Beijing TRS company".
More specifically, performing event information extraction processing on the text blocks in the text includes:
One), the rule description language NPRDL
The rules in the system are written in the rule description language NPRDL, which uses BNF form. The basic unit of the NPRDL language is <rule>; a <simple rule> is effectively equivalent to a simple "conditional clause" in natural language, and its basic form is:
<test> => <operation>
Meaning: if <test> succeeds, <operation> is executed.
<test> and <operation> are divided by function; because both refer to the same object (the input sentence under analysis), they can be unified into the form of the following structure chart:
In actual analysis, a <structural formula> can be used to express a segment of the sentence being analysed, where each <structure item> corresponds to a 'word' in the sentence. The sequence <item label><item operation><item element> can be used to express a test or an operation on the attributes of that word.
The description object of the NPRDL language is the Chinese sentence, with the word as the basic unit, together with its intermediate structures (such as syntax trees and semantic concept spaces). This includes:
1, the information of each word is expressed as: concept (attribute 1, attribute 2, ..., attribute n), where the concept is usually expressed by the word itself.
2, during analysis, the attributes of phrases and clauses centred on a word (such as phrase type, sentence type, and mood) can be attributed to that head word.
3, the relationship between two concepts can be expressed as:
        relation
concept ---------> concept
A sentence (or part of one) can be represented as a <word> sequence of the form: <word>+<word>+...+<word>. In rule descriptions this expression is called a <structural formula>, and each <word> in it is called a <structure item>. The <structural formula> corresponding to a <word> sequence is: <structure item>+<structure item>+...+<structure item>.
(1) each <structure item> corresponds to one <word>, and its content includes:
<item label>: indicates the position of the word in the sentence.
<item operation>: indicates the operation on the relevant attributes of the word.
<item element>: expresses certain attributes of the word.
(2) the symbol '+' is the structure connector; it indicates that the <structure items> before and after it are adjacent and in sequence.
(3) two <structure items> in a parent-child relationship use ↑ (parent) or ↓ (child) to indicate that they lie at different levels of the hierarchy.
Example: ^(VV, 2033) + ^↓#(NN, 111, SUBJECT) => (^↓#.GRELA := AGT)
Meaning: if the current word is (verb, thinking) and a child of the current word is (noun, person, subject of the parent verb), then the case relation of that child is changed to agent.
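As a rough illustration of how such a test and operation could be evaluated, the following Python sketch models a word as a concept with attributes and child words. The `Word` class, its field names, and the function `apply_agent_rule` are hypothetical; only the codes `VV`, `2033`, `NN`, `111`, `SUBJECT`, `GRELA`, and `AGT` come from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    concept: str                 # the word itself
    pos: str                     # part of speech, e.g. "VV" or "NN"
    sem: int                     # semantic class code, e.g. 2033 or 111
    role: str = ""               # role relative to the parent word
    grela: str = ""              # case relation (the GRELA attribute)
    children: List["Word"] = field(default_factory=list)


def apply_agent_rule(current: Word) -> int:
    """Test: current is (VV, 2033) and a child is (NN, 111, SUBJECT).
    Operation: set GRELA of each such child to AGT.
    Returns the number of children that were updated."""
    if (current.pos, current.sem) != ("VV", 2033):
        return 0
    changed = 0
    for child in current.children:
        if (child.pos, child.sem, child.role) == ("NN", 111, "SUBJECT"):
            child.grela = "AGT"
            changed += 1
    return changed


# A verb of thinking with a person noun as its subject.
subject = Word("teacher", "NN", 111, role="SUBJECT")
verb = Word("think", "VV", 2033, children=[subject])
```

Calling `apply_agent_rule(verb)` on this small tree marks `subject` as the agent; a word that fails the test is left unchanged.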
The rule description language has three features:
1, strong descriptive power: the description language uses complex feature sets to describe the grammatical and semantic information of words, and in dynamic analysis it uses dynamic attribute lists described with complex feature sets, so the units of Chinese text analysis can be described at multiple levels and from multiple aspects.
2, convenient computer processing: the description language provides rich atomic operations on trees and networks, which makes it convenient for a computer to process the syntax trees and semantic structures formed during analysis; it also provides rich atomic operations for querying, modifying, and deleting static attribute list information.
3, easy rule writing: the rule language is not only expressive but also very fine-grained. Language phenomena that need only a coarse description can be described uniquely by word, part of speech, classification code, and so on. Because the rules are designed around a descriptive approach, they are intuitive and convenient for users to write.
Two), context analysis rules
The rules for keyword labelling can be expressed in the following Backus-Naur form:
rule ::= ^ test => action
test ::= test-expr; { test-expr; }
test-expr ::= n, test-item (& | '|' | +) test-item | n, ~{ & test-item }
test-item ::= attribute (= | !=) 'attribute-value'
attribute ::= lex | class
attribute-value ::= word | category-feature
action ::= action-type; { action-type }
action-type ::= n, n, category-feature, part-of-speech
Where:
(1) ^ marks the start of a rule
(2) ~ denotes a node pointer structure
(3) n denotes a natural number; a specific number gives the number of nodes and ranges from 1 to 9
(4) lex and class indicate whether the attribute value that follows is a specific entry or a specific category: for lex, the attribute value that follows is a specific 'entry'; for class, it is a specific 'category'.
(5) category-feature denotes a category in the dictionary, for example: org.
Three), a specific application example
The pattern rule for analysing the date of birth (birthday) is as follows:
^1, lex='born in'; 1, class='time'; => 2, 2, birthday, n
Meaning: if the entry of the 1st node is "born in" and the category of the 2nd node is "time", then the category of the 2nd node is marked as "birthday".
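The birthday rule can be mimicked with a small Python sketch. The rule is written directly as Python data instead of being parsed from NPRDL text, and the action fields "2, 2, birthday, n" are read here as "relabel nodes 2 through 2 with category birthday and part of speech n"; that reading is inferred from the grammar above and is an assumption. The names `Node`, `TESTS`, `ACTION`, and `apply_rule` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    lex: str       # surface entry
    cls: str       # category from segmentation and entity recognition
    pos: str = ""  # part of speech


# ^1, lex='born in'; 1, class='time'; => 2, 2, birthday, n
TESTS: List[Tuple[str, str]] = [("lex", "born in"), ("class", "time")]
ACTION = (2, 2, "birthday", "n")  # assumed: first node, last node, category, POS


def apply_rule(nodes: List[Node]) -> bool:
    """Relabel the first window of nodes that passes every test, if any."""
    width = len(TESTS)
    for start in range(len(nodes) - width + 1):
        window = nodes[start:start + width]
        passed = all(
            (node.lex if attr == "lex" else node.cls) == value
            for node, (attr, value) in zip(window, TESTS)
        )
        if passed:
            first, last, category, pos = ACTION
            for node in window[first - 1:last]:  # 1-based, inclusive range
                node.cls, node.pos = category, pos
            return True
    return False
```

For a node sequence such as ("Zhang San", name), ("born in", v), ("1990-05-01", time), the sketch relabels only the third node as birthday.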
Four), merging and organizing pattern rules
Rules may be extended gradually and may overlap, which produces redundancy; simplifying and merging rules solves this problem. For example:
^1, class='time'; n, ~&lex!='$'; 1, class='from'; => 1, 1, 106, n
and
^1, class='time'; n, ~&lex!='$'; 1, class='to'; => 1, 1, 106, n
These two rules can be merged into one. Because the grammar of the pattern rules satisfies the associative and distributive laws, the merged rule is:
^1, class='time'; n, ~&lex!='$'; 1, class='from' | class='to'; => 1, 1, 106, n
This makes the logic more compact.
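The merge step can be sketched in Python as follows. A rule is represented here as a tuple of (node count, condition) tests plus an action string, which is a hypothetical encoding chosen for illustration; `merge_rules` only handles the case shown above, where two rules share an action and differ in exactly one condition.

```python
from typing import Optional, Tuple

Test = Tuple[str, str]               # (node count, condition)
Rule = Tuple[Tuple[Test, ...], str]  # (tests, action)


def merge_rules(a: Rule, b: Rule) -> Optional[Rule]:
    """Merge two rules that share an action and differ in exactly one condition.

    Returns None when the rules cannot be merged this way."""
    tests_a, action_a = a
    tests_b, action_b = b
    if action_a != action_b or len(tests_a) != len(tests_b):
        return None
    differing = [i for i, (x, y) in enumerate(zip(tests_a, tests_b)) if x != y]
    if len(differing) != 1:
        return None
    i = differing[0]
    (count_a, cond_a), (count_b, cond_b) = tests_a[i], tests_b[i]
    if count_a != count_b:
        return None
    merged = list(tests_a)
    merged[i] = (count_a, f"{cond_a}|{cond_b}")  # factor out the shared parts
    return tuple(merged), action_a


rule_from: Rule = (
    (("1", "class='time'"), ("n", "~&lex!='$'"), ("1", "class='from'")),
    "1,1,106,n",
)
rule_to: Rule = (
    (("1", "class='time'"), ("n", "~&lex!='$'"), ("1", "class='to'")),
    "1,1,106,n",
)
merged_rule = merge_rules(rule_from, rule_to)
```

Here `merged_rule` keeps the two shared tests and replaces the third with the alternative `class='from'|class='to'`.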
Because rule matching proceeds from front to back, an earlier rule may match first, so that a later rule is never traversed and fails to match, which can sometimes cause errors. Pattern rules should therefore be ordered with complex long rules first and simple short rules after them.
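The effect of rule order can be seen in a minimal first-match sketch in Python. A rule is reduced here to a label and a sequence of node categories that must appear at the start of a line; this encoding, the labels, and the functions `first_match` and `order_rules` are hypothetical simplifications.

```python
from typing import List, Optional, Sequence, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (label, node categories to match in order)


def first_match(rules: Sequence[Rule], classes: Sequence[str]) -> Optional[str]:
    """Return the label of the first rule whose pattern is a prefix of `classes`."""
    for label, pattern in rules:
        if tuple(classes[:len(pattern)]) == pattern:
            return label
    return None


def order_rules(rules: Sequence[Rule]) -> List[Rule]:
    """Place longer patterns before shorter ones; equal lengths keep their order."""
    return sorted(rules, key=lambda rule: len(rule[1]), reverse=True)


rules: List[Rule] = [
    ("time", ("time",)),
    ("time_range", ("time", "to", "time")),
]
line = ["time", "to", "time"]
```

With the short rule listed first, `first_match(rules, line)` stops at the short rule and the longer one is never tried; after `order_rules(rules)` the longer rule is tried first and matches.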
As shown in FIG. 4, in the text event information extraction method of this embodiment, event information extraction includes:
S131, confirm the text and the extraction type, for example using the rules mentioned above that determine the information type defined from text format features.
S132, open the rule file, i.e. open the configured information extraction model file.
S133, obtain one line of character string from the text.
S134, text preprocessing, including word segmentation and entity recognition on the text.
S135, load into the text container: save this line, after word segmentation and entity recognition, in the preprocessed format.
S136, extract blocks using the pattern rules, i.e. perform text block division and text block classification extraction in the manner described in S120 above.
S137, label each class of keywords by rule-interpreter analysis according to the block classification.
S138, output the keywords into the result template.
S139, determine whether the text has ended; if so, standardize or output the text (S140); otherwise, return to S133.
As shown in FIG. 5, this embodiment also provides a text event information extraction device 100, which includes:
A text preprocessing module 110, configured to preprocess the text; the preprocessing includes segmenting the text into words, converting the words into word vectors, inputting the word vectors into a neural network model, and outputting entities through the neural network model.
A text block classification extraction processing module 120, configured to divide the text into blocks to obtain text blocks and to perform text block classification extraction processing, which includes: based on the information types defined from text format features and the corresponding pattern rules defined by the grammar, arranging the words and entities in each text block into structured text blocks according to those pattern rules.
An event information extraction processing module 130, configured to perform event information extraction processing on the structured text blocks in the text; the event information extraction processing includes labelling keywords using the corresponding pattern rules defined by the grammar, and outputting the keywords into the result template corresponding to the event information extraction.
It should be noted that the text processing procedure in the text event information extraction device 100 provided in this embodiment is the same as in the above text event information extraction method combining statistics with rules, and achieves the same technical effects; details are not repeated here.
To help those skilled in the art understand the technical solution of this embodiment, text information extraction is described in detail below, taking event information extraction from resume text as an example. Suppose that six categories of information need to be extracted (basic information, education experience, work experience, training experience, certificates, and job-seeking intention), comprising dozens of attributes in total. The specific text event information extraction is as follows:
One), text preprocessing
1. Define the content of event information extraction
INFOTYPE1TResume_TRS # basic information
1_00 IgnoreFlag #
1_01 TrueName # applicant's real name
1_02 Email # e-mail address
1_03 Mobel # mobile phone number
1_04 Phone # telephone number
1_05 Sex_s # gender
…
INFOTYPE2TEducation_TRS # education experience
2_00 E_StartTime_s # education start date
2_01 E_StartTime_d #
2_02 E_EndTime_s # education end date
2_03 E_EndTime_d #
2_04 E_SchoolName # school name
…
INFOTYPE3TWorkExperierence_TRS # work experience
3_00 InductionDate_s # employment start date
3_01 InductionDate_d #
3_02 DimissionDate_s # employment end date
3_03 DimissionDate_d #
3_04 CompanyName # company name
…
2. Define the thesaurus
To make rules easier to write, some keywords can be extracted by manual observation. Such a keyword may be segmented as a single word or split into several words. The keywords are added to class.GB, with t and the category name appended after each entry, for example:
Name kr1#kr1_01
Email kr1#kr1_02
Phone number kr1#kr1_03
Phone kr1#kr1_04
Gender kr1#kr1_05
Marital status kr1#kr1_07
…
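Entries of this shape can be loaded with a few lines of Python. The sketch assumes that an entry is separated from its category names by whitespace (the exact separator in class.GB is not specified here) and that "#" separates a coarse category from a finer one; `SAMPLE` and `load_thesaurus` are hypothetical names.

```python
from typing import Dict, List

SAMPLE = """\
Name kr1#kr1_01
Email kr1#kr1_02
Phone number kr1#kr1_03
"""


def load_thesaurus(text: str) -> Dict[str, List[str]]:
    """Map each entry to its category names, splitting on the last whitespace run."""
    table: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        entry, categories = parts
        table[entry] = categories.split("#")
    return table
```

Splitting on the last whitespace run keeps multi-word entries such as "Phone number" intact.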
3. Text word segmentation and entity recognition
Word segmentation and entity recognition are performed on the text, and each segmentation result is automatically assigned a category. The categories built into the system include: name (person name), loc (place name), org (organization name), id (ID card number), digit (numeric value), etc.
Two), text block classification
According to the defined information extraction content, the resume text needs to be divided into six blocks. Taking the basic information block as an example, the pattern rules are written as follows:
1_00_0_S_1_1:kr1&kru-(bdfhm|bdfhk)
1_00_1_E_n_1:(SYS_TRUE-kru)|(kru&(bdfhm|bdfhk)-kr9)
1_01_0_S_1_1:kr1&kru&bdfhm
1_01_1_E_n_1:(SYS_TRUE-kru)|(kru-bdfhm-kr9)
1_02_0_S_1_1:kr1&kru&bdfhk
1_02_1_E_n_1:(SYS_TRUE-kru)|(kru-bdfhk-kr9)
1_03_0_S_1_1:kr1&(bdfhm|kru)
1_03_1_E_n_1:SYS_TRUE-(bdfhm|kru)
1_04_0_S_1_1:english-(kr0|kr1|kr2|kr3|kr4|kr5|kr6|kr7|kr8|kr9)
1_04_1_E_n_1:kr1-kru
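The right-hand parts of these rules combine category names with the operators &, |, and -, whose exact semantics are not spelled out in this description. The Python sketch below assumes that & means "and", | means "or", - means "and not", that all three have equal precedence and associate left to right, that SYS_TRUE is always true, and that a category name is true when that category occurs in the current line. The function `evaluate` is hypothetical and only illustrates one plausible reading.

```python
import re
from typing import List, Set

TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|[()&|\-]")
OPERATORS = {"&", "|", "-"}


def evaluate(expr: str, present: Set[str]) -> bool:
    """Evaluate a right-hand part against the categories present in a line."""
    tokens: List[str] = TOKEN.findall(expr)
    pos = 0

    def parse_atom() -> bool:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            pos += 1  # skip the closing ")"
            return value
        return tok == "SYS_TRUE" or tok in present

    def parse_expr() -> bool:
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            pos += 1
            rhs = parse_atom()
            if op == "&":
                value = value and rhs
            elif op == "|":
                value = value or rhs
            else:  # "-": assumed to mean "and not"
                value = value and not rhs
        return value

    return parse_expr()
```

Under these assumptions, the start rule `kr1&kru-(bdfhm|bdfhk)` holds for a line that contains the categories kr1 and kru but neither bdfhm nor bdfhk.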
Three), information extraction
Taking basic information as an example, the information extraction rules are written as follows:
1) Direct recognition by category
^1, class='name'; => 1, 1, 1_01, n
This means that a node whose category is name is identified as the resume name.
2) Recognition by context keywords, for example:
^1, class='kr1_01';
1, class='bdfhm';
n, ~&class!='bdfh'&class!='kr';
1, class='row' | class='kr';
=> 3, 3, 1_01, n
The above expression means: starting with "kr1_01 + colon" and continuing until a line break or another keyword is encountered, the nodes in between are identified as the resume name.
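The context-keyword rule can be approximated in Python as follows. The sketch assumes that bdfhm is the category of the colon, that class names such as 'kr' and 'bdfh' match by prefix, and that the middle run of nodes ends at the first line break or keyword; none of these details is stated explicitly above. `Node` and `extract_name` are hypothetical names.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, str]  # (text, category)


def extract_name(nodes: List[Node]) -> Optional[List[str]]:
    """Return the node texts between 'kr1_01 + colon' and the next line break
    or keyword, or None when the rule does not match."""
    for i in range(len(nodes) - 1):
        if nodes[i][1] != "kr1_01" or nodes[i + 1][1] != "bdfhm":
            continue
        j = i + 2
        # n, ~&class!='bdfh'&class!='kr': consume nodes that are neither
        # punctuation (bdfh*) nor keywords (kr*); stop at a line break too.
        while (
            j < len(nodes)
            and nodes[j][1] != "row"
            and not nodes[j][1].startswith(("bdfh", "kr"))
        ):
            j += 1
        has_middle = j > i + 2
        # 1, class='row' | class='kr': a terminator must follow the middle run.
        terminated = j < len(nodes) and (
            nodes[j][1] == "row" or nodes[j][1].startswith("kr")
        )
        if has_middle and terminated:
            return [text for text, _ in nodes[i + 2:j]]
    return None
```

For the node sequence Name, colon, "Zhang", "San", line break, the sketch returns the two name nodes.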
As shown in FIG. 6, this embodiment also provides an electronic device, including:
a memory 210;
a processor 220; and
a computer program;
wherein the computer program is stored in the memory 210 and is configured to be executed by the processor 220 to implement any one of the text event information extraction methods provided above.
In addition, this embodiment also provides a non-volatile storage medium on which a computer program is stored; when executed, the computer program implements the steps of any one of the text event information extraction methods combining statistics with rules provided above.
Those of ordinary skill in the art will understand that the above method according to the embodiments of the present invention can be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC, FPGA, or SoC). It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (for example, RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing method described herein is implemented. In addition, when a general-purpose computer accesses code for implementing the processing shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for executing that processing.
Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.
Finally, it should be noted that the above description covers only preferred embodiments of the present invention and does not limit the present invention in any form. Anyone skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and simple substitutions to the technical solution of the present invention, all of which fall within the scope of protection of the technical solution of the present invention.
Claims (10)
1. A text event information extraction method, characterized in that the method comprises:
preprocessing the text, the preprocessing comprising segmenting the text into words, converting the words into word vectors, inputting the word vectors into a neural network model, and outputting entities through the neural network model;
dividing the text into blocks to obtain text blocks, and performing text block classification extraction processing, the text block classification extraction processing comprising: based on information types defined from text format features and corresponding pattern rules defined by a grammar, arranging the words and entities in the text blocks into structured text blocks according to the corresponding pattern rules defined by the grammar;
performing event information extraction processing on the structured text blocks in the text, the event information extraction processing comprising labelling keywords using the corresponding pattern rules defined by the grammar, and outputting the keywords into the result template corresponding to the event information extraction.
2. The method according to claim 1, characterized in that, when the text is a resume, the resume after block division comprises a first text block corresponding to basic information, a second text block corresponding to education experience, a third text block corresponding to work experience, a fourth text block corresponding to training experience, a fifth text block corresponding to certificates, and a sixth text block corresponding to job-seeking intention; the format features respectively comprise the information features corresponding to basic information, education experience, work experience, training experience, certificates, and job-seeking intention; the pattern rules defined by the grammar comprise judgment rules defined according to lexical analysis, syntactic analysis, and semantic analysis in compiler principles.
3. The method according to claim 1, characterized in that segmenting the text into words comprises: organizing the core dictionary with a double-array trie; handling overlapping segmentation ambiguity with a method combining rules and statistics; and recognizing unknown words with a method based on conditional random fields.
4. The method according to claim 1, characterized in that the deep neural network model comprises an Embedding layer, a bidirectional RNN layer, and a CRF layer; the Embedding layer converts the words into word vectors, which are fed in sequence into the bidirectional RNN layer to obtain the probability distribution of the word labels; the probability distribution of the word labels is fed into the CRF layer to obtain the entity label sequence corresponding to the entities.
5. The method according to claim 1, characterized in that the pattern rules are modifiable, and the rule configuration information extraction model can be configured separately for different application scenarios; category information is provided in the pattern rules; the text preprocessing further comprises context rule analysis within text lines, which comprises, for the results of word segmentation and entity recognition on the text, correcting the segmentation results using a predetermined rule adjustment method and re-identifying ambiguous categories.
6. The method according to claim 1, characterized in that the pattern rules comprise simplified and merged rules, with complex long rules placed first and simple short rules after them.
7. The method according to claim 1, characterized in that arranging the words and entities in the text blocks into structured text blocks according to the corresponding pattern rules comprises: using the sequence of pattern rules to determine whether consecutive lines satisfy a specific pattern, and, after the text has been checked against the corresponding specific patterns, storing the pattern rule result matched by each line in a multidimensional array of string type.
8. The method according to any one of claims 1-7, characterized in that the rule configuration information extraction model is written in the rule description language NPRDL, which uses BNF form; the description language uses complex feature sets to describe the grammatical and semantic information of words, and in dynamic analysis it uses dynamic attribute lists described with complex feature sets.
9. An electronic device, characterized by comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to any one of claims 1-8.
10. A non-volatile storage medium on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the method according to any one of claims 1-8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910548427.8A CN110321432B (en) | 2019-06-24 | 2019-06-24 | Text event information extraction method, electronic device and nonvolatile storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110321432A true CN110321432A (en) | 2019-10-11 |
| CN110321432B CN110321432B (en) | 2021-11-23 |
Family
ID=68120149
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910548427.8A Active CN110321432B (en) | 2019-06-24 | 2019-06-24 | Text event information extraction method, electronic device and nonvolatile storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110321432B (en) |
Cited By (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110738054A (en) * | 2019-10-14 | 2020-01-31 | 携程计算机技术(上海)有限公司 | Method, system, electronic device and storage medium for identifying hotel information in mail |
| CN110866393A (en) * | 2019-11-19 | 2020-03-06 | 北京网聘咨询有限公司 | Resume information extraction method and system based on domain knowledge base |
| CN111191459A (en) * | 2019-12-25 | 2020-05-22 | 医渡云(北京)技术有限公司 | Text processing method and device, readable medium and electronic equipment |
| CN111477320A (en) * | 2020-03-11 | 2020-07-31 | 北京大学第三医院(北京大学第三临床医学院) | Construction system of treatment effect prediction model, treatment effect prediction system and terminal |
| CN111581954A (en) * | 2020-05-15 | 2020-08-25 | 中国人民解放军国防科技大学 | A method and device for text event extraction based on grammatical dependency information |
| CN111859968A (en) * | 2020-06-15 | 2020-10-30 | 深圳航天科创实业有限公司 | A text structuring method, text structuring device and terminal device |
| CN111930869A (en) * | 2020-08-11 | 2020-11-13 | 上海寻梦信息技术有限公司 | Address deviation rectifying method and device, electronic equipment and storage medium |
| CN112445784A (en) * | 2020-12-16 | 2021-03-05 | 上海芯翌智能科技有限公司 | Text structuring method, equipment and system |
| CN112464927A (en) * | 2020-11-25 | 2021-03-09 | 苏宁金融科技(南京)有限公司 | Information extraction method, device and system |
| CN112487138A (en) * | 2020-11-19 | 2021-03-12 | 华为技术有限公司 | Information extraction method and device for formatted text |
| CN112597308A (en) * | 2020-12-24 | 2021-04-02 | 北京金堤科技有限公司 | Text data processing method and device, electronic equipment and storage medium |
| CN112651236A (en) * | 2020-12-28 | 2021-04-13 | 中电金信软件有限公司 | Method and device for extracting text information, computer equipment and storage medium |
| CN112764762A (en) * | 2021-02-09 | 2021-05-07 | 清华大学 | Method and system for automatically converting standard text into computable logic rule |
| CN112948471A (en) * | 2019-11-26 | 2021-06-11 | 广州知汇云科技有限公司 | Clinical medical text post-structured processing platform and method |
| CN113010628A (en) * | 2019-12-20 | 2021-06-22 | 北京宸瑞科技股份有限公司 | Information mining system and method combining mail content and text feature extraction |
| CN113051926A (en) * | 2021-03-01 | 2021-06-29 | 北京百度网讯科技有限公司 | Text extraction method, equipment and storage medium |
| CN113111170A (en) * | 2020-02-13 | 2021-07-13 | 北京明亿科技有限公司 | Method and device for extracting alarm receiving and processing text track ground information based on deep learning model |
| CN113435212A (en) * | 2021-08-26 | 2021-09-24 | 山东大学 | Text inference method and device based on rule embedding |
| CN113761906A (en) * | 2020-07-16 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method, device, equipment and computer readable medium for analyzing document |
| CN114330354A (en) * | 2022-03-02 | 2022-04-12 | 杭州海康威视数字技术股份有限公司 | Event extraction method and device based on vocabulary enhancement and storage medium |
| CN114490929A (en) * | 2021-12-31 | 2022-05-13 | 广州探迹科技有限公司 | Bidding information acquisition method and device, storage medium and terminal equipment |
| CN114510551A (en) * | 2020-11-17 | 2022-05-17 | 广州市有车以后信息科技有限公司 | Method for extracting viewpoint labels of automobile network public praise |
| CN119514544A (en) * | 2024-11-05 | 2025-02-25 | 四川大学华西医院 | Target object information structured processing method, device, equipment and medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106776711A (en) * | 2016-11-14 | 2017-05-31 | 浙江大学 | A kind of Chinese medical knowledge mapping construction method based on deep learning |
| CN106815293A (en) * | 2016-12-08 | 2017-06-09 | 中国电子科技集团公司第三十二研究所 | A system and method for building knowledge graphs for intelligence analysis |
| CN106959944A (en) * | 2017-02-14 | 2017-07-18 | 中国电子科技集团公司第二十八研究所 | A kind of Event Distillation method and system based on Chinese syntax rule |
| US20170286525A1 (en) * | 2016-03-31 | 2017-10-05 | Splunk Inc. | Field Extraction Rules from Clustered Data Samples |
| CN109408806A (en) * | 2018-09-11 | 2019-03-01 | 中国电子科技集团公司第二十八研究所 | A kind of Event Distillation method based on English grammar rule |
Cited By (38)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110738054B (en) * | 2019-10-14 | 2023-07-07 | 携程计算机技术(上海)有限公司 | Method, system, electronic device and storage medium for identifying hotel information in mail |
| CN110738054A (en) * | 2019-10-14 | 2020-01-31 | 携程计算机技术(上海)有限公司 | Method, system, electronic device and storage medium for identifying hotel information in mail |
| CN110866393A (en) * | 2019-11-19 | 2020-03-06 | 北京网聘咨询有限公司 | Resume information extraction method and system based on domain knowledge base |
| CN110866393B (en) * | 2019-11-19 | 2023-06-23 | 北京网聘咨询有限公司 | Resume information extraction method and system based on domain knowledge base |
| CN112948471A (en) * | 2019-11-26 | 2021-06-11 | 广州知汇云科技有限公司 | Clinical medical text post-structured processing platform and method |
| CN113010628B (en) * | 2019-12-20 | 2024-08-09 | 北京宸瑞科技股份有限公司 | Information mining system and method combining mail content and text feature extraction |
| CN113010628A (en) * | 2019-12-20 | 2021-06-22 | 北京宸瑞科技股份有限公司 | Information mining system and method combining mail content and text feature extraction |
| CN111191459A (en) * | 2019-12-25 | 2020-05-22 | 医渡云(北京)技术有限公司 | Text processing method and device, readable medium and electronic equipment |
| CN111191459B (en) * | 2019-12-25 | 2023-12-12 | 医渡云(北京)技术有限公司 | Text processing method and device, readable medium and electronic equipment |
| CN113111170A (en) * | 2020-02-13 | 2021-07-13 | 北京明亿科技有限公司 | Method and device for extracting alarm receiving and processing text track ground information based on deep learning model |
| CN111477320A (en) * | 2020-03-11 | 2020-07-31 | 北京大学第三医院(北京大学第三临床医学院) | Construction system of treatment effect prediction model, treatment effect prediction system and terminal |
| CN111477320B (en) * | 2020-03-11 | 2023-05-30 | 北京大学第三医院(北京大学第三临床医学院) | Construction system of treatment effect prediction model, treatment effect prediction system and terminal |
| CN111581954A (en) * | 2020-05-15 | 2020-08-25 | 中国人民解放军国防科技大学 | A method and device for text event extraction based on grammatical dependency information |
| CN111859968A (en) * | 2020-06-15 | 2020-10-30 | 深圳航天科创实业有限公司 | A text structuring method, text structuring device and terminal device |
| CN113761906B (en) * | 2020-07-16 | 2024-06-18 | 北京沃东天骏信息技术有限公司 | Method, device, apparatus and computer-readable medium for parsing documents |
| CN113761906A (en) * | 2020-07-16 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method, device, equipment and computer readable medium for analyzing document |
| CN111930869B (en) * | 2020-08-11 | 2024-02-06 | 上海寻梦信息技术有限公司 | Address correction method, address correction device, electronic equipment and storage medium |
| CN111930869A (en) * | 2020-08-11 | 2020-11-13 | 上海寻梦信息技术有限公司 | Address deviation rectifying method and device, electronic equipment and storage medium |
| CN114510551A (en) * | 2020-11-17 | 2022-05-17 | 广州市有车以后信息科技有限公司 | Method for extracting viewpoint labels of automobile network public praise |
| CN112487138A (en) * | 2020-11-19 | 2021-03-12 | 华为技术有限公司 | Information extraction method and device for formatted text |
| WO2022105237A1 (en) * | 2020-11-19 | 2022-05-27 | 华为技术有限公司 | Information extraction method and apparatus for text with layout |
| CN112464927A (en) * | 2020-11-25 | 2021-03-09 | 苏宁金融科技(南京)有限公司 | Information extraction method, device and system |
| CN112464927B (en) * | 2020-11-25 | 2023-10-31 | 苏宁金融科技(南京)有限公司 | Information extraction method, device and system |
| CN112445784B (en) * | 2020-12-16 | 2023-02-21 | 上海芯翌智能科技有限公司 | Text structuring method, equipment and system |
| CN112445784A (en) * | 2020-12-16 | 2021-03-05 | 上海芯翌智能科技有限公司 | Text structuring method, equipment and system |
| CN112597308A (en) * | 2020-12-24 | 2021-04-02 | 北京金堤科技有限公司 | Text data processing method and device, electronic equipment and storage medium |
| CN112597308B (en) * | 2020-12-24 | 2025-03-11 | 北京金堤科技有限公司 | Text data processing method, device, electronic device and storage medium |
| CN112651236A (en) * | 2020-12-28 | 2021-04-13 | 中电金信软件有限公司 | Method and device for extracting text information, computer equipment and storage medium |
| CN112764762B (en) * | 2021-02-09 | 2021-09-17 | 清华大学 | Method and system for automatically converting standard text into computable logic rule |
| CN112764762A (en) * | 2021-02-09 | 2021-05-07 | 清华大学 | Method and system for automatically converting standard text into computable logic rule |
| CN113051926A (en) * | 2021-03-01 | 2021-06-29 | 北京百度网讯科技有限公司 | Text extraction method, equipment and storage medium |
| CN113051926B (en) * | 2021-03-01 | 2023-06-23 | 北京百度网讯科技有限公司 | Text extraction method, device and storage medium |
| CN113435212A (en) * | 2021-08-26 | 2021-09-24 | 山东大学 | Text inference method and device based on rule embedding |
| CN113435212B (en) * | 2021-08-26 | 2021-11-16 | 山东大学 | Text inference method and device based on rule embedding |
| CN114490929A (en) * | 2021-12-31 | 2022-05-13 | 广州探迹科技有限公司 | Bidding information acquisition method and device, storage medium and terminal equipment |
| CN114330354A (en) * | 2022-03-02 | 2022-04-12 | 杭州海康威视数字技术股份有限公司 | Event extraction method and device based on vocabulary enhancement and storage medium |
| CN119514544A (en) * | 2024-11-05 | 2025-02-25 | 四川大学华西医院 | Target object information structured processing method, device, equipment and medium |
| CN119514544B (en) * | 2024-11-05 | 2025-08-01 | 四川大学华西医院 | Target object information structuring processing method, device, equipment and medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110321432B (en) | 2021-11-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110321432A (en) | Textual event information extracting method, electronic device and non-volatile memory medium |
| US11989519B2 (en) | Applied artificial intelligence technology for using natural language processing and concept expression templates to train a natural language generation system | |
| CN114254653A (en) | Scientific and technological project text semantic extraction and representation analysis method | |
| CN108255813B (en) | Text matching method based on word frequency-inverse document and CRF | |
| CN111488466B (en) | Chinese tagged error corpus generation method, computing device and storage medium | |
| US11170169B2 (en) | System and method for language-independent contextual embedding | |
| Dobson | | Interpretable Outputs: Criteria for Machine Learning in the Humanities. |
| CN107992597A (en) | Text structuring method for power grid fault cases |
| CN107038229A (en) | Use-case extraction method based on natural semantic analysis |
| CN106250372A (en) | Chinese electric power data text mining method for power systems |
| CN113869040B (en) | A speech recognition method for power grid dispatching | |
| CN114840657A (en) | API knowledge graph self-adaptive construction and intelligent question-answering method based on mixed mode | |
| CN116628229B (en) | Method and device for generating text corpus by using knowledge graph | |
| CN109933787A (en) | Method, device and medium for extracting text key information | |
| CN115438195A (en) | A method and device for constructing a knowledge map in the field of financial standardization | |
| Sanyal et al. | | Natural language processing technique for generation of SQL queries dynamically |
| CN118503454B (en) | Data query method, device, storage medium and computer program product | |
| RU2640718C1 (en) | Verification of information object attributes | |
| Shatalov et al. | | Named entity recognition problem for long entities in English texts |
| Hossain et al. | | A hybrid attention-based transformer model for Arabic news classification using text embedding and deep learning |
| JP2020181529A (en) | Investigation support method, investigation support computer program, and investigation support system | |
| CN112132214A (en) | Document information accurate extraction system compatible with multiple languages | |
| CN116720502B (en) | Aviation document information extraction method based on machine reading understanding and template rules | |
| Chen et al. | | Deep learning model for humor recognition of different cultures |
| Karajeh et al. | | Fusing AraBERT and graph neural networks for enhanced Arabic text classification |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |