CN101571853A - Evolution analysis device and method for contents of network topics - Google Patents

Info

Publication number
CN101571853A
Authority
CN
China
Prior art keywords
topic
center
report
network
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2009100720849A
Other languages
Chinese (zh)
Inventor
王巍
杨武
苘大鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CNA2009100720849A priority Critical patent/CN101571853A/en
Publication of CN101571853A publication Critical patent/CN101571853A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a device and a method for analyzing the evolution of network topic content. The device consists of a network event data collection device, a network event data preprocessing device, a topic content evolution analysis device and an output device. The method comprises the steps of network event data collection, network event preprocessing, similarity calculation, topic multi-center establishment, topic center update and output. The invention can discover multiple content facets related to a topic and builds the corresponding topic model with a multi-center structure, so that the topic is described more accurately and comprehensively. By establishing and updating the multiple centers of a topic, it can present the dynamic evolution of the topic content, that is, the whole process from the emergence, development and climax of the topic to its extinction. The proposed method does not depend on the order in which reports are processed and can handle interleaved news reports with different emphases.

Description

Device and method for analyzing the evolution of network topic content
(1) Technical field
The present invention relates to technology that uses computer-based intelligent information analysis to assist network management or network public opinion management. In particular, it relates to a system and method that use natural language processing and data mining techniques to analyze the dynamic evolution of network topic content; specifically, it is a system and method that can accurately present the dynamic evolution process of network topic content.
(2) Background art
With the rapid development of information technology, and of Internet technology in particular, the way people obtain and exchange information has changed, and a new form of public opinion, network public opinion, has emerged. Network public opinion is the sum of the opinions that the public (netizens) express on public affairs or focal issues through the network platform, by means of network language or other means. Compared with traditional media, the sources of public opinion information on the network are rich, including news comments, BBS, chat rooms, blogs and news aggregation (RSS). Network public opinion also spreads quickly, reaches widely and has a deep influence, so managing it is much harder than managing traditional public opinion. According to the "Statistical Report on Internet Development in China" issued by the China Internet Network Information Center, the number of netizens in China has surpassed that of the United States and ranks first in the world. The management of network public opinion in China therefore faces a more severe situation and urgently needs new technologies and methods to support it.
A network topic is the basic element through which network public opinion is expressed; managing network topics is the most basic and also the most important link in network public opinion management. The life cycle of a network topic generally comprises three stages: topic emergence, topic survival and topic extinction. The survival stage is the most important: during this stage the content related to the topic keeps unfolding and, as events develop, keeps changing until the topic enters the extinction stage. The phenomenon in which the content related to a topic keeps changing as its elements develop is called topic evolution. Topic evolution shows how a topic develops and changes during the survival stage and is the most important basis for understanding and managing network public opinion.
Network topic evolution analysis builds on network topic detection and tracking technology and is an extension and improvement of it. The following technical schemes for network topic evolution analysis have been proposed. Zhao Hua et al. [Research on topic detection oriented to dynamic evolution [J], Hi-Tech Communication, 2006, 16(12): 1230-1235], while studying topic detection, proposed a dynamic topic evolution analysis method based on a dual-centroid topic model. The method uses an initial centroid and a current centroid to represent, respectively, the content the topic focused on early and the content it focuses on currently. The establishment of a boundary point marks the appearance of new topic content, and the initial and current centroids are updated as boundary points are established. The method retains the content the topic focused on early and also captures newly emerging content promptly. The dual-centroid method can capture the appearance of a new facet of a topic in time, but when reports on the different facets of a topic appear out of order, the model cannot correctly identify the evolution of the topic content.
Wu Pingbo et al. [Research on intelligent retrieval of event-related documents based on event frames [J]. Journal of Chinese Information Processing, 2003, 17(6): 25-30] proposed a topic evolution analysis method based on the idea of event frames. The method manually collects reports related to each facet of a topic, extracts facet keywords and builds a fairly complete event frame; it then analyzes the evolution of the topic content according to this frame in order to improve the accuracy of topic detection and tracking. This method requires reports on the different facets of a topic to be collected in advance and facet keywords to be extracted, so it involves too much manual intervention, and the performance of topic detection and tracking depends heavily on whether the information on each facet has been collected completely.
Wang Huizhen et al. [Adaptive Chinese topic tracking based on feedback learning [J]. Journal of Chinese Information Processing, 2005, 20(3): 92-98] addressed the influence of topic evolution on topic tracking, that is, the topic drift phenomenon, and proposed an adaptive topic tracking method based on feedback learning. The method revises the topic model incrementally, keeps every revised topic model, and tracks subsequent reports with a linear combination of these models in order to counter the influence of topic evolution on topic tracking.
Juha Makkonen [Investigations on event evolution in TDT. In Proceedings of the Student Workshop of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), Edmonton, Canada, 2003: 43-48] divides the traditional single vector into four semantic subvectors according to the basic elements of a news event (time, place, person and event content), calculates the similarity of the four subvectors separately, and finally combines them into a single similarity. The similarity between events (or reports) is judged from this value in order to analyze which part of the content of the current topic has evolved.
Topic evolution is closely tied to the passage of time, so the temporal data related to topic content is an important basis for analyzing topic evolution. Chih-Ping Wei [IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2007, 37(2): 273-283] proposed a method for mining event evolution patterns from document sequences; a document sequence is formed by arranging the documents to be analyzed in chronological order, and each document is assumed to concern only a particular facet of the topic. Jia Ziyan [An event detection and tracking algorithm based on a dynamic evolution model [J], Journal of Computer Research and Development, 2004, 41(7): 1273-1280] holds that the smaller the time difference between a topic and a report, the greater their similarity. He introduced time into the topic similarity calculation and proposed a similarity model based on time distance. The model is of some use in distinguishing different topics but is not suited to distinguishing the different facets of the same topic.
An analysis of the existing methods and techniques shows that their shortcomings stem from the following two points:
1. Topic model representation: for a given topic, the focus of its content often changes dynamically as the related events develop, that is, topic content is multi-faceted. The topic models proposed so far have difficulty describing this multi-faceted nature and cannot fully present the topic as it evolves.
2. Topic modeling: existing methods simply use traditional clustering to group news reports belonging to the same topic and cannot reflect the dynamic development process within the topic.
(3) Summary of the invention
The object of the present invention is to provide a network topic content evolution analysis device that can analyze and present the dynamic evolution of network topic content accurately and comprehensively, and so provide more advanced technical support for network-oriented intelligent information processing and public opinion analysis. A further object of the invention is to provide a network topic content evolution analysis method.
The object of the invention is achieved as follows:
The network topic content evolution analysis device of the invention comprises a network event data collection device, a network event data preprocessing device, a topic content evolution analysis device and an output device, connected in sequence. The network event data collection device actively obtains, in real time, the raw data describing events related to network topics from the Internet and stores it. The network event data preprocessing device parses the raw network event data stored by the collection device, filters out the noise, extracts the core data that is actually relevant to the network events, defines and extracts features from the core data, and represents the data in vector space model form. The preprocessed data is input to the topic content evolution analysis device, which clusters the events related to a topic and analyzes the dynamic development and evolution of the events within the topic. The output device outputs the topic evolution analysis results of the system.
The network topic content evolution analysis device of the invention may further have the following features:
1. The network event data preprocessing device consists of a network event data purification unit and a network event data representation unit. The purification unit removes the interfering information from web pages and extracts the news content accurately; the representation unit performs Chinese word segmentation on the extracted news content and then represents it as a vector.
2. The topic content evolution analysis device consists of a similarity calculation unit, a topic multi-center establishment unit and a topic center update unit. The similarity calculation unit calculates the similarity between a collected report and each topic center and determines the topic class to which the report belongs. Once the topic class of the current report has been determined, the topic multi-center establishment unit decides which topic center the report belongs to by comparing the number of difference features between the current report and the existing centers of the topic. When a new report is added to a center of a topic, the topic center update unit updates the vector representation of that center.
The concepts used in the network topic content evolution analysis method of the invention are explained as follows:
Multi-center structure: a center of a topic represents one facet of the topic; the multi-center structure represents the multiple facets of a topic, each of which has a different emphasis.
Difference feature: a feature of the current report that is new with respect to a given topic center. The difference features calculated against different topic centers may differ.
Difference degree: the proportion of the features of the current report that are new, that is, the number of new features in the report divided by the total number of features in the report.
The network topic content evolution analysis method comprises the following steps:
Network event data collection step: download news web pages from the network and save them as files on the server, providing raw data for processing and analysis by the subsequent modules.
Network event preprocessing step: remove noise and useless information from the original news web pages, perform Chinese word segmentation, calculate the weight of each word using a specific strategy, and finally represent each report in the basic form of the vector space model.
Similarity calculation step: use the cosine distance to calculate the similarity between the current report and each existing topic, and record the topic that yields the maximum similarity. If the maximum similarity is greater than or equal to a preset threshold, the current report is considered to belong to that topic class; otherwise, that is, if the maximum similarity is less than the threshold, a new topic class is created.
Topic multi-center establishment step: after the topic class of the current report has been determined, determine which center within the topic class the report belongs to, add the report to the report set of that center, and update the topic center.
Topic center update step: when a new report is added to a topic center, update the corresponding topic center vector.
Output step: output the results of the topic content evolution analysis, including all centers within each topic and the news reports contained in each center.
The network topic content evolution analysis method of the invention may further have the following features:
1. In the similarity calculation step, the similarity between a report and a topic is obtained by calculating the similarity between the report and each center of the topic and taking the maximum value.
2. In the topic multi-center establishment step, the center to which a report belongs is determined from the similarity and the difference degree between the current report and each center of the topic: the center with the maximum similarity is selected as the center closest to the current report; if the difference degree between the current report and this center is greater than or equal to a preset threshold, a new center of the topic is created from the current report; if the difference degree is less than the threshold, the current report is considered to belong to this center.
3. In the topic center update step, the vector of the current report and the center vector are summed to form the new center vector.
The advantages of the invention are as follows. The invention can discover multiple content facets related to a topic and builds the corresponding topic model with a multi-center structure, so that the topic is described more accurately and comprehensively. By establishing and updating the multiple centers of a topic, it can present the dynamic evolution of the topic content, that is, the whole process from the emergence, development and climax of the topic to its extinction. The proposed method does not depend on the order in which reports are processed and can handle interleaved news reports with different emphases.
(4) Description of the drawings
Fig. 1 is the system structure diagram of the device of the invention;
Fig. 2 is the flow chart of the network topic content dynamic evolution analysis method based on the multi-center structure.
(5) Detailed description of the embodiments
The invention is described in more detail below by way of example with reference to the drawings:
Fig. 1 shows the network topic content dynamic evolution analysis system based on the multi-center structure, which comprises:
Network event data collection device: actively obtains, in real time, the raw data describing events related to network topic content from the Internet and stores it;
Network event data preprocessing device: parses the raw network event data stored by the collection device according to a predefined format, filters out the noise, and extracts the core data that is actually relevant to the network events; it also defines and extracts features from the core data and expresses them in a suitable form;
Topic content evolution analysis device: after preprocessing, clusters the news related to a given event together and analyzes the dynamic development and evolution of the events within the topic;
Output device: outputs the topic evolution analysis results of the system, specifically the topic centers and the related news reports belonging to each center.
Fig. 2 gives the detailed flow chart of the network topic content dynamic evolution analysis method based on the multi-center structure.
1. Network event data collection
Network news events are characterized by their multiple facets: the news reports related to a topic have several points of emphasis, and each point of emphasis discusses one aspect of the news. As events develop, the emphasis of the topic keeps shifting and changing.
2. Network event data preprocessing
The invention uses the vector space model as the formal description of news reports and topic models. Vectorizing the network event data comprises the following steps:
(1) Extract the body text of the news from the original web page.
(2) Segment the news text into words using a segmentation dictionary, keep the content words, and remove function words and stop words.
(3) Use the TF-IDF method to determine the weight of each word after segmentation. TF-IDF is calculated as follows:
$$W_{t,d} = \frac{TF_{t,d} \times \log(N / DF_t)}{\sqrt{\sum_{t=1}^{m} \left[ TF_{t,d} \times \log(N / DF_t) \right]^2}}$$
where $W_{t,d}$ is the weight of feature $t$ in document $d$, $m$ is the number of features, $N$ is the total number of documents, $TF_{t,d}$ is the frequency of feature $t$ in document $d$, and $DF_t$ is the document frequency of feature $t$.
(4) Form the vector representation of the news report from its words, that is, its features, and their weights as components:
$$V_d = \{(T_1, W_{1,d});\ (T_2, W_{2,d});\ \ldots;\ (T_m, W_{m,d})\}$$
where $V_d$ is the vector representation of document $d$, $T_i$ ($1 \le i \le m$) is the $i$-th feature of document $d$, and $W_{i,d}$ is the weight of the $i$-th feature in document $d$.
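The weighting and vectorization in steps (3) and (4) can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the reports have already been segmented into content words, uses the natural logarithm (the source does not state the base), and represents each vector as a sparse dictionary. The function name `tfidf_vectors` and the sample tokens are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized reports into length-normalized TF-IDF vectors.

    docs: list of token lists (already segmented, stop words removed).
    Returns one sparse vector {feature: weight} per report.
    """
    n_docs = len(docs)
    # DF_t: number of reports that contain feature t.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # TF_{t,d}
        raw = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Denominator of the formula: Euclidean norm of the raw weights.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm > 0 else 0.0) for t, w in raw.items()})
    return vectors

# Hypothetical, already-segmented reports.
docs = [
    ["fire", "mall", "firefighter"],
    ["fire", "investigation"],
    ["earthquake", "tibet"],
]
vecs = tfidf_vectors(docs)
```

With this normalization every non-zero vector has unit length, and a feature that occurs in fewer reports (here "mall") receives a larger weight than one that occurs in many (here "fire").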
3. Similarity calculation
The similarity between a report and a topic is obtained by calculating the similarity between the report and each center of the topic and taking the maximum. The similarity is calculated with the cosine of the included angle, as follows:
(1) Calculate the similarity between the report and each center of the topic with the cosine formula:
$$Sim(V_{d_i}, V_{d_j}) = \frac{\sum_{t=1}^{m} W_{t,d_i} \times W_{t,d_j}}{\sqrt{\sum_{t=1}^{m} W_{t,d_i}^2 \times \sum_{t=1}^{m} W_{t,d_j}^2}}$$
where $V_{d_i}$ and $V_{d_j}$ are the vector representations of documents $d_i$ and $d_j$, respectively.
(2) Take the maximum of the calculated similarities as the similarity between the report and the topic.
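The two steps above, together with the topic-class decision described in the method summary, can be sketched as follows. This is an illustrative sketch under stated assumptions: vectors are sparse dictionaries, the helper names `cosine`, `topic_similarity` and `assign_topic` are hypothetical, and the default threshold of 0.4 is the value the experiment section reports for the similarity threshold.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors {feature: weight}."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def topic_similarity(report, centers):
    """Similarity of a report to a topic: maximum over the topic's centers."""
    return max((cosine(report, c) for c in centers), default=0.0)

def assign_topic(report, topics, threshold=0.4):
    """Return the id of the most similar topic, or None if a new topic
    class should be created (maximum similarity below the threshold).

    topics: dict mapping topic id -> list of center vectors.
    """
    best_id, best_sim = None, 0.0
    for topic_id, centers in topics.items():
        sim = topic_similarity(report, centers)
        if sim > best_sim:
            best_id, best_sim = topic_id, sim
    return best_id if best_sim >= threshold else None

topics = {
    "fire": [{"fire": 1.0}, {"fire": 1.0, "mall": 1.0, "cause": 1.0}],
    "quake": [{"earthquake": 1.0}],
}
report = {"fire": 1.0, "mall": 1.0}
```

Here `assign_topic(report, topics)` selects the "fire" topic because the second center of that topic is the closest one, while a report with no features in common with any center yields `None`.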
4. Topic multi-center establishment
The evolution of topic content is often reflected in the appearance of new features. If some features do not appear at the beginning of a topic but appear only after it has continued for some time, their appearance probably means that the topic content has evolved. However, the appearance of just a few new features is not enough to conclude that the topic content has evolved; only when the number of new features reaches a certain scale can the content be considered to have evolved. Here a vector decomposition method is used to build the multi-center structure model of a topic and to detect evolution of the topic content.
In the multi-center structure proposed by the invention, determining the topic class of a report is not enough; it is also necessary to determine which center the report discusses. When determining this, the fewer the difference features, the more likely the report is discussing that center; conversely, the more difference features, the more likely it is discussing a different center. The algorithm is as follows:
(1) Calculate the similarity and the number of difference features between the report and every center of the topic.
(2) Select the center closest to the report: among the centers of the topic, the one with the maximum similarity.
(3) Determine the topic center the report discusses: if the percentage of difference features between the report and this center is less than the threshold, the report belongs to this center; otherwise, a new center of the topic is created from the report.
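The three-step algorithm above can be sketched as follows. This is a minimal sketch, not the patented implementation: vectors are sparse dictionaries, the names `difference_degree` and `choose_center` are hypothetical, and the default threshold of 0.6 is the difference degree threshold reported in the experiment section.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors {feature: weight}."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def difference_degree(report, center):
    """Share of the report's features that do not occur in the center."""
    if not report:
        return 0.0
    new_features = [t for t in report if t not in center]
    return len(new_features) / len(report)

def choose_center(report, centers, diff_threshold=0.6):
    """Return the index of the center the report belongs to, or None
    when the report should start a new center of the topic."""
    if not centers:
        return None
    # Step (2): the closest center is the one with maximum similarity.
    best = max(range(len(centers)), key=lambda i: cosine(report, centers[i]))
    # Step (3): too many new features means a new facet of the topic.
    if difference_degree(report, centers[best]) >= diff_threshold:
        return None
    return best

centers = [
    {"fire": 1.0, "firefighter": 1.0, "rescue": 1.0},
    {"fire": 1.0, "investigation": 1.0},
]
same_facet = {"fire": 1.0, "firefighter": 1.0, "funeral": 1.0}
new_facet = {"fire": 1.0, "cause": 1.0, "wiring": 1.0, "blame": 1.0}
```

In this example `same_facet` has one new feature out of three with respect to its closest center, so it joins center 0, whereas three of the four features of `new_facet` are new, so a new center would be created.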
5. Topic center update
A topic center is represented with the vector space model. When a new report is added to a topic center, the corresponding topic center vector must be updated. The update sums the vector of the current report and the center vector to form the new center vector, as follows:
Let $T_{i,d}$ and $W_{i,d}$ ($1 \le i \le m$) denote the $i$-th feature item of the current report vector $d$ and its weight. The current document vector can then be written as $V_d = \{(T_{1,d}, W_{1,d});\ (T_{2,d}, W_{2,d});\ \ldots;\ (T_{m,d}, W_{m,d})\}$. Similarly, the vector of the current topic center can be written as $V_c = \{(T_{1,c}, W_{1,c});\ (T_{2,c}, W_{2,c});\ \ldots;\ (T_{m,c}, W_{m,c})\}$. Their sum is written as $sum(V_d, V_c) = \{(T_{1,s}, W_{1,s});\ (T_{2,s}, W_{2,s});\ \ldots;\ (T_{n,s}, W_{n,s})\}$, where each component $(T_{i,s}, W_{i,s})$ is generated by the following rules:
(1) Generate the feature items: let $S(V_d)$ be the set of feature items of $V_d$ and $S(V_c)$ the set of feature items of $V_c$; then $T_{i,s} \in S(V_d) \cup S(V_c)$ for $1 \le i \le n$.
(2) Generate the weights:
$$W_{i,s} = \begin{cases} W_{i,d} + W_{i,c}, & T_{i,s} \in S(V_d) \cap S(V_c) \\ W_{i,d}, & T_{i,s} \in S(V_d) - S(V_d) \cap S(V_c) \\ W_{i,c}, & T_{i,s} \in S(V_c) - S(V_d) \cap S(V_c) \end{cases}$$
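The update rule above amounts to a feature-wise sum over sparse vectors. A minimal sketch, assuming vectors are stored as dictionaries mapping a feature to its weight (the function name `update_center` is hypothetical):

```python
def update_center(center, report):
    """Return the new center vector: the feature-wise sum of the old
    center vector and the vector of the newly added report.

    Shared features get the sum of both weights; features present in
    only one of the two vectors keep their single weight.
    """
    merged = dict(center)  # copy, so the old center is left untouched
    for feature, weight in report.items():
        merged[feature] = merged.get(feature, 0.0) + weight
    return merged

old_center = {"fire": 0.5, "mall": 0.25}
new_report = {"fire": 0.25, "cause": 0.5}
new_center = update_center(old_center, new_report)
```

The resulting center contains the features of both inputs, so features that first appear in later reports become part of the center and influence subsequent similarity and difference degree calculations.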
6. Experimental scenario and results
To verify the effectiveness of the invention, we implemented the techniques and methods described above and compared them with the topic detection method based on the dual-centroid model. The comparison covers two aspects: topic detection performance and the establishment of topic centers. The experimental data are news web pages collected from the Sina website, 181 news reports in total covering 5 topics: the Damxung earthquake in Tibet, the collapse at a Hangzhou subway construction site, the fire in an Urumqi commercial building, melamine-contaminated eggs, and the abuse of workers at illegal brick kilns in Shanxi, numbered 1 to 5 respectively.
In the implementation, the similarity threshold is set to 0.4 and the difference degree threshold is set to 0.6.
Table 1 compares the topic detection performance of the method of the invention with that of the dual-centroid method.
To verify whether the method of the invention and the dual-centroid method can detect the different facets of a topic and accurately describe the facets within one topic, we carried out the following experiment. Each report was first assigned a number. From the "Urumqi commercial building fire" event, 23 reports were selected and divided into three facets: 1. the death of three firefighters (reports 1-7); 2. the investigation of the fire and the handling of its aftermath (reports 8-12); 3. the analysis and summary of the causes of the accident (reports 13-23). Table 2 gives the results when the reports of each facet are processed in sequence; Table 3 gives the results when reports of different facets are interleaved.
Table 1. Performance comparison of the multi-center structure method and the dual-centroid method [table provided as an image in the original]
Table 2. Classification results when the facets are processed in sequence [table provided as an image in the original]
Table 3. Classification results when the facets are interleaved [table provided as an image in the original]
The recall R, precision P and F1 value used as evaluation metrics are calculated as follows:
Recall: [formula provided as an image in the original]
Precision: [formula provided as an image in the original]
F1 value: $F_1 = \frac{2PR}{P + R}$
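The F1 formula can be checked with a few lines of code. This sketch covers only the F1 value, because the recall and precision formulas are not recoverable from the source; the function name `f1_score` and the convention of returning 0 when both inputs are 0 are assumptions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision P and recall R: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(0.5, 0.5)
skewed = f1_score(0.8, 0.6)
```

Because F1 is a harmonic mean, it equals P and R when they are equal and otherwise lies closer to the smaller of the two.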
The experimental results show that, for topic detection, the method of the invention is broadly comparable to the dual-centroid method in terms of precision, recall and F1 value. For the establishment of topic centers, however, the method of the invention is better than the dual-centroid method. The dual-centroid method establishes boundary points solely from the appearance of new words and is only suitable when the facets of a topic appear one after another; it cannot cope with interleaved content. The method of the invention can handle interleaved content: it not only finds newly emerging topic content but also reclassifies old content. The multi-center structure keeps all the centers within a topic and updates each of them in time, so it captures the evolution of the topic content accurately and improves topic detection performance.

Claims (8)

1. A network topic content evolution analysis device, comprising a network event data collection device, a network event data preprocessing device, a topic content evolution analysis device and an output device connected in sequence, characterized in that: the network event data collection device actively obtains, in real time, the raw data describing events related to network topics from the Internet and stores it; the network event data preprocessing device parses the raw network event data stored by the network event data collection device, filters out the noise, extracts the core data that is actually relevant to the network events, defines and extracts features from the core data, and represents the data in vector space model form; the preprocessed data is input to the topic content evolution analysis device, which clusters the events related to a topic and analyzes the dynamic development and evolution of the events within the topic; and the output device outputs the topic evolution analysis results of the system.
2. The network topic content evolution analysis device according to claim 1, characterized in that: the network event data preprocessing device consists of a network event data purification unit and a network event data representation unit; the network event data purification unit removes the interfering information from web pages and extracts the news content accurately; and the network event data representation unit performs Chinese word segmentation on the extracted news content and then represents it as a vector.
3. The network topic content evolution analysis device according to claim 1 or 2, characterized in that: the topic content evolution analysis device consists of a similarity calculation unit, a topic multi-center establishment unit and a topic center update unit; the similarity calculation unit calculates the similarity between a collected report and each topic center and determines the topic class to which the report belongs; once the topic class of the current report has been determined, the topic multi-center establishment unit decides which topic center the report belongs to by comparing the number of difference features between the current news report and the existing centers of the topic; and when a new report is added to a center of a topic, the topic center update unit updates the vector representation of that center.
4. A network topic content evolution analysis method, characterized in that it comprises the following steps:
A network event data collection step: news web pages are downloaded from the network and saved on the server as files, providing raw data for processing and analysis by subsequent modules;
A network event preprocessing step: the original news web pages are denoised to remove useless information, Chinese word segmentation is then performed, term weights are computed using a specific strategy, and each report is finally expressed in the basic form of the vector space model;
A similarity calculation step: the cosine distance is used to compute the similarity between the current report and each existing topic, and the topic yielding the maximum similarity is recorded; if the maximum similarity is greater than or equal to a preset threshold, the current report is considered to belong to that topic class; otherwise, if the maximum similarity is below the threshold, a new topic class is established;
A topic multi-center establishment step: after the topic class of the current report has been determined, it is further determined which center within that topic class the report belongs to; the report is added to that center's report set, and the topic center is updated at the same time;
A topic center update step: when a new report is added to a topic center, the corresponding topic center vector is updated;
An output step: the results of the topic content evolution analysis are output, including all centers within each topic and the news reports that each center contains.
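The similarity calculation step of claim 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: reports and topics are represented as sparse term-weight dictionaries, each topic is reduced to a single vector for brevity, and the names `cosine`, `assign_topic`, and the threshold value 0.5 are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_topic(report, topics, threshold=0.5):
    """Return the index of the topic the report joins.

    `topics` is a list of topic vectors (assumed: one vector per topic).
    If the best similarity is below `threshold` (a hypothetical value),
    a new topic is created from the report itself.
    """
    best_idx, best_sim = -1, -1.0
    for i, topic_vec in enumerate(topics):
        sim = cosine(report, topic_vec)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_idx >= 0 and best_sim >= threshold:
        return best_idx
    topics.append(dict(report))  # establish a new topic class
    return len(topics) - 1
```

Because the decision depends only on the maximum similarity and a fixed threshold, a report that matches no existing topic always opens a new topic class rather than being forced into the nearest one.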
5. The network topic content evolution analysis method according to claim 4, characterized in that: in the similarity calculation step, when the similarity between a report and a given topic is calculated, the similarity between the report and each center of that topic is calculated separately, and the maximum value is taken as the similarity between the report and the topic.
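The rule in claim 5 amounts to taking a maximum over per-center cosine similarities. A minimal sketch, assuming sparse term-weight dictionaries for both reports and centers (the function names are hypothetical):

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_similarity(report, centers):
    """Similarity between a report and a topic: the maximum over the topic's centers."""
    return max((cosine(report, center) for center in centers), default=0.0)
```

With this definition a report only needs to resemble one facet of a topic to be counted as belonging to it.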
6. The network topic content evolution analysis method according to claim 4 or 5, characterized in that: in the topic multi-center establishment step, the center to which a report belongs is determined from the similarity and the degree of difference between the current report and each center of the topic: the center with the maximum similarity is selected as the center closest to the current report; if the degree of difference between the current report and this center is greater than or equal to a preset threshold, a new center of the topic is established from the current report; if the degree of difference is less than the threshold, the current report is considered to belong to this topic center.
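The center selection of claim 6 can be sketched as below. The claims do not define how the degree of difference is computed; this sketch assumes it is the number of terms in the report that are absent from the center, which is only one plausible reading of the "number of differing features" in claim 3. The names `differing_features`, `choose_center`, and the threshold value 3 are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def differing_features(report, center):
    """Assumed difference measure: count of report terms missing from the center."""
    return sum(1 for term in report if term not in center)


def choose_center(report, centers, diff_threshold=3):
    """Return the index of the center the report joins (centers must be non-empty).

    The closest center is the one with maximum cosine similarity. If the
    report differs from it by at least `diff_threshold` features, the
    report founds a new center instead.
    """
    best = max(range(len(centers)), key=lambda i: cosine(report, centers[i]))
    if differing_features(report, centers[best]) >= diff_threshold:
        centers.append(dict(report))  # new center of the same topic
        return len(centers) - 1
    return best
```

Separating the similarity test from the difference test lets a report stay within the same topic while still opening a new center when it introduces enough new vocabulary.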
7. The network topic content evolution analysis method according to claim 4 or 5, characterized in that: the specific update method of the topic center update step is that the vector formed from the current report is combined with the center vector to form a new center vector.
8. The network topic content evolution analysis method according to claim 6, characterized in that: the specific update method of the topic center update step is that the vector formed from the current report is combined with the center vector to form a new center vector.
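A minimal sketch of the center update in claims 7 and 8 follows. The translated claim text does not make the combining operation explicit; this sketch assumes an element-wise sum of the report vector and the center vector, and the name `update_center` is hypothetical.

```python
def update_center(center, report):
    """Return a new center vector: element-wise sum of center and report.

    Both arguments are sparse term-weight dicts. Summation is an
    assumption; the claim only says the two vectors are combined.
    """
    merged = dict(center)
    for term, weight in report.items():
        merged[term] = merged.get(term, 0.0) + weight
    return merged
```

Under this assumption the update is commutative, so the resulting center does not depend on the order in which reports are added, which is consistent with the abstract's statement that the method does not depend on the processing order of reports.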
CNA2009100720849A 2009-05-22 2009-05-22 Evolution analysis device and method for contents of network topics Pending CN101571853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2009100720849A CN101571853A (en) 2009-05-22 2009-05-22 Evolution analysis device and method for contents of network topics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2009100720849A CN101571853A (en) 2009-05-22 2009-05-22 Evolution analysis device and method for contents of network topics

Publications (1)

Publication Number Publication Date
CN101571853A true CN101571853A (en) 2009-11-04

Family

ID=41231212

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2009100720849A Pending CN101571853A (en) 2009-05-22 2009-05-22 Evolution analysis device and method for contents of network topics

Country Status (1)

Country Link
CN (1) CN101571853A (en)

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012917B (en) * 2010-11-26 2013-02-20 百度在线网络技术(北京)有限公司 Information processing device and method
CN102012917A (en) * 2010-11-26 2011-04-13 百度在线网络技术(北京)有限公司 Information processing device and method
CN102136975A (en) * 2011-02-24 2011-07-27 上海大学 Large-scale network environment-oriented similarity network construction method
CN102136975B (en) * 2011-02-24 2014-04-02 上海大学 Large-scale network environment-oriented similarity network construction method
CN102999539B (en) * 2011-09-13 2015-11-25 富士通株式会社 Predict the method and apparatus of the future developing trend of given topic
CN102999539A (en) * 2011-09-13 2013-03-27 富士通株式会社 Method and device for forecasting future development trend of given topic
CN104054072B (en) * 2011-12-13 2017-03-29 国际商业机器公司 Event in social networks is excavated
GB2509874A (en) * 2011-12-13 2014-07-16 Ibm Event mining in social networks
CN104054072A (en) * 2011-12-13 2014-09-17 国际商业机器公司 Event mining in social networks
WO2013086931A1 (en) * 2011-12-13 2013-06-20 International Business Machines Corporation Event mining in social networks
US8914371B2 (en) 2011-12-13 2014-12-16 International Business Machines Corporation Event mining in social networks
CN102419778A (en) * 2012-01-09 2012-04-18 中国科学院软件研究所 Information searching method for mining and clustering sub-topics of query sentences
CN102929927A (en) * 2012-09-20 2013-02-13 北京航空航天大学 Method for immediately tracking random event evolution based on Internet mass information
CN102915341A (en) * 2012-09-21 2013-02-06 人民搜索网络股份公司 Dynamic topic model-based dynamic text cluster device and method
CN104199974A (en) * 2013-09-22 2014-12-10 中科嘉速(北京)并行软件有限公司 Microblog-oriented dynamic topic detection and evolution tracking method
WO2015165230A1 (en) * 2014-04-28 2015-11-05 华为技术有限公司 Social contact message monitoring method and device
CN105095228A (en) * 2014-04-28 2015-11-25 华为技术有限公司 Method and apparatus for monitoring social information
US10250550B2 (en) 2014-04-28 2019-04-02 Huawei Technologies Co., Ltd. Social message monitoring method and apparatus
CN104715014A (en) * 2015-01-26 2015-06-17 中山大学 Online news topic detection method
CN104715014B (en) * 2015-01-26 2017-10-10 中山大学 A kind of online topic detecting method of news
CN106294405A (en) * 2015-05-22 2017-01-04 国家计算机网络与信息安全管理中心 A kind of microblogging topic evolution analysis method and device
CN104915446A (en) * 2015-06-29 2015-09-16 华南理工大学 Automatic extracting method and system of event evolving relationship based on news
CN104915446B (en) * 2015-06-29 2019-01-29 华南理工大学 News-based automatic extraction method and system of event evolution relationship
US10459980B2 (en) 2015-11-09 2019-10-29 Institute For Information Industry Display system, method and computer readable recording media for an issue
CN106682049A (en) * 2015-11-09 2017-05-17 财团法人资讯工业策进会 Topic display system and topic display method
CN106682049B (en) * 2015-11-09 2020-04-14 财团法人资讯工业策进会 Issue display system and issue display method
CN106934049A (en) * 2017-03-16 2017-07-07 天闻数媒科技(北京)有限公司 A kind of the news selected topic analysis method and device
CN106934049B (en) * 2017-03-16 2020-08-07 天闻数媒科技(北京)有限公司 News question selection analysis method and device
CN109064347A (en) * 2017-06-11 2018-12-21 南京理工大学 Information based on multiple agent is propagated and public sentiment evolution simulation method
CN109064347B (en) * 2017-06-11 2022-05-17 南京理工大学 Simulation method of information dissemination and public opinion evolution based on multi-agent
CN109635174A (en) * 2018-10-29 2019-04-16 珠海市君天电子科技有限公司 News information flow management method, device, electronic equipment and storage medium
CN109558546A (en) * 2018-11-06 2019-04-02 广州大学 A kind of the microblog topic expression model generating method and device of Behavior-based control analysis
CN111680205A (en) * 2020-06-12 2020-09-18 杨鹏 Event evolution analysis method and device based on event map
CN112069246A (en) * 2020-09-08 2020-12-11 天津大学 Analysis method for event evolution process integration in physical world and network world
CN112069246B (en) * 2020-09-08 2024-01-09 天津大学 An analysis method integrating the event evolution process in the physical world and the cyber world

Similar Documents

Publication Publication Date Title
CN101571853A (en) Evolution analysis device and method for contents of network topics
CN107766324B (en) A Text Consistency Analysis Method Based on Deep Neural Network
CN106202561B (en) Digitlization contingency management case base construction method and device based on text big data
CN105868108B (en) The unrelated binary code similarity detection method of instruction set based on neural network
CN103544255A (en) Text semantic relativity based network public opinion information analysis method
Zhu et al. CCBLA: a lightweight phishing detection model based on CNN, BiLSTM, and attention mechanism
CN111274814B (en) A Novel Semi-Supervised Text Entity Information Extraction Method
CN103631859A (en) Intelligent review expert recommending method for science and technology projects
CN101751455A (en) Method for automatically generating title by adopting artificial intelligence technology
CN101819585A (en) Device and method for constructing forum event dissemination pattern
CN106682123A (en) Hot event acquiring method and device
CN108416034B (en) Information acquisition system based on financial heterogeneous big data and control method thereof
Upadhyaya et al. Intensity-Valued Emotions Help Stance Detection of Climate Change Twitter Data.
Peng et al. Cross-site scripting attack detection method based on transformer
CN118820745A (en) Unsupervised log anomaly detection method integrating sequence and template semantics
CN106599304A (en) Small and medium-sized website-oriented modularized user retrieval intention modeling method
Basile et al. Kronos-it: A dataset for the Italian semantic change detection task
Alkhalifa et al. Qmul-sds@ sardistance: Leveraging network interactions to boost performance on stance detection using knowledge graphs
CN103714093B (en) A kind of method for digging and device of the website emphasis page
CN110750981A (en) High-accuracy website sensitive word detection method based on machine learning
KR20170067558A (en) A malicious comments detection technique on the Internet using support vector machine
CN112686054B (en) Public opinion analysis method and system based on seismic content hot spot
Zhao et al. Gat-Ti: Extracting Entities From Cyber Threat Intelligence Texts
Imran et al. Twitter Sentimental Analysis using Machine Learning Approaches for SemeVal Dataset
Li et al. Log anomaly detection based on parallel fusion of CNN and GRU

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20091104