CN116775868A - News text classification method, system, storage medium and equipment - Google Patents

News text classification method, system, storage medium and equipment

Info

Publication number
CN116775868A
Authority
CN
China
Prior art keywords
text
features
graph
semantic
news
Prior art date
Legal status
Granted
Application number
CN202310646587.2A
Other languages
Chinese (zh)
Other versions
CN116775868B (en)
Inventor
耿玉水
李丽
梁虎
赵晶
Current Assignee
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202310646587.2A priority Critical patent/CN116775868B/en
Publication of CN116775868A publication Critical patent/CN116775868A/en
Application granted granted Critical
Publication of CN116775868B publication Critical patent/CN116775868B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F16/35 — Information retrieval of unstructured textual data: clustering; classification
    • G06F40/30 — Handling natural language data: semantic analysis
    • G06N3/045 — Neural networks: combinations of networks
    • G06N3/0464 — Neural networks: convolutional networks [CNN, ConvNet]
    • G06N3/0499 — Neural networks: feedforward networks


Abstract

The invention provides a news text classification method, system, storage medium and device. A semantic dictionary is introduced to enrich the constructed text graphs; for each enriched text graph, text structural features are extracted with a graph convolutional network, and contextual semantic features of the text are extracted with a BERT model; the extracted structural and semantic features are made to interact through a multi-head attention mechanism, combining features of two different granularities; finally, the interacted features are aggregated with different aggregation methods to classify the news texts. By propagating and extracting text structural features with graph convolution, extracting semantic features with BERT, and using multi-head attention to realize interaction between the two feature granularities, the invention can effectively improve the accuracy of news text classification.

Description

News text classification method, system, storage medium and equipment
Technical Field
The invention belongs to the technical field of news text classification, and relates to a news text classification method, system, storage medium and device.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Traditional text classification methods mainly train and predict with machine learning algorithms: the basic idea is to represent the text through feature engineering and then train a classifier to predict its category. Feature engineering, including bag-of-words models and n-grams, commonly represents a sentence by the collection of words, or of byte fragments of length n, occurring in it. Combined with machine learning models such as support vector machines and naive Bayes, these features achieve good results. However, traditional feature extraction has many drawbacks: it ignores contextual information during classification, and its ability to process and generalize over high-dimensional data is poor.
To overcome the limitations of manual feature engineering, many researchers have applied deep learning to text classification; its advantage is that feature representations of the data can be learned automatically by the network itself, without manual feature engineering. The advent of word embeddings provided new solutions for natural language tasks: pre-trained word embeddings such as Word2Vec and GloVe have been shown to capture meaningful semantic features of text. On the model side, TextCNN uses a simple convolutional neural network, combining convolutional layers with kernels of multiple sizes for feature mapping. Recently, pre-trained language models have gained widespread attention; for example, BERT employs a masked language model (MLM) to pre-train bidirectional Transformers and generate deep bidirectional language representations. After pre-training, only one extra output layer needs to be added for fine-tuning, which markedly improves performance on downstream tasks. Attention mechanisms are also applied in various models: the self-attention in BERT forms a multi-level self-attention network, strengthening semantic feature representation and improving performance on NLP tasks. However, BERT cannot fully exploit the structural information of text.
Graph neural networks can capture both the attributes and the structural characteristics of nodes in a graph, learning more effective representations of individual nodes or of the whole graph. Many variants of graph neural networks have been proposed and applied to text classification tasks. Such models derive a node's feature vector from the attributes of the node and of its neighbors, and finally perform tasks such as classification and regression on the graph's nodes, with good results on node classification. The TextGCN model proposed by Yao et al. applied a graph convolutional network to text classification for the first time: it builds one large heterogeneous graph with words and documents as nodes over the entire corpus and learns the global structural information in the graph. VGCN-BERT likewise builds a graph over the entire dataset and uses features extracted by the graph convolutional network to enhance BERT. However, for a new document the entire graph structure must be updated before a prediction can be made, and it is difficult to account for local features between words within a single text, which weakens the individual role of each document. TextING instead builds a graph structure for each individual text. There are also methods that combine pre-trained language models with graph neural networks for feature extraction.
News texts, which are one of the types of data to be classified, have some specificities, and have some classification difficulties with respect to other texts, particularly in the following aspects:
Diversity and complexity: news texts cover a wide range of topics, including politics, economics, science and technology, entertainment, and so on. This diversity and complexity of content makes the classification task more challenging.
Semantic and contextual understanding: news text typically involves complex semantics and context. Understanding implicit meaning, emotional tendency, specific reference, etc. in text is critical to correctly classifying news text.
Noise and interference: news text may contain a significant amount of noise, such as misspellings, abbreviations, and stray punctuation. Such noise can negatively impact the performance of a classification algorithm, requiring proper data cleaning and preprocessing.
Topic crossover and ambiguity: in news text, different topics may intersect and correlate. Some news items involve multiple topics, and ambiguity may make it difficult to assign them accurately to a single topic.
According to the inventors' analysis, most currently proposed news text classification models process serialized data with LSTM, BERT and similar methods, which handle the contextual information of texts well and improve the accuracy of text encoding, but do not consider the structural information of the text. Modeling with a GCN makes better use of the structural information of text data; however, text graphs built from word co-occurrence usually consider only the co-occurrence relation between nodes, not richer semantic information. Furthermore, some approaches extract both semantic and structural features of the text, but do not consider individual text features or the interaction between the features, which limits their representational capability.
Disclosure of Invention
To solve the above problems, the invention provides a news text classification method, system, storage medium and device. The invention propagates and extracts text structural features with graph convolution, extracts semantic features with BERT, and uses a multi-head attention mechanism to realize feature interaction between the two feature granularities, thereby effectively improving the accuracy of news text classification.
According to some embodiments, the present invention employs the following technical solutions:
a news text classification method comprising the steps of:
constructing a separate text graph, based on word co-occurrence, for each document in a news text dataset;
introducing a semantic dictionary to enrich the constructed text graph;
for each enriched text graph, extracting text structural features with a graph convolutional network, and extracting contextual semantic features of the text with a BERT model;
making the extracted text structural features and contextual semantic features interact through a multi-head attention mechanism, so as to combine the features of two different granularities;
and aggregating the interacted features with different aggregation methods to classify the news texts.
As an alternative embodiment, the specific process of building a separate text graph for each document from word co-occurrence includes representing the document's graph as G = (V, E), where V is the set of nodes in the graph, comprising all words in the text, and E is the set of edges between nodes.
Further, the edge weight between two words is calculated with normalized pointwise mutual information (NPMI): the more frequently the two words co-occur within a set range, the greater the weight. When the semantic relevance between the words in the corpus exceeds a set value, the edge weight is positive, and otherwise negative; an edge is created between a word pair only when the edge weight is positive.
As an alternative embodiment, in the specific process of introducing a semantic dictionary to enrich the constructed text graph, the semantic dictionary organizes a lexical semantic network according to semantic relations: words are represented by synonym sets, each set denoting one lexical concept, and semantic relations including hypernymy/hyponymy, meronymy/holonymy, and synonymy/antonymy are expressed through links;
and if the edge weight is negative, synonyms are expanded with the semantic dictionary and the semantic similarity between the words in the two synonym sets is calculated.
As an alternative embodiment, the specific process of extracting text structural features with the graph convolutional network includes performing a convolution operation on the text graph with the graph convolutional network, extracting feature representations of the nodes while also considering the structural information among them;
in the graph convolutional network, propagation at each layer updates each node with a weighted sum of the node's neighbors and the node itself;
and a nonlinear function is used as the activation function during training of the graph convolutional network.
Further, the graph convolutional network comprises a plurality of stacked graph convolution layers.
As an alternative embodiment, in the specific process of extracting contextual semantic features of the text with a BERT model, the BERT model comprises an encoder section stacked from multiple encoder layers, each encoder comprising a self-attention layer and a feed-forward network layer.
As an alternative embodiment, the specific process of making the extracted text structural features and contextual semantic features interact with a multi-head attention mechanism includes computing the query, key and value matrices of the text structural features and the contextual semantic features, and passing the keys and values of each feature type as inputs to the other's multi-head attention module.
As an alternative embodiment, the specific process of aggregating the interacted features with different aggregation methods comprises aggregating the extracted features by max-pooling, concatenation, and element-wise addition.
A news text classification system, comprising:
a text graph construction module configured to construct a separate text graph, based on word co-occurrence, for each document in a news text dataset;
a text graph enrichment module configured to introduce a semantic dictionary to enrich the constructed text graph;
the feature extraction module is configured, for each enriched text graph, to extract text structural features with a graph convolutional network and to extract contextual semantic features of the text with a BERT model;
the feature interaction module is configured to make the extracted text structural features and contextual semantic features interact through a multi-head attention mechanism, so as to combine the features of two different granularities;
and the aggregation module is configured to aggregate the interacted features with different aggregation methods to classify the news text.
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the steps in the method.
A terminal device comprising a processor and a computer readable storage medium, the processor configured to implement instructions; the computer readable storage medium is for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps in the method.
Compared with the prior art, the invention has the beneficial effects that:
according to the method, a graph structure is built for each document according to word co-occurrence, and a semantic dictionary (WordNet) is introduced to enrich the building of text graphs aiming at the problem that word connection is not rich enough and context dependency relations cannot be well captured. The invention also uses the graph convolution network to extract the text structural features. And extracting semantic features of the text by using the BERT model. Then, a feature interaction layer is added, a multi-head attention module is adopted to carry out interaction between two different granularity features, and three aggregation methods are designed to carry out feature aggregation, so that the characterization capability of the extracted features is improved to the greatest extent.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic diagram of the overall model structure used in the present embodiment;
FIG. 2 is a schematic diagram of the feature interactions of the present embodiment;
fig. 3 is a schematic diagram showing the comparison of the effects of the method of the present embodiment and other methods.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
The invention provides a news text classification method. First, a graph structure is constructed separately for each document according to word co-occurrence; since word connections alone are not rich enough to capture contextual dependencies well, a semantic dictionary (WordNet) is introduced to enrich the construction of the text graphs. Text structural features are then extracted with a graph convolutional network, and semantic features of the text with the BERT model. Next, a feature interaction layer is added: a multi-head attention module makes the two feature granularities interact, and three aggregation methods are designed for feature aggregation, maximizing the representational capability of the extracted features. Finally, a fully connected layer predicts the category, yielding the final class of the text.
As shown in fig. 1, the steps of the method of the present embodiment are described in conjunction with the model structure:
step one: text diagram construction
The dataset used in this embodiment is a news dataset composed of multiple news documents; the invention constructs an individual text graph for each news document in the dataset.
To process text data with a graph neural network, a text graph must first be constructed. A graph structure describing the semantic structure of the text is built from the word co-occurrence relation and a semantic dictionary. Word co-occurrence measures the relatedness between two words, characterizing their semantic similarity and relatedness. A semantic dictionary is introduced into the graph construction to capture relations among more words and enrich the text graph information.
In this embodiment, a separate graph is created for each document, formally expressed as G = (V, E), where V is the set of nodes in the graph, comprising all words in the text, and E is the set of edges between nodes. For each piece of data in the dataset, the text is preprocessed with standard methods to remove the stop words defined in NLTK and the low-frequency words occurring fewer than a set number of times (e.g., 5) in the entire corpus.
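The stop-word and low-frequency filtering step described above can be sketched as follows. The stop-word set here is a small illustrative stand-in for NLTK's English list, and the function name is hypothetical:

```python
from collections import Counter

# Illustrative stand-in for NLTK's English stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def preprocess_corpus(docs, min_count=5):
    """Tokenize on whitespace, drop stop words, then drop words whose
    total frequency across the whole corpus is below min_count."""
    tokenized = [[w.lower() for w in d.split() if w.lower() not in STOP_WORDS]
                 for d in docs]
    freq = Counter(w for doc in tokenized for w in doc)
    return [[w for w in doc if freq[w] >= min_count] for doc in tokenized]
```

Real preprocessing would also strip punctuation and handle casing per the corpus; this sketch only shows the two filters named in the text.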
The text graph is constructed here using normalized pointwise mutual information (NPMI) together with a semantic dictionary (WordNet). For two words i and j, the NPMI method calculates the edge weight between them. Formally, the edge weight between word nodes i and j is defined as:

NPMI(i, j) = (1 / (-log p(i, j))) · log( p(i, j) / (p(i) p(j)) ),  with p(i, j) = #W(i, j) / #W and p(i) = #W(i) / #W

where #W(i, j) is the number of sliding windows containing both words i and j, #W(i) is the number of sliding windows containing word i, and #W is the total number of sliding windows. To capture long-term dependencies, the window is set to the entire sentence here. NPMI takes values in the range [-1, 1]. In brief, the more frequently two words co-occur within the range, the greater the weight. The NPMI value is positive when the semantic relevance between the words in the corpus is high; conversely, when the semantic relevance is low or absent, the NPMI value is negative. Therefore, an edge is created between a word pair only when the NPMI value is positive.
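A minimal sketch of the NPMI edge-weight computation from sliding-window counts, assuming the counts have already been gathered from the corpus (the function name is hypothetical):

```python
import math

def npmi(w_ij, w_i, w_j, w_total):
    """NPMI from sliding-window counts: w_ij windows containing both words,
    w_i / w_j windows containing each word alone, w_total windows overall."""
    p_ij = w_ij / w_total
    p_i = w_i / w_total
    p_j = w_j / w_total
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / (-math.log(p_ij))  # normalize PMI into [-1, 1]
```

With this definition, words that always co-occur score 1, statistically independent words score 0, and rarely co-occurring words score negative, matching the edge-creation rule above.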
However, the calculation of NPMI depends on the corpus. When the probability of occurrence of some words in the corpus is low, the calculation result of NPMI may be small, so as to judge that the similarity between the words is low or dissimilar. In addition, there are many links between words, many words representing the same concept and being interchangeable in many cases, most directly being synonyms. In this case, the text diagram constructed by only word co-occurrence ignores the word relation information of the part.
To better exploit the synonym relations between words and enrich the text graph information, a semantic dictionary is introduced; WordNet is selected. WordNet organizes a lexical semantic network according to semantic relations: words are represented by synonym sets, each set denoting one lexical concept, and semantic relations such as hypernymy/hyponymy, meronymy/holonymy, and synonymy/antonymy are expressed through links. This forms a relatively complete semantic network with a good concept hierarchy, in which the semantic relations between synonym sets are realized through links. Thus, we first use WordNet to expand synonyms, and then use the WordNet-based Wup method to calculate the semantic similarity between the words in the two synonym sets. The Wup method considers the path between two concepts, their common parent node, and the depth of that parent in the taxonomy tree, and returns a score representing the degree of similarity of the two word senses. The average of all similarity scores is computed here; the larger the value, the higher the semantic similarity. This minimizes the problems caused by certain low-frequency words in the corpus and adds more information about the relations between the words.
Therefore, for two words i and j, if the calculated NPMI value is negative, the WordNet-based Wup method is used, and the semantic similarity between the words in the two synonym sets is calculated as:

Sim(S_x, P_y) = 2·N_3 / (N_1 + N_2 + 2·N_3),  Sim(i, j) = ( Σ_{x,y} Sim(S_x, P_y) ) / (L_1·L_2 − n)

where c_3 is the lowest common subsumer of S_x and P_y in the concept hierarchy, N_1 is the distance from S_x to c_3, N_2 is the distance from P_y to c_3, and N_3 is the distance from c_3 to the root node of the concept hierarchy tree. Let c_1 be the synonym set of word i, i.e. c_1 = {S_1, S_2, ..., S_x}, x ∈ [1, L_1], where L_1 is the number of words in c_1; and let c_2 be the synonym set of word j, i.e. c_2 = {P_1, P_2, ..., P_y}, y ∈ [1, L_2], where L_2 is the number of words in c_2. n is the number of word pairs whose similarity is 0. Sim(i, j) ranges over [0, 1]. It is specified here that an edge is created between the two words when the value of Sim(i, j) is greater than 0.5.
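The Wup score for one sense pair and the set-level average can be sketched as follows. Treating the average as taken over the nonzero pairs (i.e., excluding the n zero-similarity pairs) is an assumption based on the description above, and both function names are hypothetical:

```python
def wup_pair(n1, n2, n3):
    """Wu-Palmer score for one sense pair: n1 and n2 are path lengths from
    the two senses to their lowest common subsumer, n3 its depth from root."""
    return (2 * n3) / (n1 + n2 + 2 * n3)

def synset_similarity(pair_scores):
    """Average the nonzero pairwise Wup scores between two synonym sets;
    pairs with score 0 are excluded from the average (assumed reading)."""
    nonzero = [s for s in pair_scores if s > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0
```

In practice the n1, n2, n3 distances would come from WordNet itself (e.g., NLTK's `wordnet.Synset.wup_similarity`); this sketch only mirrors the formula.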
Constructing an independent text graph for each news document in this way allows single-text features to be considered and finer-grained features to be captured, while the introduced semantic dictionary (WordNet) mitigates the ambiguity caused by relying on word co-occurrence alone, expands the word set, and enriches the links between words. This provides some help with the topic-crossover and ambiguity problems in news text classification.
Step two: GCN-based feature extraction
For each text graph, a graph convolutional neural network is used here for feature propagation and extraction. The GCN performs a convolution operation on the text graph, extracting feature representations of the nodes while also considering the structural information among them. This structural information helps the GCN better capture the semantic information of the text data and improves its representational capability, thereby improving the accuracy of text classification.
In a GCN, propagation at each layer updates each node with a weighted sum of the node's neighbors and the node itself. For a single-layer GCN, the new representation is computed as:

H = σ( Â X W_0 ),  Â = D^{-1/2} Ã D^{-1/2},  Ã = A + I_n

where X ∈ R^{|V|×m} is the input matrix of |V| nodes with m-dimensional features; Â is the normalized symmetric adjacency matrix; D is the degree matrix, with D_ii = Σ_j Ã_ij, the purpose of normalizing A being to avoid unstable feature scales and vanishing/exploding gradients; Ã is the adjacency matrix with added self-loops, I_n being the identity matrix; W_0 is a weight matrix; and σ is the activation function.
When multiple GCN layers are stacked, more neighborhood information is integrated. Specifically, for a multi-layer GCN the new representation is computed as:

H^{(l+1)} = σ( Â H^{(l)} W^{(l)} )

where l is the layer number and H^{(0)} = X is the initial feature matrix, initialized here with GloVe word vectors; two GCN layers are trained, with the dimension of the second layer equal to the number of classes in the dataset. After two rounds of message passing, the final word feature representation is obtained.
The nonlinear function LeakyReLU is chosen here as the activation function during GCN training; it both reduces the complexity of the model and lowers the risk of overfitting. The LeakyReLU function is:

LeakyReLU(x) = x if x ≥ 0, and αx otherwise, where α is a small positive slope.
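A minimal NumPy sketch of the two-layer graph convolution with symmetric normalization and LeakyReLU described above; the layer sizes and function names are illustrative, and the trained weight matrices are stand-ins:

```python
import numpy as np

def normalize_adjacency(A):
    """D^{-1/2} (A + I) D^{-1/2}: add self-loops, then symmetric-normalize."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                  # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def gcn_two_layer(A, X, W0, W1):
    """Two rounds of message passing over a single text graph."""
    A_hat = normalize_adjacency(A)
    H1 = leaky_relu(A_hat @ X @ W0)          # first propagation layer
    return leaky_relu(A_hat @ H1 @ W1)       # second layer: class-sized output
```

In the described model, W1 would map to the number of classes and the weights would be learned; here they are arbitrary arrays used only to show the propagation rule.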
step three: BERT-based feature extraction
In addition to obtaining text structural features with the GCN, BERT is used here to extract contextual semantic features. BERT can understand the meaning and role of a word in a sentence, achieving better context understanding, and it can also infer well what an ambiguous word expresses in the current context, thus representing the text better.
BERT is based on the Transformer, but uses only its encoder part; each encoder consists of two sublayers, a self-attention layer and a feed-forward network layer. For self-attention, the whole computation can be expressed as:

Attention(Q, K, V) = softmax( Q K^T / √d_k ) V  (7)

where Q (query), K (key) and V (value) are the query, key and value matrices, all derived from the same input, and d_k is the dimension of the key vectors.
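The scaled dot-product attention of equation (7) can be sketched in NumPy as follows (the function names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in equation (7)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V, weights
```

In BERT itself Q, K and V are linear projections of the same hidden states; this sketch takes them as given matrices to show only the attention computation.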
The global BERT framework is stacked from multiple Transformer encoder layers, each consisting of a multi-head attention layer and a feed-forward network layer. Each attention head maps the input into a different sub-representation space, allowing the model to attend to different positions in different sub-spaces. For multi-head attention, the whole computation can be expressed as:

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O,  head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)  (8)
extracting features by BERT to obtain final word feature representation
In this embodiment, BERT is used to extract contextual semantic features; it has strong semantic representation capability and captures rich semantic information. The GCN extracts structural information of the text data, such as adjacency relations and relative positions between nodes, and this structural information helps it better capture the semantic information of the data. By making the text information extracted by the GCN and by BERT interact, the graph-structure processing capability of the GCN and the semantic understanding capability of BERT complement each other, and the multi-head attention module helps the model better capture the correlation and relative importance between the two. In the feature aggregation stage, max-pooling extracts the most salient features, element-wise addition retains complementary information between different features, and concatenation joins the dimensions of the different features together. Richer and more comprehensive feature representations can thus be generated, improving the accuracy of text classification.
Step four: feature interactions
To exploit the advantages of feature expression at different granularities, improve representational capability, and raise classification accuracy, the structural and semantic features of the text extracted by the GCN and by BERT are made to interact through a multi-head attention module.
As in the standard self-attention mechanism, the query (Q), key (K) and value (V) matrices are computed from the text representations extracted by the GCN and by BERT. The keys (K) and values (V) of the two feature streams are then passed as inputs to each other's multi-head attention module, as shown in FIG. 2.
Specifically, the query matrix (Q), key matrix (K) and value matrix (V) are first computed from the GCN-extracted text features and the BERT-extracted text features respectively, giving Q L, K L, V L for the GCN branch and Q bert, K bert, V bert for the BERT branch, where W Q, W K and W V are the parameter matrices used in the projections.
Unlike the self-attention mechanism, Q L, K bert and V bert are taken as the input to equation (7) to obtain H L, and Q bert, K L and V L are taken as the input to equation (7) to obtain H bert.
In this way, an attention representation of the GCN features conditioned on the BERT output, and an attention representation of the BERT features conditioned on the GCN output, are obtained. The extracted text structural features and semantic features thus interact, making full use of the complementary advantages of representations at different granularities.
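The cross-attention described above can be sketched minimally in NumPy. This is an illustrative sketch, not the embodiment's implementation: the dimensions, the shared projection matrices and the single attention layer are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q_feats, kv_feats, Wq, Wk, Wv, n_heads):
    """Queries come from one branch, keys/values from the other branch."""
    n, d = q_feats.shape
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    dh = d // n_heads
    out = np.empty_like(Q)
    for h in range(n_heads):               # one scaled dot-product per head
        s = slice(h * dh, (h + 1) * dh)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)
        out[:, s] = softmax(scores) @ V[:, s]
    return out

rng = np.random.default_rng(0)
n_words, d_model, heads = 6, 8, 2
H_gcn = rng.normal(size=(n_words, d_model))   # structural features (GCN branch)
H_bert = rng.normal(size=(n_words, d_model))  # semantic features (BERT branch)
Wq, Wk, Wv = [rng.normal(size=(d_model, d_model)) for _ in range(3)]

# GCN queries attend over BERT keys/values, and vice versa
H_L = multi_head_attention(H_gcn, H_bert, Wq, Wk, Wv, heads)
H_b = multi_head_attention(H_bert, H_gcn, Wq, Wk, Wv, heads)
```

In the embodiment each branch would have its own learned projection matrices; a single shared set is used here only to keep the sketch short.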
Step five: feature aggregation
Aggregating features of different granularities improves classification accuracy and performance, simplifies the model structure, and improves the robustness and stability of the model. Three methods are used herein to aggregate the extracted features: max-pooling, concatenation and element-wise addition. For a word w j in the sequence, the features extracted by the GCN and by BERT are recorded separately. The specific methods are as follows:
Max-pooling: lets every word in the text contribute, with keywords contributing more prominently. The larger of the two feature values is selected in each dimension to form the final representation:
Concatenation: the two features are spliced directly, so that the output of each module is kept complete. The final representation is:
Element-wise addition: implemented by directly adding the values in corresponding dimensions; this does not change the dimension of the feature vector. The final representation is:
where the ⊕ operation denotes element-wise addition.
The final representation of the entire document j is denoted as H j
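The three aggregation methods can be shown with toy feature vectors; the numbers are made up purely for illustration:

```python
import numpy as np

h_gcn = np.array([0.2, -1.0, 3.0])   # GCN-extracted feature of a word (toy values)
h_bert = np.array([1.5, 0.5, -2.0])  # BERT-extracted feature of the word (toy values)

pooled = np.maximum(h_gcn, h_bert)        # max-pooling: per-dimension maximum
concat = np.concatenate([h_gcn, h_bert])  # concatenation: dimensions joined
summed = h_gcn + h_bert                   # element-wise addition: same dimension
```

Note the dimensionality trade-off: concatenation doubles the feature size, while max-pooling and element-wise addition keep it unchanged.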
FIG. 3 shows an ablation experiment performed on the three news datasets 20NG, AGNews and R. BEGCN-noWordNet is a variant that builds the text graph using only normalized pointwise mutual information, without the semantic dictionary. BEGCN-noAttention is a variant without the feature interaction module: only the features extracted by BERT and by the GCN are used, with the structural information from the GCN simply concatenated with the semantic information from BERT and no interaction performed between the two. BEGCN is the complete news text classification model proposed by this embodiment.
This example conducts experiments on the three news datasets 20NG, AGNews and R; for each dataset, the BERT tokenizer is used to segment the documents, and 10% of the training data is held out for validation. The gap between the present model and current state-of-the-art models is compared, and the results show that the model is highly competitive. The ablation experiments show that the model that introduces a semantic dictionary when constructing the text graph and uses a multi-head attention module for feature interaction improves considerably over the baseline models, which directly indicates that the proposed method significantly improves text classification performance.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims (10)

1. A news text classification method, characterized by comprising the following steps:
constructing a separate text graph for each document of a news text dataset according to word co-occurrence;
introducing a semantic dictionary to enrich the constructed text graph;
for each enriched text graph, extracting text structural features using a graph convolutional network, and extracting context semantic features of the text using a BERT model;
interacting the extracted text structural features and context semantic features using a multi-head attention mechanism, so that the features of two different granularities are combined;
and performing feature aggregation on the interacted features using different aggregation methods, so as to realize classification of news texts.
2. The news text classification method according to claim 1, wherein the specific process of constructing a separate text graph for each document based on word co-occurrence comprises representing the graph of a document as G=(V, E), where V denotes the set of nodes in the graph, comprising all words in the text, and E denotes the set of edges between the nodes.
3. The news text classification method according to claim 2, wherein the edge weight between two words is calculated by normalized pointwise mutual information; the more frequently two words co-occur within a set window, the greater the weight; the edge weight is positive when the semantic relevance between the words in the corpus exceeds a set value and negative otherwise; and an edge is created between a word pair only when the edge weight is positive.
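Edge construction as in claim 3 can be sketched as follows. The sliding-window counting, the window size, the pre-tokenized toy corpus and the helper name `npmi_edges` are all illustrative assumptions, not the claimed implementation:

```python
import numpy as np
from collections import Counter
from itertools import combinations

def npmi_edges(docs, window=3):
    """Normalized PMI over sliding windows; keep an edge only when positive."""
    word_n, pair_n, n_windows = Counter(), Counter(), 0
    for doc in docs:
        for i in range(max(1, len(doc) - window + 1)):
            win = set(doc[i:i + window])
            n_windows += 1
            word_n.update(win)
            pair_n.update(combinations(sorted(win), 2))
    edges = {}
    for (a, b), n_ab in pair_n.items():
        p_ab = n_ab / n_windows
        pmi = np.log(p_ab / ((word_n[a] / n_windows) * (word_n[b] / n_windows)))
        npmi = pmi / -np.log(p_ab)          # normalize PMI into [-1, 1]
        if npmi > 0:                        # positive weight -> create edge
            edges[(a, b)] = npmi
    return edges

docs = [["stock", "market", "rises"],
        ["stock", "market", "falls"],
        ["team", "wins", "match"]]
edges = npmi_edges(docs)
```

Words that always co-occur (such as "stock" and "market" above) get a weight near 1; pairs whose NPMI is non-positive get no edge, matching the claim.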
4. The news text classification method according to claim 1 or 3, wherein the specific process of introducing a semantic dictionary to enrich the constructed text graph comprises: the semantic dictionary organizes a lexical semantic network according to semantic relations, words are represented by sets of synonyms, each set designating one lexical concept, and semantic relations including hypernymy and hyponymy, meronymy and holonymy, synonymy and antonymy are expressed by links;
and if the edge weight is negative, expanding synonyms using the semantic dictionary and calculating the semantic similarity between the words in the two synonym sets.
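A toy sketch of this synonym-expansion step follows. The tiny dictionary and the Jaccard overlap stand in for a real semantic dictionary (such as WordNet) and a real lexical similarity measure; both are purely illustrative assumptions:

```python
def synset_similarity(w1, w2, synonyms):
    """Expand each word into its synonym set, then score the overlap.
    Jaccard overlap stands in here for a real semantic similarity measure."""
    s1 = synonyms.get(w1, set()) | {w1}
    s2 = synonyms.get(w2, set()) | {w2}
    return len(s1 & s2) / len(s1 | s2)

# hypothetical dictionary entries, for illustration only
syn = {"rise": {"climb", "increase"},
       "increase": {"rise", "climb"}}

sim = synset_similarity("rise", "increase", syn)
```

A word pair whose expanded synonym sets overlap heavily (here "rise" and "increase") scores high, so the negative NPMI edge can be re-weighted by lexical semantics instead of raw co-occurrence.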
5. The news text classification method according to claim 1, wherein the specific process of extracting text structural features using a graph convolutional network comprises performing a convolution operation on the text graph using the graph convolutional network, extracting feature representations of the nodes while considering the structural information between the nodes;
in the graph convolutional network, each layer's propagation updates every node with a weighted sum over the node's neighbours and the node itself;
and a nonlinear function is used as the activation function during graph convolutional network training;
or, the specific process of extracting context semantic features of the text using a BERT model comprises: the BERT model comprises an encoder section stacked from multiple encoder layers, each encoder comprising a self-attention layer and a feed-forward network layer.
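The per-layer propagation of claim 5 can be sketched in NumPy. The symmetric normalization and the ReLU activation are common concrete choices assumed for this sketch; the claim itself only requires a weighted sum over neighbours and the node itself followed by a nonlinear function:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: each node aggregates a normalized weighted
    sum over its neighbours and itself, then applies a nonlinearity."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # ReLU as the nonlinear function

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])   # 3-word path graph (toy text graph)
H = np.eye(3)                  # one-hot initial node features
W = np.ones((3, 2))            # toy layer weights
H_next = gcn_layer(A, H, W)
```

Stacking such layers lets information propagate across multiple hops of the text graph, which is how the structural features reach beyond immediate word neighbours.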
6. The news text classification method according to claim 1, wherein the specific process of interacting the extracted text structural features and context semantic features using a multi-head attention mechanism comprises computing a query matrix, a key matrix and a value matrix for the text structural features and for the context semantic features, and passing the keys and values of each as inputs to the other's multi-head attention module.
7. The news text classification method according to claim 1, wherein the specific process of performing feature aggregation on the interacted features using different aggregation methods comprises aggregating the extracted features using max-pooling, concatenation and element-wise addition.
8. A news text classification system, comprising:
a text graph construction module configured to construct a separate text graph for each document of a news text dataset based on word co-occurrence;
a text graph enrichment module configured to introduce a semantic dictionary to enrich the constructed text graph;
the feature extraction module is configured to extract text structural features by using a graph convolution network and extract context semantic features of the text by using a BERT model for each rich text graph;
the feature interaction module is configured to interact the extracted text structural features and context semantic features using a multi-head attention mechanism, so as to combine the features of two different granularities;
and the aggregation module is configured to aggregate the characteristics after interaction by utilizing different aggregation methods so as to realize classification of the news text.
9. A computer readable storage medium, characterized in that a plurality of instructions are stored, said instructions being adapted to be loaded by a processor of a terminal device and to perform the steps of the method of any of claims 1-7.
10. A terminal device, comprising a processor and a computer readable storage medium, the processor configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method of any of claims 1-7.
CN202310646587.2A 2023-05-31 2023-05-31 News text classification method, system, storage medium and equipment Active CN116775868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310646587.2A CN116775868B (en) 2023-05-31 2023-05-31 News text classification method, system, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310646587.2A CN116775868B (en) 2023-05-31 2023-05-31 News text classification method, system, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN116775868A true CN116775868A (en) 2023-09-19
CN116775868B CN116775868B (en) 2025-08-15

Family

ID=87990553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310646587.2A Active CN116775868B (en) 2023-05-31 2023-05-31 News text classification method, system, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN116775868B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080275694A1 (en) * 2007-05-04 2008-11-06 Expert System S.P.A. Method and system for automatically extracting relations between concepts included in text
US20150348569A1 (en) * 2014-05-28 2015-12-03 International Business Machines Corporation Semantic-free text analysis for identifying traits
CN110442760A (en) * 2019-07-24 2019-11-12 银江股份有限公司 A kind of the synonym method for digging and device of question and answer searching system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shen Yanguang: "Text Classification Method Based on Word Co-occurrence and Graph Convolution", Computer Engineering and Applications, vol. 57, no. 11, 8 July 2020 (2020-07-08) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119046405A (en) * 2024-08-01 2024-11-29 广东工业大学 Chinese short text classification method and system based on multi-scale features and associated features
CN119046405B (en) * 2024-08-01 2025-10-28 广东工业大学 Chinese short text classification method and system based on multi-scale features and associated features

Also Published As

Publication number Publication date
CN116775868B (en) 2025-08-15

Similar Documents

Publication Publication Date Title
Chen et al. End-to-end emotion-cause pair extraction with graph convolutional network
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
Mehmood et al. A precisely xtreme-multi channel hybrid approach for roman urdu sentiment analysis
CN114064918A (en) Multi-modal event knowledge graph construction method
CN112131366A (en) Method, device and storage medium for training text classification model and text classification
US20240111956A1 (en) Nested named entity recognition method based on part-of-speech awareness, device and storage medium therefor
CN113516198A (en) Cultural resource text classification method based on memory network and graph neural network
CN114265936A (en) Method for realizing text mining of science and technology project
CN109670039A (en) Semi-supervised E-commerce Review Sentiment Analysis Method Based on Tripartite Graph and Cluster Analysis
Jin et al. Multi-label sentiment analysis base on BERT with modified TF-IDF
WO2021139266A1 (en) Fine-tuning method and apparatus for external knowledge-fusing bert model, and computer device
Mutanga et al. Detecting hate speech on twitter network using ensemble machine learning
CN110765755A (en) Semantic similarity feature extraction method based on double selection gates
CN114298020A (en) Keyword vectorization method based on subject semantic information and application thereof
Ni et al. KPT++: Refined knowledgeable prompt tuning for few-shot text classification
CN115129819A (en) Method for producing text abstract model and its device, equipment and medium
CN117216189A (en) A long text matching method combining noise filtering and divide-and-conquer strategy
CN114818737B (en) Method, system and storage medium for extracting semantic features of scientific paper data text
Zhan et al. Survey on event extraction technology in information extraction research area
Tao et al. News text classification based on an improved convolutional neural network
CN115329085A (en) Social robot classification method and system
CN116775868B (en) News text classification method, system, storage medium and equipment
Mu et al. Synonym recognition from short texts: A self-supervised learning approach
Li et al. Improving Graph-Based text representations with character and word level N-grams
US20260030261A1 (en) Multi-Level Deep Learning Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant