Disclosure of Invention
To solve the above problems, the application provides a duplication prevention method and device for a knowledge graph, which address the problem that, in practical applications of knowledge-graph-based question answering, the accuracy of the answers is often low.
Based on this, the embodiments of the application disclose the following technical solutions:
In one aspect, an embodiment of the present application provides a duplication prevention method for a knowledge graph, the method including:
acquiring a knowledge graph corresponding to the knowledge to be stored;
adjusting the graph labels in the knowledge graph according to a first preset rule to obtain a target knowledge graph;
converting the target knowledge graph into a target numeric string according to a second preset rule;
judging, by a numeric query, whether a database table includes a numeric string identical to the target numeric string, wherein the database table includes a plurality of numeric strings obtained from the knowledge graphs corresponding to stored knowledge according to the first preset rule and the second preset rule;
and if the database table includes a numeric string identical to the target numeric string, refusing to write the target knowledge graph into the database.
Optionally, converting the target knowledge graph into a target numeric string according to the second preset rule includes:
calculating a target code corresponding to the target knowledge graph by hash coding, and taking the target code as the target numeric string.
Optionally, converting the target knowledge graph into a target numeric string according to the second preset rule includes:
acquiring the index value corresponding to each graph label in the knowledge graph;
and combining the index values according to the order of the graph labels in the target knowledge graph to obtain the target numeric string.
Optionally, the method further includes:
if the database table does not include a numeric string identical to the target numeric string, judging, by a string query, whether the target knowledge graph is stored in the database;
if the target knowledge graph is stored in the database, refusing to write the target knowledge graph into the database;
and if the target knowledge graph is not stored in the database, writing the target knowledge graph into the database.
Optionally, adjusting the graph labels in the knowledge graph according to the first preset rule to obtain the target knowledge graph includes:
adjusting the order of the graph labels in the knowledge graph according to the alphabetical order of the initial letters of the graph labels, to obtain the target knowledge graph.
In another aspect, an embodiment of the application provides a duplication prevention device for a knowledge graph, which includes an acquisition unit, a first conversion unit, a second conversion unit, a judgment unit and an execution unit;
the acquisition unit is configured to acquire a knowledge graph corresponding to the knowledge to be stored;
the first conversion unit is configured to adjust the graph labels in the knowledge graph according to a first preset rule to obtain a target knowledge graph;
the second conversion unit is configured to convert the target knowledge graph into a target numeric string according to a second preset rule;
the judgment unit is configured to judge, by a numeric query, whether a database table includes a numeric string identical to the target numeric string, wherein the database table includes a plurality of numeric strings obtained from the knowledge graphs corresponding to stored knowledge according to the first preset rule and the second preset rule;
and the execution unit is configured to refuse to write the target knowledge graph into the database if the database table includes a numeric string identical to the target numeric string.
Optionally, the second conversion unit is configured to:
calculate a target code corresponding to the target knowledge graph by hash coding, and take the target code as the target numeric string.
Optionally, the second conversion unit is configured to:
acquire the index value corresponding to each graph label in the knowledge graph;
and combine the index values according to the order of the graph labels in the target knowledge graph to obtain the target numeric string.
Optionally, the execution unit is further configured to:
if the database table does not include a numeric string identical to the target numeric string, judge, by a string query, whether the target knowledge graph is stored in the database;
if the target knowledge graph is stored in the database, refuse to write the target knowledge graph into the database;
and if the target knowledge graph is not stored in the database, write the target knowledge graph into the database.
Optionally, the first conversion unit is configured to:
adjust the order of the graph labels in the knowledge graph according to the alphabetical order of the initial letters of the graph labels, to obtain the target knowledge graph.
Compared with the prior art, the technical solution of the application has the following advantages:
The pre-established database includes a plurality of pieces of stored knowledge. The knowledge graph corresponding to each piece of stored knowledge is converted into a numeric string according to the first preset rule and the second preset rule, and the numeric string is stored in the database table corresponding to the database as a unique identifier of that knowledge graph. When new knowledge is to be stored, the knowledge graph corresponding to the knowledge to be stored is acquired, and its graph labels are adjusted according to the first preset rule to obtain a target knowledge graph; this prevents the same knowledge graph from later producing different target numeric strings merely because of factors such as a different order of graph labels. The target knowledge graph is then converted into a target numeric string according to the second preset rule, and a numeric query is used to judge whether the database table includes a numeric string identical to the target numeric string. If it does, the knowledge graph corresponding to the knowledge to be stored is already associated with other knowledge, and writing the target knowledge graph into the database is refused. This prevents the knowledge graphs of different pieces of knowledge in the database from being duplicated, avoids one knowledge graph matching a plurality of query sentences and a wrong answer being returned as a result, and thus improves the accuracy of the answers. In addition, for the database, a numeric query is faster than a string query, so the efficiency of the duplication check is improved.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The key points of knowledge-graph-based question answering are the construction of the knowledge graph and the organization of knowledge. The elements of knowledge stored in the database generally include a knowledge identifier, knowledge title, knowledge directory, knowledge state (valid or invalid), knowledge creator, updater, creation time, update time and the like. When a knowledge graph is used, graph labels must also be associated with the knowledge, and one piece of knowledge may need to be expressed by a knowledge graph formed of a plurality of graph labels.
Taking a scenario in the financial industry as an example, graph labels can be divided into the dimensions product, attribute, operation, channel and condition. For example, when the knowledge is "telephone bank modifies credit card transaction password", the corresponding knowledge graph includes four graph labels: credit card (product), transaction password (attribute), modification (operation) and telephone bank (channel). When the knowledge is "credit card overseas ATM withdrawal handling fee", the corresponding knowledge graph includes five graph labels: credit card (product), handling fee (attribute), withdrawal (operation), ATM (channel) and overseas (condition).
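The two examples above can be written down as a small data structure. The following is only an illustrative sketch: the class names GraphLabel and KnowledgeGraph and their fields are assumptions introduced here, not names used by the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GraphLabel:
    """One graph label together with the dimension it belongs to (hypothetical type)."""
    name: str       # e.g. "credit card"
    dimension: str  # e.g. "product"


@dataclass
class KnowledgeGraph:
    """The graph labels that together express one piece of knowledge (hypothetical type)."""
    title: str
    labels: List[GraphLabel]


# "telephone bank modifies credit card transaction password": four labels
phone_bank_password = KnowledgeGraph(
    title="telephone bank modifies credit card transaction password",
    labels=[
        GraphLabel("credit card", "product"),
        GraphLabel("transaction password", "attribute"),
        GraphLabel("modification", "operation"),
        GraphLabel("telephone bank", "channel"),
    ],
)

# "credit card overseas ATM withdrawal handling fee": five labels
overseas_atm_fee = KnowledgeGraph(
    title="credit card overseas ATM withdrawal handling fee",
    labels=[
        GraphLabel("credit card", "product"),
        GraphLabel("handling fee", "attribute"),
        GraphLabel("withdrawal", "operation"),
        GraphLabel("ATM", "channel"),
        GraphLabel("overseas", "condition"),
    ],
)

print(len(phone_bank_password.labels), len(overseas_atm_fee.labels))
```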
According to the above process of organizing knowledge into knowledge graphs, the knowledge graphs corresponding to different pieces of knowledge should be different. In actual practice, however, because the people organizing the knowledge differ and understand the knowledge to different degrees, the knowledge graphs of different pieces of knowledge may turn out to be identical. In actual question answering, one knowledge graph may then match a plurality of query sentences and therefore a plurality of answers, and if the returned answer is not the one the user needs, the accuracy of question answering is low.
Therefore, to ensure that the knowledge graph of new knowledge is not already associated with other knowledge, a duplication check must be performed on the new knowledge, so that duplicated knowledge graphs do not affect the accuracy of question answering.
In the related art, a string query is used to search the database for a knowledge graph identical to the one to be stored. Specifically, the corresponding query sentence is looked up from the record of the knowledge in the database table, the graph labels corresponding to that query sentence are then found in the database by a string query, and the graph labels of the knowledge graph corresponding to the knowledge to be stored are compared with the graph labels corresponding to the query sentence. Because the database searches strings inefficiently, performance is poor, and the duplication check becomes very slow, especially when there is a large amount of knowledge.
Based on this, the embodiments of the application provide a duplication prevention method and device for a knowledge graph. The pre-established database includes a plurality of pieces of stored knowledge. The knowledge graph corresponding to each piece of stored knowledge is converted into a numeric string according to a first preset rule and a second preset rule, and the numeric string is stored in the database table corresponding to the database as a unique identification attribute of that knowledge graph. When new knowledge is to be stored, the knowledge graph corresponding to the knowledge to be stored is acquired, and its graph labels are adjusted according to the first preset rule to obtain a target knowledge graph; this prevents the same knowledge graph from later producing different target numeric strings merely because of factors such as a different order of graph labels. The target knowledge graph is then converted into a target numeric string according to the second preset rule, and a numeric query is used to judge whether the database table includes a numeric string identical to the target numeric string. If it does, the knowledge graph corresponding to the knowledge to be stored is already associated with other knowledge, and writing the target knowledge graph into the database is refused. This prevents the knowledge graphs of different pieces of knowledge in the database from being duplicated, avoids one knowledge graph matching a plurality of query sentences and a wrong answer being returned as a result, and thus improves the accuracy of the answers. In addition, for the database, a numeric query is faster than a string query, so the efficiency of the duplication check is improved.
A duplication prevention method for a knowledge graph according to an embodiment of the present application is described below with reference to fig. 1. Fig. 1 is a flowchart of the method, which may include the following steps S101-S105.
S101: acquiring a knowledge graph corresponding to the knowledge to be stored.
In practical applications, if a user wants to store a piece of knowledge in the database, the knowledge graph corresponding to the knowledge to be stored may be input into a terminal device. The terminal device may perform the subsequent S102-S105 itself, or it may send a storage request carrying the knowledge graph corresponding to the knowledge to be stored to a server, and the server performs the subsequent S102-S105.
The terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like, but is not limited thereto; the server may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers.
S102: adjusting the graph labels in the knowledge graph according to a first preset rule to obtain the target knowledge graph.
A knowledge graph may correspond to a plurality of graph labels. To prevent the subsequently generated target numeric strings from differing merely because graph labels with the same substantive content appear in a different order or form, the graph labels in the knowledge graph are adjusted according to the first preset rule. Three methods are described below as examples.
Method one: adjusting the order of the graph labels in the knowledge graph according to the alphabetical order of the initial letters of the graph labels, to obtain the target knowledge graph.
For example, the graph labels in the knowledge graph are credit card, handling fee, withdrawal, ATM and overseas, whose initial letters (taken from the labels in the original language) are X, S, Q, A and J respectively. After adjustment by method one, the graph label order of the target knowledge graph is ATM, overseas, withdrawal, handling fee, credit card.
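Ordering by initial letter can be sketched as a plain sort with a lookup key. This is only a sketch under assumptions: the table LABEL_INITIALS and the function sort_by_label_initial are hypothetical names, and the initials are simply those listed in the example above rather than computed from the label text.

```python
# Hypothetical lookup table: graph label -> initial letter used for ordering.
# The letters are the ones given in the example, not derived from the English text.
LABEL_INITIALS = {
    "credit card": "X",
    "handling fee": "S",
    "withdrawal": "Q",
    "ATM": "A",
    "overseas": "J",
}


def sort_by_label_initial(labels):
    """Return the graph labels in alphabetical order of their initial letters."""
    return sorted(labels, key=lambda label: LABEL_INITIALS[label])


labels = ["credit card", "handling fee", "withdrawal", "ATM", "overseas"]
target_labels = sort_by_label_initial(labels)
print(target_labels)
# ['ATM', 'overseas', 'withdrawal', 'handling fee', 'credit card']
```

Because the result depends only on the set of labels, any input order of the same five labels yields the same target order.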
Method two: adjusting the order of the graph labels in the knowledge graph according to the alphabetical order of the initial letters of the dimensions to which the graph labels belong, to obtain the target knowledge graph.
For example, the graph labels in the knowledge graph are credit card (product), transaction password (attribute), modification (operation) and telephone bank (channel), and the initial letters of the corresponding dimensions are C, S, C and Q. After adjustment by method two, the graph label order of the target knowledge graph is credit card, modification, telephone bank, transaction password. If two initial letters are the same, the labels may be ordered by the initial letter of the second word, and so on; the application does not specifically limit this.
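A minimal sketch of ordering by dimension initial with a tie-break follows. The first letter of each key (C, S, C, Q) comes from the example; the second letter of each key is an assumption made here purely to illustrate the tie-break between the two dimensions that share the initial C, and DIMENSION_KEYS and sort_by_dimension_initial are hypothetical names.

```python
# Hypothetical sort keys per dimension: first initial, then the initial of the
# second word as a tie-breaker. The second letters are illustrative assumptions.
DIMENSION_KEYS = {
    "product": "CP",
    "attribute": "SX",
    "operation": "CZ",
    "channel": "QD",
}


def sort_by_dimension_initial(labels):
    """Order (label, dimension) pairs by the sort key of their dimension."""
    ordered = sorted(labels, key=lambda pair: DIMENSION_KEYS[pair[1]])
    return [name for name, _dimension in ordered]


labels = [
    ("credit card", "product"),
    ("transaction password", "attribute"),
    ("modification", "operation"),
    ("telephone bank", "channel"),
]
print(sort_by_dimension_initial(labels))
# ['credit card', 'modification', 'telephone bank', 'transaction password']
```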
Method three: adjusting the order of the graph labels in the knowledge graph according to a preset order of the dimensions to which the graph labels belong, to obtain the target knowledge graph.
For example, the graph labels in the knowledge graph are credit card (product), transaction password (attribute), modification (operation) and telephone bank (channel), and the preset dimension order is product, attribute, operation, channel. After adjustment by method three, the graph label order of the target knowledge graph is credit card, transaction password, modification, telephone bank.
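Ordering by a preset dimension order amounts to sorting on each dimension's position in that order. The sketch below assumes labels are given as (label, dimension) pairs; DIMENSION_ORDER and sort_by_dimension_order are hypothetical names.

```python
# Preset dimension order taken from the example above.
DIMENSION_ORDER = ["product", "attribute", "operation", "channel"]
DIMENSION_RANK = {dimension: rank for rank, dimension in enumerate(DIMENSION_ORDER)}


def sort_by_dimension_order(labels):
    """Order (label, dimension) pairs by the preset position of their dimension."""
    ordered = sorted(labels, key=lambda pair: DIMENSION_RANK[pair[1]])
    return [name for name, _dimension in ordered]


# The same four labels, deliberately supplied in a different order.
shuffled = [
    ("telephone bank", "channel"),
    ("modification", "operation"),
    ("credit card", "product"),
    ("transaction password", "attribute"),
]
print(sort_by_dimension_order(shuffled))
# ['credit card', 'transaction password', 'modification', 'telephone bank']
```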
S103: converting the target knowledge graph into a target numeric string according to a second preset rule.
Because the database searches strings inefficiently, performance is poor, especially when a large amount of knowledge is stored in the database. The embodiments of the application therefore propose searching by a numeric query instead of a string query. To this end, the target knowledge graph needs to be converted from a string type to an integer type (an int value).
The application does not specifically limit the content of the second preset rule; two methods are described below as examples.
Method one: calculating a target code corresponding to the target knowledge graph by hash coding, and taking the target code as the target numeric string.
Hash coding is simple to compute and the collision rate of hash values is low, so it is suitable for uniqueness judgment. The target numeric string obtained as a hash code (hashcode) can serve as the unique identifier of the target knowledge graph, and converts the target knowledge graph from a string type to an integer type.
Because hashing is simple and its collision rate is low, using the hash value as the criterion for the uniqueness of the graph labels effectively solves the problem that checking graph labels for duplicates is inefficient during knowledge maintenance.
Hashing means converting an input of arbitrary length into an output of fixed length by a hash algorithm; the output is the hash value. Different keywords may yield the same hash value after being transformed by the hash algorithm, which is called a collision. Conversely, if two hash values are different (assuming the same hash algorithm), the original inputs corresponding to them must be different.
The hashcode is used to determine the storage address of an object in a hash storage structure.
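The application does not name a specific hash algorithm. As a minimal sketch, the code below assumes a 31-based polynomial string hash reduced to a signed 32-bit integer (the style commonly associated with string hash codes), and assumes the ordered graph labels are joined with a "|" separator before hashing. The function names and the separator are hypothetical.

```python
def polynomial_hash(text: str) -> int:
    """31-based polynomial hash of a string, reduced to a signed 32-bit integer."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the unsigned value as signed so it fits a 32-bit int column.
    return h - 0x100000000 if h >= 0x80000000 else h


def graph_to_hash(ordered_labels, separator="|"):
    """Hash the ordered graph labels of a target knowledge graph (assumed encoding)."""
    return polynomial_hash(separator.join(ordered_labels))


ordered = ["credit card", "transaction password", "modification", "telephone bank"]
code = graph_to_hash(ordered)
print(isinstance(code, int), code == graph_to_hash(list(ordered)))
# True True
```

Python's built-in hash() is deliberately not used here, because its result for strings can change between interpreter runs and so cannot be stored as a stable identifier.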
Method two: acquiring the index value corresponding to each graph label in the knowledge graph, and combining the index values according to the order of the graph labels in the target knowledge graph to obtain the target numeric string.
The correspondence between graph labels and index values can be preset. The index value corresponding to each graph label in the knowledge graph is acquired, and after the target knowledge graph has been obtained by adjustment, the index values are combined in the order of the graph labels in the target knowledge graph to obtain the target numeric string.
For example, the graph labels in the knowledge graph are credit card, transaction password, modification and telephone bank, with index values 11, 21, 31 and 41 respectively. If the graph label order of the target knowledge graph obtained by method three in S102 is credit card, transaction password, modification, telephone bank, the resulting target numeric string is 11213141.
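The worked example can be reproduced in a few lines. This is a sketch only: LABEL_INDEX and graph_to_index_string are hypothetical names, and the index values are the ones from the example.

```python
# Preset correspondence between graph labels and index values (from the example).
LABEL_INDEX = {
    "credit card": 11,
    "transaction password": 21,
    "modification": 31,
    "telephone bank": 41,
}


def graph_to_index_string(ordered_labels):
    """Concatenate the index values in label order and return them as an integer."""
    digits = "".join(str(LABEL_INDEX[label]) for label in ordered_labels)
    return int(digits)


ordered = ["credit card", "transaction password", "modification", "telephone bank"]
print(graph_to_index_string(ordered))
# 11213141
```

Note that if index values had different numbers of digits, two different label sequences could concatenate to the same digits (for example 1 then 12, and 11 then 2); fixed-width index values, as in the example, avoid that ambiguity.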
S104: judging, by a numeric query, whether the database table includes a numeric string identical to the target numeric string.
The pre-established database includes a plurality of pieces of stored knowledge. The knowledge graph corresponding to each piece of stored knowledge is first adjusted according to the first preset rule and then converted into a numeric string according to the second preset rule, and the numeric string is stored in the database table corresponding to the database as the unique identifier of that knowledge graph. It should be noted that the numeric strings in the database table have all passed the duplication prevention method provided by the present application, and the numeric strings correspond to the knowledge graphs one to one.
In addition, for the database, a numeric query is faster than a string query, so the efficiency of the duplication check is improved.
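A numeric query of this kind might look as follows. The sketch uses Python's standard sqlite3 module with an in-memory database; the table name knowledge, the column graph_code and the index are assumptions made for illustration, since the application does not specify a schema or a database product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per piece of knowledge, with the numeric string
# stored in an integer column.
conn.execute(
    "CREATE TABLE knowledge ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " graph_code INTEGER NOT NULL)"
)
# An index on the integer column lets the lookup avoid a full table scan.
conn.execute("CREATE INDEX idx_graph_code ON knowledge (graph_code)")
conn.execute(
    "INSERT INTO knowledge (title, graph_code) VALUES (?, ?)",
    ("telephone bank modifies credit card transaction password", 11213141),
)


def code_exists(connection, target_code: int) -> bool:
    """Return True if the table already holds a numeric string equal to target_code."""
    row = connection.execute(
        "SELECT 1 FROM knowledge WHERE graph_code = ? LIMIT 1", (target_code,)
    ).fetchone()
    return row is not None


print(code_exists(conn, 11213141), code_exists(conn, 11213142))
# True False
```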
As a possible implementation, if the numeric strings obtained by method two in S103 differ in length, the length of the target numeric string may be obtained first, the numeric strings of the same length may then be retrieved from the database table, and the numeric query may then be used to judge whether they include a numeric string identical to the target numeric string, which speeds up the judgment.
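The length-based narrowing can be sketched with an in-memory structure that groups stored numeric strings by their number of digits. The class LengthBucketedCodes is a hypothetical illustration, assumes non-negative numeric strings, and is not a description of how the database itself would implement the filter.

```python
from collections import defaultdict


class LengthBucketedCodes:
    """Stored numeric strings grouped by digit count (hypothetical helper)."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, code: int) -> None:
        self._buckets[len(str(code))].add(code)

    def contains(self, target_code: int) -> bool:
        # Step 1: pick only the stored numeric strings with the same length.
        bucket = self._buckets.get(len(str(target_code)))
        # Step 2: compare numerically within that smaller candidate set.
        return bucket is not None and target_code in bucket


codes = LengthBucketedCodes()
codes.add(11213141)    # four labels, eight digits
codes.add(1121314151)  # five labels, ten digits
print(codes.contains(11213141), codes.contains(112131))
# True False
```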
S105: if the database table includes a numeric string identical to the target numeric string, refusing to write the target knowledge graph into the database.
A unique identification field representing the knowledge graph is added to the design of the database table of knowledge. When knowledge is added or modified, this unique identification field is used to judge whether the knowledge graph already appears in the database table, which solves the problem of duplicated graph labels and effectively improves the efficiency of knowledge maintenance.
As a possible implementation, if the target numeric string is obtained by method one in S103 and the database table includes a numeric string identical to the target numeric string, the knowledge graph corresponding to the knowledge to be stored is already associated with other knowledge, and writing the target knowledge graph into the database is refused.
If the hash codes are the same, whether the specific label contents are also the same needs to be further judged, so as to rule out a hash collision. It should be noted that, because the probability of a collision is extremely low, the method still greatly improves the efficiency of the duplication check. If the database table does not include a numeric string identical to the target numeric string, whether the target knowledge graph is stored in the database is judged by a string query.
If the target knowledge graph is stored in the database, the knowledge graph corresponding to the knowledge to be stored is already associated with other knowledge, and writing the target knowledge graph into the database is refused; if the target knowledge graph is not stored in the database, the target knowledge graph is written into the database.
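Steps S102 to S105, together with the optional string-query fallback, can be put together as follows. This is a toy sketch under stated assumptions: sets stand in for the database and its database table, method three of S102 and method two of S103 are chosen arbitrarily, and all names (KnowledgeStore, try_store and so on) are hypothetical.

```python
DIMENSION_ORDER = ["product", "attribute", "operation", "channel"]
LABEL_INDEX = {
    "credit card": 11,
    "transaction password": 21,
    "modification": 31,
    "telephone bank": 41,
}


def canonicalize(labels):
    """S102 (method three): order (label, dimension) pairs by preset dimension order."""
    rank = {dimension: i for i, dimension in enumerate(DIMENSION_ORDER)}
    return [name for name, _ in sorted(labels, key=lambda pair: rank[pair[1]])]


def to_numeric_string(ordered_labels):
    """S103 (method two): concatenate the index values of the ordered labels."""
    return int("".join(str(LABEL_INDEX[name]) for name in ordered_labels))


class KnowledgeStore:
    """In-memory stand-in for the database and its table of numeric strings."""

    def __init__(self):
        self.code_table = set()  # numeric strings of stored knowledge graphs
        self.graph_rows = set()  # stored knowledge graphs in string form

    def try_store(self, labels) -> bool:
        ordered = canonicalize(labels)
        code = to_numeric_string(ordered)
        if code in self.code_table:    # S104/S105: numeric query hit, refuse
            return False
        key = "|".join(ordered)
        if key in self.graph_rows:     # fallback: string query hit, refuse
            return False
        self.code_table.add(code)      # not stored yet: write
        self.graph_rows.add(key)
        return True


store = KnowledgeStore()
graph = [
    ("credit card", "product"),
    ("transaction password", "attribute"),
    ("modification", "operation"),
    ("telephone bank", "channel"),
]
print(store.try_store(graph))                  # True: first write succeeds
print(store.try_store(list(reversed(graph))))  # False: same graph, other order
```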
According to the above technical solution, the pre-established database includes a plurality of pieces of stored knowledge. The knowledge graph corresponding to each piece of stored knowledge is converted into a numeric string according to the first preset rule and the second preset rule, and the numeric string is stored in the database table corresponding to the database as a unique identifier of that knowledge graph. When new knowledge is to be stored, the knowledge graph corresponding to the knowledge to be stored is acquired, and its graph labels are adjusted according to the first preset rule to obtain a target knowledge graph; this prevents the same knowledge graph from later producing different target numeric strings merely because of factors such as a different order of graph labels. The target knowledge graph is then converted into a target numeric string according to the second preset rule, and a numeric query is used to judge whether the database table includes a numeric string identical to the target numeric string. If it does, the knowledge graph corresponding to the knowledge to be stored is already associated with other knowledge, and writing the target knowledge graph into the database is refused. This prevents the knowledge graphs of different pieces of knowledge in the database from being duplicated, avoids one knowledge graph matching a plurality of query sentences and a wrong answer being returned as a result, and thus improves the accuracy of the answers. In addition, for the database, a numeric query is faster than a string query, so the efficiency of the duplication check is improved.
In addition to the duplication prevention method for a knowledge graph, an embodiment of the application also provides a duplication prevention device for a knowledge graph which, as shown in fig. 2, includes an acquisition unit 201, a first conversion unit 202, a second conversion unit 203, a judgment unit 204 and an execution unit 205;
the acquisition unit 201 is configured to acquire a knowledge graph corresponding to the knowledge to be stored;
the first conversion unit 202 is configured to adjust the graph labels in the knowledge graph according to a first preset rule to obtain a target knowledge graph;
the second conversion unit 203 is configured to convert the target knowledge graph into a target numeric string according to a second preset rule;
the judgment unit 204 is configured to judge, by a numeric query, whether a database table includes a numeric string identical to the target numeric string, wherein the database table includes a plurality of numeric strings obtained from the knowledge graphs corresponding to stored knowledge according to the first preset rule and the second preset rule;
the execution unit 205 is configured to refuse to write the target knowledge graph into the database if the database table includes a numeric string identical to the target numeric string.
Optionally, the second conversion unit 203 is configured to:
calculate a target code corresponding to the target knowledge graph by hash coding, and take the target code as the target numeric string.
Optionally, the second conversion unit 203 is configured to:
acquire the index value corresponding to each graph label in the knowledge graph;
and combine the index values according to the order of the graph labels in the target knowledge graph to obtain the target numeric string.
Optionally, the execution unit 205 is further configured to:
if the database table does not include a numeric string identical to the target numeric string, judge, by a string query, whether the target knowledge graph is stored in the database;
if the target knowledge graph is stored in the database, refuse to write the target knowledge graph into the database;
and if the target knowledge graph is not stored in the database, write the target knowledge graph into the database.
Optionally, the first conversion unit 202 is configured to:
adjust the order of the graph labels in the knowledge graph according to the alphabetical order of the initial letters of the graph labels, to obtain the target knowledge graph.
In the duplication prevention device for a knowledge graph provided by the embodiment of the application, the pre-established database includes a plurality of pieces of stored knowledge. The knowledge graph corresponding to each piece of stored knowledge is converted into a numeric string according to the first preset rule and the second preset rule, and the numeric string is stored in the database table corresponding to the database as a unique identifier of that knowledge graph. When new knowledge is to be stored, the knowledge graph corresponding to the knowledge to be stored is acquired, and its graph labels are adjusted according to the first preset rule to obtain a target knowledge graph; this prevents the same knowledge graph from later producing different target numeric strings merely because of factors such as a different order of graph labels. The target knowledge graph is then converted into a target numeric string according to the second preset rule, and a numeric query is used to judge whether the database table includes a numeric string identical to the target numeric string. If it does, the knowledge graph corresponding to the knowledge to be stored is already associated with other knowledge, and writing the target knowledge graph into the database is refused. This prevents the knowledge graphs of different pieces of knowledge in the database from being duplicated, avoids one knowledge graph matching a plurality of query sentences and a wrong answer being returned as a result, and thus improves the accuracy of the answers. In addition, for the database, a numeric query is faster than a string query, so the efficiency of the duplication check is improved.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, and the units and modules described as separate components may or may not be physically separate. In addition, some or all of the units and modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing is directed to embodiments of the present application and it is noted that numerous modifications and adaptations may be made by those skilled in the art without departing from the principles of the present application and are intended to be within the scope of the present application.