CN120123491A - A semantic retrieval method, device, equipment and storage medium based on reordering - Google Patents

Publication number
CN120123491A
CN120123491A CN202510194607.6A CN202510194607A CN120123491A CN 120123491 A CN120123491 A CN 120123491A CN 202510194607 A CN202510194607 A CN 202510194607A CN 120123491 A CN120123491 A CN 120123491A
Authority
CN
China
Prior art keywords
data
semantic
structured data
vector
query text
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510194607.6A
Other languages
Chinese (zh)
Inventor
蒋涉权
张志彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Application filed by Bigo Technology Pte Ltd
Priority to CN202510194607.6A
Publication of CN120123491A

Classifications

    • G06F16/3344 Query execution using natural language analysis
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/338 Presentation of query results
    • G06F40/216 Parsing using statistical methods
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/30 Semantic analysis
    • G06N20/00 Machine learning
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a reordering-based semantic retrieval method, device, equipment and storage medium. According to the technical solution provided by the embodiments, first matching scores between the dense vectors of a plurality of structured data in a vector library and a query text, and second matching scores between the sparse vectors of the plurality of structured data and the query text, are determined; the first and second matching scores are fused to obtain a fusion score for each structured data; a first number of candidate data are screened out from the structured data according to the fusion scores; semantic processing is performed on the candidate data to obtain semantic data corresponding to each candidate data; the first number of candidate data are reordered according to the query text and the semantic data; and a second number of retrieval results are screened out from the first number of candidate data according to the reordering result. This balances retrieval efficiency against accuracy and improves the semantic retrieval effect.

Description

Semantic retrieval method, device, equipment and storage medium based on reordering
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a semantic retrieval method, device and equipment based on reordering and a storage medium.
Background
With the development of artificial intelligence and big data technology, semantic retrieval has become an important direction in the field of information retrieval; examples include semantic retrieval schemes based on sparse vectors and semantic retrieval schemes based on dense vectors. A sparse-vector scheme achieves good retrieval results on short-text or explicit keyword queries. A dense-vector scheme maps text to a high-dimensional vector space, which captures the semantic information of the text better and achieves good retrieval results on long-text queries.
However, the sparse-vector scheme has poor recall quality when it must capture semantic information or handle long text or ambiguity, while the dense-vector scheme has a poor ability to understand fine-grained semantics; in both cases the semantic retrieval effect suffers.
Disclosure of Invention
The embodiment of the application provides a semantic retrieval method, a semantic retrieval device, semantic retrieval equipment and a storage medium based on reordering, which are used for solving the technical problems of poor recall quality and semantic fine granularity understanding capability of semantic retrieval and poor semantic retrieval effect in the related technology, and can effectively improve the recall quality and semantic fine granularity understanding capability of semantic retrieval and improve the semantic retrieval effect.
In a first aspect, an embodiment of the present application provides a semantic retrieval method based on reordering, including:
acquiring a query text, and determining first matching scores between dense vectors of a plurality of structured data in a vector library and the query text, and second matching scores between sparse vectors of the plurality of structured data and the query text;
determining fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores, and screening a first number of candidate data from the plurality of structured data according to the fusion scores;
performing semantic processing on the candidate data to obtain semantic data corresponding to each candidate data;
and reordering the first number of candidate data according to the query text and the semantic data, and screening a second number of retrieval results from the first number of candidate data according to the reordering result.
In a second aspect, an embodiment of the present application provides a semantic retrieval apparatus based on reordering, including a vector matching module, a candidate matching module, a semantic processing module, and a reordering module, wherein:
the vector matching module is configured to acquire a query text, and determine first matching scores between dense vectors of a plurality of structured data in a vector library and the query text, and second matching scores between sparse vectors of the plurality of structured data and the query text;
the candidate matching module is configured to determine fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores, and screen a first number of candidate data from the plurality of structured data according to the fusion scores;
the semantic processing module is configured to perform semantic processing on the candidate data to obtain semantic data corresponding to each candidate data;
the reordering module is configured to reorder the first number of candidate data according to the query text and the semantic data, and screen a second number of retrieval results from the first number of candidate data according to the reordering result.
In a third aspect, embodiments of the present application provide reordering-based semantic retrieval equipment comprising a memory and one or more processors;
the memory is used for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the reordering-based semantic retrieval method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a non-volatile storage medium storing computer-executable instructions which, when executed by a computer processor, are used to perform the reordering-based semantic retrieval method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium, from which at least one processor of a device reads and executes the computer program, causing the device to perform the reordering-based semantic retrieval method according to the first aspect.
According to the embodiments of the application, first matching scores between the dense vectors of a plurality of structured data in a vector library and the query text, and second matching scores between the sparse vectors of the plurality of structured data and the query text, are determined. The first and second matching scores are fused to obtain fusion scores corresponding to the plurality of structured data, and a first number of candidate data are screened out from the plurality of structured data according to the fusion scores. Semantic processing is performed on the candidate data to obtain semantic data corresponding to each candidate data, the first number of candidate data are reordered according to the query text and the semantic data, and a second number of retrieval results are screened out from the first number of candidate data according to the reordering result. By combining hybrid search with reordering, and by introducing a fusion scoring mechanism for the sparse and dense vectors in the hybrid search, the recall quality and the fine-grained semantic understanding capability of semantic retrieval can be effectively improved, retrieval efficiency is balanced against retrieval accuracy, and the semantic retrieval effect is improved.
Drawings
FIG. 1 is a flow chart of a semantic retrieval method based on reordering provided by an embodiment of the present application;
FIG. 2 is a flow chart of another reorder-based semantic retrieval method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a semantic retrieval apparatus based on reordering according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of reordering-based semantic retrieval equipment according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the following detailed description of specific embodiments of the present application is given with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the matters related to the present application are shown in the accompanying drawings. Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or at the same time. Furthermore, the order of the operations may be rearranged. The above-described process may be terminated when its operations are completed, but may have additional steps not included in the drawings. The processes described above may correspond to methods, functions, procedures, subroutines, and the like.
The semantic retrieval method based on reordering provided by the application aims at combining a mixed search and a reordering mode, introducing a fusion scoring mechanism to sparse vectors and dense vectors in the mixed search, improving recall quality of semantic retrieval and capability of understanding semantic granularity, realizing balance between retrieval efficiency and precision, and improving semantic retrieval effect.
Conventional retrieval methods generally rely on keyword matching, for example search algorithms based on sparse vectors. These perform well on short-text or explicit keyword queries but have significant drawbacks in capturing semantic information and in handling long text or ambiguity. With the rapid development of large models, dense vector retrieval based on deep neural networks (such as BERT, RoBERTa, etc.) and similarity calculation has emerged; it captures the semantic information of text better by mapping the text to a high-dimensional vector space. However, dense vector retrieval is inefficient on large-scale indexes, and dense-vector retrieval schemes are limited in their ability to understand fine-grained semantics. Hybrid search combines the advantages of sparse and dense vectors, taking keyword matching into account while also capturing the semantic information of the text; in practical applications, however, the recall results of hybrid search are of poor quality when fine-grained requirements must be met. In addition, retrieval schemes based on a ranking model, such as a large pre-trained language model based on the Transformer architecture and in particular a model with an interactive similarity scoring structure (Cross-Encoder), can improve ranking precision and take fine-grained semantics into account, but their computational cost is very high. Recalling accurate data in large-data-volume scenarios is too slow, so such schemes are hard to apply to efficient retrieval over massive data and can only be used for small-scale data retrieval. On this basis, the reordering-based semantic retrieval method is provided to solve the technical problems of poor recall quality, poor fine-grained semantic understanding capability and poor retrieval effect in existing semantic retrieval.
FIG. 1 shows a flowchart of a reordering-based semantic retrieval method provided by an embodiment of the present application. The method may be performed by a reordering-based semantic retrieval device, which may be implemented in hardware and/or software and integrated in reordering-based semantic retrieval equipment.
The following description takes the reordering-based semantic retrieval equipment performing the method as an example. Referring to FIG. 1, the reordering-based semantic retrieval method includes:
S110, acquiring a query text, and determining first matching scores between dense vectors of a plurality of structured data in a vector library and the query text, and second matching scores between sparse vectors of the plurality of structured data and the query text.
The vector library provided by the application stores the dense vectors and sparse vectors corresponding to a plurality of structured data (such as tables). The dense vectors and sparse vectors may be obtained by applying embedding processing to the structured data, and each structured data has a corresponding dense vector and sparse vector.
Illustratively, a query text for querying data is obtained. The query text may be entered by a user and may be written in natural language or in a standard language used to manage and operate relational databases (e.g., SQL); an example of natural-language query text is "wish to obtain details or statistics of the XX table".
In one embodiment, after the query text is obtained, a first matching score for dense vectors of the plurality of structured data in the vector library corresponding to the query text and a second matching score for sparse vectors of the plurality of structured data corresponding to the query text may be determined. Wherein the first matching score may be used to reflect the degree of matching of the dense vector to the query text and the second matching score may be used to reflect the degree of matching of the sparse vector to the query text.
S120, determining fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores, and screening a first number of candidate data from the plurality of structured data according to the fusion scores.
For example, for the first matching score and the second matching score corresponding to each structured data, fusion processing may be performed on the first matching score and the second matching score to obtain fusion scores corresponding to each structured data, where the fusion scores may be used to reflect the matching degree of the structured data and the query text.
Alternatively, the fusion processing of the first matching score and the second matching score may be multiplication, addition or weighted summation processing of the first matching score and the second matching score, or may be first performing amplification and/or reduction processing on the first matching score and the second matching score, and then multiplying, adding or weighted summation processing of the first matching score and the second matching score after the amplification and/or reduction processing, or may be fusion processing of the first matching score and the second matching score by a preset fusion function (for example, a nonlinear function).
In one embodiment, after the fusion score corresponding to each structured data is determined, a first number of candidate data may be selected from the plurality of structured data according to the fusion scores; for example, the structured data are sorted by fusion score from largest to smallest and the first number of top-ranked structured data are used as the candidate data.
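As a minimal sketch of steps S110 to S120, assuming weighted summation is chosen among the fusion options listed above, the fusion and top-ranked selection could look as follows. The function names, the weights and the sample scores are hypothetical and do not come from the application.

```python
from typing import Dict, List, Tuple


def fuse_scores(dense: Dict[str, float], sparse: Dict[str, float],
                w_dense: float = 0.5, w_sparse: float = 0.5) -> Dict[str, float]:
    """Weighted sum of the first (dense) and second (sparse) matching scores."""
    ids = set(dense) | set(sparse)
    # A structured data item missing from one score table contributes 0 for it.
    return {i: w_dense * dense.get(i, 0.0) + w_sparse * sparse.get(i, 0.0)
            for i in ids}


def top_candidates(fused: Dict[str, float],
                   first_number: int) -> List[Tuple[str, float]]:
    """Keep the `first_number` items with the largest fusion scores."""
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:first_number]


# Hypothetical matching scores for three tables.
dense_scores = {"table_a": 0.9, "table_b": 0.4, "table_c": 0.7}
sparse_scores = {"table_a": 0.2, "table_b": 0.8, "table_c": 0.6}

fused = fuse_scores(dense_scores, sparse_scores)
candidates = top_candidates(fused, first_number=2)
candidate_ids = [cid for cid, _ in candidates]
```

With equal weights, `table_c` and `table_b` outrank `table_a`, because a high dense score alone does not compensate for a weak sparse score.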
S130, performing semantic processing on the candidate data to obtain semantic data corresponding to each candidate data.
The semantical processing is performed on the determined first number of candidate data to obtain semantical data corresponding to each candidate data. For example, according to the characteristics of the candidate data, descriptive text corresponding to the candidate data can be generated, and context semantics can be supplemented in the descriptive text to obtain semantic data, so that the integrity and fluency of the structured data expression can be effectively enhanced.
It should be explained that structured data lacks sufficiently explicit semantic relationships and has insufficient semantic relevance, whereas query text in natural language is typically unstructured. The application therefore converts the structured data into descriptive text, so that the model can understand the semantic relationships in the structured data and the semantic gap between the structured data and natural-language text is bridged. Optionally, the conversion to descriptive text may be performed by a large language model (LLM), or by preconfiguring descriptions of the different fields and of the relationships between fields according to empirical rules. For example, the field "UID" in the structured data may be converted to "records the unique UID that identifies the user, stored as a long integer type"; the field "Chinese name" may be converted to "records the Chinese description of the table, strongly correlated with the table's data content"; and a dependency between field A and field B in the structured data may be converted to "field A depends on field B".
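The rule-based variant of this conversion can be sketched as a lookup of preconfigured field descriptions followed by sentence assembly. This is an illustrative sketch only: the `FIELD_DESCRIPTIONS` table, the `describe_table` helper and the sample table name are hypothetical, and an LLM-based conversion would replace the lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical, hand-written field descriptions (the "empirical rules" option).
FIELD_DESCRIPTIONS: Dict[str, str] = {
    "UID": "records the unique UID that identifies the user, "
           "stored as a long integer type",
    "Chinese name": "records the Chinese description of the table, "
                    "strongly correlated with the table's data content",
}


def describe_table(table_name: str, fields: List[str],
                   dependencies: List[Tuple[str, str]]) -> str:
    """Turn a table's fields and field dependencies into descriptive text."""
    sentences = [f"Table {table_name} has {len(fields)} fields."]
    for field in fields:
        desc = FIELD_DESCRIPTIONS.get(field, "has no configured description")
        sentences.append(f"Field {field} {desc}.")
    for a, b in dependencies:
        sentences.append(f"Field {a} depends on field {b}.")
    return " ".join(sentences)


text = describe_table("user_info", ["UID", "Chinese name"],
                      [("Chinese name", "UID")])
```

The resulting string is what a ranking model would receive in place of the raw table schema.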
S140, reordering the first number of candidate data according to the query text and the semantic data, and screening a second number of retrieval results from the first number of candidate data according to the reordering result.
Illustratively, the first number of candidate data is reordered according to the query text and the semantic data, and a second number of retrieval results is then screened out from the first number of candidate data according to the reordering result. The higher the degree of matching between a candidate's semantic data and the query text, the earlier that candidate appears in the reordering result; the second number of structured data ranked first in the reordering result may be used as the retrieval results.
Optionally, the query text and the semantical data may be input into a trained ranking model, and the query text and the semantical data are analyzed and processed by the ranking model to obtain a reordered result of the first number of candidate data. For example, a pre-trained ranking model based on a Cross-Encoder structure may be used to capture semantic relationships between query text and descriptive text (i.e., semantically data) and generate precise reordered results based thereon.
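The reordering step can be sketched with a pluggable scoring function. In the sketch below, `token_overlap_score` is a deliberately simple stand-in and not a Cross-Encoder; a trained ranking model would be passed as `score_fn` instead. All names and sample texts are hypothetical.

```python
from typing import Callable, List, Tuple


def token_overlap_score(query: str, text: str) -> float:
    """Stand-in scorer: fraction of query tokens that also occur in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query: str, candidates: List[Tuple[str, str]],
           score_fn: Callable[[str, str], float],
           second_number: int) -> List[str]:
    """Score each (id, descriptive text) pair against the query and keep the best."""
    scored = [(cid, score_fn(query, text)) for cid, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in scored[:second_number]]


# Hypothetical candidates: (structured data id, descriptive text).
candidates = [
    ("orders", "table of order payments per day"),
    ("logins", "table of user login statistics per day"),
    ("users", "table of user profile fields"),
]
results = rerank("user login statistics", candidates, token_overlap_score, 2)
```

Because the scorer sees the query and the descriptive text together, only the small candidate set produced by the fusion step needs to be scored, which is what keeps the expensive ranking stage tractable.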
According to the above method, first matching scores between the dense vectors of the plurality of structured data in the vector library and the query text, and second matching scores between the sparse vectors of the plurality of structured data and the query text, are determined. The first and second matching scores are fused to obtain fusion scores corresponding to the plurality of structured data, and a first number of candidate data are screened out of the plurality of structured data according to the fusion scores. Semantic processing is performed on the candidate data to obtain semantic data corresponding to each candidate data, the first number of candidate data are reordered according to the query text and the semantic data, and a second number of retrieval results are screened out from the first number of candidate data according to the reordering result. By combining hybrid search with reordering, and by introducing a fusion scoring mechanism for the sparse and dense vectors in the hybrid search, the recall quality and the fine-grained semantic understanding capability of semantic retrieval can be effectively improved, retrieval efficiency is balanced against retrieval accuracy, and the semantic retrieval effect is improved.
On the basis of the above embodiment, fig. 2 shows a flowchart of another semantic retrieval method based on reordering, which is a specific implementation of the semantic retrieval method based on reordering. Referring to fig. 2, the semantic retrieval method based on reordering includes:
S210, converting the structured data into dense vectors through a trained first vector extraction model.
S220, performing word segmentation processing on the structured data to obtain a plurality of data words, determining importance scores of the data words, performing word segmentation screening processing on the structured data according to the importance scores, and converting the structured data subjected to the word segmentation screening processing into sparse vectors through a trained second vector extraction model.
S230, storing the dense vectors and the sparse vectors in a vector library.
Illustratively, structured data is acquired and input into a trained first vector extraction model (e.g., an embedding model), which converts the structured data to a high-dimensional vector space representation and thereby into dense vectors. The first vector extraction model may be a vector extraction model built on a convolutional neural network, a recurrent neural network, a neural network based on a self-attention mechanism, or the like.
In one embodiment, word segmentation is performed on each structured data to obtain a plurality of data words corresponding to each structured data (for example, the structured data is divided into a plurality of field words), and the importance score of each data word is determined. The importance score of a data word can be determined from the term frequency (TF) and/or the inverse document frequency (IDF) of the data word, and reflects how important the data word is within the structured data.
In one possible embodiment, determining the importance score of each data word includes: determining the term frequency and the inverse document frequency of the data word, and multiplying the term frequency by the inverse document frequency to obtain the importance score of the data word.
The term frequency represents how often a data word occurs in the structured data and reflects the relative importance of the data word within that document. Optionally, the term frequency of the t-th data word may be determined by the following formula:
The inverse document frequency represents how common the data word is across the whole structured data set. Optionally, the inverse document frequency of the t-th data word may be determined by the following formula:
where N is the total number of data words of the structured data set, N_t is the number of structured data containing the t-th data word, and 1 is added so that the denominator of the formula cannot be 0.
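The two formulas themselves are not reproduced in this text. A conventional form that is consistent with the symbol definitions above, offered as an assumption rather than as the application's exact formulas, is shown below. Here c_t is a symbol introduced for illustration (the number of occurrences of the t-th data word in one structured data), the sum runs over all data words of that structured data, N is read as the size of the structured data set, and the use of a logarithm is likewise assumed.

```latex
\mathrm{TF}(t) = \frac{c_t}{\sum_{k} c_k},
\qquad
\mathrm{IDF}(t) = \log\frac{N}{N_t + 1}
```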
In one embodiment, the term frequency of the data word and its inverse document frequency may be multiplied to obtain the importance score of the data word, i.e., the importance score TF-IDF(t) = TF(t) × IDF(t). Because the importance score is obtained by multiplying the term frequency of the data word by its inverse document frequency, it correctly reflects the importance of the data word within the whole structured data, which effectively improves the accuracy of the word segmentation screening processing and the quality of the sparse vectors.
After the importance scores of the data words are determined, word segmentation screening processing can be performed on the structured data according to the importance scores; for example, data words whose importance scores are below a preset score threshold are removed from the structured data. Optionally, a preset domain dictionary (such as a stop word list) may also be used to filter out irrelevant field words among the data words.
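The importance scoring and screening described above can be sketched as follows, assuming the conventional TF-IDF form with 1 added to the document count in the denominator. The helper names, the sample field words, the threshold and the stop word list are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List


def tf_idf_scores(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Per-document importance score TF(t) * IDF(t) for every data word."""
    n_docs = len(docs)
    # Number of structured data items that contain each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    all_scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        all_scores.append({
            w: (c / total) * math.log(n_docs / (doc_freq[w] + 1))
            for w, c in counts.items()
        })
    return all_scores


def filter_words(doc: List[str], scores: Dict[str, float], threshold: float,
                 stop_words: FrozenSet[str] = frozenset()) -> List[str]:
    """Drop words scoring below the threshold or listed as stop words."""
    return [w for w in doc if scores[w] >= threshold and w not in stop_words]


# Hypothetical field words of four tables.
docs = [
    ["id", "user", "login", "time"],
    ["id", "order", "amount"],
    ["id", "user", "profile"],
    ["id", "item", "price"],
]
scores = tf_idf_scores(docs)
kept = filter_words(docs[0], scores[0], threshold=0.05,
                    stop_words=frozenset({"time"}))
```

In this sample, "id" appears in every table and receives a negative score, so it is removed by the threshold, while "time" is removed by the stop word list.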
After the word segmentation screening processing of each structured data is completed, the screened structured data can be input into a trained second vector extraction model (e.g., an embedding model), which converts it to a high-dimensional vector space representation and thereby into sparse vectors. In one embodiment, after the dense vector and the sparse vector corresponding to the structured data are obtained, they may be saved to the vector library. The second vector extraction model may be a vector extraction model built on a convolutional neural network, a recurrent neural network, a neural network based on a self-attention mechanism, or the like. The first vector extraction model and the second vector extraction model may be the same neural network model or different neural network models.
According to the above scheme, the structured data are converted into dense vectors; word segmentation is performed on the structured data to obtain a plurality of data words; the importance scores of the data words are determined; word segmentation screening processing is performed on the structured data according to the importance scores; and the screened structured data are converted into sparse vectors. The dense vectors and sparse vectors of the structured data are thus extracted accurately. For the sparse vectors, the statistical information is used to reduce the impact of redundant field words on recall precision and speed, which balances the recall precision and speed of the structured data and adapts effectively to complex scenarios such as long-text relevance assessment and ambiguity handling.
S240, acquiring the query text, and determining first matching scores between the dense vectors of the plurality of structured data in the vector library and the query text, and second matching scores between the sparse vectors of the plurality of structured data and the query text.
In one possible embodiment, determining the first matching scores between the dense vectors of the plurality of structured data in the vector library and the query text, and the second matching scores between the sparse vectors of the plurality of structured data and the query text, includes: calculating similarity information between the dense vectors of the plurality of structured data in the vector library and the query text, and determining the first matching scores according to the similarity information; and calculating, by means of an inverted index, the second matching scores between the sparse vectors of the plurality of structured data in the vector library and the query text.
For example, after the query text is obtained, similarity information between the query text and the dense vector of each piece of structured data in the vector library may be calculated (e.g., the similarity is determined from the cosine distance, Euclidean distance, Hamming distance, or the like), and the first matching score is determined according to the similarity information, wherein the higher the degree of similarity reflected by the similarity information, the greater the first matching score. Alternatively, the similarity information between the dense vector and the query text may be calculated based on the HNSW (Hierarchical Navigable Small World) algorithm, and the first matching score determined from that similarity information.
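As a rough illustration of the first matching score, the following Python sketch compares a query vector with stored dense vectors by cosine similarity and maps the result to the range [0, 1], so that a higher similarity gives a higher score. It assumes the query text has already been embedded by the same model as the stored vectors. The rescaling to [0, 1] and the function names are assumptions for illustration, and the brute-force loop stands in for an approximate index such as HNSW.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_matching_scores(query_vec, dense_vectors):
    """Map each item id to a score in [0, 1]; larger means more similar to the query."""
    return {
        item_id: (cosine_similarity(query_vec, vec) + 1.0) / 2.0
        for item_id, vec in dense_vectors.items()
    }
```

An exhaustive scan like this is only practical for small vector libraries; an HNSW index would return approximately the same nearest neighbours without visiting every vector.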
For the sparse vector of each piece of structured data in the vector library, the second matching score between the sparse vector and the query text can be calculated by means of an inverted index based on the BM25 algorithm (a text retrieval algorithm based on a probabilistic model). In the application, the first matching score is determined from the similarity information between the dense vector and the query text, and the second matching score is calculated by means of the inverted index, so that the degree to which the dense vector and the sparse vector each match the query text is accurately reflected and the accuracy of semantic retrieval is improved.
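The inverted-index calculation of the second matching score can be sketched as follows. This is a minimal, self-contained BM25 implementation over token lists, not the implementation used in the application. The class name, the parameter defaults `k1=1.5` and `b=0.75`, and the particular IDF variant are common choices assumed here for illustration.

```python
import math
from collections import Counter, defaultdict


class BM25Index:
    """Tiny inverted index: token -> {doc_id: term frequency}, scored with BM25."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.doc_len = {doc_id: len(tokens) for doc_id, tokens in docs.items()}
        self.avg_len = sum(self.doc_len.values()) / max(1, len(docs))
        self.postings = defaultdict(dict)
        for doc_id, tokens in docs.items():
            for token, tf in Counter(tokens).items():
                self.postings[token][doc_id] = tf

    def idf(self, token):
        n = len(self.doc_len)
        df = len(self.postings.get(token, {}))
        # Non-negative IDF variant (an assumption; several variants are in use).
        return math.log(1.0 + (n - df + 0.5) / (df + 0.5))

    def scores(self, query_tokens):
        """Return {doc_id: score} for documents sharing at least one token with the query."""
        result = defaultdict(float)
        for token in set(query_tokens):
            idf = self.idf(token)
            # Only the posting list of each query token is visited, not every document.
            for doc_id, tf in self.postings.get(token, {}).items():
                norm = 1.0 - self.b + self.b * self.doc_len[doc_id] / self.avg_len
                result[doc_id] += idf * tf * (self.k1 + 1.0) / (tf + self.k1 * norm)
        return dict(result)
```

Because only posting lists for the query tokens are traversed, structured data that shares no word segment with the query is never scored, which is where the inverted index saves time.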
S250, determining fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores, and screening a first number of candidate data from the plurality of structured data according to the fusion scores.
In a possible embodiment, in the reordering-based semantic retrieval method provided by the application, determining the fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores comprises: performing logarithmic scaling on the first matching score to obtain a scaling result, performing exponential amplification on the second matching score to obtain an amplification result, and performing a weighted summation of the scaling result and the amplification result to obtain the fusion score corresponding to the structured data.
Illustratively, after the first matching score and the second matching score corresponding to the structured data are obtained, the first matching score may be subjected to logarithmic scaling to obtain a scaling result, the second matching score may be subjected to exponential amplification to obtain an amplification result, and the scaling result and the amplification result may be subjected to weighted summation to obtain a fusion score corresponding to the structured data.
It should be explained that, in the related art, sparse and dense vectors are usually fused either by simple linear weighting or by ranking each independently and then merging. Simple linear weighting cannot make full use of the nonlinear characteristics of the sparse and dense vectors, so the precision of the fused result is limited; independent ranking followed by merging lacks deep synergy between the sparse and dense vectors, so the fusion effect is poor and optimal recall quality cannot be achieved. The distribution characteristics of the two kinds of vectors differ: a sparse vector is highly sparse, and the scores at some positions are far higher than those at other positions, whereas a dense vector is generally evenly distributed with small differences. In addition, the contributions of the different vector spaces are nonlinear. For example, maxima in the sparse vector may have a disproportionate influence, so their value range can be compressed by logarithmic scaling, while dense vectors can be exponentially amplified to increase differentiation. The logarithmic scaling of the first matching score and the exponential amplification of the second matching score can be implemented by nonlinear functions that dynamically adjust the influence of the sparse and dense vectors, for example by strengthening the influence of sparse vectors on short queries (the BM25 algorithm is typically more sensitive to short queries) and amplifying the influence of dense vectors on long queries (vector models are better at capturing the semantic similarity of long queries).
In the application, the fusion score is obtained by a weighted summation of the scaling result of the first matching score and the amplification result of the second matching score, so that the influence of the sparse vector and the dense vector can be adjusted dynamically, the different advantages of the dense vector and the sparse vector are fully exploited, and the search results better satisfy semantic relevance and text matching degree at the same time.
In a possible embodiment, in the reordering-based semantic retrieval method provided by the application, performing the weighted summation of the scaling result and the amplification result to obtain the fusion score corresponding to the structured data may comprise performing a weighted summation of the scaling result, the amplification result and preset service characteristic values, using preset weighting coefficients, to obtain the fusion score corresponding to the structured data. The preset service characteristic values can be configured for a plurality of service types. Alternatively, the fusion score provided by the present application may be determined based on the following formula:
Wherein ω_1, ω_2, …, ω_n are the preset weighting coefficients corresponding to the sparse vector, the dense vector, the preset service characteristic values, and the like, s_sparse is the second matching score, s_dense is the first matching score, and x_i is the i-th preset service characteristic value. In the application, the sparse vector, the dense vector and the preset service characteristic values are fused, so that service characteristics are incorporated while the different advantages of the dense vector and the sparse vector are fully exploited. This strengthens the effect of important service information in retrieval recall, allows the importance of each piece of structured data within the service to be judged more accurately, and better satisfies semantic relevance and text matching degree at the same time.
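A minimal Python sketch of the fusion step follows. It is not the formula of the application, which is not reproduced in this text. It follows the embodiment wording (logarithmic scaling of the first, dense matching score and exponential amplification of the second, sparse matching score), whereas the explanation above also discusses the opposite assignment, so the choice made here is an assumption. The sketch further assumes both scores have been normalised to [0, 1] beforehand, and the weights, the `log1p` form and the function names are illustrative.

```python
import math


def fusion_score(s_dense, s_sparse, features=(), feature_weights=(),
                 w_dense=0.5, w_sparse=0.5):
    """Weighted sum of a log-scaled score, an exp-amplified score and service feature values."""
    if len(features) != len(feature_weights):
        raise ValueError("each service feature value needs exactly one weight")
    scaled = math.log1p(max(s_dense, 0.0))   # logarithmic scaling of the first matching score
    amplified = math.exp(s_sparse)           # exponential amplification of the second matching score
    service = sum(w * x for w, x in zip(feature_weights, features))
    return w_dense * scaled + w_sparse * amplified + service


def select_candidates(scores, first_number, **kwargs):
    """scores: {item_id: (s_dense, s_sparse)}; return the ids of the top first_number items."""
    ranked = sorted(
        scores,
        key=lambda item_id: fusion_score(*scores[item_id], **kwargs),
        reverse=True,
    )
    return ranked[:first_number]
```

Normalising the sparse score before the exponential matters in practice: raw BM25 values are unbounded, and exponentiating them directly would let a single lexical match dominate every other term in the sum.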
S260, carrying out semantic processing on the candidate data to obtain semantic data corresponding to each candidate data.
S270, reordering the first number of candidate data according to the query text and the semantic data, and screening a second number of search results from the first number of candidate data according to the reordering result.
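Steps S260 and S270 can be pictured with the short Python sketch below. It assumes each candidate already carries its semantic data as a text string and that relevance is computed by a pluggable scoring function. The `token_overlap` scorer is only a stand-in so that the example runs without a model; it is not the scoring method of the application, and all names are hypothetical.

```python
def token_overlap(query, text):
    """Placeholder relevance: fraction of query words that also occur in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query, candidates, second_number, relevance=token_overlap):
    """candidates: list of (item_id, semantic_text); return the top second_number item ids."""
    scored = [(relevance(query, text), item_id) for item_id, text in candidates]
    # Python's sort is stable, so candidates with equal relevance keep their fusion-score order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:second_number]]
```

In a real deployment the `relevance` argument would be replaced by a trained model that reads the query text together with the semantic data of each candidate.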
In one possible embodiment, after the second number of search results is screened out of the first number of candidate data according to the reordering result, the reordering-based semantic retrieval method provided by the application further comprises: screening a third number of target search results out of the second number of search results, generating a structured query text according to the query text and the target search results, and performing a data query according to the structured query text to obtain a query result. The first number, the second number and the third number decrease in that order.
Illustratively, a third number of target search results is screened out of the second number of search results, e.g., the search results ranked in the top third number of positions of the reordering result are used as the target search results. The target search results and the query text are input into a trained large language model, which analyzes them and outputs a structured query text (for example, a structured query text expressed in SQL); a data query is then performed with the structured query text to obtain the query result. Optionally, the large language model may further analyze the query result according to the query text to obtain a data analysis result, and return the data analysis result to the user. According to this scheme, the structured query text is generated from the query text and the target search results, and the data query is performed with the structured query text, so that an accurate query result is returned to the user and the user experience is improved.
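The flow from target search results to a query result can be sketched in Python as follows. The prompt wording, the `table` and `description` keys, and the `llm` and `run_query` callables are all hypothetical placeholders; the application does not specify a prompt format or a particular model interface.

```python
def build_text2sql_prompt(query_text, target_results):
    """target_results: list of dicts with hypothetical 'table' and 'description' keys."""
    lines = ["Translate the question into one SQL statement.", "Candidate tables:"]
    for result in target_results:
        lines.append(f"- {result['table']}: {result['description']}")
    lines.append(f"Question: {query_text}")
    return "\n".join(lines)


def answer_query(query_text, reranked_results, third_number, llm, run_query):
    """Keep the top third_number results, ask the model for SQL, then execute it."""
    targets = reranked_results[:third_number]
    structured_query = llm(build_text2sql_prompt(query_text, targets))
    return run_query(structured_query)
```

Passing `llm` and `run_query` in as arguments keeps the sketch independent of any particular model service or query engine, and makes the flow easy to exercise with stubs.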
For example, in a natural-language-text-to-structured-query-text (Text2SQL) scenario, the reordering-based semantic retrieval method provided by the application can help a user generate SQL from natural language and obtain the desired statistics or detail data through a data query system (such as an OLAP system, a technology for quickly querying and analyzing multidimensional data that is commonly used in data warehouses to help users perform complex data analysis from different angles). This lowers the cost for the user of learning SQL and reduces how much the user needs to know about the business information of the stored data. For example, an operations user who is not familiar with writing SQL and does not know which table (such as a Hive table) holds the desired data can input the query text as a natural language description. The reordering-based semantic retrieval method provided by this scheme can accurately locate the table in which the data is stored, generate the SQL and execute the query, finally return the data to the user, and further provide analyses such as data reports.
According to the method, first matching scores between the dense vectors of the plurality of structured data in the vector library and the query text, and second matching scores between the sparse vectors of the plurality of structured data and the query text, are determined; the first and second matching scores are fused to obtain fusion scores corresponding to the plurality of structured data; a first number of candidate data is screened out of the plurality of structured data according to the fusion scores; semantic processing is performed on the candidate data to obtain semantic data corresponding to each candidate data; the first number of candidate data is reordered according to the query text and the semantic data; and a second number of search results is screened out of the first number of candidate data according to the reordering result. By combining hybrid search with reordering and introducing a fusion scoring mechanism for the sparse and dense vectors in the hybrid search, the recall quality of semantic retrieval and the fine-grained semantic understanding capability are effectively improved, retrieval efficiency is balanced against retrieval accuracy, and the semantic retrieval effect is improved.
The structured data is converted into dense vectors, word segmentation is performed on the structured data to obtain a plurality of data word segments, an importance score is determined for each data word segment, the structured data is screened according to the importance scores, and the screened structured data is converted into sparse vectors. The dense vectors and the sparse vectors of the structured data are thus extracted accurately. For the sparse vectors, the influence of redundant field words on recall precision and speed is reduced on the basis of statistical information, which balances recall precision against recall speed for structured data and adapts effectively to complex scenarios such as long-text relevance assessment and ambiguity handling.
Fig. 3 is a schematic structural diagram of a semantic retrieval apparatus based on reordering according to an embodiment of the present application. Referring to fig. 3, the reordering-based semantic retrieval apparatus includes a vector matching module 31, a candidate matching module 32, a semantic processing module 33, and a reordering module 34.
The vector matching module 31 is configured to acquire a query text and determine first matching scores between the dense vectors of a plurality of structured data in a vector library and the query text, and second matching scores between the sparse vectors of the plurality of structured data and the query text. The candidate matching module 32 is configured to determine fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores, and to screen out a first number of candidate data from the plurality of structured data according to the fusion scores. The semantic processing module 33 is configured to perform semantic processing on the candidate data to obtain semantic data corresponding to each candidate data. The reordering module 34 is configured to reorder the first number of candidate data according to the query text and the semantic data, and to screen out a second number of search results from the first number of candidate data according to the reordering result.
As described above, first matching scores between the dense vectors of the plurality of structured data in the vector library and the query text, and second matching scores between the sparse vectors of the plurality of structured data and the query text, are determined; the first and second matching scores are fused to obtain fusion scores corresponding to the plurality of structured data; a first number of candidate data is screened out of the plurality of structured data according to the fusion scores; semantic processing is performed on the candidate data to obtain semantic data corresponding to each candidate data; the first number of candidate data is reordered according to the query text and the semantic data; and a second number of search results is screened out of the first number of candidate data according to the reordering result. By combining hybrid search with reordering and introducing a fusion scoring mechanism for the sparse and dense vectors in the hybrid search, the recall quality of semantic retrieval and the fine-grained semantic understanding capability are effectively improved, retrieval efficiency is balanced against retrieval accuracy, and the semantic retrieval effect is improved.
In one possible embodiment, the reordering-based semantic retrieval apparatus further comprises a vector generation module configured to:
convert the structured data into dense vectors through the trained first vector extraction model;
perform word segmentation on the structured data to obtain a plurality of data word segments, determine the importance score of each data word segment, screen the structured data according to the importance scores, and convert the screened structured data into sparse vectors through the trained second vector extraction model;
save the dense vectors and the sparse vectors to a vector library.
In one possible embodiment, when determining the importance score of each data word segment, the vector generation module is configured to:
determine the word frequency and the inverse document frequency of the data word segment, and multiply the word frequency by the inverse document frequency to obtain the importance score of the data word segment.
In one possible embodiment, when determining the first matching scores between the dense vectors of the plurality of structured data in the vector library and the query text, and the second matching scores between the sparse vectors of the plurality of structured data and the query text, the vector matching module 31 is configured to:
calculate similarity information between the dense vectors of the plurality of structured data in the vector library and the query text, and determine the first matching scores according to the similarity information;
calculate, by means of an inverted index, the second matching scores between the sparse vectors of the plurality of structured data in the vector library and the query text.
In one possible embodiment, when determining the fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores, the candidate matching module 32 is configured to:
perform logarithmic scaling on the first matching score to obtain a scaling result, and perform exponential amplification on the second matching score to obtain an amplification result;
perform a weighted summation of the scaling result and the amplification result to obtain the fusion score corresponding to the structured data.
In one possible embodiment, when performing the weighted summation of the scaling result and the amplification result to obtain the fusion score corresponding to the structured data, the candidate matching module 32 is configured to:
perform a weighted summation of the scaling result, the amplification result and the preset service characteristic values to obtain the fusion score corresponding to the structured data.
In one possible embodiment, the fusion score is determined based on the following formula:
Wherein ω_1, ω_2, …, ω_n are preset weighting coefficients, s_sparse is the second matching score, s_dense is the first matching score, and x_i is a preset service characteristic value.
In one possible embodiment, the reordering-based semantic retrieval apparatus further comprises a query processing module configured to:
screen a third number of target search results out of the second number of search results, and generate a structured query text according to the query text and the target search results;
perform a data query according to the structured query text to obtain a query result.
It should be noted that, in the embodiment of the semantic retrieval apparatus based on reordering, each unit and module included are only divided according to the functional logic, but not limited to the above division, as long as the corresponding functions can be implemented, and in addition, specific names of each functional unit are only for facilitating mutual distinction, and are not used for limiting the protection scope of the embodiment of the present application.
The embodiment of the application also provides reordering-based semantic retrieval equipment, which can integrate the reordering-based semantic retrieval apparatus provided by the embodiment of the application. Fig. 4 is a schematic structural diagram of the reordering-based semantic retrieval equipment according to an embodiment of the present application. Referring to fig. 4, the reordering-based semantic retrieval equipment includes an input device 43, an output device 44, a memory 42, and one or more processors 41, the memory 42 storing one or more programs which, when executed by the one or more processors 41, cause the one or more processors 41 to implement the reordering-based semantic retrieval method provided in the above embodiments. The reordering-based semantic retrieval apparatus and equipment provided above can be used to execute the reordering-based semantic retrieval method provided by any embodiment, and have the corresponding functions and beneficial effects.
Embodiments of the present application also provide a non-volatile storage medium storing computer-executable instructions that, when executed by a computer processor, perform the reordering-based semantic retrieval method provided by the above embodiments. Of course, the computer-executable instructions stored in the non-volatile storage medium provided in the embodiments of the present application are not limited to the reordering-based semantic retrieval method provided above, and may also perform related operations in the reordering-based semantic retrieval method provided in any embodiment of the present application. The reordering-based semantic retrieval apparatus, equipment and storage medium provided in the above embodiments may perform the reordering-based semantic retrieval method provided in any embodiment of the present application; for technical details not described in the above embodiments, reference may be made to the reordering-based semantic retrieval method provided in any embodiment of the present application.
On the basis of the above embodiments, the present embodiment further provides a computer program product. The technical solution of the present application, in essence or in the part that contributes over the prior art, or in whole or in part, may be embodied in the form of a software product. The computer program product is stored in a storage medium and includes several instructions that cause a computer device, a mobile terminal or a processor therein to execute all or part of the steps of the reordering-based semantic retrieval method provided in the various embodiments of the present application.

Claims (12)

1. A semantic retrieval method based on reordering, characterized by comprising:
acquiring a query text, and determining first matching scores between dense vectors of a plurality of structured data in a vector library and the query text, and second matching scores between sparse vectors of the plurality of structured data and the query text;
determining fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores, and screening out a first number of candidate data from the plurality of structured data according to the fusion scores;
performing semantic processing on the candidate data to obtain semantic data corresponding to each of the candidate data;
reordering the first number of candidate data according to the query text and the semantic data, and screening out a second number of search results from the first number of candidate data according to the reordering result.

2. The semantic retrieval method based on reordering according to claim 1, characterized in that, before determining the first matching scores between the dense vectors of the plurality of structured data in the vector library and the query text, and the second matching scores between the sparse vectors of the plurality of structured data and the query text, the method further comprises:
converting the structured data into dense vectors through a trained first vector extraction model;
performing word segmentation on the structured data to obtain a plurality of data word segments, determining an importance score of each of the data word segments, performing word segmentation screening on the structured data according to the importance scores, and converting the screened structured data into sparse vectors through a trained second vector extraction model;
saving the dense vectors and the sparse vectors to the vector library.

3. The semantic retrieval method based on reordering according to claim 2, characterized in that determining the importance score of each of the data word segments comprises:
determining the word frequency and the inverse document frequency of the data word segment, and multiplying the word frequency by the inverse document frequency to obtain the importance score of the data word segment.

4. The semantic retrieval method based on reordering according to claim 1, characterized in that determining the first matching scores between the dense vectors of the plurality of structured data in the vector library and the query text, and the second matching scores between the sparse vectors of the plurality of structured data and the query text, comprises:
calculating similarity information between the dense vectors of the plurality of structured data in the vector library and the query text, and determining the first matching scores according to the similarity information;
calculating, by means of an inverted index, the second matching scores between the sparse vectors of the plurality of structured data in the vector library and the query text.

5. The semantic retrieval method based on reordering according to claim 1, characterized in that determining the fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores comprises:
performing logarithmic scaling on the first matching score to obtain a scaling result, and performing exponential amplification on the second matching score to obtain an amplification result;
performing a weighted summation of the scaling result and the amplification result to obtain the fusion score corresponding to the structured data.

6. The semantic retrieval method based on reordering according to claim 5, characterized in that performing the weighted summation of the scaling result and the amplification result to obtain the fusion score corresponding to the structured data comprises:
performing a weighted summation of the scaling result, the amplification result and preset service characteristic values to obtain the fusion score corresponding to the structured data.

7. The semantic retrieval method based on reordering according to claim 1, characterized in that the fusion score is determined based on the following formula:
wherein ω_1, ω_2, …, ω_n are preset weighting coefficients, s_sparse is the second matching score, s_dense is the first matching score, and x_i is a preset service characteristic value.

8. The semantic retrieval method based on reordering according to claim 1, characterized in that, after screening out the second number of search results from the first number of candidate data according to the reordering result, the method further comprises:
screening out a third number of target search results from the second number of search results, and generating a structured query text according to the query text and the target search results;
performing a data query according to the structured query text to obtain a query result.

9. A semantic retrieval apparatus based on reordering, characterized by comprising a vector matching module, a candidate matching module, a semantic processing module and a reordering module, wherein:
the vector matching module is configured to acquire a query text, and determine first matching scores between dense vectors of a plurality of structured data in a vector library and the query text, and second matching scores between sparse vectors of the plurality of structured data and the query text;
the candidate matching module is configured to determine fusion scores corresponding to the plurality of structured data according to the first matching scores and the second matching scores, and screen out a first number of candidate data from the plurality of structured data according to the fusion scores;
the semantic processing module is configured to perform semantic processing on the candidate data to obtain semantic data corresponding to each of the candidate data;
the reordering module is configured to reorder the first number of candidate data according to the query text and the semantic data, and screen out a second number of search results from the first number of candidate data according to the reordering result.

10. Semantic retrieval equipment based on reordering, characterized by comprising a memory and one or more processors;
the memory is used to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the semantic retrieval method based on reordering according to any one of claims 1 to 8.

11. A non-volatile storage medium storing computer-executable instructions, characterized in that the computer-executable instructions, when executed by a computer processor, are used to perform the semantic retrieval method based on reordering according to any one of claims 1 to 8.

12. A computer program product, comprising a computer program, characterized in that the computer program, when executed by a processor, implements the semantic retrieval method based on reordering according to any one of claims 1 to 8.
CN202510194607.6A 2025-02-21 2025-02-21 A semantic retrieval method, device, equipment and storage medium based on reordering Pending CN120123491A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510194607.6A CN120123491A (en) 2025-02-21 2025-02-21 A semantic retrieval method, device, equipment and storage medium based on reordering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510194607.6A CN120123491A (en) 2025-02-21 2025-02-21 A semantic retrieval method, device, equipment and storage medium based on reordering

Publications (1)

Publication Number Publication Date
CN120123491A true CN120123491A (en) 2025-06-10

Family

ID=95919785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510194607.6A Pending CN120123491A (en) 2025-02-21 2025-02-21 A semantic retrieval method, device, equipment and storage medium based on reordering

Country Status (1)

Country Link
CN (1) CN120123491A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121096687A (en) * 2025-11-11 2025-12-09 赛福解码(四川)基因科技有限公司 Medical record text processing method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN111581949B (en) Method and device for disambiguating name of learner, storage medium and terminal
CN116701431A (en) Data retrieval method and system based on large language model
CN119311831A (en) A knowledge question answering method and system for hybrid retrieval generation enhancement of heterogeneous databases
CN112307182B (en) An Extended Query Method for Pseudo-Relevant Feedback Based on Question Answering System
CN112800205B (en) Method and device for obtaining question and answer related paragraphs based on semantic change manifold analysis
CN109885813B (en) A computing method and system for text similarity based on word coverage
CN118551086A (en) Multi-mode data distributed retrieval method and system based on knowledge graph and vector matching
CN119311911B (en) A cross-modal image text retrieval method based on deep learning
CN115017267A (en) Unsupervised semantic retrieval method and device and computer readable storage medium
CN116304748A (en) Text similarity calculation method, system, equipment and medium
CN113434639A (en) Audit data processing method and device
CN113761104A (en) Method, device and electronic device for detecting entity relationship in knowledge graph
CN119474274A (en) Document question-answer pair generation and question-answering method, device, computer device and readable storage medium
CN120123491A (en) A semantic retrieval method, device, equipment and storage medium based on reordering
CN117494815B (en) File-oriented credible large language model training and reasoning method and device
CN117593410A (en) Report generation method, device, electronic equipment and storage medium
CN120541213A (en) Data processing method and device
CN118333033B (en) Advanced learning-based technological project innovation potential prediction method and device
CN114547233A (en) Data duplicate checking method and device and electronic equipment
CN119623619A (en) Intellectual property and academic assistant system and implementation method, device, electronic device and storage medium thereof
CN118467795A (en) Feature vector similarity-based lake and bin integrated unstructured data searching method
CN118013020A (en) Patent search method and system based on retrieval generation joint training
CN117421404A (en) Multi-channel text recall method, system, electronic device and storage medium
CN117112727A (en) Large language model fine tuning instruction set construction method suitable for cloud computing service
CN120316201B (en) A Pseudo-Document Generation Method and System Based on Multiple Models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination