CN120611018A - A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students

Info

Publication number
CN120611018A
Authority
CN
China
Prior art keywords
question
semantic
candidate
answer
questions
Prior art date
Legal status
Pending
Application number
CN202510693980.6A
Other languages
Chinese (zh)
Inventor
潘志宏
汪孝泉
林明芳
解徐超
陈锦丽
Current Assignee
Second Affiliated Middle School Of Fuzhou Institute Of Education
Original Assignee
Second Affiliated Middle School Of Fuzhou Institute Of Education
Priority date
Filing date
Publication date
Application filed by Second Affiliated Middle School Of Fuzhou Institute Of Education
Priority to CN202510693980.6A
Publication of CN120611018A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G06F16/33295 Natural language query formulation in dialogue systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students, comprising the following steps: S1, receiving a question input by a user and pre-processing it, extracting a keyword set, and converting it into a standardized Chinese semantic representation; S2, retrieving a set of candidate questions and answers related to the user question from a FAQ knowledge base based on a hybrid retrieval technique including keyword matching and semantic indexing; S3, embedding the user question and each candidate question and answer into word vectors, calculating the similarity between them, and re-ranking the candidate questions based on the similarity scores to screen out the top K candidate questions; S4, inputting the top K candidate questions and their answers into a pre-trained large language model to generate optimized natural language answers; S5, performing semantic integrity and context consistency verification on the generated answers, and outputting the final answers. This method can improve the matching accuracy and response quality of intelligent psychological counseling questions and answers for primary and secondary school students.

Description

Semantic hybrid retrieval and re-ranking method for intelligent psychological counseling of primary and secondary school students
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling of primary and secondary school students.
Background
With the development of artificial intelligence, psychological counseling platforms for primary and secondary school students are increasingly integrating semantic understanding and question-answering dialogue technology. Mainstream products currently follow one of two paths: retrieval and matching against a knowledge base or community Q&A, or dialogue generation with a large model (a large pre-trained language model).
Knowledge base / community Q&A path: early online psychological counseling platforms (such as Yixinli) focused mainly on providing psychological content and services, with online Q&A communities in which professional counselors answer users' questions. After a user submits a text question, the platform publishes it and interested counselors reply with messages, so that a large number of question-answer pairs and psychological knowledge content accumulates over time. To improve retrieval efficiency, such platforms have progressively introduced semantic analysis and matching technology. For example, the community question-answering retrieval system of patent CN105786794B matches related question-answer pairs by extracting keywords from the user's question, expanding them, computing their weights, and combining this with dependency-relation analysis, thereby improving retrieval accuracy. In these schemes, natural language processing is used to understand the intent of the user's question and to retrieve the best answer from an existing FAQ knowledge base.
Large model dialogue path: with breakthroughs in deep learning and pre-trained models, the use of large models for psychological dialogue has been explored in recent years. For example, the concept product "AI Heart Words" proposes building a psychological counseling platform on advanced large-model technology, offering comprehensive and convenient mental health services through personalized psychological assessment, instant feedback, interactive learning, and other functions. In practice, major technology companies have begun to move into this area: the Yidianling platform announced that it would integrate Baidu's ERNIE Bot (Wenxin Yiyan) large model, becoming the first platform in the industry to apply a generative dialogue model to psychological counseling. ERNIE Bot's cross-modal and cross-lingual deep semantic understanding and generation capabilities can be used for intent recognition and content generation in counseling dialogues. By introducing large-model dialogue technology, the platform aims to analyze users' natural language and emotional expression in depth, identify potential psychological problems in time, and provide supporting assistance. Tencent has also cooperated with the Institute of Psychology of the Chinese Academy of Sciences to develop an artificial-intelligence psychological counseling robot and filed the related patent CN111667926B, which provides a counseling session system based on a psychological knowledge graph and multi-round dialogue management. The system comprises an input module, a language analysis module, a logic-tree dialogue module, a corpus, a feedback module, and so on; it uses the psychological knowledge graph to support intent analysis across multiple dialogue rounds and to guide the dialogue flow, so that the user experiences a conversation close to one with a human counselor.
In addition, some customized large models have begun to appear. For example, the EmoLLM model introduced by Baidu AI Cloud is designed specifically for mental health support; it adopts instruction fine-tuning and multi-model fusion, combining the strengths of several pre-trained models to improve understanding of and response to the user's emotional context. These schemes combine semantic retrieval, question-answer matching, and large-model fusion: they include both FAQ retrieval systems built on a domain knowledge base and large-model dialogue systems that incorporate emotion analysis and knowledge graphs.
Although these solutions raise the level of intelligence of online psychological counseling, the prior art still has the following shortcomings with respect to the object of the present invention:
1. Limited semantic retrieval precision. Traditional FAQ retrieval and community Q&A matching rely mainly on keywords and shallow semantic expansion. When primary and secondary school students describe their psychological troubles in everyday or indirect language, simple keyword matching may fail to capture the underlying intent, which easily leads to incomplete retrieval or inaccurate matches. Existing schemes raise the question-answer matching rate through rule weights and dependency relations, but with a student population whose expressions vary widely, incomplete semantic analysis and failure to recognize implicit emotional factors still occur.
2. Insufficient reliability of dialogue generation. Using a large model directly for psychological dialogue yields fluent output but exposes problems of content reliability. General-purpose large models (e.g., ChatGPT) often lack the constraints of professional psychological knowledge and may generate responses inconsistent with standard psychological guidance, or even biased or distorted advice. The simple integration approaches currently tried in the industry (e.g., directly invoking a large model to generate an answer) lack verification and guidance of the large model's output, which poses a risk.
3. Insufficient optimization for context and multi-round interaction. Many existing psychological Q&A systems either operate in single-round mode (one question, one answer) or manage multi-round dialogue with a preset dialogue logic tree. A fixed logic tree, however, can hardly cover the wide variety of real counseling scenarios, which may make the conversation rigid and prevents dynamic adjustment based on user feedback. Meanwhile, typical FAQ systems lack dialogue memory and cannot give more targeted responses that draw on what the user has said earlier in the dialogue.
4. Insufficient knowledge fusion and reasoning depth. In current solutions, knowledge base retrieval and large-model reasoning usually operate independently and are not yet organically integrated. FAQ retrieval systems are reliable, but their answers tend to be templated and lack individual analysis; pure large-model dialogue can produce responses tailored to a specific expression but may ignore established knowledge of psychological intervention.
Disclosure of Invention
The object of the invention is to provide a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling of primary and secondary school students that improves the matching accuracy and response quality of intelligent psychological counseling Q&A for these students.
To achieve this object, the invention adopts the following technical solution: a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling of primary and secondary school students, comprising the following steps:
S1, receiving a Chinese psychological counseling question input by a user, preprocessing the question, extracting a keyword set that represents the core semantics of the question, and converting the keyword set into a normalized Chinese semantic representation;
S2, retrieving a candidate question-answer set related to the user question from the FAQ knowledge base using a hybrid retrieval technique that comprises keyword matching and semantic indexing;
S3, performing word-vector embedding of the user question and each candidate question-answer pair, calculating the similarity between the user question vector and each candidate question vector, re-ranking the candidate questions by similarity score from high to low, and selecting the Top K candidate questions;
S4, inputting the selected Top K candidate questions and their corresponding answers into a pre-trained large language model to generate an optimized natural language answer;
S5, verifying the semantic integrity and context consistency of the generated answer, and outputting the final answer.
Further, in step S1, the user's Chinese psychological counseling question is preprocessed as follows:
101) Segmenting the question into words, tagging the part of speech of every segmented word, extracting keywords that include nouns and verbs, and filtering out stop words to obtain a keyword set that represents the core semantics of the question;
102) Expanding the extracted keywords with synonyms from a predefined synonym dictionary and mapping them to standard vocabulary, so that the user's natural language question is converted into a normalized Chinese semantic representation.
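Steps 101) and 102) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the question has already been segmented and POS-tagged by an external tool (for example jieba), so the input is a list of (word, tag) pairs, and that noun and verb tags start with `n` and `v` as in the jieba/ICTCLAS-style tag set. `STOP_WORDS`, `SYNONYMS`, and `preprocess` are hypothetical names with toy data.

```python
# Sketch of step S1 (101-102). Input: one question already segmented and
# POS-tagged by an external tool, as (word, tag) pairs. Tags starting with
# "n" (noun) or "v" (verb) follow the jieba/ICTCLAS-style tag set (assumption).

# Hypothetical stop-word list and synonym dictionary (toy data).
STOP_WORDS = {"我", "的", "了", "很", "怎么办"}
SYNONYMS = {
    "考砸": "考试失利",  # "bombed the exam" -> "exam failure"
    "爸妈": "父母",      # "mom and dad" -> "parents"
    "烦": "烦恼",        # "annoyed" -> "worry"
}


def preprocess(tagged_tokens):
    """Return the normalized keywords of one question, in first-seen order."""
    keywords = []
    for word, tag in tagged_tokens:
        # 101) keep nouns and verbs, drop stop words
        if not tag.startswith(("n", "v")) or word in STOP_WORDS:
            continue
        # 102) map the keyword to its standard expression when one is defined
        standard = SYNONYMS.get(word, word)
        if standard not in keywords:
            keywords.append(standard)
    return keywords


# Example: "我考砸了，爸妈很生气" ("I bombed the exam and my parents are angry")
print(preprocess([("我", "r"), ("考砸", "v"), ("了", "ul"),
                  ("爸妈", "n"), ("很", "d"), ("生气", "v")]))
# ['考试失利', '父母', '生气']
```

A list is returned instead of a set so that the keyword order, and therefore any downstream prompt text, stays deterministic.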
Further, in step S2, the candidate question-answer set related to the user question is retrieved from the FAQ knowledge base as follows:
201) Performing a preliminary keyword-matching search on the FAQ knowledge base and retrieving candidate questions whose relevance to the user question exceeds a set value, to obtain a preliminary candidate question-answer set;
202) Converting the user question into a vector with a pre-trained word vector model and, based on vector similarity, retrieving from the vector space a number of candidate questions, with their corresponding answers, whose semantic distance to the user question vector is below a set value;
203) Merging the search results of steps 201) and 202) to obtain a hybrid candidate question-answer set.
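A minimal sketch of the hybrid retrieval in steps 201) to 203), assuming each FAQ entry already carries a keyword set and a precomputed question vector. The relevance measure (Jaccard overlap of keyword sets), the cosine distance, both thresholds, and all names here are hypothetical choices; a production system would presumably use an inverted index and an approximate nearest-neighbour index instead of the linear scans shown.

```python
import math

# Hypothetical FAQ store: id -> (keyword set, precomputed question vector, answer)
FAQ = {
    "q1": ({"考试失利", "父母"}, [0.9, 0.1, 0.0], "answer 1"),
    "q2": ({"同学", "矛盾"}, [0.1, 0.9, 0.1], "answer 2"),
    "q3": ({"考试", "焦虑"}, [0.8, 0.2, 0.1], "answer 3"),
}


def keyword_score(query_kw, faq_kw):
    """Jaccard overlap of two keyword sets (one possible relevance measure)."""
    if not query_kw or not faq_kw:
        return 0.0
    return len(query_kw & faq_kw) / len(query_kw | faq_kw)


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def hybrid_retrieve(query_kw, query_vec, faq, kw_threshold=0.2, dist_threshold=0.3):
    # 201) keyword matching: relevance must exceed the set value
    keyword_hits = [fid for fid, (kw, _, _) in faq.items()
                    if keyword_score(query_kw, kw) > kw_threshold]
    # 202) vector search: semantic distance must be below the set value
    vector_hits = [fid for fid, (_, vec, _) in faq.items()
                   if cosine_distance(query_vec, vec) < dist_threshold]
    # 203) merge both lists, dropping duplicates but keeping first-seen order
    return list(dict.fromkeys(keyword_hits + vector_hits))


print(hybrid_retrieve({"考试失利", "父母", "生气"}, [0.85, 0.15, 0.05], FAQ))
# ['q1', 'q3']
```

In this toy run `q3` shares no keyword with the query and is found only through the vector branch, which is the kind of differently worded but related question the hybrid strategy is meant to keep.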
Further, step S3 is implemented as follows:
301) Performing word-vector embedding of the user question and of each candidate question in the candidate question-answer set to obtain their vectorized semantic representations, namely the user question vector and the candidate question vectors;
302) Calculating the cosine similarity between the user question vector and each candidate question vector to measure the degree of semantic match between the user question and each candidate question;
303) Re-ranking all candidate questions by similarity score from high to low, a higher score indicating stronger semantic relevance;
304) Selecting the Top K candidate questions as those most likely to match the user's needs.
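Steps 302) to 304) amount to scoring each candidate by cosine similarity (the dot product of the two vectors divided by the product of their lengths), sorting in descending order, and keeping the first K. The sketch below assumes the vectors from step 301) are plain lists of floats; `rerank_top_k` and the default K are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """302) cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_top_k(user_vec, candidates, k=3):
    """candidates: list of (question_id, question_vector); returns top-k (id, score)."""
    scored = [(qid, cosine_similarity(user_vec, vec)) for qid, vec in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)  # 303) high to low
    return scored[:k]                                     # 304) keep the Top K


top = rerank_top_k([1.0, 0.0], [("a", [0.0, 1.0]), ("b", [1.0, 0.0]), ("c", [1.0, 1.0])], k=2)
print([qid for qid, _ in top])  # ['b', 'c']
```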
Further, in step S4, the selected Top K candidate questions and their corresponding answers are input into a pre-trained large language model, and the optimized natural language answer is generated through a generation process constrained by a semantic-guidance prompt template. The template constrains the large language model to answer by combining the user question with the standard answer texts of the Top K candidate questions, guiding it to use the content of the existing knowledge base to produce a more accurate and detailed answer. The large language model is deployed on a local server and can be fine-tuned as required.
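One possible shape of the semantic-guidance prompt template, as a sketch only: the instruction wording is hypothetical (a deployed system would presumably phrase it in Chinese), and the call to the locally deployed large language model is omitted because the patent does not specify an interface. The function only shows how the user question and the Top K standard answers could be assembled into a single constrained prompt.

```python
def build_prompt(user_question, top_k_pairs):
    """Assemble a semantic-guidance prompt (hypothetical wording) from the user
    question and the standard answers of the Top K re-ranked candidates."""
    lines = [
        "You are a school counselor replying to a primary or secondary school student.",
        "Base your answer on the reference Q&A below and do not contradict it.",
        "If the references do not cover the question, say so and give only general, safe advice.",
        "",
        "Reference Q&A:",
    ]
    # Number each retrieved pair so the model can draw on several references
    for i, (question, answer) in enumerate(top_k_pairs, start=1):
        lines.append(f"{i}. Q: {question}")
        lines.append(f"   A: {answer}")
    lines += ["", f"Student question: {user_question}", "Answer:"]
    return "\n".join(lines)


prompt = build_prompt(
    "I failed my exam and my parents are angry.",
    [("How can I cope after failing an exam?", "Reflect on the cause calmly ...")],
)
print(prompt)
```

Keeping the constraint sentences at the top and the student question at the end is one common layout; the patent only requires that the model be told to answer by combining the question with the retrieved standard answers.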
Further, in step S5, the generated answer is submitted to a manual review module for content verification. Once the review is passed, the final answer is sent to the user, and the new question-answer pair that passed review is added to the FAQ knowledge base in real time, thereby updating the knowledge base dynamically.
Further, the review rules of the manual review module include:
A1) Scoring the generated answer for emotional calmness and professional accuracy;
A2) If the score is below a preset threshold, a psychological counselor revises the answer manually;
A3) Binding the revised answer to the user question and adding the pair to the FAQ knowledge base as a new entry.
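Rules A1) to A3) can be sketched as below. The patent does not say how the two A1) scores combine into "the score" of A2); this sketch assumes an answer is sent for revision if either score falls below the threshold, and models the counselor's manual revision as a callback. `ReviewResult`, `review_answer`, the 0 to 10 scale, and the default threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    calmness: float   # A1) emotional calmness, assumed 0-10 scale
    accuracy: float   # A1) professional accuracy, assumed 0-10 scale


def review_answer(question, answer, result, revise, knowledge_base, threshold=6.0):
    """Apply rules A1)-A3). `revise` stands in for the counselor's manual edit."""
    # A2) assumption: revise if either score falls below the threshold
    if min(result.calmness, result.accuracy) < threshold:
        answer = revise(answer)
    # A3) bind the approved answer to the question and store it as a new entry
    knowledge_base.append({"question": question, "answer": answer})
    return answer


kb = []
final = review_answer("I can't sleep before exams", "draft reply",
                      ReviewResult(calmness=8.0, accuracy=4.5),
                      revise=lambda text: text + " (revised by counselor)",
                      knowledge_base=kb)
print(final)  # draft reply (revised by counselor)
```

Taking the minimum is a conservative choice: a reply that is warm but professionally inaccurate, or accurate but harsh, still goes to a counselor.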
Further, the update mechanism of the FAQ knowledge base includes:
B1) Performing a duplicate check on each newly added question-answer pair to avoid storing duplicate entries;
B2) Periodically removing question-answer pairs from the knowledge base according to elimination rules based on historical answer adoption rates and manual scores.
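A minimal sketch of the update mechanism B1) and B2), under stated assumptions: duplicates are detected only by exact match after a crude normalization (catching near-duplicates would need a semantic similarity such as the one in step S3), and each entry is assumed to track how often it was shown, how often it was adopted, and an optional manual score. Field names, thresholds, and functions are hypothetical.

```python
def normalize(text):
    """Crude key for duplicate checks: lowercase, keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def add_entry(kb, question, answer):
    """B1) store the pair only if no entry with the same normalized question exists."""
    key = normalize(question)
    if any(normalize(entry["question"]) == key for entry in kb):
        return False
    kb.append({"question": question, "answer": answer,
               "shown": 0, "adopted": 0, "manual_score": None})
    return True


def prune(kb, min_adoption=0.2, min_score=5.0, min_shown=20):
    """B2) periodic elimination by historical adoption rate and manual score."""
    kept = []
    for entry in kb:
        shown = entry["shown"]
        if shown > 0 and shown >= min_shown and entry["adopted"] / shown < min_adoption:
            continue  # shown often enough but rarely adopted
        if entry["manual_score"] is not None and entry["manual_score"] < min_score:
            continue  # rated poorly by reviewers
        kept.append(entry)
    return kept
```

The `min_shown` guard keeps newly added entries from being removed before they have had a fair chance to be adopted.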
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory, wherein the processor implements the above method when executing the computer program.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method.
Compared with the prior art, the invention has the following beneficial effects:
1. Improved retrieval accuracy. By introducing synonym expansion and semantic vector matching, the system understands user questions more comprehensively and markedly increases the retrieval recall of semantically related questions. However students phrase a question, the platform can recognize the true intent and find the corresponding answer, avoiding the retrieval failures previously caused by differences in wording. According to the applicant's tests, the method raises the hit rate of useful answers compared with traditional FAQ matching.
2. More natural answers. Answers are generated and polished by a large language model, so replies read more fluently and closer to human expression. The generated answers are accurate in content, consistent with the context, and emotionally warm, reading more like thoughtful advice from a counseling teacher. This natural, fluent style improves the user experience: students feel greater empathy and understanding in the exchange, and satisfaction improves noticeably.
3. Good extensibility. The architecture of the invention supports multi-round conversations and complex semantic scenarios. In practice, students often have several rounds of Q&A with the platform on the same topic. By accumulating context semantics in the prompt and drawing on the knowledge base and a knowledge graph, the system can understand how successive questions relate and still give consistent, targeted answers to follow-up questions. When a student's question involves several subtopics or requires inference, the system can also use the reasoning capabilities of the knowledge graph and the large model to give an answer that considers multiple factors together. This enables the platform to handle more complex and deeper psychological counseling dialogues.
4. Speed without sacrificing quality. By combining intelligent retrieval with generation, the platform responds quickly while ensuring answer quality. After a question is asked, the system quickly finds candidate answers in a large knowledge base and rapidly generates a customized reply with AI; the whole process runs in real time, meeting the timeliness requirements of online counseling. In addition, the generated content to some extent aggregates the experience of professional psychological counselors (especially as the knowledge base keeps learning from solutions provided by experts), helping to ensure that the advice is scientific and reliable. The platform also includes a compliance-checking mechanism and a manual review step to block inappropriate language and keep the service process safe and controllable.
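Item 3 above relies on accumulating context semantics in the prompt, but the patent does not say how. One simple assumption is a sliding window of the most recent turns that is rendered into each new prompt; the class below is a hypothetical sketch of that idea, not the applicant's design, and the window size is arbitrary.

```python
from collections import deque


class DialogueContext:
    """Sliding window of recent turns for one student session (hypothetical design)."""

    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)  # oldest turns are dropped automatically

    def add_turn(self, question, answer):
        self.turns.append((question, answer))

    def history_block(self):
        """Render the retained turns for inclusion in the next prompt."""
        return "\n".join(f"Student: {q}\nCounselor: {a}" for q, a in self.turns)


ctx = DialogueContext(max_turns=2)
for i in (1, 2, 3):
    ctx.add_turn(f"q{i}", f"a{i}")
print(ctx.history_block())
```

A fixed window bounds prompt length; a system that needs longer memory could instead summarize older turns, which this sketch does not attempt.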
Drawings
Fig. 1 is a schematic diagram of the implementation of the semantic hybrid retrieval and re-ranking method according to an embodiment of the invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in Fig. 1, this embodiment provides a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling of primary and secondary school students, which comprises the following steps:
S1, question preprocessing: receiving a Chinese psychological counseling question input by a user, preprocessing the question, extracting a keyword set that represents its core semantics, and converting the keyword set into a normalized Chinese semantic representation. This is implemented as follows:
101) Segmenting the question into words, tagging the part of speech of every segmented word, extracting keywords that include nouns and verbs, and filtering out stop words to obtain a keyword set that represents the core semantics of the question;
102) Expanding the extracted keywords with synonyms from a predefined synonym dictionary and mapping them to standard vocabulary, so that the user's natural language question is converted into a normalized Chinese semantic representation.
S2, candidate matching retrieval: retrieving a candidate question-answer set related to the user question from the FAQ knowledge base using a hybrid retrieval technique that comprises keyword matching and semantic indexing. This is implemented as follows:
201) Performing a preliminary keyword-matching search on the FAQ knowledge base and retrieving candidate questions whose relevance to the user question exceeds a set value, to obtain a preliminary candidate question-answer set;
202) Converting the user question into a vector with a pre-trained word vector model and, based on vector similarity, retrieving from the vector space a number of candidate questions, with their corresponding answers, whose semantic distance to the user question vector is below a set value;
203) Merging the search results of steps 201) and 202) to obtain a hybrid candidate question-answer set.
S3, semantic vectorization and similarity calculation: performing word-vector embedding of the user question and each candidate question-answer pair, calculating the similarity between the user question vector and each candidate question vector, re-ranking the candidate questions by similarity score from high to low, and selecting the Top K candidate questions. This is implemented as follows:
301) Performing word-vector embedding of the user question and of each candidate question in the candidate question-answer set to obtain their vectorized semantic representations, namely the user question vector and the candidate question vectors;
302) Calculating the cosine similarity between the user question vector and each candidate question vector to measure the degree of semantic match between the user question and each candidate question;
303) Re-ranking all candidate questions by similarity score from high to low, a higher score indicating stronger semantic relevance;
304) Selecting the Top K candidate questions as those most likely to match the user's needs.
S4, generative answering: inputting the selected Top K candidate questions and their corresponding answers into a pre-trained large language model, and generating an optimized natural language answer through a generation process constrained by a semantic-guidance prompt template.
The semantic-guidance prompt template constrains the large language model to answer by combining the user question with the standard answer texts of the Top K candidate questions, guiding it to use the content of the existing knowledge base to produce a more accurate and detailed answer.
S5, answer verification and optimization: verifying the semantic integrity and context consistency of the generated answer, and outputting the final answer.
Specifically, the generated answer is submitted to a manual review module for content verification. Once the review is passed, the final answer is sent to the user, and the new question-answer pair that passed review is added to the FAQ knowledge base in real time, thereby updating the knowledge base dynamically. The review rules of the manual review module include:
A1) Scoring the generated answer for emotional calmness and professional accuracy;
A2) If the score is below a preset threshold, a psychological counselor revises the answer manually;
A3) Binding the revised answer to the user question and adding the pair to the FAQ knowledge base as a new entry.
The update mechanism of the FAQ knowledge base includes:
B1) Performing a duplicate check on each newly added question-answer pair to avoid storing duplicate entries;
B2) Periodically removing question-answer pairs from the knowledge base according to elimination rules based on historical answer adoption rates and manual scores.
Each step of the method is described in further detail below.
1. Question preprocessing
The system receives a counseling question from a student, segments the Chinese input into words, and uses part-of-speech tagging to identify key components such as nouns and verbs. It then applies a pre-built synonym dictionary to expand the segmentation result, mapping the key information in the question to standard vocabulary to reduce matching difficulties caused by different wordings. Along the way, stop words and other high-frequency function words are filtered out, and a keyword set representing the core semantics of the question is extracted. Through this preprocessing, the user's natural language question is converted into a normalized Chinese semantic representation, laying the foundation for subsequent retrieval.
2. Candidate matching retrieval
The keywords and expanded vocabulary of the question are used to query the FAQ knowledge base, which stores many psychological counseling question-answer pairs, including questions students commonly ask and the corresponding standard answers. The system first performs a preliminary keyword-based search and quickly selects a set of candidate questions highly relevant to the user question. At the same time, the system can query the knowledge base using semantic indexing, for example by converting the user question into a vector with a trained word vector model and searching the vector space for candidate questions at a small semantic distance. This hybrid strategy combines the advantages of keyword matching and semantic similarity search, ensuring that potentially relevant questions and answers that are worded differently but have similar meanings are not missed. This step yields a candidate question list containing a number of known questions and answers that may match the semantics of the user question.
3. Semantic vectorization and similarity calculation
For the candidate question set, the system then applies a semantic vector re-ranking mechanism. Specifically, the user question and each candidate question are converted into word vector embeddings to obtain vectorized semantic representations (for example, sentence vectors generated with embedding models such as Word2Vec or BERT). The cosine similarity between the user question vector and each candidate question vector is then calculated to measure how well they match semantically. The candidate questions are ranked by similarity score, with higher scores indicating stronger semantic relevance, and the system selects the Top K questions with the highest similarity as the candidates most likely to meet the user's needs. Re-ranking by word vector cosine similarity effectively removes distractors that share a few keywords with the user question but are not actually related in meaning, ensuring that the finally selected question-answer pairs closely match the user's question semantically.
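The re-ranking step can be sketched with a Word2Vec-style sentence vector (the average of the word vectors of known tokens) and cosine similarity. The two-dimensional word vectors below are toy values chosen for illustration; a real system would load trained embeddings or use a sentence encoder such as BERT, and K is a free parameter.

```python
import math


def sentence_vector(tokens, word_vectors, dim):
    """Average the word vectors of known tokens (a common Word2Vec-style sentence vector)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vals) / len(known) for vals in zip(*known)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def rerank_top_k(query_tokens, candidates, word_vectors, dim, k):
    """candidates: list of (question_id, tokens). Returns the top-k (score, id), best first."""
    q = sentence_vector(query_tokens, word_vectors, dim)
    scored = [(cosine(q, sentence_vector(toks, word_vectors, dim)), qid)
              for qid, toks in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return scored[:k]


# Toy 2-d embeddings (hypothetical): axis 0 ~ "exam stress", axis 1 ~ "sleep".
WV = {"焦虑": [1.0, 0.0], "考试": [0.8, 0.2], "睡眠": [0.0, 1.0], "失眠": [0.1, 0.9]}
CANDS = [("q1", ["考试", "焦虑"]), ("q2", ["失眠"]), ("q3", ["考试", "睡眠"])]
top = rerank_top_k(["焦虑", "考试"], CANDS, WV, dim=2, k=2)
print([qid for _, qid in top])
```

Candidate q3 shares the keyword "考试" with the query, but its sentence vector points partly towards "sleep", so it ranks below q1; q2 falls out of the Top K entirely.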
4. Generating answers
For the Top K candidate question-answer pairs selected by semantic re-ranking, the system uses a large language model to supplement and refine the answer content. First, the answer text corresponding to each candidate question is extracted from the FAQ knowledge base. These answers, together with the user's current question, form a prompt that is input into a pre-trained large language model (LLM). The system can also use a semantically guided prompt template that explicitly instructs the model to answer based on the provided candidate answers and the user's question, guiding it to draw on existing knowledge base content to produce a more accurate and detailed reply. The models integrated in the invention are large language models independently developed in China that can understand and generate Chinese text (for example, Baidu's ERNIE Bot (Wenxin Yiyan) and Alibaba's Tongyi Qianwen). The system inputs the user's question, the selected related question-answer content, and a preset system prompt into the model and triggers it to generate a natural-language answer. For questions covered by the knowledge base, this amounts to polishing and personalizing the existing standard answers; for new questions not directly covered by the knowledge base, the large language model can reason from its own training knowledge and offer helpful suggestions where possible. This generative AI module makes the final reply more vivid and fluent, and because it can synthesize several pieces of reference information, the answer better fits the context of the student's question.
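A semantically guided prompt of the kind described above might be assembled as in the sketch below. The instruction wording, the `match_threshold` used to decide whether a question counts as "covered by the knowledge base", and the prompt layout are all hypothetical; the source does not publish its template. The call to the LLM itself is omitted.

```python
# Hypothetical instruction texts; the patent does not publish its actual template.
GROUNDED = ("You are a psychological counseling assistant for primary and secondary "
            "school students. Answer the student's question based on the reference "
            "Q&A pairs below, rewriting them in warm, fluent, age-appropriate language.")
OPEN = ("You are a psychological counseling assistant for primary and secondary "
        "school students. The reference Q&A pairs below may not cover this question; "
        "use them where relevant and otherwise give cautious, supportive suggestions.")


def build_prompt(user_question, top_k, match_threshold=0.8):
    """top_k: list of (similarity, question, answer), best first.

    match_threshold is an assumed cut-off for 'covered by the knowledge base'.
    """
    covered = bool(top_k) and top_k[0][0] >= match_threshold
    lines = [GROUNDED if covered else OPEN, "", "Reference Q&A pairs:"]
    if not top_k:
        lines.append("(none)")
    for i, (_, q, a) in enumerate(top_k, start=1):
        lines.append(f"{i}. Q: {q}")
        lines.append(f"   A: {a}")
    lines += ["", f"Student question: {user_question}", "Answer:"]
    return "\n".join(lines)


print(build_prompt("How can I calm down before exams?",
                   [(0.93, "How do I ease exam anxiety?", "Try slow, deep breathing...")]))
```

Switching the instruction on the best similarity score is one simple way to realize the two cases in the text: polishing a covered answer versus letting the model reason more freely, still anchored to whatever references were retrieved.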
5. Answer verification optimization
After the large language model generates a preliminary answer, the system checks and refines it for semantic integrity and contextual consistency. One approach is to use the model's own feedback mechanism or additional rules to make secondary adjustments, such as resolving ambiguity and adding missing information, so that the answer is logically coherent and stays focused on the user's question. If the platform has a teacher (for example, a psychology teacher) participating in secondary review, the generated answer can be submitted to the teacher for confirmation and any necessary revision. The teacher can use the model's reply as a reference and fine-tune the content based on their own experience, ensuring that the reply is accurate and fits the student's actual situation. The confirmed answer is then sent to the student through the platform, completing one round of the consultation Q&A service. In addition, the system supports adding valuable new question-answer pairs to the FAQ knowledge base in real time, extending it dynamically. Once the result of each student consultation has been reviewed and confirmed by a teacher, it can be fed back into the knowledge base and retained as new knowledge, gradually broadening the range of questions the platform can handle. This feedback loop lets the system keep learning and growing through use, so that retrieval and generation become more efficient and accurate when similar questions arise in the future.
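The review-and-feedback loop (spelled out further in claims 6 and 7) can be sketched as a gate followed by a knowledge-base append. The 0 to 10 score scale, the threshold of 6.0, taking the lower of the two scores as "the score", and the `counselor_revise` callback that stands in for the counselor's manual edit are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReviewScores:
    soothing: float  # emotional soothingness, assumed 0-10 scale
    accuracy: float  # professional accuracy, assumed 0-10 scale


def finalize_answer(question, draft, scores, counselor_revise, kb, threshold=6.0):
    """Apply the review gate, then store the confirmed pair in the FAQ knowledge base.

    counselor_revise is a hypothetical callback standing in for the counselor's edit.
    Using the lower of the two scores is an assumption; the source only says "the score".
    """
    answer = draft
    if min(scores.soothing, scores.accuracy) < threshold:
        answer = counselor_revise(draft)  # low score: counselor revises manually
    kb.append({"question": question, "answer": answer})  # feed back into the FAQ base
    return answer  # this is what gets sent to the student


kb = []
print(finalize_answer("q1", "draft", ReviewScores(8, 4), lambda d: d + " [revised]", kb))
```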
The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students provided by the invention has the following key technical innovations and advantages:
1. Word vector semantic re-ranking: word vector representations and cosine similarity calculation are introduced to re-rank the candidate answers obtained by preliminary retrieval according to semantic relevance. Compared with traditional matching that relies only on keywords, vector-based re-ranking can recognize synonymous and semantically similar phrasings, improving matching accuracy and ensuring that search results are ordered by semantic relevance rather than by literal similarity.
2. Hybrid question answering that combines retrieval and generation: a hybrid retrieval-generation architecture (Retrieval-Augmented Generation, RAG) integrates knowledge base retrieval with large language model generation. When the knowledge base contains an answer, the retrieval results guide the generation model's reply; when the knowledge base lacks a direct answer, the generation model composes a reply on its own. This hybrid strategy keeps answers grounded in the facts of the knowledge base while using generative AI to provide richer, more natural expression, balancing retrieval accuracy with generation flexibility.
3. Semantically guided prompt mechanism: a semantically guided prompting strategy is used when interacting with the large language model. Adding context such as the filtered related question-answer pairs to the prompt guides the model to answer with reference to semantically related content and keeps it from generating content that departs from existing knowledge. In effect, this mechanism builds a bridge between the model and the knowledge base, so that the generated result is constrained and guided by the semantics of the knowledge base, improving the relevance and reliability of the answers. Compared with letting the model generate freely, semantic prompting can effectively reduce content drift, keeping answers on topic and appropriate for the psychological counseling setting.
4. Domestic large language model: the invention preferably integrates a large language model developed in China as the generation engine, such as the Chinese pre-trained models developed by Baidu, iFLYTEK and other organizations. Compared with relying on overseas models, this approach has advantages in data security and compliance: it can ensure that sensitive student question data does not leave the country and meets the relevant regulatory requirements. In addition, domestic models are optimized for Chinese, so they better understand the language, tone and conversational habits of adolescent psychological counseling in a Chinese context, and their wording fits the local culture more closely. The large model can also be deployed locally and specially trained and fine-tuned as needed, so that a customized psychological counseling Q&A service can be provided efficiently.
5. Dynamic knowledge base expansion: support for continuous updating and iteration of the FAQ knowledge base is another major feature of the invention. The system promptly incorporates high-quality question-answer pairs added during operation into the knowledge base, so that its content keeps growing. This dynamic expansion mechanism means the platform's knowledge coverage can expand with use, avoiding the problem of traditional systems whose knowledge bases become fixed and outdated. Over time, the range of questions the platform can answer and the accuracy of its answers gradually improve, and newly raised questions can be quickly recorded and answered, forming a positive self-optimizing loop.
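The update mechanism in claim 8 (de-duplicating new pairs, then periodically retiring low-value entries) can be sketched as below. Exact matching on a punctuation-stripped, lower-cased question is a deliberately crude, assumed stand-in for duplicate detection (a real system might reuse the vector similarity from the re-ranking step), and the adoption-rate, exposure and rating thresholds are hypothetical.

```python
def normalize(question):
    """Crude duplicate key: lower-case and drop punctuation/whitespace (assumption)."""
    return "".join(ch for ch in question.lower() if ch.isalnum())


def add_if_new(kb, question, answer):
    """Claim 8, B1: reject a new pair whose normalized question already exists."""
    key = normalize(question)
    if any(normalize(e["question"]) == key for e in kb):
        return False
    kb.append({"question": question, "answer": answer,
               "shown": 0, "adopted": 0, "rating": None})
    return True


def prune(kb, min_adoption=0.2, min_shown=20, min_rating=3.0):
    """Claim 8, B2: drop entries with a low adoption rate or a low manual rating.

    Adoption is only judged once an entry has been shown min_shown times.
    """
    kept = []
    for e in kb:
        rate = e["adopted"] / e["shown"] if e["shown"] else None
        low_adoption = rate is not None and e["shown"] >= min_shown and rate < min_adoption
        low_rating = e["rating"] is not None and e["rating"] < min_rating
        if not (low_adoption or low_rating):
            kept.append(e)
    return kept


kb = []
print(add_if_new(kb, "How do I deal with exam anxiety?", "..."))  # new entry
print(add_if_new(kb, "how do i deal with exam anxiety", "..."))   # duplicate
```

Requiring a minimum number of exposures before judging the adoption rate keeps freshly added entries from being retired before they have had a chance to be used.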
The embodiment also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory, wherein the processor implements the above method when executing the computer program.
The present embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described method.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention in any form. Any person skilled in the art may use the technical content disclosed above to make changes or modifications that yield equivalent embodiments. However, any simple modification, equivalent change, or variation made to the above embodiments according to the technical substance of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students, characterized by comprising the following steps:
S1. receiving a Chinese psychological counseling question input by a user, preprocessing it, extracting a set of keywords that represents the core semantics of the question, and converting it into a normalized Chinese semantic representation;
S2. retrieving a candidate question-answer set related to the user's question from an FAQ knowledge base using a hybrid retrieval technique that includes keyword matching and semantic indexing;
S3. representing the user question and each candidate question-answer pair as word vector embeddings, calculating the similarity between the user question vector and each candidate question vector, re-ranking the candidate questions by similarity score from high to low, and selecting the Top K candidate questions;
S4. inputting the selected Top K candidate questions and their corresponding answers into a pre-trained large language model to generate an optimized natural-language answer;
S5. checking the generated answer for semantic integrity and contextual consistency, and outputting the final answer.
2. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1, characterized in that, in step S1, the Chinese psychological counseling question input by the user is preprocessed as follows:
101) segmenting the question into words, tagging all segmented words by part of speech, extracting keywords including nouns and verbs, and filtering out stop words to obtain a set of keywords that represents the core semantics of the question;
102) performing synonym expansion on the extracted keywords using a predefined synonym dictionary and mapping the keywords to standard vocabulary, thereby converting the user's natural-language question into a normalized Chinese semantic representation.
3. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1, characterized in that, in step S2, the candidate question-answer set related to the user's question is retrieved from the FAQ knowledge base as follows:
201) performing preliminary keyword-matching retrieval in the FAQ knowledge base to retrieve candidate questions whose relevance to the user question is higher than a set value, obtaining a preliminary candidate question-answer set;
202) converting the user question into a vector using a pre-trained word vector model and, based on vector similarity, retrieving several candidate questions and corresponding candidate answers whose semantic distance to that vector in the vector space is less than a set value;
203) combining the retrieval results of steps 201) and 202) to obtain a hybrid candidate question-answer set.
4. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1, characterized in that step S3 is implemented as follows:
301) performing word vector embedding on the user question and on each candidate question in the candidate question-answer set to obtain the corresponding vectorized semantic representations, i.e., the user question vector and the candidate question vectors;
302) calculating the cosine similarity between the user question vector and each candidate question vector to measure the degree of semantic matching between the user question and each candidate question;
303) re-ranking all candidate questions by similarity score from high to low, where a higher score indicates stronger semantic relevance;
304) selecting the K candidate questions with the highest similarity scores, i.e., the Top K candidate questions, as the candidates most likely to match the user's needs.
5. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1, characterized in that, in step S4, the selected Top K candidate questions and their corresponding answers are input into a pre-trained large language model, and the generation process is constrained by a semantically guided prompt template to generate an optimized natural-language answer; the semantically guided prompt template constrains the large language model to answer based on the user question and the standard answer texts corresponding to the Top K candidate questions, thereby guiding the large language model to use existing knowledge base content to generate a more accurate and detailed reply; the large language model is deployed on a local server so that it can be trained and fine-tuned as needed.
6. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1, characterized in that, in step S5, the generated answer is submitted to a manual review module for content verification; after the answer passes review, the final answer is sent to the user, and the new question-answer pair that passed review is added to the FAQ knowledge base in real time, completing the dynamic update of the knowledge base.
7. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 6, characterized in that the review rules of the manual review module include:
A1) scoring the generated answer for emotional soothingness and professional accuracy;
A2) if the score is below a preset threshold, having a psychological counselor manually revise the answer;
A3) storing the revised answer bound to the user's question and adding it to the FAQ knowledge base as a new entry.
8. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 6, characterized in that the update mechanism of the FAQ knowledge base includes:
B1) performing a deduplication check on newly added question-answer pairs to prevent duplicate entries from entering the knowledge base;
B2) periodically eliminating inefficient question-answer pairs from the knowledge base, with elimination rules based on the historical answer adoption rate and manual scores.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory, characterized in that the processor implements the method according to any one of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202510693980.6A 2025-05-27 2025-05-27 A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students Pending CN120611018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510693980.6A CN120611018A (en) 2025-05-27 2025-05-27 A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510693980.6A CN120611018A (en) 2025-05-27 2025-05-27 A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students

Publications (1)

Publication Number Publication Date
CN120611018A true CN120611018A (en) 2025-09-09

Family

ID=96925645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510693980.6A Pending CN120611018A (en) 2025-05-27 2025-05-27 A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students

Country Status (1)

Country Link
CN (1) CN120611018A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121255967A (en) * 2025-12-05 2026-01-02 广东顺畅科技有限公司 Knowledge extraction method and system for large language model without search assistance

Similar Documents

Publication Publication Date Title
Paladines et al. A systematic literature review of intelligent tutoring systems with dialogue in natural language
KR102654480B1 (en) Knowledge based dialogue system and method for language learning
CN117235347B (en) A Learning System and Method for Teenagers Based on Large Language Models and Algorithm Code
CN117149984A (en) A customized training method and device based on large model thinking chain
CN119149710B (en) An AI online education intelligent question-answering information processing method
CN117992614A (en) A method, device, equipment and medium for sentiment classification of Chinese online course reviews
Picca et al. Natural Language Processing in Serious Games: A state of the art.
Chen et al. A ranked-based learning approach to automated essay scoring
CN120611018A (en) A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students
Li English Research Learning and Functional Research Based on Constructivism Theory and Few‐Shot Learning
CN119474259A (en) A quantitative diagnosis method, device and equipment for abnormal psychological diseases
Safari et al. Data augmentation and preparation process of PerInfEx: a Persian chatbot with the ability of information extraction
Liu Exploring the impact of artificial intelligence-enhanced language learning on youths’ intercultural communication competence
Cheng et al. [Retracted] Construction of AI Environmental Music Education Application Model Based on Deep Learning
CN120277199B (en) Children's education knowledge boundary management method, system and equipment based on large model
CN117711444B (en) An interactive method, device, equipment and storage medium based on eloquence expression
CN118917437A (en) Man-machine dialogue method based on AI intelligent large model
Shi et al. Xai language tutor–a Xai-based language learning Chatbot using ontology and transfer learning techniques
Jadhav et al. Engage Learn: An AI-Based English Proficiency Improviser
CN120973949B (en) A Real-Time Anxiety State Assessment Method Based on Multimodal Fusion
Tao et al. Self‐Study System Assessment of Spoken English considering the Speech Scientific Computing Knowledge Assessment Algorithm
CN119848555B (en) Large model data labeling method, device, equipment, medium and product
Jing Error pattern recognition and correction methods in English oral learning process based on deep learning
US12566904B2 (en) Methods and systems for domain-specific interview simulations
Wang Research and Implementation of English Assisted Learning System Based on Decision Tree Algorithm for Judging Vocabulary Difficulty

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination