CN120611018A

CN120611018A - A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students

Info

Publication number: CN120611018A
Application number: CN202510693980.6A
Authority: CN
Inventors: 潘志宏; 汪孝泉; 林明芳; 解徐超; 陈锦丽
Original assignee: Second Affiliated Middle School Of Fuzhou Institute Of Education
Current assignee: Second Affiliated Middle School Of Fuzhou Institute Of Education
Priority date: 2025-05-27
Filing date: 2025-05-27
Publication date: 2025-09-09

Abstract

The present invention relates to a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students, comprising the following steps: S1, receiving a question input by a user and pre-processing it, extracting a keyword set, and converting it into a standardized Chinese semantic representation; S2, retrieving a set of candidate questions and answers related to the user question from a FAQ knowledge base based on a hybrid retrieval technique including keyword matching and semantic indexing; S3, embedding the user question and each candidate question and answer into word vectors, calculating the similarity between them, and re-ranking the candidate questions based on the similarity scores to screen out the top K candidate questions; S4, inputting the top K candidate questions and their answers into a pre-trained large language model to generate optimized natural language answers; S5, performing semantic integrity and context consistency verification on the generated answers, and outputting the final answers. This method can improve the matching accuracy and response quality of intelligent psychological counseling questions and answers for primary and secondary school students.

Description

Semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students.

Background

With the development of artificial intelligence technology, psychological counseling platforms for primary and secondary school students are gradually incorporating semantic understanding and question-answering dialogue technology. Mainstream products currently follow one of two paths: retrieval and matching against a knowledge base or community Q&A, or dialogue generation with a large model (a large pre-trained language model).

Knowledge base / community question-answering path: early online psychological counseling platforms (such as Yixinli) focused mainly on providing psychological content and services, with an online Q&A community in which professional counselors answer users' questions. After a user submits a text question on the platform, the system publishes it and interested counselors post replies, so that a large body of question-answer pairs and psychological knowledge content gradually accumulates. To improve retrieval efficiency, such platforms have progressively introduced semantic analysis and matching technology. For example, a community question-answering retrieval system patent (CN105786794B) matches related question-answer pairs by extracting keywords from the user question, expanding them, calculating their weights, and combining dependency analysis, thereby improving retrieval accuracy. In these schemes, natural language processing is used to understand the intent of the user's question and to retrieve the best answer from an existing FAQ knowledge base.

Large model dialogue path: with breakthroughs in deep learning and pre-trained models, the use of large models for psychological dialogue has been explored in recent years. For example, the concept product "AI heart words" proposes building a psychological counseling platform on advanced large model technology, offering users comprehensive and convenient mental health services through personalized psychological assessment, instant feedback, interactive learning, and other functions. In practice, major companies have begun to move in this direction: the Yidianling platform announced access to Baidu's large model Wenxin Yiyan (ERNIE Bot), becoming the first platform in the industry to apply a generative dialogue model to psychological counseling scenarios. ERNIE Bot relies on cross-modal and cross-language deep semantic understanding and generation capabilities and can be used for intent recognition and content generation in psychological counseling dialogue. By introducing large model dialogue technology, the platform hopes to analyze the user's natural language and emotional expression in depth, identify potential psychological problems in time, and provide auxiliary support. Tencent has also cooperated with the Institute of Psychology of the Chinese Academy of Sciences to develop an artificial intelligence psychological counseling robot and applied for the related patent CN111667926B, which provides a counseling dialogue system based on a psychological knowledge graph and multi-turn dialogue management. The system comprises an input module, a language analysis module, a logic-tree dialogue module, a corpus, a feedback module, and so on; it uses the psychological knowledge graph to assist intent analysis across multiple dialogue turns and to guide the dialogue flow, giving the user an experience close to talking with a human counselor.
In addition, customized large models have begun to appear. For example, the EmoLLM model introduced by Baidu AI Cloud is designed specifically for mental health support; it adopts instruction fine-tuning and multi-model fusion, combining the strengths of several pre-trained models to improve understanding of, and response to, the user's emotional context. Taken together, these schemes combine semantic retrieval, question-answer matching, and large models: they include both FAQ retrieval systems based on a domain knowledge base and large model dialogue systems that incorporate emotion analysis and knowledge graphs.

Although the above solutions raise the level of intelligence of online psychological counseling, the prior art still has the following shortcomings with respect to the object of the present invention:

1. Limited semantic retrieval precision. Traditional FAQ retrieval and community question-answer matching rely mainly on keywords and shallow semantic expansion. When primary and secondary school students describe their psychological troubles in everyday language or in an indirect way, simple keyword matching may fail to capture the deeper intent, leading to incomplete retrieval or inaccurate matching. Existing schemes improve the question-answer matching rate through rule weights and dependency relations, but when facing students whose expressions vary widely, semantic analysis is still often incomplete and implicit emotional factors go unrecognized.

2. Insufficient reliability of dialogue generation. Using a large model directly for psychological dialogue gives excellent fluency but also exposes content reliability problems. General-purpose large models (e.g., ChatGPT) often lack the constraint of professional psychological knowledge and may generate responses that are inconsistent with standard psychological guidance, or even biased or distorted advice. The simple integration approaches currently attempted in the industry (e.g., calling a model directly to generate an answer) lack verification of, and guidance for, the large model's output and therefore carry risk.

3. Insufficient optimization of context and multi-turn interaction. Most existing psychological question-answering systems either work in a single-turn mode (one user question, one answer) or manage multi-turn dialogue with a preset dialogue logic tree. A fixed logic tree can hardly cover the wide variety of real counseling scenarios, which makes the conversation rigid and prevents dynamic adjustment according to user feedback. Typical FAQ systems, on the other hand, lack dialogue memory and cannot draw on what the user has said earlier in the conversation to give a more targeted response.

4. Insufficient knowledge fusion and reasoning depth. In current solutions, knowledge base retrieval and large model reasoning are often independent of each other and have not yet been organically fused. An FAQ retrieval system is reliable, but its answers tend to be templated and lack analysis of the individual; a pure large model dialogue can respond to the specific wording of a question but may ignore established knowledge of psychological intervention.

Disclosure of Invention

The object of the invention is to provide a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students, which can improve the matching accuracy and response quality of intelligent psychological counseling question answering for such students.

To achieve this object, the technical solution adopted by the invention is a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students, comprising the following steps:

S1, receiving a Chinese psychological counseling question input by a user, preprocessing the question, extracting a keyword set that represents the core semantics of the question, and converting it into a normalized Chinese semantic representation;

S2, retrieving a candidate question-answer set related to the user question from the FAQ knowledge base based on a hybrid retrieval technique comprising keyword matching and semantic indexing;

S3, computing word vector embedding representations of the user question and of each candidate question-answer pair, calculating the similarity between the user question vector and each candidate question vector, re-ranking the candidate questions from high to low by similarity score, and screening out the Top K candidate questions;

S4, inputting the screened Top K candidate questions and their corresponding answers into a pre-trained large language model to generate an optimized natural language answer;

S5, verifying the semantic integrity and context consistency of the generated answer, and outputting the final answer.

Further, in step S1, the Chinese psychological counseling question input by the user is preprocessed as follows:

101) Perform word segmentation on the Chinese psychological counseling question input by the user, tag the part of speech of each segmented word, extract keywords comprising nouns and verbs, and filter out stop words to obtain a keyword set that represents the core semantics of the question;

102) Perform synonym expansion on the extracted keywords using a predefined synonym dictionary and map the keywords to standard vocabulary expressions, thereby converting the user's natural language question into a normalized Chinese semantic representation.

Further, in step S2, the candidate question-answer set related to the user question is retrieved from the FAQ knowledge base as follows:

201) Perform a preliminary matching search on the FAQ knowledge base based on keyword matching, retrieving candidate questions whose relevance to the user question is higher than a set value, to obtain a preliminary candidate question-answer set;

202) Convert the user question into a vector using a pre-trained word vector model and, based on vector similarity, search the vector space for candidate questions, together with their corresponding candidate answers, whose semantic distance from that vector is smaller than a set value;

203) Merge the search results obtained in step 201) and step 202) to obtain a hybrid candidate question-answer set.

Further, step S3 is implemented as follows:

301) Compute word vector embedding representations of the user question and of each candidate question in the candidate question-answer set to obtain the corresponding vectorized semantic representations, namely the user question vector and the candidate question vectors;

302) Calculate the cosine similarity between the user question vector and each candidate question vector to measure the degree of semantic matching between the user question and each candidate question;

303) Re-rank all candidate questions from high to low by similarity score, where a higher score indicates stronger semantic relevance;

304) Select the Top K candidate questions as the candidates most likely to match the user's needs.
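For reference, the cosine similarity used in step 302 has the standard form below. The symbols are notation introduced here for illustration and do not appear in the source: q is the user question vector, c_i is the vector of the i-th candidate question, N is the number of candidates, and s_i is the resulting score used for re-ranking.

```latex
s_i = \cos(\mathbf{q}, \mathbf{c}_i)
    = \frac{\mathbf{q} \cdot \mathbf{c}_i}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c}_i \rVert},
\qquad i = 1, \dots, N
```

The score lies in [-1, 1]; the Top K candidates are the K candidates with the largest s_i.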

Further, in step S4, the screened Top K candidate questions and their corresponding answers are input into a pre-trained large language model, and the optimized natural language answer is generated through a generation process constrained by a semantic-guided prompt template. In the semantic-guided prompt template, the large language model is constrained to answer by combining the user question with the standard answer texts corresponding to the Top K candidate questions, so that it is guided to use the content of the existing knowledge base to generate a more accurate and detailed answer. The large language model is deployed on a local server so that it can be trained and fine-tuned as required.

Further, in step S5, the generated answer is submitted to a manual review module for content verification. After the review is passed, the final answer is sent to the user, and the approved new question-answer pair is added to the FAQ knowledge base in real time, completing the dynamic update of the knowledge base.

Further, the review rules of the manual review module include:

A1) Score the generated answer for emotional soothing effect and professional accuracy;

A2) If the score is lower than a preset threshold, a psychological counselor manually revises the answer content;

A3) The revised answer is stored bound to the user question and added to the FAQ knowledge base as a new entry.
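The review rules A1 to A3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Review` class, the 0 to 1 score scale, the default threshold of 0.7, and the choice to compare the lower of the two scores against the threshold are all assumptions made here, since the source only states that a score is compared with a preset threshold.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Hypothetical record filled in by the human reviewer."""
    soothing_score: float      # A1: emotional soothing score (assumed 0-1 scale)
    accuracy_score: float      # A1: professional accuracy score (assumed 0-1 scale)
    revised_answer: str = ""   # A2: counselor's revision, if one was needed


def audit(question, generated_answer, review, faq, threshold=0.7):
    """Apply rules A1-A3 and return the answer that is sent to the user."""
    # Assumption: the answer fails if either score is below the threshold.
    needs_revision = min(review.soothing_score, review.accuracy_score) < threshold
    if needs_revision:
        if not review.revised_answer:
            raise ValueError("score below threshold: a manual revision is required")
        final_answer = review.revised_answer   # A2: use the counselor's revision
    else:
        final_answer = generated_answer
    faq[question] = final_answer               # A3: store bound to the question
    return final_answer


faq = {}
first = audit("q1", "generated text", Review(0.9, 0.8), faq)
second = audit("q2", "generated text", Review(0.9, 0.5, "revised text"), faq)
print(first, "|", second)
```

Keeping the reviewer's scores and the revision in one record makes it straightforward to log why an answer was changed before it entered the knowledge base.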

Further, the update mechanism of the FAQ knowledge base includes:

B1) Perform a de-duplication check on each newly added question-answer pair to avoid storing duplicate entries;

B2) Periodically eliminate question-answer pairs from the knowledge base, with the elimination rules based on historical answer adoption rates and manual scores.
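A minimal sketch of the update mechanism B1 and B2 is shown below. The source does not specify how duplicates are detected or which thresholds drive elimination, so everything concrete here is an assumption: duplicates are detected by exact match after removing whitespace and lowercasing, and the field names (`shown`, `adopted`, `manual_score`) and threshold values are hypothetical.

```python
def add_pair(kb, question, answer):
    """B1: add a question-answer pair unless the same question is already stored."""
    key = "".join(question.split()).lower()   # assumed normalization for de-duplication
    if key in kb:
        return False
    kb[key] = {"question": question, "answer": answer,
               "shown": 0, "adopted": 0, "manual_score": None}
    return True


def prune(kb, min_adoption=0.2, min_manual_score=0.5, min_shown=10):
    """B2: remove entries with a low adoption rate or a low manual score."""
    removed = []
    for key, entry in list(kb.items()):
        # Only judge the adoption rate once an entry has been shown often enough.
        enough_data = entry["shown"] >= max(min_shown, 1)
        low_adoption = enough_data and entry["adopted"] / entry["shown"] < min_adoption
        low_score = (entry["manual_score"] is not None
                     and entry["manual_score"] < min_manual_score)
        if low_adoption or low_score:
            removed.append(key)
            del kb[key]
    return removed


kb = {}
added_first = add_pair(kb, "How do I handle exam stress?", "Try slow breathing.")
added_again = add_pair(kb, "how do I handle  exam stress?", "Another answer.")
add_pair(kb, "I argued with a friend.", "Talk it over calmly.")

key = "howdoihandleexamstress?"
kb[key]["shown"], kb[key]["adopted"] = 20, 1   # adoption rate 0.05
removed = prune(kb)
print(added_first, added_again, removed)
```

In practice the exact-match check would likely be replaced by a semantic similarity check, since two differently worded questions can still be duplicates.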

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the processor realizes the method when executing the computer program.

The invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the method described above.

Compared with the prior art, the invention has the following beneficial effects:

1. Improved retrieval accuracy: because synonym expansion and semantic vector matching are introduced, the system understands user questions more comprehensively, and the recall of semantically related questions is markedly improved. However a student phrases a question, the platform can recognize the underlying intent and find the corresponding answer, avoiding the retrieval failures previously caused by differences in wording. Tests show that, compared with the conventional FAQ matching approach, the method raises the hit rate of useful answers.

2. Enhanced answer naturalness: the answer is generated and polished by a large language model, so the reply reads more smoothly and is closer to human expression. The generated answer is accurate in content, consistent with the context, and emotionally warm, reading more like a caring suggestion from a counseling teacher. This natural, fluent style improves the user experience: students feel stronger empathy and understanding in the exchange, and satisfaction improves noticeably.

3. Good extensibility: the architecture of the invention supports multi-turn dialogue and complex semantic scenarios. In practice, students often have several rounds of question-and-answer interaction with the platform on the same topic. By accumulating context semantics in the prompts and drawing on the knowledge base and knowledge graph, the system can understand how successive questions relate to one another and still give consistent, targeted answers to follow-up questions. When a student's question involves several sub-topics or requires inference, the system can also use the reasoning capability of the knowledge graph and the large model to give an answer that takes multiple factors into account. This enables the platform to handle more complex and in-depth psychological counseling dialogue scenarios.

4. Fast response with assured quality: by combining intelligent retrieval with generation, the platform maintains a high response speed while guaranteeing answer quality. After a question is asked, the system quickly finds candidate answers in the large knowledge base and quickly generates a customized reply with the help of AI; the whole process is completed in real time, meeting the timeliness requirement of online counseling. In addition, the generated answer content to some extent aggregates the experience of professional psychological counselors (especially as the knowledge base continues to absorb solutions provided by experts), which helps ensure that the advice is scientific and reliable. The platform also has a built-in compliance check mechanism and a manual review step, which prevent inappropriate content from being generated and keep the service process safe and controllable.

Drawings

Fig. 1 is a schematic diagram of an implementation of a semantic hybrid retrieval and reordering method according to an embodiment of the present invention.

Detailed Description

The invention will be further described with reference to the accompanying drawings and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

As shown in Fig. 1, this embodiment provides a semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students, which specifically comprises the following steps:

S1, question preprocessing: receive a Chinese psychological counseling question input by a user, preprocess the question, extract a keyword set that represents the core semantics of the question, and convert it into a normalized Chinese semantic representation. The specific implementation follows steps 101 and 102 described above.

S2, candidate matching retrieval: based on a hybrid retrieval technique comprising keyword matching and semantic indexing, retrieve a candidate question-answer set related to the user question from the FAQ knowledge base. The specific implementation follows steps 201 to 203 described above.

S3, semantic vectorization and similarity calculation: compute word vector embedding representations of the user question and of each candidate question-answer pair, calculate the similarity between the user question vector and each candidate question vector, re-rank the candidate questions from high to low by similarity score, and screen out the Top K candidate questions. The specific implementation follows steps 301 to 304 described above.

S4, generative answering: input the screened Top K candidate questions and their corresponding answers into a pre-trained large language model, and generate an optimized natural language answer through a generation process constrained by a semantic-guided prompt template.

In the semantic-guided prompt template, the large language model is constrained to answer by combining the user question with the standard answer texts corresponding to the Top K candidate questions, so that it is guided to use the content of the existing knowledge base to generate a more accurate and detailed answer.

S5, answer verification and optimization: verify the semantic integrity and context consistency of the generated answer, and output the final answer.

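Specifically, the generated answer is submitted to a manual review module for content verification. After the review is passed, the final answer is sent to the user, and the approved new question-answer pair is added to the FAQ knowledge base in real time, completing the dynamic update of the knowledge base. The review rules of the manual review module include:

A1) Score the generated answer for emotional soothing effect and professional accuracy;

A2) If the score is lower than a preset threshold, a psychological counselor manually revises the answer content;

A3) The revised answer is stored bound to the user question and added to the FAQ knowledge base as a new entry.

The update mechanism of the FAQ knowledge base includes:

B1) Perform a de-duplication check on each newly added question-answer pair to avoid storing duplicate entries;

B2) Periodically eliminate question-answer pairs from the knowledge base, with the elimination rules based on historical answer adoption rates and manual scores.

The individual steps of the method are described in more detail below.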

1. Question preprocessing

The system receives a counseling question from a student, performs word segmentation on the input Chinese question, and uses part-of-speech tagging to identify key components such as nouns and verbs. At the same time, using a pre-built synonym dictionary, it performs synonym expansion on the segmentation result and maps the key information in the question to standard vocabulary expressions, which eases the matching difficulty caused by different wordings. During this process, common stop words are filtered out, and a keyword set that represents the core semantics of the question is extracted. Through this preprocessing step, the user's natural language question is converted into a normalized Chinese semantic representation, laying the foundation for subsequent retrieval.
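The preprocessing described above can be sketched as follows. This is an illustration under stated assumptions rather than the method's actual code: word segmentation and part-of-speech tagging are assumed to be done by an upstream Chinese NLP tool that is not shown, so the function takes (word, tag) pairs as input. The stop-word list, the synonym dictionary, the tag prefixes "n" and "v", and the function name `extract_keywords` are all hypothetical.

```python
# Hypothetical resources; a real system would load much larger dictionaries.
STOP_WORDS = {"的", "了", "我", "很"}
SYNONYMS = {"焦虑": "紧张", "害怕": "紧张", "测验": "考试"}   # variant -> standard term
KEEP_POS_PREFIXES = ("n", "v")                              # nouns and verbs


def extract_keywords(tagged_tokens):
    """Return normalized keywords from (word, pos) pairs, preserving order."""
    keywords = []
    for word, pos in tagged_tokens:
        if word in STOP_WORDS:
            continue                                  # filter stop words
        if not pos.startswith(KEEP_POS_PREFIXES):
            continue                                  # keep only nouns and verbs
        standard = SYNONYMS.get(word, word)           # map to the standard vocabulary
        if standard not in keywords:
            keywords.append(standard)
    return keywords


# "我考试前很焦虑", as segmented and tagged by the assumed upstream tool
tagged = [("我", "r"), ("考试", "n"), ("前", "f"), ("很", "d"), ("焦虑", "v")]
keywords = extract_keywords(tagged)
print(keywords)   # ['考试', '紧张']
```

Mapping every variant to one standard term means that two students who write "焦虑" and "害怕" end up querying the knowledge base with the same keyword.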

2. Candidate matching retrieval

The keywords and expanded vocabulary of the question are used to query the FAQ question-answer knowledge base. The knowledge base stores a large number of psychological counseling question-answer pairs, comprising common student questions and their corresponding standard answers. The system first performs a preliminary keyword-based matching search, quickly screening out a set of candidate questions that are highly relevant to the user question. At the same time, the system can query the knowledge base using semantic indexing, for example by converting the user question into a vector with a trained word vector model and searching the vector space for candidate questions at a small semantic distance. This hybrid retrieval strategy combines the advantages of keyword matching and semantic similarity search, ensuring that potentially relevant question-answer pairs that are worded differently but have similar meanings are not missed. Through this step, the system obtains a candidate list containing a number of known question-answer pairs that may semantically match the user's question.
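The hybrid strategy described above can be sketched as the union of a keyword search and a vector search. The sketch below is illustrative only: the FAQ entries, the 3-dimensional toy vectors (standing in for a real pre-trained embedding model), the overlap and distance thresholds, and the function names are all assumptions, and cosine distance is used as one possible definition of "semantic distance".

```python
import math

# Hypothetical FAQ index: id -> (keyword set, embedding vector).
FAQ = {
    "q1": ({"考试", "紧张"}, [0.9, 0.1, 0.0]),
    "q2": ({"同学", "吵架"}, [0.0, 0.2, 0.9]),
    "q3": ({"成绩", "担心"}, [0.8, 0.3, 0.1]),
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_keywords, query_vector, min_overlap=1, max_distance=0.2):
    """Merge keyword hits and vector hits into one candidate set."""
    # Keyword matching: at least `min_overlap` shared keywords.
    keyword_hits = {
        qid for qid, (kws, _) in FAQ.items()
        if len(kws & query_keywords) >= min_overlap
    }
    # Semantic search: cosine distance below the set value.
    vector_hits = {
        qid for qid, (_, vec) in FAQ.items()
        if 1.0 - cosine(query_vector, vec) < max_distance
    }
    return keyword_hits | vector_hits


candidates = hybrid_retrieve({"考试", "紧张"}, [0.85, 0.2, 0.05])
print(sorted(candidates))   # ['q1', 'q3']
```

Here "q3" shares no keyword with the query and is found only through the vector search, which is the kind of differently worded but related question the hybrid strategy is meant to catch.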

3. Semantic vectorization and similarity calculation

For the candidate question set, the system further introduces a semantic vector re-ranking mechanism. Specifically, the user question and each candidate question are each given a word vector embedding representation to obtain the corresponding vectorized semantic representation (for example, sentence vectors generated with embedding models such as Word2Vec or BERT). The cosine similarity between the user question vector and each candidate question vector is then calculated to measure how closely they match semantically. The candidate questions are ranked by similarity score, with a higher value indicating stronger semantic relevance. The system selects the Top K questions with the highest similarity as the candidates most likely to match the user's needs. This re-ranking based on word vector cosine similarity effectively removes distractors that share a few keywords with the user question but are not actually related in meaning, ensuring that the finally selected candidate question-answer pairs are a close semantic match for the user question.
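The re-ranking step can be sketched in a few lines. The vectors below are toy 2-dimensional values chosen for illustration; in the method they would come from an embedding model such as Word2Vec or BERT. The function names and the candidate ids are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_top_k(query_vec, candidate_vecs, k):
    """Score every candidate, sort from high to low, and keep the first k."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in candidate_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


query = [1.0, 0.0]
candidates = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 1.0]}
top2 = rerank_top_k(query, candidates, k=2)
print([cid for cid, _ in top2])   # ['c1', 'c3']
```

The candidate "c2" is orthogonal to the query (similarity 0) and is dropped, which mirrors how the re-ranking removes candidates that entered the list through keyword overlap alone.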

4. Generative answering

For the Top K candidate question-answer pairs screened by semantic re-ranking, the system uses a large language model to supplement and optimize the answer content generatively. First, the answer text corresponding to each candidate question is extracted from the FAQ knowledge base. These answers, together with the user's current question, form a prompt that is input to a pre-trained large language model (LLM). The system can additionally use a semantic-guided prompt template in which the prompt explicitly requires the model to answer by combining the provided candidate answers with the user question, so that the large language model is guided to use the content of the existing knowledge base to generate a more accurate and detailed answer. The models integrated in the invention are large language models independently developed in China (such as Baidu's Wenxin Yiyan and Alibaba's Tongyi Qianwen), which can understand and generate Chinese text. The system inputs the user's question, the screened related question-answer content, and a preset system prompt into the model, triggering the model to generate a natural language answer. For questions already covered by the knowledge base, this process amounts to polishing the existing standard answer and expressing it in a personalized way; for new questions not directly covered by the knowledge base, the large language model can reason from its own training knowledge and offer helpful suggestions as far as it is able. The generative AI module makes the final reply more vivid and fluent and can synthesize several pieces of reference information, so that the answer better fits the background of the student's question.
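A semantic-guided prompt of the kind described above might be assembled as in the sketch below. The template wording, the function name `build_prompt`, and the example question-answer pair are assumptions made here for illustration; the source does not give the actual template, and the call to the large language model itself is deliberately left out because the model interface is not specified.

```python
# Hypothetical template: it tells the model to ground its reply in the
# retrieved reference answers rather than answer freely.
PROMPT_TEMPLATE = (
    "You are a counseling teacher for primary and secondary school students.\n"
    "Answer the student's question using the reference Q&A pairs below. "
    "Do not contradict the reference answers.\n\n"
    "Reference Q&A:\n{references}\n\n"
    "Student question: {question}\n"
    "Answer:"
)


def build_prompt(question, top_k_pairs):
    """Combine the user question with the Top K (question, answer) pairs."""
    references = "\n".join(
        f"{i}. Q: {q}\n   A: {a}"
        for i, (q, a) in enumerate(top_k_pairs, start=1)
    )
    return PROMPT_TEMPLATE.format(references=references, question=question)


prompt = build_prompt(
    "I get nervous before exams.",
    [("How to handle exam anxiety?", "Try slow breathing and plan your revision.")],
)
print(prompt)
```

The resulting string would then be passed to whichever locally deployed model is used; keeping prompt construction separate from the model call makes the template easy to review and adjust.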

5. Answer verification and optimization

After the large language model generates a preliminary answer, the system verifies and optimizes the answer's semantic integrity and context consistency. One approach is to make secondary adjustments to the answer, such as removing ambiguity or supplementing missing information, using the model's own feedback mechanism or additional rules, so that the answer is semantically and logically coherent and stays closely focused on the user's question. If the platform has a human reviewer (e.g., a school psychology teacher) taking part in the review, the generated answer can be submitted to the teacher for confirmation and appropriate revision. The teacher can refer to the reply provided by the model and fine-tune the content based on personal experience, to ensure that the reply is accurate and fits the student's actual situation. Finally, the confirmed answer is sent to the student through the platform, completing one round of counseling question answering. In addition, the system supports adding valuable new question-answer pairs to the FAQ knowledge base in real time as a dynamic extension of the knowledge base. After the result of each student consultation has been checked and confirmed by a teacher, it can be fed back into the knowledge base and retained as new knowledge, gradually broadening the range of questions the platform can handle. This feedback loop enables the system to keep learning and improving in use, so that retrieval and generation become more efficient and accurate when similar questions arise in the future.

The semantic mixed retrieval and reordering method for intelligent psychological consultation of the middle and primary school students provided by the invention has a plurality of key innovation points and superiorities in the technology:

1. Word vector semantic rearrangement, namely, word vector representation and cosine similarity calculation are introduced, and semantic relevance reordering is carried out on candidate answers obtained through preliminary retrieval. Compared with the traditional matching mode which only depends on keywords, the vectorization rearrangement can identify the problems of synonymous expression and semantic approximation, obviously improves the matching accuracy and ensures that the search results are ranked according to semantic relevance instead of literal similarity.

2. The mixed question and answer combined with the search and generation adopts a mixed search-generation architecture (RAG is introduced, RETRIEVAL-Augmented Generation) to organically integrate knowledge base search and large language model generation. When the knowledge base has answers, the retrieval result is used for guiding the answers of the generation model, and when the knowledge base lacks direct answers, the generation model plays creative generation replies. The mixing strategy has the advantages of ensuring that the answer is based on knowledge base fact basis, providing richer and natural expression by using the generated AI, and realizing the balance of retrieval accuracy and generation flexibility.

3. The semantic guided Prompt mechanism designs a semantic guided Prompt strategy when interacting with a large language model. By adding the context information such as the filtered related questions and answers to the prompts, the model is guided to answer by referring to the semantic related contents, and the model is prevented from being generated in disorder from the existing knowledge. The mechanism is equal to that a bridge is built between the model and the knowledge base, so that the generated result is constrained and guided by the semantics of the knowledge base, and the relevance and the reliability of answers are improved. Compared with the mode of directly enabling the model to be freely generated, the semantic prompt mode can effectively reduce content deviation, ensure answer questions and accord with psychological consultation situations.

4. Domestic large language model. The invention preferably integrates a domestic large language model as the generation engine, such as a Chinese pre-trained model developed by institutions such as Baidu and iFlytek. Compared with relying on overseas models, this scheme has advantages in data security and compliance: it can ensure that sensitive student question data does not leave the country and that relevant regulatory requirements are met. In addition, domestic models are optimized for Chinese, so they better understand the wording, tone, and habits of adolescent counseling dialogue in a Chinese context, and their phrasing fits the local culture more closely. The large model can be deployed locally and given specialized training and fine-tuning as required, so that a customized psychological counseling answering service can be provided efficiently.

5. Dynamic expansion of the knowledge base. Support for continuous updating and iteration of the FAQ knowledge base is another major feature of the invention. High-quality question-and-answer pairs newly produced during operation are merged into the knowledge base promptly, so that its content is continuously enriched. This dynamic expansion mechanism means that the knowledge coverage of the platform grows with use, avoiding the rigid and outdated knowledge bases of conventional systems. Over time, the range of questions the platform can answer and the accuracy of its answers gradually improve, and newly raised questions can be quickly recorded and answered, forming a virtuous self-optimization loop.
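The hybrid strategy of point 2 can be illustrated with a minimal sketch of how a system might decide between a knowledge-base-guided answer and a freely generated one. The source does not say how "lacks a direct answer" is detected; the similarity cut-off, the function name `choose_answer_mode`, and the mode labels below are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical cut-off: the source does not give a numeric threshold.
GROUNDING_THRESHOLD = 0.75


def choose_answer_mode(
    ranked: List[Tuple[str, float]],
    threshold: float = GROUNDING_THRESHOLD,
) -> str:
    """Return "grounded" when the best candidate is similar enough to the
    user question, otherwise "generative".

    `ranked` holds (candidate_question, similarity) pairs sorted from
    high to low similarity, as produced by a re-ranking step.
    """
    if ranked and ranked[0][1] >= threshold:
        return "grounded"    # guide the model with retrieved FAQ answers
    return "generative"      # no direct answer: let the model reply on its own


print(choose_answer_mode([("How do I cope with exam anxiety?", 0.91)]))  # grounded
print(choose_answer_mode([("How do I cope with exam anxiety?", 0.40)]))  # generative
print(choose_answer_mode([]))                                            # generative
```

A single threshold on the top similarity score is only one possible rule; a deployed system could equally use the number of candidates above the threshold or a separate classifier.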

The present embodiment also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory, wherein the processor implements the above-described method when executing the computer program.

The present embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described method.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present invention and does not limit the invention to this form. Any person skilled in the art may use the disclosed technical content to make modifications or alterations that yield equivalent embodiments. However, any simple modification, equivalent change, or alteration made to the above embodiments according to the technical substance of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims

1. A semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students, characterized by comprising the following steps:

S1. Receive Chinese psychological counseling questions input by users, pre-process them, extract a set of keywords that can represent the core semantics of the questions, and convert them into standardized Chinese semantic representations;

S2. Based on a hybrid retrieval technology including keyword matching and semantic indexing, a candidate question and answer set related to the user's question is retrieved from the FAQ knowledge base;

S3. Embed the user question and each candidate question and answer into word vectors, calculate the similarity between the user question vector and each candidate question vector, and re-rank the candidate questions from high to low based on the similarity score to select the top K candidate questions;

S4. Input the top K candidate questions and their corresponding answers into the pre-trained large language model to generate optimized natural language answers;

S5. Check the semantic integrity and context consistency of the generated answer and output the final answer.

2. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1 is characterized in that, in step S1, the method for pre-processing the Chinese psychological counseling questions input by the user is:

101) Segment the Chinese psychological counseling questions input by the user and tag all the segmented words using part-of-speech tagging technology; then extract keywords including nouns and verbs, filter out stop words, and obtain a set of keywords that can represent the core semantics of the question;

102) Synonym expansion is performed on the extracted keywords through a predefined synonym dictionary, and the keywords are mapped into standard vocabulary expressions, thereby converting the user's natural language questions into standardized Chinese semantic representations.
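Steps 101) and 102) can be sketched as follows. This is a minimal illustration, not the patented implementation: word segmentation and part-of-speech tagging are assumed to have been done already by an external tool, so the input is a list of (word, tag) pairs. The tag convention ("n" for nouns, "v" for verbs), the stop-word list, and the synonym dictionary are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical resources; the source does not list their contents.
STOP_WORDS: Set[str] = {"的", "了", "是", "有"}
SYNONYMS: Dict[str, str] = {"害怕": "焦虑", "紧张": "焦虑", "测验": "考试"}


def extract_keywords(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep nouns and verbs, drop stop words, map synonyms to a standard term.

    `tagged` is a sequence of (word, pos_tag) pairs. Tags starting with
    "n" (noun) or "v" (verb) are kept. Order is preserved and duplicates
    created by the synonym mapping are removed.
    """
    keywords: List[str] = []
    for word, tag in tagged:
        if not tag.startswith(("n", "v")):
            continue                      # not a noun or verb
        if word in STOP_WORDS:
            continue                      # filtered stop word
        standard = SYNONYMS.get(word, word)
        if standard not in keywords:
            keywords.append(standard)
    return keywords


# Pre-tagged toy input; "是" is a verb but is removed as a stop word.
tagged_question = [("我", "r"), ("害怕", "v"), ("测验", "n"), ("是", "v")]
print(extract_keywords(tagged_question))  # ['焦虑', '考试']
```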

3. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1 is characterized in that, in step S2, the method for retrieving a candidate question and answer set related to the user's question from the FAQ knowledge base is:

201) Perform preliminary matching retrieval in the FAQ knowledge base based on keyword matching, retrieve candidate questions with a relevance to the user question higher than a set value from the FAQ knowledge base, and obtain a preliminary candidate question and answer set;

202) Use the pre-trained word vector model to convert the user question into a vector, and retrieve several candidate questions and corresponding candidate answers in the vector space whose semantic distance to the vector is less than a set value based on vector similarity;

203) Combining the retrieval results obtained in step 201) and step 202) to obtain a mixed candidate question and answer set.
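Steps 201) to 203) can be sketched with toy data. The claim does not fix the relevance measure, so Jaccard overlap of keyword sets stands in for keyword matching and cosine distance stands in for semantic distance; both choices, the threshold values, the FAQ layout, and all function names are assumptions for illustration.

```python
import math
from typing import List, Mapping, Sequence, Set


def keyword_relevance(query: Set[str], candidate: Set[str]) -> float:
    """Jaccard overlap between keyword sets (an assumed relevance measure)."""
    union = query | candidate
    return len(query & candidate) / len(union) if union else 0.0


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def hybrid_retrieve(
    query_keywords: Set[str],
    query_vector: Sequence[float],
    faq: Mapping[str, dict],
    min_relevance: float = 0.3,   # hypothetical "set value" for step 201)
    max_distance: float = 0.2,    # hypothetical "set value" for step 202)
) -> List[str]:
    """Return FAQ ids found by keyword matching or by vector search."""
    keyword_hits = [
        fid for fid, e in faq.items()
        if keyword_relevance(query_keywords, e["keywords"]) > min_relevance
    ]
    vector_hits = [
        fid for fid, e in faq.items()
        if cosine_distance(query_vector, e["vector"]) < max_distance
    ]
    # Step 203): merge both result lists, dropping duplicates, keeping order.
    return list(dict.fromkeys(keyword_hits + vector_hits))


faq = {
    "q1": {"keywords": {"考试", "焦虑"}, "vector": [0.9, 0.1]},
    "q2": {"keywords": {"睡眠"}, "vector": [0.8, 0.3]},
    "q3": {"keywords": {"朋友", "吵架"}, "vector": [0.0, 1.0]},
}
print(hybrid_retrieve({"考试", "焦虑"}, [1.0, 0.2], faq))  # ['q1', 'q2']
```

In this toy run, `q2` shares no keyword with the query and is found only through the vector search, which is the case the hybrid design is meant to cover.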

4. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1 is characterized in that the specific implementation method of step S3 is:

301) Perform word vector embedding on the user question and each candidate question in the candidate question and answer set to obtain the corresponding vectorized semantic representation, i.e., the user question vector and the candidate question vector;

302) Calculate the cosine similarity between the user question vector and each candidate question vector to measure the degree of semantic matching between the user question and each candidate question;

303) Re-rank all candidate questions based on their similarity scores from high to low, where higher scores indicate stronger semantic relevance;

304) Select the K candidate questions with the highest similarity scores, i.e., the top K candidate questions, as the candidate questions most likely to match the user's needs.
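Steps 302) to 304) amount to scoring each candidate by cosine similarity, sorting from high to low, and keeping the first K. A minimal sketch follows; the vectors are assumed to have been produced already by the embedding of step 301), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_top_k(
    query_vector: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every (question, vector) candidate and return the K best."""
    scored = [(q, cosine_similarity(query_vector, v)) for q, v in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # high to low
    return scored[:k]


candidates = [("a", [0.0, 1.0]), ("b", [1.0, 0.0]), ("c", [1.0, 1.0])]
top = rerank_top_k([1.0, 0.0], candidates, k=2)
print([question for question, _ in top])  # ['b', 'c']
```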

5. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1 is characterized in that, in step S4, the top K candidate questions and their corresponding answers are input into a pre-trained large language model, and the generation process is constrained by a semantically guided prompt template to generate optimized natural language answers; in the semantically guided prompt template, the large language model is constrained to answer based on the user question and the standard answer text corresponding to the top K candidate questions, thereby guiding the large language model to use the existing knowledge base content to generate more accurate and detailed responses; the large language model is deployed on a local server for training and fine-tuning as needed.
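A semantically guided prompt template of the kind described in claim 5 might be assembled as in the sketch below. The wording of the template and the names `PROMPT_TEMPLATE` and `build_prompt` are assumptions; the source specifies only that the model is constrained to answer from the user question and the standard answers of the top K candidates. The call to the language model itself is left out because the source names no specific model API.

```python
from typing import List, Tuple

# Hypothetical template text; the source does not disclose the actual wording.
PROMPT_TEMPLATE = (
    "You are a counseling assistant for primary and secondary school students.\n"
    "Answer the student's question using only the reference Q&A below.\n\n"
    "Reference Q&A:\n{references}\n\n"
    "Student question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, top_k: List[Tuple[str, str]]) -> str:
    """Fill the template with the user question and the top-K (question, answer) pairs."""
    references = "\n".join(
        f"{i}. Q: {q}\n   A: {a}" for i, (q, a) in enumerate(top_k, start=1)
    )
    return PROMPT_TEMPLATE.format(references=references, question=question)


prompt = build_prompt(
    "I cannot sleep before exams. What should I do?",
    [
        ("How do I cope with exam anxiety?", "Try slow breathing and a fixed routine."),
        ("Why can't I fall asleep?", "Avoid screens for an hour before bed."),
    ],
)
print(prompt)
```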

6. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 1 is characterized in that, in step S5, the generated answer is submitted to the manual review module for content verification, and the final answer is sent to the user after the review is passed, and the new question and answer pairs that have passed the review are added to the FAQ knowledge base in real time to complete the dynamic update of the knowledge base.

7. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 6, characterized in that the review rules of the manual review module include:

A1) Score the emotional soothingness and professional accuracy of the generated answers;

A2) If the score is below the preset threshold, the psychological counselor will manually revise the answer;

A3) The revised answer is stored in conjunction with the user's question and added to the FAQ knowledge base as a new entry.
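The review rules A1) to A3) can be sketched as a small routing function. The scores are assumed to be supplied by the review module; how the two scores combine into "the score" is not stated in the source, so taking the lower of the two is an assumption, as are the threshold value, the score scale, and the `revise` callback that stands in for the counselor's manual edit.

```python
from typing import Callable, Dict, List, Tuple

REVIEW_THRESHOLD = 3.0  # hypothetical preset threshold on an assumed 0-5 scale


def review_answer(
    question: str,
    answer: str,
    scores: Dict[str, float],
    revise: Callable[[str, str], str],
    knowledge_base: List[Tuple[str, str]],
    threshold: float = REVIEW_THRESHOLD,
) -> str:
    """Return the answer to send; revise and store it if it scores too low.

    `scores` holds the A1) ratings under the keys "soothing" and "accuracy".
    """
    if min(scores["soothing"], scores["accuracy"]) < threshold:
        answer = revise(question, answer)          # A2) counselor revision
        knowledge_base.append((question, answer))  # A3) store as a new entry
    return answer


kb: List[Tuple[str, str]] = []
final = review_answer(
    "I feel lonely at school.",
    "Just ignore it.",
    {"soothing": 1.5, "accuracy": 3.5},
    revise=lambda q, a: "Feeling lonely is common; let's talk about what helps.",
    knowledge_base=kb,
)
print(final)
print(len(kb))  # 1
```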

8. The semantic hybrid retrieval and re-ranking method for intelligent psychological counseling for primary and secondary school students according to claim 6, characterized in that the update mechanism of the FAQ knowledge base includes:

B1) Perform deduplication verification on newly added question and answer pairs to avoid duplicate entries in the database;

B2) Periodically remove ineffective question-and-answer pairs from the knowledge base, with the removal rules based on historical answer adoption rates and manual scoring.
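The update mechanism of B1) and B2) can be sketched as follows. Deduplication here compares whitespace- and case-normalized question text, and an entry is removed only when both its adoption rate and its manual score fall below a threshold; the source names the two signals but not how they are combined, so that rule, the thresholds, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaqEntry:
    question: str
    answer: str
    adoption_rate: float  # share of times the answer was adopted, 0.0 to 1.0
    manual_score: float   # reviewer score on an assumed 0-5 scale


def normalize(question: str) -> str:
    """Lower-case the text and strip all whitespace for duplicate checks."""
    return "".join(question.lower().split())


def add_if_new(kb: List[FaqEntry], entry: FaqEntry) -> bool:
    """B1) Append `entry` unless an equivalent question already exists."""
    seen = {normalize(e.question) for e in kb}
    if normalize(entry.question) in seen:
        return False
    kb.append(entry)
    return True


def prune(
    kb: List[FaqEntry],
    min_adoption: float = 0.2,  # hypothetical threshold
    min_score: float = 3.0,     # hypothetical threshold
) -> List[FaqEntry]:
    """B2) Keep entries that pass at least one of the two quality signals."""
    return [
        e for e in kb
        if e.adoption_rate >= min_adoption or e.manual_score >= min_score
    ]


kb: List[FaqEntry] = []
print(add_if_new(kb, FaqEntry("How to sleep better?", "Keep a routine.", 0.6, 4.0)))  # True
print(add_if_new(kb, FaqEntry("how to  sleep better?", "Duplicate.", 0.1, 1.0)))      # False
print(add_if_new(kb, FaqEntry("Why am I sad?", "Vague reply.", 0.05, 2.0)))           # True
print([e.question for e in prune(kb)])  # ['How to sleep better?']
```

Exact-text matching will miss paraphrased duplicates; a vector-similarity check like the one used for re-ranking could serve the same purpose with a stricter threshold.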

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory, wherein the processor implements the method according to any one of claims 1 to 8 when executing the computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored therein, wherein when the computer program is executed by a processor, the method according to any one of claims 1 to 8 is implemented.