CN113780009A - Information generation method, apparatus, electronic device and computer readable medium - Google Patents


Info

Publication number
CN113780009A
Authority
CN
China
Prior art keywords
sequence
vector sequence
picture
vector
information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110130566.6A
Other languages
Chinese (zh)
Inventor
赵楠
吴友政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110130566.6A
Publication of CN113780009A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/30 Semantic analysis
              • G06F40/35 Discourse or dialogue representation
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/25 Fusion techniques
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract



Embodiments of the present disclosure disclose an information generation method, an apparatus, an electronic device, and a computer-readable medium. A specific implementation of the method includes the following steps.

- Acquire a dialogue information sequence in a dialogue scene. The dialogue information in the sequence includes pictures and sentences.
- Perform picture feature extraction on each picture in the dialogue information sequence to obtain a picture vector sequence.
- Perform sentence feature extraction on each sentence in the dialogue information sequence to obtain a sentence vector sequence.
- Generate a response information feedback result based on the picture vector sequence and the sentence vector sequence.

This implementation takes into account the pictures that the user inputs. That makes replies to the user's sentences more accurate, which improves the user experience and reduces the loss of user traffic.


Description

Information generation method and device, electronic equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to an information generation method, an information generation device, an electronic device, and a computer-readable medium.
Background
With the rapid development of online shopping platforms, dialogue systems are widely used in human-machine dialogue scenes. At present, a dialogue system usually replies only to the sentences that the user inputs.
However, this approach generally has the following technical problem: it ignores other information the user inputs, such as pictures. As a result, the system cannot reply accurately, the user experience is poor, and user traffic is lost.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose information generation methods, apparatuses, electronic devices, and computer readable media to solve one or more of the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide an information generation method, including:
- acquiring a dialogue information sequence in a dialogue scene, where the dialogue information in the dialogue information sequence includes pictures and sentences;
- performing picture feature extraction on each picture included in the dialogue information sequence to obtain a picture vector sequence;
- performing sentence feature extraction on each sentence included in the dialogue information sequence to obtain a sentence vector sequence; and
- generating a response information feedback result based on the picture vector sequence and the sentence vector sequence.

Optionally, generating the response information feedback result based on the picture vector sequence and the sentence vector sequence includes:
- determining the position of each piece of dialogue information in the dialogue information sequence to obtain a position sequence;
- performing position feature conversion on each position in the position sequence to obtain a position vector sequence; and
- generating the response information feedback result based on the picture vector sequence, the sentence vector sequence, and the position vector sequence.

Optionally, generating the response information feedback result based on the picture vector sequence, the sentence vector sequence, and the position vector sequence includes:
- determining the role corresponding to each piece of dialogue information in the dialogue information sequence to obtain a role set for the dialogue information sequence;
- performing role feature conversion on each role in the role set to obtain a role vector sequence; and
- generating the response information feedback result based on the picture vector sequence, the sentence vector sequence, the position vector sequence, and the role vector sequence.

Optionally, generating the response information feedback result based on the picture vector sequence, the sentence vector sequence, the position vector sequence, and the role vector sequence includes:
- fusing each picture vector in the picture vector sequence with the sentence vector, position vector, and role vector that correspond to it, to generate a fusion vector and thereby obtain a fusion vector sequence; and
- inputting the fusion vector sequence into a pre-trained response text feedback model to generate the response information feedback result.

Optionally, the response text feedback model includes an attention-encoding neural network and an attention-decoding neural network.

Optionally, inputting the fusion vector sequence into the pre-trained response text feedback model to generate the response information feedback result includes:
- inputting the fusion vector sequence into the attention-encoding neural network to obtain a multi-modal scene vector sequence; and
- inputting the multi-modal scene vector sequence into the attention-decoding neural network to obtain the response information feedback result.

Optionally, performing picture feature extraction on each picture included in the dialogue information sequence includes inputting each picture into a pre-trained picture feature extraction network to generate a picture vector, thereby obtaining the picture vector sequence.

Optionally, performing sentence feature extraction on each sentence included in the dialogue information sequence includes pooling each sentence to generate a sentence vector, thereby obtaining the sentence vector sequence.
In a second aspect, some embodiments of the present disclosure provide an information generation apparatus, including:
- an acquisition unit configured to acquire a dialogue information sequence in a dialogue scene, where the dialogue information in the dialogue information sequence includes pictures and sentences;
- a picture feature extraction unit configured to perform picture feature extraction on each picture included in the dialogue information sequence to obtain a picture vector sequence;
- a sentence feature extraction unit configured to perform sentence feature extraction on each sentence included in the dialogue information sequence to obtain a sentence vector sequence; and
- a generation unit configured to generate a response information feedback result based on the picture vector sequence and the sentence vector sequence.

Optionally, the generation unit is further configured to:
- determine the position of each piece of dialogue information in the dialogue information sequence to obtain a position sequence;
- perform position feature conversion on each position in the position sequence to obtain a position vector sequence; and
- generate the response information feedback result based on the picture vector sequence, the sentence vector sequence, and the position vector sequence.

Optionally, the generation unit is further configured to:
- determine the role corresponding to each piece of dialogue information in the dialogue information sequence to obtain a role set for the dialogue information sequence;
- perform role feature conversion on each role in the role set to obtain a role vector sequence; and
- generate the response information feedback result based on the picture vector sequence, the sentence vector sequence, the position vector sequence, and the role vector sequence.

Optionally, the generation unit is further configured to:
- fuse each picture vector in the picture vector sequence with the sentence vector, position vector, and role vector that correspond to it, to generate a fusion vector and thereby obtain a fusion vector sequence; and
- input the fusion vector sequence into a pre-trained response text feedback model to generate the response information feedback result.

Optionally, the response text feedback model includes an attention-encoding neural network and an attention-decoding neural network.

Optionally, the generation unit is further configured to:
- input the fusion vector sequence into the attention-encoding neural network to obtain a multi-modal scene vector sequence; and
- input the multi-modal scene vector sequence into the attention-decoding neural network to obtain the response information feedback result.

Optionally, the picture feature extraction unit is further configured to input each picture into a pre-trained picture feature extraction network to obtain the picture vector sequence.

Optionally, the sentence feature extraction unit is further configured to pool each sentence to generate a sentence vector, thereby obtaining the sentence vector sequence.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following beneficial effect: the information generation method of some embodiments reduces the loss of user traffic. User traffic is lost because other information that the user inputs is not considered. The system therefore cannot reply accurately to the user's input, and the user experience is poor.

To address this, the information generation method of some embodiments proceeds as follows:
1. It first acquires a dialogue information sequence in a dialogue scene, which provides data support for subsequently generating the text feedback result.
2. It then performs picture feature extraction on each picture included in the dialogue information sequence to obtain a picture vector sequence. This lets the method take the user's pictures into account and provides data support for more accurate text feedback results.
3. Next, it performs sentence feature extraction on each sentence included in the dialogue information sequence to obtain a sentence vector sequence.
4. Finally, it generates a response information feedback result based on the picture vector sequence and the sentence vector sequence.

Because the user's pictures are considered, replies to the user's sentences are more accurate. This improves the user experience and reduces the loss of user traffic.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIGS. 1-2 are schematic diagrams of an application scenario of the information generation method of some embodiments of the present disclosure;
FIG. 3 is a flow diagram of some embodiments of an information generation method according to the present disclosure;
FIG. 4 is a flow diagram of further embodiments of an information generation method according to the present disclosure;
FIG. 5 is a schematic block diagram of some embodiments of an information generating apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should also be noted that, for ease of description, the drawings show only the portions related to the relevant invention. The embodiments of the present disclosure and the features of those embodiments may be combined with each other provided that they do not conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "one" and "a plurality of" used in the present disclosure are illustrative rather than limiting. Unless the context clearly indicates otherwise, those skilled in the art should read them as "one or more".
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1-2 are schematic diagrams of an application scenario of an information generation method according to some embodiments of the present disclosure.
In the application scenario of FIGS. 1-2, the computing device 101 may first acquire a dialogue information sequence 102 in a dialogue scene. As shown in FIG. 2, the dialogue information in the dialogue information sequence 102 includes pictures and sentences. Here, the dialogue information sequence may refer to the sequence of dialogue messages exchanged between a user and a human-machine customer service. For example, the dialogue information sequence 102 may be "[user: XXxxxxxXX, FIG. 1.png]; [human-machine customer service: XXXYYXXXXX, FIG. 2.png]; [user: xyyxyxyxxxyxy, FIG. 3.png]".

Next, the computing device 101 may perform picture feature extraction on each picture included in the dialogue information sequence 102 to obtain a picture vector sequence 103. For example, the picture features may be extracted by a residual neural network. The computing device 101 may then perform sentence feature extraction on each sentence included in the dialogue information sequence 102 to obtain a sentence vector sequence 104. For example, the sentence features may be extracted by a language representation model.

Finally, the computing device 101 may generate a response information feedback result 105 based on the picture vector sequence 103 and the sentence vector sequence 104. In practice, the picture vector sequence 103 and the sentence vector sequence 104 may be input into a text generation model (e.g., an attention neural network model) to generate the response information feedback result 105.
The computing device 101 may be hardware or software. When it is hardware, it may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or a single terminal device. When it is software, it may be installed in the hardware devices listed above. It may be implemented, for example, as multiple pieces of software or software modules that provide distributed services, or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.
With continued reference to fig. 3, a flow 300 of some embodiments of an information generation method according to the present disclosure is shown. The information generation method comprises the following steps:
Step 301, acquiring a dialogue information sequence in a dialogue scene.
In some embodiments, the execution subject of the information generation method (e.g., the computing device 101 shown in FIG. 1) may acquire the dialogue information sequence in the dialogue scene from a terminal device over a wired or wireless connection. The dialogue information in the dialogue information sequence includes pictures and sentences. Here, the dialogue information sequence may refer to the sequence of dialogue messages exchanged between a user and a human-machine customer service. For example, the dialogue information sequence may be "[The screen shows XX and cannot be used normally. It looks like this, FIG. 1.png]; [Does it look like this?, FIG. 2.png]; [Yes, FIG. 3.png]".
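The sketch below shows one way to hold the dialogue information sequence from the example above in memory. It assumes each piece of dialogue information is a (role, sentence, picture) triple. The `DialogInfo` class, its field names, the role labels, and the shortened file names are all hypothetical; the example in the text does not say who speaks each turn.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DialogInfo:
    """One turn of the dialogue: who said it, what was said, and an optional picture."""
    role: str               # hypothetical labels: "user" or "customer_service"
    sentence: str           # text of the turn
    picture: Optional[str]  # path of the attached picture, if any (hypothetical names)


def build_example_sequence() -> List[DialogInfo]:
    # Mirrors the three-turn example in the text; the speaker of each turn is assumed.
    return [
        DialogInfo("user", "The screen shows XX and cannot be used normally. It looks like this.", "1.png"),
        DialogInfo("customer_service", "Does it look like this?", "2.png"),
        DialogInfo("user", "Yes.", "3.png"),
    ]


if __name__ == "__main__":
    for turn in build_example_sequence():
        print(turn.role, "|", turn.sentence, "|", turn.picture)
```

Storing the role alongside each turn pays off later in the flow. The position (index) and role of each turn, both used in FIG. 4, can then be read directly from this list.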
Step 302, performing picture feature extraction processing on each picture included in the dialog information sequence to obtain a picture vector sequence.
In some embodiments, the execution subject may input each picture included in the dialogue information sequence into a pre-trained image extraction neural network model to obtain a picture vector sequence. Here, the image extraction neural network model may include, but is not limited to, at least one of the following: VGG, ResNet, GoogLeNet, or MobileNet.
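Backbones such as ResNet typically end in a global average pooling layer. That layer turns a C×H×W convolutional feature map into a C-dimensional vector. The sketch below shows only this final pooling step, assuming the pre-trained backbone has already produced a feature map for each picture. The function names and the toy feature maps are hypothetical; a real system would run the pre-trained model to obtain them.

```python
from typing import List

# A feature map indexed as [channel][row][col], as a CNN backbone would output.
FeatureMap = List[List[List[float]]]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over its spatial grid, giving one value per channel."""
    vector = []
    for channel in feature_map:
        cells = [value for row in channel for value in row]
        vector.append(sum(cells) / len(cells))
    return vector


def picture_vector_sequence(feature_maps: List[FeatureMap]) -> List[List[float]]:
    # One picture vector per picture, kept in dialogue order.
    return [global_average_pool(fm) for fm in feature_maps]


if __name__ == "__main__":
    # Two toy 2-channel, 2x2 feature maps standing in for backbone outputs.
    maps = [
        [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 4.0]]],
        [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 2.0], [3.0, 4.0]]],
    ]
    print(picture_vector_sequence(maps))  # -> [[4.0, 1.0], [2.0, 2.5]]
```

Every picture yields a vector of the same length (the channel count), whatever the image size. This is what allows picture vectors to be combined with other per-turn vectors later.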
Step 303, performing sentence feature extraction on each sentence included in the dialogue information sequence to obtain a sentence vector sequence.
In some embodiments, the execution subject may input each sentence included in the dialogue information sequence into a pre-trained sentence feature extraction neural network model to obtain a sentence vector sequence. Here, the sentence feature extraction neural network model may be a recurrent neural network (RNN) model.
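A recurrent network reads a sentence token by token, and its final hidden state can serve as the sentence vector. The following is a minimal Elman-RNN forward pass that makes three assumptions:
- the tokens are already embedded as vectors;
- the last hidden state is used as the sentence vector (the text does not say which state is used);
- the weights `W_x`, `W_h`, and `b` are hand-picked illustrative values, not trained parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_sentence_vector(token_vectors: List[Vector], W_x: Matrix, W_h: Matrix, b: Vector) -> Vector:
    """Run h_t = tanh(W_x x_t + W_h h_{t-1} + b) over the tokens and return the last h_t.

    Assumption: the final hidden state is used as the sentence vector.
    An empty sentence returns the zero initial state.
    """
    h = [0.0] * len(b)
    for x in token_vectors:
        pre = [a + c + d for a, c, d in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(p) for p in pre]
    return h


if __name__ == "__main__":
    W_x = [[0.5, -0.2], [0.1, 0.4]]  # hypothetical input weights (hidden=2, embed=2)
    W_h = [[0.3, 0.0], [0.0, 0.3]]   # hypothetical recurrent weights
    b = [0.0, 0.0]
    sentences = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]  # two pre-embedded sentences
    print([rnn_sentence_vector(s, W_x, W_h, b) for s in sentences])
```

Because of the `tanh`, every component of the resulting sentence vector lies in (-1, 1), whatever the sentence length.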
Step 304, generating a response information feedback result based on the picture vector sequence and the sentence vector sequence.
In some embodiments, the execution subject may first add each picture vector in the picture vector sequence to its corresponding sentence vector to generate an addition vector, thereby obtaining an addition vector sequence. The addition vectors may then be input, in order, into a pre-trained text feedback model to generate the response information feedback result. Here, the text feedback model may be BERT (Bidirectional Encoder Representations from Transformers, a language representation model).
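Element-wise addition only works if the picture vectors and sentence vectors have the same dimension. The text does not say how the dimensions are aligned, so in practice we assume a projection layer would map both into a common size first. The minimal sketch below covers only the pairing and addition step, with hypothetical function names.

```python
from typing import List

Vector = List[float]


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("picture and sentence vectors must have the same dimension")
    return [x + y for x, y in zip(a, b)]


def addition_vector_sequence(picture_vectors: List[Vector], sentence_vectors: List[Vector]) -> List[Vector]:
    """Pair the i-th picture vector with the i-th sentence vector and add them."""
    if len(picture_vectors) != len(sentence_vectors):
        raise ValueError("each dialogue turn needs one picture vector and one sentence vector")
    return [add_vectors(p, s) for p, s in zip(picture_vectors, sentence_vectors)]


if __name__ == "__main__":
    pictures = [[1.0, 2.0], [0.0, 1.0]]
    sentences = [[0.5, 0.5], [1.0, -1.0]]
    print(addition_vector_sequence(pictures, sentences))  # -> [[1.5, 2.5], [1.0, 0.0]]
```

The resulting sequence has one vector per dialogue turn. That is the shape a sequence model such as the text feedback model would consume.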
The above embodiments of the present disclosure have the following beneficial effect: the information generation method of some embodiments reduces the loss of user traffic. User traffic is lost because other information that the user inputs is not considered. The system therefore cannot reply accurately to the user's sentences, and the user experience is poor.

To address this, the information generation method of some embodiments proceeds as follows:
1. It first acquires a dialogue information sequence in a dialogue scene, which provides data support for subsequently generating the text feedback result.
2. It then performs picture feature extraction on each picture included in the dialogue information sequence to obtain a picture vector sequence. This lets the method take the user's pictures into account and provides data support for more accurate text feedback results.
3. Next, it performs sentence feature extraction on each sentence included in the dialogue information sequence to obtain a sentence vector sequence.
4. Finally, it generates a response information feedback result based on the picture vector sequence and the sentence vector sequence.

Because the user's pictures are considered, replies to the user's sentences are more accurate. This improves the user experience and reduces the loss of user traffic.
With further reference to fig. 4, a flow diagram of further embodiments of an information generation method according to the present disclosure is shown. The information generation method comprises the following steps:
Step 401, acquiring a dialogue information sequence in a dialogue scene.
In some embodiments, the specific implementation of step 401 and the technical effect brought by the implementation may refer to step 301 in those embodiments corresponding to fig. 3, which are not described herein again.
Step 402, inputting each picture into a pre-trained picture feature extraction network to obtain a picture vector sequence.
In some embodiments, the execution subject may input each picture into the image extraction network model to obtain the picture vector sequence. Here, the image extraction network model may be ResNet (residual neural network), VGG, GoogLeNet, or LeNet.
Step 403, pooling each sentence to generate a sentence vector, thereby obtaining a sentence vector sequence.
In some embodiments, the execution subject may first input each sentence into a pre-trained word embedding neural network model to obtain an initial sentence vector sequence for that sentence. Max pooling is then applied to each initial sentence vector sequence to generate a sentence vector, thereby obtaining the sentence vector sequence. Here, the sentence feature extraction neural network model may be an RNN (recurrent neural network).
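Max pooling keeps, for each embedding dimension, the largest value across the words of a sentence. As a result, sentences of different lengths map to vectors of the same size. The minimal sketch below makes two simplifications: a tiny hypothetical embedding table stands in for the pre-trained word embedding model, and sentences are tokenized by splitting on whitespace. Neither is specified in the text.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical 3-dimensional embeddings standing in for a pre-trained model.
EMBEDDINGS: Dict[str, Vector] = {
    "screen": [0.9, 0.1, 0.0],
    "is": [0.0, 0.2, 0.1],
    "broken": [0.3, 0.8, 0.5],
    "yes": [0.1, 0.0, 0.7],
}
UNKNOWN: Vector = [0.0, 0.0, 0.0]  # assumed fallback for out-of-vocabulary words


def initial_sentence_vectors(sentence: str) -> List[Vector]:
    """Embed each whitespace-separated word (the 'initial sentence vector sequence')."""
    return [EMBEDDINGS.get(word, UNKNOWN) for word in sentence.lower().split()]


def max_pool(vectors: List[Vector]) -> Vector:
    """Take the maximum of each dimension across all word vectors."""
    if not vectors:
        return list(UNKNOWN)  # empty sentence: fall back to a zero vector
    return [max(column) for column in zip(*vectors)]


def sentence_vector_sequence(sentences: List[str]) -> List[Vector]:
    return [max_pool(initial_sentence_vectors(s)) for s in sentences]


if __name__ == "__main__":
    print(sentence_vector_sequence(["Screen is broken", "Yes"]))
    # -> [[0.9, 0.8, 0.5], [0.1, 0.0, 0.7]]
```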
Step 404, determining the position of each piece of dialogue information in the dialogue information sequence to obtain a position sequence.
In some embodiments, the execution subject may determine the position of each piece of dialogue information in the dialogue information sequence to obtain the position sequence. Here, a position refers to the sequence number of the dialogue information within the dialogue information sequence.
Step 405, performing position feature conversion processing on each position included in the position sequence to obtain a position vector sequence.
In some embodiments, the execution subject may input each position in the position sequence into a position vector extraction neural network to obtain a position vector sequence. Here, the position vector extraction neural network may be an RNN (recurrent neural network) or BERT (Bidirectional Encoder Representations from Transformers, a language representation model).
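The text names an RNN or BERT for converting positions into vectors. A simpler and widely used alternative is the fixed sinusoidal position encoding from the Transformer architecture, shown below as a swapped-in technique rather than the method described here. The sketch also assumes 0-based sequence numbers; the text only says "sequence number". The function names are hypothetical.

```python
import math
from typing import List


def position_sequence(num_turns: int) -> List[int]:
    """Positions are the sequence numbers of the dialogue turns (0-based, an assumption)."""
    return list(range(num_turns))


def sinusoidal_position_vector(pos: int, dim: int) -> List[float]:
    """Transformer-style encoding: sin on even indices, cos on odd indices.

    Index pair (2k, 2k+1) uses the frequency 1 / 10000 ** (2k / dim).
    """
    vec = []
    for i in range(dim):
        angle = pos / (10000 ** ((2 * (i // 2)) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def position_vector_sequence(num_turns: int, dim: int) -> List[List[float]]:
    return [sinusoidal_position_vector(p, dim) for p in position_sequence(num_turns)]


if __name__ == "__main__":
    for vec in position_vector_sequence(3, 4):
        print([round(x, 4) for x in vec])
```

A fixed encoding needs no training. A learned network, as the text suggests, could instead adapt the position vectors to the dialogue data.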
Step 406, generating a response information feedback result based on the picture vector sequence, the sentence vector sequence, and the position vector sequence.
In some embodiments, the execution body may generate a response information feedback result based on the picture vector sequence, the sentence vector sequence, and the position vector sequence in various ways.
In some optional implementation manners of some embodiments, the execution subject may generate the response information feedback result by:
In the first step, the role corresponding to each piece of dialogue information in the dialogue information sequence is determined, yielding the role set for the dialogue information sequence. In practice, the role of a piece of dialogue information may refer to the party that output it. Here, that party may be the user or the human-machine customer service.
In the second step, role feature conversion is performed on each role in the role set to obtain a role vector sequence. In practice, the execution subject may input each role in the role set into a role vector conversion neural network to obtain the role vector sequence. Here, the role vector conversion neural network may be an RNN (recurrent neural network) or BERT (Bidirectional Encoder Representations from Transformers, a language representation model).
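The text suggests an RNN or BERT for role feature conversion. Because there are only two output parties, a learned lookup table with one vector per role is a simpler alternative, shown here as a sketch rather than the described network. The role labels and embedding values are hypothetical. The sketch keeps one role vector per dialogue turn, so that each role vector can later be fused with the picture vector of the same turn.

```python
from typing import Dict, List

# Hypothetical role embeddings; in a trained system these would be learned.
ROLE_EMBEDDINGS: Dict[str, List[float]] = {
    "user": [1.0, 0.0],
    "customer_service": [0.0, 1.0],
}


def role_set(turn_roles: List[str]) -> List[str]:
    """The distinct output parties that appear in the dialogue, sorted for determinism."""
    return sorted(set(turn_roles))


def role_vector_sequence(turn_roles: List[str]) -> List[List[float]]:
    """Look up one role vector per dialogue turn (copies, so callers cannot mutate the table)."""
    return [list(ROLE_EMBEDDINGS[role]) for role in turn_roles]


if __name__ == "__main__":
    roles = ["user", "customer_service", "user"]
    print(role_set(roles))              # -> ['customer_service', 'user']
    print(role_vector_sequence(roles))  # -> [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
```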
In the third step, the response information feedback result is generated based on the picture vector sequence, the sentence vector sequence, the position vector sequence, and the role vector sequence.
In some optional implementations of some embodiments, the third step may include the following sub-steps:
In the first sub-step, each picture vector in the picture vector sequence is fused with the sentence vector, position vector, and role vector that correspond to it, to generate a fusion vector and thereby obtain a fusion vector sequence. Here, fusion may refer to addition.
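Since fusion may be plain addition, each dialogue turn's four vectors can be summed element-wise. The sketch below assumes all four vectors have already been brought to one common dimension. The function names are hypothetical.

```python
from typing import List

Vector = List[float]


def fuse(picture: Vector, sentence: Vector, position: Vector, role: Vector) -> Vector:
    """Fuse one turn's picture, sentence, position, and role vectors by element-wise addition."""
    if not (len(picture) == len(sentence) == len(position) == len(role)):
        raise ValueError("all four vectors must share one dimension")
    return [p + s + q + r for p, s, q, r in zip(picture, sentence, position, role)]


def fusion_vector_sequence(
    pictures: List[Vector], sentences: List[Vector], positions: List[Vector], roles: List[Vector]
) -> List[Vector]:
    """Fuse the vectors of each dialogue turn, keeping the turns in order."""
    if not (len(pictures) == len(sentences) == len(positions) == len(roles)):
        raise ValueError("each sequence needs one vector per dialogue turn")
    return [fuse(*parts) for parts in zip(pictures, sentences, positions, roles)]


if __name__ == "__main__":
    print(fusion_vector_sequence([[1.0, 0.0]], [[0.5, 0.5]], [[0.0, 1.0]], [[1.0, 0.0]]))
    # -> [[2.5, 1.5]]
```

Addition keeps the fused vector at the original dimension. Concatenation would be the common alternative, at the cost of a four-times-larger input to the encoder.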
In the second sub-step, the fusion vector sequence is input into a pre-trained response text feedback model to generate the response information feedback result. Here, the response text feedback model may include an attention-encoding neural network and an attention-decoding neural network.
In practice, the execution subject may input the fusion vector sequence into the attention-encoding neural network to obtain a multi-modal scene vector sequence. It may then input the multi-modal scene vector sequence into the attention-decoding neural network to obtain the response information feedback result.
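A full attention-encoding network normally has learned query/key/value projections, multiple heads, and feed-forward layers. The sketch below keeps only the core operation: single-head scaled dot-product self-attention, with the projections omitted for brevity (an assumption). It shows how each fused turn vector becomes a context-aware multi-modal scene vector. The decoder is not shown; it would attend over these vectors while generating the reply.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(fused: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with Q = K = V = the fused vectors.

    Each output is a softmax-weighted mix of all turns, so every position
    sees the whole multi-modal dialogue context.
    """
    if not fused:
        return []
    d = len(fused[0])
    outputs = []
    for q in fused:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in fused]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, fused)) for j in range(d)])
    return outputs


if __name__ == "__main__":
    scene = self_attention([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    for vec in scene:
        print([round(x, 4) for x in vec])
```

Every output row is a convex combination of the input rows. Each turn therefore keeps most of its own information while mixing in context from the rest of the dialogue.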
As can be seen from FIG. 4, the flow 400 of the embodiments corresponding to FIG. 4 goes further than the embodiments corresponding to FIG. 3. It fuses four dimensions of the dialogue information: sentence, picture, position, and role. A self-attention mechanism then fuses the multi-modal representations of these vectors into a multi-modal representation of the dialogue scene. Finally, a response text feedback model generates the response information feedback result for the current dialogue. This improves the accuracy of replies to the user's sentences, which improves the user experience and reduces the loss of user traffic.
With further reference to fig. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides some embodiments of an information generating apparatus, which correspond to those illustrated in fig. 3, and which may be particularly applied in various electronic devices.
As shown in fig. 5, the information generating apparatus 500 of some embodiments includes: an acquisition unit 501, a picture feature extraction unit 502, a sentence feature extraction unit 503, and a generation unit 504. The obtaining unit 501 is configured to obtain a dialog information sequence in a dialog scene, where dialog information in the dialog information sequence includes a picture and a sentence; the picture feature extraction unit 502 is configured to perform picture feature extraction processing on each picture included in the dialog information sequence to obtain a picture vector sequence; the sentence feature extraction unit 503 is configured to perform sentence feature extraction processing on each sentence included in the dialog information sequence to obtain a sentence vector sequence; the generating unit 504 is configured to generate a response information feedback result based on the picture vector sequence and the sentence vector sequence.
In some optional implementations of some embodiments, the generating unit 504 is further configured to: determine the position corresponding to each piece of dialog information in the dialog information sequence to obtain a position sequence; perform position feature conversion processing on each position included in the position sequence to obtain a position vector sequence; and generate a response information feedback result based on the picture vector sequence, the sentence vector sequence and the position vector sequence.
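The position steps above can be illustrated with a short sketch. The source does not specify how positions are defined or converted, so two assumptions are made here: the position of each piece of dialog information is its 0-based index in the sequence, and the position feature conversion is the sinusoidal encoding popularized by the Transformer architecture (a learned embedding table would be an equally plausible choice). The function names are hypothetical.

```python
import math
from typing import List


def position_sequence(num_turns: int) -> List[int]:
    """Assumed definition: the position of a turn is its 0-based index in the dialog."""
    return list(range(num_turns))


def sinusoidal_position_vector(pos: int, dim: int) -> List[float]:
    """Transformer-style sinusoidal encoding of a single position."""
    vec = []
    for i in range(dim):
        # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k/dim).
        angle = pos / (10000 ** (2 * (i // 2) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def position_vector_sequence(num_turns: int, dim: int = 8) -> List[List[float]]:
    """Convert every position in the position sequence into a position vector."""
    return [sinusoidal_position_vector(p, dim) for p in position_sequence(num_turns)]


print(position_vector_sequence(3, dim=4)[0])  # [0.0, 1.0, 0.0, 1.0]
```

A fixed encoding needs no training and extends to dialogs longer than any seen before, which is one reason it is a common default; whether the disclosed method uses it is not stated.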
In some optional implementations of some embodiments, the generating unit 504 is further configured to: determine the role corresponding to each piece of dialog information in the dialog information sequence to obtain a role set corresponding to the dialog information sequence; perform role feature conversion processing on each role in the role set to obtain a role vector sequence; and generate a response information feedback result based on the picture vector sequence, the sentence vector sequence, the position vector sequence and the role vector sequence.
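A sketch of the role steps, under stated assumptions: roles are plain strings such as `"user"` and `"agent"`, the role set is the set of distinct roles in the dialog, and the role feature conversion is a lookup into a small embedding table. The table is filled with seeded random values purely for illustration; in a trained system it would be learned. Each turn's role vector is looked up individually so that the role vector sequence stays aligned with the dialog information sequence, which the fusion step needs. All names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def build_role_table(roles: Sequence[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Hypothetical role embedding table: one vector per distinct role (the role set)."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    role_set = sorted(set(roles))
    return {role: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for role in role_set}


def role_vector_sequence(roles: Sequence[str], dim: int = 4) -> List[Vector]:
    """Look up each turn's role so the output aligns with the dialog sequence."""
    table = build_role_table(roles, dim)
    return [table[role] for role in roles]


turn_roles = ["user", "agent", "user"]
vectors = role_vector_sequence(turn_roles)
print(len(vectors), vectors[0] == vectors[2])  # 3 True
```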
In some optional implementations of some embodiments, the generating unit 504 is further configured to: fuse each picture vector in the picture vector sequence with the sentence vector, the position vector and the role vector corresponding to that picture vector to generate a fusion vector, thereby obtaining a fusion vector sequence; and input the fusion vector sequence into a pre-trained response text feedback model to generate a response information feedback result.
Optionally, the response text feedback model includes an attention encoding neural network and an attention decoding neural network.
In some optional implementations of some embodiments, the generating unit 504 is further configured to: input the fusion vector sequence into the attention encoding neural network to obtain a multi-modal scene vector sequence; and input the multi-modal scene vector sequence into the attention decoding neural network to obtain a response information feedback result.
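The fusion operation is not named in the description. A minimal sketch, assuming element-wise addition of equal-length vectors (the way BERT-style models combine token, position and segment embeddings); concatenation followed by a projection would be another reasonable reading. `fuse` and `fusion_vector_sequence` are hypothetical names.

```python
from typing import List, Sequence

Vector = List[float]


def fuse(picture: Vector, sentence: Vector, position: Vector, role: Vector) -> Vector:
    """Fuse one turn's four vectors by element-wise addition (an assumed choice)."""
    if not (len(picture) == len(sentence) == len(position) == len(role)):
        raise ValueError("element-wise fusion needs vectors of equal dimension")
    return [p + s + q + r for p, s, q, r in zip(picture, sentence, position, role)]


def fusion_vector_sequence(
    pictures: Sequence[Vector],
    sentences: Sequence[Vector],
    positions: Sequence[Vector],
    roles: Sequence[Vector],
) -> List[Vector]:
    """Fuse the i-th picture vector with the i-th sentence, position and role vectors."""
    if not (len(pictures) == len(sentences) == len(positions) == len(roles)):
        raise ValueError("all four sequences must have one entry per dialog turn")
    return [fuse(*parts) for parts in zip(pictures, sentences, positions, roles)]


print(fusion_vector_sequence([[1.0, 2.0]], [[0.5, 0.5]], [[0.0, 1.0]], [[0.25, 0.0]]))
# [[1.75, 3.5]]
```

Addition keeps the fusion vector at the same dimension as its inputs, so the attention encoder's width does not grow with the number of modalities.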
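The encoding step can be illustrated with single-head scaled dot-product self-attention. This is a deliberately simplified sketch: it uses the fusion vectors directly as queries, keys and values, whereas a trained attention encoding network would apply learned projections and typically stack several multi-head layers. The same `attend` function, called with a decoder state as the query and the scene vectors as keys and values, is the cross-attention a decoder would use; token-by-token text generation in the attention decoding network is not shown. All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    # Output is the attention-weighted average of the value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def encode_scene(fusion_vectors: List[Vector]) -> List[Vector]:
    """Self-attention: each fusion vector attends over the whole fusion vector sequence."""
    return [attend(x, fusion_vectors, fusion_vectors) for x in fusion_vectors]


scene = encode_scene([[1.0, 0.0], [0.0, 1.0]])  # multi-modal scene vector sequence
context = attend([0.5, 0.5], scene, scene)       # one decoder cross-attention step
```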
In some optional implementations of some embodiments, the picture feature extraction unit 502 is further configured to: input each picture included in the dialog information sequence into a pre-trained picture feature extraction network to obtain a picture vector sequence.
In some optional implementations of some embodiments, the sentence feature extraction unit 503 is further configured to: perform pooling processing on each sentence included in the dialog information sequence to generate a sentence vector, thereby obtaining a sentence vector sequence.
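The pre-trained picture feature extraction network is not identified in the source. The sketch below only shows the shape of the step, one fixed-length vector per picture, and uses global average pooling over pixels as a hypothetical stand-in for the network; in practice the `backbone` argument would be replaced by a real pre-trained model, for example a convolutional network with its classification head removed. Pictures are represented as nested H x W x C lists to keep the example dependency-free.

```python
from typing import Callable, List, Sequence

Image = List[List[List[float]]]  # H x W x C nested lists of pixel values
Vector = List[float]


def global_average_pool(image: Image) -> Vector:
    """Average each channel over all pixels; a stand-in for a real backbone."""
    pixels = [px for row in image for px in row]
    channels = len(pixels[0])
    return [sum(px[c] for px in pixels) / len(pixels) for c in range(channels)]


def picture_vector_sequence(
    pictures: Sequence[Image],
    backbone: Callable[[Image], Vector] = global_average_pool,
) -> List[Vector]:
    """Run every picture through the (hypothetical) feature extraction network."""
    return [backbone(picture) for picture in pictures]


img = [[[1.0, 0.0], [3.0, 2.0]]]  # 1 x 2 picture with 2 channels
print(picture_vector_sequence([img, img]))  # [[2.0, 1.0], [2.0, 1.0]]
```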
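A minimal sketch of the pooling step, assuming mean pooling over token vectors (max pooling would fit the description equally well). Whitespace tokenization and the small `embeddings` lookup table are hypothetical simplifications; a practical system would more likely pool the token-level outputs of a pre-trained text encoder.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Vector]) -> Vector:
    """Average token vectors dimension by dimension into one sentence vector."""
    dim = len(token_vectors[0])
    return [sum(t[j] for t in token_vectors) / len(token_vectors) for j in range(dim)]


def sentence_vector_sequence(
    sentences: Sequence[str], embeddings: Dict[str, Vector], unk: Vector
) -> List[Vector]:
    """Pool each sentence's (hypothetical) token embeddings into one sentence vector."""
    result = []
    for sentence in sentences:
        tokens = [embeddings.get(tok, unk) for tok in sentence.split()]
        result.append(mean_pool(tokens or [unk]))  # empty sentence -> unknown vector
    return result


emb = {"red": [1.0, 0.0], "shoes": [0.0, 1.0]}
print(sentence_vector_sequence(["red shoes", "red"], emb, unk=[0.0, 0.0]))
# [[0.5, 0.5], [1.0, 0.0]]
```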
It will be understood that the units described in the apparatus 500 correspond to the respective steps of the method described with reference to fig. 3. Thus, the operations, features and resulting advantages described above for the method also apply to the apparatus 500 and the units included therein, and are not repeated here.
Referring now to fig. 6, a block diagram of an electronic device 600 (e.g., the computing device 101 of fig. 1) suitable for implementing some embodiments of the present disclosure is shown. The electronic device in some embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g., car navigation terminals), and stationary terminals such as digital TVs and desktop computers. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be included in the electronic device, or it may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a dialog information sequence in a dialog scene, wherein the dialog information in the dialog information sequence includes pictures and sentences; perform picture feature extraction processing on each picture included in the dialog information sequence to obtain a picture vector sequence; perform sentence feature extraction processing on each sentence included in the dialog information sequence to obtain a sentence vector sequence; and generate a response information feedback result based on the picture vector sequence and the sentence vector sequence.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a picture feature extraction unit, a sentence feature extraction unit, and a generating unit. The names of these units do not, in some cases, limit the units themselves; for example, the generating unit may also be described as a "unit that generates a response information feedback result based on the picture vector sequence and the sentence vector sequence".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description covers only preferred embodiments of the present disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above features, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept described above, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (10)

1. An information generating method, comprising:
acquiring a dialog information sequence in a dialog scene, wherein the dialog information in the dialog information sequence comprises pictures and sentences;
performing picture feature extraction processing on each picture included in the dialog information sequence to obtain a picture vector sequence;
performing sentence feature extraction processing on each sentence included in the dialog information sequence to obtain a sentence vector sequence; and
generating a response information feedback result based on the picture vector sequence and the sentence vector sequence.
2. The method of claim 1, wherein the generating a response information feedback result based on the picture vector sequence and the sentence vector sequence comprises:
determining a position corresponding to each piece of dialog information in the dialog information sequence to obtain a position sequence;
performing position feature conversion processing on each position included in the position sequence to obtain a position vector sequence; and
generating a response information feedback result based on the picture vector sequence, the sentence vector sequence and the position vector sequence.
3. The method of claim 2, wherein the generating a response information feedback result based on the picture vector sequence, the sentence vector sequence, and the position vector sequence comprises:
determining a role corresponding to each piece of dialog information in the dialog information sequence to obtain a role set corresponding to the dialog information sequence;
performing role feature conversion processing on each role in the role set to obtain a role vector sequence; and
generating a response information feedback result based on the picture vector sequence, the sentence vector sequence, the position vector sequence and the role vector sequence.
4. The method of claim 3, wherein the generating a response information feedback result based on the picture vector sequence, the sentence vector sequence, the position vector sequence, and the role vector sequence comprises:
fusing each picture vector in the picture vector sequence with the sentence vector, the position vector and the role vector corresponding to the picture vector to generate a fusion vector, so as to obtain a fusion vector sequence; and
and inputting the fusion vector sequence into a pre-trained response text feedback model to generate a response information feedback result.
5. The method of claim 4, wherein the response text feedback model comprises an attention encoding neural network and an attention decoding neural network; and
the inputting the fusion vector sequence into a pre-trained response text feedback model to generate a response information feedback result comprises:
inputting the fusion vector sequence into the attention encoding neural network to obtain a multi-modal scene vector sequence; and
inputting the multi-modal scene vector sequence into the attention decoding neural network to obtain a response information feedback result.
6. The method of claim 1, wherein the performing picture feature extraction processing on each picture included in the dialog information sequence to obtain a picture vector sequence comprises:
inputting each picture into a pre-trained picture feature extraction network to obtain the picture vector sequence.
7. The method of claim 1, wherein the performing sentence feature extraction processing on each sentence included in the dialog information sequence to obtain a sentence vector sequence comprises:
performing pooling processing on each sentence to generate a sentence vector, so as to obtain the sentence vector sequence.
8. An information generating apparatus comprising:
an acquisition unit configured to acquire a dialog information sequence in a dialog scene, wherein dialog information in the dialog information sequence includes a picture and a sentence;
a picture feature extraction unit configured to perform picture feature extraction processing on each picture included in the dialog information sequence to obtain a picture vector sequence;
a sentence feature extraction unit configured to perform sentence feature extraction processing on each sentence included in the dialog information sequence to obtain a sentence vector sequence; and
a generating unit configured to generate a response information feedback result based on the picture vector sequence and the sentence vector sequence.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.
CN202110130566.6A 2021-01-29 2021-01-29 Information generation method, apparatus, electronic device and computer readable medium Pending CN113780009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110130566.6A CN113780009A (en) 2021-01-29 2021-01-29 Information generation method, apparatus, electronic device and computer readable medium

Publications (1)

Publication Number Publication Date
CN113780009A 2021-12-10

Family

ID=78835588

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101487A (en) * 2018-07-11 2018-12-28 广州杰赛科技股份有限公司 Conversational character differentiating method, device, terminal device and storage medium
CN109165285A (en) * 2018-08-24 2019-01-08 北京小米智能科技有限公司 Handle the method, apparatus and storage medium of multi-medium data
CN109359196A (en) * 2018-10-22 2019-02-19 北京百度网讯科技有限公司 Method and device for multimodal representation of text
CN110209897A (en) * 2018-02-12 2019-09-06 腾讯科技(深圳)有限公司 Intelligent dialogue method, apparatus, storage medium and equipment
CN110399474A (en) * 2019-07-18 2019-11-01 腾讯科技(深圳)有限公司 A kind of Intelligent dialogue method, apparatus, equipment and storage medium
CN111581958A (en) * 2020-05-27 2020-08-25 腾讯科技(深圳)有限公司 Dialogue state determination method, device, computer equipment and storage medium
CN111897933A (en) * 2020-07-27 2020-11-06 腾讯科技(深圳)有限公司 Emotional dialogue generation method and device and emotional dialogue model training method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114944219A (en) * 2022-05-17 2022-08-26 平安科技(深圳)有限公司 Psychological scale recommendation method, device and storage medium based on artificial intelligence
CN114944219B (en) * 2022-05-17 2025-05-23 平安科技(深圳)有限公司 Psychological scale recommendation method and device based on artificial intelligence and storage medium
WO2025020611A1 (en) * 2023-07-24 2025-01-30 京东科技控股股份有限公司 Session response method and apparatus, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN111008533B (en) Method, device, equipment and storage medium for obtaining translation model
CN115908640A (en) Method, device, readable medium and electronic device for generating image
CN114863214A (en) Image generation model training method, image generation device, image generation medium, and image generation device
CN115640815B (en) Translation method, device, readable medium and electronic device
CN111459364B (en) Icon updating method and device and electronic equipment
CN112380876A (en) Translation method, device, equipment and medium based on multi-language machine translation model
CN112364144A (en) Interaction method, device, equipment and computer readable medium
CN114170342A (en) Image processing method, device, device and storage medium
CN118965348B (en) Detection method, device, equipment and computer medium applied to program code
CN114067327A (en) Text recognition method, device, readable medium and electronic device
CN109949806A (en) Information interaction method and device
CN113780009A (en) Information generation method, apparatus, electronic device and computer readable medium
CN113191257A (en) Order of strokes detection method and device and electronic equipment
CN113850890A (en) Method, device, equipment and storage medium for generating animal image
CN112418249A (en) Mask image generation method, apparatus, electronic device and computer readable medium
CN112488947A (en) Model training and image processing method, device, equipment and computer readable medium
CN112259079A (en) Method, device, equipment and computer readable medium for speech recognition
CN111933122B (en) Speech recognition method, apparatus, electronic device, and computer-readable medium
CN110209851B (en) Model training method and device, electronic equipment and storage medium
CN115098647B (en) Feature vector generation method and device for text representation and electronic equipment
CN111797263A (en) Image label generation method, device, equipment and computer readable medium
CN108804442A (en) Sequence number generation method and device
CN112530416A (en) Speech recognition method, device, equipment and computer readable medium
CN113628097A (en) Image special effect configuration method, image recognition method, image special effect configuration device and electronic equipment
CN110929209B (en) Method and device for transmitting information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination