CN118741039A - Video conference processing method, device, electronic device and medium based on computing network - Google Patents

Publication number
CN118741039A
Authority
CN
China
Prior art keywords
video stream
video
container
features
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410850112.XA
Other languages
Chinese (zh)
Inventor
王艳辉
沈军
方东
杨春晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Visionvera Information Technology Co Ltd
Original Assignee
Visionvera Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/155 Conference systems involving storage of or access to video conference sessions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/239 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N21/2393 Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the present invention provide a video conference processing method, apparatus, electronic device and medium based on a computing power network, including: acquiring a first video stream sent by a target conference terminal; invoking a first container to generate the picture of the portrait area in a video frame of the first video stream as a preset character image and the picture of the background area in the video frame of the first video stream as a preset background image, to obtain a second video stream; invoking a second container to extract scene features and/or portrait features from video frames of the first video stream and to repair the video frames in the second video stream according to the scene features and/or portrait features, to obtain a third video stream; and invoking a third container to generate dynamic behavior features according to the video frames in the first video stream and the corresponding audio data and to repair the video frames in the third video stream according to the dynamic behavior features, to obtain a fourth video stream, so that the digital human picture is repaired multiple times by multiple containers in the computing power network.

Description

Video conference processing method and device based on computing power network, electronic equipment and medium
Technical Field
The present invention relates to the field of video conferencing technologies, and in particular, to a video conference processing method and apparatus based on a computing power network, an electronic device, and a medium.
Background
With the rapid development of global informatization, video conferencing has become an important means of daily work and communication. In some scenarios, however, the stability and quality of a video conference are limited; in these scenarios digital human technology can be applied, that is, a virtual character is used in place of the real person.
In the prior art, the digital human picture synthesized by digital human technology is of poor quality and differs greatly from the real picture of the participant's scene, which degrades the video conference experience.
Disclosure of Invention
In view of the foregoing problems, a video conference processing method, apparatus, electronic device and medium based on a computing power network are proposed, which overcome or at least partially solve the foregoing problems, comprising:
A method of video conference processing based on a computing power network, the method comprising:
acquiring, during a video conference, a first video stream sent by a target conference terminal;
invoking a first container, generating the picture of the portrait area in a video frame of the first video stream as a preset character image, and generating the picture of the background area in the video frame of the first video stream as a preset background image, to obtain a second video stream carrying a digital human picture;
invoking a second container, extracting scene features and/or portrait features from video frames of the first video stream, and repairing the video frames in the second video stream according to the scene features and/or portrait features, to obtain a third video stream carrying a digital human picture;
invoking a third container, generating dynamic behavior features according to the video frames in the first video stream and the corresponding audio data, and repairing the video frames in the third video stream according to the dynamic behavior features, to obtain a fourth video stream carrying a digital human picture.
Optionally, the invoking a first container, generating the picture of the portrait area in a video frame of the first video stream as a preset character image, and generating the picture of the background area in the video frame of the first video stream as a preset background image, to obtain a second video stream carrying a digital human picture, comprises:
invoking the first container, extracting character audio features from the audio data corresponding to the first video stream, determining the character image from preset candidate character images according to the character audio features, extracting environment audio features from the audio data corresponding to the first video stream, and determining the background image from preset candidate background images according to the environment audio features;
invoking the first container, generating the picture of the portrait area in a video frame of the first video stream as the preset character image, and generating the picture of the background area in the video frame of the first video stream as the preset background image, to obtain the second video stream carrying a digital human picture.
Optionally, the second container includes a first sub-container and a second sub-container, and the invoking a second container, extracting scene features and/or portrait features from video frames of the first video stream, and repairing the video frames in the second video stream according to the scene features and/or portrait features, to obtain a third video stream carrying a digital human picture, comprises:
invoking the first sub-container, extracting scene features from the video frames of the first video stream, and repairing the background image of the video frames in the second video stream according to the scene features, to obtain a first sub-video stream carrying a digital human picture;
invoking the second sub-container, extracting portrait features from the video frames of the first video stream, and repairing the character image of the video frames in the first sub-video stream according to the portrait features, to obtain the third video stream carrying a digital human picture;
or, invoking the second sub-container, extracting portrait features from the video frames of the first video stream, and repairing the character image of the video frames in the second video stream according to the portrait features, to obtain a second sub-video stream carrying a digital human picture;
invoking the first sub-container, extracting scene features from the video frames of the first video stream, and repairing the background image of the video frames in the second sub-video stream according to the scene features, to obtain the third video stream carrying a digital human picture.
Optionally, the second container includes an image encoder, an image decoder and a color decoder;
the image encoder is used for picture feature extraction and multi-scale processing;
the image decoder is used for up-sampling the visual features;
the color decoder is used for decoding the color queries.
Optionally, the invoking a third container, generating dynamic behavior features according to the video frames in the first video stream and the corresponding audio data, and repairing the video frames in the third video stream according to the dynamic behavior features, to obtain a fourth video stream carrying a digital human picture, comprises:
invoking the third container, generating dynamic behavior features according to the video frames in the first video stream and the corresponding audio data, and generating transition frames according to the dynamic behavior features;
invoking the third container, and inserting the transition frames into the video frames of the third video stream, to obtain the fourth video stream carrying a digital human picture.
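By way of illustration only, the insertion of transition frames might be sketched as follows. The text does not state how a transition frame is computed or where it is placed, so the linear blend, the per-pair `motion` score standing in for the dynamic behavior features, the `threshold` parameter and the toy `Frame` type are all assumptions made for this sketch.

```python
from typing import List, Sequence

Frame = List[float]  # toy frame: pixel intensities flattened into a list


def blend(a: Sequence[float], b: Sequence[float], t: float) -> Frame:
    """Linear blend between two frames; t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def insert_transition_frames(frames: List[Frame], motion: Sequence[float],
                             threshold: float = 0.5) -> List[Frame]:
    """Insert one blended frame between neighbours whose motion score is high.

    motion[i] is a hypothetical dynamic behavior score for the step from
    frames[i] to frames[i + 1]; it is an assumption, not part of the source.
    """
    if len(motion) != max(len(frames) - 1, 0):
        raise ValueError("need one motion score per pair of neighbouring frames")
    out: List[Frame] = []
    for i, frame in enumerate(frames):
        out.append(frame)
        # Only add a transition frame where the behavior changes strongly.
        if i < len(frames) - 1 and motion[i] > threshold:
            out.append(blend(frame, frames[i + 1], 0.5))
    return out
```

In this sketch a transition frame is added only where the score exceeds the threshold, so a static speaker produces no extra frames.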
Optionally, the generating dynamic behavior features according to the video frames in the first video stream and the corresponding audio data comprises:
determining a first dynamic behavior sub-feature according to the behavior of the person in the video frames of the first video stream, and generating a second dynamic behavior sub-feature according to the audio data corresponding to the first video stream;
fusing the first dynamic behavior sub-feature and the second dynamic behavior sub-feature to obtain the dynamic behavior features.
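As a minimal sketch of the fusion step, assuming both sub-features are numeric vectors of equal length: the text does not specify the fusion operator, so the weighted sum below and the `visual_weight` parameter are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def fuse_features(visual: Sequence[float], audio: Sequence[float],
                  visual_weight: float = 0.5) -> List[float]:
    """Fuse two equal-length sub-features by a convex weighted sum.

    visual: first dynamic behavior sub-feature (from the video frames).
    audio:  second dynamic behavior sub-feature (from the audio data).
    The weighted sum is an assumed operator; the source names none.
    """
    if len(visual) != len(audio):
        raise ValueError("sub-features must have the same length")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must be in [0, 1]")
    return [visual_weight * v + (1.0 - visual_weight) * a
            for v, a in zip(visual, audio)]
```

Concatenation or a learned fusion layer would be equally compatible with the wording of the text.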
Optionally, before the invoking a first container, generating the picture of the portrait area in a video frame of the first video stream as a preset character image, and generating the picture of the background area in the video frame of the first video stream as a preset background image, to obtain a second video stream, the method further comprises:
turning on the digital human function when a trigger event is detected.
A video conference processing device based on a computing power network, the device comprising:
a first video stream receiving module, used for acquiring, during a video conference, a first video stream sent by a target conference terminal;
a second video stream obtaining module, used for invoking a first container, generating the picture of the portrait area in a video frame of the first video stream as a preset character image, and generating the picture of the background area in the video frame of the first video stream as a preset background image, to obtain a second video stream carrying a digital human picture;
a third video stream obtaining module, used for invoking a second container, extracting scene features and/or portrait features from video frames of the first video stream, and repairing the video frames in the second video stream according to the scene features and/or portrait features, to obtain a third video stream carrying a digital human picture;
a fourth video stream obtaining module, used for invoking a third container, generating dynamic behavior features according to the video frames in the first video stream and the corresponding audio data, and repairing the video frames in the third video stream according to the dynamic behavior features, to obtain a fourth video stream carrying a digital human picture.
An electronic device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, which when executed by the processor implements a method as described above.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as described above.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, a first video stream sent by a target conference terminal is acquired during a video conference. A first container is invoked to generate the picture of the portrait area in a video frame of the first video stream as a preset character image and the picture of the background area as a preset background image, obtaining a second video stream carrying a digital human picture. A second container is invoked to extract scene features and/or portrait features from video frames of the first video stream and to repair the video frames in the second video stream according to those features, obtaining a third video stream carrying a digital human picture. A third container is invoked to generate dynamic behavior features according to the video frames in the first video stream and the corresponding audio data and to repair the video frames in the third video stream according to the dynamic behavior features, obtaining a fourth video stream carrying a digital human picture, and the fourth video stream is sent to the other conference terminals participating in the conference. The digital human picture is thus repaired multiple times by a plurality of containers in the computing power network, so that it comes close to the real picture of the conference scene, which improves the effect of digital human synthesis in the video conference.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flowchart of the steps of a video conference processing method based on a computing power network according to some embodiments of the present invention;
FIG. 2 is a schematic diagram of a system architecture provided by some embodiments of the invention;
FIG. 3 is a flowchart of the steps of another video conference processing method based on a computing power network according to some embodiments of the present invention;
FIG. 4 is a block diagram of a video conference processing device based on a computing power network according to some embodiments of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, a flowchart of the steps of a video conference processing method based on a computing power network according to some embodiments of the present invention is shown, which may specifically include the following steps:
Step 101, acquiring, during a video conference, a first video stream sent by a target conference terminal.
As an example, the video conference may be a video conference based on the view network, that is, data in the video conference is transmitted over the view network, and the conference terminal may be a view network terminal.
In a video conference with a plurality of conference terminals, the video and audio data collected by one conference terminal may need to be sent to the other conference terminals. For example, if a conference terminal needs to speak in the conference, that conference terminal is the target conference terminal. The target conference terminal can send the collected video to a server over the network as a video stream and the collected audio as an audio stream; the video and the audio can also be sent in the same video stream.
In some embodiments of the present invention, before the invoking a first container, generating the picture of the portrait area in a video frame of the first video stream as a preset character image, and generating the picture of the background area in the video frame of the first video stream as a preset background image, to obtain a second video stream, the method further includes: turning on the digital human function when a trigger event is detected.
In a video conference, the video stream may suffer from image quality problems such as blurred images (e.g. motion blur, lens defocus), low-resolution images, video stutter or a low frame rate, other image quality problems (e.g. insufficient brightness, insufficient contrast, noise, over-compression, rain, snow and fog, overexposure or underexposure), occlusion-type damage (e.g. logos, watermarks, subtitles, occlusion), color distortion (e.g. color cast, lack of color) and black-and-white illumination. These image quality problems can be addressed by applying digital human technology to synthesize a digital human picture.
In some examples, the above image quality problems are more likely to occur at low bandwidth, that is, when the bandwidth is smaller than a bandwidth threshold; for example, the bandwidth threshold of the video stream is 150 kb/s and the bandwidth threshold of the audio stream is 64 kb/s. The trigger event may be the detection of a bandwidth smaller than the bandwidth threshold. The server establishes a communication channel with each participant terminal and may be provided with a bandwidth monitoring module; the bandwidth monitoring module can detect the bandwidth of each communication channel, and each channel may be assigned its own bandwidth threshold. After the video conference starts, the server can use the bandwidth monitoring module to detect whether the bandwidth of the channel connected to the target conference terminal is smaller than the bandwidth threshold; when the bandwidth is smaller than the bandwidth threshold, a trigger event is detected.
In some examples, the trigger event may also be the detection that the definition of the video stream is below a definition threshold, for example a definition threshold of 1024. The server can monitor the video stream sent by the target conference terminal in real time, and when the definition of the video stream is below the definition threshold, a trigger event is detected.
In some examples, the trigger event may be a manual operation by the user, that is, the user manually turns on the digital human function. An interactive interface may be provided for the user at the target conference terminal; the user performs the manual operation through the interactive interface, and the target conference terminal then generates a video synthesis request and sends it to the server. When the server receives the video synthesis request, a trigger event is detected. For example, when the camera connected to the target conference terminal can only collect a low-definition video stream and cannot collect a high-definition video stream, the user can use the interactive interface to make the target conference terminal generate and send a video synthesis request to the server. In another example, when large bandwidth fluctuations cause the video stream uploaded by the target conference terminal to stutter, the user can likewise use the interactive interface to make the target conference terminal generate and send a video synthesis request to the server, so that a trigger event is detected.
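The three kinds of trigger event described above can be sketched as a single check. This is an illustration under stated assumptions: the `Trigger` enum, the function name `detect_trigger`, the priority order among the triggers and the use of one scalar `definition` score are hypothetical; only the example thresholds (150 kb/s for the video stream, 1024 for definition) come from the text.

```python
from enum import Enum
from typing import Optional


class Trigger(Enum):
    LOW_BANDWIDTH = "low_bandwidth"
    LOW_DEFINITION = "low_definition"
    USER_REQUEST = "user_request"


def detect_trigger(bandwidth_kbps: float, definition: float,
                   user_requested: bool,
                   bandwidth_threshold: float = 150.0,
                   definition_threshold: float = 1024.0) -> Optional[Trigger]:
    """Return the first trigger event that applies, or None if none does.

    The priority order (user request, then bandwidth, then definition)
    is an assumption of this sketch, not something the source specifies.
    """
    if user_requested:
        # A video synthesis request was received from the terminal.
        return Trigger.USER_REQUEST
    if bandwidth_kbps < bandwidth_threshold:
        return Trigger.LOW_BANDWIDTH
    if definition < definition_threshold:
        return Trigger.LOW_DEFINITION
    return None
```

A server loop could call `detect_trigger` for the channel of the target conference terminal and turn on the digital human function whenever the result is not `None`.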
In some examples, a scheduling platform may be provided. The scheduling platform may control the number of cameras and microphones that users are allowed to switch on and off, and may control whether the AI function is allowed to be switched on (i.e. whether the digital human function is turned on), which also needs to be scheduled. In some examples, limited by the computing power of the back-end GPU cluster, individual users may not be allowed to switch the AI enhancement and synthesis functions on and off at will. In some examples, the servers are initialized first, and the scheduling platform, the conference server and the video post-processing server establish TCP socket connections. The participant terminals then perform user login. After login succeeds, the conference terminal responsible for hosting can start a conference through the scheduling platform; the scheduling platform in turn controls the server to open a conference room and feeds the conference information back to the hosting conference terminal, and the other participant terminals can join the conference room using the conference information. The scheduling platform can control the participant terminals to turn on their cameras and microphones, and can control whether the digital human function is turned on.
Step 102, invoking a first container, generating the picture of the portrait area in a video frame of the first video stream as a preset character image, and generating the picture of the background area in the video frame of the first video stream as a preset background image, to obtain a second video stream carrying a digital human picture.
In practical applications, a plurality of containers, such as containers 1-4 in FIG. 2, can be deployed in the computing power network, and the video stream is processed by a plurality of container images connected in series. Through repeated feature extraction, multi-scale processing and encoding/decoding, each container builds on the lower-fidelity output of the previous container, so that the output picture steadily approaches the original picture (that is, the real picture of the participant's scene). In this way the computing power network corrects the video stream frame by frame toward the original picture at ultra-low bandwidth, improving the effect of AI image enhancement.
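The serial arrangement of containers can be illustrated with a small sketch in which every stage receives both the original frame and the output of the previous stage. The toy `Frame` dictionary and the three stand-in stage functions are hypothetical; real containers would run image models rather than copy dictionary fields.

```python
from typing import Callable, Dict, List, Sequence

Frame = Dict[str, str]  # toy frame: named properties instead of pixels
# A stage receives the original frame and the previous stage's output.
Stage = Callable[[Frame, Frame], Frame]


def run_chain(original: Frame, stages: Sequence[Stage]) -> List[Frame]:
    """Run the stages in series and return every intermediate output."""
    outputs: List[Frame] = []
    current = original
    for stage in stages:
        current = stage(original, current)
        outputs.append(current)
    return outputs


def synthesize(original: Frame, previous: Frame) -> Frame:
    """Stand-in for the first container: preset character and background."""
    return {"person": "preset", "background": "preset", "motion": "none"}


def repair_appearance(original: Frame, previous: Frame) -> Frame:
    """Stand-in for the second container: restore scene and portrait."""
    return {**previous, "person": original["person"],
            "background": original["background"]}


def repair_motion(original: Frame, previous: Frame) -> Frame:
    """Stand-in for the third container: restore dynamic behavior."""
    return {**previous, "motion": original["motion"]}
```

With `original = {"person": "real", "background": "real", "motion": "real"}`, calling `run_chain(original, [synthesize, repair_appearance, repair_motion])` returns three frames, each closer to the original than the one before.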
For the first video stream sent by the target conference terminal, the current video frame of the first video stream can be input into the first container. The first container can replace the current video stream with a high-definition digital human video stream according to a digital human video stream synthesis rule; specifically, the person in the current video frame of the first video stream is replaced with a preset character and the background is replaced with a preset background, thereby obtaining the second video stream.
In some embodiments of the present invention, the invoking a first container, generating the picture of the portrait area in a video frame of the first video stream as a preset character image, and generating the picture of the background area in the video frame of the first video stream as a preset background image, to obtain a second video stream carrying a digital human picture, includes:
Sub-step 11, invoking the first container, extracting character audio features from the audio data corresponding to the first video stream, determining the character image from preset candidate character images according to the character audio features, extracting environment audio features from the audio data corresponding to the first video stream, and determining the background image from preset candidate background images according to the environment audio features.
In practical applications, a plurality of candidate character images and candidate background images may be preset, and a matching character image and background image may be selected from them according to the result of analyzing the audio data.
For the character image, character audio features may be extracted from the audio data corresponding to the first video stream. The character audio features may include the character's gender, voice age, personality traits and so on; for example, a character audio feature may indicate that the gender is female, the voice age is 30 and the personality is lively. According to the character audio features, the first container of the computing power network can find a matching character image. In some examples, the first container of the computing power network may preferentially select frequently chosen pictures as the character image, based on the frequency of use in the large model data.
For the background image, environment audio features may be extracted from the audio data corresponding to the first video stream. The environment audio features may include a noise attribute, an echo attribute and a character role; for example, an environment audio feature may indicate that the noise attribute is quiet (noise below 20 dB), the echo attribute is echoing (exceeding 10 dB) and the character role is a senior company executive. According to the environment audio features, the first container of the computing power network can find a matching background image. In some examples, the first container of the computing power network may preferentially select frequently chosen pictures as the background image, based on the frequency of use in the large model data.
Sub-step 12, invoking the first container, generating the picture of the portrait area in a video frame of the first video stream as the preset character image, and generating the picture of the background area in the video frame of the first video stream as the preset background image, to obtain the second video stream carrying a digital human picture.
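The selection of a character image or background image from the preset candidates might look like the following sketch. The `Candidate` structure, the attribute keys and the scoring rule (most matching attributes first, usage frequency as tie-breaker) are assumptions; the text states only that matching is based on audio features and that frequently used pictures may be preferred.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    attributes: Dict[str, str]  # e.g. {"gender": "female", "personality": "lively"}
    use_count: int = 0          # hypothetical usage frequency


def select_candidate(features: Dict[str, str],
                     candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the candidate matching the most features; break ties by use_count."""
    def score(c: Candidate) -> Tuple[int, int]:
        matches = sum(1 for key, value in features.items()
                      if c.attributes.get(key) == value)
        return (matches, c.use_count)

    # default=None covers the case where no candidates are preset.
    return max(candidates, key=score, default=None)
```

The same function can serve both selections: once with character audio features against candidate character images, and once with environment audio features against candidate background images.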
Step 103, invoking a second container, extracting scene features and/or portrait features from the video frames of the first video stream, and repairing the video frames in the second video stream according to the scene features and/or portrait features, to obtain a third video stream carrying a digital human picture.
For the second video stream output by the first container, because the person in the current video frame of the first video stream has been generated as a preset character and the background as a preset background, the image quality can be raised from low quality to 4K, but the person and the scene do not match the real person and the real scene.
In practical applications, the video frames of the second video stream and the video frames of the first video stream subsequently received from the video conference may be input into the second container.
The second container can perform scene feature extraction on the video frames of the first video stream to obtain scene features, generate a high-definition scene image according to the scene features, and perform digital human synthesis in combination with the high-definition scene image.
The second container can also perform portrait feature extraction on the video frames of the first video stream to obtain portrait features, generate a high-definition portrait image according to the portrait features, and perform digital human synthesis in combination with the high-definition portrait image.
The video frames in the second video stream are repaired according to the scene features and the portrait features to obtain a third video stream carrying a digital human picture. The third video stream still has 4K image quality, its scene is consistent with the real scene, and its person is consistent with the real person.
In some embodiments of the invention, the second container comprises an image encoder for picture feature extraction and multi-scale processing, an image decoder for up-sampling the visual features, and a color decoder for decoding the color queries.
In some embodiments of the present invention, the second container includes a first sub-container and a second sub-container, and the invoking a second container, extracting scene features and/or portrait features from video frames of the first video stream, and repairing the video frames in the second video stream according to the scene features and/or portrait features, to obtain a third video stream carrying a digital human picture, includes:
Sub-step 21, invoking the first sub-container, extracting scene features from the video frames of the first video stream, and repairing the background image of the video frames in the second video stream according to the scene features, to obtain a first sub-video stream carrying a digital human picture.
Because the background image of the digital human in the previously generated video stream does not match the real scene, scene features can be extracted from the video frames of the subsequently received first video stream and used to repair that background image. The image quality of the resulting video stream is still 4K, and the scene is consistent with the real scene.
Sub-step 22, invoking the second sub-container, extracting portrait features from the video frames of the first video stream, and repairing the character image of the video frames in the first sub-video stream according to the portrait features, to obtain the third video stream carrying a digital human picture.
Because the character image of the digital human in the previously generated video stream does not match the real person, portrait features can be extracted from the video frames of the subsequently received first video stream and used to repair that character image. The image quality of the resulting video stream is still 4K, and the person is consistent with the real person.
In some examples, the repair approach includes: feature extraction, multi-scale processing, AI coloring, and color richness optimization, the container is mainly realized by a double decoder, the double decoder comprises an image encoder and two decoders, the image encoder is used for extracting picture features and performing multi-scale processing, and the two decoders are respectively an image decoder and a color decoder. The image decoder performs the up-sampling process of the visual features and the color decoder decodes the color query based on a transducer.
In some embodiments of the present invention, the second container includes a first sub-container and a second sub-container, and the sub-containers may instead be called in the reverse order. In that case, calling the second container to extract scene features and/or portrait features from the video frames of the first video stream, and to repair the video frames in the second video stream according to the scene features and/or portrait features to obtain a third video stream carrying the digital human picture, includes:
Sub-step 31, calling the second sub-container, extracting portrait features from the video frames of the first video stream, and repairing the person image of the video frames in the second video stream according to the portrait features to obtain a second sub-video stream carrying the digital human picture.
Because the person image of the digital human in the previously generated video stream does not match the real person, portrait features can be extracted from the video frames of the subsequently received first video stream and used to repair that person image. The resulting video stream still has 4K image quality, and its person is consistent with the real person.
Sub-step 32, calling the first sub-container, extracting scene features from the video frames of the first video stream, and repairing the background image of the video frames in the second sub-video stream according to the scene features to obtain a third video stream carrying the digital human picture.
Because the background image of the digital human in the previously generated video stream does not match the real scene, scene features can be extracted from the video frames of the subsequently received first video stream and used to repair that background image. The resulting video stream still has 4K image quality, and its scene is consistent with the real scene.
In some examples, the repair approach includes feature extraction, multi-scale processing, AI coloring, and color richness optimization. The container is mainly implemented as a dual decoder, which comprises an image encoder and two decoders. The image encoder extracts picture features and performs multi-scale processing. The two decoders are an image decoder and a color decoder: the image decoder upsamples the visual features, and the color decoder decodes the color queries based on a Transformer.
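The two call orders described above (scene first, then portrait; or portrait first, then scene) can be illustrated with a minimal Python sketch. It is not the implementation of the embodiments: the `Frame` type, the string labels that stand in for extracted features, and all function names are hypothetical, and each sub-container is reduced to copying one attribute from the real frame.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Frame:
    background: str       # "preset" or a label standing in for real scene features
    portrait: str         # "preset" or a label standing in for real portrait features
    quality: str = "4K"   # generated frames are assumed to be 4K


SubContainer = Callable[[Frame, Frame], Frame]


def first_sub_container(real: Frame, generated: Frame) -> Frame:
    """Repair only the background, using the scene of the real frame."""
    return replace(generated, background=real.background)


def second_sub_container(real: Frame, generated: Frame) -> Frame:
    """Repair only the person image, using the portrait of the real frame."""
    return replace(generated, portrait=real.portrait)


def run_second_container(first_stream: List[Frame],
                         second_stream: List[Frame],
                         order: List[SubContainer]) -> List[Frame]:
    """Apply the sub-containers in the given order to every generated frame."""
    third_stream = []
    for real, generated in zip(first_stream, second_stream):
        for sub_container in order:
            generated = sub_container(real, generated)
        third_stream.append(generated)
    return third_stream


first_stream = [Frame("real-scene", "real-person", quality="low")] * 2
second_stream = [Frame("preset", "preset")] * 2

# Sub-steps 21 and 22: scene first, then portrait.
scene_first = run_second_container(
    first_stream, second_stream, [first_sub_container, second_sub_container])
# Sub-steps 31 and 32: portrait first, then scene.
portrait_first = run_second_container(
    first_stream, second_stream, [second_sub_container, first_sub_container])
```

In this simplified model the two repairs touch disjoint parts of the frame, so both orders yield the same third video stream and the 4K quality of the generated frames is preserved.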
Step 104, calling a third container, generating dynamic behavior characteristics according to the video frames in the first video stream and corresponding audio data, and repairing the video frames in the third video stream according to the dynamic behavior characteristics to obtain a fourth video stream carrying digital human pictures.
After the video frames in the second video stream are repaired according to the scene features and portrait features, the person and scene in the resulting third video stream are consistent with the real person and scene. However, continuously optimizing the person image and the background image may cause the audio and the picture to fall out of sync.
In practical applications, the audio and picture fall out of sync mainly because the digital human's dynamic behaviors, such as expressions and actions, are not synchronized with the audio. The dynamic behavior features of the digital human, such as expression features and action features, can therefore be determined by analyzing the video frames of the subsequently received first video stream and the corresponding audio data. The video frames in the third video stream can then be repaired according to the dynamic behavior features to obtain a fourth video stream, in which the image quality is still 4K, the scene and the person are consistent with the real scene and the real person, and the audio and picture remain fully synchronized.
In some embodiments of the present invention, calling the third container to generate dynamic behavior features according to the video frames in the first video stream and the corresponding audio data, and to repair the video frames in the third video stream according to the dynamic behavior features to obtain a fourth video stream carrying the digital human picture, includes:
Invoking the third container, generating dynamic behavior characteristics according to video frames and corresponding audio data in the first video stream, and generating transition frames according to the dynamic behavior characteristics; and calling the third container, and inserting the transition frame into the video frame of the third video stream to obtain a fourth video stream carrying the digital human picture.
In practical applications, frame interpolation can be used: a transition frame is generated according to the dynamic behavior features and then inserted between the video frames of the third video stream, so that consecutive frames join smoothly.
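The frame insertion described above can be sketched as follows. The text does not say how a transition frame is computed, so linear interpolation between neighboring frames is used here purely as an assumption, and each frame is represented by a hypothetical behavior-feature vector (for example mouth openness and head angle) rather than by pixels.

```python
from typing import List

Vector = List[float]


def make_transition(prev: Vector, nxt: Vector, t: float = 0.5) -> Vector:
    """Blend two behavior-feature vectors; linear blending is an assumption."""
    return [(1.0 - t) * a + t * b for a, b in zip(prev, nxt)]


def insert_transitions(frames: List[Vector]) -> List[Vector]:
    """Insert one transition frame between each pair of neighboring frames."""
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append(make_transition(prev, nxt))
        out.append(nxt)
    return out


# Three frames of the third video stream, as hypothetical feature vectors.
third_stream = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
fourth_stream = insert_transitions(third_stream)
```

With three input frames the result holds five frames: the original three, with one interpolated frame between each neighboring pair.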
In some embodiments of the present invention, generating the dynamic behavior features according to the video frames in the first video stream and the corresponding audio data includes: determining a first dynamic behavior sub-feature according to the person's behavior in the video frames of the first video stream, and generating a second dynamic behavior sub-feature according to the corresponding audio data of the first video stream; and fusing the first dynamic behavior sub-feature and the second dynamic behavior sub-feature to obtain the dynamic behavior feature.
In practical applications, the first dynamic behavior sub-feature can be determined from the person's behavior by analyzing the video frames of the first video stream, and the second dynamic behavior sub-feature can be generated by analyzing the corresponding audio data of the first video stream. The two sub-features are then fused to obtain the dynamic behavior feature.
In some examples, after the fourth video stream is generated, it may be sent to the other conference terminals participating in the conference. Specifically, the fourth video stream may be sent to a conference server, which forwards it to the other conference terminals.
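The fusion of the two sub-features can be sketched as below. The fusion operator is not specified in the text; a weighted average of two equal-length vectors is assumed here only for illustration, and the function name, the `audio_weight` parameter, and the example values are hypothetical.

```python
from typing import List


def fuse_behavior_features(visual: List[float],
                           audio: List[float],
                           audio_weight: float = 0.5) -> List[float]:
    """Fuse a video-derived and an audio-derived sub-feature.

    A weighted average is an assumption; concatenation or a learned
    fusion layer would be equally compatible with the description.
    """
    if len(visual) != len(audio):
        raise ValueError("sub-features must have the same length")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return [(1.0 - audio_weight) * v + audio_weight * a
            for v, a in zip(visual, audio)]


# First sub-feature from the person's behavior in the video frames,
# second sub-feature from the corresponding audio data (hypothetical values).
first_sub_feature = [0.25, 1.0]
second_sub_feature = [0.75, 0.0]
dynamic_behavior_feature = fuse_behavior_features(first_sub_feature, second_sub_feature)
```

Setting `audio_weight` closer to 1 would let the audio drive the mouth and expression more strongly, which is one plausible way to favor audio-picture synchronization.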
In the embodiment of the present invention, a first video stream sent by a target conference terminal is acquired during a video conference. A first container is called to generate the picture of the portrait area in the video frames of the first video stream as a preset character image and the picture of the background area as a preset background image, obtaining a second video stream carrying a digital human picture. A second container is called to extract scene features and/or portrait features from the video frames of the first video stream and to repair the video frames in the second video stream according to those features, obtaining a third video stream carrying the digital human picture. A third container is called to generate dynamic behavior features according to the video frames in the first video stream and the corresponding audio data and to repair the video frames in the third video stream according to the dynamic behavior features, obtaining a fourth video stream carrying the digital human picture, which is sent to the other conference terminals participating in the conference. The digital human picture is thus repaired multiple times by multiple containers in the computing network, so that the digital human picture synthesized in the video conference stays close to the real conference picture.
Referring to FIG. 3, a flowchart of the steps of another video conference processing method based on a computing network according to some embodiments of the present invention is shown. The method may specifically include the following steps:
Step 301, in the process of a video conference, acquiring a first video stream sent by a target conference terminal.
Step 302, when a trigger event is detected, turning on a digital person function.
Step 303, calling a first container, extracting a character audio feature from audio data corresponding to the first video stream, determining a character image from preset candidate character images according to the character audio feature, extracting an environment audio feature from audio data corresponding to the first video stream, and determining a background image from preset candidate background images according to the environment audio feature.
Step 304, invoking the first container, generating a picture of a portrait area in a video frame of the first video stream into a preset portrait image, and generating a picture of a background area in the video frame of the first video stream into a preset background image, thereby obtaining a second video stream carrying a digital portrait image.
Step 305, a first sub-container of a second container is called, scene characteristics are extracted from video frames of the first video stream, and a background image of the video frames in the second video stream is repaired according to the scene characteristics, so as to obtain a first sub-video stream carrying digital human pictures.
Step 306, invoking a second sub-container of the second container, extracting the portrait characteristic from the video frame of the first video stream, and repairing the portrait image of the video frame in the first sub-video stream according to the portrait characteristic to obtain a third video stream carrying the digital portrait.
Step 307, calling a third container, generating dynamic behavior characteristics according to the video frames and the corresponding audio data in the first video stream, and generating transition frames according to the dynamic behavior characteristics.
Step 308, calling the third container, and inserting the transition frame into the video frame of the third video stream to obtain a fourth video stream carrying the digital human picture.
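Steps 301 to 308 can be read as a three-stage pipeline gated by the trigger event. The sketch below is only an illustration under stated assumptions: frames are plain dictionaries, the containers are ordinary functions with hypothetical names, the character and background selection of step 303 is omitted, and passing the first video stream through unchanged when no trigger event is detected is an assumption that the text does not state.

```python
from typing import Dict, List

Frame = Dict[str, str]
Stream = List[Frame]


def first_container(first: Stream) -> Stream:
    """Step 304: replace person and background with preset images at 4K."""
    return [{"portrait": "preset", "background": "preset", "quality": "4K"}
            for _ in first]


def second_container(first: Stream, second: Stream) -> Stream:
    """Steps 305-306: repair the background, then the person image."""
    third = []
    for real, generated in zip(first, second):
        repaired = dict(generated, background=real["background"])  # first sub-container
        repaired = dict(repaired, portrait=real["portrait"])       # second sub-container
        third.append(repaired)
    return third


def third_container(first: Stream, third: Stream) -> Stream:
    """Steps 307-308: insert a transition frame before every frame but the first.

    A real container would derive the transition frames from the video frames
    and audio of `first`; here a copy of the next frame stands in for them.
    """
    fourth = []
    for index, frame in enumerate(third):
        if index > 0:
            fourth.append(dict(frame, kind="transition"))
        fourth.append(dict(frame, kind="regular"))
    return fourth


def process_conference(first: Stream, trigger_detected: bool) -> Stream:
    """Steps 301-308 end to end."""
    if not trigger_detected:
        return first  # assumption: digital human function stays off
    second = first_container(first)
    third = second_container(first, second)
    return third_container(first, third)


first_stream = [{"portrait": "real-person", "background": "real-scene", "quality": "low"}
                for _ in range(3)]
fourth_stream = process_conference(first_stream, trigger_detected=True)
```

Each stage consumes the original first video stream in addition to the previous stage's output, which mirrors how the second and third containers keep drawing features from the real frames.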
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to FIG. 4, a schematic structural diagram of a video conference processing device based on a computing network according to some embodiments of the present invention is shown. The device may specifically include the following modules:
a first video stream obtaining module 401, configured to obtain a first video stream sent by a target conference terminal in a video conference process;
A second video stream obtaining module 402, configured to invoke a first container, generate a picture of a portrait area in a video frame of the first video stream into a preset portrait image, and generate a picture of a background area in a video frame of the first video stream into a preset background image, so as to obtain a second video stream carrying a digital portrait image;
a third video stream obtaining module 403, configured to invoke a second container, extract scene features and/or portrait features from video frames of the first video stream, and repair video frames in the second video stream according to the scene features and/or portrait features, to obtain a third video stream carrying digital portrait;
and a fourth video stream obtaining module 404, configured to invoke a third container, generate dynamic behavior features according to the video frames in the first video stream and corresponding audio data, and repair the video frames in the third video stream according to the dynamic behavior features, so as to obtain a fourth video stream carrying digital human pictures.
In some embodiments of the present invention, the calling the first container generates a picture of a portrait area in a video frame of the first video stream as a preset portrait image, and generates a picture of a background area in a video frame of the first video stream as a preset background image, to obtain a second video stream carrying a digital portrait image, including:
Invoking the first container, extracting character audio features from audio data corresponding to the first video stream, determining character images from preset candidate character images according to the character audio features, extracting environment audio features from the audio data corresponding to the first video stream, and determining background images from preset candidate background images according to the environment audio features;
And calling the first container, generating a picture of a portrait area in a video frame of the first video stream into a preset portrait image, and generating a picture of a background area in the video frame of the first video stream into a preset background image to obtain a second video stream carrying a digital portrait image.
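Choosing a character image and a background image from preset candidates according to audio features can be sketched as a nearest-match lookup. The matching rule is not given in the text; cosine similarity is assumed here, and the candidate names, the reference vectors, and the two-dimensional feature format are all hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_candidate(feature: List[float],
                   candidates: Dict[str, List[float]]) -> str:
    """Return the name of the preset candidate closest to the audio feature."""
    return max(candidates,
               key=lambda name: cosine_similarity(feature, candidates[name]))


# Hypothetical presets, each with a reference audio-feature vector.
CANDIDATE_CHARACTERS = {"character_a": [1.0, 0.0], "character_b": [0.0, 1.0]}
CANDIDATE_BACKGROUNDS = {"meeting_room": [1.0, 0.0], "outdoor": [0.0, 1.0]}

character_audio_feature = [0.9, 0.1]     # e.g. derived from the speaker's voice
environment_audio_feature = [0.2, 0.7]   # e.g. derived from ambient sound

character_image = pick_candidate(character_audio_feature, CANDIDATE_CHARACTERS)
background_image = pick_candidate(environment_audio_feature, CANDIDATE_BACKGROUNDS)
```

The selected names would then identify the preset character image and preset background image used to generate the second video stream.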
In some embodiments of the present invention, the second container includes a first sub-container and a second sub-container, the calling the second container extracts scene features and/or portrait features from video frames of the first video stream, repairs the video frames in the second video stream according to the scene features and/or portrait features, and obtains a third video stream carrying digital portrait, including:
Invoking the first sub-container, extracting scene characteristics from video frames of the first video stream, and repairing background images of the video frames in the second video stream according to the scene characteristics to obtain a first sub-video stream carrying digital human pictures;
invoking the second sub-container, extracting the portrait characteristic from the video frame of the first video stream, and repairing the portrait image of the video frame in the first sub-video stream according to the portrait characteristic to obtain a third video stream carrying a digital portrait;
Or calling the second sub-container, extracting the portrait characteristic from the video frame of the first video stream, and repairing the portrait image of the video frame in the second video stream according to the portrait characteristic to obtain a second sub-video stream carrying a digital portrait;
And calling the first sub-container, extracting scene characteristics from the video frames of the first video stream, and repairing the background images of the video frames in the second sub-video stream according to the scene characteristics to obtain a third video stream carrying digital human pictures.
In some embodiments of the invention, the second container comprises an image encoder, an image decoder, a color decoder;
The image encoder is used for extracting picture features and performing multi-scale processing;
The image decoder is used for upsampling the visual features;
the color decoder is used for decoding the color queries.
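The division of labor among the three components can be illustrated with a toy, pure-Python sketch. It is not the network of the embodiments: 2x2 average pooling stands in for multi-scale feature extraction, nearest-neighbor copying stands in for upsampling, and a softmax-like weighting of each pixel over a small set of hypothetical color queries stands in for the color decoder. All names and values are assumptions.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
ColorQuery = Tuple[float, Tuple[float, float]]  # (key, (a, b) chroma value)


def image_encoder(gray: Image, levels: int = 2) -> List[Image]:
    """Build a multi-scale pyramid by repeated 2x2 average pooling."""
    pyramid = [gray]
    for _ in range(levels):
        src = pyramid[-1]
        h, w = len(src) // 2, len(src[0]) // 2
        pyramid.append([[(src[2 * y][2 * x] + src[2 * y][2 * x + 1]
                          + src[2 * y + 1][2 * x] + src[2 * y + 1][2 * x + 1]) / 4.0
                         for x in range(w)] for y in range(h)])
    return pyramid


def image_decoder(coarse: Image, factor: int) -> Image:
    """Upsample a coarse feature map by nearest-neighbor copying."""
    height, width = len(coarse) * factor, len(coarse[0]) * factor
    return [[coarse[y // factor][x // factor] for x in range(width)]
            for y in range(height)]


def color_decoder(features: Image,
                  queries: List[ColorQuery]) -> List[List[Tuple[float, float]]]:
    """Give each pixel a chroma value as a weighted mix of the color queries."""
    colored = []
    for row in features:
        out_row = []
        for value in row:
            weights = [math.exp(-(value - key) ** 2) for key, _ in queries]
            total = sum(weights)
            a = sum(w * ab[0] for w, (_, ab) in zip(weights, queries)) / total
            b = sum(w * ab[1] for w, (_, ab) in zip(weights, queries)) / total
            out_row.append((a, b))
        colored.append(out_row)
    return colored


gray = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]           # 4x4 grayscale frame
pyramid = image_encoder(gray)                              # 4x4, 2x2, 1x1
visual_features = image_decoder(pyramid[1], factor=2)      # back to 4x4
color_queries = [(0.0, (-1.0, -1.0)), (1.0, (1.0, 1.0))]   # hypothetical queries
chroma = color_decoder(visual_features, color_queries)
```

Dark pixels end up closer to the first query's chroma and bright pixels closer to the second, which is the basic idea of letting color queries vote on per-pixel color.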
In some embodiments of the present invention, the invoking the third container generates a dynamic behavior feature according to the video frame and the corresponding audio data in the first video stream, and repairs the video frame in the third video stream according to the dynamic behavior feature to obtain a fourth video stream carrying a digital human picture, including:
Invoking the third container, generating dynamic behavior characteristics according to video frames and corresponding audio data in the first video stream, and generating transition frames according to the dynamic behavior characteristics;
and calling the third container, and inserting the transition frame into the video frame of the third video stream to obtain a fourth video stream carrying the digital human picture.
In some embodiments of the present invention, the generating dynamic behavior features according to the video frames and the corresponding audio data in the first video stream includes:
determining a first dynamic behavior sub-feature according to the human behavior in the video frame of the first video stream, and generating a second dynamic behavior sub-feature according to the corresponding audio data of the first video stream;
And fusing the first dynamic behavior sub-feature and the second dynamic behavior sub-feature to obtain a dynamic behavior feature.
In some embodiments of the present invention, the device further includes:
a digital human function starting module, configured to turn on the digital human function when a trigger event is detected, before the first container is called to generate the picture of the portrait area in the video frames of the first video stream as a preset character image and the picture of the background area as a preset background image to obtain the second video stream.
In the embodiment of the present invention, a first video stream sent by a target conference terminal is acquired during a video conference. A first container is called to generate the picture of the portrait area in the video frames of the first video stream as a preset character image and the picture of the background area as a preset background image, obtaining a second video stream carrying a digital human picture. A second container is called to extract scene features and/or portrait features from the video frames of the first video stream and to repair the video frames in the second video stream according to those features, obtaining a third video stream carrying the digital human picture. A third container is called to generate dynamic behavior features according to the video frames in the first video stream and the corresponding audio data and to repair the video frames in the third video stream according to the dynamic behavior features, obtaining a fourth video stream carrying the digital human picture, which is sent to the other conference terminals participating in the conference. The digital human picture is thus repaired multiple times by multiple containers in the computing network, so that the digital human picture synthesized in the video conference stays close to the real conference picture.
Some embodiments of the invention also provide an electronic device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program implementing a method as above when executed by the processor.
Some embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as above.
Some embodiments of the invention also provide a computer program product comprising a computer program which, when executed by a processor, implements a method as above.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
In this specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element.
The video conference processing method, device, electronic device and medium based on a computing network provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core ideas. Meanwhile, those skilled in the art may, following the ideas of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A video conference processing method based on a computing network, characterized in that the method comprises:
during a video conference, obtaining a first video stream sent by a target conference terminal;
calling a first container, generating the picture of the portrait area in the video frames of the first video stream as a preset character image, and generating the picture of the background area in the video frames of the first video stream as a preset background image, to obtain a second video stream carrying a digital human picture;
calling a second container, extracting scene features and/or portrait features from the video frames of the first video stream, and repairing the video frames in the second video stream according to the scene features and/or portrait features, to obtain a third video stream carrying the digital human picture;
calling a third container, generating dynamic behavior features according to the video frames in the first video stream and the corresponding audio data, and repairing the video frames in the third video stream according to the dynamic behavior features, to obtain a fourth video stream carrying the digital human picture.
2. The method according to claim 1, characterized in that the calling of the first container, generating the picture of the portrait area in the video frames of the first video stream as a preset character image, and generating the picture of the background area in the video frames of the first video stream as a preset background image, to obtain the second video stream carrying the digital human picture, comprises:
calling the first container, extracting character audio features from the audio data corresponding to the first video stream, determining a character image from preset candidate character images according to the character audio features, extracting environment audio features from the audio data corresponding to the first video stream, and determining a background image from preset candidate background images according to the environment audio features;
calling the first container, generating the picture of the portrait area in the video frames of the first video stream as the preset character image, and generating the picture of the background area in the video frames of the first video stream as the preset background image, to obtain the second video stream carrying the digital human picture.
3. The method according to claim 1, characterized in that the second container includes a first sub-container and a second sub-container, and the calling of the second container, extracting scene features and/or portrait features from the video frames of the first video stream, and repairing the video frames in the second video stream according to the scene features and/or portrait features, to obtain the third video stream carrying the digital human picture, comprises:
calling the first sub-container, extracting scene features from the video frames of the first video stream, and repairing the background image of the video frames in the second video stream according to the scene features, to obtain a first sub-video stream carrying the digital human picture;
calling the second sub-container, extracting portrait features from the video frames of the first video stream, and repairing the person image of the video frames in the first sub-video stream according to the portrait features, to obtain the third video stream carrying the digital human picture;
or, calling the second sub-container, extracting portrait features from the video frames of the first video stream, and repairing the person image of the video frames in the second video stream according to the portrait features, to obtain a second sub-video stream carrying the digital human picture;
calling the first sub-container, extracting scene features from the video frames of the first video stream, and repairing the background image of the video frames in the second sub-video stream according to the scene features, to obtain the third video stream carrying the digital human picture.
4. The method according to claim 3, characterized in that the second container comprises an image encoder, an image decoder, and a color decoder;
the image encoder is used for picture feature extraction and multi-scale processing;
the image decoder is used for upsampling the visual features;
the color decoder is used for decoding the color queries.
5. The method according to any one of claims 1 to 4, characterized in that the calling of the third container, generating dynamic behavior features according to the video frames in the first video stream and the corresponding audio data, and repairing the video frames in the third video stream according to the dynamic behavior features, to obtain the fourth video stream carrying the digital human picture, comprises:
calling the third container, generating dynamic behavior features according to the video frames in the first video stream and the corresponding audio data, and generating transition frames according to the dynamic behavior features;
calling the third container, and inserting the transition frames into the video frames of the third video stream, to obtain the fourth video stream carrying the digital human picture.
6. The method according to claim 5, characterized in that the generating of dynamic behavior features according to the video frames in the first video stream and the corresponding audio data comprises:
determining a first dynamic behavior sub-feature according to the behavior of the person in the video frames of the first video stream, and generating a second dynamic behavior sub-feature according to the audio data corresponding to the first video stream;
fusing the first dynamic behavior sub-feature and the second dynamic behavior sub-feature to obtain the dynamic behavior feature.
7. The method according to claim 1, characterized in that, before the calling of the first container, generating the picture of the portrait area in the video frames of the first video stream as a preset character image, and generating the picture of the background area in the video frames of the first video stream as a preset background image, to obtain the second video stream, the method further comprises:
turning on the digital human function when a trigger event is detected.
8.
A video conference processing device based on a computing network, characterized in that the device comprises: 第一视频流获取模块,用于在视频会议的过程中,获取目标会议终端发送的第一视频流;A first video stream acquisition module, used to acquire a first video stream sent by a target conference terminal during a video conference; 第二视频流得到模块,用于调用第一容器,将所述第一视频流的视频帧中人像区域的画面生成为预置的人物图像,并将所述第一视频流的视频帧中背景区域的画面生成为预置的背景图像,得到携带数字人画面的第二视频流;The second video stream obtaining module is used to call the first container, generate the picture of the portrait area in the video frame of the first video stream as a preset character image, and generate the picture of the background area in the video frame of the first video stream as a preset background image, so as to obtain a second video stream carrying the picture of the digital human; 第三视频流得到模块,用于调用第二容器,从所述第一视频流的视频帧中提取场景特征和/或人像特征,根据所述场景特征和/或人像特征,对所述第二视频流中视频帧进行修复,得到携带数字人画面的第三视频流;A third video stream obtaining module is used to call the second container, extract scene features and/or portrait features from the video frames of the first video stream, and repair the video frames in the second video stream according to the scene features and/or portrait features to obtain a third video stream carrying the digital human image; 第四视频流得到模块,用于调用第三容器,根据所述第一视频流中视频帧及相应的音频数据生成动态行为特征,并根据所述动态行为特征,对所述第三视频流中视频帧进行修复,得到携带数字人画面的第四视频流。The fourth video stream obtaining module is used to call the third container, generate dynamic behavior features according to the video frames and corresponding audio data in the first video stream, and repair the video frames in the third video stream according to the dynamic behavior features to obtain a fourth video stream carrying the digital human image. 9.一种电子设备,其特征在于,包括处理器、存储器及存储在所述存储器上并能够在所述处理器上运行的计算机程序,所述计算机程序被所述处理器执行时实现如权利要求1至7中任一项所述的方法。9. 
An electronic device, characterized in that it comprises a processor, a memory, and a computer program stored in the memory and capable of running on the processor, wherein when the computer program is executed by the processor, the method according to any one of claims 1 to 7 is implemented. 10.一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储计算机程序,所述计算机程序被处理器执行时实现如权利要求1至7中任一项所述的方法。10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.
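
As an illustration only, the candidate selection recited in claim 2 (choosing a character image and a background image from preset candidates according to audio features) might be sketched as a nearest-match lookup. The claims do not state how the match is made; the cosine-similarity rule, the reference vector attached to each candidate, and the names `cosine_similarity` and `select_candidate` are all assumptions introduced for this sketch.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_candidate(features: Sequence[float],
                     candidates: Dict[str, Sequence[float]]) -> str:
    """Return the name of the preset candidate whose (hypothetical) reference
    vector is most similar to the extracted audio features."""
    if not candidates:
        raise ValueError("at least one preset candidate is required")
    return max(candidates,
               key=lambda name: cosine_similarity(features, candidates[name]))


# Hypothetical preset candidates, each tagged with an assumed reference vector.
character_candidates = {"formal_avatar": [1.0, 0.0], "casual_avatar": [0.0, 1.0]}
background_candidates = {"office": [1.0, 0.0, 0.0], "outdoor": [0.0, 1.0, 0.0]}

# Character audio features pick the character image; environmental audio
# features pick the background image, mirroring the two selections in claim 2.
chosen_character = select_candidate([0.9, 0.1], character_candidates)
chosen_background = select_candidate([0.1, 0.8, 0.1], background_candidates)
```

The same helper serves both selections because the claim describes them symmetrically; a real system could equally use a trained classifier for either step.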
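
The feature fusion of claim 6 and the transition-frame insertion of claim 5 can likewise be sketched under stated assumptions. The claims do not specify the fusion rule or how a transition frame is synthesized, so the weighted average, the linear blend of neighbouring frames, the flat list-of-floats `Frame` type, and all function names below are hypothetical stand-ins rather than the claimed method.

```python
from typing import List, Sequence

Frame = List[float]  # hypothetical stand-in for a decoded video frame


def fuse_features(visual: Sequence[float], audio: Sequence[float],
                  w_visual: float = 0.5) -> List[float]:
    """Fuse the first (visual) and second (audio) dynamic behavior sub-features
    by a weighted average (an assumed fusion rule)."""
    if len(visual) != len(audio):
        raise ValueError("sub-features must have the same length")
    return [w_visual * v + (1.0 - w_visual) * a for v, a in zip(visual, audio)]


def make_transition_frame(prev: Frame, nxt: Frame, alpha: float) -> Frame:
    """Linearly blend two neighbouring frames; alpha is expected in [0, 1]."""
    return [(1.0 - alpha) * p + alpha * n for p, n in zip(prev, nxt)]


def insert_transition_frames(frames: List[Frame],
                             behavior: Sequence[float]) -> List[Frame]:
    """Insert one transition frame between each pair of consecutive frames.

    The blend weight is the mean of the fused behavior feature, clamped to
    [0, 1] (an assumption; 0.5 is used when the feature is empty).
    """
    if not frames:
        return []
    alpha = min(1.0, max(0.0, sum(behavior) / len(behavior))) if behavior else 0.5
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append(make_transition_frame(prev, nxt, alpha))
        out.append(nxt)
    return out


# Fuse the two sub-features, then derive the fourth stream from the third.
behavior = fuse_features([0.25, 0.75], [0.75, 0.25])
third_stream = [[0.0], [1.0], [2.0]]
fourth_stream = insert_transition_frames(third_stream, behavior)
```

With `n` input frames the sketch yields `2n - 1` output frames, which is one simple way to read "inserting the transition frames among the video frames of the third video stream".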
CN202410850112.XA 2024-06-27 2024-06-27 Video conference processing method, device, electronic device and medium based on computing network Pending CN118741039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410850112.XA CN118741039A (en) 2024-06-27 2024-06-27 Video conference processing method, device, electronic device and medium based on computing network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410850112.XA CN118741039A (en) 2024-06-27 2024-06-27 Video conference processing method, device, electronic device and medium based on computing network

Publications (1)

Publication Number Publication Date
CN118741039A (en) 2024-10-01

Family

ID=92860136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410850112.XA Pending CN118741039A (en) 2024-06-27 2024-06-27 Video conference processing method, device, electronic device and medium based on computing network

Country Status (1)

Country Link
CN (1) CN118741039A (en)

Similar Documents

Publication Publication Date Title
US20130301918A1 (en) System, platform, application and method for automated video foreground and/or background replacement
CN112672090B (en) Method for optimizing audio and video effects in cloud video conference
CN105847718A (en) Scene recognition-based live video bullet screen display method and display device thereof
US20250047806A1 (en) Real-time video enhancement
WO2026045711A1 (en) Data generation method and apparatus, product, device, and medium
CN111476866B (en) Video optimization and playback methods, systems, electronic devices and storage media
CN114531564A (en) Processing method and electronic equipment
CN118741039A (en) Video conference processing method, device, electronic device and medium based on computing network
US20240348885A1 (en) System and method for question answering
CN114187216B (en) Image processing method, device, terminal equipment and storage medium
CN117579855A (en) Virtual live broadcast method and device
CN116681607A (en) Repair model training method, video repair method, equipment and medium
CN115412701A (en) Picture processing technology applied to meeting scene
CN117478824B (en) Conference video generation method and device, electronic equipment and storage medium
CN117746888B (en) A voice detection method, device, equipment and readable storage medium
Ying Perceptual quality prediction of social pictures, social videos, and telepresence videos
CN118714255A (en) A video conferencing method and device based on interpolation technology
CN114302175A (en) Video processing method and device
CN113301427A (en) Data processing method and device, electronic equipment and storage medium
CN119168873A (en) A method and device for image enhancement in video conferencing
CN120378569A (en) Method, device, equipment and medium for processing video conference based on semantic information
CN113038254B (en) Video playing method, device and storage medium
CN117939052A (en) A method, device, equipment and medium for data processing in video conferencing
CN120378568A (en) Method, device, equipment and medium for processing video conference based on ultra-low bandwidth
CN117939053A (en) A method, device, equipment and medium for video prediction processing in video conferencing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination