CN112365589B

CN112365589B - A virtual three-dimensional scene display method, device and system

Info

Publication number: CN112365589B
Application number: CN202011387657.XA
Authority: CN
Inventors: 李小波; 甘健; 蔡小禹; 马伟振
Original assignee: Oriental Dream Virtual Reality Technology Co ltd
Current assignee: Oriental Dream Virtual Reality Technology Co ltd
Priority date: 2020-12-01
Filing date: 2020-12-01
Publication date: 2024-04-26
Anticipated expiration: 2040-12-01
Also published as: CN112365589A

Abstract

The present application discloses a method, device and system for displaying a virtual three-dimensional scene. The method includes: creating a human body preset model, and predicting a three-dimensional human body preset model point cloud from the human body preset model; obtaining a human body depth image collected by a somatosensory device, and converting the human body depth image into a three-dimensional human body point cloud; matching the converted three-dimensional human body point cloud with the predicted three-dimensional human body preset model point cloud, and calculating the pose of the somatosensory device; according to the calculated pose of the somatosensory device, fusing the three-dimensional human body point cloud into the existing human body preset model point cloud to obtain a three-dimensional fusion model; and fusing the three-dimensional human body model with a preset environment point cloud to obtain a three-dimensional scene display, which is output to a high-definition display. By adjusting the human body preset model with an image collected by a single somatosensory device, the present application reduces the number of somatosensory devices required, shortens the reconstruction of the three-dimensional model, and improves the speed of three-dimensional scene modeling.

Description

Virtual three-dimensional scene display method, device and system

Technical Field

The application relates to the field of virtual somatosensory interaction, in particular to a virtual three-dimensional scene display method, device and system.

Background

The existing virtual three-dimensional scene display system has the following defects:

1. Difficult to use: to improve the accuracy of the final model, multiple somatosensory devices are often required to capture the subject from multiple angles. The devices can interfere with one another, and the quality of the data in the areas where their scans overlap can be greatly reduced. Moreover, each device must be calibrated individually, and the coordinate systems of all devices must be completely consistent after calibration. Setting up multiple devices at once is therefore complicated and difficult for an ordinary user.

2. Time-consuming: modeling with a Kinect somatosensory device requires a certain amount of time to scan the user and build the user model. During the scan the user must remain still, because limb movement, the rise and fall of the chest caused by breathing, and the like generate noise that degrades the accuracy of the model. The user's clothing and body shadows also affect the refinement of the model. A high-precision model can be generated only if the user stays free of such interference for a certain time.

3. High computational cost: modeling with a Kinect somatosensory device generally uses point clouds. Because the data volume of a point cloud is huge and contains redundant data and noise, the computational complexity increases. Points that reflect the surface features therefore usually have to be extracted from the point cloud data, the data simplified, and the noise removed, to improve the accuracy and efficiency of model reconstruction.

Disclosure of Invention

The application provides a virtual three-dimensional scene display method, device and system, which adjust a human body preset model using an image acquired by a somatosensory device, thereby reducing the number of somatosensory devices required, shortening the reconstruction of the three-dimensional model, and improving the modeling speed of the three-dimensional scene.

A virtual three-dimensional scene display method comprises the following steps:

creating a human body preset model, and predicting according to the human body preset model to obtain a three-dimensional human body preset model point cloud;

acquiring a human body depth image acquired by somatosensory equipment, and converting the human body depth image into a three-dimensional human body point cloud;

Matching the converted three-dimensional human body point cloud with the predicted three-dimensional human body preset model point cloud, and calculating the pose of somatosensory equipment;

according to the calculated pose of the somatosensory equipment, fusing the three-dimensional human body point cloud into the existing human body preset model point cloud to obtain a three-dimensional fusion model;

and fusing the three-dimensional human body model with a preset environment point cloud to obtain a three-dimensional scene display, and outputting the three-dimensional scene display to a high-definition display for display.

In the virtual three-dimensional scene display method described above, calculating the pose of the somatosensory device specifically comprises the following sub-steps:

acquiring, from the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud, point cloud sets that match in position and are equal in number, and calculating the centroids of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud in the point cloud sets;

constructing an error function according to the centroids of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud;

minimizing the value of the error function, calculating the optimal rotation matrix and translation vector, and determining the pose of the somatosensory device according to the optimal rotation matrix and translation vector.

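In the virtual three-dimensional scene display method described above, fusing the three-dimensional human body model with the preset environment point cloud to obtain the three-dimensional scene display specifically comprises the following sub-steps: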

Obtaining human skin color according to the RGB image acquired from the somatosensory equipment, and attaching the human skin color on the three-dimensional human model;

selecting a preset environment point cloud with the lowest color similarity with the human skin color from a preset environment library according to the human skin color;

And carrying out fusion registration on the three-dimensional human body model and the selected preset environment point cloud to obtain three-dimensional scene display, and outputting the three-dimensional scene display to a high-definition display for display.

In the virtual three-dimensional scene display method described above, fusing the three-dimensional human body point cloud into the human body preset model to obtain the three-dimensional fusion model specifically comprises the following sub-steps:

preprocessing the three-dimensional human body point cloud and the human body preset model point cloud;

performing model registration and fusion on the preprocessed three-dimensional human body point cloud and the human body preset model;

performing boundary erosion and deburring on the fused model to smooth its boundary.

The application also provides a virtual three-dimensional scene display device, which comprises:

the human body preset model creation module is used for creating a human body preset model, and predicting to obtain a three-dimensional human body preset model point cloud according to the human body preset model;

the three-dimensional human body point cloud construction module is used for acquiring a human body depth image acquired by the somatosensory equipment and converting the human body depth image into a three-dimensional human body point cloud;

the somatosensory device pose calculation module is used for matching the converted three-dimensional human body point cloud with the predicted three-dimensional human body preset model point cloud to calculate the pose of the somatosensory device;

the three-dimensional fusion model construction module is used for fusing the three-dimensional human body point cloud into the existing human body preset model point cloud according to the calculated pose of the somatosensory equipment to obtain a three-dimensional fusion model;

The three-dimensional scene display construction module is used for fusing the three-dimensional human body model with the preset environment point cloud to obtain three-dimensional scene display, and outputting the three-dimensional scene display to the high-definition display for display.

In the virtual three-dimensional scene display device described above, the somatosensory device pose calculation module is specifically used for: obtaining, from the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud, point cloud sets that match in position and are equal in number, and calculating the centroids of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud in the point cloud sets; constructing an error function according to the centroids of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud; and minimizing the value of the error function, calculating the optimal rotation matrix and translation vector, and determining the pose of the somatosensory device according to the optimal rotation matrix and translation vector.

In the virtual three-dimensional scene display device described above, the three-dimensional scene display construction module is specifically used for: obtaining the human skin color from the RGB image acquired by the somatosensory device and attaching the human skin color to the three-dimensional human body model; selecting, from a preset environment library, the preset environment point cloud with the lowest color similarity to the human skin color; and performing fusion registration on the three-dimensional human body model and the selected preset environment point cloud to obtain a three-dimensional scene display, and outputting the three-dimensional scene display to a high-definition display for display.

In the virtual three-dimensional scene display device described above, the three-dimensional fusion model construction module is specifically used for: preprocessing the three-dimensional human body point cloud and the human body preset model point cloud; performing model registration and fusion on the preprocessed three-dimensional human body point cloud and the human body preset model; and performing boundary erosion and deburring on the fused model to smooth its boundary.

The application also provides a virtual three-dimensional scene display system, which comprises the virtual three-dimensional scene display device described above, and further comprises a somatosensory device and a high-definition display.

The virtual three-dimensional scene showing system as described above, wherein the somatosensory device is used for acquiring human depth images and RGB images.

The beneficial effects achieved by the application are as follows:

(1) The application uses only one somatosensory device, so problems such as unifying coordinate systems and interference among multiple devices do not need to be considered, while the resulting model has an accuracy comparable to that of a model generated by scanning with multiple devices.

(2) The human body preset model is adjusted according to the acquired actual human body image, which shortens the reconstruction of the three-dimensional model and improves the modeling speed of the three-dimensional scene.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

FIG. 1 is a schematic diagram of a virtual three-dimensional scene display system according to a first embodiment of the present application;

FIG. 2 is a flow chart of a method by which the virtual three-dimensional scene display device performs virtual three-dimensional scene display;

FIG. 3 is a schematic diagram of a virtual three-dimensional scene display device.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

The embodiment of the application provides a virtual three-dimensional scene display system which, as shown in FIG. 1, comprises a Kinect2 somatosensory device for acquiring user data, a device (such as a PC) that hosts the two processes of front-end user image data acquisition and back-end virtual three-dimensional somatosensory modeling, and a high-definition display for outputting and displaying the processed images. Because the application uses only one somatosensory device, problems such as unifying coordinate systems and interference among multiple devices do not need to be considered, while the resulting model has an accuracy comparable to that of a model generated by scanning with multiple devices.

Specifically, in the virtual three-dimensional scene display system, the virtual three-dimensional scene display device executes a virtual three-dimensional scene display method, as shown in fig. 2, and the virtual three-dimensional scene display method specifically includes the following steps:

Step 210, creating a human body preset model, and predicting according to the human body preset model to obtain a three-dimensional human body preset model point cloud;

In the embodiment of the application, the human body preset model is created in the PC in advance. The model is a rigged assembly of bones and muscles that conforms to human anatomy, and may represent the standard build of different ethnic groups, sexes, and ages. The human body preset model is then adjusted in real time according to the acquired actual human body data, which shortens the reconstruction of the three-dimensional model and improves the speed of three-dimensional human body modeling.

Step 220, acquiring a human body depth image acquired by somatosensory equipment, and converting the human body depth image into a three-dimensional human body point cloud;

specifically, a skeleton event is created in Kinect2 somatosensory equipment, a skeleton tracking function is opened, and a human body depth image and an RGB image containing skeleton data are acquired; constructing a front-end image acquisition system and a flow control system by using Unity in a PC, and rapidly acquiring a human body depth image acquired by Kinect2 somatosensory equipment through the front-end image acquisition system and the flow control system, wherein the pixel coordinate points of the human body depth image are [ u, v, w ];

In the embodiment of the application, after the PC acquires the human body depth image, it converts the depth image into a point cloud. Specifically, the three-dimensional world coordinate point [x, y, z] of the human body as seen by the Kinect2 somatosensory device and the pixel coordinate point [u, v, w] of the acquired human body depth image are related as follows:

$$u = \frac{x \cdot f_x}{z} + c_x, \qquad v = \frac{y \cdot f_y}{z} + c_y, \qquad w = z \cdot s \tag{1}$$

where f_x and f_y are the focal lengths of the Kinect2 somatosensory device on the x and y axes, c_x and c_y are the aperture center of the Kinect2 somatosensory device, and s is the scaling factor of the human body depth image;

From formula (1), the PC derives the relationship between the human body depth image and the three-dimensional world coordinates after the depth image is acquired:

$$z = \frac{w}{s}, \qquad x = \frac{(u - c_x) \cdot z}{f_x}, \qquad y = \frac{(v - c_y) \cdot z}{f_y} \tag{2}$$

The point cloud is constructed according to formula (2). Specifically, f_x, f_y, c_x and c_y in formula (2) are defined as the intrinsic parameter matrix C of the Kinect2 somatosensory device, and the spatial position and pixel coordinates of each point are expressed as follows using a matrix model:

where R and t describe the pose of the Kinect2 somatosensory device, R is the rotation matrix, t is the displacement vector, and s is the ratio of the depth map data to the actual distance. If the Kinect2 somatosensory device is stationary, that is, it neither rotates nor translates, R is set to the identity matrix I and t is set to 0;

Each point in the converted point cloud defines a size and position point of the human body. The reference coordinate axes of the point cloud are established at the intersection of the center of the human body with the ground, a straight line through the center and perpendicular to the ground divides the human body into two parts, and each point in the generated point cloud has its own coordinate position.
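The conversion in step 220 can be illustrated with a minimal NumPy sketch. It assumes a stationary camera (R = I, t = 0) and the pinhole relations of formula (1); the function names `depth_to_point_cloud` and `project_point`, the default scale `s=1000.0`, and the intrinsic values used in the example are illustrative assumptions, not values taken from the application.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, s=1000.0):
    """Back-project an H x W depth image into an N x 3 point cloud.

    Assumes a stationary camera (R = I, t = 0); s is the depth scaling factor.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # u indexes image columns, v indexes image rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth / s                  # z = w / s
    x = (u - cx) * z / fx          # x = (u - c_x) * z / f_x
    y = (v - cy) * z / fy          # y = (v - c_y) * z / f_y
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels without a depth reading

def project_point(point, fx, fy, cx, cy, s=1000.0):
    """Forward projection of one 3D point back to [u, v, w]."""
    x, y, z = point
    return np.array([x * fx / z + cx, y * fy / z + cy, z * s])

# Illustrative intrinsics only (hypothetical values)
depth = np.zeros((4, 4))
depth[2, 3] = 2000.0
cloud = depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
```

Pixels with zero depth are discarded here on the assumption that zero marks a missing reading; a real capture pipeline may use a different convention.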

Step 230, matching the converted three-dimensional human body point cloud with the predicted three-dimensional human body preset model point cloud, and calculating the pose of the somatosensory equipment;

Specifically, a back-end image processing system is built in a PC through OpenCV and Python, collected data is processed in real time through the back-end processing system, virtual three-dimensional somatosensory modeling is conducted, and a user three-dimensional model is output;

in the embodiment of the application, the pose of the somatosensory equipment is calculated, and the method specifically comprises the following substeps:

Step 231, acquiring point cloud sets with the same number and matched point cloud positions from the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud, and calculating centroids of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud in the point cloud sets;

Specifically, the number of position-matched point pairs obtained from the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud is n, and the obtained point cloud set is D = {d_1, d'_1, d_2, d'_2, ..., d_n, d'_n};

Denoting the points of the three-dimensional human body point cloud as d_i and the points of the three-dimensional human body preset model point cloud as d'_i, the centroids of the two point clouds are respectively:

$$p = \frac{1}{n} \sum_{i=1}^{n} d_i, \qquad p' = \frac{1}{n} \sum_{i=1}^{n} d'_i \tag{4}$$

where p is the centroid of the three-dimensional human body point cloud, and p' is the centroid of the three-dimensional human body preset model point cloud.

Step 232, constructing an error function according to the mass centers of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud;

Specifically, the three-dimensional human body preset model point cloud is transformed into the coordinate system of the three-dimensional human body point cloud by calculating R·d'_i + t, and the error function of the two point clouds is constructed as follows:

$$J = \frac{1}{2} \sum_{i=1}^{n} \left\| d_i - \left( R\, d'_i + t \right) \right\|^2 \tag{5}$$

where J is the constructed error function, R is the rotation matrix, and t is the translation vector.

Step 233, enabling the value of the error function to be minimum, calculating an optimal rotation matrix and a translation vector, and determining the pose of the somatosensory equipment according to the optimal rotation matrix and the translation vector;

Specifically, formula (4) is substituted into formula (5) and the value of the error function is minimized, to obtain:

Setting R in formula (6) to 0 and solving yields the optimal translation vector t; similarly, setting the translation vector t to 0 and solving yields the optimal rotation matrix R. The pose of the somatosensory device is then calculated as follows:

where D is the point cloud set, R is the rotation matrix, and ‖t‖ is the modulus of the translation vector.
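As a hedged illustration of sub-steps 231 to 233, the sketch below aligns n matched point pairs by minimizing the sum of squared distances between d_i and R·d'_i + t. It uses the well-known closed-form SVD solution (the Kabsch/Arun method), which is an assumption on our part: the application does not state which solver it uses. The function name `estimate_pose` is hypothetical.

```python
import numpy as np

def estimate_pose(d, d_prime):
    """Return (R, t) minimizing sum ||d_i - (R d'_i + t)||^2.

    d, d_prime: N x 3 arrays of matched points (scan and preset model).
    Uses the closed-form SVD solution, swapped in here for illustration.
    """
    d = np.asarray(d, dtype=np.float64)
    d_prime = np.asarray(d_prime, dtype=np.float64)

    # Centroids of both point sets
    p = d.mean(axis=0)
    p_prime = d_prime.mean(axis=0)

    # Centre the points so that only the rotation remains to be found
    q = d - p
    q_prime = d_prime - p_prime

    # 3 x 3 cross-covariance between model points and scan points
    H = q_prime.T @ q
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection (determinant -1)
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T

    # With R fixed, the optimal translation maps one centroid onto the other
    t = p - R @ p_prime
    return R, t
```

In this formulation the rotation is solved first on the centred points and the translation then follows from the two centroids, which is why the centroids are computed before the error function is minimized.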

Referring back to fig. 2, step 240, according to the calculated pose of the somatosensory device, fusing the three-dimensional human body point cloud into the existing human body preset model point cloud to obtain a three-dimensional fused model;

In the embodiment of the application, a three-dimensional human body point cloud is fused into a human body preset model to obtain a three-dimensional fusion model, which specifically comprises the following sub-steps:

Step 241, preprocessing the three-dimensional human body point cloud and the human body preset model point cloud;

Wherein the preprocessing includes geometric distortion correction, noise suppression and filtering;

the geometric distortion correction is to establish a corresponding mathematical model according to the distortion cause, extract required information from the polluted or distorted signal, restore the original appearance along the distortion inverse process, calculate the estimation value of the real model from the distortion model by using a filter, and enable the estimation value to approach the real model to the maximum extent according to the pre-specified error criterion;

The noise suppression specifically adopts mean filtering or median filtering. Mean filtering selects several pixels adjacent to the current pixel to form a template, and replaces the value of the original pixel with the average value of the pixels in the template. Median filtering sorts the values in the template of the pixel being processed into a monotonically ascending or descending data sequence, and replaces the value of the original pixel with the middle value of that sequence;

The filtering removes a specific frequency band from the signal in order to suppress and prevent interference, and is specifically performed with the following formula:

$$I'(x, y, z) = \frac{1}{w} \sum_{(i, j, k) \in \Omega} w(i, j, k)\, I(i, j, k)$$

where I(x, y, z) is the input three-dimensional human body point cloud or human body preset model point cloud, I'(x, y, z) is the three-dimensional human body point cloud image or human body preset model point cloud image output after filtering, Ω is a neighborhood of size 2n x 2n centered on (x, y, z), w(i, j, k) is the weight of the filter at (i, j, k), (i, j, k) is a point in the neighborhood, and w is a normalization coefficient.
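The noise suppression and filtering described above can be sketched on a voxel grid as follows. This is only an illustration under stated assumptions: the point cloud is assumed to have been resampled into a dense 3D array, the kernel is assumed to have odd side lengths, edges are padded by replication, and the normalization coefficient w is taken to be the sum of the weights. The names `weighted_filter` and `median_filter` are hypothetical.

```python
import numpy as np

def weighted_filter(volume, weights):
    """Normalized weighted sum over a neighborhood of every voxel.

    Assumes odd kernel side lengths and replicate padding at the edges.
    With uniform weights this reduces to mean filtering.
    """
    volume = np.asarray(volume, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    radius = [k // 2 for k in weights.shape]
    padded = np.pad(volume, [(r, r) for r in radius], mode="edge")
    out = np.zeros_like(volume)
    nx, ny, nz = volume.shape
    for i in range(weights.shape[0]):
        for j in range(weights.shape[1]):
            for k in range(weights.shape[2]):
                # Shifted view of the padded volume, weighted by w(i, j, k)
                out += weights[i, j, k] * padded[i:i + nx, j:j + ny, k:k + nz]
    return out / weights.sum()  # normalization coefficient (assumed sum of weights)

def median_filter(volume, size=3):
    """Replace every voxel by the median of its size^3 neighborhood."""
    volume = np.asarray(volume, dtype=np.float64)
    padded = np.pad(volume, size // 2, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (size,) * volume.ndim
    )
    return np.median(windows.reshape(volume.shape + (-1,)), axis=-1)

# A single noisy spike is spread out by the mean filter, removed by the median
noisy = np.zeros((5, 5, 5))
noisy[2, 2, 2] = 27.0
smoothed = weighted_filter(noisy, np.ones((3, 3, 3)))
cleaned = median_filter(noisy, size=3)
```

The example shows the practical difference between the two options: the mean filter lowers an isolated spike but spreads it to its neighbors, while the median filter discards it.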

Step 242, performing model registration and fusion on the preprocessed three-dimensional human body point cloud and a human body preset model;

The application carries out model registration and fusion treatment, and specifically comprises the following steps: firstly, respectively calculating gradient fields of a three-dimensional human body point cloud and a human body preset model, and replacing the gradient fields at the corresponding positions of the human body preset model by using the gradient fields of the three-dimensional human body point cloud to obtain a fused model gradient field;

Specifically, the gradient fields of the three-dimensional human body point cloud, the human body preset model and the fused model are each calculated as follows:

$$\operatorname{grad}(u) = \frac{1}{h_1} \frac{\partial u}{\partial x}\, \vec{e}_x + \frac{1}{h_2} \frac{\partial u}{\partial y}\, \vec{e}_y$$

where \vec{e}_x is the unit vector in the x direction, \vec{e}_y is the unit vector in the y direction, ∂u/∂x is the partial derivative of the three-dimensional human body point cloud / human body preset model / fused model in the x direction, and ∂u/∂y is the corresponding partial derivative in the y direction; grad(u) is the gradient field vector of the three-dimensional human body point cloud / human body preset model / fused model; and h_1 and h_2 are scale factors.

Then calculating the divergence of the fused model according to the gradient field of the fused model, and calculating a pixel value matrix of the fused model according to the divergence of the fused image;

Specifically, a second derivative is obtained on the gradient field of the fused model, so as to obtain the divergence of the fused model; then calculating pixel values of the fused model according to the divergence of the fused model and the coefficient matrix of the fused model; the coefficient matrix of the fused model is specifically: calculating data of the central position of the coefficient matrix according to the matrix corresponding to the boundary pixel points of the fused model, and then setting data of two sides of the central position of the coefficient matrix as 1, and setting the forward diagonal data as 1 to obtain the coefficient matrix;
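The registration-and-fusion idea (replace gradients, take the divergence, then recover pixel values) can be sketched in two dimensions as below. This is a simplified illustration, not the application's solver: it assumes unit scale factors, forward-difference gradients, a boolean `mask` marking where the scan replaces the preset model, and plain Jacobi iteration in place of the coefficient-matrix solve described above. All function names are hypothetical.

```python
import numpy as np

def gradient_field(u):
    """Forward-difference gradient (gx, gy) of a 2D array."""
    u = np.asarray(u, dtype=np.float64)
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def divergence(gx, gy):
    """Backward-difference divergence; on interior pixels of a forward-difference gradient it gives the Laplacian."""
    div = np.zeros_like(gx)
    div[:, 0] = gx[:, 0]
    div[:, 1:] = gx[:, 1:] - gx[:, :-1]
    div[0, :] += gy[0, :]
    div[1:, :] += gy[1:, :] - gy[:-1, :]
    return div

def fuse_gradient_domain(base, scan, mask, iterations=2000):
    """Fuse `scan` into `base` inside `mask` by matching gradients."""
    base = np.asarray(base, dtype=np.float64)
    scan = np.asarray(scan, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if mask[0].any() or mask[-1].any() or mask[:, 0].any() or mask[:, -1].any():
        raise ValueError("mask must not touch the image border")

    # Replace the preset model's gradients with the scan's inside the mask
    bgx, bgy = gradient_field(base)
    sgx, sgy = gradient_field(scan)
    div = divergence(np.where(mask, sgx, bgx), np.where(mask, sgy, bgy))

    # Jacobi iteration for Laplacian(fused) = div, with base as boundary values
    fused = base.copy()
    for _ in range(iterations):
        neighbours = (np.roll(fused, 1, 0) + np.roll(fused, -1, 0)
                      + np.roll(fused, 1, 1) + np.roll(fused, -1, 1))
        fused[mask] = ((neighbours - div) / 4.0)[mask]
    return fused
```

Because only gradients are transferred, a constant brightness offset between the scan and the preset model does not appear in the fused result, which is the main reason for working in the gradient domain rather than copying values directly.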

Step 243, performing boundary erosion and deburring on the fused model to smooth its boundary;

After boundary erosion and deburring, the fused model is normalized, converting its pixel values into values between 0 and 1.
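Step 243 can be sketched as a morphological erosion of the model's silhouette followed by min-max normalization. The choice of a 3 x 3 cross-shaped structuring element and of min-max scaling are assumptions made for illustration; the application does not specify either. The names `erode` and `normalize` are hypothetical.

```python
import numpy as np

def erode(mask, iterations=1):
    """Binary erosion with a 3 x 3 cross (assumed structuring element).

    A pixel survives only if it and its four direct neighbours are set,
    which strips thin burrs from the boundary.
    """
    mask = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        padded = np.pad(mask, 1, mode="constant", constant_values=False)
        mask = (padded[1:-1, 1:-1]
                & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask

def normalize(values):
    """Min-max scaling of pixel values into the range [0, 1]."""
    values = np.asarray(values, dtype=np.float64)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)  # a flat input has no range to scale
    return (values - lo) / (hi - lo)

# A 3 x 3 block with a one-pixel burr sticking out on the right
shape = np.zeros((7, 7), dtype=bool)
shape[2:5, 2:5] = True
shape[3, 5] = True
smoothed = erode(shape)
```

Erosion also shrinks the region itself, so in practice it is often followed by a dilation; that step is omitted here because the application only mentions erosion.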

In the embodiment of the application, after the three-dimensional human body model is obtained through fusion, the three-dimensional human body model is output to the high-definition display for display.

Referring back to fig. 2, step 250, fusing the three-dimensional human body model with the preset environmental point cloud to obtain a three-dimensional scene display, and outputting the three-dimensional scene display to a high-definition display for display;

Specifically, fusing the three-dimensional human body model with the preset environmental point cloud to obtain the three-dimensional scene display comprises the following sub-steps:

Step 1, obtaining the human skin color from the RGB image acquired by the somatosensory device, and attaching the human skin color to the three-dimensional human body model;

Specifically, a face image is separated from an RGB image, then, the average value of the color values of the face image is calculated to obtain the average skin color, and the average skin color is used as the attached skin color of the three-dimensional human body model.
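The averaging in Step 1 can be sketched as follows. How the face is separated from the RGB image is not described in the application, so the sketch simply assumes a boolean `face_mask` is already available; that mask and the function name `average_skin_color` are hypothetical.

```python
import numpy as np

def average_skin_color(rgb_image, face_mask):
    """Mean RGB value over the pixels selected by face_mask.

    rgb_image: H x W x 3 array; face_mask: H x W boolean array
    (assumed to come from a separate face-segmentation step).
    """
    rgb_image = np.asarray(rgb_image, dtype=np.float64)
    face_pixels = rgb_image[np.asarray(face_mask, dtype=bool)]  # N x 3
    if face_pixels.size == 0:
        raise ValueError("face mask selects no pixels")
    return face_pixels.mean(axis=0)

# Two face pixels and one background pixel
image = np.zeros((2, 2, 3))
image[0, 0] = [200, 150, 100]
image[0, 1] = [220, 170, 120]
image[1, 0] = [0, 0, 255]
face = np.array([[True, True], [False, False]])
skin = average_skin_color(image, face)
```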

Step 2, selecting a preset environment point cloud with the lowest color similarity with the human skin color from a preset environment library according to the human skin color;

Specifically, the similarity between the human skin color and the preset environmental point cloud is calculated by adopting the following formula:

where sim is the similarity, A is the human skin color, B_i(x, y, z) is a preset environmental point cloud, and n is the total number of preset environmental point clouds. The preset environmental point cloud with the minimum similarity, min(sim), is selected.
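The selection in Step 2 can be sketched as an argmin over the environment library. The similarity measure used below, 1 / (1 + mean Euclidean RGB distance), is a stand-in chosen purely for illustration: it is not the formula of the application. The library layout (a dictionary from environment name to an N x 3 array of point colors) and both function names are also hypothetical.

```python
import numpy as np

def color_similarity(skin_color, env_colors):
    """Illustrative similarity: 1 / (1 + mean RGB distance). NOT the patent's formula."""
    skin = np.asarray(skin_color, dtype=np.float64)
    env = np.asarray(env_colors, dtype=np.float64).reshape(-1, 3)
    mean_distance = np.linalg.norm(env - skin, axis=1).mean()
    return 1.0 / (1.0 + mean_distance)

def select_environment(skin_color, library):
    """Return the name of the environment least similar in color to the skin."""
    return min(library, key=lambda name: color_similarity(skin_color, library[name]))

# Hypothetical two-entry environment library
library = {
    "beach": np.array([[225, 185, 155], [215, 175, 145]]),
    "forest": np.array([[20, 90, 30], [30, 100, 40]]),
}
chosen = select_environment([220, 180, 150], library)
```

Choosing the least similar environment keeps the skin-colored model visually distinct from its background, which appears to be the intent of selecting the minimum rather than the maximum similarity.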

Step 3, fusing and registering the three-dimensional human body model and the selected preset environment point cloud to obtain three-dimensional scene display, and outputting the three-dimensional scene display to a high-definition display for display.

Example 2

A second embodiment of the present application provides a virtual three-dimensional scene display device, as shown in FIG. 3, including:

the human body preset model creation module 310 is configured to create a human body preset model, and predict and obtain a three-dimensional human body preset model point cloud according to the human body preset model;

The three-dimensional human body point cloud construction module 320 is configured to acquire a human body depth image acquired by the somatosensory device, and convert the human body depth image into a three-dimensional human body point cloud;

The somatosensory device pose calculation module 330 is configured to match the converted three-dimensional human body point cloud with the predicted three-dimensional human body preset model point cloud, and calculate the pose of the somatosensory device;

The three-dimensional fusion model construction module 340 is configured to fuse the three-dimensional human body point cloud into the existing human body preset model point cloud according to the calculated pose of the somatosensory device, so as to obtain a three-dimensional fusion model;

the three-dimensional scene display construction module 350 is configured to fuse the three-dimensional human model with a preset environmental point cloud to obtain three-dimensional scene display, and output the three-dimensional scene display to the high-definition display for display.

The three-dimensional scene display construction module is specifically used for obtaining human skin colors according to RGB images acquired from somatosensory equipment and attaching the human skin colors on the three-dimensional human model; selecting a preset environment point cloud with the lowest color similarity with the human skin color from a preset environment library according to the human skin color; and carrying out fusion registration on the three-dimensional human body model and the selected preset environment point cloud to obtain three-dimensional scene display, and outputting the three-dimensional scene display to a high-definition display for display.

The somatosensory device pose calculation module 330 is specifically configured to: obtain, from the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud, point cloud sets that match in position and are equal in number, and calculate the centroids of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud in the point cloud sets; construct an error function according to the centroids of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud; and minimize the value of the error function, calculate the optimal rotation matrix and translation vector, and determine the pose of the somatosensory device according to the optimal rotation matrix and translation vector.

The three-dimensional fusion model construction module 340 specifically includes a preprocessing sub-module 341, a registration fusion sub-module 342, and a boundary processing sub-module 343;

The preprocessing sub-module 341 preprocesses the three-dimensional human body point cloud and the human body preset model point cloud; the registration fusion sub-module 342 performs model registration and fusion on the preprocessed three-dimensional human body point cloud and the human body preset model; the boundary processing sub-module 343 performs boundary erosion and deburring on the fused model to smooth its boundary.

Specifically, the preprocessing sub-module 341 is configured to perform geometric distortion correction, noise suppression, and filtering on the three-dimensional human body point cloud and the human body preset model point cloud. The registration fusion sub-module 342 is configured to calculate the gradient fields of the three-dimensional human body point cloud and the human body preset model, and to replace the gradient field at the corresponding position of the human body preset model with the gradient field of the three-dimensional human body point cloud to obtain the gradient field of the fused model; it then calculates the divergence of the fused model from that gradient field, and calculates the pixel value matrix of the fused model from the divergence. After performing boundary erosion and deburring on the fused model, the boundary processing sub-module 343 further normalizes the fused model, converting its pixel values into values between 0 and 1.

The above examples are only specific embodiments of the present application, intended to illustrate its technical solutions rather than to limit its scope of protection. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art should understand that, within the technical scope disclosed herein, the technical solutions described in the foregoing embodiments may still be modified, changes to them may be readily conceived, and some of their technical features may be replaced with equivalents; such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the present application, and are intended to be encompassed within its scope of protection. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A virtual three-dimensional scene display method, characterized by comprising the following steps:

fusing the three-dimensional human body model with a preset environment point cloud to obtain a three-dimensional scene display, and outputting the three-dimensional scene display to a high-definition display for display;

wherein fusing the three-dimensional human body model with the preset environment point cloud to obtain the three-dimensional scene display specifically comprises the following sub-steps:

and performing fusion registration of the three-dimensional human body model with the selected preset environment point cloud to obtain the three-dimensional scene display.

2. The virtual three-dimensional scene display method according to claim 1, wherein the step of calculating the pose of the somatosensory device comprises the following steps:

3. The virtual three-dimensional scene display method according to claim 1, wherein fusing the three-dimensional human body point cloud into the human body preset model to obtain the three-dimensional fusion model specifically comprises the following sub-steps:

4. A virtual three-dimensional scene display device, comprising:

a three-dimensional scene display construction module, configured to fuse the three-dimensional human body model with the preset environment point cloud to obtain a three-dimensional scene display, and to output the three-dimensional scene display to a high-definition display for display;

wherein the three-dimensional scene display construction module is specifically configured to: obtain the human skin color from RGB images acquired by the somatosensory device and apply it to the three-dimensional human body model; select, from a preset environment library, the preset environment point cloud whose color is least similar to the human skin color; and perform fusion registration of the three-dimensional human body model with the selected preset environment point cloud to obtain the three-dimensional scene display.

5. The virtual three-dimensional scene display device according to claim 4, wherein the somatosensory device pose calculation module is specifically configured to: obtain, from the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud, point sets that contain the same number of points and whose point positions correspond; calculate the centroids of the three-dimensional human body point cloud and the three-dimensional human body preset model point cloud within these point sets; construct an error function from the two centroids; and minimize the error function to calculate an optimal rotation matrix and translation vector, and determine the pose of the somatosensory device from the optimal rotation matrix and translation vector.

6. The virtual three-dimensional scene display device according to claim 4, wherein the three-dimensional fusion model construction module is specifically configured to: preprocess the three-dimensional human body point cloud and the human body preset model point cloud; perform model registration and fusion on the preprocessed three-dimensional human body point cloud and the human body preset model; and perform boundary erosion deburring on the fused model to smooth the boundary.

7. A virtual three-dimensional scene display system, comprising the virtual three-dimensional scene display device according to any one of claims 4-6, and further comprising a somatosensory device and a high-definition display.

8. The virtual three-dimensional scene display system according to claim 7, wherein the somatosensory device is configured to capture human body depth images and RGB images.