CN116385518B - A method for robotic arm to grasp test tubes based on depth completion of transparent objects

A method for robotic arm to grasp test tubes based on depth completion of transparent objects

Info

Publication number
CN116385518B
CN116385518B CN202310085582.7A CN202310085582A CN116385518B CN 116385518 B CN116385518 B CN 116385518B CN 202310085582 A CN202310085582 A CN 202310085582A CN 116385518 B CN116385518 B CN 116385518B
Authority
CN
China
Prior art keywords
depth
point cloud
test tube
sub
completion
Prior art date
Legal status
Active
Application number
CN202310085582.7A
Other languages
Chinese (zh)
Other versions
CN116385518A (en
Inventor
陈洪波
顾赵键
朱萍
吕斌
董哲康
郑军科
邢峰
林冰晶
Current Assignee
Zhejiang Fangyuan Detection Group Stock Co ltd
Original Assignee
Zhejiang Fangyuan Detection Group Stock Co ltd

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Program-controlled manipulators
    • B25J9/16Program controls
    • B25J9/1694Program controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J15/00Gripping heads and other end effectors
    • B25J15/08Gripping heads and other end effectors having finger members
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02Sensing devices
    • B25J19/021Optical sensing devices
    • B25J19/023Optical sensing devices including video camera means
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/557Depth or shape recovery from multiple images from light fields, e.g. from plenoptic cameras
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E30/00Energy generation of nuclear origin
    • Y02E30/30Nuclear fission reactors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for a robotic arm to grasp test tubes based on depth completion of transparent objects, comprising the following steps: constructing a transparent depth completion model; projecting the completed point cloud back into a depth image; acquiring and aligning a color image and a depth image of the transparent test tube with a depth camera; porting the depth completion model to the ROS development platform, loading the network weight parameters, and processing the test tube images acquired in real time in ROS to obtain a complete depth image; obtaining the position of the test tube and, using the center coordinate system (x_t, y_t, z_t) at the top of the test tube, calculating the grasping position of the robotic arm end effector to obtain its pose in Cartesian space; and, from the pose in the world coordinate system, obtaining the trajectory of the end gripper of the robotic arm with a cubic B-spline interpolation algorithm. The method estimates the position of a transparent test tube fairly accurately and quickly plans the corresponding grasping trajectory to complete the grasp.

Description

Method for grabbing test tube by mechanical arm based on transparent object depth completion
Technical Field
The invention belongs to the technical field of depth camera applications, and in particular relates to a method for a robotic arm to grasp test tubes based on depth completion of transparent objects.
Background
In chemical analysis laboratories, robots are used mainly as robotic arms that grasp and move objects. Complex and changeable conditions, such as the variety of laboratory instruments and their changing placement, seriously reduce the grasping success rate of a robotic arm. Test tubes are common in every chemical laboratory, and many experimental steps require them. However, a test tube is a transparent object with unique visual characteristics, which makes it difficult for a general-purpose RGB-D camera to capture its complete depth information.
The unique visual characteristics of transparent objects, such as refraction and reflection, prevent a conventional RGB-D camera from acquiring accurate depth information. The depth that the camera measures for a transparent object is often the depth of the background behind the surface that the light passes through, or is missing altogether because of specular highlights. An automatic grasping robot in a chemical laboratory generally locates the grasp position by analyzing the depth of object surfaces. The visual characteristics of a transparent test tube make it difficult for traditional algorithms to obtain a usable depth value, and such algorithms lack flexibility and stability. With the growing use of deep learning in image processing, the depth of a transparent object can be completed even in complex and changeable laboratory environments, so processing transparent test tube data with a deep learning method can greatly increase the grasping success rate of the robotic arm.
Disclosure of Invention
To remedy the defects of the prior art, the invention collects depth and color information of the test tube surface in real time with a depth camera and provides a method for a robotic arm to grasp test tubes based on depth completion of transparent objects. The method processes the transparent test tube data with a depth completion network to obtain a complete depth map, derives the position of the test tube from the resulting depth data, calculates the grasping pose of the test tube according to the principle of 3D spatial rotation, plans the end trajectory of the robotic arm with an interpolation algorithm, obtains the motion in joint space through inverse kinematics, and finally controls the robotic arm to grasp the test tube.
The method for a robotic arm to grasp test tubes based on depth completion of transparent objects is characterized by comprising the following steps:
(1) Constructing a transparent depth completion model;
(2) Projecting the completed point cloud back into a depth image, which is used as the input of the depth completion module for training;
(3) Acquiring and aligning a color image and a depth image of the transparent test tube with a depth camera fixed at the end of the robotic arm;
(4) Porting the depth completion model constructed in step (1) to the ROS development platform, loading the network weight parameters saved in step (2), and processing the test tube images acquired in real time in ROS to obtain a complete depth image;
(5) The robotic arm subscribes to the topic published in step (3) to obtain the position of the test tube and, using the center coordinate system (x_t, y_t, z_t) at the top of the test tube, calculates the grasping position of the end effector to obtain the pose of the end effector in Cartesian space;
(6) Obtaining the trajectory of the end gripper of the robotic arm from the pose in the world coordinate system obtained in step (4), using a cubic B-spline interpolation algorithm.
Further, the transparent depth completion model comprises a point cloud completion module and a depth completion module. The point cloud completion module preprocesses the depth map, and the depth completion module takes the more complete depth data converted from the point cloud as input and further refines the depth information.
Further, the step (1) includes:
(11) Constructing a point cloud completion module, back-projecting the depth of the transparent object into a point cloud, and estimating a correct depth image by predicting the shape of the complete point cloud;
(12) Taking the projected sparse point cloud as the input of the point cloud completion module;
(121) For the unordered sparse point cloud in step (12), constructing a 3D grid that converts the unordered point cloud into regular data able to represent the local information and structure of the point cloud;
(122) Defining a grid G = <V, W>, where V and W denote the vertex set and the value set of the grid, respectively;
(123) Determining the coordinate positions of the original point cloud and calculating the weight of each vertex;
(124) Learning the features needed to complete the point cloud with a 3D CNN encoder-decoder structure, and converting the grid back into an unordered point cloud through Gridding Reverse;
(125) Completing the point cloud by creating an MLP and further refining the coarse point cloud.
Further, in step (11), the equation that maps a pixel of the depth image to 3D spatial coordinates is:
where (x, y, z) is a three-dimensional point in space, (x', y') are the pixel coordinates in the depth map, D is the depth value, and f_x, f_y are camera intrinsic parameters;
in step (122), v_i denotes each vertex generated in the grid and w_i denotes the weight of each vertex;
in step (123), if the coordinates of a point p = (x, y, z) in the original point cloud satisfy:
then the point is taken as a point in the neighborhood of vertex v_i. The weight of each vertex can be expressed as:
Further, the depth completion module adopts an encoder-decoder architecture. The encoder uses EfficientNet as the backbone and constructs a CECD module and a CEC module. The RGB and depth images are concatenated as the network input, the depth data is fed into a single-channel sampling module for feature extraction, and the depth features of the corresponding resolution are used as the input of each CECD block and each CEC block;
the decoder is a lightweight RefineNet decoder consisting mainly of a CRP module and a FUSE module, where the CRP module consists mainly of a 1x1 convolution layer and a 5x5 max-pooling layer, and the FUSE module consists mainly of two 1x1 convolution layers and an up-sampling module. The feature information output by the CEC module is combined with the original depth information as the input of the decoder, which recovers the complete depth information.
Further, step (2) includes:
(21) The process of projecting the point cloud back into the depth image is expressed by:
(22) The noisy depth map generated after point cloud completion and the original RGB color image are input into the depth completion module;
(23) The point cloud completion network is trained with the Gridding Loss, whose loss function can be expressed as:
in which the weights are those of each vertex in the grid;
The loss function of the depth completion network can be expressed as:
where β is a weight parameter, the two depth terms are the predicted depth and the true depth D_gt, respectively, and D_h and D_w are the gradient vectors of the depth map D along the width and height coordinate axes, respectively.
Further, in step (3), eye-in-hand calibration is adopted;
ClearGrasp, TODD and TransCG are used as training datasets.
Further, in step (4), after the complete depth image is obtained, it is mapped to coordinate points in 3D space, and the resulting coordinate points are published through the ROS message mechanism.
Further, step (5) includes:
(51) The center point of the opening and closing of the gripper jaws is defined as the origin of the end coordinate system, the direction perpendicular to the jaw plane and pointing downward is the Z axis, the forward direction parallel to the grasping direction of the jaws is the X axis, and the Y axis is determined by the right-hand rule;
(52) The completed depth map is back-projected into 3D space again to give the complete point cloud. The transparent test tube stands in a test tube rack, and the grasping pose is located by traversing all points of the cloud. First, the highest point at the top of the test tube is searched for:
For the vertex p_highest = (x_h, y_h, z_h), all points within its neighborhood N(p_highest) that meet the following requirement are searched for:
where r_T denotes the radius of the test tube mouth; the center point (x_c, y_c, z_c) of the neighborhood N(p_highest) is then taken as the spatial position at which the gripper grasps;
(53) After the grasping position of the test tube is obtained, the orientation of the gripper at the grasp point is calculated. With the coordinate system set in (51), the positive x axis is the advancing direction of the gripper, and the orientation of the gripper coordinate system during grasping is the orientation at the time the image was acquired;
(54) From the test tube position in the camera coordinate system and the transformation between the camera coordinate system and the world coordinate system, the test tube position in the world coordinate system is obtained, a process that can be expressed as:
Further, in step (6), the cubic B-spline interpolation is defined as:
C_3(u) = Σ_i P_i N_{i,3}(u)
where P_i are the control points of the spline curve, set to the initial point and the grasp point of the test tube, and N_{i,3} is the basis function of the cubic spline curve, which can be obtained from the recurrence formula:
where k is the degree of the curve;
Time stamps, velocities, and accelerations are added to the resulting trajectory to obtain a complete motion trajectory, which is converted into joint-space motion through inverse kinematics and sent to the robotic arm control module to complete the test tube grasping task.
Compared with the prior art, the invention has the following advantages:
The invention collects depth and RGB information of the transparent test tube surface with an RGB-D sensor, completes the depth of the transparent test tube with a depth completion model based on an improved U-Net architecture, calculates the grasping pose of the test tube from the complete depth information, and finally obtains the grasping trajectory of the robotic arm with a trajectory planning algorithm. The method estimates the position of the transparent test tube fairly accurately and quickly plans the corresponding grasping trajectory to complete the grasp.
Drawings
FIG. 1 is a schematic diagram of the transparent depth completion model.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, a method for a robotic arm to grasp test tubes based on depth completion of transparent objects includes:
Step one: construct the transparent depth completion model shown in the figure. A point cloud completion module is constructed first; the depth of the transparent object is back-projected into a point cloud, and a correct depth image is estimated by predicting the shape of the complete point cloud. The conversion of depth map pixels into 3D spatial coordinates can be expressed as:
where (x, y, z) is a three-dimensional point in space, (x', y') are the pixel coordinates in the depth map, D is the depth value, and f_x, f_y are camera intrinsic parameters.
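The back-projection of depth pixels into 3D points described in step one can be sketched as follows. This is a minimal illustration under the standard pinhole camera model, not the patent's own equation: the principal point `cx`, `cy` is an assumption (the text names only f_x and f_y), as are the function name `depth_to_points` and the convention that zero depth marks a missing measurement.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W) into an N x 3 point cloud.

    Assumes a pinhole camera; cx, cy (principal point) are assumed
    parameters that the source text does not name.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along the width (x'), v along the height (y').
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0  # assumed convention: zero depth means "no measurement"
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Tiny example: a 4 x 4 depth image with a single valid pixel.
depth = np.zeros((4, 4))
depth[2, 3] = 0.5
pts = depth_to_points(depth, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Missing pixels are skipped, which is why the resulting cloud of a transparent object is sparse before completion.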
The projected sparse point cloud is the input of the point cloud completion module. Inspired by the point cloud processing module proposed in GRNet, a 3D grid is first constructed for the input sparse, unordered point cloud, converting the unordered point cloud into regular data able to represent its local information and structure. A grid G = <V, W> is defined, where V and W denote the vertex set and the value set of the grid, respectively; v_i denotes each vertex generated in the grid and w_i the weight of each vertex. If the coordinates of a point p = (x, y, z) in the original point cloud satisfy:
then the point is taken as a point in the neighborhood of vertex v_i. The weight of each vertex can be expressed as:
A 3D CNN encoder-decoder structure is then used to learn the features needed to complete the point cloud, and the grid is converted back into an unordered point cloud by Gridding Reverse, in which the coordinates of each point are a weighted sum of eight vertices. Finally, the point cloud is completed by creating an MLP, and the coarse point cloud is further refined.
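The gridding step can be illustrated with a small sketch. It assumes the trilinear weighting commonly associated with GRNet, in which each point contributes to the eight vertices of the cell that contains it and each vertex averages the contributions it receives; the patent's own neighborhood condition and weight equation are not reproduced in this text, so the details below (including the function name `gridding` and the pre-scaled coordinates) are assumptions.

```python
import numpy as np

def gridding(points, n):
    """Accumulate trilinear weights of points on an n x n x n vertex grid.

    points: (M, 3) array with coordinates already scaled into [0, n - 1].
    Returns the (n, n, n) array of vertex weights w_i.
    """
    weight_sum = np.zeros((n, n, n))
    count = np.zeros((n, n, n))
    # Lower corner of the grid cell that contains each point.
    base = np.clip(np.floor(points).astype(int), 0, n - 2)
    for p, b in zip(points, base):
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    v = b + np.array([dx, dy, dz])
                    # Trilinear weight: product of (1 - distance) along each axis.
                    w = np.prod(1.0 - np.abs(v - p))
                    weight_sum[tuple(v)] += w
                    count[tuple(v)] += 1
    # Average over the points in each vertex neighborhood; empty vertices stay 0.
    return np.divide(weight_sum, count,
                     out=np.zeros_like(weight_sum), where=count > 0)

# One point inside a 2 x 2 x 2 grid spreads its unit weight over the cell corners.
grid = gridding(np.array([[0.25, 0.5, 0.0]]), n=2)
```

Because the weights vary smoothly with the point position, the conversion stays differentiable, which is what allows a grid-based loss to train the completion network.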
The transparent depth completion model has two main parts: a point cloud completion (GRNet) module and a depth completion module. The first part preprocesses the depth map: the input raw depth map is back-projected into 3D space and fed to the GRNet module, which predicts the complete point cloud in order to estimate correct depth information. The second part feeds the more complete depth data converted from the point cloud into the depth completion module to further refine the depth information.
The depth completion module adopts an encoder-decoder architecture. The encoder uses EfficientNet as the backbone and constructs Conv-effect-Conv-Downsample (CECD) and Conv-effect-Conv (CEC) modules. The RGB and depth images are concatenated as the network input; to preserve the original depth information, the depth data is also fed into a single-channel sampling module for feature extraction, and the depth features of the corresponding resolution are used as the input of each CECD block and CEC block.
The decoder is a lightweight RefineNet decoder consisting mainly of a CRP module and a FUSE module. The CRP module consists mainly of a 1x1 convolution layer and a 5x5 max-pooling layer, and the FUSE module consists mainly of two 1x1 convolution layers and an up-sampling module. The feature information output by the CEC module is combined with the original depth information as the input of the decoder, which recovers the complete depth information.
Step two: the completed point cloud is projected back into the depth image, a process that can be expressed as:
The noisy depth map generated after point cloud completion and the original RGB color image are then input into the depth completion module. The depth completion module uses the encoder-decoder architecture described above: EfficientNet serves as the backbone with the custom Conv-effect-Conv-Downsample (CECD) and Conv-effect-Conv (CEC) blocks to extract features, and up-sampling is performed by the lightweight RefineNet decoder, which consists mainly of CRP modules (a 1x1 convolution layer and a 5x5 max-pooling layer) and FUSE modules (two 1x1 convolution layers). This design improves the performance of the algorithm while maintaining accuracy.
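Projecting a completed point cloud back into a depth image can be sketched as the forward pinhole projection below. As before, the principal point `cx`, `cy` is an assumption, and keeping the nearest point when several fall on the same pixel is a common convention rather than something the source states.

```python
import numpy as np

def points_to_depth(points, fx, fy, cx, cy, height, width):
    """Project an (N, 3) point cloud into a depth image of size height x width."""
    depth = np.zeros((height, width))
    for x, y, z in points:
        if z <= 0:
            continue  # points behind the camera cannot be projected
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Assumed z-buffer rule: keep the point closest to the camera.
            if depth[v, u] == 0 or z < depth[v, u]:
                depth[v, u] = z
    return depth

# Two points that land on the same pixel; the nearer one (z = 0.5) is kept.
cloud = np.array([[0.5, 0.25, 0.5], [1.0, 0.5, 1.0]])
image = points_to_depth(cloud, 2.0, 2.0, 1.0, 1.0, 4, 4)
```

Pixels that receive no point remain zero, so the projected map is still noisy and incomplete, which is consistent with the text feeding it to a second refinement network.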
The point cloud completion network is trained with the Gridding Loss, whose loss function can be expressed as:
in which the weights are those of each vertex in the grid. The loss function of the depth completion network can be expressed as:
where β is a weight parameter, the two depth terms are the predicted depth and the true depth D_gt, respectively, and D_h and D_w are the gradient vectors of the depth map D along the width and height coordinate axes, respectively.
Step three: a depth camera acquires and aligns a color image and a depth image of the transparent test tube. The camera is fixed at the end of the robotic arm in an eye-in-hand configuration, so hand-eye calibration is required to obtain the transformation between the camera coordinate system and the robotic arm base coordinate system. With the calibration board kept in the camera's field of view, the robotic arm is moved to several different poses to collect motion samples, from which the transformation matrix is calculated. The network is trained jointly on three datasets: ClearGrasp, TODD, and TransCG.
Step four: the depth completion model built in step one is ported to the ROS development platform, the network weight parameters saved in step two are loaded, and the test tube images acquired in real time in ROS are processed to obtain a complete depth image. The depth image is mapped to coordinate points in 3D space, and the resulting coordinate points are published through the ROS message mechanism.
Step five: the robotic arm subscribes to the topic published in step three to obtain the position of the test tube and, using the center coordinate system (x_t, y_t, z_t) at the top of the test tube, calculates the grasping position of the end effector to obtain the pose of the end effector in Cartesian space. The calculation proceeds as follows:
a) The end effector of the robotic arm is a PGI parallel electric gripper. The center point of the opening and closing of the jaws is defined as the origin of the end coordinate system, the direction perpendicular to the jaw plane and pointing downward is the Z axis, the forward direction parallel to the grasping direction of the jaws is the X axis, and the Y axis is determined by the right-hand rule.
b) The completed depth map is back-projected into 3D space again to give the complete point cloud. The transparent test tube stands in a test tube rack, and the grasping pose is located by traversing all points of the cloud. First, the highest point at the top of the test tube is searched for:
For the vertex p_highest = (x_h, y_h, z_h), all points within its neighborhood N(p_highest) that meet the following requirement are searched for:
where r_T denotes the radius of the test tube mouth. The center point (x_c, y_c, z_c) of the neighborhood N(p_highest) is then taken as the spatial position at which the gripper grasps.
c) After the grasping position of the test tube is obtained, the orientation of the gripper at the grasp point must be calculated. With the coordinate system set in a), the positive x axis is the advancing direction of the gripper. Because the z axis of the camera coordinate system and the x axis of the gripper point in the same direction, the orientation of the gripper coordinate system during grasping equals its orientation when the image was taken.
d) From the test tube position in the camera coordinate system and the transformation between the camera coordinate system and the world coordinate system, the test tube position in the world coordinate system is obtained, a process that can be expressed as:
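Items b) and d) can be illustrated together with a hedged sketch. The neighborhood condition is not reproduced in this text, so the test used here (points horizontally within one mouth diameter of the highest point and close to it in height) is an assumption, as are the names `grasp_point` and `to_world`, the `margin` and `height_tol` parameters, and the convention that the third coordinate measures height. The coordinate conversion uses ordinary 4x4 homogeneous transforms, with the camera-to-world transform composed from an end-effector pose and a hand-eye calibration result.

```python
import numpy as np

def grasp_point(points, r_t, margin=1e-3, height_tol=5e-3):
    """Estimate the tube-mouth center from an (N, 3) point cloud.

    Assumes the third coordinate measures height, so the "highest" point
    is the one with the largest z value.
    """
    highest = points[np.argmax(points[:, 2])]
    # Assumed neighborhood test: horizontally within one mouth diameter of
    # the highest point (plus a noise margin) and close to it in height.
    horizontal = np.linalg.norm(points[:, :2] - highest[:2], axis=1)
    near_top = (highest[2] - points[:, 2]) <= height_tol
    neighborhood = points[(horizontal <= 2.0 * r_t + margin) & near_top]
    return neighborhood.mean(axis=0)

def to_world(p_cam, t_world_ee, t_ee_cam):
    """Map a camera-frame point to the world frame with 4x4 homogeneous transforms."""
    t_world_cam = t_world_ee @ t_ee_cam  # arm pose composed with hand-eye result
    return (t_world_cam @ np.append(p_cam, 1.0))[:3]

# Synthetic tube rim: 36 points on a circle of radius 1 cm at height 0.30,
# plus two lower points standing in for the rack.
angles = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)
rim = np.stack([0.10 + 0.01 * np.cos(angles),
                0.20 + 0.01 * np.sin(angles),
                np.full(36, 0.30)], axis=1)
rack = np.array([[0.10, 0.20, 0.0], [0.15, 0.25, 0.0]])
center = grasp_point(np.vstack([rim, rack]), r_t=0.01)

# Hypothetical transforms with identity rotation and simple offsets.
t_world_ee = np.eye(4)
t_world_ee[:3, 3] = [0.4, 0.0, 0.5]
t_ee_cam = np.eye(4)
t_ee_cam[:3, 3] = [0.0, 0.05, 0.0]
center_world = to_world(center, t_world_ee, t_ee_cam)
```

Averaging the rim points gives the mouth center only when the rim is sampled fairly evenly; with a partial rim, fitting a circle of radius r_T would be a more robust alternative.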
Step six: the trajectory of the end gripper of the robotic arm is obtained from the pose in the world coordinate system obtained in step four, using a cubic B-spline interpolation algorithm. The cubic B-spline interpolation is defined as:
C_3(u) = Σ_i P_i N_{i,3}(u)
where P_i are the control points of the spline curve, here set to the initial point and the grasp point of the test tube, and N_{i,3} is the basis function of the cubic spline curve, which can be obtained from the recurrence formula:
where k is the degree of the curve.
Finally, time stamps, velocities, and accelerations are added to the resulting trajectory to obtain a complete motion trajectory, which is converted into joint-space motion through inverse kinematics and sent to the robotic arm control module to complete the test tube grasping task.
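The spline evaluation in step six can be sketched with the standard Cox-de Boor recursion for B-spline basis functions, which is assumed here to be the recurrence the text refers to. The clamped uniform knot vector and the two intermediate waypoints in the example are assumptions: a cubic B-spline needs at least four control points, whereas the text names only the initial point and the grasp point.

```python
import numpy as np

def basis(i, k, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,k}(u)."""
    if k == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    d2 = knots[i + k + 1] - knots[i + 1]
    if d1 > 0:  # skip terms with a zero-length knot span (0/0 is treated as 0)
        left = (u - knots[i]) / d1 * basis(i, k - 1, u, knots)
    if d2 > 0:
        right = (knots[i + k + 1] - u) / d2 * basis(i + 1, k - 1, u, knots)
    return left + right

def bspline_point(control_points, u, degree=3):
    """Evaluate C(u) = sum_i P_i N_{i,degree}(u) for u in [0, 1]."""
    pts = np.asarray(control_points, dtype=float)
    n = len(pts)
    # Clamped uniform knots so the curve starts and ends on the end control points.
    inner = np.linspace(0.0, 1.0, n - degree + 1)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), inner, np.ones(degree + 1)])
    u = min(u, 1.0 - 1e-12)  # keep u inside the half-open last knot span
    weights = np.array([basis(i, degree, u, knots) for i in range(n)])
    return weights @ pts

# Initial point, two hypothetical waypoints, and the grasp point.
control = [[0.0, 0.0, 0.5], [0.0, 0.0, 0.6], [0.3, 0.1, 0.6], [0.3, 0.1, 0.4]]
path = np.array([bspline_point(control, u) for u in np.linspace(0.0, 1.0, 5)])
```

The sampled `path` contains positions only; time stamps, velocities, and accelerations would still have to be attached before the trajectory is passed to inverse kinematics, as the text describes.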

Claims (7)

1.一种基于透明物体深度补全的机械臂抓取试管方法,其特征在于,包括以下步骤:1. A method for a robotic arm to grasp test tubes based on depth completion of transparent objects, characterized by comprising the following steps: (1)搭建透明深度补全模型;(1) Construct a transparent depth completion model; (2)将补全后的点云投影回深度图像,作为深度补全模块的输入进行训练;所述深度补全模块采用编码器-解码器的架构,编码器部分采用EfficientNet作为主干并构建了CECD模块和CEC模块,将RGB和深度图像拼接后作为网络的输入,将深度数据输入一个单通道的采样模块进行特征提取,并把对应分辨率的深度特征作为每一个CECD块以及CEC块的输入;(2) The completed point cloud is projected back into the depth image and used as the input of the depth completion module for training. The depth completion module adopts an encoder-decoder architecture. The encoder part uses EfficientNet as the backbone and constructs CECD and CEC modules. The RGB and depth images are stitched together and used as the input of the network. The depth data is input into a single-channel sampling module for feature extraction, and the depth features of the corresponding resolution are used as the input of each CECD block and CEC block. 解码器为轻量级的RefineNet解码器,主要由CRP模块和FUSE模块组成,其中CRP块主要由1x1大小的卷积层和5x5大小的最大池化层构成,FUSE块主要由俩个1x1大小的卷积层以及上采样模块组成,CEC模块输出的特征信息与原始深度信息结合作为解码器的输入,并恢复完整的深度信息;The decoder is a lightweight RefineNet decoder, mainly composed of the CRP module and the FUSE module. The CRP block mainly consists of a 1x1 convolutional layer and a 5x5 max pooling layer, while the FUSE block mainly consists of two 1x1 convolutional layers and an upsampling module. The feature information output by the CEC module is combined with the original depth information as the input of the decoder to recover the complete depth information. 
(21)通过表示点云投影回深度图像的过程;(21) Through This represents the process of projecting a point cloud back into a depth image; (22)将点云补全后生成的嘈杂深度图与原始RGB彩色图像输入深度补全模块;(22) Input the noisy depth map generated after point cloud completion and the original RGB color image into the depth completion module; (23)使用Gridding Loss训练点云补全网络,其损失函数可以表示为:(23) The point cloud completion network is trained using Gridding Loss, and its loss function can be expressed as: , 其中,表示网格中每个顶点的权重;深度补全网络的损失函数可以表示为:in, This represents the weight of each vertex in the grid; the loss function of the depth completion network can be expressed as: , 其中,β是权重参数,和Dgt分别表示预测深度和真实深度,Dh和Dw分别是深度图D沿宽度和高度坐标轴的梯度向量;Where β is the weighting parameter, D <sub>gt</sub> and D<sub>h</sub> represent the predicted depth and the true depth, respectively, while D <sub>h</sub> and D<sub> w </sub> are the gradient vectors of the depth map D along the width and height axes, respectively. (3)利用深度相机获取透明试管的彩色图像和深度图像并进行对齐,所述深度相机固定在机械臂末端;(3) Use a depth camera to acquire color and depth images of the transparent test tube and align them. The depth camera is fixed at the end of the robotic arm. (4)将步骤(1)搭建的深度补全模型移植到ROS开发平台上,并加载步骤(2)中保存的网络权重参数,对ROS中实时获取到的试管图像进行处理,获得完整的深度图像;(4) The depth completion model built in step (1) is ported to the ROS development platform, and the network weight parameters saved in step (2) are loaded to process the test tube image obtained in real time in ROS to obtain a complete depth image. (5)机械臂订阅步骤(3)中发布的话题得到试管的位置信息,结合试管顶部的中心坐标系(xt , yt , zt)计算机械臂末端执行器进行抓取的位置,得到机械臂末端执行器在笛卡尔空间下的位姿信息;(5) The robotic arm subscribes to the topic published in step (3) to obtain the position information of the test tube. Combined with the center coordinate system (x t , y t , z t ) at the top of the test tube, the position of the robotic arm end effector for grasping is calculated, and the pose information of the robotic arm end effector in Cartesian space is obtained. 
(6) based on the pose information in the world coordinate system obtained in step (4), obtaining the trajectory information of the gripper at the end of the robotic arm using the cubic B-spline interpolation algorithm; cubic B-spline interpolation is defined as:
C_3(u) = Σ_i P_i N_{i,3}(u)
where P_i are the control points of the spline curve, set to the initial point and the grasp point of the test tube, and N_{i,3} are the basis functions of the cubic spline curve, which can be solved by the recursion formula: [formula missing in source], where k is the degree of the curve;
timestamps, velocity values, and acceleration values are added to the obtained trajectory information to give a complete motion trajectory, which is converted by inverse kinematics into motion information in joint space and sent to the robotic arm control module to complete the test tube grasping task.

2. The method for a robotic arm to grasp test tubes based on depth completion of transparent objects according to claim 1, characterized in that the transparent depth completion model comprises a point cloud completion module and a depth completion module; the point cloud completion module is used for preprocessing the depth map, and the relatively complete depth data converted from the point cloud is input to the depth completion module to further refine the depth information.

3. The method for a robotic arm to grasp test tubes based on depth completion of transparent objects according to claim 2, characterized in that step (1) comprises:
(11) constructing the point cloud completion module, back-projecting the depth of the transparent object into a point cloud, and estimating the correct depth image by predicting the complete point cloud shape;
(12) using the projected sparse point cloud as the input to the point cloud completion module;
(121) for the unordered sparse point cloud of step (12), constructing a 3D grid that converts the unordered point cloud information into regular data capable of representing the local information and structure of the point cloud;
(122) defining a Grid G = <V, W>, where V and W denote the vertex set and the value set of the Grid, respectively;
(123) determining the coordinate positions of the original point cloud and computing the weight corresponding to each vertex;
(124) using a 3D CNN encoder-decoder structure to learn the feature data required to complete the point cloud, and converting the Grid into an unordered point cloud through Gridding Reverse;
(125) completing the point cloud by creating an MLP, and further refining the coarse point cloud.

4. The method for a robotic arm to grasp test tubes based on depth completion of transparent objects according to claim 3, characterized in that, in step (11), the 3D spatial coordinates of a pixel of the depth image are given by: [formula missing in source], where (x, y, z) is a three-dimensional coordinate point in space, (x', y') are the coordinates of a pixel in the depth map, D is the depth value, and f_x and f_y are camera intrinsic parameters;
in step (122), [formula missing in source], where [symbol missing in source] denotes each vertex generated in the grid and w_i denotes the weight of each vertex;
in step (123), the original point cloud is [formula missing in source]; if the coordinates of a point p = (x, y, z) in the point cloud satisfy: [formula missing in source], the point is taken as a point in the neighborhood N(v_i) of vertex v_i, and the weight corresponding to each vertex is expressed as: [formula missing in source], where [formula missing in source].

5. The method for a robotic arm to grasp test tubes based on depth completion of transparent objects according to claim 1, characterized in that, in step (3), the "eye-in-hand" calibration method is adopted;
ClearGrasp, TODD, and TransCG are used as training datasets.

6. The method for a robotic arm to grasp test tubes based on depth completion of transparent objects according to claim 1, characterized in that, in step (4), after the complete depth image is obtained, the depth map is mapped to coordinate points in 3D space, and the resulting coordinate points are published through the ROS message mechanism.

7. The method for a robotic arm to grasp test tubes based on depth completion of transparent objects according to claim 1, characterized in that step (5) comprises:
(51) the center point of the gripper's opening and closing is defined as the origin of the end-effector coordinate system; the Z axis points downward, perpendicular to the gripper plane; the X axis points forward, parallel to the gripper's grasping direction; and the Y axis is determined by the right-hand rule;
(52) back-projecting the completed depth map into 3D space again, the complete point cloud can be expressed as: [formula missing in source]; the transparent test tubes are placed on a test tube rack, and the grasp pose is located by traversing the whole point cloud; first, the highest point at the top of the test tube is searched for: [formula missing in source]; for the highest point p_highest = (x_h, y_h, z_h), all points in its neighborhood N(p_highest) that satisfy the following condition are searched for: [formula missing in source], where r_T denotes the radius of the test tube opening; the center point (x_c, y_c, z_c) of the neighborhood N(p_highest) is then taken as the spatial position for the gripper to grasp;
(53) after the grasp position of the test tube is obtained, computing the orientation of the gripper at the grasp point; according to the coordinate system defined in (51), the positive x axis is the gripper's approach direction, and the orientation of the gripper coordinate system during grasping is the orientation at the time the image was acquired;
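The back-projection of claim 4 and the grasp-point search of step (52) can be sketched as follows. The formulas themselves are missing from this text, so the sketch rests on labeled assumptions: a pinhole model with a principal point (cx, cy) that the claims do not mention; "highest point" read as the point nearest to a downward-looking camera (smallest depth); and a neighborhood made of points within one tube diameter (2 * r_t) horizontally and within a depth tolerance `z_tol`. The function names and `z_tol` are hypothetical.

```python
import math


def back_project(depth, fx, fy, cx, cy):
    """Map each valid pixel (x', y') with depth D to a camera-frame 3D point."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0.0:  # 0.0 is treated as missing depth
                points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points


def grasp_center(points, r_t, z_tol):
    """Estimate the grasp position as the centroid of points near the tube top.

    Assumption: the camera looks down on the rack, so the highest point of
    the tube is the one with the smallest depth value.
    """
    xh, yh, zh = min(points, key=lambda p: p[2])
    rim = [p for p in points
           if math.hypot(p[0] - xh, p[1] - yh) <= 2.0 * r_t  # within one diameter
           and abs(p[2] - zh) <= z_tol]                      # roughly at rim height
    n = len(rim)  # n >= 1 because the highest point itself always qualifies
    return (sum(p[0] for p in rim) / n,
            sum(p[1] for p in rim) / n,
            sum(p[2] for p in rim) / n)
```

Averaging the selected rim points is one simple way to obtain a center (x_c, y_c, z_c); fitting a circle of radius r_T to the rim would be an alternative when the rim is only partly visible.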
(54) based on the test tube position information obtained in the camera coordinate system and the transformation between the camera coordinate system and the world coordinate system, the test tube position in the world coordinate system is obtained; the process can be expressed as: [formula missing in source].
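Step (54) yields the test tube position in the world frame, and step (6) of claim 1 then uses that position as a control point of a cubic B-spline trajectory. The sketch below illustrates both under stated assumptions: the camera-to-world relation is taken to be a 4x4 homogeneous matrix `T`, the basis functions are evaluated with the standard Cox-de Boor recursion (presumably the recursion the claim refers to, though its formula is not reproduced here), and the knot vector is clamped. A clamped cubic needs at least four control points, so the two intermediate waypoints are hypothetical additions between the initial point and the grasp point.

```python
def cam_to_world(T, p):
    """Apply a 4x4 homogeneous camera-to-world transform T to a 3D point p."""
    x, y, z = p
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3]
                 for r in range(3))


def basis(i, k, u, knots):
    """Cox-de Boor recursion: value of the degree-k basis N_{i,k} at u."""
    if k == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = right = 0.0  # terms with a zero denominator are defined as zero
    if left_den > 0:
        left = (u - knots[i]) / left_den * basis(i, k - 1, u, knots)
    if right_den > 0:
        right = (knots[i + k + 1] - u) / right_den * basis(i + 1, k - 1, u, knots)
    return left + right


def spline_point(ctrl, knots, u, k=3):
    """Evaluate C_k(u) = sum_i P_i * N_{i,k}(u) for 3D control points."""
    if u >= knots[-1]:
        # Half-open intervals exclude the last knot; for a clamped knot
        # vector the curve ends exactly at the last control point.
        return tuple(ctrl[-1])
    weights = [basis(i, k, u, knots) for i in range(len(ctrl))]
    return tuple(sum(w * p[d] for w, p in zip(weights, ctrl)) for d in range(3))


# Hypothetical values: camera frame offset from the world frame by a pure
# translation, and a tube detected 0.3 m in front of the camera.
T = [[1, 0, 0, 0.4], [0, 1, 0, 0.0], [0, 0, 1, 0.5], [0, 0, 0, 1]]
grasp = cam_to_world(T, (0.0, 0.0, 0.3))
start = (0.0, 0.0, 0.5)
ctrl = [start, (0.1, 0.0, 0.7), (0.3, 0.0, 0.9), grasp]  # middle two are hypothetical
knots = [0, 0, 0, 0, 1, 1, 1, 1]                          # clamped, cubic
path = [spline_point(ctrl, knots, j / 10) for j in range(11)]
```

With this clamped knot vector the sampled path starts at the initial point and ends at the grasp point; timestamps, velocities, and accelerations would still have to be attached before the inverse-kinematics step described in claim 1.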
CN202310085582.7A 2023-01-12 2023-01-12 A method for robotic arm to grasp test tubes based on depth completion of transparent objects Active CN116385518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310085582.7A CN116385518B (en) 2023-01-12 2023-01-12 A method for robotic arm to grasp test tubes based on depth completion of transparent objects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310085582.7A CN116385518B (en) 2023-01-12 2023-01-12 A method for robotic arm to grasp test tubes based on depth completion of transparent objects

Publications (2)

Publication Number Publication Date
CN116385518A CN116385518A (en) 2023-07-04
CN116385518B CN116385518B (en) 2026-03-17

Family

ID=86975769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310085582.7A Active CN116385518B (en) 2023-01-12 2023-01-12 A method for robotic arm to grasp test tubes based on depth completion of transparent objects

Country Status (1)

Country Link
CN (1) CN116385518B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274349B (en) * 2023-09-22 2026-02-27 南开大学 A Method and System for Reconstructing Transparent Objects Based on Consistent Depth Prediction from an RGB-D Camera
CN118453114A (en) * 2024-05-10 2024-08-09 南京信息工程大学 Mechanical arm grabbing method, equipment, medium and product of mirror medical instrument

Citations (1)

Publication number Priority date Publication date Assignee Title
CN115502967A (en) * 2022-01-25 2022-12-23 中国科学院自动化研究所 Humanoid dexterous hand object grasping method and robot system based on depth generation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140418B (en) * 2021-11-26 2025-07-11 上海交通大学宁波人工智能研究院 Seven-DOF grasping posture detection method based on RGB image and depth image

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN115502967A (en) * 2022-01-25 2022-12-23 中国科学院自动化研究所 Humanoid dexterous hand object grasping method and robot system based on depth generation

Non-Patent Citations (4)

Title
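An Optimized Depth Complementation of Transparent Objects Based Robotic Arm Grasping System; Zhaojian Gu, Hongbo Chen, Ping Zhu, Mingyu Gao, Yan Huang; Smart Grid and Innovative Frontiers in Telecommunications: 7th EAI International Conference, smartGIFT 2022; 20221010; pp. 1-6 *
Lightweight RGB-D image semantic segmentation network based on an attention mechanism; Sun Liujie, Zhang Yusen, Wang Wenju, Zhao Jin; Packaging Engineering; 20220224; Vol. 43 (No. 3); pp. 1-10 *
Trajectory planning and control of an automatic pipetting robotic arm based on visual feedback; Gu Zhaojian; China Excellent Master's Theses; 20250416; full text *
Multi-source multi-view 3D scene and object reconstruction; Xie Haozhe; China Doctoral Dissertations Full-text Database; 20220215; main text, Section 3.3 *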

Also Published As

Publication number Publication date
CN116385518A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN114851201B (en) Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction
CN108010078B (en) An object grasping detection method based on three-level convolutional neural network
CN109702741B (en) Robotic arm visual grasping system and method based on self-supervised learning neural network
CN105225269B (en) Object modelling system based on motion
CN110378325B (en) A Target Pose Recognition Method in Robot Grasping Process
CN114387513B (en) Robot grasping methods, devices, electronic equipment and storage media
CN112947458B (en) Robot accurate grabbing method based on multi-mode information and computer readable medium
CN110692082A (en) Learning device, learning method, learning model, estimation device, and clamping system
CN113752255B (en) Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning
CN116385518B (en) A method for robotic arm to grasp test tubes based on depth completion of transparent objects
CN115578460B (en) Robot grabbing method and system based on multi-mode feature extraction and dense prediction
CN115861780B (en) A YOLO-GGCNN-based robotic arm detection and grasping method
CN112975957B (en) Target extraction method, system, robot and storage medium
CN116749198A (en) A method for guiding robotic arm to grab based on binocular stereo vision
CN114714365A (en) Disordered workpiece grabbing method and system based on cloud platform
JP7051751B2 (en) Learning device, learning method, learning model, detection device and gripping system
CN109461184B (en) An automatic positioning method for grasping points of a robot arm grasping objects
CN120715903B (en) Object grabbing method and system based on point cloud deep learning
CN113538576B (en) Grabbing method and device based on double-arm robot and double-arm robot
CN113989373B (en) Device and method for establishing robot grasping data set based on teaching and deep learning
CN116423520A (en) A Manipulator Trajectory Planning Method Based on Vision and Dynamic Motion Primitives
CN111598172A (en) Fast detection method of dynamic target grasping pose based on heterogeneous deep network fusion
CN114347028B (en) An intelligent grasping method at the end of a robot based on RGB-D images
CN114211490A (en) Robot arm gripper pose prediction method based on Transformer model
CN110796700A (en) Localization method of multi-object grasping area based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant