
Creating an RGBD labeled dataset #9563

Closed
maberrospi opened this issue Aug 3, 2021 · 16 comments

@maberrospi

maberrospi commented Aug 3, 2021


Required Info
Camera Model: D435i
Operating System & Version: Win 10
Platform: PC
SDK Version: 2.45.0
Language: Python

Issue Description

I recorded a video sequence using the Intel D435i camera and then extracted the RGB and depth frames. As I understand it, the only way to store the depth information is to export the .raw version of the depth frames, so I did that as well. The goal is to use this dataset later to train a neural network for object detection. I am very new to machine learning, and from what I understand I should label my images and then feed them into a neural network for training. Where I am having real trouble is finding out how exactly to label the depth frames. There are many tools available online for labeling normal RGB images, but I cannot find any information on how to label/annotate depth frames. As mentioned above, the depth frames are stored in .raw format, and most of the labeling tools I have found do not accept that format. I have been searching for quite some time now, so I am hoping someone can help me with this!

Apologies in advance if the format of my issue is not correct; this is my first time posting a question on GitHub!

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Aug 3, 2021

Hi @blckmessiah. In addition to exporting depth as a .raw image file with Python, you could also export depth as an .npy array file, which may be compatible with a wider range of machine learning tools such as TensorFlow and PyTorch. For example, the link below has a Python script for saving the depth into an .npy file.

#4934 (comment)

You may then be able to load the .npy file into PyTorch, as described in the tutorial linked below.

https://towardsdatascience.com/beginners-guide-to-loading-image-data-with-pytorch-289c60b7afec
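A minimal sketch of that save-and-load path, assuming a live pipeline (the same code works when replaying the recorded .bag via rs.config.enable_device_from_file) and with illustrative file names:

```python
import numpy as np
import pyrealsense2 as rs

# Start a pipeline with the default streams (includes depth)
pipeline = rs.pipeline()
pipeline.start()
try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    # The depth frame is 16-bit; convert it to a NumPy array and save as .npy
    depth_image = np.asanyarray(depth_frame.get_data())
    np.save("depth_0000.npy", depth_image)
finally:
    pipeline.stop()

# Later, for example in a PyTorch training script:
# import torch
# depth_tensor = torch.from_numpy(np.load("depth_0000.npy").astype(np.float32))
```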


Alternatively, there is a guide in the link below for using a RealSense D435 camera with TensorFlow to perform custom object detection with a point cloud and then label it.

https://github.com/jediofgever/PointNet_Custom_Object_Detection

The label creation guidance for this tutorial is in the article below:

https://github.com/jediofgever/PointNet_Custom_Object_Detection/blob/master/PREPARE_DATA.md


If you are very new to neural networks, the OpenVINO Toolkit would be a user-friendly yet powerful gateway to them.

https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html

The RealSense SDK has a compatibility 'wrapper' interface for OpenVINO Toolkit.

https://github.com/IntelRealSense/librealsense/tree/master/wrappers/openvino

@maberrospi
Author

maberrospi commented Aug 3, 2021

Hello @MartyG-RealSense,
Thank you for the references. I will have a look at them as soon as possible and reply if I have any more questions or if they solve my problem!

@maberrospi
Author

From the links you mentioned I found the 3rd and 4th the most useful, but I still have some questions. Concerning the NumPy array that I could pass to TensorFlow or PyTorch: (1) don't I have to label the depth frames so that the machine learning can happen? I don't see how that can be done once they are turned into a NumPy array. The 3rd and 4th links propose creating a point cloud, using a tool that can label that format, and later feeding the result to a neural net for learning purposes. This seems like a good approach, and regarding it, (2) I would like to know if there is a way to convert already extracted depth frames in .raw format into point clouds. I am using the rs-convert tool to extract these frames from an already recorded and saved .bag file.

@MartyG-RealSense
Collaborator

If you need to convert frames extracted from a bag into point clouds, it would be easier to export .ply from rs-convert instead of .raw, as .ply is a point cloud file format.

This would produce point cloud data files that may be usable with the point cloud labelling system described in links 3 and 4. My interpretation of the instructions in those links is that if you name the .ply files exported from rs-convert with sequential numbering (0.ply, 1.ply, 2.ply, etc.) and place them into a folder named POINTNET_LABELED_REAL_DATA, you can then feed the .ply files into the process in the link below to add RGB labels and create an .h5 file.

https://github.com/jediofgever/PointNet_Custom_Object_Detection/blob/master/PREPARE_DATA.md#create-h5-files-of-real-data
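As a rough sketch of that preparation step (the rs-convert flag for .ply output and the exact folder layout should be double-checked against your SDK build and the PREPARE_DATA guide; all paths here are illustrative):

```python
# Export point clouds from the recording first, e.g. something like:
#   rs-convert -i recording.bag -l exported/cloud
# (check `rs-convert -h` on your build for the exact .ply flag)
# Then rename the exported files to the 0.ply, 1.ply, ... scheme the guide expects.
import os
import shutil

src_dir = "exported"                      # folder holding the rs-convert .ply output
dst_dir = "POINTNET_LABELED_REAL_DATA"    # folder name used in the PREPARE_DATA guide
os.makedirs(dst_dir, exist_ok=True)

ply_files = sorted(f for f in os.listdir(src_dir) if f.endswith(".ply"))
for index, name in enumerate(ply_files):
    shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, f"{index}.ply"))
```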


Whilst you can load an npy file into TensorFlow and set up labelling, the process is a little complex. The tutorial in the link below may be a good starting point.

https://newbedev.com/feeding-npy-numpy-files-into-tensorflow-data-pipeline

Then google for the term tensorflow .npy labeling for further research leads.
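One common pattern for that, sketched here with made-up file names and integer class labels (not the tutorial's exact code):

```python
import numpy as np
import tensorflow as tf

# Illustrative file list and one integer label per file; replace with your own data
npy_paths = ["depth_0000.npy", "depth_0001.npy"]
labels = [0, 1]

def load_npy(path, label):
    # np.load runs as ordinary Python, so it is wrapped with tf.py_function below
    depth = np.load(path.numpy().decode("utf-8")).astype(np.float32)
    return depth, label

def tf_load(path, label):
    depth, label = tf.py_function(load_npy, [path, label], [tf.float32, tf.int32])
    return depth, label

dataset = (tf.data.Dataset.from_tensor_slices((npy_paths, labels))
           .map(tf_load)
           .batch(2))
```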


If you would prefer to extract and convert .raw image files, a RealSense team member suggests in the link below converting the .raw files to a point cloud using the SDK's rs2_deproject_pixel_to_point function.

#2488 (comment)
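A rough Python sketch of what that could look like with an rs-convert .raw export. The resolution, depth scale, and intrinsic values below are placeholders and must be replaced with the ones from your own camera/recording (for example, read from the stream profile's get_intrinsics() while the bag is open):

```python
import numpy as np
import pyrealsense2 as rs

# Placeholder recording parameters - replace with the values from your own camera/bag
WIDTH, HEIGHT = 848, 480
DEPTH_SCALE = 0.001               # metres per depth unit (typical for D400 cameras)

# Fill a pyrealsense2 intrinsics object with your camera's calibration
intrin = rs.intrinsics()
intrin.width, intrin.height = WIDTH, HEIGHT
intrin.ppx, intrin.ppy = 424.0, 240.0       # placeholder principal point
intrin.fx, intrin.fy = 600.0, 600.0         # placeholder focal lengths
intrin.model = rs.distortion.brown_conrady
intrin.coeffs = [0.0, 0.0, 0.0, 0.0, 0.0]

# Load the 16-bit .raw depth image exported by rs-convert
depth = np.fromfile("frame_0000.raw", dtype=np.uint16).reshape(HEIGHT, WIDTH)

# Deproject every non-zero pixel to a 3D point in metres
points = []
for y in range(HEIGHT):
    for x in range(WIDTH):
        d = depth[y, x]
        if d:
            points.append(rs.rs2_deproject_pixel_to_point(intrin, [x, y], d * DEPTH_SCALE))
points = np.asarray(points)
```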


As mentioned earlier though, OpenVINO Toolkit is user-friendly and could provide a means of labelling .npy file data using its Annotation Converter system.

https://docs.openvinotoolkit.org/2021.3/omz_tools_accuracy_checker_annotation_converters.html

@maberrospi
Author

Hello @MartyG-RealSense
Thanks again for the suggestions. I will take a look at them as soon as possible and reply again!

@MartyG-RealSense
Collaborator

You are very welcome @blckmessiah - I look forward to your next update. Good luck!

@maberrospi
Author

maberrospi commented Aug 5, 2021

Hello again @MartyG-RealSense ,
So I tried running the labeling tool mentioned in links 3 and 4 and ran into some problems. As for the .npy labeling, from the little research I did on it I still do not understand how you can label the .npy files of images for machine learning purposes.

Since I couldn't get the labeling tool mentioned above to work, I tried to find an alternative and thankfully I did: a tool called labelCloud. It seems to be working well and I will look into it a bit more.

About the rs2_deproject_pixel_to_point function, I have searched the documentation and I am having a hard time finding examples that explain how the function should be used. In addition, I think the function is supposed to run while streaming video/depth, not with already extracted .raw depth images, although I might be mistaken here.

While researching the above, a new question came up: what are the benefits of labeling point clouds and training a machine learning algorithm on that data, instead of just using cuboid labeling on normal RGB frames?

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Aug 6, 2021

A good place to start learning rs2_deproject_pixel_to_point may be the tutorial in the link below, which uses Python code and the 2D coordinate convention where the top-left corner is (0,0), x increases to the right, and y increases downward.

https://medium.com/@yasuhirachiba/converting-2d-image-coordinates-to-3d-coordinates-using-ros-intel-realsense-d435-kinect-88621e8e733a
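If the frames come through the SDK (live, or by replaying the original .bag) rather than from a standalone .raw file, the intrinsics can be read from the stream profile instead of being typed in by hand. A small sketch, with the bag file name and pixel coordinate as placeholders:

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Replay the original recording; remove this line to use a live camera instead
config.enable_device_from_file("recording.bag")
pipeline.start(config)
try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    # Depth-stream intrinsics read straight from the frame's profile
    intrin = depth_frame.profile.as_video_stream_profile().get_intrinsics()
    # Deproject the pixel at (x=320, y=240): origin top-left, x right, y down
    x, y = 320, 240
    point = rs.rs2_deproject_pixel_to_point(intrin, [x, y], depth_frame.get_distance(x, y))
    print("3D point in metres:", point)
finally:
    pipeline.stop()
```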

There is not much information available about the use of .raw files though, and I could find no references about using them with rs2_deproject_pixel_to_point.

Your question about point cloud labeling for machine learning led me to a Python tool called labelCloud that can annotate 3D point clouds and supports the .ply file format.

https://github.com/ch-sa/labelCloud
https://analyticsindiamag.com/labelcloud-python-tool-for-annotating-3d-point-clouds/

The above article lists the following additional resources about labelCloud:

Original research paper
https://arxiv.org/ftp/arxiv/papers/2103/2103.04970.pdf

YouTube video about labelCloud
https://www.youtube.com/watch?v=8GF9n1WeR8A

@maberrospi
Author

It's unfortunate that there are no references for using rs2_deproject_pixel_to_point with .raw depth frames because, as I said before, I already took a recording with the D435i camera and extracted the .raw depth images. After that I went through the frames and kept only the ones that would be useful to me, since the full set of frames is too large and not always useful.

I also referred to the labelCloud tool you mentioned in my previous comment, but my question was about something else.

My question is: what are the benefits of labeling point clouds and training a machine learning algorithm on that data, instead of just using cuboid labeling on normal RGB frames? It would be great if you had any insight on this!

@MartyG-RealSense
Collaborator

I carefully researched your question but unfortunately have nothing more to add. It is not a subject that I am familiar with, and an extensive search could not find a reference that met your requirements.

@maberrospi
Author

That's unfortunate, thank you anyway!

@MartyG-RealSense
Collaborator

Case closed due to no further action able to be taken.

@666tua

666tua commented Mar 13, 2023

(Quotes @MartyG-RealSense's earlier reply above about exporting .ply from rs-convert, labelling the point clouds, and the .npy/OpenVINO alternatives.)

Hello! I want to do key point detection of mouse body parts. Can depth information help? Can point clouds be used to mark key points?

@MartyG-RealSense
Collaborator

Hi @666tua I believe that this question is related to your earlier mouse analysis case in 2022 at #9563

There is a research paper at the link below regarding detecting body key points of a mouse.

https://www.biorxiv.org/content/10.1101/2020.05.21.109629v1.full

To quote the paper:


Other studies have used depth-cameras for animal tracking, fitting a physical body-model of the animal to 3D data.

https://www.biorxiv.org/content/10.1101/2020.05.21.109629v1.full#ref-11
https://www.biorxiv.org/content/10.1101/2020.05.21.109629v1.full#ref-12

These methods are powerful because they explicitly model the 3D movement and poses of multiple animals. However, due to technical limitations of depth imaging hardware (frame rate, resolution, motion blur), it is to date only possible to extract partial posture information about small and fast-moving animals, such as lab mice.

Consequently, when applied to mice, these methods are prone to tracking mistakes when interacting animals get close to each other and the tracking algorithms require continuous manual supervision to detect and correct errors. This severely restricts throughput, making tracking across long time scales infeasible.

Here we describe a novel system for multi-animal tracking that combines ideal features from both approaches. Our method fuses physical modeling of depth data and deep learning-based analysis of synchronized color video to estimate the body postures, enabling us to reliably track multiple mice during naturalistic social interactions

@666tua

666tua commented Mar 14, 2023

(Quotes @MartyG-RealSense's reply above about the mouse body key-point tracking paper.)

Thank you very much! I will read these papers carefully. I'm using color images for the key points right now. I have a question: I extracted the color pictures from the recorded bag file, and on one of those pictures I derive the coordinates of a key point. How should I then get the corresponding distance information?

@MartyG-RealSense
Collaborator

If you are using the RealSense SDK to obtain a color frame from the bag file, you could use the SDK instruction rs2_project_color_pixel_to_depth_pixel to convert a 2D XY color pixel coordinate into the corresponding depth pixel, from which the distance can be read. #5603 (comment) has a Python script that demonstrates its use.
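I am not certain of the exact Python argument order for rs2_project_color_pixel_to_depth_pixel (the script in #5603 shows it), so as a simpler sketch of the same idea: while replaying the bag, the depth frame can be aligned to the color frame, after which the distance at the keypoint's color-pixel coordinate can be read directly. The bag file name and pixel coordinate below are placeholders:

```python
import pyrealsense2 as rs

# Replay the recorded bag so depth and color frames stay synchronised
pipeline = rs.pipeline()
config = rs.config()
config.enable_device_from_file("recording.bag")
pipeline.start(config)

# Align depth onto the color image so both share the same pixel grid
align = rs.align(rs.stream.color)
try:
    frames = align.process(pipeline.wait_for_frames())
    depth_frame = frames.get_depth_frame()
    # Keypoint coordinate detected on the color image (placeholder values)
    x, y = 400, 300
    distance_m = depth_frame.get_distance(x, y)
    print(f"Distance at keypoint ({x}, {y}): {distance_m:.3f} m")
finally:
    pipeline.stop()
```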
