Creating an RGBD labeled dataset #9563
Hi @blckmessiah In addition to exporting depth as a .raw image file with Python, you could also export depth as an .npy array file that may be compatible with a wider range of machine learning tools, such as TensorFlow and PyTorch. For example, the link below has a Python script for saving the depth into an npy file. You may then be able to load the npy file into PyTorch, as described in the tutorial linked below.

https://towardsdatascience.com/beginners-guide-to-loading-image-data-with-pytorch-289c60b7afec

Alternatively, there is a guide in the link below for using a RealSense D435 camera with TensorFlow to perform custom object detection with a point cloud and then label it.

https://github.com/jediofgever/PointNet_Custom_Object_Detection

The label creation guidance for this tutorial is in the article below:

https://github.com/jediofgever/PointNet_Custom_Object_Detection/blob/master/PREPARE_DATA.md

If you are very new to neural networks, the OpenVINO Toolkit would be a user-friendly yet powerful gateway to it.

https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html

The RealSense SDK has a compatibility 'wrapper' interface for OpenVINO Toolkit.

https://github.com/IntelRealSense/librealsense/tree/master/wrappers/openvino
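For illustration, a minimal sketch of saving one depth frame as a .npy file might look like the following. This assumes pyrealsense2 and numpy are installed; the 640x480 @ 30 FPS stream settings and the output filename are placeholders rather than values from this discussion.

```python
# Hedged sketch: save one depth frame from a RealSense camera as a .npy file.
# Assumes pyrealsense2 and numpy are installed; the stream settings and the
# output filename are illustrative choices only.
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()

    # The Z16 depth frame maps directly onto a 16-bit NumPy array.
    depth_array = np.asanyarray(depth_frame.get_data())
    np.save("depth_0000.npy", depth_array)
finally:
    pipeline.stop()
```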
Hello @MartyG-RealSense,
From the links you mentioned I found the 3rd and 4th the most useful, but I still have some questions. Concerning the numpy array that I could pass to TensorFlow or PyTorch: (1) don't I have to label the depth frames so that the machine learning can happen? I don't see how that can be done once they are turned into a numpy array. The 3rd and 4th links propose creating a point cloud, labelling it with a tool that supports the point cloud format, and then feeding that to a neural net for training. This seems like a good approach, and regarding it, (2) I would like to know if there is a way to convert already extracted depth frames in .raw format into point clouds. I am using the rs-convert tool to extract these frames from an already recorded and saved .bag file.
If you need to convert frames extracted from a bag into point clouds then it would be easier to export .ply from rs-convert instead of .raw, as .ply is a point cloud file format. This would produce pointcloud data files that may be usable with the point cloud labelling system described in links 3 and 4. My interpretation of the instructions of those links is that if you name the ply files exported from rs-convert with sequential numbering (0.ply, 1.ply, 2.ply etc) and place them into a folder named POINTNET_LABELED_REAL_DATA then you can input the ply into the process in the link below to add RGB labels and create an .h5 file.

Whilst you can load an npy file into TensorFlow and set up labelling, the process is a little complex. The tutorial in the link below may be a good starting point.

https://newbedev.com/feeding-npy-numpy-files-into-tensorflow-data-pipeline

Then google for the term tensorflow .npy labeling for further research leads.

If you would prefer to extract and convert .raw image files, a RealSense team member suggests in the link below to convert the .raw files to pointcloud using the SDK's rs2_deproject_pixel_to_point function.

As mentioned earlier though, OpenVINO Toolkit is user-friendly and could provide a means of labelling .npy file data using its Annotation Converter system.

https://docs.openvinotoolkit.org/2021.3/omz_tools_accuracy_checker_annotation_converters.html
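As a rough illustration of the npy-to-TensorFlow route, a minimal tf.data sketch is shown below. The file names, frame count and label values are hypothetical placeholders, not something defined in this thread.

```python
# Hedged sketch: feed previously saved .npy depth frames into a tf.data
# pipeline. Assumes TensorFlow and NumPy are installed; file names, the
# number of frames and the label values are illustrative placeholders.
import numpy as np
import tensorflow as tf

# Stack a few saved depth frames into one (N, H, W) array.
depth_frames = np.stack([np.load(f"depth_{i:04d}.npy") for i in range(3)])
labels = np.array([0, 1, 0])  # hypothetical per-frame class labels

dataset = (
    tf.data.Dataset.from_tensor_slices((depth_frames, labels))
    .shuffle(buffer_size=len(labels))
    .batch(2)
)

for batch_depth, batch_labels in dataset:
    print(batch_depth.shape, batch_labels.numpy())
```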
Hello @MartyG-RealSense
You are very welcome @blckmessiah - I look forward to your next update. Good luck!
Hello again @MartyG-RealSense. Since I couldn't get the labeling tool mentioned above to work, I tried to find an alternative and thankfully I did: a tool called labelCloud. It seems to be working well and I will look into it a bit more. About the rs2_deproject_pixel_to_point function, I have searched the documentation and I am having a hard time finding examples that explain how the function should be used. In addition, I think the function is supposed to run on a live video/depth stream rather than on already extracted .raw depth images, although I might be mistaken here. While researching the above, a new question came up: what are the benefits of using point cloud labeling and training a machine learning algorithm on that data, instead of just using cuboid labeling on normal RGB frames?
A good place to start in learning rs2_deproject_pixel_to_point may be the tutorial in the link below, which uses Python code and the 2D coordinate convention where the top-left corner is (0,0), x increases to the right and y increases downwards. There is not much information available about use of .raw files though, and I could find no references about using them with rs2_deproject_pixel_to_point (a rough usage sketch is included after this comment).

Your question about point cloud labeling for machine learning led me to a Python tool called labelCloud that can annotate 3D point clouds and supports the .ply file format.

https://github.com/ch-sa/labelCloud

The above page lists the following additional resources about labelCloud:

- Original research paper
- YouTube video about labelCloud
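A minimal, hedged sketch of calling rs2_deproject_pixel_to_point from Python on a live stream might look like the following. The pixel coordinate (320, 240) is just an example, and the snippet assumes pyrealsense2 is installed with a camera connected.

```python
# Hedged sketch: deproject a single depth pixel to a 3D point (meters).
# Assumes pyrealsense2 is installed and a RealSense camera is streaming;
# the (320, 240) pixel coordinate is an arbitrary example.
import pyrealsense2 as rs

pipeline = rs.pipeline()
pipeline.start()

try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()

    # Intrinsics describe the camera model used for the deprojection.
    intrinsics = depth_frame.profile.as_video_stream_profile().get_intrinsics()

    pixel = [320, 240]                          # (x, y) in the 2D depth image
    depth = depth_frame.get_distance(320, 240)  # depth in meters at that pixel
    point = rs.rs2_deproject_pixel_to_point(intrinsics, pixel, depth)
    print("3D point (meters):", point)
finally:
    pipeline.stop()
```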
It's unfortunate that there are no references for using rs2_deproject_pixel_to_point with .raw depth frames because, as I said before, I already took a recording with the D435i camera and extracted the .raw depth images. After that I went through the frames and kept only the ones that were going to be useful for me, since the full set of frames is too big and not always useful. The labelCloud tool you mentioned is the same one I referred to in my previous comment, but my question was something else. My question is: what are the benefits of using point cloud labeling and training a machine learning algorithm on that data, instead of just using cuboid labeling on normal RGB frames? It would be great if you had any insight on this!
I carefully researched your question but unfortunately have nothing more that I can add about it. It is not a subject that I am familiar with, and an extensive search could not find a reference that met your requirements.
That's unfortunate, thank you anyway!
Case closed, as no further action can be taken.
Hello! I want to do key point detection of mouse body parts. Can depth information help? Can point clouds be used to mark key points?
Hi @666tua I believe that this question is related to your earlier mouse analysis case in 2022 at #9563

There is a research paper at the link below regarding detecting body key points of a mouse.

https://www.biorxiv.org/content/10.1101/2020.05.21.109629v1.full

To quote the paper:

Other studies have used depth-cameras for animal tracking, fitting a physical body-model of the animal to 3D data. https://www.biorxiv.org/content/10.1101/2020.05.21.109629v1.full#ref-11 These methods are powerful because they explicitly model the 3D movement and poses of multiple animals. However, due to technical limitations of depth imaging hardware (frame rate, resolution, motion blur), it is to date only possible to extract partial posture information about small and fast-moving animals, such as lab mice. Consequently, when applied to mice, these methods are prone to tracking mistakes when interacting animals get close to each other and the tracking algorithms require continuous manual supervision to detect and correct errors. This severely restricts throughput, making tracking across long time scales infeasible. Here we describe a novel system for multi-animal tracking that combines ideal features from both approaches. Our method fuses physical modeling of depth data and deep learning-based analysis of synchronized color video to estimate the body postures, enabling us to reliably track multiple mice during naturalistic social interactions.
Thank you very much! I will read these papers carefully. I'm using color images to make key points right now. I have a question: I extracted the color pictures from the recorded bag file. If I take one of the pictures and derive the coordinate information of a key point, how should I then get the corresponding distance information?
If you are using the RealSense SDK to obtain a color frame from the bag file then you could use the SDK program instruction rs2_project_color_pixel_to_depth_pixel to convert a 2D XY color pixel coordinate to a 3D XYZ depth coordinate. #5603 (comment) has a Python script that demonstrates its use.
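As a side note, the exact Python call for rs2_project_color_pixel_to_depth_pixel is best taken from the script linked above. A simpler alternative sketch, shown below, is to align the depth stream to the color stream and read the distance at the color pixel directly. The (400, 300) pixel coordinate and stream settings are placeholders; this assumes pyrealsense2 with a live camera or a recorded bag file as the source.

```python
# Hedged sketch: get the depth (in meters) at a given color-pixel coordinate
# by aligning depth to color, as an alternative to calling
# rs2_project_color_pixel_to_depth_pixel directly. Pixel coordinate and
# stream settings are illustrative placeholders.
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# To read from a recorded file instead of a live camera, something like
# config.enable_device_from_file("recording.bag") could be used here.
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

align = rs.align(rs.stream.color)  # map depth pixels onto the color image grid

try:
    frames = pipeline.wait_for_frames()
    aligned = align.process(frames)
    aligned_depth = aligned.get_depth_frame()

    # Distance in meters at the key point's color-image coordinate (x, y).
    x, y = 400, 300
    distance_m = aligned_depth.get_distance(x, y)
    print(f"Depth at pixel ({x}, {y}): {distance_m:.3f} m")
finally:
    pipeline.stop()
```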
Issue Description
I recorded a video sequence using the Intel D435i camera and managed to extract RGB and depth frames. As I understand it, the only way to store the depth information is to extract the .raw version of the depth frames, so I did that as well. The goal of this dataset is to later train a neural network for object detection.

I am really new to machine learning, and from what I understood I should label my images and then feed them into a neural network for training. Where I am having a really hard time is finding how exactly I can label the depth frames. There are many tools available online for labeling normal RGB images, but I cannot seem to find any information on how to label/annotate depth frames. As I said above, the depth frames are stored in a .raw format, and most of the labeling tools I have found do not accept that format. I have been searching for quite some time now, so I am hoping someone might know how to help me with this!
I am sorry in advance if the format of my issue is not correct; this is my first time posting a question on GitHub!
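For reference, a minimal sketch of reading one of the extracted .raw depth frames into a NumPy array (which many Python-based tools can then consume) might look like this. It assumes the .raw file contains plain 16-bit (Z16) depth data at a known resolution; the 640x480 size, filename and millimeter depth scale are assumptions, not values confirmed in this thread.

```python
# Hedged sketch: load a .raw depth frame (e.g. as exported by rs-convert)
# into a NumPy array. Assumes raw 16-bit (Z16) depth data and a known
# resolution; size, filename and depth scale below are placeholders.
import numpy as np

WIDTH, HEIGHT = 640, 480

depth = np.fromfile("depth_frame_0001.raw", dtype=np.uint16)
depth = depth.reshape((HEIGHT, WIDTH))

# Depth values are in camera units (typically millimeters on a D435i),
# so a rough conversion to meters could be:
depth_m = depth.astype(np.float32) * 0.001

print(depth.shape, depth_m.max())
```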