FAQ New Question Notification #179
I updated the FAQ. I also gave this issue a general title, so that future FAQ updates can be announced with a new post in this thread instead of a new thread each time. |
Hey @matlabbe, I am trying to replicate the camera tracking on the GPU. I was going through the Odometry and Registration code while working with an RGB-D camera. I understand that visual correspondences are used to match features between the last two frames, and that these matched 3D/2D points are fed into solvePnP, which returns a rotation and a translation vector. In OdometryF2M.cpp, the inverse of the previous pose is multiplied with the current transform, and then the previous pose is multiplied again with the result from F2M, which leaves just the transform we get from solvePnP. I would expect this to be only the delta (the change in transformation from one frame to the next) rather than the global transform, yet the application seems to be tracking the global transform. Am I missing something here? :D I have disabled all the bundle adjustment and motion estimation to test the results from solvePnP alone. |
The odometry pose is updated here: rtabmap/corelib/src/Odometry.cpp Line 662 in 9cb1e4b
from the incremental transform t computed by the selected odometry approach (e.g. F2M). In F2M, the result from PnP is the pose, not the increment (note that tmpMap contains the 3D points of the local feature map in the odometry frame): rtabmap/corelib/src/odometry/OdometryF2M.cpp Lines 315 to 320 in 9cb1e4b
To make it work like the other odometry approaches, which output incremental transforms, we have to convert it to an increment too: rtabmap/corelib/src/odometry/OdometryF2M.cpp Lines 520 to 521 in 9cb1e4b
so that the pose update above (in the parent Odometry class) still works. |
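For readers following along, the pose-to-increment round trip described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from Odometry.cpp or OdometryF2M.cpp: `Pose2`, `compose`, `inverse`, `toIncrement` and `applyIncrement` are hypothetical names, and a 2D rigid transform stands in for the full 3D transform to keep the example short.

```cpp
#include <cmath>

// Hypothetical 2D rigid transform (x, y, heading), a stand-in for a 3D transform.
struct Pose2 {
    double x;
    double y;
    double theta;  // heading in radians
};

// Composition a * b: b is expressed in a's frame, the result in a's parent frame.
Pose2 compose(const Pose2& a, const Pose2& b) {
    const double c = std::cos(a.theta);
    const double s = std::sin(a.theta);
    return {a.x + c * b.x - s * b.y,
            a.y + s * b.x + c * b.y,
            a.theta + b.theta};
}

// Inverse of a rigid transform: rotation R^T, translation -R^T * t.
Pose2 inverse(const Pose2& a) {
    const double c = std::cos(a.theta);
    const double s = std::sin(a.theta);
    return {-(c * a.x + s * a.y), s * a.x - c * a.y, -a.theta};
}

// Assumed frame-to-map step: the estimator (e.g. PnP against a local map)
// yields a full pose, which is converted to an increment w.r.t. the previous pose.
Pose2 toIncrement(const Pose2& previousPose, const Pose2& estimatedPose) {
    return compose(inverse(previousPose), estimatedPose);
}

// Assumed generic update shared by all odometry approaches: pose = pose * increment.
Pose2 applyIncrement(const Pose2& previousPose, const Pose2& increment) {
    return compose(previousPose, increment);
}
```

Calling `applyIncrement(prev, toIncrement(prev, estimated))` gives back `estimated` up to floating-point error, which is why the accumulated odometry pose stays global even though only an increment is passed to the generic update.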
Thanks @matlabbe! I thought that the matching was done only between the present frame and the one before (hence the increment), not between the 3D local feature map and the present frame. Thanks for clarifying! |
Hello |
Hi, When scanning large environments, decrease the point cloud density during mapping to reduce the rendering load (and save battery). Note that even if you decrease the point cloud density, rtabmap still records full-resolution depth images, so high-resolution point clouds can be generated offline afterwards. Outdoors, increase the Max depth range to better see what is scanned. |
Hi @matlabbe, is there any information available about CUDA support for cloud-based mapping? Are there any known efforts? I can see some CUDA support for RGB with OpenCV in the FAQ section. Is there anything related to PCL? Or as an alternative, any document available of parameter tuning for faster point cloud based mapping and processing? |
For PCL, you may check/ask on their GitHub: https://github.com/PointCloudLibrary/pcl. It seems they have some algorithms ported to CUDA: https://github.com/PointCloudLibrary/pcl/tree/master/cuda, but rtabmap doesn't use them. RTAB-Map uses PCL for ICP-based VO / loop closure and for the 3D local occupancy grid, which requires voxel filtering and/or normals estimation. In post-processing, it uses PCL for meshing and texture mapping. To answer your question: "any document available of parameter tuning for faster point cloud based mapping and processing ?", which part exactly do you want to speed up? |
@matlabbe Thanks for your prompt answer. Let me give you some background. We are trying to improve the overall RTAB-Map SLAM processing speed, using point clouds and external odometry as inputs. In our code analysis, we found that RTAB-Map does cluster extraction and segmentation in local grid mapping. We are trying to use the octree-based CUDA implementation from PCL in RTAB-Map and are measuring execution time in order to improve overall throughput. We are using external odometry, so ICP VO estimation won't come into play in the execution time. With "any document available of parameter tuning for faster point cloud based mapping and processing ?", I meant any document that analyzes the impact of parameter tuning (e.g. memory/Grid/optimizer parameters) on overall mapping speed, occupancy grid generation, and loop closure. By using both approaches together, our aim is to accelerate RTAB-Map on NVIDIA GPUs. Let us know your feedback on this approach. If you think we are missing something in our analysis, let us know. |
I haven't used the CUDA part of PCL in several years. Not sure what updates they made later. In the past two years, I have tried NVIDIA's cuPCL, which includes implementations of ICP, NDT, Octree, etc. It supports x86 and Jetson platforms. The only problem is that it is not open source, but provides pre-compiled dynamic link libraries for different platforms, so integrating it into RTAB-Map may be bloated. |
@borongyuan That's a whole new perspective and approach you have mentioned. Great to know about the use of OpenVDB and VDBFusion in PCL processing. I agree with your point about cuPCL, hence we are relying on the PCL/CUDA implementations. |
That sounds like a good idea! We could handle PCL-CUDA like we do with OpenCV CUDA: detect whether PCL's CUDA module is available, then enable the related parameters to use the GPU version of some of the filtering algorithms.
In section 5 of that paper, we benchmarked the different local and global occupancy grid approaches provided in rtabmap, though not with extensive or detailed results for every part of the chain (like time for clustering / downsampling / voxel filtering / normal estimation, ...).
Well, it depends on what you want to update at this rate. For global maps, I don't think we need super dense point clouds that are processed super fast, unlike local occupancy/voxel grids for obstacle avoidance. The bottleneck I see with the current occupancy grid is not really the time to create local grids (that time could be improved with some of PCL's CUDA implementations, but it is constant), but the time to update the global occupancy grid map after a loop closure. With RTAB-Map's memory management disabled, these updates can create spikes over the real-time limit when continuously doing SLAM for a long time, as shown in Figure 18 of that paper (note that in that figure the local grids were 2D, so with 3D local grids and using OctoMap, the "Global Assembling Time" would have increased a lot faster).
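To illustrate why the global update grows while the local grid time stays constant, here is a rough sketch. It assumes that a loop closure changes every optimized pose, so that the global map is rebuilt from all stored local grids; `GlobalMapSketch`, `LocalGrid` and `Point2` are hypothetical names, poses are reduced to 2D translations, and none of this is RTAB-Map's actual implementation.

```cpp
#include <cstddef>
#include <vector>

struct Point2 {
    double x;
    double y;
};

// Hypothetical local grid: obstacle cells in the node's own frame, plus the
// node's pose (translation only, to keep the sketch short).
struct LocalGrid {
    Point2 origin;
    std::vector<Point2> cells;
};

class GlobalMapSketch {
public:
    // Normal update: only the newest local grid is projected into the map.
    // Returns the number of cells processed (bounded by the local grid size).
    std::size_t addNode(const LocalGrid& grid) {
        grids_.push_back(grid);
        return project(grid);
    }

    // Assumed loop-closure update: poses changed, so the map is cleared and
    // every stored local grid is projected again. The cost grows with map size.
    std::size_t reassemble(const std::vector<Point2>& optimizedOrigins) {
        global_.clear();
        std::size_t processed = 0;
        for (std::size_t i = 0; i < grids_.size(); ++i) {
            if (i < optimizedOrigins.size()) {
                grids_[i].origin = optimizedOrigins[i];
            }
            processed += project(grids_[i]);
        }
        return processed;
    }

    std::size_t size() const { return global_.size(); }

private:
    std::size_t project(const LocalGrid& grid) {
        for (const Point2& c : grid.cells) {
            global_.push_back({grid.origin.x + c.x, grid.origin.y + c.y});
        }
        return grid.cells.size();
    }

    std::vector<LocalGrid> grids_;
    std::vector<Point2> global_;
};
```

In this toy model, each `addNode` call touches only the cells of one local grid, while `reassemble` touches the cells of every grid added so far, which is the kind of growth that would show up as spikes in long sessions.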
Currently we have GPU options that are more related to 2D features (with OpenCV CUDA, more to come in that PR), not to point cloud processing. @borongyuan cuPCL looks great for Jetson optimization, though the offering seems similar to what is already in PCL (which seems easier to integrate, as rtabmap already uses PCL a lot). I just stumbled on this page, where the author tried OpenVDB on the TUM RGBD dataset. That could give an idea of how to use the library with similar sensors. Maybe another alternative: https://github.com/facontidavide/Bonxai cheers, |
There is an Octree implementation in PCL's gpu module. But I don't see any ICP or NDT related parts in the cuda and gpu modules. The way cuPCL is provided is indeed not very friendly. |
@matlabbe @borongyuan Thanks for all the responses. Our detailed STM timing analysis (like time for clustering / downsampling / voxel filtering / normal estimation, ...) has shown that a large part of the time is taken by the search algorithm used for segmentation and clustering. We used radius search (the PCL GPU implementation) in the octree-based implementation. This thread #1045 (comment) describes a new way of optimizing the segmentation process without a search algorithm, specifically without computationally heavy functions like radius search or KNN. So far we have been working on lidar-based SLAM only, so we haven't used any of the visual libraries. Our focus is only on PCL-based optimization. |
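To make it concrete where the search time goes, here is a minimal sketch of Euclidean clustering by region growing, with a deliberately naive brute-force radius search. It is not PCL's implementation and not the GPU octree version mentioned above; `Point3`, `radiusSearch` and `euclideanClusters` are hypothetical names chosen for the illustration.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

struct Point3 {
    double x;
    double y;
    double z;
};

// Brute-force radius search: indices of all points within `radius` of
// points[query] (the query point itself included). O(n) per query.
std::vector<std::size_t> radiusSearch(const std::vector<Point3>& points,
                                      std::size_t query, double radius) {
    std::vector<std::size_t> neighbors;
    const double r2 = radius * radius;
    for (std::size_t i = 0; i < points.size(); ++i) {
        const double dx = points[i].x - points[query].x;
        const double dy = points[i].y - points[query].y;
        const double dz = points[i].z - points[query].z;
        if (dx * dx + dy * dy + dz * dz <= r2) {
            neighbors.push_back(i);
        }
    }
    return neighbors;
}

// Region growing: every point is used as a search query exactly once, so the
// radius search dominates the run time. Returns one cluster label per point.
std::vector<int> euclideanClusters(const std::vector<Point3>& points,
                                   double radius) {
    std::vector<int> labels(points.size(), -1);
    int nextLabel = 0;
    for (std::size_t seed = 0; seed < points.size(); ++seed) {
        if (labels[seed] != -1) continue;  // already assigned to a cluster
        std::queue<std::size_t> frontier;
        frontier.push(seed);
        labels[seed] = nextLabel;
        while (!frontier.empty()) {
            const std::size_t current = frontier.front();
            frontier.pop();
            for (std::size_t n : radiusSearch(points, current, radius)) {
                if (labels[n] == -1) {
                    labels[n] = nextLabel;
                    frontier.push(n);
                }
            }
        }
        ++nextLabel;
    }
    return labels;
}
```

With one radius query per point, the brute-force version is quadratic in the number of points; a k-d tree or octree lowers the cost per query, but the number of queries stays the same, which is consistent with search dominating the segmentation time on dense scans.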
Those optimizations can be great for reducing the "Local Occupancy Grid" time, in particular with sensors generating a lot of points at long range (e.g., an OS2-128 lidar). Another part of the STM time is compressing the data to save it to the database; the other day I opened an issue with a possible improvement: #1334 (nvCOMP) |
I'm specifically curious if re-processing datasets will be faster.
Are there Word Types that are/aren't CUDA accelerated?