Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deskewing...again #421

Merged
merged 3 commits into from
Jan 9, 2025
Merged

Deskewing...again #421

merged 3 commits into from
Jan 9, 2025

Conversation

tizianoGuadagnino
Copy link
Collaborator

@tizianoGuadagnino tizianoGuadagnino commented Jan 9, 2025

Motivation

Since the beginning of time we decided to deskew the input scans at the middle of the scan duration (0.5), the motivation for this was given in #299, and it is an heritage of our initial development based on KITTI (everybody has his dark little secrets). This PR aim at removing the magic parameter 0.5, and deskew the scans in a more principled way.

This PR

Our final solution is to first deskew the scans at the beginning of the scan duration. We made this choice as there is no easy way to know if the a scan has been stamped at the beginning or the end of the scan collection, especially from a python perspective. We go for the beginning as this is the most common, and also the "right" way of timestamping. We then transform the resulting point cloud in the local frame of the scan after applying the constant velocity initial guess, basically moving the scan to the end of the predicted motion. This new deskewed frame is than passed to the successive steps of the pipeline.

Results

On all tested sequences there is no significant difference between deskeing at 0.5 and the current version, so this can be seen as a style change and a way of having things in the right reference frame. This might be more relevant for low frequency scanners than for typical LiDARS running at 10Hz.

@benemer
Copy link
Member

benemer commented Jan 9, 2025

Some more insights on this:

We always thought it did not matter which timestamp of the scan duration we set the reference for deskewing; see #299. However, there is one thing we always missed: In our current implementation, it does mar at which timestamp we estimate the pose. If we set the deskewing reference timestamp to 0, we effectively estimate the pose at the beginning of the scan duration; with 1.0, we estimate the pose at the end of the scan duration, and with 0.5 (default KISS before this PR), we estimate the pose at the middle of the scan duration. Since the KITTI ground truth reports the pose at the middle of the scan duration, we chose 0.5 as explained in #299.

Why does it matter, and why do we want to change KISS to estimate the pose at the end of the scan? Because we use the same constant velocity estimate last_delta_ for the initial guess and for deskewing. Now, if you estimate the pose at the beginning of scan i, our last_delta_ describes the motion from the beginning of scan i-2 to the beginning of scan i-1, and you use it for deskewing from the beginning of scan i to the end of scan i. This means we assume a constant motion from the beginning of scan i-2 to the end of scan i, which is three scan durations (!).

Let's do the same by estimating the pose at the end of scan i (this PR). Our constant velocity estimate is the motion between the end of scan i-2 and the end of scan i-1; therefore, we assume a constant motion for only two scan durations (the end of scan i-2, which is the beginning of scan i-1, and the end of scan i).

In the case of non-linear motion (surprise, this is reality), this is a big deal. We observed that if we estimate the pose at the beginning (by setting mid_pose_timestamp = 0.0), the odometry is much worse when the vehicle accelerates. As explained above, our constant velocity estimate was "too old" for deskewing and a worse approximation than taking it from the estimated poses at the end of the scans.

Estimating the pose at the middle of the scan instead of at the end will yield better results. The constant velocity model is more accurate for deskewing.

TL;DR

When estimating the poses at the end of the scan duration, the constant velocity estimate we also use for deskewing is more accurate because it is closer in time to the actual motion during the current scan.

@tizianoGuadagnino tizianoGuadagnino merged commit 361f2ce into main Jan 9, 2025
19 checks passed
@tizianoGuadagnino tizianoGuadagnino deleted the tiz_and_ben/deskew branch January 9, 2025 15:41
@niosus
Copy link
Member

niosus commented Jan 21, 2025

Hey folks, I've been confused by these topics for a while now and still don't fully understand it 😅 So I wanted to take this chance to wrestle with these ideas with you a bit. Do you mind? 😉

@benemer I think I largely agree with what you wrote especially about the time difference between various scans under various assumptions. However, it seems that the code in this PR could be implemented a bit simpler if I understand it correctly by essentially picking mid_pose_stamp == 1.0. 🤔 See the write-up below.

Definition of motion compensation

First, the setup. Let's say we get need to compensate motion between two scans taken at times $t_0 \ldots t_1$ and $t_1 \ldots t_2$. The scans of course happen continuously, so every point $p_t$ originates from some pose at time $t \in [t_0, t_2]$.

We can denote these poses for every point as $T^w_t$, where $w$ stands for some "world" frame. The outcome of point cloud motion compensation for, say, the first of our LiDAR scans (taken at times $t_0 \ldots t_1$) would be a point cloud $\{p_t\}$ such that all of the $p_t$ points in this point cloud are represented in a single common coordinate system that corresponds to the pose of the robot at a chosen time $t_\mathrm{ref} \in [t_0, t_1]$ within this scan: $T^w_{t_\mathrm{ref}}$. In the code before this PR the reference point was a midpoint of the point cloud, $t_\mathrm{ref} = t_{0.5}$.

Motion compensation algorithm

Without loss of generality let's focus on motion compensating the first scan taken at times $t \in [t_0, t_1]$.

  1. We first select a reference time: $t_\mathrm{ref} \in [t_0, t_1]$
  2. For each point $p_t$ at times $t \in [t_0, t_1]$ we then estimate a transformation to apply to this point: $T^{t_\mathrm{ref}}_{t}$
  3. We apply these transforms: $p_{t_\mathrm{ref}} = T^{t_\mathrm{ref}}_{t} p_t$
  4. This produces a point cloud with all points $\{p_{t_\mathrm{ref}}\}$ stemming from the same coordinate system with the pose: $T^w_{t_\mathrm{ref}}$

It seems there is a couple of options of how to pick the reference frame and then how to compute the resulting per-point transformation.

Reference point selection

Before this PR, we chose the timestamp of the midpoint of the cloud as a reference: $t_\mathrm{ref} = t_{0.5}$ for the scan taken at times $t_0 \ldots t_1$. We can of course also pick $t_0$ or $t_1$ as a reference point too. In this PR we argue that we pick $t_0$ but then we multiply the motion compensated point by $T^{t_1}_{t_0}$ so I claim that essentially we choose the $t_\mathrm{ref} = t_1$. More on this below.

Reference transform computation

We can compute the reference transformation for a point at time $t$ by scaling the original relative transformation $T^{t_0}_{t_1}$:

$$\begin{aligned} \omega_t = \frac{t - t_\mathrm{ref}}{t_1 - t_0} \\\ T^{t_\mathrm{ref}}_{t} = \omega_t \odot T^{t_0}_{t_1}, \end{aligned}$$

Where the $\odot$ operation is either SLERP or the lie algebra way you guys use here:

Sophus::SE3d::exp(stamp_fraction * t_0_T_t_1.log()),

that essentially interpolates the transform $T^{t_0}_{t_1}$ by a given ratio $\omega_t$.

In this PR we essentially assume $t_\mathrm{ref} = t_0$, subsequently multiplying the resulting pose with an inverse motion transform $T^{t_1}_{t_0}$ thus putting the resulting point into the frame of reference of the robot at time $t_1$:

$$\begin{aligned} \omega_t &=& \frac{t - t_0}{t_1 - t_0} \\\ T^{t_1}_{t} &=& T^{t_1}_{t_0} (\omega_t \odot T^{t_0}_{t_1}) \\\ p_{t_1} &=& T^{t_1}_{t} p_t \\\ &=& T^{t_1}_{t_0} T^{t_0}_{t} p_t. \end{aligned}$$

We can see here that $T^{t_1}_{t_0} T^{t_0}_t$ is essentially $T^{t_1}_{t_0}$ , so the code in this PR seems to be essentially the same as picking the $t_\mathrm{ref} = t_1$. So I believe this PR does not need to apply another new matrix multiplication and can just choose the mid_pose_stamp == 1 🤔

What do you guys think? This topic breaks my head for a long time and in my own experiments with my own code (similar but not the same as KISS-ICP) indicate that the best performance is achieved with $t_\mathrm{ref} = t_{0.5}$, choosing $t_\mathrm{ref} = t_1$ makes a lot of sense but produces slightly worse results, while choosing $t_\mathrm{ref} = t_0$ leads to a completely divergent solutions and I still don't fully understand why.

That is why I was super stoked to see this PR of yours 😆 I'd like to figure this out once and for all.

@benemer
Copy link
Member

benemer commented Jan 21, 2025

Hey Igor,

First of all, thanks for the write-up, this is a perfect future reference if someone wants to understand what's happening! I am happy to hear that we are not the only ones struggling/dealing with de-skewing 😆

You are right, this PR does the same as setting mid_pose_stamp = 1.0 without the inverse multiplication from the left. If it becomes more clear, we can change it to

diff --git a/cpp/kiss_icp/core/Preprocessing.cpp b/cpp/kiss_icp/core/Preprocessing.cpp
index aacf0f2..e9911cd 100644
--- a/cpp/kiss_icp/core/Preprocessing.cpp
+++ b/cpp/kiss_icp/core/Preprocessing.cpp
@@ -60,7 +60,6 @@ std::vector<Eigen::Vector3d> Preprocessor::Preprocess(const std::vector<Eigen::V
             return frame;
         } else {
             const auto &omega = relative_motion.log();
-            const Sophus::SE3d &inverse_motion = relative_motion.inverse();
             std::vector<Eigen::Vector3d> deskewed_frame(frame.size());
             tbb::parallel_for(
                 // Index Range
@@ -70,7 +69,7 @@ std::vector<Eigen::Vector3d> Preprocessor::Preprocess(const std::vector<Eigen::V
                     for (size_t idx = r.begin(); idx < r.end(); ++idx) {
                         const auto &point = frame.at(idx);
                         const auto &stamp = timestamps.at(idx);
-                        const auto pose = inverse_motion * Sophus::SE3d::exp(stamp * omega);
+                        const auto pose = Sophus::SE3d::exp((stamp - 1.0) * omega);
                         deskewed_frame.at(idx) = pose * point;
                     };
                 });

I also like this more. We did split it to avoid confusion about the subtraction of -1, but your feedback already tells us that this is more readable. Thanks!

Regarding your different results with $t_{ref}=t_{0.5}$ and $t_{ref}=t_1$: How do you evaluate the odometry estimation?

For example, we also experienced better results on the KITTI Odometry benchmark we used back when using $t_{ref}=t_{0.5}$. We believe this comes from the KITTI ground truth containing the poses $T^w_{t_{0.5}}$ such that you achieve the best metrics when estimating exactly those poses.

In general, I expect to get the best results when using $t_{ref} =t_1 $, this will ensure that the constant velocity motion used for de-skewing is as recent as possible.

@niosus
Copy link
Member

niosus commented Jan 21, 2025

Thanks for a quick reply @benemer 🙏

My comment was not a prompt to do force any changes upon you but more of a post to make sure I understand what you did and how it aligns with how I understand things. I'm happy that we agree that they are the same but you guys decide if you want to change the code from this point on, I do not pressure you in any direction. 😅

How do you evaluate the odometry estimation?

Mostly in a very stupid way. The falls seem less "fluffy" when I use midpoint than the end point. And, as I mentioned before, odometry that assumes constant velocity and predicts the point clouds forward with the same motion plainly diverges if the reference point is at the start of the scan 🤷 I don't fully understand why though.

@benemer
Copy link
Member

benemer commented Jan 21, 2025

And, as I mentioned before, odometry that assumes constant velocity and predicts the point clouds forward with the same motion plainly diverges if the reference point is at the start of the scan

This is my insight on this: If your reference point is at the start of the scan, you estimate the pose $T^{w}_{t_0}$. In this case, our constant velocity motion (last_delta) will be the delta between the last two estimated poses, therefore $T^{t_{-2}}_{t_{-1}}$, since $T^{w}_{t_0}$ is not yet available. If we use this last_delta for de-skewing the current frame, we implicitly approximate the current motion between $t_0$ and $t_1$ as $T^{t_0}_{t_1}</code> \approx <code>T^{t_{-2}}_{t_{-1}}$.

Let's now compare with having the reference point at the end of the scan: We estimate the pose $T^{w}_{t_1}$ and our delta_motion will be $T^{t_{-1}}_{t_{0}}$, leading to the approximation $T^{t_0}_{t_1}</code> \approx <code>T^{t_{-1}}_{t_{0}}$.

If the sensor moves nonlinearly, the second case provides a better approximation because we use a more recent estimate of its motion.

In other words, we always need an estimate of our recent motion for de-skewing, so it is better to have an estimate of where the last scan ended ($t_{ref}=t_1$) instead of knowing where the last scan began ($t_{ref}=t_0$).

@niosus
Copy link
Member

niosus commented Jan 21, 2025

This totally makes sense from a cursory look. Thanks! I'll need to sleep on this a bit to make sure I fully understand though 😅

@benemer benemer mentioned this pull request Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants