# Evaluation

The inference time of LightGlue-ONNX is compared to that of the original PyTorch implementation with adaptive configuration and FlashAttention.

## Methods

Following the implementation details of the LightGlue paper, we report the inference time, or latency, of the LightGlue matcher alone; that is, the time taken for feature extraction, postprocessing, copying data between the host and device, or finding inliers (e.g., RANSAC/MAGSAC) is not measured. The reported latency is the median over all samples in the MegaDepth test dataset. We use the test data provided by LoFTR, a total of 403 image pairs.
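
As a rough illustration of this protocol (see eval.py for the actual measurement code), the sketch below times only the matcher call, synchronizing the GPU around it and taking the median over all pairs. The `matcher` call signature and the `feats0`/`feats1` inputs are placeholder assumptions, not the exact interface used in eval.py.

```python
import numpy as np
import torch

def measure_matcher_latency(matcher, pairs):
    """Median latency (ms) of the matcher alone over precomputed feature
    pairs; extraction, postprocessing, and host-device copies are excluded.
    (Hypothetical sketch of the measurement protocol.)"""
    timings = []
    for feats0, feats1 in pairs:
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        with torch.inference_mode():
            matcher({"image0": feats0, "image1": feats1})  # assumed call signature
        end.record()
        torch.cuda.synchronize()
        timings.append(start.elapsed_time(end))  # elapsed time in milliseconds
    return float(np.median(timings))
```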

Each image is resized such that its longer side is 1024 pixels before being fed into the SuperPoint feature extractor. The latency of the LightGlue matcher is then measured for different numbers of keypoints: 512, 1024, 2048, and 4096. See eval.py for the measurement code.
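
A minimal sketch of this preprocessing, assuming an OpenCV image and a SuperPoint-style extractor whose keypoint budget can be capped (the `SuperPoint(max_num_keypoints=...)` interface here is an assumption):

```python
import cv2

def resize_longer_side(image, target=1024):
    """Resize so that the longer side equals `target`, preserving aspect ratio."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    return cv2.resize(image, (round(w * scale), round(h * scale)))

# Hypothetical sweep over the keypoint budgets used in the evaluation:
# for max_kpts in (512, 1024, 2048, 4096):
#     extractor = SuperPoint(max_num_keypoints=max_kpts)  # assumed interface
#     ...
```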

All experiments are conducted on an i9-12900HX CPU and an RTX 4080 12GB GPU with CUDA==11.8.1, TensorRT==8.6.1, torch==2.1.0, and onnxruntime==1.16.0.

## Results

The measured latencies are plotted in the figure below as image pairs per second.

*Figure: Latency Comparison*

**Latency (ms) by number of keypoints**

| Model | 512 | 1024 | 2048 | 4096 |
|:---|---:|---:|---:|---:|
| PyTorch (Adaptive) | 12.81 | 13.65 | 16.49 | 24.35 |
| ORT Fused FP32 | 9.52 | 14.90 | 36.21 | 97.37 |
| ORT Fused FP16 | 7.48 | 9.06 | 12.99 | 28.97 |
| TensorRT FP16 | 7.11 | 7.56 | 10.81 | 24.46 |

In general, the fused ORT models can match the speed of the adaptive PyTorch model despite being non-adaptive (running through all attention layers). The PyTorch model provides more consistent latencies across the board, whereas the fused ORT models become slower at higher keypoint counts due to a bottleneck in the NonZero operator. The TensorRT Execution Provider, on the other hand, can reach very low latencies, but its performance is less consistent and harder to predict.
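
For context, the TensorRT FP16 row comes from running the ONNX model through onnxruntime's TensorRT Execution Provider rather than a standalone TensorRT engine. A minimal session setup might look like the sketch below; the model path is an assumption, the FP16 flag mirrors the "TensorRT FP16" configuration, and the CUDA and CPU providers act as fallbacks for operators TensorRT cannot handle.

```python
import onnxruntime as ort

# Hypothetical model path; provider order determines fallback priority.
session = ort.InferenceSession(
    "weights/superpoint_lightglue.onnx",  # assumed path
    providers=[
        ("TensorrtExecutionProvider", {"trt_fp16_enable": True}),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
```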