Selective Mechanism for Feature Extraction #1620

Closed
2 tasks done
emirhanbayar opened this issue Sep 11, 2024 · 38 comments
Labels
enhancement New feature or request Stale

Comments

@emirhanbayar

Search before asking

  • I have searched the Yolov8 Tracking issues and found no similar enhancement requests.

Description

https://arxiv.org/abs/2409.06617

The mechanism described in the paper above decides on the fly which detections need feature extraction and skips the rest, avoiding unnecessary extractions. This increases FPS without sacrificing accuracy.

It can be applied to any tracker in this repo, but it has only been tested on strongsort and deepocsort. Adding it to an existing tracker takes only a few modified lines. I can apply it to strongsort and add a new tracker called Fast-StrongSORT. We could also add it to all trackers behind a command-line argument that activates it.
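As a rough illustration of the decision described above, here is a minimal sketch in plain Python. It follows the DeepOCSORT variant of the check posted later in this thread (aspect ratio similarity defined as one minus the normalised arctan difference). The function names `iou`, `aspect_ratio_similarity`, and `needs_reid` are hypothetical helpers for this example, not part of the repository, and the default thresholds are simply the values used in the posted code.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def aspect_ratio_similarity(a, b):
    """1.0 for identical aspect ratios, smaller as they diverge (assumed form)."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    diff = math.atan(wa / ha) - math.atan(wb / hb)
    return 1.0 - (4.0 / math.pi ** 2) * diff ** 2


def needs_reid(det, track_boxes, iou_thr=0.2, ars_thr=0.6):
    """Return (needs_extraction, index of the reusable track or None)."""
    overlaps = [(i, iou(det, t)) for i, t in enumerate(track_boxes)]
    candidates = [(i, o) for i, o in overlaps if o > iou_thr]
    if len(candidates) != 1:
        # No overlapping track, or several competing ones: extract features.
        return True, None
    idx, overlap = candidates[0]
    v = aspect_ratio_similarity(det, track_boxes[idx])
    alpha = v / ((1.0 - overlap) + v)
    if alpha > ars_thr:
        # Single, well-matching candidate: reuse that track's feature.
        return False, idx
    return True, None
```

For example, `needs_reid((1, 0, 11, 20), [(0, 0, 10, 20), (100, 100, 110, 120)])` finds exactly one strongly overlapping track and reports that extraction can be skipped, while a detection overlapping two tracks is always sent to the ReID model.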

Use case

This enhancement addresses the exact problem raised in #1595.

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@emirhanbayar emirhanbayar added the enhancement New feature or request label Sep 11, 2024
@mikel-brostrom
Owner

mikel-brostrom commented Sep 11, 2024

This sounds awesome. Looking forward to the PRs! 🚀 I guess this implies a modification of this code snippet:

@torch.no_grad()
def get_features(self, xyxys, img):
    if xyxys.size != 0:
        crops = self.get_crops(xyxys, img)
        crops = self.inference_preprocess(crops)
        features = self.forward(crops)
        features = self.inference_postprocess(features)
    else:
        features = np.array([])
    features = features / np.linalg.norm(features)
    return features

and an argument for whether to apply it or not

@emirhanbayar
Author

emirhanbayar commented Sep 11, 2024

Actually, we need existing tracks to perform this algorithm. I implemented it by changing "boxmot/trackers/strongsort/strong_sort.py" and "boxmot/trackers/strongsort/sort/track.py" as follows:

boxmot/trackers/strongsort/strong_sort.py

# Mikel Broström 🔥 Yolo Tracking 🧾 AGPL-3.0 license

import numpy as np

from boxmot.appearance.reid_auto_backend import ReidAutoBackend
from boxmot.motion.cmc import get_cmc_method
from boxmot.trackers.strongsort.sort.detection import Detection
from boxmot.trackers.strongsort.sort.tracker import Tracker
from boxmot.utils.matching import NearestNeighborDistanceMetric
from boxmot.utils.ops import xyxy2tlwh
from boxmot.utils import PerClassDecorator
from boxmot.utils.iou import iou_batch

class StrongSORT(object):
    def __init__(
        self,
        model_weights,
        device,
        fp16,
        per_class=False,
        max_dist=0.2,
        max_iou_dist=0.7,
        max_age=30,
        n_init=1,
        nn_budget=100,
        mc_lambda=0.995,
        ema_alpha=0.9,
        iou_threshold=0.2,
        ars_threshold=0.6,
    ):
        self.per_class = per_class
        self.model = ReidAutoBackend(
            weights=model_weights, device=device, half=fp16
        ).model

        self.tracker = Tracker(
            metric=NearestNeighborDistanceMetric("cosine", max_dist, nn_budget),
            max_iou_dist=max_iou_dist,
            max_age=max_age,
            n_init=n_init,
            mc_lambda=mc_lambda,
            ema_alpha=ema_alpha,
        )
        self.cmc = get_cmc_method('ecc')()
        self.iou_threshold = iou_threshold
        self.ars_threshold = ars_threshold
        self.last_feature_extractions = 0

    def aspect_ratio_similarity(self, box1, box2):
        w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
        w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
        aspect_ratio1 = w1 / h1
        aspect_ratio2 = w2 / h2
        return 4 / (np.pi ** 2) * (np.arctan(aspect_ratio1) - np.arctan(aspect_ratio2)) ** 2

    @PerClassDecorator
    def update(self, dets: np.ndarray, img: np.ndarray, embs: np.ndarray = None) -> np.ndarray:
        assert isinstance(
            dets, np.ndarray
        ), f"Unsupported 'dets' input format '{type(dets)}', valid format is np.ndarray"
        assert isinstance(
            img, np.ndarray
        ), f"Unsupported 'img' input format '{type(img)}', valid format is np.ndarray"
        assert (
            len(dets.shape) == 2
        ), "Unsupported 'dets' dimensions, valid number of dimensions is two"
        assert (
            dets.shape[1] == 6
        ), "Unsupported 'dets' 2nd dimension length, valid length is 6"

        xyxy = dets[:, 0:4]
        confs = dets[:, 4]
        clss = dets[:, 5]

        if len(self.tracker.tracks) >= 1:
            warp_matrix = self.cmc.apply(img, xyxy)
            for track in self.tracker.tracks:
                track.camera_update(warp_matrix)

        # Determine which detections need feature extraction
        risky_detections = []
        non_risky_matches = {}
        for i, det in enumerate(xyxy):
            matching_tracks = []
            for track in self.tracker.tracks:
                if track.is_confirmed():
                    iou = iou_batch(det.reshape(1, -1), track.to_tlbr().reshape(1, -1))[0][0]
                    if iou > self.iou_threshold:
                        matching_tracks.append((track, iou))

            if len(matching_tracks) == 1:
                track, iou = matching_tracks[0]
                ars = self.aspect_ratio_similarity(det, track.to_tlbr())
                v = ars
                alpha = v / ((1 - iou) + v)
                if alpha > self.ars_threshold:
                    # Non-risky detection, use track's features
                    non_risky_matches[i] = track
                    continue

            # Risky detection, needs feature extraction
            risky_detections.append(i)
        
        self.last_feature_extractions = 0

        # Extract features only for risky detections
        if embs is not None:
            features = embs[risky_detections]
        else:
            features = self.model.get_features(xyxy[risky_detections], img)

        # Update the feature extraction counter
        self.last_feature_extractions = len(risky_detections)

        # Prepare detections
        tlwh = xyxy2tlwh(xyxy)
        detections = []
        for i, (box, conf, cls) in enumerate(zip(tlwh, confs, clss)):
            if i in risky_detections:
                feat = features[risky_detections.index(i)]
            else:
                # For non-risky detections, use the matching track's features
                feat = non_risky_matches[i].features[-1]  # Use the latest feature from the matching track
            detections.append(Detection(box, conf, cls, i, feat))

        # Update tracker
        self.tracker.predict()
        self.tracker.update(detections)

        # Output bbox identities
        outputs = []
        for track in self.tracker.tracks:
            if not track.is_confirmed():
                continue

            x1, y1, x2, y2 = track.to_tlbr()

            id = track.id
            conf = track.conf
            cls = track.cls
            det_ind = track.det_ind

            outputs.append(
                np.concatenate(([x1, y1, x2, y2], [id], [conf], [cls], [det_ind])).reshape(1, -1)
            )
        if len(outputs) > 0:
            return np.concatenate(outputs)
        return np.array([])
        

boxmot/trackers/strongsort/sort/track.py


# Mikel Broström 🔥 Yolo Tracking 🧾 AGPL-3.0 license

import numpy as np

from boxmot.motion.kalman_filters.xyah_kf import KalmanFilterXYAH


class TrackState:
    """
    Enumeration type for the single target track state. Newly created tracks are
    classified as `tentative` until enough evidence has been collected. Then,
    the track state is changed to `confirmed`. Tracks that are no longer alive
    are classified as `deleted` to mark them for removal from the set of active
    tracks.

    """

    Tentative = 1
    Confirmed = 2
    Deleted = 3


class Track:
    """
    A single target track with state space `(x, y, a, h)` and associated
    velocities, where `(x, y)` is the center of the bounding box, `a` is the
    aspect ratio and `h` is the height.

    Parameters
    ----------
    mean : ndarray
        Mean vector of the initial state distribution.
    covariance : ndarray
        Covariance matrix of the initial state distribution.
    track_id : int
        A unique track identifier.
    n_init : int
        Number of consecutive detections before the track is confirmed. The
        track state is set to `Deleted` if a miss occurs within the first
        `n_init` frames.
    max_age : int
        The maximum number of consecutive misses before the track state is
        set to `Deleted`.
    feature : Optional[ndarray]
        Feature vector of the detection this track originates from. If not None,
        this feature is added to the `features` cache.

    Attributes
    ----------
    mean : ndarray
        Mean vector of the initial state distribution.
    covariance : ndarray
        Covariance matrix of the initial state distribution.
    track_id : int
        A unique track identifier.
    hits : int
        Total number of measurement updates.
    age : int
        Total number of frames since first occurrence.
    time_since_update : int
        Total number of frames since last measurement update.
    state : TrackState
        The current track state.
    features : List[ndarray]
        A cache of features. On each measurement update, the associated feature
        vector is added to this list.

    """

    def __init__(
        self,
        detection,
        id,
        n_init,
        max_age,
        ema_alpha,
    ):
        self.id = id
        self.bbox = detection.to_xyah()
        self.conf = detection.conf
        self.cls = detection.cls
        self.det_ind = detection.det_ind
        self.hits = 1
        self.age = 1
        self.time_since_update = 1
        self.ema_alpha = ema_alpha
        self.alpha_prime = ema_alpha  # Initialize alpha_prime for feature decay

        self.state = TrackState.Confirmed
        self.features = []
        if detection.feat is not None:
            detection.feat /= np.linalg.norm(detection.feat)
            self.features.append(detection.feat)

        self._n_init = n_init
        self._max_age = max_age

        self.kf = KalmanFilterXYAH()
        self.mean, self.covariance = self.kf.initiate(self.bbox)

    def to_tlwh(self):
        """Get current position in bounding box format `(top left x, top left y,
        width, height)`.

        Returns
        -------
        ndarray
            The bounding box.

        """
        ret = self.mean[:4].copy()
        ret[2] *= ret[3]
        ret[:2] -= ret[2:] / 2
        return ret

    def to_tlbr(self):
        """Get kf estimated current position in bounding box format `(min x, min y, max x,
        max y)`.

        Returns
        -------
        ndarray
            The predicted kf bounding box.

        """
        ret = self.to_tlwh()
        ret[2:] = ret[:2] + ret[2:]
        return ret

    def camera_update(self, warp_matrix):
        [a, b] = warp_matrix
        warp_matrix = np.array([a, b, [0, 0, 1]])
        warp_matrix = warp_matrix.tolist()
        x1, y1, x2, y2 = self.to_tlbr()
        x1_, y1_, _ = warp_matrix @ np.array([x1, y1, 1]).T
        x2_, y2_, _ = warp_matrix @ np.array([x2, y2, 1]).T
        w, h = x2_ - x1_, y2_ - y1_
        cx, cy = x1_ + w / 2, y1_ + h / 2
        self.mean[:4] = [cx, cy, w / h, h]

    def increment_age(self):
        self.age += 1
        self.time_since_update += 1

    def predict(self):
        """Propagate the state distribution to the current time step using a
        Kalman filter prediction step.
        """
        self.mean, self.covariance = self.kf.predict(self.mean, self.covariance)
        self.age += 1
        self.time_since_update += 1

        # Implement feature decay
        self.alpha_prime *= self.ema_alpha

    def update(self, detection):
        """Perform Kalman filter measurement update step and update the feature
        cache.
        Parameters
        ----------
        detection : Detection
            The associated detection.
        """
        self.bbox = detection.to_xyah()
        self.conf = detection.conf
        self.cls = detection.cls
        self.det_ind = detection.det_ind
        self.mean, self.covariance = self.kf.update(
            self.mean, self.covariance, self.bbox, self.conf
        )

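        # Implement feature decay
        if detection.feat is not None:
            # Normalize only when a feature is present (avoids dividing None)
            feature = detection.feat / np.linalg.norm(detection.feat)
            # Fall back to the new feature if the cache is still empty
            prev_feat = self.features[-1] if self.features else feature
            smooth_feat = (
                self.alpha_prime * prev_feat + (1 - self.alpha_prime) * feature
            )
            smooth_feat /= np.linalg.norm(smooth_feat)
            self.features = [smooth_feat]
            self.alpha_prime = self.ema_alpha  # Reset alpha_prime after feature update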

        self.hits += 1
        self.time_since_update = 0
        if self.state == TrackState.Tentative and self.hits >= self._n_init:
            self.state = TrackState.Confirmed

    def mark_missed(self):
        """Mark this track as missed (no association at the current time step)."""
        if self.state == TrackState.Tentative:
            self.state = TrackState.Deleted
        elif self.time_since_update > self._max_age:
            self.state = TrackState.Deleted

    def is_tentative(self):
        """Returns True if this track is tentative (unconfirmed)."""
        return self.state == TrackState.Tentative

    def is_confirmed(self):
        """Returns True if this track is confirmed."""
        return self.state == TrackState.Confirmed

    def is_deleted(self):
        """Returns True if this track is dead and should be deleted."""
        return self.state == TrackState.Deleted
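
The feature decay in `predict()` and `update()` above can be illustrated in isolation. The sketch below is a simplified stand-in, not the repository's `Track` class: `DecayingFeature` is a hypothetical name, and it only mirrors how `alpha_prime` shrinks on every `predict()` call and is reset once a new feature is blended in.

```python
import numpy as np


class DecayingFeature:
    """Hypothetical stand-in that keeps only the appearance state of a track."""

    def __init__(self, feat, ema_alpha=0.9):
        feat = np.asarray(feat, dtype=float)
        self.ema_alpha = ema_alpha
        self.alpha_prime = ema_alpha
        self.feat = feat / np.linalg.norm(feat)

    def predict(self):
        # Every predicted frame lowers the weight given to the stored feature.
        self.alpha_prime *= self.ema_alpha

    def update(self, new_feat):
        new_feat = np.asarray(new_feat, dtype=float)
        new_feat = new_feat / np.linalg.norm(new_feat)
        mixed = self.alpha_prime * self.feat + (1 - self.alpha_prime) * new_feat
        self.feat = mixed / np.linalg.norm(mixed)
        # A fresh blend restores the base smoothing factor.
        self.alpha_prime = self.ema_alpha


f = DecayingFeature([1.0, 0.0], ema_alpha=0.9)
for _ in range(3):
    f.predict()              # weight on the old feature is now 0.9 ** 4
f.update([0.0, 1.0])         # the new feature gets weight 1 - 0.9 ** 4
```

The longer a stored feature goes without being refreshed, the more a newly extracted feature dominates the blend.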
        
        
        

@mikel-brostrom
Owner

mikel-brostrom commented Sep 11, 2024

To make this available for all trackers I will need to make major modifications, because data from different sources (feature history in the KFs, the get_features function from reid_auto_backend, detections from the tracker) needs to be centralized to enable this computation. Could you post the DeepOCSORT example here?

@emirhanbayar
Author

I tested the method on the original implementation; the reported results were obtained by running the following repository:

https://github.com/emirhanbayar/Fast-Deep-OC-SORT

Following your message, I also applied it to deepocsort in this repository and am pasting it below. However, I have not tested it thoroughly:

#Fast-Deep-OC-SORT

# Mikel Broström 🔥 Yolo Tracking 🧾 AGPL-3.0 license

import numpy as np
import torch
from pathlib import Path
from collections import deque
from typing import List, Tuple

from boxmot.appearance.reid_auto_backend import ReidAutoBackend
from boxmot.motion.cmc import get_cmc_method
from boxmot.motion.kalman_filters.xysr_kf import KalmanFilterXYSR
from boxmot.motion.kalman_filters.xywh_kf import KalmanFilterXYWH
from boxmot.utils.association import associate, linear_assignment
from boxmot.utils.iou import get_asso_func
from boxmot.trackers.basetracker import BaseTracker
from boxmot.utils.ops import xyxy2xysr


def k_previous_obs(observations, cur_age, k):
    if len(observations) == 0:
        return [-1, -1, -1, -1, -1]
    for i in range(k):
        dt = k - i
        if cur_age - dt in observations:
            return observations[cur_age - dt]
    max_age = max(observations.keys())
    return observations[max_age]


def convert_x_to_bbox(x, score=None):
    """
    Takes a bounding box in the centre form [x,y,s,r] and returns it in the form
      [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right
    """
    w = np.sqrt(x[2] * x[3])
    h = x[2] / w
    if score is None:
        return np.array([x[0] - w / 2.0, x[1] - h / 2.0, x[0] + w / 2.0, x[1] + h / 2.0]).reshape((1, 4))
    else:
        return np.array([x[0] - w / 2.0, x[1] - h / 2.0, x[0] + w / 2.0, x[1] + h / 2.0, score]).reshape((1, 5))


def speed_direction(bbox1, bbox2):
    cx1, cy1 = (bbox1[0] + bbox1[2]) / 2.0, (bbox1[1] + bbox1[3]) / 2.0
    cx2, cy2 = (bbox2[0] + bbox2[2]) / 2.0, (bbox2[1] + bbox2[3]) / 2.0
    speed = np.array([cy2 - cy1, cx2 - cx1])
    norm = np.sqrt((cy2 - cy1) ** 2 + (cx2 - cx1) ** 2) + 1e-6
    return speed / norm


class KalmanBoxTracker(object):
    """
    This class represents the internal state of individual tracked objects observed as bbox.
    """
    count = 0

    def __init__(self, det, delta_t=3, emb=None, alpha=0, max_obs=50, Q_xy_scaling = 0.01, Q_s_scaling = 0.0001):
        """
        Initialises a tracker using initial bounding box.
        """
        # define constant velocity model
        self.max_obs=max_obs
        bbox = det[0:5]
        self.conf = det[4]
        self.cls = det[5]
        self.det_ind = det[6]

        self.Q_xy_scaling = Q_xy_scaling
        self.Q_s_scaling = Q_s_scaling

        self.kf = KalmanFilterXYSR(dim_x=7, dim_z=4)
        self.kf.F = np.array(
            [
                # x  y  s  r  x' y' s'
                [1, 0, 0, 0, 1, 0, 0],
                [0, 1, 0, 0, 0, 1, 0],
                [0, 0, 1, 0, 0, 0, 1],
                [0, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 1, 0, 0],
                [0, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 1],
            ]
        )
        self.kf.H = np.array(
            [
                [1, 0, 0, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 0, 0],
                [0, 0, 0, 1, 0, 0, 0],
            ]
        )
        self.kf.R[2:, 2:] *= 10.0
        self.kf.P[4:, 4:] *= 1000.0  # give high uncertainty to the unobservable initial velocities
        self.kf.P *= 10.0
        self.kf.Q[4:6, 4:6] *= self.Q_xy_scaling
        self.kf.Q[-1, -1] *= self.Q_s_scaling

        self.bbox_to_z_func = xyxy2xysr
        self.x_to_bbox_func = convert_x_to_bbox

        self.kf.x[:4] = self.bbox_to_z_func(bbox)

        self.time_since_update = 0
        self.id = KalmanBoxTracker.count
        KalmanBoxTracker.count += 1
        self.history = deque([], maxlen=self.max_obs)
        self.hits = 0
        self.hit_streak = 0
        self.age = 0
        self.last_observation = np.array([-1, -1, -1, -1, -1])  # placeholder
        self.features = deque([], maxlen=self.max_obs)
        self.observations = dict()
        self.velocity = None
        self.delta_t = delta_t
        self.history_observations = deque([], maxlen=self.max_obs)

        self.emb = emb

        self.frozen = False

    def update(self, det):
        """
        Updates the state vector with observed bbox.
        """
        if det is not None:
            bbox = det[0:5]
            self.conf = det[4]
            self.cls = det[5]
            self.det_ind = det[6]
            self.frozen = False

            if self.last_observation.sum() >= 0:  # a previous observation exists
                previous_box = None
                for dt in range(self.delta_t, 0, -1):
                    if self.age - dt in self.observations:
                        previous_box = self.observations[self.age - dt]
                        break
                if previous_box is None:
                    previous_box = self.last_observation
                self.velocity = speed_direction(previous_box, bbox)

            self.last_observation = bbox
            self.observations[self.age] = bbox
            self.history_observations.append(bbox)

            self.time_since_update = 0
            self.hits += 1
            self.hit_streak += 1

            self.kf.update(self.bbox_to_z_func(bbox))
        else:
            self.kf.update(det)
            self.frozen = True

    def update_emb(self, emb, alpha=0.9):
        self.emb = alpha * self.emb + (1 - alpha) * emb
        self.emb /= np.linalg.norm(self.emb)

    def get_emb(self):
        return self.emb

    def apply_affine_correction(self, affine):
        m = affine[:, :2]
        t = affine[:, 2].reshape(2, 1)
        if self.last_observation.sum() > 0:
            ps = self.last_observation[:4].reshape(2, 2).T
            ps = m @ ps + t
            self.last_observation[:4] = ps.T.reshape(-1)

        for dt in range(self.delta_t, -1, -1):
            if self.age - dt in self.observations:
                ps = self.observations[self.age - dt][:4].reshape(2, 2).T
                ps = m @ ps + t
                self.observations[self.age - dt][:4] = ps.T.reshape(-1)

        self.kf.apply_affine_correction(m, t)

    def predict(self):
        """
        Advances the state vector and returns the predicted bounding box estimate.
        """
        if (self.kf.x[6] + self.kf.x[2]) <= 0:
            self.kf.x[6] *= 0.0
        Q = None

        self.kf.predict(Q=Q)
        self.age += 1
        if self.time_since_update > 0:
            self.hit_streak = 0
        self.time_since_update += 1
        self.history.append(self.x_to_bbox_func(self.kf.x))
        return self.history[-1]

    def get_state(self):
        """
        Returns the current bounding box estimate.
        """
        return self.x_to_bbox_func(self.kf.x)

    def mahalanobis(self, bbox):
        """Should be run after a predict() call for accuracy."""
        return self.kf.md_for_measurement(self.bbox_to_z_func(bbox))

    def decay_feature(self, alpha):
        """Apply feature decay."""
        self.emb *= alpha


class DeepOCSort(BaseTracker):
    def __init__(
        self,
        model_weights: Path,
        device: torch.device,
        fp16: bool,
        per_class: bool = False,
        det_thresh: float = 0.3,
        max_age: int = 30,
        min_hits: int = 3,
        iou_threshold: float = 0.3,
        delta_t: int = 3,
        asso_func: str = "iou",
        inertia: float = 0.2,
        w_association_emb: float = 0.5,
        alpha_fixed_emb: float = 0.95,
        aw_param: float = 0.5,
        embedding_off: bool = False,
        cmc_off: bool = False,
        aw_off: bool = False,
        Q_xy_scaling: float = 0.01,
        Q_s_scaling: float = 0.0001,
        # New parameters for selective feature extraction
        selective_feature_extraction: bool = True,
        iou_threshold_sfe: float = 0.2,
        ars_threshold: float = 0.6,
        feature_decay: bool = True,
        **kwargs: dict
    ):
        super().__init__(max_age=max_age, per_class=per_class)
        self.max_age = max_age
        self.min_hits = min_hits
        self.iou_threshold = iou_threshold
        self.det_thresh = det_thresh
        self.delta_t = delta_t
        self.asso_func = get_asso_func(asso_func)
        self.inertia = inertia
        self.w_association_emb = w_association_emb
        self.alpha_fixed_emb = alpha_fixed_emb
        self.aw_param = aw_param
        self.per_class = per_class
        self.Q_xy_scaling = Q_xy_scaling
        self.Q_s_scaling = Q_s_scaling
        KalmanBoxTracker.count = 1

        self.model = ReidAutoBackend(
            weights=model_weights, device=device, half=fp16
        ).model
        self.cmc = get_cmc_method('sof')()
        self.embedding_off = embedding_off
        self.cmc_off = cmc_off
        self.aw_off = aw_off

        # New attributes for selective feature extraction
        self.selective_feature_extraction = selective_feature_extraction
        self.iou_threshold_sfe = iou_threshold_sfe
        self.ars_threshold = ars_threshold
        self.feature_decay = feature_decay

    def aspect_ratio_similarity(self, bbox1: np.ndarray, bbox2: np.ndarray) -> float:
        """Calculate the aspect ratio similarity between two bounding boxes."""
        w1, h1 = bbox1[2] - bbox1[0], bbox1[3] - bbox1[1]
        w2, h2 = bbox2[2] - bbox2[0], bbox2[3] - bbox2[1]
        return 1 - (4 / (np.pi ** 2)) * (np.arctan(w1 / h1) - np.arctan(w2 / h2)) ** 2

    def calculate_alpha(self, iou: float, v: float) -> float:
        """Calculate alpha based on IoU and aspect ratio similarity."""
        return v / ((1 - iou) + v)

    def is_risky_detection(self, det: np.ndarray, tracklets: List[KalmanBoxTracker]) -> Tuple[bool, int]:
        """Determine if a detection is risky and needs feature extraction.

        Returns (is_risky, index of the single candidate in `tracklets`, or -1).
        """
        candidate_count = 0
        candidate_idx = -1
        candidate_iou = 0.0
        for idx, tracklet in enumerate(tracklets):
            if tracklet.hits > 1:  # Only consider confirmed tracklets
                iou = self.asso_func(det[:4], tracklet.get_state()[0])
                if iou > self.iou_threshold_sfe:
                    candidate_count += 1
                    candidate_idx = idx
                    candidate_iou = iou  # keep the candidate's IoU, not the last one computed
                    if candidate_count > 1:
                        return True, -1  # More than one candidate, risky

        if candidate_count == 1:
            # Check aspect ratio similarity
            v = self.aspect_ratio_similarity(det[:4], tracklets[candidate_idx].get_state()[0])
            alpha = self.calculate_alpha(candidate_iou, v)
            if alpha > self.ars_threshold:
                # Return the list index so the caller can index `active_tracks` directly
                return False, candidate_idx

        return True, -1  # No candidates or aspect ratio check failed, risky

    @BaseTracker.per_class_decorator
    def update(self, dets: np.ndarray, img: np.ndarray, embs: np.ndarray = None) -> np.ndarray:
        self.check_inputs(dets, img)
        self.frame_count += 1
        self.height, self.width = img.shape[:2]

        scores = dets[:, 4]
        dets = np.hstack([dets, np.arange(len(dets)).reshape(-1, 1)])
        remain_inds = scores > self.det_thresh
        dets = dets[remain_inds]

        # Selective feature extraction
        if self.selective_feature_extraction and not self.embedding_off and dets.shape[0] > 0:
            risky_mask = np.ones(dets.shape[0], dtype=bool)
            candidate_ids = np.full(dets.shape[0], -1)
            for i, det in enumerate(dets):
                is_risky, candidate_id = self.is_risky_detection(det, self.active_tracks)
                risky_mask[i] = is_risky
                candidate_ids[i] = candidate_id

            # Extract features only for risky detections
            risky_dets = dets[risky_mask]
            if risky_dets.shape[0] > 0:
                risky_embs = self.model.get_features(risky_dets[:, 0:4], img)
            else:
                risky_embs = np.array([])

            # Create full embedding array
            dets_embs = np.zeros((dets.shape[0], risky_embs.shape[1] if risky_embs.shape[0] > 0 else 0))
            dets_embs[risky_mask] = risky_embs

            # Copy features for non-risky detections
            for i, (is_risky, candidate_id) in enumerate(zip(risky_mask, candidate_ids)):
                if not is_risky and candidate_id != -1:
                    dets_embs[i] = self.active_tracks[candidate_id].get_emb()
        else:
            # Original feature extraction logic
            if self.embedding_off or dets.shape[0] == 0:
                dets_embs = np.ones((dets.shape[0], 1))
            elif embs is not None:
                dets_embs = embs
            else:
                dets_embs = self.model.get_features(dets[:, 0:4], img)

        # CMC
        if not self.cmc_off:
            transform = self.cmc.apply(img, dets[:, :4])
            for trk in self.active_tracks:
                trk.apply_affine_correction(transform)

        trust = (dets[:, 4] - self.det_thresh) / (1 - self.det_thresh)
        af = self.alpha_fixed_emb
        dets_alpha = af + (1 - af) * (1 - trust)

        # get predicted locations from existing trackers.
        trks = np.zeros((len(self.active_tracks), 5))
        trk_embs = []
        to_del = []
        ret = []
        for t, trk in enumerate(trks):
            pos = self.active_tracks[t].predict()[0]
            trk[:] = [pos[0], pos[1], pos[2], pos[3], 0]
            if np.any(np.isnan(pos)):
                to_del.append(t)
            else:
                trk_embs.append(self.active_tracks[t].get_emb())
        trks = np.ma.compress_rows(np.ma.masked_invalid(trks))

        if len(trk_embs) > 0:
            trk_embs = np.vstack(trk_embs)
        else:
            trk_embs = np.array(trk_embs)

        for t in reversed(to_del):
            self.active_tracks.pop(t)

        velocities = np.array([trk.velocity if trk.velocity is not None else np.array((0, 0)) for trk in self.active_tracks])
        last_boxes = np.array([trk.last_observation for trk in self.active_tracks])
        k_observations = np.array([k_previous_obs(trk.observations, trk.age, self.delta_t) for trk in self.active_tracks])

        # First round of association
        if self.embedding_off or dets.shape[0] == 0 or trk_embs.shape[0] == 0:
            stage1_emb_cost = None
        else:
            stage1_emb_cost = dets_embs @ trk_embs.T
        matched, unmatched_dets, unmatched_trks = associate(
            dets[:, 0:5],
            trks,
            self.asso_func,
            self.iou_threshold,
            velocities,
            k_observations,
            self.inertia,
            img.shape[1],
            img.shape[0],
            stage1_emb_cost,
            self.w_association_emb,
            self.aw_off,
            self.aw_param,
        )
        for m in matched:
            self.active_tracks[m[1]].update(dets[m[0], :])
            self.active_tracks[m[1]].update_emb(dets_embs[m[0]], alpha=dets_alpha[m[0]])

        # Second round of association by OCR
        if unmatched_dets.shape[0] > 0 and unmatched_trks.shape[0] > 0:
            left_dets = dets[unmatched_dets]
            left_dets_embs = dets_embs[unmatched_dets]
            left_trks = last_boxes[unmatched_trks]
            left_trks_embs = trk_embs[unmatched_trks]

            iou_left = self.asso_func(left_dets, left_trks)
            emb_cost_left = left_dets_embs @ left_trks_embs.T
            if self.embedding_off:
                emb_cost_left = np.zeros_like(emb_cost_left)
            iou_left = np.array(iou_left)
            if iou_left.max() > self.iou_threshold:
                rematched_indices = linear_assignment(-iou_left)
                to_remove_det_indices = []
                to_remove_trk_indices = []
                for m in rematched_indices:
                    det_ind, trk_ind = unmatched_dets[m[0]], unmatched_trks[m[1]]
                    if iou_left[m[0], m[1]] < self.iou_threshold:
                        continue
                    self.active_tracks[trk_ind].update(dets[det_ind, :])
                    self.active_tracks[trk_ind].update_emb(dets_embs[det_ind], alpha=dets_alpha[det_ind])
                    to_remove_det_indices.append(det_ind)
                    to_remove_trk_indices.append(trk_ind)
                unmatched_dets = np.setdiff1d(unmatched_dets, np.array(to_remove_det_indices))
                unmatched_trks = np.setdiff1d(unmatched_trks, np.array(to_remove_trk_indices))

        for m in unmatched_trks:
            self.active_tracks[m].update(None)

        # create and initialise new trackers for unmatched detections
        for i in unmatched_dets:
            trk = KalmanBoxTracker(
                dets[i],
                delta_t=self.delta_t,
                emb=dets_embs[i],
                alpha=dets_alpha[i],
                Q_xy_scaling=self.Q_xy_scaling, 
                Q_s_scaling=self.Q_s_scaling,                
                max_obs=self.max_obs
            )
            self.active_tracks.append(trk)
        i = len(self.active_tracks)
        for trk in reversed(self.active_tracks):
            if trk.last_observation.sum() < 0:
                d = trk.get_state()[0]
            else:
                """
                this is optional to use the recent observation or the kalman filter prediction,
                we didn't notice significant difference here
                """
                d = trk.last_observation[:4]
            if (trk.time_since_update < 1) and (trk.hit_streak >= self.min_hits or self.frame_count <= self.min_hits):
                # +1 as MOT benchmark requires positive
                ret.append(np.concatenate((d, [trk.id], [trk.conf], [trk.cls], [trk.det_ind])).reshape(1, -1))
            i -= 1
            # remove dead tracklet
            if trk.time_since_update > self.max_age:
                self.active_tracks.pop(i)
        if len(ret) > 0:
            return np.concatenate(ret)
        return np.array([])

    # Feature decay
    def apply_feature_decay(self):
        if self.feature_decay:
            for trk in self.active_tracks:
                if trk.time_since_update > 0:
                    trk.decay_feature(self.alpha_fixed_emb)

@emirhanbayar
Author

emirhanbayar commented Sep 11, 2024

As a side note, you can just merge pull request #1621 for the faststrongsort implementation and leave it there.

I really appreciate and am grateful for this awesome repository and your effort to present SOTA methods in an easy-to-use format.

@mikel-brostrom
Owner

mikel-brostrom commented Sep 11, 2024

How is this supposed to work for the first feature-generation round, @emirhanbayar?

For example here:

for i, _ in enumerate(xyxy):
    if i in risky_detections:
        feat = features[risky_detections.index(i)]
    else:
        # For non-risky detections, use the matching track's features
        feat = non_risky_matches[i].features[-1]  # What if it is None?
    feats.append(feat)

I am basically trying to apply this to all trackers:

    @torch.no_grad()
    def get_features_fast(self, xyxy, img, active_tracks, embs):
        risky_detections = []
        non_risky_matches = {}
        for i, det in enumerate(xyxy):
            matching_tracks = []
            for at in active_tracks:
                iou = iou_batch(det.reshape(1, -1), at.get_state())[0][0]
                if iou > self.iou_threshold:
                    matching_tracks.append((at, iou))

            if len(matching_tracks) == 1:
                track, iou = matching_tracks[0]
                ars = self.aspect_ratio_similarity(det, track.get_state())
                v = ars
                alpha = v / ((1 - iou) + v)
                if alpha <= self.ars_threshold:
                    # Non-risky detection, use track's features
                    non_risky_matches[i] = track
                    continue

            # Risky detection, needs feature extraction
            risky_detections.append(i)
            
        # Extract features only for risky detections otherwise use last feature
        if embs is not None:
            features = embs[risky_detections]
        else:
            features = self.get_features(xyxy[risky_detections], img)

        # Prepare detections
        feats = []
        for i, _ in enumerate(xyxy):
            if i in risky_detections:
                feat = features[risky_detections.index(i)]
            else:
                # For non-risky detections, use the matching track's features
                feat = non_risky_matches[i].features[-1]  # Use the latest feature from the matching track
            feats.append(feat)
        feats = torch.tensor(feats, dtype=torch.float32)
            
        return feats

@emirhanbayar
Author

emirhanbayar commented Sep 11, 2024

A candidate must be a confirmed track. A confirmed track must have a feature vector since it is matched at least once.

                if track.is_confirmed(): # only confirmed tracks !
                    iou = iou_batch(det.reshape(1, -1), track.to_tlbr().reshape(1, -1))[0][0]
                    if iou > self.iou_threshold:
                        matching_tracks.append((track, iou))

Assuming "active_tracks" will be the confirmed tracks, your implementation seems good to me.

Additionally, we need to implement a feature decay mechanism.

@emirhanbayar
Author

Looking at the line https://github.com/mikel-brostrom/boxmot/blob/274b53289ca42a7eedc774e62f5643b1152227cb/boxmot/trackers/strongsort/sort/track.py#L88C9-L88C42

Just noticed that tracks are Confirmed upon initialization, not Tentative. Am I wrong? Is this intended, or a mistake?

@emirhanbayar
Author

I am sorry, the following part:

                if alpha <= self.ars_threshold:
                    # Non-risky detection, use track's features
                    non_risky_matches[i] = track
                    continue

should be replaced with:

               if alpha >= self.ars_threshold:
                   # Non-risky detection, use track's features
                   non_risky_matches[i] = track
                   continue
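
To make the corrected condition concrete, here is a minimal, self-contained sketch of the risky/non-risky split discussed in this thread. It is not the boxmot implementation: `split_risky`, the plain-Python `iou` helper, and the default threshold values are assumptions chosen for illustration, and the aspect-ratio term uses the similarity form (1 for identical ratios).

```python
import numpy as np


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def aspect_ratio_similarity(a, b):
    """1 for identical aspect ratios, smaller for more different ones."""
    w1, h1 = a[2] - a[0], a[3] - a[1]
    w2, h2 = b[2] - b[0], b[3] - b[1]
    diff = np.arctan(w1 / h1) - np.arctan(w2 / h2)
    return 1.0 - (4.0 / np.pi ** 2) * diff ** 2


def split_risky(dets, track_boxes, iou_threshold=0.2, ars_threshold=0.6):
    """Return (risky detection indices, {detection index: track index})."""
    risky, non_risky = [], {}
    for i, det in enumerate(dets):
        # Candidate tracks are those overlapping the detection enough.
        overlaps = [(j, iou(det, trk)) for j, trk in enumerate(track_boxes)]
        candidates = [(j, o) for j, o in overlaps if o > iou_threshold]
        if len(candidates) == 1:
            j, o = candidates[0]
            v = aspect_ratio_similarity(det, track_boxes[j])
            alpha = v / ((1.0 - o) + v + 1e-9)  # epsilon guards against 0/0
            if alpha >= ars_threshold:
                # Non-risky: the matching track's stored feature can be reused.
                non_risky[i] = j
                continue
        # Zero or several candidates, or a low score: extract features.
        risky.append(i)
    return risky, non_risky


tracks = [[0, 0, 10, 20]]
dets = [[1, 0, 11, 20], [100, 100, 110, 120]]
risky, non_risky = split_risky(dets, tracks)
print(risky, non_risky)
```

Only the detections listed in `risky` would be sent to the ReID model; the others would reuse the feature of the track they map to.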

@mikel-brostrom
Owner

mikel-brostrom commented Sep 11, 2024

Every test and CI job passes, except when running it on a camera stream. Do you have any clue why it fails on a camera stream, @emirhanbayar?

0: 384x640 1 person, 154.9ms
(2, 512)
0: 384x640 1 person, 1 remote, 251.8ms
(2, 512)
0: 384x640 1 person, 1 remote, 250.6ms
(1, 512)
Traceback (most recent call last):
  File "/Users/mikel.brostrom/boxmot/tracking/track.py", line 174, in <module>
    run(opt)
  File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mikel.brostrom/boxmot/tracking/track.py", line 103, in run
    for r in results:
  File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    response = gen.send(request)
               ^^^^^^^^^^^^^^^^^
  File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/predictor.py", line 262, in stream_inference
    self.run_callbacks("on_predict_postprocess_end")
  File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/predictor.py", line 399, in run_callbacks
    callback(self)
  File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/trackers/track.py", line 83, in on_predict_postprocess_end
    predictor.results[i] = predictor.results[i][idx]
                           ~~~~~~~~~~~~~~~~~~~~^^^^^
  File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/results.py", line 287, in __getitem__
    return self._apply("__getitem__", idx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/results.py", line 359, in _apply
    setattr(r, k, getattr(v, fn)(*args, **kwargs))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/results.py", line 183, in __getitem__
    return self.__class__(self.data[idx], self.orig_shape)
                          ~~~~~~~~~^^^^^
IndexError: index 1 is out of bounds for dimension 0 with size 1

command:

(boxmot-py3.11) ➜  boxmot git:(emirhanbayar-master) ✗ python3 tracking/track.py --yolo-model yolov8n.pt --source 0 --show --tracking-method faststrongsort

@emirhanbayar
Author

emirhanbayar commented Sep 11, 2024

I tried it now. Commenting out the following lines in the BaseTracker

    for r in results:

        # img = yolo.predictor.trackers[0].plot_results(r.orig_img, args.show_trajectories)

        continue

        if args.show is True:
            cv2.imshow('BoxMOT', img)     
            key = cv2.waitKey(1) & 0xFF
            if key == ord(' ') or key == ord('q'):
                break

I am getting a different kind of error, shown below, but it is thrown with strongsort as well, so it is not specific to faststrongsort:

2024-09-11 23:48:12.780 | SUCCESS  | boxmot.appearance.reid_model_factory:load_pretrained_weights:183 - Loaded pretrained weights from /home/emirhan/boxmot-deneme/tracking/weights/osnet_x0_25_msmt17.pt
0: 480x640 1 person, 370.8ms
0: 480x640 1 person, 6.8ms
0: 480x640 1 person, 5.6ms
WARNING ⚠️ Waiting for stream 0
0: 480x640 1 person, 5.6ms
0: 480x640 1 person, 5.6ms
WARNING ⚠️ Waiting for stream 0
0: 480x640 1 person, 5.6ms
0: 480x640 1 person, 19.8ms
0: 480x640 1 person, 8.7ms
0: 480x640 1 person, 11.5ms
WARNING ⚠️ Waiting for stream 0
0: 480x640 1 person, 20.5ms
0: 480x640 1 person, 10.9ms
0: 480x640 1 person, 20.3ms
0: 480x640 1 person, 16.0ms
0: 480x640 1 person, 20.3ms
0: 480x640 1 person, 11.7ms
0: 480x640 1 person, 1 refrigerator, 20.6ms
0: 480x640 1 person, 1 refrigerator, 5.8ms
0: 480x640 1 person, 1 refrigerator, 5.8ms
WARNING ⚠️ Waiting for stream 0
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/home/emirhan/boxmot-deneme/tracking/track.py", line 174, in <module>
    run(opt)
  File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/emirhan/boxmot-deneme/tracking/track.py", line 103, in run
    for r in results:
  File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    response = gen.send(request)
               ^^^^^^^^^^^^^^^^^
  File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/ultralytics/engine/predictor.py", line 249, in stream_inference
    with profilers[0]:
  File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/ultralytics/utils/ops.py", line 46, in __enter__
    self.start = self.time()
                 ^^^^^^^^^^^
  File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/ultralytics/utils/ops.py", line 61, in time
    torch.cuda.synchronize(self.device)
  File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/torch/cuda/__init__.py", line 801, in synchronize
    return torch._C._cuda_synchronize()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Aborted (core dumped)

@mikel-brostrom
Owner

mikel-brostrom commented Sep 11, 2024

I see. It seems that there is some issue with strongsort in general. It does not output tracks for every single input detection which is required by ultralytics. Need to look deeper into this. Nothing wrong with this, just that it is not following ultralytics' standard.
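
One way to read the first traceback in this thread: the last column of each track row (`trk.det_ind` in the tracker output) is used to index the current frame's detections, so any index that is not smaller than the number of detections raises the `IndexError`. The sketch below only illustrates that failure mode and a defensive filter. `keep_valid_rows` is a hypothetical helper, not part of boxmot or ultralytics, and whether filtering is the right fix is an open question here.

```python
import numpy as np


def keep_valid_rows(tracks, num_dets):
    """Drop track rows whose detection index (last column) is out of range."""
    if tracks.size == 0:
        return tracks
    det_ind = tracks[:, -1].astype(int)
    valid = (det_ind >= 0) & (det_ind < num_dets)
    return tracks[valid]


# Track rows laid out as (x1, y1, x2, y2, id, conf, cls, det_ind).
tracks = np.array([
    [10, 10, 50, 90, 1, 0.9, 0, 0],
    [60, 20, 90, 80, 2, 0.8, 0, 1],  # index 1 does not exist in this frame
], dtype=float)
detections = np.zeros((1, 6))  # a single detection in the current frame

try:
    detections[tracks[:, -1].astype(int)]
except IndexError as exc:
    print(f"unfiltered indexing fails: {exc}")

filtered = keep_valid_rows(tracks, num_dets=len(detections))
print(detections[filtered[:, -1].astype(int)].shape)
```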

@emirhanbayar
Author

emirhanbayar commented Sep 12, 2024

Running StrongSORT from 2dd45fe yields 68.338 HOTA score on MOT17 val.

Running StrongSORT from the current master yields 52.462 HOTA score on MOT17 val.

A major bug that caught my attention is that tracks are initialized as:

        self.time_since_update = 1
        self.state = TrackState.Confirmed

Whereas they were correctly initialized in 2dd45fe as:

        self.time_since_update = 0
        self.state = TrackState.Tentative

When I fixed this and set n_init to 3 as in the original implementation, I got 62.591. However, I could not identify the cause of the remaining gap of about 6 points.
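
For readers unfamiliar with why the initial state matters, here is a minimal sketch of a DeepSORT-style track lifecycle, written from memory of that design rather than copied from boxmot. The class layout and the `max_age=30` default are assumptions; `n_init=3` is the value mentioned above.

```python
from enum import Enum


class TrackState(Enum):
    Tentative = 1
    Confirmed = 2
    Deleted = 3


class Track:
    """Minimal track lifecycle; motion and appearance state are omitted."""

    def __init__(self, n_init=3, max_age=30):
        self.n_init = n_init
        self.max_age = max_age
        self.hits = 1
        self.time_since_update = 0
        self.state = TrackState.Tentative  # not Confirmed on creation

    def predict(self):
        # Called once per frame before association.
        self.time_since_update += 1

    def update(self):
        # Called when a detection is matched to this track.
        self.hits += 1
        self.time_since_update = 0
        if self.state == TrackState.Tentative and self.hits >= self.n_init:
            self.state = TrackState.Confirmed

    def mark_missed(self):
        # Tentative tracks die on the first miss; confirmed ones after max_age.
        if self.state == TrackState.Tentative:
            self.state = TrackState.Deleted
        elif self.time_since_update > self.max_age:
            self.state = TrackState.Deleted

    def is_confirmed(self):
        return self.state == TrackState.Confirmed


t = Track(n_init=3)
for _ in range(2):
    t.predict()
    t.update()
print(t.state)
```

If tracks start out as Confirmed, a single spurious detection is reported immediately and is not discarded on its first miss, which is one plausible reason for the score drop described here.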

What is the purpose of these modifications that disrupted the performance of StrongSORT?

To get these scores, I placed the detections and embeddings shared by the StrongSORT authors as follows:

├── MOT17_train_YOLOX+BoT
│   ├── MOT17-02-FRCNN.npy
│   ├── MOT17-04-FRCNN.npy
│   ├── MOT17-05-FRCNN.npy
│   ├── MOT17-09-FRCNN.npy
│   ├── MOT17-10-FRCNN.npy
│   ├── MOT17-11-FRCNN.npy
│   └── MOT17-13-FRCNN.npy

and ran the following script:

import os
import cv2
import numpy as np
from datetime import datetime
from pathlib import Path
import argparse

from boxmot.trackers.strongsort.strong_sort import StrongSORT

def get_seq_paths(dataset_path):
    imgs = {}
    seq_names = []
    for root, dirs, files in os.walk(dataset_path):
        for dire in dirs:
            if dire.startswith("MOT"):
                seq_names.append(dire)
                imgs[dire] = [os.path.join(r, file) for r, d, f in os.walk(os.path.join(root, dire)) for file in f if file.endswith(".jpg")]
                imgs[dire].sort(key=lambda x: int(x.split("/")[-1].split(".")[0]))
    return imgs, seq_names

def parse_options():
    parser = argparse.ArgumentParser()
    parser.add_argument('--reid', type=str, default='osnet_x1_0_msmt17.pt', help='model.pt path')
    parser.add_argument('--dataset-path', type=str, default='./data/MOT17/train', help='dataset path')
    parser.add_argument('--dataset', type=str, default='train', help='dataset type')
    parser.add_argument('--device', type=str, default='0', help='device \'cpu\' or \'0\', \'1\', ... for gpu')
    parser.add_argument('--fp16', action='store_true', help='use fp16')
    return parser.parse_args()

def create_tracker(args):
    return StrongSORT(
        model_weights=Path(args.reid),
        device=args.device,
        fp16=args.fp16,
    )

def process_detection(row):
    tlwh = row[2:6]
    return np.array([tlwh[0], tlwh[1], tlwh[0] + tlwh[2], tlwh[1] + tlwh[3], row[6], 0])

def write_result(f, frame_no, det):
    f.write(f"{frame_no + 1},{int(det[4])},{int(det[0])},{int(det[1])},{int(det[2] - det[0])},{int(det[3] - det[1])},{det[5]:.2f},-1,-1,-1\n")

if __name__ == "__main__":
    args = parse_options()
    dataset_path = "/".join(args.dataset_path.split("/")[:-1] + [args.dataset])
    imgs, seq_names = get_seq_paths(dataset_path)

    total_time = 0
    total_dets = 0
    total_frames = 0

    for seq in seq_names:
        tracker = create_tracker(args)
        print(f"Sequence: {seq}")
        seq_imgs = imgs[seq]
        output_dir = Path(f"MOT_{args.dataset}/{args.reid}/faststrongsort/{seq}")
        output_dir.mkdir(parents=True, exist_ok=True)
        
        with open(output_dir.parent / f"{seq}.txt", "w") as f:
            print(f"Writing results to: {f.name}")
            seq_time = 0

            det_file = Path(f"MOT17_{args.dataset}_YOLOX+BoT/{seq}.npy")
            seq_det = np.load(det_file, allow_pickle=True)

            for frame_no, img_path in enumerate(seq_imgs):
                frame = cv2.imread(img_path)
                frame_dets = seq_det[seq_det[:, 0] == frame_no + 1]

                if len(frame_dets) < 1:
                    continue

                total_dets += len(frame_dets)
                total_frames += 1

                processed_dets = np.array([process_detection(row) for row in frame_dets])
                features = [row[10:] for row in frame_dets]

                if len(processed_dets) > 0:
                    start = datetime.now()
                    tracked_det = tracker.update(processed_dets, frame, embs=np.array(features))
                    # tracked_det = tracker.update(processed_dets, frame)
                    seq_time += (datetime.now() - start).total_seconds()

                    for det in tracked_det:
                        write_result(f, frame_no, det)
                        scale = frame.shape[0] / 1080
                        cv2.putText(frame, f"id: {int(det[4])}", (int(det[0]), int(det[1])), cv2.FONT_HERSHEY_SIMPLEX, scale, (0, 0, 255), 2)

                cv2.imwrite(str(output_dir / f"{frame_no + 1}.jpg"), frame)

        print(f"{seq} time: {seq_time:.2f}s")
        total_time += seq_time

    print(f"Total time: {total_time:.2f}s")
    print(f"FPS: {total_frames / total_time:.2f}")
    print(f"Total frames: {total_frames}")

@mikel-brostrom
Owner

mikel-brostrom commented Sep 12, 2024

You can run

git diff 274b532 2dd45fe -- boxmot/trackers/strongsort

to see the differences between the strongsort folder in master and the commit you noticed had the highest performance. The only thing I can see, besides what you mentioned in your comment above, is that the KalmanFilter got refactored. I would recommend creating a strongsort_kf.py containing the previous filter (https://github.com/mikel-brostrom/boxmot/blob/2dd45fe4d7f7d584f8a86cce9a40ed30d70be3af/boxmot/motion/kalman_filters/strongsort_kf.py) and then, in track.py, doing:

from boxmot.trackers.strongsort.sort.strongsort_kf import KalmanFilter
...
self.kf = KalmanFilter()

Let me know if this is the issue 😄

@mikel-brostrom
Owner

mikel-brostrom commented Sep 12, 2024

I may not be capturing every single bug in the pipeline, as I only run the evaluation on a small subset of MOT17. So something could potentially slip through without me noticing it.

@mikel-brostrom
Owner

All trackers show similar results under benchmark here:

https://github.com/mikel-brostrom/boxmot/actions/runs/10838200299/job/30075883940

@emirhanbayar
Author

With #1627, the StrongSORT accuracy is 68.3 again. I also added Fast-StrongSORT. In the following table, the value in parentheses indicates the iou_threshold:

| Model (IoU threshold) | HOTA | DetA | AssA | IDSW | IDF1 |
| --- | --- | --- | --- | --- | --- |
| boxmot_FSS (iou_threshold = 0.0) | 64.191 | 60.471 | 68.847 | 190 | 76.183 |
| boxmot_FSS (iou_threshold = 0.1) | 64.008 | 60.491 | 68.449 | 194 | 75.874 |
| boxmot_FSS (iou_threshold = 0.2) | 64.090 | 60.537 | 68.565 | 214 | 76.178 |
| boxmot_FSS (iou_threshold = 0.3) | 63.673 | 60.992 | 67.210 | 210 | 75.503 |
| boxmot_FSS (iou_threshold = 0.4) | 64.059 | 61.203 | 67.790 | 179 | 75.911 |
| boxmot_FSS (iou_threshold = 0.5) | 63.292 | 60.764 | 66.657 | 200 | 74.792 |
| boxmot_FSS (iou_threshold = 1.0) | 64.294 | 60.456 | 69.078 | 185 | 76.518 |

The accuracy of boxmot_FSS with iou_threshold = 1.0 should be the same as StrongSORT's, but it is not. I will look into this.

@mikel-brostrom
Owner

mikel-brostrom commented Sep 14, 2024

Are these results based on the last half of the MOT17 training set or the whole training set?

@emirhanbayar
Author

Last half of the training set.

@mikel-brostrom
Owner

Let me know when the results are closer to StrongSORT? 🚀

@emirhanbayar
Author

With #1634 results are as follows:

| Configuration | HOTA | MOTA | IDF1 | IDSW | AssA | DetA |
| --- | --- | --- | --- | --- | --- | --- |
| StrongSORT | 68.329 | 76.348 | 81.206 | 260 | 71.900 | 65.438 |
| Fast-StrongSORT (iou_threshold=1.0) | 68.329 | 76.348 | 81.206 | 260 | 71.900 | 65.438 |
| Fast-StrongSORT (iou_threshold=0.0) | 68.132 | 76.361 | 80.924 | 259 | 71.489 | 65.440 |
| Fast-StrongSORT (iou_threshold=0.1) | 68.309 | 76.391 | 80.975 | 267 | 71.822 | 65.473 |
| Fast-StrongSORT (iou_threshold=0.2) | 68.252 | 76.411 | 81.029 | 267 | 71.653 | 65.512 |
| Fast-StrongSORT (iou_threshold=0.3) | 67.981 | 76.463 | 81.071 | 241 | 71.157 | 65.453 |
| Fast-StrongSORT (iou_threshold=0.4) | 68.249 | 76.656 | 80.856 | 171 | 71.636 | 65.536 |
| Fast-StrongSORT (iou_threshold=0.5) | 67.741 | 76.617 | 79.996 | 203 | 70.560 | 65.550 |

@emirhanbayar
Author

If you like the idea and want to apply it to other methods, I can share the following insights:

Although this is a rare edge case, a tracklet can be a candidate for more than one detection. As a precaution, if you want to apply this to BoT-SORT and Deep-OC-SORT, it is best to restrict the mechanism to high-confidence detections. That way, only the first stage of the matching is affected.

  • If there is no occlusion throughout the sequence, Fast-StrongSORT behaves similarly to SORT
  • If there is no occlusion throughout the sequence, Fast-BoT-SORT behaves similarly to ByteTrack
  • If there is no occlusion throughout the sequence, Fast-Deep-OC-SORT behaves similarly to OC-SORT
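
The suggestion to restrict the mechanism to high-confidence detections can be sketched as a simple pre-filter. This is only an illustration: `split_by_confidence` and the `det_thresh=0.6` default are hypothetical, and the detection layout `(x1, y1, x2, y2, conf, cls)` is assumed from the evaluation script earlier in the thread.

```python
import numpy as np


def split_by_confidence(dets, det_thresh=0.6):
    """Return indices of high- and low-confidence detections.

    dets rows are (x1, y1, x2, y2, conf, cls).
    """
    high_mask = dets[:, 4] >= det_thresh
    return np.flatnonzero(high_mask), np.flatnonzero(~high_mask)


dets = np.array([
    [10, 10, 50, 90, 0.92, 0],
    [60, 20, 90, 80, 0.31, 0],
    [5, 40, 30, 95, 0.75, 0],
])
high_idx, low_idx = split_by_confidence(dets)
print(high_idx, low_idx)
```

Under this scheme only the `high_idx` detections would be considered for feature reuse, while the low-confidence ones would take the tracker's usual second-stage path unchanged.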

@mikel-brostrom
Owner

mikel-brostrom commented Sep 18, 2024

Do you have any speedup numbers for each of the IoU thresholds, @emirhanbayar? It was mostly for adding this table to the experiment section 😄

@emirhanbayar
Author

Using the osnet_x1 model, an IoU threshold of 0.4 increases FPS from 4.4 to 6.7 on a GTX1650. Max memory usage also decreases from about 1200 MB to about 900 MB.

That's all I tested in addition to the tests in the original paper. I do not have a low-power edge device right now. If you think tests on a GTX1650 are still meaningful, I can perform experiments.

@mikel-brostrom
Owner

I do not have a low-power edge device right now. If you think tests on a GTX1650 are still meaningful, I can perform experiments.

Sure! If you could add a speedup column to the table that would be awesome, even if it is on a GTX1650

@mikel-brostrom
Owner

Btw @emirhanbayar, I am in the process of automating the evaluation of the tracking modules in the CI. I would need these detections and embeddings to be able to generate the paper results:

├── MOT17_train_YOLOX+BoT
│   ├── MOT17-02-FRCNN.npy
│   ├── MOT17-04-FRCNN.npy
│   ├── MOT17-05-FRCNN.npy
│   ├── MOT17-09-FRCNN.npy
│   ├── MOT17-10-FRCNN.npy
│   ├── MOT17-11-FRCNN.npy
│   └── MOT17-13-FRCNN.npy

Do you have a link to them? Or could you point out where to find them?

@emirhanbayar
Author

They were shared by the authors of StrongSORT at the following link:

https://drive.google.com/drive/folders/1zzzUROXYXt8NjxO1WUcwSzqD-nn7rPNr

@mikel-brostrom
Owner

They were shared by the authors of StrongSORT at the following link:

https://drive.google.com/drive/folders/1zzzUROXYXt8NjxO1WUcwSzqD-nn7rPNr

Thanks!

@emirhanbayar
Author

emirhanbayar commented Sep 19, 2024

| Configuration | HOTA | MOTA | IDF1 | IDSW | AssA | DetA |
| --- | --- | --- | --- | --- | --- | --- |
| StrongSORT | 68.329 | 76.348 | 81.206 | 260 | 71.900 | 65.438 |
| Fast-StrongSORT (iou_threshold=0.0) | 68.132 | 76.361 | 80.924 | 259 | 71.489 | 65.440 |
| Fast-StrongSORT (iou_threshold=0.1) | 68.309 | 76.391 | 80.975 | 267 | 71.822 | 65.473 |
| Fast-StrongSORT (iou_threshold=0.2) | 68.252 | 76.411 | 81.029 | 267 | 71.653 | 65.512 |
| Fast-StrongSORT (iou_threshold=0.3) | 67.981 | 76.463 | 81.071 | 241 | 71.157 | 65.453 |
| Fast-StrongSORT (iou_threshold=0.4) | 68.249 | 76.656 | 80.856 | 171 | 71.636 | 65.536 |
| Fast-StrongSORT (iou_threshold=0.5) | 67.741 | 76.617 | 79.996 | 203 | 70.560 | 65.550 |

| Configuration | FPS (osnet_x1, GTX1650) | FPS (mobilenetv2_x1_0, GTX1650) | FPS (osnet_x0_25, CPU) | FPS (osnet_x1, T4) |
| --- | --- | --- | --- | --- |
| StrongSORT | 4.85 | 5.09 | 1.33 | 5.80 |
| Fast-StrongSORT (iou_threshold=0.0) | 5.23 | 5.33 | 1.68 | 6.67 |
| Fast-StrongSORT (iou_threshold=0.1) | 5.59 | 5.57 | 2.02 | 7.22 |
| Fast-StrongSORT (iou_threshold=0.2) | 5.97 | 5.86 | 2.48 | 7.71 |
| Fast-StrongSORT (iou_threshold=0.3) | 6.42 | 6.11 | 2.94 | 8.32 |
| Fast-StrongSORT (iou_threshold=0.4) | 6.84 | 6.34 | 3.64 | 8.80 |
| Fast-StrongSORT (iou_threshold=0.5) | 7.23 | 6.57 | 4.51 | 9.22 |
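
Since a speedup column was requested, here is a small sketch that derives it from the FPS figures reported in this comment. Only the osnet_x1 on GTX1650 column is used; the variable names are illustrative, and the same division applies to the other columns.

```python
# FPS values copied from the osnet_x1 / GTX1650 column.
baseline_fps = 4.85  # StrongSORT
fss_fps = {0.0: 5.23, 0.1: 5.59, 0.2: 5.97, 0.3: 6.42, 0.4: 6.84, 0.5: 7.23}

# Speedup is the ratio of Fast-StrongSORT FPS to the StrongSORT baseline.
speedup = {thr: fps / baseline_fps for thr, fps in fss_fps.items()}

for thr, ratio in speedup.items():
    print(f"iou_threshold={thr:.1f}: {ratio:.2f}x")
```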

@mikel-brostrom
Owner

Thank you for providing all these details @emirhanbayar. Super valuable! Adding this to the experiments section

@mikel-brostrom
Owner

I tried to replicate your original StrongSORT results:

| Configuration | HOTA | MOTA | IDF1 | IDSW | AssA | DetA |
| --- | --- | --- | --- | --- | --- | --- |
| StrongSORT | 68.329 | 76.348 | 81.206 | 260 | 71.900 | 65.438 |

in the CI pipeline. Mine are:

| Configuration | HOTA | MOTA | IDF1 | IDSW | AssA | DetA |
| --- | --- | --- | --- | --- | --- | --- |
| Strongsort | 67.505 | 75.992 | 79.379 | NA | NA | NA |

With this configuration

The data is:

├── MOT17_train_YOLOX+BoT
│   ├── MOT17-02-FRCNN.npy
│   ├── MOT17-04-FRCNN.npy
│   ├── MOT17-05-FRCNN.npy
│   ├── MOT17-09-FRCNN.npy
│   ├── MOT17-10-FRCNN.npy
│   ├── MOT17-11-FRCNN.npy
│   └── MOT17-13-FRCNN.npy

Do you see any differences, or did you make any changes?

@emirhanbayar
Author

I set max_cost_dist to 0.4 to obtain these results.

@mikel-brostrom
Owner

I set max_cost_dist to 0.4 to obtain these results.

Could reproduce. Thanks!

{"HOTA": 68.329, "MOTA": 76.356, "IDF1": 81.21}

@mikel-brostrom
Owner

mikel-brostrom commented Sep 21, 2024

Initializing the StrongSORT tracks as confirmed just gave me:

{"HOTA": 67.87, "MOTA": 76.422, "IDF1": 79.348}

So this was a major bug


github-actions bot commented Oct 2, 2024

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

@github-actions github-actions bot added the Stale label Oct 2, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 6, 2024
@Fleyderer
Contributor

In the paper:

We have "Aspect ratio similarity" formula:

$$V = \frac{4}{\pi^2} \left(\arctan\frac{w_1}{h_1} - \arctan\frac{w_2}{h_2}\right)^2$$

But everywhere else "distance" means 0 is closest and 1 is furthest, and "similarity" is the opposite: 0 is absolutely different and 1 is identical. As written, this formula gives 0 for identical aspect ratios, so it is a distance, not a similarity.

I see that you have the correct formula in your code, but could you fix it in the paper too? I spent about an hour making sure that it was not my misunderstanding...

BTW, good job!

@emirhanbayar
Author

I apologize for the confusion and appreciate you bringing this to my attention.

The formula has been updated as follows:

$$V = 1 - \frac{4}{\pi^2} \left(\arctan\frac{w_1}{h_1} - \arctan\frac{w_2}{h_2}\right)^2$$

That should solve the problem, right? Feel free to share if there’s anything else I should consider.
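
As a quick sanity check on the updated formula, assuming all widths and heights are strictly positive, each arctangent lies strictly between 0 and π/2, so their difference is smaller than π/2 in absolute value:

$$0 < \arctan\frac{w_i}{h_i} < \frac{\pi}{2} \;\Longrightarrow\; \left|\arctan\frac{w_1}{h_1} - \arctan\frac{w_2}{h_2}\right| < \frac{\pi}{2} \;\Longrightarrow\; 0 \le \frac{4}{\pi^2}\left(\arctan\frac{w_1}{h_1} - \arctan\frac{w_2}{h_2}\right)^2 < 1$$

Hence $0 < V \le 1$, with $V = 1$ exactly when $\frac{w_1}{h_1} = \frac{w_2}{h_2}$, which matches the usual convention that a similarity of 1 means identical.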
