Selective Mechanism for Feature Extraction #1620
This sounds awesome. Looking forward to the PRs! 🚀 I guess this implies a modification of this code snippet: boxmot/boxmot/appearance/backends/base_backend.py Lines 79 to 89 in 274b532
and an argument for whether to apply it or not |
Actually, we need existing tracks to perform this algorithm. I implemented it by changing "boxmot/trackers/strongsort/strong_sort.py" and "boxmot/trackers/strongsort/sort/track.py" as follows:
boxmot/trackers/strongsort/strong_sort.py
# Mikel Broström 🔥 Yolo Tracking 🧾 AGPL-3.0 license
import numpy as np
from boxmot.appearance.reid_auto_backend import ReidAutoBackend
from boxmot.motion.cmc import get_cmc_method
from boxmot.trackers.strongsort.sort.detection import Detection
from boxmot.trackers.strongsort.sort.tracker import Tracker
from boxmot.utils.matching import NearestNeighborDistanceMetric
from boxmot.utils.ops import xyxy2tlwh
from boxmot.utils import PerClassDecorator
from boxmot.utils.iou import iou_batch
class StrongSORT(object):
    def __init__(
        self,
        model_weights,
        device,
        fp16,
        per_class=False,
        max_dist=0.2,
        max_iou_dist=0.7,
        max_age=30,
        n_init=1,
        nn_budget=100,
        mc_lambda=0.995,
        ema_alpha=0.9,
        iou_threshold=0.2,
        ars_threshold=0.6,
    ):
        self.per_class = per_class
        self.model = ReidAutoBackend(
            weights=model_weights, device=device, half=fp16
        ).model
        self.tracker = Tracker(
            metric=NearestNeighborDistanceMetric("cosine", max_dist, nn_budget),
            max_iou_dist=max_iou_dist,
            max_age=max_age,
            n_init=n_init,
            mc_lambda=mc_lambda,
            ema_alpha=ema_alpha,
        )
        self.cmc = get_cmc_method('ecc')()
        self.iou_threshold = iou_threshold
        self.ars_threshold = ars_threshold
        self.last_feature_extractions = 0

    def aspect_ratio_similarity(self, box1, box2):
        w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
        w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
        aspect_ratio1 = w1 / h1
        aspect_ratio2 = w2 / h2
        return 4 / (np.pi ** 2) * (np.arctan(aspect_ratio1) - np.arctan(aspect_ratio2)) ** 2

    @PerClassDecorator
    def update(self, dets: np.ndarray, img: np.ndarray, embs: np.ndarray = None) -> np.ndarray:
        assert isinstance(
            dets, np.ndarray
        ), f"Unsupported 'dets' input format '{type(dets)}', valid format is np.ndarray"
        assert isinstance(
            img, np.ndarray
        ), f"Unsupported 'img' input format '{type(img)}', valid format is np.ndarray"
        assert (
            len(dets.shape) == 2
        ), "Unsupported 'dets' dimensions, valid number of dimensions is two"
        assert (
            dets.shape[1] == 6
        ), "Unsupported 'dets' 2nd dimension length, valid length is 6"
        xyxy = dets[:, 0:4]
        confs = dets[:, 4]
        clss = dets[:, 5]
        if len(self.tracker.tracks) >= 1:
            warp_matrix = self.cmc.apply(img, xyxy)
            for track in self.tracker.tracks:
                track.camera_update(warp_matrix)
        # Determine which detections need feature extraction
        risky_detections = []
        non_risky_matches = {}
        for i, det in enumerate(xyxy):
            matching_tracks = []
            for track in self.tracker.tracks:
                if track.is_confirmed():
                    iou = iou_batch(det.reshape(1, -1), track.to_tlbr().reshape(1, -1))[0][0]
                    if iou > self.iou_threshold:
                        matching_tracks.append((track, iou))
            if len(matching_tracks) == 1:
                track, iou = matching_tracks[0]
                ars = self.aspect_ratio_similarity(det, track.to_tlbr())
                v = ars
                alpha = v / ((1 - iou) + v)
                if alpha > self.ars_threshold:
                    # Non-risky detection, use track's features
                    non_risky_matches[i] = track
                    continue
            # Risky detection, needs feature extraction
            risky_detections.append(i)
        self.last_feature_extractions = 0
        # Extract features only for risky detections
        if embs is not None:
            features = embs[risky_detections]
        else:
            features = self.model.get_features(xyxy[risky_detections], img)
            # Update the feature extraction counter
            self.last_feature_extractions = len(risky_detections)
        # Prepare detections
        tlwh = xyxy2tlwh(xyxy)
        detections = []
        for i, (box, conf, cls) in enumerate(zip(tlwh, confs, clss)):
            if i in risky_detections:
                feat = features[risky_detections.index(i)]
            else:
                # For non-risky detections, use the matching track's features
                feat = non_risky_matches[i].features[-1]  # Use the latest feature from the matching track
            detections.append(Detection(box, conf, cls, i, feat))
        # Update tracker
        self.tracker.predict()
        self.tracker.update(detections)
        # Output bbox identities
        outputs = []
        for track in self.tracker.tracks:
            if not track.is_confirmed():
                continue
            x1, y1, x2, y2 = track.to_tlbr()
            id = track.id
            conf = track.conf
            cls = track.cls
            det_ind = track.det_ind
            outputs.append(
                np.concatenate(([x1, y1, x2, y2], [id], [conf], [cls], [det_ind])).reshape(1, -1)
            )
        if len(outputs) > 0:
            return np.concatenate(outputs)
        return np.array([])
boxmot/trackers/strongsort/sort/track.py
|
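The selection step in the listing above can be isolated from the tracker. This is a minimal sketch, assuming boxes are plain `[x1, y1, x2, y2]` arrays. The helper names `iou_xyxy` and `is_non_risky` are hypothetical and not part of boxmot, and the aspect-ratio term uses the `1 - ...` form that appears in the DeepOCSORT variant further down this thread, not the form in the StrongSORT listing:

```python
import numpy as np


def iou_xyxy(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def aspect_ratio_similarity(a, b):
    """1 when both boxes have the same aspect ratio, smaller otherwise."""
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    return 1.0 - 4.0 / np.pi ** 2 * (np.arctan(wa / ha) - np.arctan(wb / hb)) ** 2


def is_non_risky(det, track_boxes, iou_thr=0.2, ars_thr=0.6):
    """Return (non_risky, index of the single candidate track or -1)."""
    ious = [iou_xyxy(det, t) for t in track_boxes]
    candidates = [k for k, value in enumerate(ious) if value > iou_thr]
    if len(candidates) != 1:
        return False, -1  # zero or several candidates: extract features
    k = candidates[0]
    v = aspect_ratio_similarity(det, track_boxes[k])
    alpha = v / ((1.0 - ious[k]) + v)
    if alpha > ars_thr:
        return True, k  # reuse the features of track k
    return False, -1


det = np.array([0.0, 0.0, 10.0, 20.0])
print(is_non_risky(det, [np.array([1.0, 1.0, 11.0, 21.0])]))
```

A detection with exactly one overlapping track of similar shape reuses that track's features; a detection with no overlapping track, or with several, is treated as risky and goes through the ReID model.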
To make this available for all trackers I will need to make major modifications, as data from different sources (feature history in KFs, the get_features function from reid_auto_backend, detections from the tracker) needs to be centralized to enable this computation. Could you post the DeepOCSORT example here? |
I have tested the method on the original implementation, and the reported results were obtained by running the following repository: https://github.com/emirhanbayar/Fast-Deep-OC-SORT Following your message, I also applied it to deepocsort in this repository and am pasting it below. However, I did not test it thoroughly:
#Fast-Deep-OC-SORT
# Mikel Broström 🔥 Yolo Tracking 🧾 AGPL-3.0 license
import numpy as np
import torch
from pathlib import Path
from collections import deque
from typing import List, Tuple
from boxmot.appearance.reid_auto_backend import ReidAutoBackend
from boxmot.motion.cmc import get_cmc_method
from boxmot.motion.kalman_filters.xysr_kf import KalmanFilterXYSR
from boxmot.motion.kalman_filters.xywh_kf import KalmanFilterXYWH
from boxmot.utils.association import associate, linear_assignment
from boxmot.utils.iou import get_asso_func
from boxmot.trackers.basetracker import BaseTracker
from boxmot.utils.ops import xyxy2xysr
def k_previous_obs(observations, cur_age, k):
    if len(observations) == 0:
        return [-1, -1, -1, -1, -1]
    for i in range(k):
        dt = k - i
        if cur_age - dt in observations:
            return observations[cur_age - dt]
    max_age = max(observations.keys())
    return observations[max_age]


def convert_x_to_bbox(x, score=None):
    """
    Takes a bounding box in the centre form [x,y,s,r] and returns it in the form
    [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right
    """
    w = np.sqrt(x[2] * x[3])
    h = x[2] / w
    if score is None:
        return np.array([x[0] - w / 2.0, x[1] - h / 2.0, x[0] + w / 2.0, x[1] + h / 2.0]).reshape((1, 4))
    else:
        return np.array([x[0] - w / 2.0, x[1] - h / 2.0, x[0] + w / 2.0, x[1] + h / 2.0, score]).reshape((1, 5))


def speed_direction(bbox1, bbox2):
    cx1, cy1 = (bbox1[0] + bbox1[2]) / 2.0, (bbox1[1] + bbox1[3]) / 2.0
    cx2, cy2 = (bbox2[0] + bbox2[2]) / 2.0, (bbox2[1] + bbox2[3]) / 2.0
    speed = np.array([cy2 - cy1, cx2 - cx1])
    norm = np.sqrt((cy2 - cy1) ** 2 + (cx2 - cx1) ** 2) + 1e-6
    return speed / norm


class KalmanBoxTracker(object):
    """
    This class represents the internal state of individual tracked objects observed as bbox.
    """
    count = 0

    def __init__(self, det, delta_t=3, emb=None, alpha=0, max_obs=50, Q_xy_scaling=0.01, Q_s_scaling=0.0001):
        """
        Initialises a tracker using initial bounding box.
        """
        # define constant velocity model
        self.max_obs = max_obs
        bbox = det[0:5]
        self.conf = det[4]
        self.cls = det[5]
        self.det_ind = det[6]
        self.Q_xy_scaling = Q_xy_scaling
        self.Q_s_scaling = Q_s_scaling
        self.kf = KalmanFilterXYSR(dim_x=7, dim_z=4)
        self.kf.F = np.array(
            [
                # x y s r x' y' s'
                [1, 0, 0, 0, 1, 0, 0],
                [0, 1, 0, 0, 0, 1, 0],
                [0, 0, 1, 0, 0, 0, 1],
                [0, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 1, 0, 0],
                [0, 0, 0, 0, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 1],
            ]
        )
        self.kf.H = np.array(
            [
                [1, 0, 0, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 0, 0],
                [0, 0, 0, 1, 0, 0, 0],
            ]
        )
        self.kf.R[2:, 2:] *= 10.0
        self.kf.P[4:, 4:] *= 1000.0  # give high uncertainty to the unobservable initial velocities
        self.kf.P *= 10.0
        self.kf.Q[4:6, 4:6] *= self.Q_xy_scaling
        self.kf.Q[-1, -1] *= self.Q_s_scaling
        self.bbox_to_z_func = xyxy2xysr
        self.x_to_bbox_func = convert_x_to_bbox
        self.kf.x[:4] = self.bbox_to_z_func(bbox)
        self.time_since_update = 0
        self.id = KalmanBoxTracker.count
        KalmanBoxTracker.count += 1
        self.history = deque([], maxlen=self.max_obs)
        self.hits = 0
        self.hit_streak = 0
        self.age = 0
        self.last_observation = np.array([-1, -1, -1, -1, -1])  # placeholder
        self.features = deque([], maxlen=self.max_obs)
        self.observations = dict()
        self.velocity = None
        self.delta_t = delta_t
        self.history_observations = deque([], maxlen=self.max_obs)
        self.emb = emb
        self.frozen = False

    def update(self, det):
        """
        Updates the state vector with observed bbox.
        """
        if det is not None:
            bbox = det[0:5]
            self.conf = det[4]
            self.cls = det[5]
            self.det_ind = det[6]
            self.frozen = False
            if self.last_observation.sum() >= 0:  # no previous observation
                previous_box = None
                for dt in range(self.delta_t, 0, -1):
                    if self.age - dt in self.observations:
                        previous_box = self.observations[self.age - dt]
                        break
                if previous_box is None:
                    previous_box = self.last_observation
                self.velocity = speed_direction(previous_box, bbox)
            self.last_observation = bbox
            self.observations[self.age] = bbox
            self.history_observations.append(bbox)
            self.time_since_update = 0
            self.hits += 1
            self.hit_streak += 1
            self.kf.update(self.bbox_to_z_func(bbox))
        else:
            self.kf.update(det)
            self.frozen = True

    def update_emb(self, emb, alpha=0.9):
        self.emb = alpha * self.emb + (1 - alpha) * emb
        self.emb /= np.linalg.norm(self.emb)

    def get_emb(self):
        return self.emb

    def apply_affine_correction(self, affine):
        m = affine[:, :2]
        t = affine[:, 2].reshape(2, 1)
        if self.last_observation.sum() > 0:
            ps = self.last_observation[:4].reshape(2, 2).T
            ps = m @ ps + t
            self.last_observation[:4] = ps.T.reshape(-1)
        for dt in range(self.delta_t, -1, -1):
            if self.age - dt in self.observations:
                ps = self.observations[self.age - dt][:4].reshape(2, 2).T
                ps = m @ ps + t
                self.observations[self.age - dt][:4] = ps.T.reshape(-1)
        self.kf.apply_affine_correction(m, t)

    def predict(self):
        """
        Advances the state vector and returns the predicted bounding box estimate.
        """
        if (self.kf.x[6] + self.kf.x[2]) <= 0:
            self.kf.x[6] *= 0.0
        Q = None
        self.kf.predict(Q=Q)
        self.age += 1
        if self.time_since_update > 0:
            self.hit_streak = 0
        self.time_since_update += 1
        self.history.append(self.x_to_bbox_func(self.kf.x))
        return self.history[-1]

    def get_state(self):
        """
        Returns the current bounding box estimate.
        """
        return self.x_to_bbox_func(self.kf.x)

    def mahalanobis(self, bbox):
        """Should be run after a predict() call for accuracy."""
        return self.kf.md_for_measurement(self.bbox_to_z_func(bbox))

    def decay_feature(self, alpha):
        """Apply feature decay."""
        self.emb *= alpha


class DeepOCSort(BaseTracker):
    def __init__(
        self,
        model_weights: Path,
        device: torch.device,
        fp16: bool,
        per_class: bool = False,
        det_thresh: float = 0.3,
        max_age: int = 30,
        min_hits: int = 3,
        iou_threshold: float = 0.3,
        delta_t: int = 3,
        asso_func: str = "iou",
        inertia: float = 0.2,
        w_association_emb: float = 0.5,
        alpha_fixed_emb: float = 0.95,
        aw_param: float = 0.5,
        embedding_off: bool = False,
        cmc_off: bool = False,
        aw_off: bool = False,
        Q_xy_scaling: float = 0.01,
        Q_s_scaling: float = 0.0001,
        # New parameters for selective feature extraction
        selective_feature_extraction: bool = True,
        iou_threshold_sfe: float = 0.2,
        ars_threshold: float = 0.6,
        feature_decay: bool = True,
        **kwargs: dict
    ):
        super().__init__(max_age=max_age, per_class=per_class)
        self.max_age = max_age
        self.min_hits = min_hits
        self.iou_threshold = iou_threshold
        self.det_thresh = det_thresh
        self.delta_t = delta_t
        self.asso_func = get_asso_func(asso_func)
        self.inertia = inertia
        self.w_association_emb = w_association_emb
        self.alpha_fixed_emb = alpha_fixed_emb
        self.aw_param = aw_param
        self.per_class = per_class
        self.Q_xy_scaling = Q_xy_scaling
        self.Q_s_scaling = Q_s_scaling
        KalmanBoxTracker.count = 1
        self.model = ReidAutoBackend(
            weights=model_weights, device=device, half=fp16
        ).model
        self.cmc = get_cmc_method('sof')()
        self.embedding_off = embedding_off
        self.cmc_off = cmc_off
        self.aw_off = aw_off
        # New attributes for selective feature extraction
        self.selective_feature_extraction = selective_feature_extraction
        self.iou_threshold_sfe = iou_threshold_sfe
        self.ars_threshold = ars_threshold
        self.feature_decay = feature_decay

    def aspect_ratio_similarity(self, bbox1: np.ndarray, bbox2: np.ndarray) -> float:
        """Calculate the aspect ratio similarity between two bounding boxes."""
        w1, h1 = bbox1[2] - bbox1[0], bbox1[3] - bbox1[1]
        w2, h2 = bbox2[2] - bbox2[0], bbox2[3] - bbox2[1]
        return 1 - (4 / (np.pi ** 2)) * (np.arctan(w1 / h1) - np.arctan(w2 / h2)) ** 2

    def calculate_alpha(self, iou: float, v: float) -> float:
        """Calculate alpha based on IoU and aspect ratio similarity."""
        return v / ((1 - iou) + v)

    def is_risky_detection(self, det: np.ndarray, tracklets: List[KalmanBoxTracker]) -> Tuple[bool, int]:
        """Determine if a detection is risky and needs feature extraction.

        Returns (is_risky, idx), where idx is the position of the single candidate
        in `tracklets` (usable as a list index), or -1 if the detection is risky.
        """
        candidate_count = 0
        candidate_idx = -1
        candidate_iou = 0.0
        for idx, tracklet in enumerate(tracklets):
            if tracklet.hits > 1:  # Only consider confirmed tracklets
                iou = self.asso_func(det[:4], tracklet.get_state()[0])
                if iou > self.iou_threshold_sfe:
                    candidate_count += 1
                    candidate_idx = idx
                    # Keep the candidate's own IoU, not that of the last tracklet visited
                    candidate_iou = iou
        if candidate_count > 1:
            return True, -1  # More than one candidate, risky
        if candidate_count == 1:
            # Check aspect ratio similarity
            v = self.aspect_ratio_similarity(det[:4], tracklets[candidate_idx].get_state()[0])
            alpha = self.calculate_alpha(candidate_iou, v)
            if alpha > self.ars_threshold:
                return False, candidate_idx
        return True, -1  # No candidates or aspect ratio check failed, risky

    @BaseTracker.per_class_decorator
    def update(self, dets: np.ndarray, img: np.ndarray, embs: np.ndarray = None) -> np.ndarray:
        self.check_inputs(dets, img)
        self.frame_count += 1
        self.height, self.width = img.shape[:2]
        scores = dets[:, 4]
        dets = np.hstack([dets, np.arange(len(dets)).reshape(-1, 1)])
        remain_inds = scores > self.det_thresh
        dets = dets[remain_inds]
        # Selective feature extraction
        if self.selective_feature_extraction and not self.embedding_off and dets.shape[0] > 0:
            risky_mask = np.ones(dets.shape[0], dtype=bool)
            candidate_ids = np.full(dets.shape[0], -1)
            for i, det in enumerate(dets):
                is_risky, candidate_id = self.is_risky_detection(det, self.active_tracks)
                risky_mask[i] = is_risky
                candidate_ids[i] = candidate_id
            # Extract features only for risky detections
            risky_dets = dets[risky_mask]
            if risky_dets.shape[0] > 0:
                risky_embs = self.model.get_features(risky_dets[:, 0:4], img)
            else:
                risky_embs = np.array([])
            # Create full embedding array
            dets_embs = np.zeros((dets.shape[0], risky_embs.shape[1] if risky_embs.shape[0] > 0 else 0))
            dets_embs[risky_mask] = risky_embs
            # Copy features for non-risky detections
            for i, (is_risky, candidate_id) in enumerate(zip(risky_mask, candidate_ids)):
                if not is_risky and candidate_id != -1:
                    dets_embs[i] = self.active_tracks[candidate_id].get_emb()
        else:
            # Original feature extraction logic
            if self.embedding_off or dets.shape[0] == 0:
                dets_embs = np.ones((dets.shape[0], 1))
            elif embs is not None:
                dets_embs = embs
            else:
                dets_embs = self.model.get_features(dets[:, 0:4], img)
        # CMC
        if not self.cmc_off:
            transform = self.cmc.apply(img, dets[:, :4])
            for trk in self.active_tracks:
                trk.apply_affine_correction(transform)
        trust = (dets[:, 4] - self.det_thresh) / (1 - self.det_thresh)
        af = self.alpha_fixed_emb
        dets_alpha = af + (1 - af) * (1 - trust)
        # get predicted locations from existing trackers.
        trks = np.zeros((len(self.active_tracks), 5))
        trk_embs = []
        to_del = []
        ret = []
        for t, trk in enumerate(trks):
            pos = self.active_tracks[t].predict()[0]
            trk[:] = [pos[0], pos[1], pos[2], pos[3], 0]
            if np.any(np.isnan(pos)):
                to_del.append(t)
            else:
                trk_embs.append(self.active_tracks[t].get_emb())
        trks = np.ma.compress_rows(np.ma.masked_invalid(trks))
        if len(trk_embs) > 0:
            trk_embs = np.vstack(trk_embs)
        else:
            trk_embs = np.array(trk_embs)
        for t in reversed(to_del):
            self.active_tracks.pop(t)
        velocities = np.array([trk.velocity if trk.velocity is not None else np.array((0, 0)) for trk in self.active_tracks])
        last_boxes = np.array([trk.last_observation for trk in self.active_tracks])
        k_observations = np.array([k_previous_obs(trk.observations, trk.age, self.delta_t) for trk in self.active_tracks])
        # First round of association
        if self.embedding_off or dets.shape[0] == 0 or trk_embs.shape[0] == 0:
            stage1_emb_cost = None
        else:
            stage1_emb_cost = dets_embs @ trk_embs.T
        matched, unmatched_dets, unmatched_trks = associate(
            dets[:, 0:5],
            trks,
            self.asso_func,
            self.iou_threshold,
            velocities,
            k_observations,
            self.inertia,
            img.shape[1],
            img.shape[0],
            stage1_emb_cost,
            self.w_association_emb,
            self.aw_off,
            self.aw_param,
        )
        for m in matched:
            self.active_tracks[m[1]].update(dets[m[0], :])
            self.active_tracks[m[1]].update_emb(dets_embs[m[0]], alpha=dets_alpha[m[0]])
        # Second round of association by OCR
        if unmatched_dets.shape[0] > 0 and unmatched_trks.shape[0] > 0:
            left_dets = dets[unmatched_dets]
            left_dets_embs = dets_embs[unmatched_dets]
            left_trks = last_boxes[unmatched_trks]
            left_trks_embs = trk_embs[unmatched_trks]
            iou_left = self.asso_func(left_dets, left_trks)
            emb_cost_left = left_dets_embs @ left_trks_embs.T
            if self.embedding_off:
                emb_cost_left = np.zeros_like(emb_cost_left)
            iou_left = np.array(iou_left)
            if iou_left.max() > self.iou_threshold:
                rematched_indices = linear_assignment(-iou_left)
                to_remove_det_indices = []
                to_remove_trk_indices = []
                for m in rematched_indices:
                    det_ind, trk_ind = unmatched_dets[m[0]], unmatched_trks[m[1]]
                    if iou_left[m[0], m[1]] < self.iou_threshold:
                        continue
                    self.active_tracks[trk_ind].update(dets[det_ind, :])
                    self.active_tracks[trk_ind].update_emb(dets_embs[det_ind], alpha=dets_alpha[det_ind])
                    to_remove_det_indices.append(det_ind)
                    to_remove_trk_indices.append(trk_ind)
                unmatched_dets = np.setdiff1d(unmatched_dets, np.array(to_remove_det_indices))
                unmatched_trks = np.setdiff1d(unmatched_trks, np.array(to_remove_trk_indices))
        for m in unmatched_trks:
            self.active_tracks[m].update(None)
        # create and initialise new trackers for unmatched detections
        for i in unmatched_dets:
            trk = KalmanBoxTracker(
                dets[i],
                delta_t=self.delta_t,
                emb=dets_embs[i],
                alpha=dets_alpha[i],
                Q_xy_scaling=self.Q_xy_scaling,
                Q_s_scaling=self.Q_s_scaling,
                max_obs=self.max_obs
            )
            self.active_tracks.append(trk)
        i = len(self.active_tracks)
        for trk in reversed(self.active_tracks):
            if trk.last_observation.sum() < 0:
                d = trk.get_state()[0]
            else:
                """
                this is optional to use the recent observation or the kalman filter prediction,
                we didn't notice significant difference here
                """
                d = trk.last_observation[:4]
            if (trk.time_since_update < 1) and (trk.hit_streak >= self.min_hits or self.frame_count <= self.min_hits):
                # +1 as MOT benchmark requires positive
                ret.append(np.concatenate((d, [trk.id], [trk.conf], [trk.cls], [trk.det_ind])).reshape(1, -1))
            i -= 1
            # remove dead tracklet
            if trk.time_since_update > self.max_age:
                self.active_tracks.pop(i)
        if len(ret) > 0:
            return np.concatenate(ret)
        return np.array([])

    # Feature decay
    def apply_feature_decay(self):
        if self.feature_decay:
            for trk in self.active_tracks:
                if trk.time_since_update > 0:
                    trk.decay_feature(self.alpha_fixed_emb)
|
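The effect of `decay_feature` on association can be seen in isolation. The first-round embedding cost in the listing is the dot product `dets_embs @ trk_embs.T`, so scaling a stored embedding by `alpha` on every missed frame scales its score by the same factor. A minimal sketch, assuming a unit-norm embedding and `alpha = 0.95` (the listing's `alpha_fixed_emb` default); `decay_embedding` is a hypothetical helper, and the listing itself defines `apply_feature_decay` without showing where it is called:

```python
import numpy as np


def decay_embedding(emb, alpha, frames_missed):
    """Scale an embedding by alpha once per frame without a matched detection."""
    return emb * alpha ** frames_missed


track_emb = np.array([0.6, 0.8])  # unit-norm toy embedding
det_emb = track_emb.copy()        # a detection that looks identical

for frames_missed in (0, 1, 5, 20):
    score = float(det_emb @ decay_embedding(track_emb, 0.95, frames_missed))
    print(frames_missed, round(score, 3))
```

The longer a track goes unmatched, the less its stale appearance contributes, which matters more here because non-risky detections reuse stored track features instead of fresh ones.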
As a side note, you can just merge pull request #1621 for the faststrongsort implementation and leave it there. I really appreciate and am grateful for this awesome repository and your effort to present SOTA methods in an easy-to-use format. |
How is this supposed to work for the first feature generation round, @emirhanbayar? For example here:
for i, _ in enumerate(xyxy):
    if i in risky_detections:
        feat = features[risky_detections.index(i)]
    else:
        # For non-risky detections, use the matching track's features
        feat = non_risky_matches[i].features[-1]  # What if it is None?
    feats.append(feat)
I am basically trying to apply this to all trackers:
@torch.no_grad()
def get_features_fast(self, xyxy, img, active_tracks, embs):
    risky_detections = []
    non_risky_matches = {}
    for i, det in enumerate(xyxy):
        matching_tracks = []
        for at in active_tracks:
            iou = iou_batch(det.reshape(1, -1), at.get_state())[0][0]
            if iou > self.iou_threshold:
                matching_tracks.append((at, iou))
        if len(matching_tracks) == 1:
            track, iou = matching_tracks[0]
            ars = self.aspect_ratio_similarity(det, track.get_state())
            v = ars
            alpha = v / ((1 - iou) + v)
            if alpha <= self.ars_threshold:
                # Non-risky detection, use track's features
                non_risky_matches[i] = track
                continue
        # Risky detection, needs feature extraction
        risky_detections.append(i)
    # Extract features only for risky detections otherwise use last feature
    if embs is not None:
        features = embs[risky_detections]
    else:
        features = self.get_features(xyxy[risky_detections], img)
    # Prepare detections
    feats = []
    for i, _ in enumerate(xyxy):
        if i in risky_detections:
            feat = features[risky_detections.index(i)]
        else:
            # For non-risky detections, use the matching track's features
            feat = non_risky_matches[i].features[-1]  # Use the latest feature from the matching track
        feats.append(feat)
    feats = torch.tensor(feats, dtype=torch.float32)
    return feats
|
A candidate must be a confirmed track. A confirmed track must have a feature vector, since it has been matched at least once.
if track.is_confirmed():  # only confirmed tracks!
    iou = iou_batch(det.reshape(1, -1), track.to_tlbr().reshape(1, -1))[0][0]
    if iou > self.iou_threshold:
        matching_tracks.append((track, iou))
Assuming "active_tracks" will be the confirmed tracks, your implementation seems good to me. Additionally, we need to implement a feature decay mechanism. |
Looking at the line https://github.com/mikel-brostrom/boxmot/blob/274b53289ca42a7eedc774e62f5643b1152227cb/boxmot/trackers/strongsort/sort/track.py#L88C9-L88C42 I just noticed that tracks are Confirmed upon initialization, not Tentative. Am I wrong? Is this intended, or a mistake? |
I am sorry, the following part:
if alpha <= self.ars_threshold:
    # Non-risky detection, use track's features
    non_risky_matches[i] = track
    continue
should be replaced with:
if alpha >= self.ars_threshold:
    # Non-risky detection, use track's features
    non_risky_matches[i] = track
    continue
|
Every test and CI job passes, except when running it on a camera stream. Do you have any clue why it fails on a camera stream, @emirhanbayar?
0: 384x640 1 person, 154.9ms
(2, 512)
0: 384x640 1 person, 1 remote, 251.8ms
(2, 512)
0: 384x640 1 person, 1 remote, 250.6ms
(1, 512)
Traceback (most recent call last):
File "/Users/mikel.brostrom/boxmot/tracking/track.py", line 174, in <module>
run(opt)
File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/mikel.brostrom/boxmot/tracking/track.py", line 103, in run
for r in results:
File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
response = gen.send(request)
^^^^^^^^^^^^^^^^^
File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/predictor.py", line 262, in stream_inference
self.run_callbacks("on_predict_postprocess_end")
File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/predictor.py", line 399, in run_callbacks
callback(self)
File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/trackers/track.py", line 83, in on_predict_postprocess_end
predictor.results[i] = predictor.results[i][idx]
~~~~~~~~~~~~~~~~~~~~^^^^^
File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/results.py", line 287, in __getitem__
return self._apply("__getitem__", idx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/results.py", line 359, in _apply
setattr(r, k, getattr(v, fn)(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/mikel.brostrom/Library/Caches/pypoetry/virtualenvs/boxmot-YDNZdsaB-py3.11/lib/python3.11/site-packages/ultralytics/engine/results.py", line 183, in __getitem__
return self.__class__(self.data[idx], self.orig_shape)
~~~~~~~~~^^^^^
IndexError: index 1 is out of bounds for dimension 0 with size 1
command:
(boxmot-py3.11) ➜ boxmot git:(emirhanbayar-master) ✗ python3 tracking/track.py --yolo-model yolov8n.pt --source 0 --show --tracking-method faststrongsort |
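The traceback ends in `predictor.results[i][idx]`, an index into the current frame's detections. A small diagnostic sketch, assuming (as in the tracker listings above) that the last column of the tracker output is `det_ind`; `check_det_inds` is a hypothetical helper and not part of boxmot or Ultralytics:

```python
import numpy as np


def check_det_inds(tracks, num_dets):
    """Return the rows of `tracks` whose det_ind does not index this frame's detections."""
    if tracks.size == 0:
        return np.empty(0, dtype=int)
    det_inds = tracks[:, -1].astype(int)  # last column: index into the frame's detections
    return np.flatnonzero((det_inds < 0) | (det_inds >= num_dets))


# Two output rows (x1, y1, x2, y2, id, conf, cls, det_ind) on a frame with one detection
tracks = np.array([
    [0, 0, 10, 10, 1, 0.9, 0, 0],
    [5, 5, 20, 20, 2, 0.8, 0, 1],
], dtype=float)
print(check_det_inds(tracks, num_dets=1))
```

If this returns a non-empty array on the failing frame, some track is reporting a `det_ind` that does not belong to the current frame. That would be consistent with the `IndexError`, though it does not by itself identify the cause.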
I tried it now, after commenting out the following lines in the BaseTracker:
for r in results:
    # img = yolo.predictor.trackers[0].plot_results(r.orig_img, args.show_trajectories)
    continue
    if args.show is True:
        cv2.imshow('BoxMOT', img)
        key = cv2.waitKey(1) & 0xFF
        if key == ord(' ') or key == ord('q'):
            break
I am getting a different kind of error, shown below. It is thrown with strongsort as well, so it is not specific to faststrongsort:
2024-09-11 23:48:12.780 | SUCCESS | boxmot.appearance.reid_model_factory:load_pretrained_weights:183 - Loaded pretrained weights from /home/emirhan/boxmot-deneme/tracking/weights/osnet_x0_25_msmt17.pt
0: 480x640 1 person, 370.8ms
0: 480x640 1 person, 6.8ms
0: 480x640 1 person, 5.6ms
WARNING ⚠️ Waiting for stream 0
0: 480x640 1 person, 5.6ms
0: 480x640 1 person, 5.6ms
WARNING ⚠️ Waiting for stream 0
0: 480x640 1 person, 5.6ms
0: 480x640 1 person, 19.8ms
0: 480x640 1 person, 8.7ms
0: 480x640 1 person, 11.5ms
WARNING ⚠️ Waiting for stream 0
0: 480x640 1 person, 20.5ms
0: 480x640 1 person, 10.9ms
0: 480x640 1 person, 20.3ms
0: 480x640 1 person, 16.0ms
0: 480x640 1 person, 20.3ms
0: 480x640 1 person, 11.7ms
0: 480x640 1 person, 1 refrigerator, 20.6ms
0: 480x640 1 person, 1 refrigerator, 5.8ms
0: 480x640 1 person, 1 refrigerator, 5.8ms
WARNING ⚠️ Waiting for stream 0
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
File "/home/emirhan/boxmot-deneme/tracking/track.py", line 174, in <module>
run(opt)
File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/emirhan/boxmot-deneme/tracking/track.py", line 103, in run
for r in results:
File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
response = gen.send(request)
^^^^^^^^^^^^^^^^^
File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/ultralytics/engine/predictor.py", line 249, in stream_inference
with profilers[0]:
File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/ultralytics/utils/ops.py", line 46, in __enter__
self.start = self.time()
^^^^^^^^^^^
File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/ultralytics/utils/ops.py", line 61, in time
torch.cuda.synchronize(self.device)
File "/home/emirhan/anaconda3/envs/boxmot-new/lib/python3.11/site-packages/torch/cuda/__init__.py", line 801, in synchronize
return torch._C._cuda_synchronize()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Aborted (core dumped) |
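As the error message itself suggests, `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so the Python stack trace points at the call that actually triggered the out-of-bounds index. A minimal sketch of setting it from Python; the helper name is hypothetical, and the variable has to be set before the first CUDA call in the process, so the safest place is before importing torch:

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Request synchronous CUDA kernel launches so errors surface at the offending call."""
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Call this at the very top of the entry script, before torch is imported.
enable_blocking_cuda_launches()
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Exporting the variable in the shell before launching the tracking script has the same effect.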
I see. It seems that there is some issue with |
Running StrongSORT from 2dd45fe yields a HOTA score of 68.338 on MOT17 val. Running StrongSORT from the current master yields a HOTA score of 52.462 on MOT17 val. A major bug that caught my attention is that tracks are initialized as:
self.time_since_update = 1
self.state = TrackState.Confirmed
whereas they were correctly initialized in 2dd45fe as:
self.time_since_update = 0
self.state = TrackState.Tentative
When I fixed this and set n_init to 3 as in the original implementation, I got 62.591. However, I could not find the cause of the remaining gap of about 6 points. What is the purpose of the modifications that degraded the performance of StrongSORT? To get these scores, I placed the dets and embs shared by StrongSORT as:
and ran the following script:
import os
import cv2
import numpy as np
from datetime import datetime
from pathlib import Path
import argparse
from boxmot.trackers.strongsort.strong_sort import StrongSORT


def get_seq_paths(dataset_path):
    imgs = {}
    seq_names = []
    for root, dirs, files in os.walk(dataset_path):
        for dire in dirs:
            if dire.startswith("MOT"):
                seq_names.append(dire)
                imgs[dire] = [os.path.join(r, file) for r, d, f in os.walk(os.path.join(root, dire)) for file in f if file.endswith(".jpg")]
                imgs[dire].sort(key=lambda x: int(x.split("/")[-1].split(".")[0]))
    return imgs, seq_names


def parse_options():
    parser = argparse.ArgumentParser()
    parser.add_argument('--reid', type=str, default='osnet_x1_0_msmt17.pt', help='model.pt path')
    parser.add_argument('--dataset-path', type=str, default='./data/MOT17/train', help='dataset path')
    parser.add_argument('--dataset', type=str, default='train', help='dataset type')
    parser.add_argument('--device', type=str, default='0', help='device \'cpu\' or \'0\', \'1\', ... for gpu')
    parser.add_argument('--fp16', action='store_true', help='use fp16')
    return parser.parse_args()


def create_tracker(args):
    return StrongSORT(
        model_weights=Path(args.reid),
        device=args.device,
        fp16=args.fp16,
    )


def process_detection(row):
    tlwh = row[2:6]
    return np.array([tlwh[0], tlwh[1], tlwh[0] + tlwh[2], tlwh[1] + tlwh[3], row[6], 0])


def write_result(f, frame_no, det):
    f.write(f"{frame_no + 1},{int(det[4])},{int(det[0])},{int(det[1])},{int(det[2] - det[0])},{int(det[3] - det[1])},{det[5]:.2f},-1,-1,-1\n")


if __name__ == "__main__":
    args = parse_options()
    dataset_path = "/".join(args.dataset_path.split("/")[:-1] + [args.dataset])
    imgs, seq_names = get_seq_paths(dataset_path)
    total_time = 0
    total_dets = 0
    total_frames = 0
    for seq in seq_names:
        tracker = create_tracker(args)
        print(f"Sequence: {seq}")
        seq_imgs = imgs[seq]
        output_dir = Path(f"MOT_{args.dataset}/{args.reid}/faststrongsort/{seq}")
        output_dir.mkdir(parents=True, exist_ok=True)
        with open(output_dir.parent / f"{seq}.txt", "w") as f:
            print(f"Writing results to: {f.name}")
            seq_time = 0
            det_file = Path(f"MOT17_{args.dataset}_YOLOX+BoT/{seq}.npy")
            seq_det = np.load(det_file, allow_pickle=True)
            for frame_no, img_path in enumerate(seq_imgs):
                frame = cv2.imread(img_path)
                frame_dets = seq_det[seq_det[:, 0] == frame_no + 1]
                if len(frame_dets) < 1:
                    continue
                total_dets += len(frame_dets)
                total_frames += 1
                processed_dets = np.array([process_detection(row) for row in frame_dets])
                features = [row[10:] for row in frame_dets]
                if len(processed_dets) > 0:
                    start = datetime.now()
                    tracked_det = tracker.update(processed_dets, frame, embs=np.array(features))
                    # tracked_det = tracker.update(processed_dets, frame)
                    seq_time += (datetime.now() - start).total_seconds()
                    for det in tracked_det:
                        write_result(f, frame_no, det)
                        scale = frame.shape[0] / 1080
                        cv2.putText(frame, f"id: {int(det[4])}", (int(det[0]), int(det[1])), cv2.FONT_HERSHEY_SIMPLEX, scale, (0, 0, 255), 2)
                    cv2.imwrite(str(output_dir / f"{frame_no + 1}.jpg"), frame)
        print(f"{seq} time: {seq_time:.2f}s")
        total_time += seq_time
    print(f"Total time: {total_time:.2f}s")
    print(f"FPS: {total_frames / total_time:.2f}")
    print(f"Total frames: {total_frames}")
|
You can run
to see the differences between the strongsort folder in master and the commit you notice had the highest performance. The only thing I can see, besides what you mentioned in your comment above, is that the KalmanFilter got refactored. I would recommend you to create a from boxmot.trackers.strongsort.sort.strongsort_kf import KalmanFilter
...
self.kf = KalmanFilter() Let me know if this is the issue 😄 |
I may not be capturing every single bug in the pipeline, as I only run the evaluation on a small subset of MOT17. So something could slip through without me noticing it. |
All trackers show similar results under benchmark here: https://github.com/mikel-brostrom/boxmot/actions/runs/10838200299/job/30075883940 |
With #1627, the StrongSORT accuracy is 68.3 again. I also added Fast-StrongSORT. In the following table, the _x_y number at the end indicates the iou_threshold
boxmot_FSS (iou_threshold = 1.0) accuracy should be the same as StrongSORT, but it is not. I will be looking at this. |
Are these results based on the last half of the MOT17 training set or the whole training set? |
Last half of the training set. |
Let me know when the results are closer to StrongSORT? 🚀 |
With #1634 results are as follows:
|
If you like the idea and want to apply it to other methods, I can share the following insights: Although it is a rare edge case, a tracklet can be a candidate for more than one detection. As a precaution, if you want to apply this to BoT-SORT and Deep-OC-SORT, it is best to restrict the mechanism to high-confidence detections, so that only the first stage of the matching is affected.
|
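The two cautions above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the function name `can_skip_reid`, the boolean candidate matrix `overlaps`, and the `conf_threshold` value are all hypothetical.

```python
import numpy as np


def can_skip_reid(overlaps, confs, conf_threshold=0.6):
    """Return a mask that is True where a detection may skip feature extraction.

    overlaps: (N, M) bool array, True where detection i is a candidate for track j.
    confs: (N,) detection confidences.
    conf_threshold: hypothetical cut-off for "high confidence" detections.
    """
    overlaps = np.asarray(overlaps, dtype=bool)
    confs = np.asarray(confs, dtype=float)
    # The detection has exactly one candidate track.
    single_track = overlaps.sum(axis=1) == 1
    # Edge case: a track claimed by more than one detection is ambiguous.
    claims_per_track = overlaps.sum(axis=0)
    contested = (overlaps & (claims_per_track > 1)).any(axis=1)
    # Only high-confidence detections take part in the mechanism.
    confident = confs >= conf_threshold
    return single_track & ~contested & confident


overlaps = [[1, 0], [1, 0], [0, 1]]  # detections 0 and 1 both claim track 0
print(can_skip_reid(overlaps, [0.9, 0.9, 0.9]).tolist())  # [False, False, True]
```

Detections excluded by this mask would still go through the normal appearance-based matching, so the second matching stage of a two-stage tracker is left untouched.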
Do you have any speedup numbers for each of the IoU thresholds, @emirhanbayar? I am mostly asking so that I can add this table to the experiments section 😄 |
Using the osnetx1 model, an IoU threshold of 0.4 increases FPS from 4.4 to 6.7 on a GTX 1650. Maximum memory usage also decreases from 1200 MB to the 900 MB range. That's all I tested beyond the tests in the original paper. I do not have a low-power edge device right now. If you think tests on a GTX 1650 are still meaningful, I can run the experiments. |
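For a speedup column, the two FPS figures quoted above translate into a relative speedup and a per-frame time as follows. The variable names are just for illustration; only the two FPS values come from the comment.

```python
fps_before = 4.4  # FPS reported above before applying the IoU threshold
fps_after = 6.7   # FPS reported above with an IoU threshold of 0.4

speedup = fps_after / fps_before
ms_before = 1000.0 / fps_before
ms_after = 1000.0 / fps_after

print(f"speedup: {speedup:.2f}x")  # speedup: 1.52x
print(f"per-frame time: {ms_before:.1f} ms -> {ms_after:.1f} ms")  # 227.3 ms -> 149.3 ms
```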
Sure! If you could add a speedup column to the table that would be awesome, even if it is on a GTX1650 |
Btw @emirhanbayar, I am in the process of automating the evaluation of the tracking modules in the CI. I would need these detections and embeddings to be able to generate the paper results:
Do you have a link to them? Or could you point out where to find them? |
They were shared by the authors of StrongSORT at the following link: https://drive.google.com/drive/folders/1zzzUROXYXt8NjxO1WUcwSzqD-nn7rPNr |
Thanks! |
|
Thank you for providing all these details @emirhanbayar. Super valuable! Adding this to the experiments section |
I tried to replicate your original StrongSORT results:
in the CI pipeline. Mine are:
With this configuration The data is:
Do you see any differences, or did you make any changes? |
I set max_cost_dist to 0.4 to obtain these results. |
Could reproduce. Thanks! {"HOTA": 68.329, "MOTA": 76.356, "IDF1": 81.21} |
Initializing the StrongSORT tracks as confirmed just gave me: {"HOTA": 67.87, "MOTA": 76.422, "IDF1": 79.348}. So this was a major bug. |
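To illustrate why the initial track state matters, here is a minimal sketch of a DeepSORT-style track lifecycle. It is an assumption about the general pattern, not a copy of boxmot's `track.py`; the class and method names are hypothetical.

```python
from enum import Enum


class TrackState(Enum):
    TENTATIVE = 1
    CONFIRMED = 2
    DELETED = 3


class Track:
    """Hypothetical track that starts tentative and is confirmed after n_init hits."""

    def __init__(self, n_init=3, max_age=30):
        self.state = TrackState.TENTATIVE  # not CONFIRMED at creation
        self.hits = 1
        self.time_since_update = 0
        self.n_init = n_init
        self.max_age = max_age

    def mark_matched(self):
        self.hits += 1
        self.time_since_update = 0
        if self.state is TrackState.TENTATIVE and self.hits >= self.n_init:
            self.state = TrackState.CONFIRMED

    def mark_missed(self):
        self.time_since_update += 1
        # A tentative track is dropped on its first miss; a confirmed one ages out.
        if self.state is TrackState.TENTATIVE or self.time_since_update > self.max_age:
            self.state = TrackState.DELETED
```

If a track were created as confirmed instead, a single spurious detection would immediately behave like an established track, which is one plausible way the metrics above could shift.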
In the paper, we have the "Aspect ratio similarity" formula: But everywhere else, "distance" means 0 is closest and 1 is furthest, and "similarity" is the opposite: 0 is completely different and 1 is identical. I see that you have the correct formula in your code, but could you fix it in the paper too? I spent about an hour making completely sure that it was not my misunderstanding... BTW, good job! |
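The distance versus similarity convention described above can be illustrated with a small sketch. The formula from the paper is not reproduced in this thread, so the code below uses a CIoU-style aspect-ratio term purely as a stand-in; both function names are hypothetical, and boxes are assumed to be in xyxy format with positive width and height.

```python
import math


def aspect_ratio_distance(box_a, box_b):
    """Stand-in CIoU-style term in [0, 1): 0 means identical aspect ratios."""
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    diff = math.atan(wa / ha) - math.atan(wb / hb)
    return 4.0 / math.pi ** 2 * diff ** 2


def aspect_ratio_similarity(box_a, box_b):
    """Similarity is the complement: 1 means identical aspect ratios."""
    return 1.0 - aspect_ratio_distance(box_a, box_b)


tall = (0, 0, 10, 20)
wide = (0, 0, 20, 10)
print(aspect_ratio_similarity(tall, tall))  # 1.0
print(aspect_ratio_similarity(tall, wide) < 1.0)  # True
```

With this convention, a threshold such as `ars_threshold` would be compared against the similarity, so larger values mean a stricter match.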
I apologize for the confusion and appreciate you bringing this to my attention. The formula has been updated as follows: That should solve the problem, right? Feel free to share if there’s anything else I should consider. |
Search before asking
Description
https://arxiv.org/abs/2409.06617
The mechanism described in the above work is designed to determine which detections require feature extraction on the fly and avoid unnecessary feature extractions. This way it increases FPS without sacrificing accuracy.
It can be applied to any tracker in this repo. However, it has only been tested on strongsort and deepocsort. Implementing it in an existing tracker is as easy as modifying a few lines. I can apply it to strongsort and add a new tracker called Fast-StrongSORT. We can also add it to all trackers behind a command-line argument that activates it.
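As a rough illustration of the idea, the sketch below flags which detections still need ReID features based on their overlap with existing tracks. It is a simplified assumption, not the implementation from the paper or the PR: the helper names `iou_matrix` and `needs_reid` are hypothetical, and the rule used here (skip extraction only when a detection overlaps exactly one track above `iou_threshold`) leaves out any further checks the actual method applies.

```python
import numpy as np


def iou_matrix(dets, trks):
    """Pairwise IoU between (N, 4) detections and (M, 4) tracks in xyxy format."""
    d = dets[:, None, :]  # (N, 1, 4)
    t = trks[None, :, :]  # (1, M, 4)
    inter_w = np.clip(np.minimum(d[..., 2], t[..., 2]) - np.maximum(d[..., 0], t[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(d[..., 3], t[..., 3]) - np.maximum(d[..., 1], t[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    return inter / (area_d + area_t - inter + 1e-9)


def needs_reid(dets, trks, iou_threshold=0.2):
    """Return a mask that is True where a detection still needs feature extraction."""
    dets = np.asarray(dets, dtype=float).reshape(-1, 4)
    trks = np.asarray(trks, dtype=float).reshape(-1, 4)
    if len(trks) == 0:
        # Without existing tracks, every detection needs features.
        return np.ones(len(dets), dtype=bool)
    candidates = iou_matrix(dets, trks) > iou_threshold
    # Exactly one overlapping track: treated here as an unambiguous match.
    unambiguous = candidates.sum(axis=1) == 1
    return ~unambiguous


dets = [[0, 0, 10, 20], [100, 100, 110, 120], [50, 50, 60, 70]]
trks = [[1, 1, 11, 21], [48, 50, 58, 70], [52, 50, 62, 70]]
print(needs_reid(dets, trks).tolist())  # [False, True, True]
```

In this example the first detection overlaps a single track and skips extraction, the second overlaps none and is treated as a possible new object, and the third overlaps two tracks, so appearance features are needed to disambiguate it.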
Use case
This enhancement is proposed to solve the exact problem raised in #1595.
Are you willing to submit a PR?