Stomach-ache/awesome-long-tail-learning

Awesome Long-Tail Learning

This repo focuses on long-tailed learning, where labels follow a long-tailed or power-law distribution in the training dataset and/or test dataset. It summarizes related papers, including applications in computer vision (in particular image classification) and in extreme multi-label learning (XML) (in particular text categorization).
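As a rough illustration of what a power-law label distribution looks like, the sketch below builds per-class sample counts that decay with class rank. The function names, the `exponent` parameter, and the example sizes are hypothetical choices made for this illustration; they are not taken from any paper listed here, and individual benchmarks may construct their long-tailed splits differently.

```python
def power_law_class_counts(num_classes, max_count, exponent=1.0):
    """Return per-class sample counts that decay as a power law of class rank.

    Hypothetical helper for illustration only: the class at rank r gets
    roughly max_count / r**exponent samples, and never fewer than one.
    """
    counts = []
    for rank in range(1, num_classes + 1):
        counts.append(max(1, int(max_count / rank ** exponent)))
    return counts


def imbalance_ratio(counts):
    """Ratio between the largest (head) and smallest (tail) class sizes."""
    return max(counts) / min(counts)


# A few head classes hold most of the samples; many tail classes hold few.
counts = power_law_class_counts(num_classes=10, max_count=1000)
print(counts)
print("imbalance ratio:", imbalance_ratio(counts))
```

A larger `exponent` makes the tail thinner, which in turn raises the imbalance ratio.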

🔆 Updated 2024-07-13

Long-tailed Learning

Type of Long-Tailed Learning Methods

Type | Meaning
TST | Two-Stage Training
IS | Instance Sampling
CBS | Class-Balanced Sampling
CLW | Class-Level Weighting
NC | Normalized Classifier
ENS | Ensemble
DA | Data Augmentation
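Two of the method types above, class-balanced sampling (CBS) and class-level weighting (CLW), can be sketched in a few lines. This is a minimal illustration using only the standard library, assuming inverse-frequency weights for CLW and "pick a class uniformly, then an instance within it" for CBS. The function names and the toy label list are hypothetical, and the papers listed below use a range of more refined schemes.

```python
import random
from collections import Counter


def class_level_weights(labels):
    """Inverse-frequency class weights, scaled to average 1 across classes.

    One common form of class-level weighting (an assumption for this sketch).
    """
    counts = Counter(labels)
    inverse = {cls: 1.0 / n for cls, n in counts.items()}
    total = sum(inverse.values())
    return {cls: len(counts) * w / total for cls, w in inverse.items()}


def class_balanced_sample(labels, num_samples, seed=0):
    """Sample indices so that every class is drawn with equal probability."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    classes = sorted(by_class)
    # First choose a class uniformly, then an instance uniformly inside it.
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(num_samples)]


# Toy long-tailed label list: 90 head, 9 medium, 1 tail sample.
labels = [0] * 90 + [1] * 9 + [2] * 1
weights = class_level_weights(labels)
sampled = class_balanced_sample(labels, num_samples=3000)
sampled_counts = Counter(labels[i] for i in sampled)
print(weights)
print(sampled_counts)
```

Instance sampling (IS), by contrast, would draw indices uniformly from the whole list, so the sampled label counts would mirror the original 90/9/1 imbalance.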

Long-Tailed Learning Workshops

Year Venue Title Remark
2021 CVPR Open World Vision long-tail, open-set, streaming labels
2021 CVPR Learning from Limited and Imperfect Data (L2ID) label noise, SSL, long-tail

Long-Tailed Classification

Year Venue Title Remark
2024 CVPR DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets code
2024 ICML Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition code
2024 ICML Learning Label Shift Correction for Test-Agnostic Long-Tailed Recognition code
2024 ICML Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts 🔥🔥🔥 code
2023 TPAMI Deep Long-Tailed Learning: A Survey
2023 TPAMI Probabilistic Contrastive Learning for Long-Tailed Visual Recognition code
2023 ICLR Delving into Semantic Scale Imbalance
2023 ICLR Temperature Schedules for self-supervised contrastive methods on long-tail data
2023 ICLR On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning
2023 ICLR Long-Tailed Learning Requires Feature Learning
2023 ICLR Decoupled Training for Long-Tailed Classification With Stochastic Representations
2023 ICLR LPT: Long-tailed Prompt Tuning for Image Classification fine-tune ViT
2023 ICLR CUDA: Curriculum of Data Augmentation for Long-tailed Recognition
2023 NeurIPS A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning code
2023 NeurIPS Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models code
2023 arXiv Exploring Vision-Language Models for Imbalanced Learning pre-trained model
2022 ECCV VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition fine-tune CLIP
2023 AAAI Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition video dataset, code
2022 ECCV Tailoring Self-Supervision for Supervised Learning video dataset, code
2022 NeurIPS Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition code
2022 arXiv Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification
2022 TPAMI Key Point Sensitive Loss for Long-tailed Visual Recognition
2022 IJCV A Survey on Long-Tailed Visual Recognition survey
2022 arXiv Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning
2022 ICLR OPTIMAL TRANSPORT FOR LONG-TAILED RECOGNITION WITH LEARNABLE COST MATRIX
2022 ICLR SELF-SUPERVISED LEARNING IS MORE ROBUST TO DATASET IMBALANCE
2022 AAAI Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification code
2021 NeurIPS Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling
2021 NeurIPS Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective code, mixup+LA
2021 arXiv HAR: Hardness Aware Reweighting for Imbalanced Datasets
2021 arXiv Feature Generation for Long-tail Classification
2021 arXiv Label-Aware Distribution Calibration for Long-tailed Classification
2021 arXiv Self-supervised Learning is More Robust to Dataset Imbalance
2021 arXiv Long-tailed Distribution Adaptation
2021 arXiv LEARNING FROM LONG-TAILED DATA WITH NOISY LABELS
2021 ICCV Self Supervision to Distillation for Long-Tailed Visual Recognition
2021 ICCV Distilling Virtual Examples for Long-tailed Recognition
2021 CVPR Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
2021 CVPR MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
2021 CVPR Disentangling Label Distribution for Long-tailed Visual Recognition
2021 CVPR Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings
2021 CVPR Seesaw Loss for Long-Tailed Instance Segmentation
2021 ICLR Exploring balanced feature spaces for representation learning
2021 ICLR IS LABEL SMOOTHING TRULY INCOMPATIBLE WITH KNOWLEDGE DISTILLATION: AN EMPIRICAL STUDY
2021 arXiv Improving Long-Tailed Classification from Instance Level
2021 arXiv ResLT: Residual Learning for Long-tailed Recognition
2021 arXiv Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces by Google
2021 arXiv Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition
2021 arXiv Procrustean Training for Imbalanced Deep Learning
2021 arXiv Balanced Knowledge Distillation for Long-tailed Learning CBS+IS, Code
2021 arXiv Class-Balanced Distillation for Long-Tailed Visual Recognition ENS+DA+IS, by Google Research
2021 arXiv Distributional Robustness Loss for Long-tail Learning TST+CBS
2021 CVPR Improving Calibration for Long-Tailed Recognition DA+TST, Code
2021 CVPR Distribution Alignment: A Unified Framework for Long-tail Visual Recognition TST
2021 CVPR Adversarial Robustness under Long-Tailed Distribution
2021 ICLR HETEROSKEDASTIC AND IMBALANCED DEEP LEARNING WITH ADAPTIVE REGULARIZATION Code
2021 ICLR LONG-TAILED RECOGNITION BY ROUTING DIVERSE DISTRIBUTION-AWARE EXPERTS ENS+NC, Code, by Zi-Wei Liu
2021 ICLR Long-Tail Learning via Logit Adjustment by Google
2021 AAAI Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks
2021 arXiv Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
2020 arXiv ELF: An Early-Exiting Framework for Long-Tailed Classification
2020 CVPR Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective
2020 CVPR Equalization Loss for Long-Tailed Object Recognition
2020 CVPR Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
2020 ICLR Decoupling representation and classifier for long-tailed recognition Code
2020 NeurIPS Balanced Meta-Softmax for Long-Tailed Visual Recognition
2020 NeurIPS Rethinking the Value of Labels for Improving Class-Imbalanced Learning Code
2020 CVPR BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition Code
2019 NeurIPS Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss Code
2019 CVPR Large-Scale Long-Tailed Recognition in an Open World Code, bibtex, by CUHK
2018 - iNaturalist. The iNaturalist 2018 competition dataset long-tailed dataset
2017 arXiv The Devil is in the Tails: Fine-grained Classification in the Wild
2017 NeurIPS Learning to model the tail

Long-Tailed Regression

Year Venue Title Remark
2022 CVPR Balanced MSE for Imbalanced Visual Regression
2021 OpenReview LIFTING IMBALANCED REGRESSION WITH SELF-SUPERVISED LEARNING ICLR rejected
2021 ICML Delving into Deep Imbalanced Regression code

Long-Tailed Semi-Supervised Learning

Year Venue Title Remark
2024 arXiv Towards Realistic Long-tailed Semi-supervised Learning in an Open World code
2024 ICML SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning code
2023 CVPR Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need 🔥🔥🔥 code
2023 NeurIPS Towards Distribution-Agnostic Generalized Category Discovery code
2023 ICLR Imbalanced Semi-supervised Learning with Bias Adaptive Classifier
2023 ICLR Adaptive Robust Evidential Optimization For Open Set Detection from Imbalanced Data
2023 ICLR INPL: PSEUDO-LABELING THE INLIERS FIRST FOR IMBALANCED SEMI-SUPERVISED LEARNING
2022 CVPR DASO: Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning code
2022 MLJ Transfer and Share: Semi-Supervised Learning from Long-Tailed Data code
2022 ICML Smoothed Adaptive Weighting for Imbalanced Semi-Supervised Learning: Improve Reliability Against Unknown Distribution Data code
2022 ICLR THE RICH GET RICHER: DISPARATE IMPACT OF SEMI-SUPERVISED LEARNING
2022 ICLR ON NON-RANDOM MISSING LABELS IN SEMI-SUPERVISED LEARNING
2022 OpenReview UNIFYING DISTRIBUTION ALIGNMENT AS A LOSS FOR IMBALANCED SEMI-SUPERVISED LEARNING
2021 NeurIPS ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning
2021 arXiv CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning
2021 CVPR CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning by Google, Code, Tensorflow
2021 arXiv DISTRIBUTION-AWARE SEMANTICS-ORIENTED PSEUDO-LABEL FOR IMBALANCED SEMI-SUPERVISED LEARNING SSL, Code
2020 NeurIPS Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning Code

Long-Tailed Learning with Noisy Labels

Year Venue Title Remark
2024 CVPR SURE: SUrvey REcipes for building reliable and robust deep networks code
2023 ICLR LONG-TAILED PARTIAL LABEL LEARNING VIA DYNAMIC REBALANCING code, partial label
2023 ICCV When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method
2022 ECCV Identifying Hard Noise in Long-Tailed Sample Distribution code, large datasets
2022 ICLR SAMPLE SELECTION WITH UNCERTAINTY OF LOSSES FOR LEARNING WITH NOISY LABELS
2022 PAKDD Prototypical Classifier for Robust Class-Imbalanced Learning code
2021 arXiv ROBUST LONG-TAILED LEARNING UNDER LABEL NOISE code

Long-Tailed OOD Detection

Year Venue Title Remark
2024 AAAI EAT: Towards Long-Tailed Out-of-Distribution Detection 🔥🔥🔥 code

Long-Tailed Federated Learning

Year Venue Title Remark
2022 IJCAI Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features

eXtreme Multi-label Learning

Binary Relevance

Year Venue Title Remark
2019 Machine Learning Data Scarcity, Robustness and Extreme Multi-label Classification
2019 WSDM Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches
2017 KDD PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification
2017 AISTATS Label Filters for Large Scale Multilabel Classification
2016 WSDM DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification
2016 ICML PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification

Tree-based Methods

Year Venue Title Remark
2021 KDD Extreme Multi-label Learning for Semantic Matching in Product Search by Amazon, code
2020 arXiv Probabilistic Label Trees for Extreme Multi-label Classification PLT survey, code
2020 arXiv Online probabilistic label trees
2020 AISTATS LdSM: Logarithm-depth Streaming Multi-label Decision Trees Instance tree, C++ code
2019 NeurIPS AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks Label tree
2019 arXiv Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification Label tree
2018 ICML CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning Instance tree
2018 WWW Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising Label tree, by Manik Varma
2016 ICML Extreme F-Measure Maximization using Sparse Probability Estimates Label tree
2016 KDD Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications Instance tree
2014 KDD A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning Instance tree, python implementation
2013 ICML Label Partitioning For Sublinear Ranking Label tree
2013 WWW Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages Instance tree, Random Forest, Gini Index
2011 NeurIPS Efficient label tree learning for large scale object recognition Label tree, multi-class
2010 NeurIPS Label embedding trees for large multi-class tasks Label tree, multi-class
2008 ECML Workshop Effective and Efficient Multilabel Classification in Domains with Large Number of Labels Label tree

Embedding-based Methods

Year Venue Title Remark
2019 AAAI Distributional Semantics Meets Multi-Label Learning bibtex
2019 arXiv Ranking-Based Autoencoder for Extreme Multi-label Classification
2019 NeurIPS Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces by Google Research
2017 KDD AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification
2015 NeurIPS Sparse Local Embeddings for Extreme Multi-label Classification
2014 ICML Large-scale Multi-label Learning with Missing Labels
2014 ICML Multi-label Classification via Feature-aware Implicit Label Space Encoding
2013 ICML Efficient Multi-label Classification with Many Labels
2012 NeurIPS Feature-aware Label Space Dimension Reduction for Multi-label Classification
2011 IJCAI WSABIE: Scaling Up To Large Vocabulary Image Annotation bibtex
2009 NeurIPS Multi-Label Prediction via Compressed Sensing
2008 KDD Extracting Shared Subspaces for Multi-label Classification

Speed-up and Compression

Year Venue Title Remark
2020 KDD Large-Scale Training System for 100-Million Classification at Alibaba Applied Data Science Track
2020 arXiv SOLAR: Sparse Orthogonal Learned and Random Embeddings
2020 ICLR EXTREME CLASSIFICATION VIA ADVERSARIAL SOFTMAX APPROXIMATION
2019 AISTATS Stochastic Negative Mining for Learning with Large Output Spaces by Google
2019 NeurIPS Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products Rice University, bibtex
2019 arXiv An Embarrassingly Simple Baseline for eXtreme Multi-label Prediction
2019 arXiv Accelerating Extreme Classification via Adaptive Feature Agglomeration bibtex, authors from IIT
2019 SDM Fast Training for Large-Scale One-versus-All Linear Classifiers using Tree-Structured Initialization code bibtex

Novel XML Settings

Year Venue Title Remark
2020 arXiv Extreme Multi-label Classification from Aggregated Labels by Inderjit Dhillon. This paper considers multi-instance learning in XML
2020 arXiv Unbiased Loss Functions for Extreme Classification With Missing Labels by Rohit Babbar. Missing labels
2020 ICML Deep Streaming Label Learning code, by Dacheng Tao, streaming multi-label learning
2016 arXiv Streaming Label Learning for Modeling Labels on the Fly by Dacheng Tao, streaming multi-label learning

Theoretical Studies

Year Venue Title Remark
2019 ICML Sparse Extreme Multi-label Learning with Oracle Property Code, by Weiwei Liu
2019 NeurIPS Multilabel reductions: what is my loss optimising? bibtex, by Google

Text Classification

Year Venue Title Remark
2022 TKDE BGNN-XML: Bilateral Graph Neural Networks for Extreme Multi-label Text Classification
2021 ICML SiameseXML: Siamese Networks meet Extreme Classifiers with 100M Labels
2020 KDD Correlation Networks for Extreme Multi-label Text Classification code
2020 arXiv GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification
2020 ICML Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification code
2019 ACL Large-Scale Multi-Label Text Classification on EU Legislation Eur-Lex 4.3K, bibtex
2019 arXiv X-BERT: eXtreme Multi-label Text Classification with BERT code by Yiming Yang, Inderjit Dhillon
2019 NeurIPS AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks
2018 EMNLP Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces few-shot, zero-shot, evaluation metric
2018 NeurIPS A no-regret generalization of hierarchical softmax to extreme multi-label classification code, PLT code
2017 SIGIR Deep Learning for Extreme Multi-label Text Classification by Yiming Yang at CMU, bibtex

Others

Label Correlation

Year Venue Title Remark
2019 ICML DL2: Training and Querying Neural Networks with Logic
2015 KDD Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning
2010 KDD Multi-Label Learning by Exploiting Label Dependency

Long-tailed Continual Learning

Year Venue Title Remark
2020 ECCV Imbalanced Continual Learning with Partitioning Reservoir Sampling

Train/Test Split

Year Venue Title Remark
2021 arXiv Stratified Sampling for Extreme Multi-Label Data

XML Seminar

Year Venue Title Remark
2019 Dagstuhl Seminar 18291 Extreme Classification

Survey References:

  1. https://arxiv.org/pdf/1901.00248.pdf
  2. http://www.iith.ac.in/~saketha/research/AkshatMTP2018.pdf
  3. http://manikvarma.org/pubs/bengio19.pdf
  4. The Emerging Trends of Multi-Label Learning

XML Datasets link

Extreme Classification Workshops link
