Awesome Long-Tail Learning

This repo focuses on long-tailed distributions, where labels follow a long-tailed or power-law distribution in the training dataset and/or the test dataset. It summarizes related papers, including applications in computer vision (in particular image classification) and in extreme multi-label learning (XML) (in particular text categorization).
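
To make the notion of a power-law label distribution concrete, here is a minimal sketch that builds per-class sample counts for a synthetic long-tailed dataset. The function names, the exponent `alpha`, and the rule `n_k = max_count * k**(-alpha)` are illustrative assumptions, not taken from any paper listed here; benchmark datasets often use other profiles.

```python
def power_law_class_counts(num_classes, max_count, alpha=1.0, min_count=1):
    """Per-class sample counts n_k = max_count * k**(-alpha) for rank k = 1..num_classes.

    Hypothetical helper for illustration only.
    """
    counts = []
    for rank in range(1, num_classes + 1):
        n = int(max_count * rank ** (-alpha))
        counts.append(max(n, min_count))  # keep at least min_count samples per class
    return counts


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class size."""
    return max(counts) / min(counts)


counts = power_law_class_counts(num_classes=10, max_count=1000, alpha=1.0)
head_share = sum(counts[:2]) / sum(counts)  # fraction of samples in the 2 largest classes

print(counts)
print("imbalance ratio:", imbalance_ratio(counts))
print("share of samples in the two head classes:", round(head_share, 3))
```

A few head classes hold most of the samples, while the many tail classes each hold only a few.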

🔆 Updated 2024-07-13

Long-tailed Distribution

Long-tailed Learning

Type of Long-Tailed Learning Methods

Type	`TST`	`IS`	`CBS`	`CLW`	`NC`	`ENS`	`DA`
Meaning	Two-Stage Training	Instance Sampling	Class-Balanced Sampling	Class-Level Weighting	Normalized Classifier	Ensemble	Data Augmentation
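
As a rough illustration of three of the tags above (`IS`, `CBS`, `CLW`), the sketch below computes the per-class sampling probabilities and loss weights they typically imply. This is an assumed, simplified reading of the terms: the helper names are hypothetical, and inverse-frequency weighting is only one common form of class-level weighting. Individual papers in this list may define these differently.

```python
def instance_sampling_probs(counts):
    """IS: every sample is equally likely, so class k is drawn with probability n_k / N."""
    total = sum(counts)
    return [n / total for n in counts]


def class_balanced_sampling_probs(counts):
    """CBS: every class is equally likely (1 / K), regardless of its size."""
    num_classes = len(counts)
    return [1.0 / num_classes for _ in counts]


def inverse_frequency_class_weights(counts):
    """CLW (one common choice): loss weights proportional to 1 / n_k, scaled to mean 1."""
    inverse = [1.0 / n for n in counts]
    scale = len(counts) / sum(inverse)
    return [w * scale for w in inverse]


# Hypothetical class sizes: one head class, one medium class, one tail class.
class_counts = [1000, 100, 10]

print("IS  class probabilities:", instance_sampling_probs(class_counts))
print("CBS class probabilities:", class_balanced_sampling_probs(class_counts))
print("CLW loss weights:       ", inverse_frequency_class_weights(class_counts))
```

Under instance sampling the head class dominates each mini-batch, whereas class-balanced sampling and class-level weighting both shift emphasis toward tail classes, one through the sampler and the other through the loss.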

Long-Tailed Learning Workshops

Year	Venue	Title	Remark
2021	CVPR	Open World Vision	long-tail, open-set, streaming labels
2021	CVPR	Learning from Limited and Imperfect Data (L2ID)	label noise, SSL, long-tail

Long-Tailed Classification

Year	Venue	Title	Remark
2024	CVPR	DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets	code
2024	ICML	Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition	code
2024	ICML	Learning Label Shift Correction for Test-Agnostic Long-Tailed Recognition	code
2024	ICML	Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts	🔥🔥🔥 code
2023	TPAMI	Deep Long-Tailed Learning: A Survey
2023	TPAMI	Probabilistic Contrastive Learning for Long-Tailed Visual Recognition	code
2023	ICLR	Delving into Semantic Scale Imbalance
2023	ICLR	Temperature Schedules for self-supervised contrastive methods on long-tail data
2023	ICLR	On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning
2023	ICLR	Long-Tailed Learning Requires Feature Learning
2023	ICLR	Decoupled Training for Long-Tailed Classification With Stochastic Representations
2023	ICLR	LPT: Long-tailed Prompt Tuning for Image Classification	fine-tune ViT
2023	ICLR	CUDA: Curriculum of Data Augmentation for Long-tailed Recognition
2023	NeurIPS	A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning	code
2023	NeurIPS	Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models	code
2023	arXiv	Exploring Vision-Language Models for Imbalanced Learning	pre-trained model
2022	ECCV	VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition	fine-tune CLIP
2023	AAAI	Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition	video dataset, code
2022	ECCV	Tailoring Self-Supervision for Supervised Learning	video dataset, code
2022	NeurIPS	Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition	code
2022	arXiv	Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification
2022	TPAMI	Key Point Sensitive Loss for Long-tailed Visual Recognition
2022	IJCV	A Survey on Long-Tailed Visual Recognition	survey
2022	arXiv	Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning
2022	ICLR	Optimal Transport for Long-Tailed Recognition with Learnable Cost Matrix
2022	ICLR	Self-Supervised Learning is More Robust to Dataset Imbalance
2022	AAAI	Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification	code
2021	NeurIPS	Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling
2021	NeurIPS	Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective	code, mixup+LA
2021	arXiv	HAR: Hardness Aware Reweighting for Imbalanced Datasets
2021	arXiv	Feature Generation for Long-tail Classification
2021	arXiv	Label-Aware Distribution Calibration for Long-tailed Classification
2021	arXiv	Self-supervised Learning is More Robust to Dataset Imbalance
2021	arXiv	Long-tailed Distribution Adaptation
2021	arXiv	Learning from Long-Tailed Data with Noisy Labels
2021	ICCV	Self Supervision to Distillation for Long-Tailed Visual Recognition
2021	ICCV	Distilling Virtual Examples for Long-tailed Recognition
2021	CVPR	Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
2021	CVPR	MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
2021	CVPR	Disentangling Label Distribution for Long-tailed Visual Recognition
2021	CVPR	Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings
2021	CVPR	Seesaw Loss for Long-Tailed Instance Segmentation
2021	ICLR	Exploring balanced feature spaces for representation learning
2021	ICLR	Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
2021	arXiv	Improving Long-Tailed Classification from Instance Level
2021	arXiv	ResLT: Residual Learning for Long-tailed Recognition
2021	arXiv	Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces	by Google
2021	arXiv	Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition
2021	arXiv	Procrustean Training for Imbalanced Deep Learning
2021	arXiv	Balanced Knowledge Distillation for Long-tailed Learning	`CBS`+`IS`, Code
2021	arXiv	Class-Balanced Distillation for Long-Tailed Visual Recognition	`ENS`+`DA`+`IS`, by Google Research
2021	arXiv	Distributional Robustness Loss for Long-tail Learning	`TST`+`CBS`
2021	CVPR	Improving Calibration for Long-Tailed Recognition	`DA`+`TST`, Code
2021	CVPR	Distribution Alignment: A Unified Framework for Long-tail Visual Recognition	`TST`
2021	CVPR	Adversarial Robustness under Long-Tailed Distribution
2021	ICLR	Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization	Code
2021	ICLR	Long-Tailed Recognition by Routing Diverse Distribution-Aware Experts	`ENS`+`NC`, Code, by Zi-Wei Liu
2021	ICLR	Long-Tail Learning via Logit Adjustment	by Google
2021	AAAI	Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks
2021	arXiv	Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
2020	arXiv	ELF: An Early-Exiting Framework for Long-Tailed Classification
2020	CVPR	Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective
2020	CVPR	Equalization Loss for Long-Tailed Object Recognition
2020	CVPR	Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
2020	ICLR	Decoupling representation and classifier for long-tailed recognition	Code
2020	NeurIPS	Balanced Meta-Softmax for Long-Tailed Visual Recognition
2020	NeurIPS	Rethinking the Value of Labels for Improving Class-Imbalanced Learning	Code
2020	CVPR	BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition	Code
2019	NeurIPS	Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss	Code
2019	CVPR	Large-Scale Long-Tailed Recognition in an Open World	Code, bibtex, by CUHK
2018	-	iNaturalist. The iNaturalist 2018 Competition Dataset	long-tailed dataset
2017	arXiv	The Devil is in the Tails: Fine-grained Classification in the Wild
2017	NeurIPS	Learning to model the tail

Long-Tailed Regression

Year	Venue	Title	Remark
2022	CVPR	Balanced MSE for Imbalanced Visual Regression
2021	OpenReview	Lifting Imbalanced Regression with Self-Supervised Learning	rejected from ICLR
2021	ICML	Delving into Deep Imbalanced Regression	code

Long-Tailed Semi-Supervised Learning

Year	Venue	Title	Remark
2024	arXiv	Towards Realistic Long-tailed Semi-supervised Learning in an Open World	code
2024	ICML	SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning	code
2023	CVPR	Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need	🔥🔥🔥 code
2023	NeurIPS	Towards Distribution-Agnostic Generalized Category Discovery	code
2023	ICLR	Imbalanced Semi-supervised Learning with Bias Adaptive Classifier
2023	ICLR	Adaptive Robust Evidential Optimization For Open Set Detection from Imbalanced Data
2023	ICLR	InPL: Pseudo-Labeling the Inliers First for Imbalanced Semi-Supervised Learning
2022	CVPR	DASO: Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning	code
2022	MLJ	Transfer and Share: Semi-Supervised Learning from Long-Tailed Data	code
2022	ICML	Smoothed Adaptive Weighting for Imbalanced Semi-Supervised Learning: Improve Reliability Against Unknown Distribution Data	code
2022	ICLR	The Rich Get Richer: Disparate Impact of Semi-Supervised Learning
2022	ICLR	On Non-Random Missing Labels in Semi-Supervised Learning
2022	OpenReview	Unifying Distribution Alignment as a Loss for Imbalanced Semi-Supervised Learning
2021	NeurIPS	ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning
2021	arXiv	CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning
2021	CVPR	CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning	by Google, Code, Tensorflow
2021	arXiv	Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning	SSL, Code
2020	NeurIPS	Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning	Code

Long-Tailed Learning with Noisy Labels

Year	Venue	Title	Remark
2024	CVPR	SURE: SUrvey REcipes for building reliable and robust deep networks	code
2023	ICLR	Long-Tailed Partial Label Learning via Dynamic Rebalancing	code, partial label
2023	ICCV	When Noisy Labels Meet Long Tail Dilemmas: A Representation Calibration Method
2022	ECCV	Identifying Hard Noise in Long-Tailed Sample Distribution	code, large datasets
2022	ICLR	Sample Selection with Uncertainty of Losses for Learning with Noisy Labels
2022	PAKDD	Prototypical Classifier for Robust Class-Imbalanced Learning	code
2021	arXiv	Robust Long-Tailed Learning under Label Noise	code

Long-Tailed OOD Detection

Year	Venue	Title	Remark
2024	AAAI	EAT: Towards Long-Tailed Out-of-Distribution Detection	🔥🔥🔥 code

Long-Tailed Federated Learning

Year	Venue	Title	Remark
2022	IJCAI	Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features

eXtreme Multi-label Learning

Binary Relevance

Year	Venue	Title
2019	Machine Learning	Data Scarcity, Robustness and Extreme Multi-label Classification
2019	WSDM	Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches
2017	KDD	PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification
2017	AISTATS	Label Filters for Large Scale Multilabel Classification
2016	WSDM	DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification
2016	ICML	PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification

Tree-based Methods

Year	Venue	Title	Remark
2021	KDD	Extreme Multi-label Learning for Semantic Matching in Product Search	by Amazon, code
2020	arXiv	Probabilistic Label Trees for Extreme Multi-label Classification	PLT survey, code
2020	arXiv	Online probabilistic label trees
2020	AISTATS	LdSM: Logarithm-depth Streaming Multi-label Decision Trees	Instance tree, C++ code
2019	NeurIPS	AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks	Label tree
2019	arXiv	Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification	Label tree
2018	ICML	CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning	Instance tree
2018	WWW	Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising	Label tree, by Manik Varma
2016	ICML	Extreme F-Measure Maximization using Sparse Probability Estimates	Label tree
2016	KDD	Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications	Instance tree
2014	KDD	A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning	Instance tree, python implementation
2013	ICML	Label Partitioning For Sublinear Ranking	Label tree
2013	WWW	Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages	Instance tree, Random Forest, Gini Index
2011	NeurIPS	Efficient label tree learning for large scale object recognition	Label tree, multi-class
2010	NeurIPS	Label embedding trees for large multi-class tasks	Label tree, multi-class
2008	ECML Workshop	Effective and Efficient Multilabel Classification in Domains with Large Number of Labels	Label tree

Embedding-based Methods

Year	Venue	Title	Remark
2019	AAAI	Distributional Semantics Meets Multi-Label Learning	bibtex
2019	arXiv	Ranking-Based Autoencoder for Extreme Multi-label Classification
2019	NeurIPS	Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces	by Google Research
2017	KDD	AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification
2015	NeurIPS	Sparse Local Embeddings for Extreme Multi-label Classification
2014	ICML	Large-scale Multi-label Learning with Missing Labels
2014	ICML	Multi-label Classification via Feature-aware Implicit Label Space Encoding
2013	ICML	Efficient Multi-label Classification with Many Labels
2012	NeurIPS	Feature-aware Label Space Dimension Reduction for Multi-label Classification
2011	IJCAI	WSABIE: Scaling Up To Large Vocabulary Image Annotation	bibtex
2009	NeurIPS	Multi-Label Prediction via Compressed Sensing
2008	KDD	Extracting Shared Subspaces for Multi-label Classification

Speed-up and Compression

Year	Venue	Title	Remark
2020	KDD	Large-Scale Training System for 100-Million Classification at Alibaba	Applied Data Science Track
2020	arXiv	SOLAR: Sparse Orthogonal Learned and Random Embeddings
2020	ICLR	Extreme Classification via Adversarial Softmax Approximation
2019	AISTATS	Stochastic Negative Mining for Learning with Large Output Spaces	by Google
2019	NeurIPS	Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products	Rice University, bibtex
2019	arXiv	An Embarrassingly Simple Baseline for eXtreme Multi-label Prediction
2019	arXiv	Accelerating Extreme Classification via Adaptive Feature Agglomeration	bibtex, authors from IIT
2019	SDM	Fast Training for Large-Scale One-versus-All Linear Classifiers using Tree-Structured Initialization	code bibtex

Novel XML Settings

Year	Venue	Title	Remark
2020	arXiv	Extreme Multi-label Classification from Aggregated Labels	by Inderjit Dhillon. This paper considers multi-instance learning in XML
2020	arXiv	Unbiased Loss Functions for Extreme Classification With Missing Labels	by Rohit Babbar. Missing labels
2020	ICML	Deep Streaming Label Learning	code, by Dacheng Tao, streaming multi-label learning
2016	arXiv	Streaming Label Learning for Modeling Labels on the Fly	by Dacheng Tao, streaming multi-label learning

Theoretical Studies

Year	Venue	Title	Remark
2019	ICML	Sparse Extreme Multi-label Learning with Oracle Property	Code, by Weiwei Liu
2019	NeurIPS	Multilabel reductions: what is my loss optimising?	bibtex, by Google

Text Classification

Year	Venue	Title	Remark
2022	TKDE	BGNN-XML: Bilateral Graph Neural Networks for Extreme Multi-label Text Classification
2021	ICML	SiameseXML: Siamese Networks meet Extreme Classifiers with 100M Labels
2020	KDD	Correlation Networks for Extreme Multi-label Text Classification	code
2020	arXiv	GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification
2020	ICML	Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification	code
2019	ACL	Large-Scale Multi-Label Text Classification on EU Legislation	Eur-Lex 4.3K, bibtex
2019	arXiv	X-BERT: eXtreme Multi-label Text Classification with BERT	code by Yiming Yang, Inderjit Dhillon
2019	NeurIPS	AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks
2018	EMNLP	Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces	few-shot, zero-shot, evaluation metric
2018	NeurIPS	A no-regret generalization of hierarchical softmax to extreme multi-label classification	code, PLT code
2017	SIGIR	Deep Learning for Extreme Multi-label Text Classification	by Yiming Yang at CMU, bibtex

Others

Label Correlation

Year	Venue	Title
2019	ICML	DL2: Training and Querying Neural Networks with Logic
2015	KDD	Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning
2010	KDD	Multi-Label Learning by Exploiting Label Dependency

Long-tailed Continual Learning

Year	Venue	Title	Remark
2020	ECCV	Imbalanced Continual Learning with Partitioning Reservoir Sampling

Train/Test Split

Year	Venue	Title	Remark
2021	arXiv	Stratified Sampling for Extreme Multi-Label Data

XML Seminar

Year	Venue	Title	Remark
2019	Dagstuhl Seminar 18291	Extreme Classification

Survey References:

XML Datasets link

Extreme Classification Workshops link

About

Releases

Packages

Contributors 3

License

Stomach-ache/awesome-long-tail-learning

Folders and files

Latest commit

History

Repository files navigation

Awesome Long-Tail Learning

🔆 Updated 2024-07-13

Long-tailed Learning

Type of Long-Tailed Learning Methods

Long-Tailed Learning Workshops

Long-Tailed Classification

Long-Tailed Regression

Long-Tailed Semi-Supervised Learning

Long-Tailed Learning with Noisy Labels

Long-Tailed OOD Detection

Long-Tailed Federated Learning

eXtreme Multi-label Learning

Binary Relevance

Tree-based Methods

Embedding-based Methods

Speed-up and Compression

Noval XML Settings

Theoretical Studies

Text Classification

Others

Label Correlation

Long-tailed Continual Learning

Train/Test Split

XML Seminar

Survey References:

XML Datasets link

Extreme Classification Workshops link

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Packages