Great Deep Learning Tutorials for Computer Vision

A Great Collection of Deep Learning Tutorials and Repositories for Computer Vision

General:

Microsoft Computer Vision Recipes [Great]
OpenMMLab [Great]
OpenMMLab - GitHub
OpenMMLab - MMCV is a foundational Python library for computer vision research
OpenMMLab - MMEditing is an open source image and video editing toolbox
OpenMMLab - MMDetection
Kornia - a differentiable computer vision library for PyTorch
Great Computer Vision Tutorials and Notebooks

Model Visualization:

CNN Visualizations [Very Good]
CNN-heatmap
Tools to Design or Visualize Architecture of Neural Network [Great]
Netron - GitHub [Excellent]
Monitor your GPUs [Excellent]
Understanding CNN
Exploring Neural Networks with Activation Atlases [Great]
Explaining What Explainable AI Did Not [Interesting]
CNN Explainer [Interesting]
Interactive Tools for ML, DL and Math [Interesting]
Visualizing Neural Networks with the Grand Tour [Interesting]
Zoom In: An Introduction to Circuits
Concept: Concept Modeling on Images

GradCAM:

GradCAM Main Paper
PyTorch GradCAM 1
PyTorch GradCAM 2
Keras GradCAM
pyimagesearch GradCAM

t-SNE Visualization:

t-SNE visualization of large image datasets using pre-trained networks in TensorFlow and Keras [Great]

UMAP Visualization:

Understanding UMAP

EDA and visualization of Image/Video Datasets:

fastdup: a powerful free tool designed to rapidly extract valuable insights from your image & video datasets

Image Classification Models:

GhostNet (CVPR 2020) in PyTorch and TensorFlow
GhostNet - GitHub
GhostNet - PyTorch Hub [Excellent]
Residual blocks — Building blocks of ResNet
EfficientNet-PyTorch
EfficientNet Explanation
DeepMind - NFNets
NFNets - PyTorch
EfficientNetV2
DeiT: Data-Efficient architectures and training for Image classification
How to Train State-Of-The-Art Models Using TorchVision’s Latest Primitives [Excellent]

Data Augmentation for Image Classification:

Image Augmentation
AugLy - data augmentations library that supports different modalities
Learnable Test-time Augmentation

Self-Supervised Learning:

VISSL - VIsion library for state-of-the-art Self-Supervised Learning
VISSL - GitHub
DINO: Self-Supervised Vision Transformers
DINOv2: Learning Robust Visual Features without Supervision
I-JEPA (the Image-based Joint-Embedding Predictive Architecture)

Transformers in Computer Vision:

Vision Transformers Tutorial [Great]
Transformers in computer vision: ViT architectures, tips, tricks and improvements [Great]
Vision Transformer
Vision Transformer - PyTorch [Great]
PyTorch External Attention [Great]
MobileViT in PyTorch
Clip-vit
OpenAI CLIP
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
CLOC: Contrastive Localized Language-Image Pre-Training
Transformers: from NLP to CV [Very Great & Excellent]

Vision-Language Models (LLMs in Image & Computer Vision):

Prismer: A Vision-Language Model
ViperGPT: Visual Inference via Python Execution for Reasoning
LLaVA: Large Language and Vision Assistant
ImageBind-LLM
An Introduction to Vision-Language Modeling
A comprehensive tutorial on building Vision-Language Models (VLMs)
Video Web Arena: agent models for OS and web control with memory
Qwen2-VL: To See the World More Clearly

Multi-Modal LLMs:

CoDi: Any-to-Any Generation via Composable Diffusion
Kosmos-2: Grounding Multimodal Large Language Models to the World

Language-Vision Intelligence:

Salesforce LAVIS: A Library for Language-Vision Intelligence

Generative AI in Image and Computer Vision:

Dream Gaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Stable Diffusion Video
Tutorial on Diffusion Models for Imaging and Vision

Detection & Segmentation:

Object Detection:

Object Detection for Dummies Part 1
Object Detection for Dummies Part 2
Object Detection for Dummies Part 3
Object Detection for Dummies Part 4
Open MMLab Detection Toolbox and Benchmark
RetinaNet: how Focal Loss fixes Single-Shot Detection
Getting Started With Bounding Box Regression In TensorFlow
Pelee: A Real-Time Object Detection System on Mobile Devices
Pelee: Tutorial
An overview of deep-learning based object-detection algorithms
Object detection and tracking in PyTorch
Object Detection with RetinaNet - Keras
PP-YOLO
Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds
Signfeld - Synthetic traffic sign detection
Swin-Transformer
DETR - DEtection TRansformer [Great]
YOLOX
YOLOv5 vs EfficientDet
MMRotate: OpenMMLab Rotated Object Detection [Great]
YOLO v7
Ultralytics YOLOv8
YOLOv8 Tracking
YOLO-NAS
Object Detection Leaderboard
Object Detection Leaderboard LinkedIn Post

Augmentation for Object Detection & Instance Segmentation:

Albumentations - Great Library for image augmentation and transformations
Shapely - Manipulation and analysis of geometric objects

Faster R-CNN Object Detector Tutorials:

Guide to build Faster RCNN in PyTorch
Simple and Fast Implementation of Faster R-CNN
PyTorch Tutorial to Object Detection
Object Detection and Classification using R-CNNs
Faster R-CNN: Down the rabbit hole of modern object detection
Faster R-CNN (object detection) implemented by Keras for custom data
Mask R-CNN Unmasked

Instance Segmentation:

Image Segmentation: tips and tricks from 39 Kaggle competitions [Excellent]

Semantic Segmentation:

Simple Guide to Semantic Segmentation
Background Matting
Facebook's Segment Anything Model (SAM)
SAM 2 by Meta: Segment Anything in Images and Videos

License Plate Detection and Recognition:

ALPR in Unconstrained Scenarios [Good]
ALPR in Unconstrained Scenarios - Project Page
License plate detection and recognition - ANPR
License Plate Detection
YOLOv3 for license plate detection

Keypoint Regression:

Transfer Learning Library
DEKR: dense keypoint regression framework
CenterNet
AdelaiDet
Keypoint Regression

Loss Functions:

Use Focal Loss To Train Model Using Imbalanced Dataset

Image Augmentation:

Image Augmentation for more train data - Kaggle [Great]

Deformable Convolution PyTorch Implementation:

TorchVision - Deformable Convolution - Link1
TorchVision - Deformable Convolution - Link2
Simple PyTorch Deformable Convolution v2 [Great]
Deformable Convolutional Networks v2 with PyTorch [Good]
Deformable ConvNets v2
MMLab Detection Toolbox
DCNv2 in PyTorch
PyTorch implementation of Deformable Convolution

3D Image Classification & 3D Computer Vision:

3D MNIST
3D CNN Images with TensorFlow
Point Cloud Library (PCL)
Kaolin - a PyTorch library for accelerating 3D deep learning
PyTorch3D [Fantastic]
OpenCV Tutorial Homography
Total 3D Understanding

OCR:

Easy OCR [Great]
MMOCR [Great]
Word Level OCR Dataset for Persian Language
Simple Persian Word-Level OCR
Amazon Textract - OCR
TextFuseNet
Transformer-OCR
Microsoft TrOCR
vedastr: open source scene text recognition toolbox
Goodbye OCR - Welcome Donut from MIT
Nougat: Neural Optical Understanding for Academic Documents
Accurate line-level text detection and recognition (OCR) in any language

Persian OCR:

Persian Dataset
Arshasb: Persian OCR dataset
Above Data Set (complete set)

Few-Shot Learning:

Advances in few-shot learning
Advances in few-shot learning: reproducing results in PyTorch
One Shot learning, Siamese networks and Triplet Loss
Meta-Learning with Differentiable Convex Optimization
Building a One-Shot Learning Network with PyTorch
Annotation-Efficient Learning [Good Few-Shot Learning Tutorial]
Awesome Papers - Few shot
Few-Shot-Learning
OpenMMLab FewShot Learning [Great]

Learning to Hash & General Hashing (+ Fast Searching Methods):

Finding similar images using Deep learning and Locality Sensitive Hashing [Very Good]
Image similarity using Triplet Loss
Finding duplicate images made easy!
Duplicate Image Detection - perceptual hash (pHash)
ImageHash
NGT - Neighborhood Graph and Tree for Indexing High-dimensional Data [Great]
NGT - Tutorial
NGT - Python
Visual Search with MXNet Gluon and HNSW
Annoy: Approximate Nearest Neighbors in C++/Python [Great]
datasketch: Big Data Looks Small [Great: probabilistic data structures that can process and search very large amounts of data very quickly]
Image Similarity with Hugging Face Datasets and Transformers

Video Understanding:

Holistic Video Understanding Challenge
Holistic Video Understanding Dataset
Holistic Large Scale Video Understanding - Tutorial
Deep Learning on Video - Part1
VMZ: Model Zoo for Video Modeling
The 3rd YouTube-8M Video Understanding Challenge - 1st Place Solution
torchvideo
torchvideo - GitHub
PyTorchVideo - GitHub [Great]
PyTorchVideo - main page [Great]
MoViNet-pytorch [Interesting]
Basic Video transforms for PyTorch
Fast and Easy to use video feature extractor
Video Augmentation Techniques for Deep Learning [Great]
Decord_loader - Excellent Video Data Loader [Great]
Decord - GitHub
PyAV - Pythonic binding for the FFmpeg libraries
Python bindings for FFmpeg
GStreamer: Multimedia Framework
NÜWA: text to video synthesis
Awesome Video Datasets
Iranian Movies Kaggle Dataset
Kinetics 400 Data Set - Download Link via DropBox

Text-to-Video:

Text to Video Synthesis

Optical Flow:

RAFT

Visual Attention Method:

Attention Mechanisms in Computer Vision Part 1: CBAM [Excellent]
Efficient Channel Attention for Deep Convolutional Neural Networks (ECA-Net)
ECA-Net: Efficient Channel Attention
Channel Attention and Squeeze-and-Excitation Networks (SENet)
Self-Attention In Computer Vision

Visual Question Answering (VQA):

Vanilla VQA

Pose Estimation:

Pose Animator
RepNet: Weakly Supervised 3D Human Pose Estimation

Object Tracking:

GSDT

GANs:

MMGeneration
Generate Anime Style Face Using DCGAN and Explore Its Latent Feature Representation
StyleGAN2
DeepFaceLab
imagen-pytorch: Google's Text-to-Image Neural Network [Great]
DragGAN

Image Super-Resolution:

Image Super-Resolution: A Comprehensive Review

Siamese Networks:

Siamese Network Tutorial with TensorFlow

Model Evaluation:

Precision and Recall for Multi-Class
Precision and Recall
Precision-Recall metric
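
As a quick companion to the links above, here is a minimal sketch of per-class precision and recall for a multi-class problem, using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)). The function name `per_class_precision_recall` and the toy labels are illustrative assumptions, not taken from any of the linked tutorials, and returning 0.0 when a denominator is zero is only one possible convention.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} computed one-vs-rest per class.

    Hypothetical helper for illustration; a class with no predictions
    (or no true samples) gets 0.0 for the undefined metric.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Toy example with three classes (made-up labels).
y_true = ["cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
scores = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

Averaging the per-class values gives the macro-averaged metric; libraries such as scikit-learn offer this and other averaging modes directly.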

Model Size & Speed of Models:

GFLOPs & Number of Parameters of ResNet Models
EfficientNet Models Accuracy vs FLOPs

Computer Vision Annotation Tools:

CVAT
SuperAnnotate
OpenCV - SuperAnnotate Desktop
Curve-GCN
VoTT (Visual Object Tagging Tool)
MakeSense AI
Awesome Data Labeling

Building Datasets:

Building Image Datasets for Computer Vision Algorithms
icrawler - mini image framework of web crawlers [Great]
Unity Computer Vision
pytube - downloading YouTube Videos
Subreddit Media Downloader
pigeon: Quickly annotate data on Jupyter

LLMs & Generative AI in Vision Tasks:

Draw UI via Generative AI
Emu Video & Emu Edit: Meta's models

NSFW (Not Safe for Work):

Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

Soccer & Football AI code:

Football AI code
Football AI

Other:

Color Channel Order:

OpenCV is BGR, Pillow is RGB, and Decord is RGB
- NumPy image BGR-to-RGB
Imagenet 1000 class indices to human readable labels
Streamlit: turn images and faces into comics
How DALL-E 2 could solve major computer vision challenges
PyTorch JPEG Decoding on the GPU
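
For the color channel order note above (OpenCV is BGR, Pillow and Decord are RGB), a minimal sketch of the conversion with plain NumPy is shown below. It assumes the image is an H x W x 3 array with channels on the last axis; the helper name `bgr_to_rgb` and the tiny test image are made up for illustration.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 array (BGR <-> RGB).

    Hypothetical helper; the same slice also converts RGB back to BGR.
    """
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    # Reversing the last axis swaps the first and third channels.
    # copy() yields a contiguous array instead of a negatively strided view.
    return image[..., ::-1].copy()


# A 1 x 2 image in BGR order: one pure-blue pixel, one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0].tolist())  # [0, 0, 255]  (blue, now in RGB order)
print(rgb[0, 1].tolist())  # [255, 0, 0]  (red, now in RGB order)
```

In practice this matters when an array loaded with OpenCV is passed to a model or plotting library that expects RGB; skipping the swap leaves red and blue exchanged.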

Miscellaneous:

An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance
