A Great Collection of Deep Learning Tutorials and Repositories for Computer Vision
- Microsoft Computer Vision Recipes [Great]
- OpenMMLab [Great]
- OpenMMLab - GitHub
- OpenMMLab - MMCV is a foundational Python library for computer vision research
- OpenMMLab - MMEditing is an open source image and video editing toolbox
- OpenMMLab - MMDetection
- Kornia - a differentiable computer vision library for PyTorch
- Great Computer Vision Tutorials and Notebooks
- CNN Visualizations [Very Good]
- CNN-heatmap
- Tools to Design or Visualize Architecture of Neural Network [Great]
- Netron - GitHub [Excellent]
- Monitor your GPUs [Excellent]
- Understanding CNN
- Exploring Neural Networks with Activation Atlases [Great]
- Explaining What Explainable AI Did Not [Interesting]
- CNN Explainer [Interesting]
- Interactive Tools for ML, DL and Math [Interesting]
- Visualizing Neural Networks with the Grand Tour [Interesting]
- Zoom In: An Introduction to Circuits
- Concept: Concept Modeling on Images
- t-SNE visualization of large image datasets using pre-trained networks in TensorFlow and Keras [Great]
- GhostNet (CVPR 2020) in PyTorch and TensorFlow
- GhostNet - GitHub
- GhostNet - PyTorch Hub [Excellent]
- Residual blocks — Building blocks of ResNet
- EfficientNet-PyTorch
- EfficientNet Explanation
- DeepMind - NFNets
- NFNets - PyTorch
- EfficientNetV2
- DeiT: Data-Efficient architectures and training for Image classification
- How to Train State-Of-The-Art Models Using TorchVision’s Latest Primitives [Excellent]
- Image Augmentation
- AugLy - data augmentations library that supports different modalities
- Learnable Test-time Augmentation
- VISSL - VIsion library for state-of-the-art Self-Supervised Learning
- VISSL - GitHub
- DINO: Self-Supervised Vision Transformers
- DINOv2: Learning Robust Visual Features without Supervision
- I-JEPA (the Image-based Joint-Embedding Predictive Architecture)
- Vision Transformers Tutorial [Great]
- Transformers in computer vision: ViT architectures, tips, tricks and improvements [Great]
- Vision Transformer
- Vision Transformer - PyTorch [Great]
- PyTorch External Attention [Great]
- MobileViT in PyTorch
- CLIP-ViT
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
- CLOC: Contrastive Localized Language-Image Pre-Training
- Transformers: from NLP to CV [Very Great & Excellent]
- Prismer: A Vision-Language Model
- ViperGPT: Visual Inference via Python Execution for Reasoning
- LLaVA: Large Language and Vision Assistant
- ImageBind-LLM
- An Introduction to Vision-Language Modeling
- A comprehensive tutorial on building Vision-Language Models (VLMs)
- Video Web Arena: agent models for OS and web control with memory
- Qwen2-VL: To See the World More Clearly
- CoDi: Any-to-Any Generation via Composable Diffusion
- Kosmos-2: Grounding Multimodal Large Language Models to the World
- Dream Gaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
- Stable Diffusion Video
- Tutorial on Diffusion Models for Imaging and Vision
- Object Detection for Dummies Part 1
- Object Detection for Dummies Part 2
- Object Detection for Dummies Part 3
- Object Detection for Dummies Part 4
- OpenMMLab Detection Toolbox and Benchmark
- RetinaNet: how Focal Loss fixes Single-Shot Detection
- Getting Started With Bounding Box Regression In TensorFlow
- Pelee: A Real-Time Object Detection System on Mobile Devices
- Pelee: Tutorial
- An overview of deep-learning based object-detection algorithms
- Object detection and tracking in PyTorch
- Object Detection with RetinaNet - Keras
- Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds
- Signfeld - Synthetic traffic sign detection
- Swin-Transformer
- DETR - DEtection TRansformer [Great]
- YOLOv5 vs EfficientDet
- MMRotate: OpenMMLab Rotated Object Detection [Great]
- YOLOv7
- Ultralytics YOLOv8
- YOLOv8 Tracking
- Object Detection Leaderboard
- Object Detection Leaderboard LinkedIn Post
- Albumentations - Great Library for image augmentation and transformations
- Shapely - Manipulation and analysis of geometric objects
- Guide to build Faster RCNN in PyTorch
- Simple and Fast Implementation of Faster R-CNN
- PyTorch Tutorial to Object Detection
- Object Detection and Classification using R-CNNs
- Faster R-CNN: Down the rabbit hole of modern object detection
- Faster R-CNN (object detection) implemented by Keras for custom data
- Mask R-CNN Unmasked
- Simple Guide to Semantic Segmentation
- Background Matting
- Facebook's Segment Anything Model (SAM)
- SAM 2 by Meta: Segment Anything in Images and Videos
- ALPR in Unconstrained Scenarios [Good]
- ALPR in Unconstrained Scenarios - Project Page
- License plate detection and recognition - ANPR
- License Plate Detection
- YOLOv3 for license plate detection
- Transfer Learning Library
- DEKR: dense keypoint regression framework
- CenterNet
- AdelaiDet
- Keypoint Regression
- TorchVision - Deformable Convolution - Link1
- TorchVision - Deformable Convolution - Link2
- Simple PyTorch Deformable Convolution v2 [Great]
- Deformable Convolutional Networks v2 with PyTorch [Good]
- Deformable ConvNets v2
- MMLab Detection Toolbox
- DCNv2 in PyTorch
- PyTorch implementation of Deformable Convolution
- 3D CNN Images with TensorFlow
- Point Cloud Library (PCL)
- Kaolin - a PyTorch library for accelerating 3D deep learning
- PyTorch3D [Fantastic]
- OpenCV Tutorial Homography
- Total 3D Understanding
- Easy OCR [Great]
- MMOCR [Great]
- Word Level OCR Dataset for Persian Language
- Simple Persian Word-Level OCR
- Amazon Textract - OCR
- TextFuseNet
- Transformer-OCR
- Microsoft TrOCR
- vedastr: open source scene text recognition toolbox
- Goodbye OCR - Welcome Donut from MIT
- Nougat: Neural Optical Understanding for Academic Documents
- Accurate line-level text detection and recognition (OCR) in any language
- Advances in few-shot learning
- Advances in few-shot learning: reproducing results in PyTorch
- One Shot learning, Siamese networks and Triplet Loss
- Meta-Learning with Differentiable Convex Optimization
- Building a One-Shot Learning Network with PyTorch
- Annotation-Efficient Learning [Good Few-Shot Learning Tutorial]
- Awesome Papers - Few shot
- Few-Shot-Learning
- OpenMMLab FewShot Learning [Great]
- Finding similar images using Deep learning and Locality Sensitive Hashing [Very Good]
- Image similarity using Triplet Loss
- Finding duplicate images made easy!
- Duplicate Image Detection - perceptual hash (pHash)
- ImageHash
- NGT - Neighborhood Graph and Tree for Indexing High-dimensional Data [Great]
- NGT - Tutorial
- NGT - Python
- Visual Search with MXNet Gluon and HNSW
- Annoy: Approximate Nearest Neighbors in C++/Python [Great]
- datasketch: Big Data Looks Small [Great: probabilistic data structures that can process and search very large amounts of data very fast]
- Image Similarity with Hugging Face Datasets and Transformers
- Holistic Video Understanding Challenge
- Holistic Video Understanding Dataset
- Holistic Large Scale Video Understanding - Tutorial
- Deep Learning on Video - Part1
- VMZ: Model Zoo for Video Modeling
- The 3rd YouTube-8M Video Understanding Challenge - 1st Place Solution
- torchvideo
- torchvideo - GitHub
- PyTorchVideo - GitHub [Great]
- PyTorchVideo - main page [Great]
- MoViNet-pytorch [Interesting]
- Basic Video transforms for PyTorch
- Fast and Easy to use video feature extractor
- Video Augmentation Techniques for Deep Learning [Great]
- Decord_loader - Excellent Video Data Loader [Great]
- Decord - GitHub
- PyAV - Pythonic binding for the FFmpeg libraries
- Python bindings for FFmpeg
- GStreamer: Multimedia Framework
- NÜWA: text to video synthesis
- Awesome Video Datasets
- Iranian Movies Kaggle Dataset
- Kinetics 400 Data Set - Download Link via DropBox
- Attention Mechanisms in Computer Vision Part 1: CBAM [Excellent]
- Efficient Channel Attention for Deep Convolutional Neural Networks (ECA-Net)
- ECA-Net: Efficient Channel Attention
- Channel Attention and Squeeze-and-Excitation Networks (SENet)
- Self-Attention In Computer Vision
- MMGeneration
- Generate Anime Style Face Using DCGAN and Explore Its Latent Feature Representation
- StyleGAN2
- DeepFaceLab
- imagen-pytorch: Google's Text-to-Image Neural Network [Great]
- DragGAN
- SuperAnnotate
- OpenCV - SuperAnnotate Desktop
- Curve-GCN
- VoTT (Visual Object Tagging Tool)
- MakeSense AI
- Awesome Data Labeling
- Building Image Datasets for Computer Vision Algorithms
- icrawler - mini image framework of web crawlers [Great]
- Unity Computer Vision
- pytube - downloading YouTube Videos
- Subreddit Media Downloader
- pigeon: Quickly annotate data on Jupyter
Note on channel order: OpenCV loads images as BGR, while Pillow and Decord return RGB.
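
Mixing these libraries in one pipeline usually means reordering channels somewhere. The following is a minimal sketch of that conversion, assuming an H x W x 3 `uint8` NumPy array; the helper name `bgr_to_rgb` is hypothetical and not part of any of the libraries above. With OpenCV installed, `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` should give the same result for 3-channel images.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Swap the first and third channels of an H x W x 3 image.

    The same call converts RGB back to BGR, because reversing the
    channel axis is its own inverse.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    # Reversing the last axis yields a view with negative strides;
    # copy it into a contiguous array so downstream libraries accept it.
    return np.ascontiguousarray(image[:, :, ::-1])


# A 1 x 2 image in OpenCV's BGR layout: a pure-blue pixel and a pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)

print(rgb[0, 0].tolist())  # blue pixel in RGB order: [0, 0, 255]
print(rgb[0, 1].tolist())  # red pixel in RGB order: [255, 0, 0]
```

A typical place to apply this is right after reading a frame with OpenCV and before handing it to a model or transform that expects RGB input, for example one fed from Pillow or Decord elsewhere in the same project.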