Collection of learning materials and resources about data science, machine learning, deep learning and other interesting fields.
- Algorithms (by Jeff Erickson)
- A playbook for systematically maximizing the performance of deep learning models.
- MIT Deep Learning Book: [Official HTML format] [PDF format]
  - An MIT Press book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
- Interactive deep learning book with code, math, and discussions. Implemented with PyTorch, NumPy/MXNet, and TensorFlow.
- The second edition of Deep Learning Interviews is home to hundreds of fully-solved problems, from a wide range of key topics in AI.
- The Mathematical Engineering of Deep Learning
  - This book provides a complete and concise overview of the mathematical engineering of deep learning.
- Deep Learning with PyTorch Lightning
  - Getting Started with PyTorch Lightning, published by Packt.
- A Concise Handbook of TensorFlow 2
  - This is a concise handbook of TensorFlow 2 based on Keras and Eager Execution mode, aiming to help developers with some basic machine learning and Python knowledge get started with TensorFlow 2 quickly.
- Machine learning from scratch: Derivations in concept and code
- Patterns, predictions, and actions: A story about machine learning
  - This graduate textbook on machine learning tells a story of how patterns in data support predictions and consequential actions.
- A visual introduction to probability and statistics.
- Run Stable Diffusion on Apple Silicon with Core ML.
- Hawkeye is a unified deep learning based fine-grained image recognition toolbox built on PyTorch, which is designed for researchers and engineers.
- An open-source Python library built to empower developers to build applications and systems with self-contained Deep Learning and Computer Vision capabilities using a few simple lines of code.
- AI imagined images. Pythonic generation of stable diffusion images.
- Kornia is a differentiable computer vision library for PyTorch.
- High-Resolution Image Synthesis with Latent Diffusion Models.
- Stable Diffusion is a latent text-to-image diffusion model.
- A Keras / TensorFlow implementation of Stable Diffusion.
- Ultralytics YOLOv8, developed by Ultralytics, is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility.
- Python library for audio and music analysis.
- pandarallel is a simple and efficient tool to parallelize Pandas operations on all available CPUs.
- Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
- pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks).
- TimeSide is a Python framework enabling low and high level audio analysis, imaging, transcoding, streaming and labelling.
- Xorbits is an open-source computing framework that makes it easy to scale data science and machine learning workloads, from data preprocessing to tuning, training, and model serving.
- Alpa is a system for training and serving large-scale neural networks.
- Automatic architecture search and hyperparameter optimization for PyTorch.
- Catalyst is a PyTorch framework for Deep Learning Research and Development.
- Colossal-AI is a unified deep learning system for the big model era to make training and inference of large AI models efficient, easy, and scalable.
- DeepTables (DT) is an easy-to-use toolkit that enables deep learning to unleash great power on tabular data.
- DGL is an easy-to-use, high performance and scalable Python package for deep learning on graphs.
- A unified ensemble framework for PyTorch to improve the performance and robustness of your deep learning model.
- OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
- A Python library focused on constructing Deep Probabilistic Models (DPMs).
- A scikit-learn compatible neural network library that wraps PyTorch.
- TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models and supports classification, regression and ranking.
- AutoX is an efficient AutoML tool, which is mainly aimed at data mining tasks with tabular data.
- EvalML is an AutoML library that builds, optimizes, and evaluates machine learning pipelines using domain-specific objective functions.
- Evidently helps analyze and track data and ML model quality throughout the model lifecycle.
- Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models.
- FLAML is a lightweight Python library for efficient automation of machine learning, including selection of models, hyperparameters, and other tunable choices of an application.
- A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.
- Python package for concise, transparent, and accurate predictive modeling.
- LightAutoML (LAMA) is an AutoML framework by Sber AI Lab.
- An extension of LightGBM to probabilistic forecasting.
- This project is about explaining what machine learning classifiers (or models) are doing.
- The mljar-supervised is an Automated Machine Learning Python package that works with tabular data.
- NGBoost is a Python library that implements Natural Gradient Boosting, as described in "NGBoost: Natural Gradient Boosting for Probabilistic Prediction".
- PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.
- River is a Python library for online machine learning. It aims to be the most user-friendly library for doing machine learning on streaming data.
- SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model.
- skfolio is a Python library for portfolio optimization built on top of scikit-learn. It offers a unified interface and tools compatible with scikit-learn to build, fine-tune, and cross-validate portfolio models.
- TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
- Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
- LAVIS is a Python deep learning library for LAnguage-and-VISion intelligence research and applications.
- Pre-Training with Whole Word Masking for Chinese BERT
- ChatLLaMA is a library that allows you to create hyper-personalized ChatGPT-like assistants using your own data and the least amount of compute possible.
- A fast, affordable, scalable and open system framework for enabling an end-to-end Reinforcement Learning from Human Feedback (RLHF) training experience to generate high-quality ChatGPT-style models at all scales.
- A Modularized and Extensible NLP Framework.
- Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts.
- Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification. It includes Word2Vec, BERT, and GPT-2 language embeddings.
- OpenNMT-py is the PyTorch version of the OpenNMT project, an open-source (MIT) neural machine translation framework.
- Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM.
- picoGPT is an unnecessarily tiny and minimal implementation of GPT-2 in plain NumPy.
- Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks.
- This library is based on the Transformers library by HuggingFace. Simple Transformers lets you quickly train and evaluate Transformer models.
- Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages.
- TigerBot is a multi-language and multitask LLM.
- FinRL is the first open-source framework to show the great potential of financial reinforcement learning.
- ArviZ is a Python package for exploratory analysis of Bayesian models.
- Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy.
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
  - 18 May 2023, Xingang Pan, et al.
- A Comprehensive Survey on Segment Anything Model for Vision and Beyond
  - 14 May 2023, Chunhui Zhang, et al.
- SegGPT: Segmenting Everything In Context
  - 06 Apr 2023, Xinlong Wang, et al.
- 05 Apr 2023, Alexander Kirillov, et al.
  - [Official Code] [Demo]
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
  - 08 Mar 2023, Chenfei Wu, et al.
- Adding Conditional Control to Text-to-Image Diffusion Models
  - 10 Feb 2023, Lvmin Zhang, et al.
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
  - 02 Jan 2023, Sanghyun Woo, et al.
- Scalable Diffusion Models with Transformers
  - 19 Dec 2022, William Peebles, Saining Xie
- FlexiViT: One Model for All Patch Sizes
  - 15 Dec 2022, Lucas Beyer, et al.
- SinDiffusion: Learning a Diffusion Model from a Single Natural Image
  - 22 Nov 2022, Weilun Wang, et al.
- DiffusionDet: Diffusion Model for Object Detection
  - 17 Nov 2022, Shoufa Chen, et al.
- MetaFormer Baselines for Vision
  - 24 Oct 2022, Weihao Yu, et al.
- Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
  - 09 Oct 2022, Feng Liang, et al.
- VToonify: Controllable High-Resolution Portrait Video Style Transfer
  - 22 Sep 2022, Shuai Yang, et al.
- Diffusion Models in Vision: A Survey
  - 10 Sep 2022, Florinel-Alin Croitoru, et al.
- Diffusion Models: A Comprehensive Survey of Methods and Applications
  - 02 Sep 2022, Ling Yang, et al.
- 10 Jan 2022, Zhuang Liu, et al.
- Masked Autoencoders Are Scalable Vision Learners
  - 11 Nov 2021, Kaiming He, et al.
- Denoising Diffusion Probabilistic Models
  - 19 Jun 2020, Jonathan Ho, et al.
- Transformer models: an introduction and catalog
  - 12 Feb 2023, Xavier Amatriain
- ClimaX: A foundation model for weather and climate
  - 24 Jan 2023, Tung Nguyen, et al.
- The Forward-Forward Algorithm: Some Preliminary Investigations
  - 02 Dec 2022, Geoffrey Hinton
- Stochastic Weight Averaging Revisited
  - 03 Jan 2022, Hao Guo, et al.
- FNet: Mixing Tokens with Fourier Transforms
  - 09 May 2021, James Lee-Thorp, et al.
- Averaging Weights Leads to Wider Optima and Better Generalization
  - 14 Mar 2018, Pavel Izmailov, et al.
- Abstract Visual Reasoning with Tangram Shapes
  - 29 Nov 2022, Anya Ji, et al.
- Towards artificial general intelligence via a multimodal foundation model
  - 02 Jun 2022, Nanyi Fei, et al.
- "Low-Resource" Text Classification: A Parameter-Free Classification Method with Compressors
  - Jul 9, Zhiying Jiang, et al.
- A Survey of Large Language Models
  - 31 Mar 2023, Wayne Xin Zhao, et al.
- 15 Mar 2023, OpenAI
- Cramming: Training a Language Model on a Single GPU in One Day
  - 28 Dec 2022, Jonas Geiping, Tom Goldstein
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
  - 22 Dec 2022, Srinivasan Iyer, et al.
- Scaling Instruction-Finetuned Language Models
  - 20 Oct 2022, Hyung Won Chung, et al.
- Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning
  - 30 Jun 2022, Julien Perolat, et al.
- Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model.
- The goal of this project is to promote the development of an open-source community for Chinese conversational large language models, with the vision of becoming an LLM Engine that can help everyone.
- An ecosystem of open-source chatbots trained on a massive collection of clean assistant data including code, stories and dialogue.
- NovelAI is a monthly subscription service for AI-assisted authorship, storytelling, virtual companionship, or simply a GPT-powered sandbox for your imagination.
- OpenChatKit provides a powerful, open-source base to create both specialized and general purpose chatbots for various applications.
- Colorize your photos for free!
- Riffusion is an app for real-time music generation with stable diffusion.
- StableStudio is Stability AI's official open-source variant of DreamStudio. It is a web-based application that allows users to create and edit generated images.
- Modelverse is a model sharing and search platform that contains a diverse set of deep generative models such as GANs, diffusion models, and autoregressive models.
- Awesome lists about all kinds of interesting topics.
- A list of totally open alternatives to ChatGPT.
- A complete computer science study plan to become a software engineer.
- Methods and Implementations of Deep Clustering.
- Roadmap to becoming a developer in 2022.
- Data Science, Machine Learning, and Deep Learning. Projects, Tutorials and Cheatsheets.
- Must-read papers on Recommender System
  - A Curated List of Must-read Papers on Recommender System.
- Selected papers, corresponding codes and pre-trained models in the review paper "Neural Style Transfer: A Review".
- An Open-Source Engineering Guide for Prompt-in-context-learning from EgoAlpha Lab.
- Learn how to design large-scale systems.
- Specialized ChatGPT extension for research work, optimized for academic paper proofreading experience.
- Create a ChatGPT-like experience over your custom docs using LangChain.
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
- The simplest, fastest repository for training/finetuning medium-sized GPTs.
- Ask questions to your Notion database in natural language.
- brat is a web-based tool for text annotation.
- Computer Vision Annotation Tool (CVAT)
  - CVAT is an interactive video and image annotation tool for computer vision.
- ControlNet for Stable Diffusion WebUI
  - The WebUI extension for ControlNet and other injection-based SD controls.
- Flutter is Google's SDK for crafting beautiful, fast user experiences for mobile, web, and desktop from a single codebase.
- LangChain is a framework for developing applications powered by language models.
- latexify is a Python package to compile a fragment of Python source code to a corresponding LaTeX expression.
- Documentation that simply works.
- ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
- Netron is a viewer for neural network, deep learning and machine learning models.
- Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration.
- Transformer Explainer is an interactive visualization tool designed to help anyone learn how Transformer-based models like GPT work.
- A walkthrough of transformer architecture code
  - The notebook walks through a single forward pass of the Transformer architecture in PyTorch.
- Getting-Things-Done-with-Pytorch
  - Jupyter Notebook tutorials on solving real-world problems with Machine Learning & Deep Learning using PyTorch.
- labml.ai Deep Learning Paper Implementations
  - This is a collection of simple PyTorch implementations of neural networks and related algorithms.
- llama3 implemented from scratch
  - llama3 implementation one matrix multiplication at a time.
- A curated collection of interactive Machine Learning projects.
- Modern Convolutional Neural Network Architectures
  - Revisions and implementations of modern Convolutional Neural Network architectures in TensorFlow and Keras.
- This tutorial explains how to implement the Neural-Style algorithm developed by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge.
- nlp-tutorial is a tutorial for those studying NLP (Natural Language Processing) using PyTorch.
- Guides, papers, lecture, and resources for prompt engineering.
- Pytorch: how and when to use Module, Sequential, ModuleList and ModuleDict
  - Effective way to share, reuse and break down the complexity of your models.
- Recommendation systems | TensorFlow
  - Resources for building recommendation systems from TensorFlow.
- Master the command line, in one page.