Collections of learning materials and resources about data science, machine learning, deep learning and other interesting fields.

DaoSword/data-science-collections

Data Science Collections

Collection of learning materials and resources about data science, machine learning, deep learning and other interesting fields.

Table of Contents

Articles

Blogs

Books

Algorithms

Deep Learning

Linear Algebra

Large Language Model (LLM)

Machine Learning

Reinforcement Learning

Statistics

  • Seeing Theory

    • A visual introduction to probability and statistics.

Courses

AIGC

Computer Science

Data Science

Deep Learning

Machine Learning

Natural Language Processing

Reinforcement Learning

Statistics

Libraries

Computer Vision

  • Hawkeye

    • Hawkeye is a unified, deep learning-based toolbox for fine-grained image recognition, built on PyTorch and designed for researchers and engineers.
  • ImageAI

    • An open-source Python library that lets developers build applications and systems with self-contained deep learning and computer vision capabilities in a few simple lines of code.
  • ImaginAIry

    • AI imagined images. Pythonic generation of stable diffusion images.
  • Kornia

    • Kornia is a differentiable computer vision library for PyTorch.
  • Latent Diffusion

    • High-Resolution Image Synthesis with Latent Diffusion Models.
  • Stable Diffusion

    • Stable Diffusion is a latent text-to-image diffusion model.
  • stable-diffusion-tensorflow

    • A Keras / TensorFlow implementation of Stable Diffusion.
  • Ultralytics YOLOv8

    • YOLOv8, developed by Ultralytics, is a state-of-the-art (SOTA) model that builds on previous YOLO versions and introduces new features and improvements to further boost performance and flexibility.

Data Science

  • librosa

    • Python library for audio and music analysis.
  • Pandaral·lel

    • pandarallel is a simple and efficient tool to parallelize Pandas operations on all available CPUs.
  • Polars

    • Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
  • PyClustering

    • pyclustering is a Python and C++ data mining library (clustering algorithms, oscillatory networks, neural networks).
  • TimeSide

    • TimeSide is a Python framework enabling low and high level audio analysis, imaging, transcoding, streaming and labelling.
  • Xorbits

    • Xorbits is an open-source computing framework that makes it easy to scale data science and machine learning workloads — from data preprocessing to tuning, training, and model serving.

Deep Learning

  • Alpa

  • Auto-PyTorch

    • Automatic architecture search and hyperparameter optimization for PyTorch.
  • Catalyst

    • Catalyst is a PyTorch framework for Deep Learning Research and Development.
  • Colossal-AI

    • Colossal-AI is a unified deep learning system for the big model era to make training and inference of large AI models efficient, easy, and scalable.

    • [Service Demo]

  • DeepTables

    • DeepTables (DT) is an easy-to-use toolkit that applies deep learning to tabular data.
  • DGL (Deep Graph Library)

    • DGL is an easy-to-use, high performance and scalable Python package for deep learning on graphs.
  • Ensemble PyTorch

    • A unified ensemble framework for PyTorch to improve the performance and robustness of your deep learning model.
  • OneFlow

    • OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
  • PyDPM

    • A Python library focused on constructing Deep Probabilistic Models (DPMs).
  • skorch

    • A scikit-learn compatible neural network library that wraps PyTorch.
  • TensorFlow Decision Forests

    • TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models and supports classification, regression and ranking.

Machine Learning

  • AutoX

    • AutoX is an efficient AutoML tool aimed mainly at data mining tasks on tabular data.
  • EvalML

    • EvalML is an AutoML library that builds, optimizes, and evaluates machine learning pipelines using domain-specific objective functions.
  • Evidently

    • Evidently helps analyze and track data and ML model quality throughout the model lifecycle.
  • Feature Engine

    • Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models.
  • FLAML

    • FLAML is a lightweight Python library for efficient automation of machine learning, including selection of models, hyperparameters, and other tunable choices of an application.
  • Hypernets

    • A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.
  • imodels

    • Python package for concise, transparent, and accurate predictive modeling.
  • LightAutoML

    • LightAutoML (LAMA) is an AutoML framework by Sber AI Lab.
  • LightGBMLSS

    • An extension of LightGBM to probabilistic forecasting.
  • Lime

    • This project is about explaining what machine learning classifiers (or models) are doing.
  • mljar-supervised

    • mljar-supervised is an Automated Machine Learning Python package that works with tabular data.
  • NGBoost

  • PyCaret

    • PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows.
  • River

    • River is a Python library for online machine learning. It aims to be the most user-friendly library for doing machine learning on streaming data.
  • SHAP

    • SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model.
  • skfolio

    • skfolio is a Python library for portfolio optimization built on top of scikit-learn. It offers a unified interface and tools compatible with scikit-learn to build, fine-tune, and cross-validate portfolio models.
  • TPOT

    • TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Multimodal

  • Chinese-CLIP

    • Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
  • LAVIS

    • LAVIS is a Python deep learning library for LAnguage-and-VISion intelligence research and applications.

Natural Language Processing

  • 🤗 Transformers

  • AllenNLP

  • Chinese-BERT-wwm

    • Pre-Training with Whole Word Masking for Chinese BERT
  • ChatLLaMA

    • ChatLLaMA is a library that allows you to create hyper-personalized ChatGPT-like assistants using your own data and the least amount of compute possible.
  • DeepSpeed-Chat

    • A fast, affordable, scalable, and open system framework for end-to-end Reinforcement Learning from Human Feedback (RLHF) training of high-quality ChatGPT-style models at all scales.
  • fastNLP

    • A Modularized and Extensible NLP Framework.
  • gpt-2-simple

    • Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts.
  • Kashgari

    • Kashgari is a production-level NLP transfer learning framework built on top of tf.keras for text labeling and text classification. It includes Word2Vec, BERT, and GPT-2 language embeddings.
  • OpenNMT-py

    • OpenNMT-py is the PyTorch version of the OpenNMT project, an open-source (MIT) neural machine translation framework.
  • PaLM-rlhf-pytorch

    • Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM.
  • picoGPT

    • picoGPT is an unnecessarily tiny and minimal implementation of GPT-2 in plain NumPy.
  • pyhanlp

  • Scikit-LLM

    • Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks.
  • Simple Transformers

    • This library is based on the Transformers library by HuggingFace. Simple Transformers lets you quickly train and evaluate Transformer models.
  • Stanza

    • Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages.
  • TigerBot

    • TigerBot is a multi-language and multitask LLM.

Reinforcement Learning

  • FinRL

    • FinRL is the first open-source framework to show the great potential of financial reinforcement learning.

Statistics

  • ArviZ

    • ArviZ is a Python package for exploratory analysis of Bayesian models.
  • Pingouin

    • Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy.

Papers

Computer Vision

Deep Learning

Multimodal

Natural Language Processing

Reinforcement Learning

Platforms

Applications

  • Auto-GPT

    • Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model.
  • BELLE

    • The goal of this project is to promote the development of an open-source community for Chinese conversational large language models, with the vision of becoming an LLM Engine that can help everyone.
  • GPT4All

    • An ecosystem of open-source chatbots trained on a massive collection of clean assistant data, including code, stories, and dialogue.
  • NovelAI

    • NovelAI is a monthly subscription service for AI-assisted authorship, storytelling, virtual companionship, or simply a GPT powered sandbox for your imagination.
  • OpenChatKit

    • OpenChatKit provides a powerful, open-source base to create both specialized and general purpose chatbots for various applications.
  • Palette

    • Colorize your photos for free!
  • Riffusion App

    • Riffusion is an app for real-time music generation with stable diffusion.
  • StableStudio

    • StableStudio is Stability AI's official open-source variant of DreamStudio. It is a web-based application that allows users to create and edit generated images.

Coding

Models

  • Modelverse

    • Modelverse is a model sharing and search platform that contains a diverse set of deep generative models such as GANs, diffusion models, and autoregressive models.

Notebooks

Projects

Repositories

Natural Language Processing

  • chatgpt_academic

    • Specialized ChatGPT extension for research work, optimized for proofreading academic papers.
  • Chat-Your-Data

    • Create a ChatGPT-like experience over your custom docs using LangChain.
  • gpt-fast

    • Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
  • nanoGPT

    • The simplest, fastest repository for training/finetuning medium-sized GPTs.
  • notion-qa

    • Ask questions to your Notion database in natural language.

Tools

  • ML Visuals

    • ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
  • Netron

    • Netron is a viewer for neural network, deep learning and machine learning models.
  • Prodigy

    • Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration.
  • Transformer Explainer

    • Transformer Explainer is an interactive visualization tool designed to help anyone learn how Transformer-based models like GPT work.

Tutorials
