A Great Collection of Deep Learning Tutorials and Repositories for Natural Language Processing (NLP)
- Great NLP Posts
- Awesome NLP Paper Discussions - Hugging Face [Excellent]
- Ten trends in Deep learning NLP
- Attention in RNNs
- Understanding self-attention and other types of attention mechanisms
- BERT - TensorFlow
- Understanding XLNet
- XLNet - TensorFlow
- XLM (PyTorch implementation of Cross-lingual Language Model Pretraining)
- Pretrained PyTorch models for BERT
- Library of state-of-the-art pretrained models for NLP [Excellent]
- DistilBERT
- FastBert
- FastBert Linkedin Post
- PyTorch Hub - BERT
- A Simple Guide On Using BERT for Binary Text Classification
- Core ML 3 implementation of BERT for Question answering
- NLP - Keras - Intro
- AllenNLP [General NLP]
- Stanza - A Python NLP Library for Many Human Languages
- The Best NLP Papers From ICLR 2020
- Deep learning for natural language processing and information retrieval at the University of Waterloo
- Natural Language Processing With spaCy in Python [Great]
- NLP Papers
- A Great NLP Course
- KerasNLP: Modular NLP Workflows for Keras
- NLP Test: Deliver Safe & Effective Models
- Karpathy minbpe
- Karpathy's 2 Hours Tutorial for Building GPT Tokenizer
- Learning Core Foundational Concepts in NLP by Examples and by Calculating by Hand
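
For readers following the minbpe material above, here is a toy byte-pair-encoding sketch (an illustration in the spirit of minbpe, not Karpathy's actual code): start from raw bytes and repeatedly merge the most frequent adjacent pair into a new token id.

```python
from collections import Counter

def get_pair_counts(ids):
    """Count how often each adjacent token pair occurs."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Start from raw bytes and greedily merge the most frequent pair.
text = "aaabdaaabac"
ids = list(text.encode("utf-8"))
for step in range(3):
    pair = get_pair_counts(ids).most_common(1)[0][0]
    ids = merge(ids, pair, 256 + step)  # new ids start after the 256 byte values
print(ids)
```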
- SetFit: Efficient Few-shot Learning with Sentence Transformers
- Parsivar: library for Persian text preprocessing
- Hazm
- persianNLP
- ParsiNLU: Comprehensive suite of high-level NLP tasks for the Persian language
- FarsTail: A Persian Natural Language Inference Dataset
- wordfreq: Access a database of word frequencies
- Persian Stop Words List
- Persian Stop Words List in Hazm Repo
- PCoQA: Persian Conversational Question Answering Dataset
- Khayyam Challenge (PersianMMLU): Is Your LLM Truly Wise to The Persian Language? [Good paper & dataset]
- Basalam Dataset via RadeAI Team
- Basalam Datasets for LLM Fine-tuning
- Beyond Word Embeddings Part 1
- Beyond Word Embeddings Part 2
- Learning Word Embedding
- Introduction to Word Embedding and Word2Vec
- Word Embedding
- Understanding Word Embeddings
- Introduction to Word Vectors
- Word2vec Made Easy
- What is GloVe? Part I
- What is GloVe? Part II
- What is GloVe? Part III
- What is GloVe? Part IV
- What is GloVe? Part V
- ELMo: Deep Contextualized Word Representation
- A Step-by-Step NLP Guide to Learn ELMo
- ELMo: Contextual language embedding
- word embeddings with ELMo
- Doc2Vec - Gensim
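
As a companion to the word-embedding posts above, a minimal gensim Word2Vec sketch (the toy corpus and hyperparameters are illustrative only; a real corpus would be far larger):

```python
from gensim.models import Word2Vec

# A few toy "sentences" (token lists).
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
]

# sg=1 selects skip-gram; sg=0 would use CBOW instead.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

print(model.wv.most_similar("king", topn=3))  # nearest words by cosine similarity
print(model.wv["queen"][:5])                  # first 5 dims of the learned vector
```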
- Self-Supervised Representation Learning in NLP: https://amitness.com/2020/05/self-supervised-learning-nlp/
- COSINE: Fine-Tuning Pre-trained Language Model with Weak Supervision
- Understanding LSTM Networks
- Illustrated Guide to LSTM’s and GRU’s
- Animated RNN, LSTM and GRU
- Recurrent Neural Networks and LSTM explained
- Long Short-Term Memory (LSTM): Concept
- Understanding architecture of LSTM cell from scratch
- Basic understanding of LSTM
- Taming LSTMs with PyTorch
- Introduction to LSTM
- Introduction to RNNs
- xLSTM - Post1
- Were RNNs All We Needed? [Interesting Paper]
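
To make the LSTM posts above concrete, a minimal PyTorch LSTM text classifier sketch (shapes and sizes are arbitrary illustrations):

```python
import torch
import torch.nn as nn

# Embed tokens, run them through an LSTM, and classify from the final hidden state.
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                 # x: (batch, seq_len) of token ids
        emb = self.embed(x)               # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(emb)      # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])           # (batch, num_classes)

model = LSTMClassifier()
tokens = torch.randint(0, 1000, (4, 20))  # a fake batch of 4 sequences
print(model(tokens).shape)                # torch.Size([4, 2])
```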
- How Transformers Work
- The Illustrated Transformer
- Transformers from Scratch
- What is a Transformer?
- How Transformers work in deep learning and NLP
- Transformer: A Novel Neural Network Architecture for Language Understanding
- How do Transformers Work in NLP?
- The Essence of Transformers [Good]
- Transformers and Multi-Head Attention
- Multi-Head Attention
- BERT for Dummies
- The Dark Secrets of BERT
- A Survey of Long-Term Context in Transformers [Great]
- The Transformer Family
- The Transformer Isn’t As Hard To Understand As You Might Think
- Review of Compact Transformer Architectures [Great]
- REFORMER: The Efficient Transformer
- GPT-3: Language Models are Few-Shot Learners
- GPT-3 Sandbox
- Microsoft will launch GPT-4
- OpenAI GPT-4
- Some information about GPT-4
- Regular Expressions (Regex) Generated by GPT-3
- Auto Regex: Converting English description to Regex [Good]
- minGPT
- NVIDIA FasterTransformer: Transformer related optimization, including BERT & GPT
- OpenNMT CTranslate2: Fast inference engine for Transformer models
- Deploying GPT-J and T5 with FasterTransformer and Triton Inference Server [Interesting]
- MEND: Fast Model Editing at Scale [Excellent Work]
- BorealisAI Transformers I: Introduction
- OpenAI Best Practices for Deploying Language Models
- OPT-IML
- RetNet: an Alternative to Transformers
- Transformer Taxonomy [Great]
- Generative AI exists because of the transformer: Great Visual Explanation [Great]
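
The core mechanism behind every transformer post above is scaled dot-product attention; a minimal PyTorch sketch of the formula softmax(QK^T / sqrt(d_k))V:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V"""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5   # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

q = k = v = torch.randn(1, 5, 16)   # self-attention: queries = keys = values
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)        # torch.Size([1, 5, 16]) torch.Size([1, 5, 5])
```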
- RLHF Tutorial
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (a new method replacing RLHF)
- Finetuning an LLM: RLHF and alternatives (Part I)
- Finetuning an LLM: RLHF and alternatives (Part II)
- Finetuning an LLM: RLHF and alternatives (Part III)
- How good is AI feedback?
- Direct Preference Optimization (DPO) for LLM Alignment (From Scratch)
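
A minimal sketch of the DPO loss discussed in the posts above, assuming you already have sequence-level log-probabilities from the policy and a frozen reference model (simplified relative to production implementations such as TRL's):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen_logps, pi_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO pushes the policy to prefer chosen over rejected responses,
    measured relative to a frozen reference model (no reward model needed)."""
    chosen_ratio = pi_chosen_logps - ref_chosen_logps        # log(pi/ref), preferred
    rejected_ratio = pi_rejected_logps - ref_rejected_logps  # log(pi/ref), dispreferred
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Sequence-level log-probs for a toy batch of 4 preference pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss)
```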
- New paper by Meta claims that we can get rid of tokenizers: Byte Latent Transformer: Patches Scale Better Than Tokens
- Byte Latent Transformer: Patches Scale Better Than Tokens (paper)
- LLM Reading Papers
- LLaMA
- Toolformer: Language Models Can Teach Themselves to Use Tools [Great]
- Toolformer GitHub
- Amazon Multimodal Chain-of-Thought Reasoning in Language Models
- LLaMA-based ChatGPT Training [Great]
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers
- Stanford Alpaca: An Instruction-following LLaMA model
- Alpaca: A Strong, Replicable Instruction-Following Model
- Fine-Tune Alpaca in Arabic
- TRL: Transformer Reinforcement Learning
- Large Language Model (LLM) Primers Tutorial [Great]
- Dolly
- Microsoft JARVIS & HuggingGPT [Interesting]
- open-source LLMs
- GPT4Free
- HuggingChat
- LaMini-LM: A Diverse Herd of Distilled Models
- RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset
- BigCode
- OpenLLaMA
- Dromedary: towards helpful, ethical and reliable LLMs
- MPT-7B Model with Commercial Licence
- MPT-7B Story Writer
- MPT-7B
- MPT-7B Blog
- Open LLMs
- Google PaLM 2
- BLOOMChat
- LLMs Practical Guide
- FrugalGPT
- ChatALL [Great]
- Falcon LLM
- The Falcon has landed in the Hugging Face ecosystem [Great]
- Open LLMs [Great]
- OpenLLMs: Less is More for Open-source Models [Great]
- LLaMA2
- source code of llama2-chatbot
- Notes about OpenAI's GPT-4 Model
- GPT-4 is getting worse over time
- OpenChat: Less is More for Open-source Models
- Instruction Tuning Datasets
- ToolLLM
- Falcon 180B
- Fine-tune Falcon 180B using QLoRA and Flash Attention on Amazon SageMaker
- Large Language Models as Optimizers
- Favourite LLM Authors
- Open Source LLMs for Commercial Use
- Optimizing your LLM in production [Important]
- In-Context Vectors (ICV): an alternative to Few-Shot Learning and fine-tuning techniques like LoRA to improve an LLM's performance
- NexusRaven v2 13B Function-Calling LLM Surpassing GPT-4
- Phixtral model
- Eagle-7B LLM: 100% attention-free RNN Model!
- Eagle-7B LLM: Blog Post
- Can LLMs improve themselves? Self-play fine-tuning (SPIN)
- AI2 OLMo Model: Linkedin Post
- AI2 OLMo Model: HuggingFace
- AI2 OLMo Model: Original Blog post
- Some Notes about OLMo Model
- Mixtral in colab [Great]
- Grok-1 LLM with 314B Size: Post1
- Grok-1 LLM: Post2
- DBRX LLM
- DBRX LLM: Post1
- DBRX LLM: Post2
- LLMs via Multi-Token Prediction
- Linkedin Post
- Colab Notebook
- Main Github of Mergekit
- huggingface merge-models blog post
- Making the NeuralBeagle14-7B LLM Model (via Merging models and other methods)
- Merge Large Language Models with mergekit
- Fine-tune a Mistral-7b model with Direct Preference Optimization
- AutoMerger
- Evolutionary LLM Merging - Post1
- Evolutionary LLM Merging - Post2
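
The simplest flavor of model merging covered above is linear interpolation of checkpoints; a toy PyTorch sketch (mergekit implements this plus far more sophisticated methods such as SLERP and TIES):

```python
import torch

def linear_merge(state_dict_a, state_dict_b, alpha=0.5):
    """Naive linear interpolation of two checkpoints with identical architecture:
    merged = alpha * A + (1 - alpha) * B for every weight tensor."""
    return {
        name: alpha * state_dict_a[name] + (1 - alpha) * state_dict_b[name]
        for name in state_dict_a
    }

# Toy demonstration with two tiny "models".
a = {"w": torch.ones(2, 2), "b": torch.zeros(2)}
b = {"w": torch.zeros(2, 2), "b": torch.ones(2)}
print(linear_merge(a, b, alpha=0.25))  # w == 0.25, b == 0.75
```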
- Mixture of Experts (MoEs) Explained [Great]
- Mixture of Experts (MoEs) Papers List
- Mixture of Experts (MoEs) Linkedin Post
- Mixture-of-Depths - Post1
- Mixture-of-Depths (MoD) - Post2
- AutoLoRA-Merging Linkedin Post
- A colab gradio web UI for running Large Language Models [Great]
- llama-2-7b-chat-GPTQ-4bit
- camenduru
- llama-2 philschmid
- fine-tuning LLMs with TRL
- lora tuning peft finetuning llama2
- LLaMA2 with PEFT
- Baby LLaMA2 in C
- Releasing LLongMA-2 16k
- LLaMA2 API in Hugging Face Inference
- LLaMA2 API in Monster API
- LLaMA2-Accessory
- Hermes-LLongMA-2 8k
- Training Llama 2
- Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API
- LLaMA-Factory
- LLaMA-Factory Notes
- Purple llama by Meta - Link1
- Purple llama by Meta - Link2
- Purple llama by Meta - Link3
- TinyLLaMa-1.1B
- Can LLaMA learn a new language?
- Persian LLaMa
- LLaMA3 Linkedin Post1
- Meta LLaMA3-8B
- Fine tune LLaMA3
- LLaMA3 Long Context
- LLaMA3.1
- LLaMA 3.1 Some Notes
- LLaMA 3.1 Model Fine-tuning
- LLaMA 3.1 Detailed Notes
- LLaMA 3.2 Detailed Notes
- Mobile LLaMA 3.2
- Llama-3.3-70B-Instruct
- How an online gifting site is using Llama to help protect customer privacy [interesting]
- Mistral AI models
- Is Mistral's first model a good replacement for OpenAI?
- Mistral Mixture of Experts (MoE) Model
- Mixtral - a SOTA Mixture of Experts
- MistralTrix
- Nous-Hermes-Mixtral model
- Brev.dev Notebooks: Fine-tuning mistral, mixtral, phi-2 and etc [Excellent]
- Optimized LLM inference api for mistral-7b using vllm and AWQ [Excellent]
- Run Mistral7b Quantized for free on any computer (CPU or GPU) [Interesting]
- Mixtral 8x22B a 176B MoE Model
- Mistral-7B-Instruct-v0.3
- Codestral: A model fluent in 80+ programming languages
- Mistral Finetune: the official repo and guide on how to fine-tune Mistral open-source models
- Mistral Large 2 Model
- Introducing Qwen1.5 Blog Post
- Qwen1.5 Linkedin Post
- Qwen1.5 HuggingFace
- Qwen2 HuggingFace
- Qwen MoE Model
- Qwen2
- Qwen 2.5 - Linkedin Post
- Qwen 2.5 - Models
- Gemma an open Gemini LLM released by Google! - Linkedin Post
- Gemma - another linkedin post
- Google's Gemma Detailed Notes
- Gemma usage via TRL
- Gemma usage in Hugging Face via OpenAI SDK
- Does Gemma overfit the Open LLM Leaderboard?
- Zephyr 7B Gemma
- Gemma 2
- Gemma2 Detailed Notes
- Gemma 2-2b
- 1-bit LLMs (AlphaSignal Post)
- 1-bit Quantization
- Some Notes about 1-bit LLMs (Their benefits and drawbacks)
- AutoBitnet (Train your 1.58-bit LLM based on LLama Architecture for free on Colab T4 GPU)
- Llama2 7b in 1-bit precision
- Microsoft 1-Bit LLM
- Claude LLM
- Some Notes about the 100K Claude LLM Model
- Anthropic's Claude-2
- Claude-2, Anthropic's ChatGPT competitor
- Some Information about Claude 3
- LongNet: Scaling Transformers to 1B Tokens
- Lost in the Middle: How Language Models Use Long Contexts
- Notes about How Language Models Use Long Contexts
- Scaling LLaMA and GPTNeoX to >8k input context
- Unofficial Claude-API
- Claude Unofficial API
- YaRN & LongLLaMA
- YaRN: Efficient Context Window Extension of LLMs
- LLMs get lost when the context becomes too long: Lost in the Middle: How Language Models Use Long Contexts [Very Important]
- LongLoRA: Efficient Fine-tuning of Long-Context LLMs
- LongLoRA: Efficient Fine-tuning of Long-Context LLMs (another post)
- Efficient Streaming LLMs with Attention Sinks for infinite-length inputs
- MemGPT: Teaching LLMs memory management for unbounded context
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs [Interesting]
- Llmlingua Prompt Compress [Interesting]
- Microsoft Phi-2 Model (with 2.7B Parameters)
- Can "small" finetuned LLMs with less than 2B parameters outperform larger openly available LLMs (Mixtral, Llama 2 Chat) and proprietary LLMs (ChatGPT)?
- Smol LM
- Hymba Small LM
- ColossalAI: Library for LLMs
- LangChain: Library for Building applications with LLMs
- LangChain Chat
- LangChain Crash Course
- LangChain 101
- LangChain Resources
- LangChain & Vector Databases in Production Course
- Building LLM Powered Apps via LangChain Course
- OpenFlamingo
- Deepset Haystack Framework
- LMQL: A query language for programming LLMs
- LLM Training Frameworks List
- NeMo Guardrails
- Lamini: The LLM engine for rapidly customizing models
- Scikit-LLM: Sklearn Meets Large Language Models
- Chainlit
- ChatUI
- Streamlit-Chat
- Gradio: Creating a Streaming chatbot fast
- Streamlit-Weaviate Connection: provides a custom streamlit connection to query data from weaviate
- LangKit: an open-source text metrics toolkit for monitoring language models
- HuggingFace Transformers Agents
- privateGPT: Ask questions to your documents using the power of LLMs
- Spacy LLM
- Lit-GPT
- Zero to LitGPT Tutorial: Getting Started with Pretraining, Finetuning, and Using LLMs [Great]
- GPTCache: A Library for Creating Semantic Cache for LLM Queries
- AutoTrain-Advanced
- Monster API: API for using & fine-tuning LLMs
- AnythingLLM: A full-stack personalized AI assistant
- EasyLLM: helpful tools and methods for working with LLMs
- gpt-llm-trainer: input a description of your task, and it fine-tunes a LLaMA 2 model for you
- Embedchain: a framework to easily create LLM powered bots
- PandasAI [Not strictly related to this section, but interesting]
- GPT Engineer: Specify what you want it to build, the AI asks for clarification, and then builds it
- Ludwig: a low-code framework for building custom AI models like LLMs
- open-interpreter
- kani: a lightweight and highly hackable framework for chat-based language models with tool usage/function calling
- Kani colab samples
- Kani Linkedin Post
- Argilla: the open-source data curation platform for LLMs
- LiteLLM: Call all LLM APIs using the OpenAI format
- LLM Finetuning with PEFT
- ChatGPT-AutoExpert: Supercharged Custom Instructions for ChatGPT
- PyTorch Thunder (PyTorch compiler for speeding up training of LLMs) - Linkedin Post
- PyTorch Lightning Thunder
- unsloth library: 2-5X faster, 70% less memory QLoRA & LoRA fine-tuning [Great for fine-tuning LLMs]
- TorchTune: A Native-PyTorch Library for LLM Fine-tuning
- LLM Finetuning with PEFT Colab Notebooks
- Self Instruct TRL for LLMs
- Self Instruct TRL for LLMs - Link2
- How to Fine-Tune LLMs in 2024 with Hugging Face
- Fine-tune LLMs on your own hardware via PyTorch team [Great]
- RLHF in 2024 with DPO & Hugging Face
- A little guide to building Large Language Models in 2024 (PPT by HuggingFace Team) [Great]
- Video Link1 of A little guide to building Large Language Models in 2024 (PPT by HuggingFace Team)
- Video Link2 of A little guide to building Large Language Models in 2024 (PPT by HuggingFace Team)
- Understanding the instruction fine-tuning process in LLMs
- Top 5 Tips and Tricks for LLM Fine-Tuning and Inference from Intel Experts
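
A minimal PEFT LoRA setup in the spirit of the fine-tuning guides above (GPT-2 and the hyperparameters are purely illustrative; target_modules depends on the model architecture):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Wrap a small base model with LoRA adapters; only the low-rank matrices train.
model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
```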
- Design2Code: How Far Are We From Automating Front-End Engineering?
- Llama Coder: Can generate full React apps
- LLM Bootcamp - Spring 2023
- LLM University
- List of LLM Courses
- Anti-hype LLM reading list
- Microsoft Generative AI Course
- Google and Kaggle five-day generative AI course [Good]
- Best Resources for learning to work with LLMs
- Start with Large Language Models (LLMs) - Become an expert for free! [Interesting]
- Intro to LLMs: Andrej Karpathy 1 Hour Lecture
- LLM Course [good]
- LLM Course in ChatGPT Plus
- Build a Large Language Model (From Scratch) great Course and Book Tutorial [Great]
- Learning Resources about LLMs
- The Transformer Layer by Layer Course
- The Transformer Layer by Layer Course: Linkedin
- Hands-on LLMs Course
- Direct Preference Optimization (DPO) Method for LLMs Tutorial
- CS25: Transformers United V3 Courses - Autumn 2023
- CS336: Language Modeling from Scratch
- Visual and Animated Lecture about LLMs and Transformers and Deep Learning
- LLMs Roadmap [Great]
- LLM Summer School
- LLM Engineer's Handbook
- LLM Twin Course: Building Your Production-Ready AI Replica
- Hands-On Large Language Models Book
- Open LLM Leaderboard
- Chatbot Arena Leaderboard
- AlpacaEval Leaderboard
- CanAiCode Leaderboard
- Small LLMs Performance Ranking
- Chatbot Arena: Benchmarking LLMs in the Wild [Great]
- Chatbot Arena Leaderboard
- AI2 WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild [Great]
- AI2 WildBench Linkedin Post
- Persian LLM Leaderboard (via Part AI)
Building NLP Applications Powered by LLMs (methods for augmenting LLMs with external knowledge, i.e., Retrieval-Augmented Generation (RAG) applications):
- Ask a Book Questions with LangChain OpenAI [Great]
- OpenAI Web QA Embeddings
- Deepset Haystack Framework
- Stanford Retrieval-based NLP
- Hypothetical Document Embeddings (HyDE)
- ChatDB: Augmenting LLMs with Databases
- ChatNode
- Emerging Architectures for LLM Applications
- Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
- Fine tuning vs. RAG for LLMs
- Building RAG-based LLM Applications for Production (Part 1) [Good]
- Verba: The Golden RAGtriever, user-friendly interface for Retrieval-Augmented Generation (RAG) applications
- DocsGPT: GPT-powered chat for documentation, chat with your documents
- RAFT: Retrieval Augmented Fine Tuning - Post1
- RAFT: Retrieval Augmented Fine Tuning - Post2
- RAFT: Retrieval Augmented Fine Tuning - Microsoft Blog
- RAFT: Retrieval Augmented Fine Tuning - Berkeley Blog
- RAFT Code
- Long context LLMs vs RAG [Interesting]
- RAGFlow: an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding
- Two Step RAG: Speculative RAG: Enhancing retrieval augmented generation through drafting
- Exploring Multimodal RAG with LlamaIndex and GPT-4 or the New Anthropic Sonnet Model
- PaperQA2: High accuracy RAG for answering questions from scientific documents with citations
- Sophisticated Controllable Agent for Complex RAG Tasks
- Anthropic's Claude: Introducing Contextual Retrieval RAG
- Docling: Get your docs ready for gen AI
- Recent RAG Research from Google
- ArangoDB: The Most Complete And Scalable Platform For Graph-Powered GenAI
- Microsoft GraphRAG
- llamaindex Graph RAG
- Gephi: The Open Graph Viz Platform
- JanusGraph: a scalable graph database optimized for storing and querying graphs
- cayley: Open-Source Graph Database
- Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering (Paper)
- The GraphRAG Manifesto: Adding Knowledge to GenAI
- Neo4j for GenAI
- weaviate
- weaviate GitHub
- chroma
- Qdrant: Vector Database for AI Applications
- pinecone
- rektor-db
- pgvector
- LlamaIndex: comprehensive toolkit to perform data augmentation for LLMs
- jina-ai VectorDB
- sqlite-vec: A vector search SQLite extension
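
A minimal retrieval sketch with Chroma, one of the vector databases above (the documents and query are toy examples; Chroma embeds them with its default embedding function):

```python
import chromadb

client = chromadb.Client()  # in-memory instance; use PersistentClient for disk
collection = client.create_collection(name="docs")

# Index a few toy documents; Chroma embeds them automatically.
collection.add(
    documents=[
        "LoRA fine-tunes LLMs with low-rank adapter matrices.",
        "RAG retrieves relevant documents and feeds them to the LLM.",
        "Quantization shrinks models by lowering weight precision.",
    ],
    ids=["doc1", "doc2", "doc3"],
)

# Retrieve the most relevant chunks for a question, ready to paste into a prompt.
results = collection.query(
    query_texts=["How does retrieval augmentation work?"], n_results=2
)
print(results["documents"])
```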
Great Embedding Models for Search (for Augmenting External Knowledge into a Chatbot's Vector DB) [Retrieval-Augmented Generation (RAG)]:
- Massive Text Embedding Benchmark (MTEB) Leaderboard
- Word and sentence embeddings is how LLMs understand text
- FlagEmbedding
- E5 embedding vs OpenAI Ada
- M2-BERT-80M-32k-Retrieval
- Embedding Quantization - Post1
- Embedding Quantization - Post2
- Embedding Quantization - HuggingFace Blog Post
- Quantization Fundamentals with Hugging Face Course
- Is Cosine-Similarity of Embeddings Really About Similarity?
- LLM2Vec [Great]
- Fine tuning embedding models for RAG (Linkedin post)
- Fine tuning embedding models for RAG (Original Post)
- all-MiniLM-L6-v2: Sentence-Transformers Model for Embedding
- Learn How to Fine-tune Embedding Models Course [Great]
- LLMs Embedding Course - Link1
- LLMs Embedding Course - Link2
- txtai: All-in-one embeddings database
- NVIDIA NV-Embed-v2 embeddings
- jina-embeddings-v3: Multilingual Embeddings With Task LoRA
- ModernBert: Linkedin Post1
- ModernBert: Linkedin Post2
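
A minimal sketch of embedding-based semantic search with the all-MiniLM-L6-v2 model listed above (the sentences are toy examples):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "The cat sits on the mat.",
    "Transformers use self-attention.",
    "Stock markets fell sharply today.",
]
query = "How do attention mechanisms work?"

# Normalized embeddings make cosine similarity a simple dot product.
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

scores = util.cos_sim(query_emb, corpus_emb)[0]
best = scores.argmax().item()
print(corpus[best], float(scores[best]))
```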
- Deep Dive Into LLM Hallucinations Across Generative Tasks
- Controlled Generation Tools
- Guidance: Controlling LLMs
- NeMo Guardrails
- Minimising Hallucinations in LLM Applications: NeMo Guardrails Video Tutorial
- Mitigate Hallucination in LLMs
- LLMs Hallucinations Benchmark
- Mitigating LLM Hallucinations: a multifaceted approach [Great]
- Cramming: Training a Language Model on a Single GPU in One Day [Great]
- Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU [Great]
- PEFT: State-of-the-art Parameter-Efficient Fine-Tuning [Great]
- PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware [Great]
- Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes
- bitsandbytes: 8-bit CUDA functions for PyTorch
- Alpaca-LoRA: Low-Rank LLaMA Instruct-Tuning on consumer hardware [Great]
- LLaMA & Alpaca Tutorial: “ChatGPT” On Your Local Computer
- Dalai: The simplest way to run LLaMA on your local machine
- pyllama
- Alpaca-LoRA-Serve
- llama.cpp: Port of Facebook's LLaMA model in C/C++
- alpaca.cpp
- SparseGPT: Remove 100 Billion Parameters of LLMs
- xFormers: Toolbox to Accelerate Research on Transformers
- LLaMA-Adapter: Efficient Fine-tuning of LLaMA (Fine-tuning LLaMA to follow instructions within 1 Hour and 1.2M Parameters)
- GPT4All [Great]
- Vicuna web page [Great]
- Vicuna GitHub: FastChat
- PetGPT
- GPT-4-LLM
- baize Chatbot
- Koala
- Gorilla: An API store for LLMs
- Lit-LLaMA
- Auto-GPT
- xTuring
- GPTCache
- Dolly-v2-12B
- Web LLM
- P-tuning v2
- QLoRA: Efficient Finetuning of Quantized LLMs
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- GPTQ Quantization Method in Transformers
- Optimize open LLMs using GPTQ and Hugging Face Optimum
- GPTQ vs. bitsandbytes (BNB)
- BNB Blog: Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
- GPTQ Blog: Making LLMs lighter with AutoGPTQ and transformers
- TensorRT-LLM
- Overview of 🤗 Transformers Quantization: GPTQ vs bitsandbytes
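
A minimal 4-bit (NF4) loading sketch with bitsandbytes via Transformers, in the spirit of the quantization posts above (the model id is just an example; any causal LM repo works):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization with bf16 compute, as popularized by QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "mistralai/Mistral-7B-v0.1"  # example checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# A 7B model now fits in roughly 4-5 GB of GPU memory instead of ~14 GB in fp16.
```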
- LoRA Exchange (LoRAX): Serve 100s of Fine-Tuned LLMs for the Cost of 1
- Introducing LoRAX
- DeepSparse: Sparsity-aware deep learning inference runtime for CPUs
- Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation) [Great]
- Dare method for improving LLMs performance
- Small model that surpasses GPT-4 [Interesting]
- Efficient LLMs Survey [Great]
- LoRAX (LoRA eXchange): Framework that allows users to serve thousands of fine-tuned models on a single GPU
- PowerInfer: High-speed LLMs Serving on PCs with Consumer-grade GPUs
- LoRA From Scratch Implementation
- Improving LoRA (DoRA): Implementing Weight-Decomposed Low-Rank Adaptation (DoRA)
- DoRA Link2
- Proxy-Tuning (new method for fine-tuning LLMs)
- AutoQuantize (GGUF, AWQ, EXL2, GPTQ) Colab Notebook [Great]
- DoRA: Weight-Decomposed Low-Rank Adaptation - Linkedin Post
- DoRA: Weight-Decomposed Low-Rank Adaptation - Paper
- GaLore: Memory Efficient Fine-tuning Technique
- Quanto: a pytorch quantization toolkit [Great]
- Quanto: Linkedin Post
- Deleting 40% of LLM Layers Without Drop in Accuracy
- The Unreasonable Ineffectiveness of the Deeper Layers
- Continual Pretraining of LLMs
- NOLA: run 10,000 customized LLaMA2 (70B) (4bit) models on a single 48GB GPU
- NOLA LLaMA3
- LoRA Learns Less and Forgets Less in comparison to full fine-tuning
- Best Practices for Fine-Tuning & Training LLMs
- TorchChat
- The Evolution of Extreme LLM Compression: From QuIP to AQLM with PV-Tuning
- Calculating GPU memory for serving LLMs
- How Much GPU Memory is Needed to Serve a Large Language Model (LLM)?
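
The two links above walk through the arithmetic; as a rough, hedged rule of thumb (real usage also depends on KV cache size, context length, batch size, and framework overhead):

```python
def serving_memory_gb(params_billions, bytes_per_param=2.0, overhead=1.2):
    """Crude estimate: weights = params x bytes/param, plus ~20% overhead
    for KV cache, activations, and framework buffers."""
    return params_billions * bytes_per_param * overhead

print(serving_memory_gb(7))                       # fp16 7B: ~16.8 GB
print(serving_memory_gb(7, bytes_per_param=0.5))  # 4-bit 7B: ~4.2 GB
print(serving_memory_gb(70))                      # fp16 70B: ~168 GB
```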
- CUDA-Free Inference for LLMs (PyTorch Blog)
- Building LLM applications for production
- Bard API
- Amazon Bedrock: build and scale generative AI applications [Great]
- Text-to-SQL GitHub Repos
- vanna
- sqlchat
- dataherald
- WrenAI
- Practical text-to-SQL for data analytics by Linkedin [Great]
- Persian summary of the above Practical text-to-SQL for data analytics article - Out of Distribution Telegram Channel
- Different Kinds of Prompt Engineering
- Prompt Engineering Guide
- PromptTools: tools for prompt testing and experimentation
- Prompt engineering for Claude's long context window
- Chain of Verification Prompt engineering method
- Analogical Prompting
- Prompt Flow: Build high-quality LLM apps
- Contrastive Chain-of-Thought Prompting (CCoT)
- New Prompting Techniques
- Openai Prompt Engineering Guide - Linkedin Post
- Openai Prompt Engineering Guide
- Anthropic Claude Metaprompt Tool
- Anthropic Prompt Improver
- Anthropic Prompt Improver Linkedin Post
- Anthropic Evaluate Prompts Tool
- Cohere Prompt Tuner: Prompt Optimization at Your Fingertips
- Quality Prompts: Use and evaluate prompting techniques quickly
- Prompt Design at Character.AI
- Structured Prompting
- Writing with AI: Five ways professional writers are leveraging ChatGPT
- Google Prompt Gallery
- ell: The Language Model Programming Library
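
A minimal chain-of-thought prompting sketch with the OpenAI Python SDK, tying together several techniques from the guides above (the model name and prompts are illustrative):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Chain-of-thought prompting: ask the model to reason step by step before answering.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[
        {"role": "system", "content": "You are a careful math tutor."},
        {"role": "user", "content": (
            "A train travels 120 km in 1.5 hours. "
            "Think step by step, then state its average speed."
        )},
    ],
)
print(response.choices[0].message.content)
```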
- Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science
- LLMs for Tabular Data - Linkedin post
- MetaGPT: Multi-Agent Framework
- DevOpsGPT: AI-Driven Software Development Automation Solution
- LLM Agent Survey
- Microsoft AutoGen: development of LLM applications using multiple agents
- OpenDevin: autonomous AI software engineer
- Composio: the best toolset to integrate AI Agents
- MindSearch: An LLM-based Multi-agent Framework of Web Search Engine
- OpenAI Swarm Library for Multi-Agent
- Don't Sleep on Single-agent Systems
- Linkedin post for Don't Sleep on Single-agent Systems
- Microsoft TinyTroupe library for simulate human agents with LLMs [Interesting]
- HuggingFace smolagents Library blog post [Useful]
- Cost to Deploy LLaMA2 vs. ChatGPT [Very Important]
- Anyscale Training Cost
- LLMs APIs Pricing Benchmark: pricing of AWS Bedrock, OpenAI, Microsoft Azure
- LLM Token-based Price Sheet
- LLM Pricing Table Sheet
- LLM Pricing Table Linkedin Post
- Pricing Sheet for Hosted LLMs
- LLM Pricing Comparison Tool in HuggingFace Space
- e2eml transformers from scratch [Excellent]
- annotated-transformer: Learning transformers from code
- Transformers Recipe
- ALBERT-Persian
- ALBERT-Persian Demo Page
- ALBERT-Farsi-base-v2 in HuggingFace
- ParsBERT - Model for Persian Language Understanding
- ARMAN [Great]
- ParsBigBird: Persian Bert For Long-Range Sequences [Great]
- PersianQA
- Persian (Farsi) Pre-trained Language Models [Great]
- Hezar: The all-in-one AI library for Persian, supporting a wide variety of tasks and modalities [Great & Important]
- XLM-RoBERTa (Multilingual & supports Persian)
- TookaBERT by PartAI [Great]
- Dorna PartAI LLM
- Transfer Learning for NLP via BERT for Text Classification
- Text Classification with BERT Tokenizer
- Bert Text Classification
- Persian Semantic Search
- Toward fine-tuning a state of the art Natural Language Inference (NLI) model for Persian
- Attention Mechanism
- Visualizing A Neural Machine Translation Model - Attention Mechanism
- Intuitive Understanding of Attention Mechanism in Deep Learning
- Structured Attention Networks
- WaveNet: Increasing receptive field using dilated convolution
- Understanding WaveNet architecture
- WaveNet: A Generative Model for Raw Audio
- How WaveNet Works
- PyTorch Tutorial to Sequence Labeling
- Bert Extractive Summarizer [Great]
- Generating Text Summaries Using GPT-2 on PyTorch with Minimal Training [Good]
- A Gentle Introduction to Text Summarization in Machine Learning
- Taming Recurrent Neural Networks for Better Summarization
- PyTorch implementation of "Get to the point"
- TensorFlow implementation of "Get to the point"
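
For a quick start alongside the summarization tutorials above, a minimal abstractive-summarization sketch with the Transformers pipeline (the checkpoint is one common example):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "Large language models have grown rapidly in size and capability. "
    "Training them requires massive datasets and compute, so researchers "
    "increasingly study methods like distillation, quantization, and "
    "parameter-efficient fine-tuning to make them cheaper to deploy."
)
summary = summarizer(article, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```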
- A Comprehensive Guide to Build your own Language Model in Python
- D2L: Language Models and Dataset
- Develop a word-level Neural Language Model in Keras
- IBM deep learning language model
- BERT language model
- Facebook AI: GSLM
- Language Modeling Great Tutorial
- GALACTICA: general-purpose scientific language model [Great]
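
A classic way to make the language-modeling material above concrete is computing perplexity with GPT-2 (a minimal sketch; long texts should be chunked with a sliding window):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "Language models assign probabilities to sequences of words."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean cross-entropy loss;
    # perplexity is exp(loss).
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```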
- Distributed Training of Language Models with Reinforcement Learning via Human Feedback (RLHF) [Excellent]
- Over-Sampling using SMOTE [SMOTE for high-dimensional class-imbalanced data]
- Over-sampling via imbalanced-learn library
- Imbalanced Data Handling
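
A minimal SMOTE sketch with imbalanced-learn matching the posts above (the synthetic dataset is illustrative):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# A 95/5 imbalanced toy dataset.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)
print("before:", Counter(y))

# SMOTE synthesizes new minority-class samples by interpolating
# between existing minority samples and their nearest neighbors.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```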
- Rasa Chatbot [Great]
- Learn how to Build and Deploy a Chatbot in Minutes using Rasa
- chatbot with DialoGPT
- DialoGPT: huggingface Transformer
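
A minimal single-turn DialoGPT sketch via Hugging Face Transformers, following the two links above (adapted from the common model-card pattern):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

# Encode the user's message, terminated with EOS, and generate a reply.
input_ids = tokenizer.encode(
    "Hello, how are you?" + tokenizer.eos_token, return_tensors="pt"
)
reply_ids = model.generate(
    input_ids, max_length=100, pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens (the reply).
print(tokenizer.decode(reply_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```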
- deeppavlov [Great]
- PyTorch Chatbot Tutorial
- Implement a Simple Chat Bot With PyTorch
- GPT2 Chatbot PyTorch
- PyTorch Official Chatbot Tutorial
- PaddlePaddle Knover: toolkit for knowledge grounded dialogue generation
- PaddlePaddle PLATO-2
- ParlAI [Great]
- huggingface: Transformers [Great]
- huggingface: Blenderbot [Great]
- huggingface: Blenderbot Small [Great]
- huggingface: GPT-2 Text Generation [Great]
- Seq2seq Chatbot
- seq2seq Chatbot implemented in Pytorch
- papers with code: chatbot
- Proudly Leading the Chatbot
- Real Python: Build a Chatbot with Python ChatterBot
- A step-by-step guide to building a chatbot based on your own documents with GPT
- GitHub Models
- Git Ingest: Quickly turn a GitHub repository into text for LLMs [Great]
- Create a Chatbot for any GitHub repo [Great]
- Chatbot Analytics: 9 Key Metrics
- Chatbot Statistics for 2023
- Chatbot Analytics 101: Essential Metrics to Track
- 12 Metrics For Chatbot Analytics
- ParlAI Evaluation Metrics for Chatbot
- Chatbot Evaluation Metrics [Great]
- Databricks' report on LLM evaluation methods
- AgentBench: Evaluating LLMs as Agents
- Prometheus: Using GPT-4 as an Evaluator for SLMs
- LLM Model Evaluation Metrics - When and How to Use Them
- OpenAI ChatGPT [Amazing]
- Description of How OpenAI ChatGPT Works: Illustrating Reinforcement Learning from Human Feedback (RLHF)
- How ChatGPT was Trained
- ChatGPT Android SDK
- ChatGPT awesome apps
- A Categorical Archive of ChatGPT Failures
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
- aman.ai chatGPT Tutorial [Great]
- ChatGPT for customer service
- Chatgpt Retrieval Plugin
- Trending AI Tools
- Merlin: OpenAI ChatGPT Plus extension on all websites
- Adrenaline
- Using LLMs as agents that orchestrate tools [Interesting]
- ChatGPT API Using Python
- Parthean: a startup building a financial expert via ChatGPT
- Notes on the cost of ChatGPT
- Ortus - your YouTube AI buddy
- How Is ChatGPT’s Behavior Changing over Time?
- LLM Drifts: How Is ChatGPT’s Behavior Changing over Time?
- ChatGPT app Builder
- GPT4 Turbo 128k analysis Notes (its price)
- Designer GPT: website creator
- OpenAI DevDay Breakout Sessions Videos
- GPT Seed Parameter Notes
- Awesome ChatGPT Prompts
- GPT-4o Full Data Analysis
- GPT-4o Architecture
- Introducing Structured Outputs in the OpenAI API
- OpenAI Realtime-api
- OpenAI Model Distillation in the API
- OpenAI Prompt Caching
- LibreChat: Enhanced ChatGPT Clone [Great]
- Learning to Reason with LLMs: OpenAI o1 Model
- How does OpenAI train the Strawberry (o1) model to spend more time thinking?
- Learning to Reason before you speak is how OpenAI o1 generates its response
- 5 Papers for Better Understanding OpenAI o1 Models
- Anthropic Claude Tool Use
- Anthropic Prompt Generator
- Switched to Claude 3.5
- Anthropic Message Batches API
- Anthropic Message Batches API - Linkedin Post
- OpenAI Prompt Caching in GPT 4o and o1: How Does It Compare To Claude Prompt Caching?
- Anthropic Blog: Transformer Circuits Thread
- Anthropic MCP (Model Context Protocol)
- 100 Times Faster Natural Language Processing in Python
- Multi-label Text Classification using BERT
- Learning Meaning in Natural Language Processing
- Train and Deploy the Mighty Transformer NLP models using FastBert and AWS SageMaker
- Distilling knowledge from Neural Networks to build smaller and faster models
- HarfBuzz - a text shaping library [Useful]
- PruneBERT - Hugging Face
- spacy-streamlit: spaCy building blocks for Streamlit apps
- HuggingFace Evaluate Library
- NeMo - toolkit for Conversational AI [Excellent]