Notes on research papers I read to learn about things, for fun, as a hobby.

Pro tip: use perplexity.ai as a resource scraper.

Optimization algorithms

  • SGD -> Momentum -> NAG -> AdaGrad -> RMSProp -> Adam -> AdamW -> AdamScheduleFree (2024, by FAIR) read blog
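A rough sketch of three stops on the chain above (SGD, momentum, Adam/AdamW), written from memory of the standard update rules rather than taken from any of the linked posts. The toy loss `f(w) = (w - 3)^2`, the helper names, and the hyperparameters are all assumptions made up for illustration; NAG, AdaGrad, RMSProp and the schedule-free variant are left out.

```python
import math

def sgd_step(w, g, lr=0.1):
    """Plain SGD: step against the gradient."""
    return w - lr * g

def momentum_step(w, g, vel, lr=0.1, beta=0.9):
    """SGD with heavy-ball momentum: `vel` accumulates past gradients."""
    vel = beta * vel + g
    return w - lr * vel, vel

def adam_step(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, weight_decay=0.0):
    """Adam update; weight_decay > 0 applies decoupled decay in the AdamW style."""
    m = b1 * m + (1 - b1) * g          # first moment (running mean of gradients)
    v = b2 * v + (1 - b2) * g * g      # second moment (running mean of squared gradients)
    m_hat = m / (1 - b1 ** t)          # bias correction, t starts at 1
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2 (hypothetical example problem)."""
    return 2.0 * (w - 3.0)

def minimize(kind, steps=500):
    """Run one of the optimizers above from w = 0 and return the final w."""
    w, vel, m, v = 0.0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        if kind == "sgd":
            w = sgd_step(w, g)
        elif kind == "momentum":
            w, vel = momentum_step(w, g, vel)
        else:
            w, m, v = adam_step(w, g, m, v, t)
    return w

if __name__ == "__main__":
    for kind in ("sgd", "momentum", "adam"):
        print(kind, minimize(kind))
```

All three should end up close to the minimum at `w = 3`; the interesting part is how differently they get there (Adam takes roughly fixed-size steps early on, momentum overshoots and oscillates).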

To learn later (not interested right now)

  • Hessian matrix (second-order methods: BFGS, L-BFGS, etc.)
  • AdamP, RAdam, and Stochastic Gradient Descent with Warm Restarts (SGDR)
  • Visualizing a loss landscape, interesting to implement (https://arxiv.org/abs/1712.09913)

Development of transformer based models and architecture

  • Transformer (Attention is all you need) blog
    • {Self, Multi Head, Cross} Attention
    • Fast weights
    • GPT-1 (2018) / GPT-2 (2019) [GPT paper, "Language Models are Unsupervised Multitask Learners"] blog
      • Summarization still has some errors; I didn't find an exact fix for that problem, but this paper might have an answer
    • BERT (2018) blog
    • TransformerXL (2019) blog
    • Sparse Transformer (2019)-- O(N sqrt(N)) complexity. blog
    • RoBERTa, DistilBERT, ALBERT-- these are BERT variations, good to know
    • T5 (2019)-- Encoder-Decoder model
    • Reformer (2020)-- O(N log(N)) complexity
    • Linformer (2020)-- linear, O(N), complexity
    • FlashAttention
    • Longformer (2020)
    • Conformer (2020)
    • ViT (2020)
    • PaLM (2022)
    • Galactica (2022)
    • Whisper (2022)
    • Persimmon (2023)
    • Fuyu (2023)
    • Mamba (2023), S4, SSMs <--- these are state space models rather than transformers; not sure yet whether they belong here
    • Infini-attention (2024)
    • Grouped Query Attention
    • Sliding Window Attention
    • GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
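Since most of the list above is some variation on attention, here is a minimal pure-Python sketch of scaled dot-product attention, with an optional causal sliding window. It is only an illustration of the idea: single head, no learned projections, no batching, and the `window` parameter and function names are my own assumptions, not any library's API. Passing `q` from one sequence and `k`, `v` from another gives cross-attention.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, window=None):
    """Scaled dot-product attention over lists of vectors.

    q: list of query vectors, k/v: lists of key/value vectors.
    window=None -> every query attends to every key (O(N^2) scores).
    window=w    -> query i attends only to keys i-w+1 .. i
                   (causal sliding window; assumes self-attention,
                   i.e. len(k) >= len(q)).
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        if window is None:
            idx = range(len(k))
        else:
            idx = range(max(0, i - window + 1), i + 1)
        # similarity of this query with each visible key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in idx]
        weights = softmax(scores)
        # output is the attention-weighted average of the visible values
        out.append([sum(w * v[j][c] for w, j in zip(weights, idx))
                    for c in range(len(v[0]))])
    return out

if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(attention(x, x, x))            # full self-attention
    print(attention(x, x, x, window=2))  # each token sees itself and one predecessor
```

With a fixed `window`, the number of scores per query stays constant, so the total work grows linearly in sequence length instead of quadratically, which is the point of the sliding-window and other sub-quadratic variants in the list.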

Language Models

Positional embeddings

  • RoPE
  • CoPE
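A small sketch of the RoPE idea, as I understand it: rotate each consecutive pair of dimensions by an angle proportional to the token position, so that the query-key dot product depends only on the relative offset. The interleaved pairing, the `base=10000.0` default, and the function names are assumptions for illustration; real implementations differ in layout details. CoPE is not covered here.

```python
import math

def rope(x, pos, base=10000.0):
    """Apply a rotary position embedding to vector x (even length) at position pos."""
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        # each pair of dimensions gets its own rotation frequency
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.0]
    # same relative offset (2) at different absolute positions
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 12), rope(k, 10)))
```

The two printed scores should agree up to floating-point error, because only the position difference enters the rotated dot product.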

Tokenization

Finetuning

Vision

  • CNN case study:
    • CNN - { Le -> Alex -> ZF -> VGG -> GoogLe }Net
      • TODO: (inception architecture)
  • ResNet (residual and skip connection, research paper)
  • Classification + Localization = Object detection (cs231n)
    • R-CNN
    • Fast R-CNN
    • Faster R-CNN
    • YOLO: you only look once
  • segmentation?
  • SSD
  • CLIP-ResNet (read about it somewhere, kind of interesting; most probably the best ResNet to date? not sure)
  • Train something on the COCO dataset? A good task?
  • Visualizing CNN techniques
    • DeepDream?

Image Generation

  • Pixel RNN (maybe if interested)
  • VAE
  • GAN
  • Stable Diffusion
  • DALL-E
  • Vision QA models

Reinforcement learning

  • RL -- it's a framework to teach agents
  • DQN
  • Policy Gradient Methods
  • DPO

Normalization and Regularization

  • Quantization? Factorization?
  • Weight Standardization
  • Label Smoothing
  • Filter Response Normalization
  • Normalization layers
    • BatchNorm
    • LayerNorm
    • GroupNorm
    • InstanceNorm
    • PowerNorm and WeightNorm (are they good?)
    • RMSNorm-- the most used nowadays, I think; used in LLaMA, Mistral, (Grok?), most probably also in GPT-4 (who knows)
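To make the LayerNorm vs RMSNorm difference concrete, here is a minimal sketch over a single feature vector. The learnable gain and bias are left out on purpose, and `eps=1e-5` is just an assumed default; this shows the core arithmetic, not any framework's layer.

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm without gain/bias: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def rms_norm(x, eps=1e-5):
    """RMSNorm without gain: divide by the root mean square, no mean subtraction."""
    mean_sq = sum(xi * xi for xi in x) / len(x)
    return [xi / math.sqrt(mean_sq + eps) for xi in x]

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(layer_norm(x))  # centered, unit variance
    print(rms_norm(x))    # rescaled only, mean is kept
```

RMSNorm skips the mean computation and the centering step, so it is a bit cheaper; on an input that already has zero mean the two give the same result.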

Some research papers

New (saw somewhere on Twitter)

FLOPS (Floating-Point Operations Per Second)

  • gemm in Python
  • gemm in JAX
  • gemm in C++
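For the Python case, a minimal sketch of how a gemm FLOPS measurement could look: a naive triple-loop matrix multiply, the usual `2 * M * N * K` operation count (one multiply and one add per inner step), and a timing wrapper. The function names and the matrix size are assumptions for illustration, not taken from the linked gemm files.

```python
import time

def gemm(a, b):
    """Naive C = A @ B for lists of lists; A is M x K, B is K x N."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        row_c = c[i]
        for p in range(k):
            a_ip = a[i][p]
            row_b = b[p]
            for j in range(n):
                row_c[j] += a_ip * row_b[j]
    return c

def gemm_flops(m, n, k):
    """Floating-point operations in an M x K by K x N multiply."""
    return 2 * m * n * k

def measure_flops(n=64):
    """Time one n x n gemm and return the achieved operations per second."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    gemm(a, b)
    elapsed = time.perf_counter() - start
    return gemm_flops(n, n, n) / elapsed

if __name__ == "__main__":
    print(f"{measure_flops(64) / 1e6:.1f} MFLOPS (pure Python, single run)")
```

Pure Python will land orders of magnitude below what a JAX or C++ gemm reaches on the same machine, which is exactly what makes the comparison worth running.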

Open Architectures

  • Mixtral
  • Is the Phi architecture open source?
  • Grok-1
  • OpenHermes by NousResearch

Some resourceful repos