Learning Transformers: nanoGPT Exploration

Overview

This repository documents my learning journey through Andrej Karpathy's tutorial on building a GPT model from scratch, using nanoGPT as a reference. Due to compute limitations, I experimented with training on Google Colab.

Resources

Video and Articles:

Code and Research Papers:

Learning Highlights

  • Transformer Architecture: The model is built following the principles of the "Attention is All You Need" paper.
  • Self-Attention Mechanism: Understanding the role of attention heads and positional encoding (see the sketch after this list).
  • TinyShakespeare Dataset: Used as a small-scale dataset for training the transformer.
  • Training on Google Colab: Limited compute power required adjustments to batch sizes and training iterations.
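
To make the self-attention idea concrete, here is a minimal sketch of one causal attention head in PyTorch, in the spirit of the tutorial; the class and variable names are illustrative rather than copied from nanoGPT:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Head(nn.Module):
        """One head of masked (causal) self-attention."""
        def __init__(self, n_embd, head_size, block_size):
            super().__init__()
            self.key = nn.Linear(n_embd, head_size, bias=False)
            self.query = nn.Linear(n_embd, head_size, bias=False)
            self.value = nn.Linear(n_embd, head_size, bias=False)
            # lower-triangular mask: position t may only attend to positions <= t
            self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

        def forward(self, x):
            B, T, C = x.shape                                    # batch, time, channels
            k = self.key(x)                                      # (B, T, head_size)
            q = self.query(x)                                    # (B, T, head_size)
            # scaled dot-product attention scores
            wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
            wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
            wei = F.softmax(wei, dim=-1)
            v = self.value(x)                                    # (B, T, head_size)
            return wei @ v                                       # (B, T, head_size)

    # quick shape check
    x = torch.randn(4, 8, 32)                                    # (batch=4, block_size=8, n_embd=32)
    head = Head(n_embd=32, head_size=16, block_size=8)
    print(head(x).shape)                                         # torch.Size([4, 8, 16])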

Running nanoGPT on Colab

  1. Clone the nanoGPT repository:
    git clone https://github.com/karpathy/nanoGPT.git
    cd nanoGPT
  2. Install dependencies:
    pip install torch numpy transformers
  3. Prepare the Tiny Shakespeare dataset, then run training (adjust batch size for limited compute; a reduced-compute example follows these steps):
    python data/shakespeare_char/prepare.py
    python train.py config/train_shakespeare_char.py --batch_size=2
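
On Colab's free GPU it can also help to shrink the model itself, not just the batch size. The overrides below correspond to nanoGPT config variables, but the specific values are only a plausible starting point, not settings taken from this repository or prescribed by nanoGPT:

    python train.py config/train_shakespeare_char.py \
      --device=cuda --compile=False \
      --batch_size=2 --block_size=64 \
      --n_layer=4 --n_head=4 --n_embd=128 \
      --max_iters=2000 --eval_iters=20

Once training finishes, samples can be drawn from the saved checkpoint with nanoGPT's sampling script:

    python sample.py --out_dir=out-shakespeare-char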

Next Steps

  • Experimenting with different datasets.
  • Fine-tuning on custom text corpora (a dataset-preparation sketch follows this list).
  • Exploring optimizations for running on limited hardware.
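
As a sketch of the custom-dataset direction, the script below prepares a character-level corpus in the same train.bin / val.bin / meta.pkl format that nanoGPT's data/shakespeare_char/prepare.py produces. The file name my_corpus.txt and the output directory are placeholders, not files that exist in this repository:

    import os
    import pickle
    import numpy as np

    # read the raw text (placeholder file name)
    with open('my_corpus.txt', 'r', encoding='utf-8') as f:
        data = f.read()

    # build a character-level vocabulary and encoders
    chars = sorted(set(data))
    stoi = {ch: i for i, ch in enumerate(chars)}  # char -> int
    itos = {i: ch for ch, i in stoi.items()}      # int -> char

    # 90/10 train/val split, encoded as uint16 token ids
    n = len(data)
    train_ids = np.array([stoi[c] for c in data[:int(n * 0.9)]], dtype=np.uint16)
    val_ids = np.array([stoi[c] for c in data[int(n * 0.9):]], dtype=np.uint16)

    out_dir = 'data/my_corpus'
    os.makedirs(out_dir, exist_ok=True)
    train_ids.tofile(os.path.join(out_dir, 'train.bin'))
    val_ids.tofile(os.path.join(out_dir, 'val.bin'))

    # meta.pkl lets sample.py decode generated ids back into text
    with open(os.path.join(out_dir, 'meta.pkl'), 'wb') as f:
        pickle.dump({'vocab_size': len(chars), 'stoi': stoi, 'itos': itos}, f)

Training on the result should then be a matter of pointing nanoGPT at the new data directory, e.g. with a --dataset=my_corpus override on top of a small config.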

This repository serves as documentation of my progress and learnings in understanding GPT and transformers. Contributions, suggestions, and discussions are welcome!
