This repository documents my learning journey through Andrej Karpathy's tutorial on building a GPT model from scratch, using nanoGPT as a reference. Due to compute limitations, I experimented with training on Google Colab.
- YouTube Video: "Let's build GPT: from scratch, in code, spelled out."
- Medium Article: "Train Your Own Language Model with nanoGPT"
- Transformer Architecture: The model is built following the principles of the "Attention is All You Need" paper.
- Self-Attention Mechanism: Understanding the role of attention heads, causal masking, and positional encoding (a minimal attention-head sketch follows this list).
- TinyShakespeare Dataset: Used as a small-scale dataset for training the transformer.
- Training on Google Colab: Limited compute power required adjustments to batch sizes and training iterations.
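To make the self-attention bullet above concrete, here is a minimal sketch of a single causal self-attention head in PyTorch, in the spirit of the video. The class name, `head_size`, and the smoke test at the end are my own illustrative choices, not nanoGPT's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    """One head of causal (masked) self-attention, sketching the idea from the video."""

    def __init__(self, n_embd: int, head_size: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # lower-triangular mask so position t can only attend to positions <= t
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape                                    # batch, time (tokens), channels
        k = self.key(x)                                      # (B, T, head_size)
        q = self.query(x)                                    # (B, T, head_size)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # scaled dot-product scores (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        v = self.value(x)                                    # (B, T, head_size)
        return wei @ v                                       # (B, T, head_size)

# tiny smoke test: 4 sequences of 8 tokens with 32-dim embeddings
x = torch.randn(4, 8, 32)
head = CausalSelfAttentionHead(n_embd=32, head_size=16, block_size=8)
print(head(x).shape)  # torch.Size([4, 8, 16])
```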
- Clone the nanoGPT repository:
```bash
git clone https://github.com/karpathy/nanoGPT.git
cd nanoGPT
```
- Install dependencies:
```bash
pip install torch numpy transformers datasets tiktoken wandb tqdm
```
- Prepare the TinyShakespeare data and run training (adjust batch size for limited compute; a sketch of what the prep step does follows these steps):
```bash
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py --batch_size=2
```
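For reference, this is roughly what the character-level preparation step does: build a character vocabulary, encode the text as integer ids, and write binary train/val splits. The repository's own `prepare.py` also downloads the text and handles the vocabulary metadata, so this is only an illustrative sketch with assumed file names (`input.txt`, `train.bin`, `val.bin`).

```python
# Minimal sketch of char-level data prep (not nanoGPT's exact prepare.py):
# read raw text, build a character vocabulary, encode to integer ids, split train/val.
import numpy as np

with open("input.txt", "r", encoding="utf-8") as f:   # TinyShakespeare text, assumed already downloaded
    text = f.read()

chars = sorted(set(text))                             # character-level vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}
ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)

n = int(0.9 * len(ids))                               # 90/10 train/val split
ids[:n].tofile("train.bin")                           # binary files the training script reads
ids[n:].tofile("val.bin")
print(f"vocab size: {len(chars)}, train tokens: {n}, val tokens: {len(ids) - n}")
```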
- Experimenting with different datasets.
- Fine-tuning on custom text corpora.
- Exploring optimizations for running on limited hardware, such as gradient accumulation and mixed precision (see the sketch after this list).
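For the limited-hardware item above, here is a minimal sketch (not nanoGPT's actual training loop) of two common tricks, gradient accumulation and automatic mixed precision, around a generic PyTorch step. The `train_steps` function, `get_batch`, and the assumption that the model's forward returns `(logits, loss)` are my own illustrative choices.

```python
# Illustrative sketch for limited GPUs: gradient accumulation + automatic mixed precision.
import torch

def train_steps(model, optimizer, get_batch, accum_steps=8, iters=100):
    """Assumes a CUDA device and a model whose forward returns (logits, loss)."""
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    for _ in range(iters):
        optimizer.zero_grad(set_to_none=True)
        for _ in range(accum_steps):              # several micro-batches approximate one large batch
            x, y = get_batch()                    # micro-batch small enough to fit in memory
            x, y = x.to("cuda"), y.to("cuda")
            with torch.autocast(device_type="cuda", dtype=torch.float16):
                _, loss = model(x, y)
            scaler.scale(loss / accum_steps).backward()   # accumulate averaged gradients
        scaler.step(optimizer)                    # unscale gradients and apply the update
        scaler.update()
```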
This repository serves as a record of my progress and learnings while studying GPT and transformers. Contributions, suggestions, and discussions are welcome!