This repository provides a simple implementation of an attention mechanism, written both with and without PyTorch. The primary goal is to understand the forward pass of this architecture.
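As a rough sketch of what such a forward pass looks like (a minimal single-head scaled dot-product attention, assuming NumPy for the "without PyTorch" variant; the function and variable names below are illustrative, not necessarily those used in the repository):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head scaled dot-product attention (forward pass only).

    q, k, v: arrays of shape (seq_length, dim_model).
    Returns the attended values, shape (seq_length, dim_model).
    """
    dim_model = q.shape[-1]
    # Attention scores: similarity of each query with every key,
    # scaled by sqrt(dim_model) to keep the softmax well-behaved.
    scores = q @ k.T / np.sqrt(dim_model)        # (seq_length, seq_length)
    # Softmax over the key dimension turns scores into weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values.
    return weights @ v                           # (seq_length, dim_model)
```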
- Embedding Layer: initialized randomly from a normal distribution (see the sketch after this list).
- Model Dimension: `dim_model = 64`
- Sequence Length: `seq_length = 10`
- Vocabulary Size: `vocab_size = 100`
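To illustrate how these settings fit together, here is a minimal sketch of a random-normal embedding lookup producing the input that the attention forward pass operates on (again assuming NumPy; the names are illustrative, not the repository's exact code):

```python
import numpy as np

vocab_size = 100   # number of distinct tokens
seq_length = 10    # tokens per input sequence
dim_model = 64     # embedding / model dimension

# Embedding table initialized from a standard normal distribution.
embedding = np.random.normal(size=(vocab_size, dim_model))

# A random sequence of token ids and its embedded representation.
token_ids = np.random.randint(0, vocab_size, size=seq_length)
x = embedding[token_ids]    # shape: (seq_length, dim_model)
```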
- Python 3.9
- PyTorch (optional)
This project is licensed under the MIT License.