This directory contains the PyTorch ML reference for the Bloom model.
- Bloom language models
- Download and prepare the dataset
- How to run
- Notes on configuration files
- References
Bloom is a decoder-only, transformer-based multilingual language model from BigScience with up to 176B parameters. Its architecture is very similar to the GPT2 model, with the following changes:
- Tokenizer: Both the model and its tokenizer have a vocabulary size of around 250K, which covers 46 natural languages and 13 programming languages.
- Position embedding: Instead of using learned position embeddings like GPT2, Bloom adopts the ALiBi position embedding. ALiBi biases the query-key attention scores with a penalty that is proportional to the distance between the query and the key. This inductive bias toward recency enables ALiBi to extrapolate to longer input sequences during inference and provides some performance boost (see the sketch below). Please refer to the paper for more details.
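As a rough illustration of the mechanism described in Press et al. (2021), for a query at position $i$ the attention logits over the preceding keys are penalized linearly with their distance from the query (no position embeddings are added to the token embeddings):

$$\mathrm{softmax}\left(q_i K^{\top} + m \cdot \left[-(i-1), \ldots, -2, -1, 0\right]\right)$$

where $m$ is a fixed, head-specific slope; the paper uses a geometric sequence of slopes across heads (e.g., $2^{-8h/n}$ for head $h$ of $n$).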
References:
- Scao et al. (2023). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
- Press et al. (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.
In order to run any of the models in this directory, you must go through the following steps:
- Download and preprocess the data (see Prepare the data for more details)
- Run training for your desired model (see Run pre-training)
The code for this model is organized as follows:
- `configs/`: YAML configuration files.
- `run.py`: Training script. Performs training and validation.
- `model.py`: Defined under the gpt2 directory.
To download and prepare the dataset, please refer to the section Download and prepare the dataset in the gpt2 README file.

For instructions on launching training, please refer to the section How to run in the gpt2 README file.
In order to train the model, you need to provide a YAML config file. Some reference YAML config files are listed below; feel free to create your own following these examples:
- `params_bloom_7b.yaml`: configures a Bloom model with `hidden_size=4096`, `num_hidden_layers=30`, `num_heads=32`.
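For illustration, these settings would appear under the `model` section of the config roughly as follows; this is a sketch only, and the actual `params_bloom_7b.yaml` contains many more fields and may use a different layout:

```yaml
# Illustrative sketch only; see params_bloom_7b.yaml for the authoritative file.
model:
  hidden_size: 4096        # transformer hidden dimension
  num_hidden_layers: 30    # number of decoder layers
  num_heads: 32            # attention heads per layer
```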
To use the ALiBi embedding in the Bloom model, you need to pay attention to the following fields under the `model` section:

- `position_embedding_type` (str): set the value of this field to `alibi`.
- `alibi_trainable_slopes` (bool): whether the slopes of the ALiBi embedding are trainable (defaults to `False`). Note that, based on the analysis in the original ALiBi paper, trainable slopes did not yield strong results (on par with fixed slopes).
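Putting these together, a minimal sketch of the ALiBi-related portion of the `model` section (using the default, fixed slopes) might look like:

```yaml
# Sketch of the ALiBi-related fields only; other model fields are omitted.
model:
  position_embedding_type: alibi   # use ALiBi instead of learned position embeddings
  alibi_trainable_slopes: False    # keep the head-specific slopes fixed (the default)
```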
References:
- Radford, A. et al. (2019). Language Models are Unsupervised Multitask Learners.
- Scao et al. (2023). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
- Press et al. (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.