Bloom language models

This directory contains the PyTorch ML reference for the Bloom model.

List of topics

Overview of the model

Bloom

Bloom is a decoder-only, transformer-based multilingual language model from BigScience with up to 176B parameters. Its architecture is very similar to GPT2, with the following changes:

  • Tokenizer: Both the model and its tokenizer have a vocabulary size of around 250K, which covers 46 natural languages and 13 programming languages.
  • Position Embedding: Instead of using learned position embeddings like GPT2, Bloom adopts the ALiBi position embedding. ALiBi biases the query-key attention scores with a penalty proportional to the distance between the query and key positions. This inductive bias toward recency enables ALiBi to extrapolate to input sequences longer than those seen during training, and it also yields some performance gains. Please refer to the paper for more details; a brief sketch of the mechanism follows this list.
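
As a rough illustration of the mechanism (a minimal sketch, not the exact Bloom implementation; the helper name alibi_bias is hypothetical), each head adds a fixed slope times the non-positive key-query distance to its attention scores:

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Slopes form a geometric sequence 2^(-8/n), 2^(-16/n), ...
    # for n heads (power-of-two head counts, as in the ALiBi paper).
    start = 2.0 ** (-8.0 / num_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(num_heads)])
    # distance[i, j] = j - i; for causal attention only j <= i is used,
    # so the bias is a penalty that grows with distance into the past.
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]  # (seq_len, seq_len)
    # Result shape: (num_heads, seq_len, seq_len), added to attention scores.
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(num_heads=32, seq_len=8)
# scores = q @ k.transpose(-1, -2) / sqrt(head_dim) + bias  (before causal mask)
```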

References:

Scao et al. (2023). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.

Press et al. (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.

Steps for running model training

To run the models in this directory, go through the steps described in the following sections:

Structure of the code

  • configs/: YAML configuration files.
  • run.py: Training script. Performs training and validation.
  • model.py: Model implementation; defined under the gpt2 directory, since Bloom reuses the GPT2 model code.

Download and prepare the dataset

Please refer to the section Download and prepare the dataset in the gpt2 README file.

How to run

Please refer to the section How to run in the gpt2 README file.

Notes on configuration files

Config files

To train the model, you need to provide a YAML config file. A reference YAML config is listed below; feel free to create your own following this example:

  • params_bloom_7b.yaml configures the 7B model with hidden_size=4096, num_hidden_layers=30, num_heads=32 (see the fragment below).
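
For illustration, the model section of such a config might look like the following (a minimal sketch; only the three fields named above are taken from this README, and real reference configs contain many more fields):

```yaml
model:
  hidden_size: 4096
  num_hidden_layers: 30
  num_heads: 32
```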

Important YAML fields

To use the ALiBi embedding in the Bloom model, you need to pay attention to the following fields under the model section of the config:

  • position_embedding_type (str): set this field to alibi.

  • alibi_trainable_slopes (bool): whether the slopes of the ALiBi embedding are trainable (defaults to False). Note that, based on the analysis in the original ALiBi paper, trainable slopes did not yield strong results (at best on par with fixed slopes). A config fragment follows this list.
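
Putting the two fields together, the relevant fragment of the model section would look like this (a sketch; the values shown are the defaults described above):

```yaml
model:
  position_embedding_type: alibi  # use ALiBi instead of learned position embeddings
  alibi_trainable_slopes: False   # fixed slopes, per the original paper's findings
```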

References

Radford et al. (2019). Language Models are Unsupervised Multitask Learners.

Scao et al. (2023). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.

Press et al. (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.