Checkpoints

There are 3 types of OLMo checkpoints.

  1. OLMo (standard) checkpoints. These checkpoints can be produced and used by the code in this repo. "OLMo checkpoints" will typically refer to these checkpoints.
  2. Transformers checkpoints. These checkpoints can be produced and used via the OLMo implementation in the Hugging Face Transformers library. As we continue to develop and improve OLMo, our implementation in this repo may temporarily become incompatible with the implementation in the Transformers library.
  3. HF OLMo checkpoints. These checkpoints can be produced and used via the hf_olmo package. The hf_olmo package provides basic Transformers functionality while always staying compatible with the OLMo library.

OLMo (standard) checkpoints

There are 2 categories of OLMo checkpoints:

  • unsharded: a complete checkpoint in a standard form;
  • sharded: a checkpoint that has been broken down into smaller components, for easier use in our multi-node training.

Unless otherwise specified, an OLMo checkpoint is assumed to be unsharded. OLMo sharded and unsharded checkpoints can be used with the pretraining/fine-tuning script provided in this repo.

Unsharded OLMo Checkpoints

Each unsharded checkpoint directory consists of:

  • config.yaml: the config at that training step.
  • model.safetensors, optim.safetensors, train.pt: model, optimizer and training state at that training step. Checkpoints of older OLMo releases use the .pt extension instead.
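
The layout above can be checked before a checkpoint directory is handed to another tool. The following is a minimal sketch, not part of this repo: `missing_checkpoint_files` is a hypothetical helper, and it assumes that older releases name their files `model.pt` and `optim.pt`.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED = ("config.yaml", "train.pt")
# Assumption: older releases use the same base names with a .pt extension.
MODEL_CANDIDATES = ("model.safetensors", "model.pt")
OPTIM_CANDIDATES = ("optim.safetensors", "optim.pt")


def missing_checkpoint_files(checkpoint_dir):
    """Return descriptions of expected files that are absent from checkpoint_dir.

    Hypothetical helper for illustration; an empty list means the directory
    contains every file an unsharded checkpoint is expected to have.
    """
    root = Path(checkpoint_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    for candidates in (MODEL_CANDIDATES, OPTIM_CANDIDATES):
        # Either the .safetensors or the .pt variant is acceptable.
        if not any((root / name).is_file() for name in candidates):
            missing.append(" or ".join(candidates))
    return missing
```

For example, `missing_checkpoint_files("/path/to/olmo/checkpoint")` returns an empty list for a complete directory and names whatever is absent otherwise.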

URLs to checkpoints at intermediate steps of our official models' training runs can be found in the CSV files under checkpoints/official/. These 'directory' URLs cannot currently be accessed directly, but the files within each directory are publicly accessible.
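
Because only the files inside a checkpoint directory are accessible, a directory URL has to be combined with a file name before it can be fetched. The sketch below shows one way to do that; `checkpoint_file_urls` is a hypothetical helper, the file names come from the unsharded checkpoint layout, and the example URL is a placeholder rather than a real checkpoint location.

```python
# File names from the unsharded checkpoint layout described in this document.
CHECKPOINT_FILES = ("config.yaml", "model.safetensors", "optim.safetensors", "train.pt")


def checkpoint_file_urls(directory_url, filenames=CHECKPOINT_FILES):
    """Map each file name to its URL under a checkpoint 'directory' URL.

    Hypothetical helper for illustration. The directory URL is taken as-is
    (for example from a row of a CSV file) with any trailing slash removed.
    """
    base = directory_url.rstrip("/")
    return {name: f"{base}/{name}" for name in filenames}
```

Each resulting URL can then be downloaded with any HTTP client. For checkpoints of older releases, pass the `.pt` file names explicitly through `filenames`.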

Sharded OLMo Checkpoints

There are currently 4 types of sharded checkpoints:

  • torch_legacy,
  • torch_new,
  • local,
  • and olmo_core.

We are still working on improving sharded checkpointing and thus do not have any guidelines for using them at present. A sharded checkpoint can be converted to an unsharded checkpoint using unshard.py.

Transformers Checkpoints

These checkpoints can be used with the OLMo implementation in the Transformers library. Since the OLMo implementation is integrated into the library, OLMo models support most Transformers model functionality. These checkpoints cannot be used with the pretraining/fine-tuning script provided in this repo.

Transformers checkpoints can be found in most of our HF Hub repos (e.g. OLMo-2-1124-7B). An OLMo 2 checkpoint can be converted into its Transformers equivalent using convert_olmo2_to_hf.py. Similarly, the script for OLMo 1 is convert_olmo_to_hf_new.py. Example usage:

python scripts/convert_olmo2_to_hf.py --input_dir /path/to/olmo/checkpoint --output_dir /path/to/hf/checkpoint/ --tokenizer_json_path tokenizers/allenai_gpt-neox-olmo-dolma-v1_5.json
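
When several checkpoints need converting, the command above can be driven from Python. This is a rough sketch that only wraps the example invocation: `build_convert_command` and `convert_checkpoint` are hypothetical helpers, the flags are the ones shown above, and the script is assumed to be run from the repository root.

```python
import subprocess
import sys


def build_convert_command(input_dir, output_dir, tokenizer_json_path,
                          script="scripts/convert_olmo2_to_hf.py"):
    """Build the argument list for the conversion script (hypothetical helper).

    Uses only the flags from the example command: --input_dir, --output_dir
    and --tokenizer_json_path.
    """
    return [
        sys.executable, script,
        "--input_dir", str(input_dir),
        "--output_dir", str(output_dir),
        "--tokenizer_json_path", str(tokenizer_json_path),
    ]


def convert_checkpoint(input_dir, output_dir, tokenizer_json_path):
    """Run the conversion script and raise CalledProcessError if it fails."""
    command = build_convert_command(input_dir, output_dir, tokenizer_json_path)
    subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.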

Warning: As we continue to develop and improve OLMo, our implementation in this repo may become incompatible with the implementation in the Transformers library. During these periods, OLMo checkpoints may not be convertible to Transformers checkpoints. At present, all OLMo checkpoints of our officially released models are convertible to Transformers checkpoints.

HF OLMo checkpoints

These checkpoints can be used with the Transformers-style OLMo implementation in the hf_olmo package. This implementation has only partial support for Transformers functionality, so we recommend using Transformers checkpoints instead when they are available. "Auto" methods like AutoModelForCausalLM are now supported for these checkpoints. These checkpoints cannot be used with the pretraining/fine-tuning script provided in this repo.

The following checkpoints on HF Hub are HF OLMo checkpoints:

An OLMo checkpoint can be converted into its HF OLMo equivalent using convert_olmo_to_hf.py. Example usage:

python hf_olmo/convert_olmo_to_hf.py --checkpoint-dir /path/to/checkpoint