# Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA

Code repository for the paper *Minimal Ranks, Maximum Confidence: Parameter-efficient Uncertainty Quantification for LoRA*.

Our repository is based on https://github.com/fortuinlab/swag-lora and https://github.com/MohammadrezaBanaei/LoRA-XS.

## Running experiments

The experimental setup is derived from the original repository and is described below.

We use Accelerate and Hydra to run our experiments. Please see the instructions here for guidance on setting up the environment.
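As a minimal sketch of the environment setup (the package names here are our assumptions; follow the linked instructions for the exact, pinned dependencies):

```bash
# Install the two launchers used by this repository
pip install accelerate hydra-core

# Generate an Accelerate config interactively (choose your hardware setup)
accelerate config
```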

### Running training script

#### Running training using Hydra

To launch a training run locally:

```bash
accelerate launch launch_exp_hydra.py \
    method=swag \
    experiment.task=cola \
    method.force_save=-1 \
    method.swag_start=25 \
    method.swag_anneal_epochs=5 \
    method.swag_anneal_epochs=5 \
    method.swag_learning_rate=1e-3 \
    experiment.learning_rate=1e-3 \
    experiment.cls_learning_rate=5e-3 \
    experiment.num_epochs=100 \
    experiment.batch_size=32
```
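Hydra overrides compose freely, so OOD evaluation flags can be added to the same command. For example (the flag names come from the table below; this particular combination is illustrative, not a reproduction of a paper run):

```bash
accelerate launch launch_exp_hydra.py \
    method=swag \
    experiment.task=mmlu \
    experiment.subtask=anatomy \
    experiment.ood_task=arc-c \
    experiment.ood_batch_size=32
```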

#### Running training via job script

An example job script used in our study is provided in `run_script.sh`.
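For orientation, a cluster submission script along the following lines wraps the same launch command; the SLURM directives and resource values below are placeholder assumptions, not the contents of `run_script.sh`:

```bash
#!/bin/bash
#SBATCH --job-name=swag-lora   # placeholder job name
#SBATCH --gres=gpu:1           # request one GPU; adjust for your cluster
#SBATCH --time=24:00:00        # placeholder wall-time limit

accelerate launch launch_exp_hydra.py \
    method=swag \
    experiment.task=cola \
    experiment.num_epochs=100 \
    experiment.batch_size=32
```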

### Notable flags/hyperparameters

Important Hydra flags/hyperparameters are discussed below; a more comprehensive description can be found in `config.yaml` and `swag.yaml`. SWAG-specific flags are written as `method.<var_name>`, following Hydra's override syntax.

| Hyperparameter | Description | Possible Values |
| --- | --- | --- |
| `model` | Model to use | `llama_7b_hf`, `roberta-base`, `roberta-large` |
| `method` | The method configuration to use | Only `swag` |
| `method.swag_learning_rate` | Learning rate for SWAG training | Any float (e.g., `1e-4`) |
| `method.swag_anneal_epochs` | Number of annealing epochs in SWAG | Any integer (e.g., `5`) |
| `method.swag_anneal_strategy` | Annealing strategy in SWAG | `constant`, `linear`, `cosine`, `cosine_hard_restarts` |
| `method.swag_start` | Epoch at which SWAG collection starts | Any integer (e.g., `8`) |
| `method.modules_to_swag` | Modules over which to learn the SWAG distribution (only LoRA layers, only layers with gradients, or all layers) | `grad_only`, `lora_only`, `all` |
| `method.swag_max_num_models` | Maximum number of models maintained for the covariance approximation in SWAG | Any integer (e.g., `5`) |
| `method.swag_cov_mat` | Whether to learn a covariance matrix in SWAG (if `False`, only a diagonal covariance is used when sampling) | `True`, `False` |
| `method.force_save` | Force-save epoch for early stopping of SWAG training; the SWAG model is saved `method.force_save` epochs after `method.swag_start` | Any integer (e.g., `5`) |
| `method.swag_sample_scale` | Covariance scale of the learned SWAG distribution being sampled from | Any float (e.g., `1.0`) |
| `method.swag_samples` | Number of SWAG model samples used during evaluation | Any integer (e.g., `15`) |
| `experiment.task` | The in-distribution (ID) task for training/evaluation | `obqa`, `cqa`, `swag`, `mmlu`, `arc-e`, `arc-c`, `cola`, `mnli`, `mrpc`, (other GLUE tasks...) |
| `experiment.subtask` | (ID) subtask for training/evaluation | Subtask name (e.g., `experiment.task=mmlu`, `experiment.subtask=anatomy`) |
| `experiment.ood_task` | Out-of-distribution (OOD) task for evaluation | `obqa`, `cqa`, `swag`, `mmlu`, `arc-e`, `arc-c`, `cola`, `mnli`, `mrpc`, (other GLUE tasks...) |
| `experiment.ood_subtask` | (OOD) subtask for the evaluation task | Subtask name (e.g., `experiment.ood_task=mmlu`, `experiment.ood_subtask=anatomy`) |
| `experiment.ood_batch_size` | Batch size for the OOD task | Any integer (e.g., `32`) |
| `experiment.learning_rate` | Learning rate for training | Any float (e.g., `0.001`, `0.01`) |
| `experiment.cls_learning_rate` | Learning rate for the classifier part | Any float (e.g., `0.001`, `0.01`) |
| `experiment.num_epochs` | Total number of training epochs | Any integer (e.g., `20`) |
| `experiment.batch_size` | Batch size for training | Any integer (e.g., `16`) |
| `experiment.set_fraction` | Fraction of the training dataset used for learning | Any float between 0 and 1, inclusive |
| `experiment.overwrite` | Whether to overwrite existing experiments | `True`, `False` |
| `experiment.data_path` | Path to the data folder for the tasks (only required if `experiment.offline=True`) | `/path/to/your/data/folder` |
| `experiment.model_path` | Path to the model folder (only required if `experiment.offline=True`) | `/path/to/your/models/folder` |
| `experiment.mnli_model_path` | Path to the pretrained LoRA-XS weights for a given rank on the MNLI task (only for the RTE and MRPC tasks) | `/path/to/your/checkpoint/folder` |
| `experiment.wandb_path` | Path to the Weights & Biases logging folder | `/path/to/your/wandb/folder` |
| `experiment.wandb_group` | Group name for Weights & Biases tracking | Any group name |
| `experiment.wandb_entity` | Weights & Biases entity | Any entity name |
| `experiment.wandb_project` | Weights & Biases project name | Any project name |
| `experiment.wandb_mode` | Weights & Biases running mode | `online` or `offline` |
| `experiment.exp_name` | (Optional) name of the experiment | Any experiment name |
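To illustrate how these flags compose, here is a hypothetical MRPC run that loads pretrained LoRA-XS MNLI weights and logs to Weights & Biases (the checkpoint path and W&B names are placeholders, not values from our study):

```bash
accelerate launch launch_exp_hydra.py \
    method=swag \
    method.swag_cov_mat=True \
    method.swag_max_num_models=5 \
    experiment.task=mrpc \
    experiment.mnli_model_path=/path/to/your/checkpoint/folder \
    experiment.wandb_project=swag-lora-uq \
    experiment.wandb_mode=online
```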

## License

Copyright (C) 2025 Patryk Marszałek, Klaudia Bałazy, Jacek Tabor, Tomasz Kuśmierczyk

This project is distributed under the terms of the GNU Affero General Public License v3. Portions of the code derived from MIT-licensed sources remain compatible under both the MIT license and AGPL v3. Please see the SWAG LoRA LICENSE file for details.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Affero General Public License v3 for more details.