A hybrid Mamba-Transformer model for RNA base calling
- High-accuracy RNA base calling
- Efficient state space modeling
- Open weights, checkpoints, logs
- 100% reproducible
source setup.sh # Set up the new environment
source clean.sh # Clean the existing environment
source data/train_val/download_train_val.sh
This script downloads the training and validation splits into the data/train_val/
directory.
Note: aria2c is used to speed up the download and may not be installed on all systems. To install it: sudo apt-get install aria2
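Before starting a long download, it can help to confirm that the external tools this README relies on are on the PATH. The following is a minimal sketch, not part of the repository; the helper name missing_tools is hypothetical, and the tool list simply mirrors the two programs mentioned in this README (aria2c here, minimap2 for evaluation).

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # aria2c is used by the download scripts, minimap2 by the evaluation step.
    missing = missing_tools(["aria2c", "minimap2"])
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```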
In preparation to run the evaluation script, download the test fast5 files as well:
source data/test/download_test.sh
To train the model from scratch, run:
python -m utils.train [OPTIONS]
- --model: Choose model architecture (melchior or rodan). Default: melchior
- --state_dict: Path to initial state dict. Default: None
- --epochs: Number of training epochs. Default: 10
- --batch_size: Batch size for training. Default: 16
- --lr: Learning rate. Default: 0.001
- --weight_decay: Weight decay for optimizer. Default: 0.1
- --save_path: Path to save model checkpoints. Default: models/{model}
python -m utils.train --model melchior --epochs 10 --batch_size 32 --lr 0.002
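To make the options and their defaults concrete, here is a hedged sketch of how a command line like the one above could be parsed with argparse. It only mirrors the flags and defaults documented in this README; the function names build_parser and parse_args are hypothetical, and the actual utils.train module may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented training options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="utils.train")
    parser.add_argument("--model", choices=["melchior", "rodan"], default="melchior")
    parser.add_argument("--state_dict", type=str, default=None)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--weight_decay", type=float, default=0.1)
    # The documented default depends on --model, so it is resolved after parsing.
    parser.add_argument("--save_path", type=str, default=None)
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and fill in the model-dependent default save path."""
    args = build_parser().parse_args(argv)
    if args.save_path is None:
        args.save_path = f"models/{args.model}"
    return args


if __name__ == "__main__":
    example = ["--model", "melchior", "--epochs", "10", "--batch_size", "32", "--lr", "0.002"]
    print(vars(parse_args(example)))
```

Flags that are not passed keep their documented defaults, so the example command trains with a weight decay of 0.1 and saves checkpoints under models/melchior.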
Weights for the 134 million parameter model are available on HuggingFace.
Download the test set fasta and fast5 files into ./data/test/:
./download_test.sh
Note: minimap2 is required and may not be available on all systems. To install it: sudo apt install minimap2
Then run the evaluation script to basecall the .fast5 files, align the basecalled sequences to the reference transcriptomes, and calculate the accuracy:
eval/run_tests.sh
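The accuracy step can be illustrated with a small sketch. It assumes that alignments are written in minimap2's PAF format, where column 10 is the number of matching bases and column 11 is the alignment block length (matches, mismatches, and gaps), and it defines per-read accuracy as their ratio. This is one common definition, not necessarily the one eval/run_tests.sh uses; the function names and the example record are hypothetical.

```python
import statistics


def paf_identity(paf_line: str) -> float:
    """Return matches / alignment block length for one PAF record."""
    fields = paf_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("a PAF record has at least 12 tab-separated columns")
    matches = int(fields[9])     # column 10: number of matching bases
    block_len = int(fields[10])  # column 11: alignment block length, gaps included
    if block_len == 0:
        raise ValueError("alignment block length is zero")
    return matches / block_len


def median_identity(paf_lines) -> float:
    """Median per-read identity over an iterable of PAF records (blank lines skipped)."""
    return statistics.median(paf_identity(line) for line in paf_lines if line.strip())


if __name__ == "__main__":
    # Made-up record for illustration: 950 matches over a 1010-column alignment block.
    record = "read1\t1000\t0\t1000\t+\ttx1\t1500\t200\t1210\t950\t1010\t60"
    print(f"identity: {paf_identity(record):.4f}")
```

The sketch treats every record as one read. In practice, secondary and supplementary alignments would likely need to be filtered out first so that each read is counted once.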
Important: To benchmark against ONT's proprietary basecallers, you need to download them first.
For Guppy:
./basecallers/download_guppy.sh [-g|-c]
-g: Download GPU version
-c: Download CPU version
Run these scripts before proceeding with the evaluation.
We welcome contributions. Please see our Contributing Guidelines for more details.
Melchior is released under the MIT License.
If you use Melchior in your research, please cite:
@article{melchior2025,
title={Melchior: A Hybrid Mamba-Transformer RNA Basecaller},
author={Litman, Elon},
journal={arXiv preprint arXiv:2025.xxxxx},
year={2025}
}