
Melchior

A hybrid Mamba-Transformer model for RNA base calling

License: MIT Python 3.8+ arXiv

Features

  • High-accuracy RNA base calling
  • Efficient state space modeling
  • Open weights, checkpoints, logs
  • 100% reproducible

Quick Start

source setup.sh # Set up the new environment
source clean.sh # Clean the existing environment

Datasets

source data/train_val/download_train_val.sh

This script downloads the training and validation splits into the data/train_val/ directory.

Note

aria2c is used to speed up the download and may not be available on all systems. Install it with: sudo apt-get install aria2

Before running the evaluation script, also download the test fast5 files:

source data/test/download_test.sh

Training

To train the model from scratch, run:

python -m utils.train [OPTIONS]

Options:

  • --model: Choose model architecture (melchior or rodan). Default: melchior
  • --state_dict: Path to initial state dict. Default: None
  • --epochs: Number of training epochs. Default: 10
  • --batch_size: Batch size for training. Default: 16
  • --lr: Learning rate. Default: 0.001
  • --weight_decay: Weight decay for optimizer. Default: 0.1
  • --save_path: Path to save model checkpoints. Default: models/{model}

Example:

python -m utils.train --model melchior --epochs 10 --batch_size 32 --lr 0.002
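The options above map onto a standard argparse-style CLI. A minimal sketch of such a parser is below; the actual utils/train.py may define it differently, and build_parser is a hypothetical name used only for illustration:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented options and defaults (sketch only).
    p = argparse.ArgumentParser(description="Train a basecaller (sketch)")
    p.add_argument("--model", choices=["melchior", "rodan"], default="melchior")
    p.add_argument("--state_dict", default=None, help="Path to initial state dict")
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--batch_size", type=int, default=16)
    p.add_argument("--lr", type=float, default=0.001)
    p.add_argument("--weight_decay", type=float, default=0.1)
    p.add_argument("--save_path", default=None, help="Defaults to models/{model}")
    return p

args = build_parser().parse_args(
    ["--model", "melchior", "--epochs", "10", "--batch_size", "32", "--lr", "0.002"]
)
if args.save_path is None:
    # Documented default: models/{model}
    args.save_path = f"models/{args.model}"
print(args.batch_size, args.lr, args.save_path)  # 32 0.002 models/melchior
```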

Reproducible Evaluation

Weights for the 134-million-parameter model are available on HuggingFace.

Download the test set fasta and fast5 files into ./data/test/ (if you have not already):

source data/test/download_test.sh

Note

minimap2 is required and may not be available on all systems. Install it with: sudo apt install minimap2

Then run the evaluation script to basecall the .fast5 files, align the basecalled sequences to the reference transcriptomes, and calculate the accuracy:

eval/run_tests.sh
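The script's exact accuracy formula is not specified here; a common definition in basecaller benchmarks is matched bases over all alignment columns (matches + mismatches + insertions + deletions). A minimal sketch, assuming that definition (basecall_accuracy is a hypothetical helper, not part of this repository):

```python
def basecall_accuracy(matches: int, mismatches: int,
                      insertions: int, deletions: int) -> float:
    """Accuracy as matched bases over total alignment columns."""
    columns = matches + mismatches + insertions + deletions
    return matches / columns if columns else 0.0

# e.g. 950 matches out of 1000 alignment columns
print(basecall_accuracy(950, 30, 10, 10))  # 0.95
```

The per-category counts can be derived from a minimap2 alignment, e.g. from the CIGAR string of each aligned read.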

Important

To benchmark against ONT's proprietary basecallers, you need to download them first:

For Guppy:

./basecallers/download_guppy.sh [-g|-c]
  -g: Download GPU version
  -c: Download CPU version

Run these scripts before proceeding with the evaluation.

Contributing

We welcome contributions. Please see our Contributing Guidelines for more details.

License

Melchior is released under the MIT License.

Citation

If you use Melchior in your research, please cite:

@article{melchior2025,
  title={Melchior: A Hybrid Mamba-Transformer RNA Basecaller},
  author={Litman, Elon},
  journal={arXiv preprint arXiv:2025.xxxxx},
  year={2025}
}
