This repository contains scripts for converting pretrained Hugging Face models to OpenVINO Intermediate Representation (IR), applying XAMBA techniques to state-space models (SSMs) such as Mamba and Mamba-2, and benchmarking execution latency with OpenVINO.
- convert.py: Converts pretrained Hugging Face models to OpenVINO IR format.
- xamba.py: Implements the CumBA, ReduBA, and ActiBA techniques on the Mamba-2 model.
- benchmark.py: Evaluates execution latency using OpenVINO's `benchmark_app`.
To use this repository, you need to install the following dependencies:
- Python 3.6 or higher
- PyTorch
- OpenVINO
- Hugging Face Transformers library
`pip install torch openvino transformers`
Run the `convert.py` script to convert a pretrained Hugging Face model into OpenVINO Intermediate Representation (IR). Before converting, locally replace the Mamba-2 model implementation inside the `transformers` library with `xamba.py`.
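The snippet below is one way to locate the file that `xamba.py` should replace. The `models/mamba2/modeling_mamba2.py` path is an assumption about the layout of recent `transformers` releases, so verify it against your installed version:

```python
import importlib.util
import pathlib


def mamba2_source_path():
    """Return the path of the installed Mamba-2 implementation, if any.

    The models/mamba2/modeling_mamba2.py location is an assumption about
    recent transformers releases; adjust it if your version differs.
    """
    spec = importlib.util.find_spec("transformers")
    if spec is None or spec.origin is None:
        return None  # transformers is not installed
    pkg_root = pathlib.Path(spec.origin).parent
    return pkg_root / "models" / "mamba2" / "modeling_mamba2.py"


print(mamba2_source_path())
```

Back up the original file before overwriting it, so the stock Mamba-2 implementation can be restored later.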
Example:

`python convert.py`
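The conversion step typically follows this general shape (a minimal sketch, not the actual contents of `convert.py`; the model id and output path are illustrative assumptions):

```python
def convert_to_ir(model_id: str, out_xml: str) -> None:
    """Export a pretrained Hugging Face model to OpenVINO IR (.xml + .bin)."""
    # Heavy dependencies are imported lazily so the sketch stays importable.
    import openvino as ov
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    # A small example input lets OpenVINO trace the PyTorch model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    example = dict(tokenizer("Hello", return_tensors="pt"))

    with torch.no_grad():
        ov_model = ov.convert_model(model, example_input=example)
    ov.save_model(ov_model, out_xml)  # writes out_xml plus a .bin weights file


# Illustrative checkpoint name; substitute the model you want to convert.
# Not executed here because it downloads the checkpoint:
# convert_to_ir("state-spaces/mamba2-130m", "mamba2.xml")
```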
To benchmark the execution latency of a model with OpenVINO, run the `benchmark.py` script:

`python benchmark.py`
This will evaluate the inference latency on the specified device.
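Under the hood, benchmarking reduces to invoking `benchmark_app` on the IR file. A minimal sketch of assembling that command (the model path, device, and iteration count are illustrative; `benchmark.py` itself may pass different flags):

```python
import shlex


def build_benchmark_cmd(model_xml: str, device: str = "CPU", niter: int = 100):
    """Assemble an OpenVINO benchmark_app command line."""
    return [
        "benchmark_app",
        "-m", model_xml,       # path to the IR .xml file
        "-d", device,          # target device, e.g. CPU or GPU
        "-niter", str(niter),  # number of inference iterations
    ]


cmd = build_benchmark_cmd("mamba2.xml")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```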
    ├── convert.py    # Convert HF models to OpenVINO IR
    ├── xamba.py      # XAMBA techniques on the Mamba-2 model
    ├── benchmark.py  # Evaluate execution latency using OpenVINO benchmark_app
    └── README.md     # This README file