
update for moirai-moe (#139)
Co-authored-by: Chenghao Liu <74166079+chenghaoliu89@users.noreply.github.com>
liuxu77 and chenghaoliu89 authored Nov 1, 2024
1 parent 959622c commit bdebc12
Showing 15 changed files with 668 additions and 49 deletions.
34 changes: 21 additions & 13 deletions README.md
@@ -1,19 +1,20 @@
# Unified Training of Universal Time Series Forecasting Transformers
[Paper](https://arxiv.org/abs/2402.02592) | [Blog Post](https://blog.salesforceairesearch.com/moirai/)
# Unified Training of Universal Time Series Transformers

Uni2TS is a PyTorch based library for research and applications related to Time Series Transformers.
This library aims to provide a unified solution to large-scale pre-training of Universal Time Series Transformers.
Uni2TS also provides tools for fine-tuning, inference, and evaluation for time series forecasting.
Uni2TS is a PyTorch-based library for research and applications related to Time Series Forecasting. It provides a unified framework for large-scale pre-training, fine-tuning, inference, and evaluation of Universal Time Series Transformers.

Related reading: [Moirai Paper](https://arxiv.org/abs/2402.02592), [Moirai Salesforce Blog](https://blog.salesforceairesearch.com/moirai/), [Moirai-MoE Paper](https://arxiv.org/abs/2410.10469), [Moirai-MoE AI Horizon Forecast Blog](https://aihorizonforecast.substack.com/p/moirai-moe-upgrading-moirai-with), [Moirai-MoE Jiqizhixin Blog](https://mp.weixin.qq.com/s/LQvlgxx9vU965Yzy6RuBfQ).

## 🎉 What's New

* Oct 2024: A new model Moirai-MoE! The preprint is now available on [arXiv](https://arxiv.org/abs/2410.10469). Model weights to be released soon.
* Oct 2024: A new model, Moirai-MoE! The preprint is available on [arXiv](https://arxiv.org/abs/2410.10469), along with model weights for [small](https://huggingface.co/Salesforce/moirai-moe-1.0-R-small) and [base](https://huggingface.co/Salesforce/moirai-moe-1.0-R-base), and a [simple example](https://github.com/SalesforceAIResearch/uni2ts/project/moirai-moe-1) to get started.

* Sep 2024: Released [Evaluation Code](https://github.com/SalesforceAIResearch/uni2ts/tree/main/project/benchmarks) of [TimesFM](https://arxiv.org/abs/2310.10688), [Chronos](https://arxiv.org/abs/2403.07815) and [VisionTS](https://arxiv.org/abs/2408.17253) on Monash, LSF and PF benchmarks.

* Jun 2024: Released Moirai-1.1-R model weights in [small](https://huggingface.co/Salesforce/moirai-1.1-R-small), [base](https://huggingface.co/Salesforce/moirai-1.1-R-base), and [large](https://huggingface.co/Salesforce/moirai-1.1-R-large).

* May 2024: The Uni2TS paper has been accepted to ICML 2024 as an Oral presentation!
* May 2024: The [Moirai Paper](https://arxiv.org/abs/2402.02592) has been accepted to ICML 2024 as an Oral presentation!

* Mar 2024: Release of Uni2TS library, along with [Moirai-1.0-R](https://huggingface.co/collections/Salesforce/moirai-10-r-models-65c8d3a94c51428c300e0742) and [LOTSA data](https://huggingface.co/datasets/Salesforce/lotsa_data/)!
* Mar 2024: Release of Uni2TS library, along with [Moirai Paper](https://arxiv.org/abs/2402.02592), [Moirai-1.0-R Models](https://huggingface.co/collections/Salesforce/moirai-10-r-models-65c8d3a94c51428c300e0742), and [LOTSA Data](https://huggingface.co/datasets/Salesforce/lotsa_data/).

## ✅ TODO

@@ -230,15 +231,22 @@ python -m cli.train \
data=lotsa_v1_unweighted
```

## 👀 Citing Uni2TS
## 👀 Citation

If you're using Uni2TS in your research or applications, please cite it using this BibTeX:
If you're using this repository in your research or applications, please cite using the following BibTeX:

```bibtex
@article{woo2024unified,
@article{liu2024moiraimoe,
title={Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts},
author={Liu, Xu and Liu, Juncheng and Woo, Gerald and Aksu, Taha and Liang, Yuxuan and Zimmermann, Roger and Liu, Chenghao and Savarese, Silvio and Xiong, Caiming and Sahoo, Doyen},
journal={arXiv preprint arXiv:2410.10469},
year={2024}
}

@inproceedings{woo2024unified,
title={Unified Training of Universal Time Series Forecasting Transformers},
author={Woo, Gerald and Liu, Chenghao and Kumar, Akshat and Xiong, Caiming and Savarese, Silvio and Sahoo, Doyen},
journal={arXiv preprint arXiv:2402.02592},
booktitle={Forty-first International Conference on Machine Learning},
year={2024}
}
```
8 changes: 8 additions & 0 deletions cli/conf/eval/model/moirai_moe_1.0_R_base.yaml
@@ -0,0 +1,8 @@
_target_: uni2ts.model.moirai.MoiraiForecast
module:
  _target_: uni2ts.model.moirai.MoiraiMoEModule.from_pretrained
  pretrained_model_name_or_path: Salesforce/moirai-moe-1.0-R-base
mode: autoregressive
num_samples: 100
patch_size: 16
context_length: ???
8 changes: 8 additions & 0 deletions cli/conf/eval/model/moirai_moe_1.0_R_small.yaml
@@ -0,0 +1,8 @@
_target_: uni2ts.model.moirai.MoiraiForecast
module:
  _target_: uni2ts.model.moirai.MoiraiMoEModule.from_pretrained
  pretrained_model_name_or_path: Salesforce/moirai-moe-1.0-R-small
mode: autoregressive
num_samples: 100
patch_size: 16
context_length: ???
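
Both configs leave `context_length` as a required Hydra override (`???`), so it must be supplied at evaluation time. A minimal, hypothetical invocation of the repository's Hydra-based `cli.eval` entry point might look like the sketch below; the `run_name` and `data` values are placeholders, not options defined by this commit.

```bash
# Hypothetical sketch only: evaluate Moirai-MoE-Small with the config above.
# run_name and data are placeholders; context_length must be set explicitly
# because the config leaves it as a required override (???).
python -m cli.eval \
  run_name=moirai_moe_small_eval \
  model=moirai_moe_1.0_R_small \
  model.context_length=1000 \
  data=<your_eval_data_config>
```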
99 changes: 99 additions & 0 deletions project/moirai-moe-1/README.md
@@ -0,0 +1,99 @@
# Moirai-MoE-1.0-R

Our paper [Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts](https://arxiv.org/abs/2410.10469) introduces the first mixture-of-experts time series foundation model.

The figure below highlights the main difference between Moirai-MoE and Moirai. Whereas Moirai models time series of different frequencies with multiple heuristically defined input/output projection layers, Moirai-MoE uses a single input/output projection layer and delegates the task of capturing diverse time series patterns to sparse mixture-of-experts Transformer layers. With this design, specialization in Moirai-MoE is achieved in a data-driven manner and operates at the token level.

<p align="center">
<img src="./img/framework.png" height="200" alt="" align=center />
</p>
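
To make the token-level routing idea concrete, here is a small, self-contained sketch of a sparse mixture-of-experts feed-forward layer with top-k gating. It is an illustration only, not the uni2ts implementation; the class name `SparseMoEFeedForward` and all hyperparameters are hypothetical.

```python
# Illustrative sketch only (not the uni2ts implementation): a token-level
# sparse mixture-of-experts feed-forward layer with top-k gating, showing
# the kind of data-driven, per-token specialization described above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router producing per-token expert scores
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq_len, d_model)
        scores = self.gate(x)                             # (batch, seq_len, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # route each token to its top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                   # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out


# Tiny smoke test: 2 series, 32 time-series tokens each, model width 64.
moe = SparseMoEFeedForward(d_model=64, d_ff=128)
y = moe(torch.randn(2, 32, 64))
print(y.shape)  # torch.Size([2, 32, 64])
```

Because each token is routed independently, specialization emerges per token from the data rather than from per-frequency heuristics.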


## Models

The pre-trained weights of Moirai-MoE can be found in the following table.

| Model | # Activated Parameters | # Total Parameters |
| :---: | :---: | :---: |
| [Moirai-MoE-1.0-R-Small](https://huggingface.co/Salesforce/moirai-moe-1.0-R-small) | 11M | 117M |
| [Moirai-MoE-1.0-R-Base](https://huggingface.co/Salesforce/moirai-moe-1.0-R-base) | 86M | 935M |


## Usage

Here is a simple example of how to use a pre-trained Moirai-MoE model to make forecasts.

```python
import matplotlib.pyplot as plt
from gluonts.dataset.repository import dataset_recipes

from uni2ts.eval_util.data import get_gluonts_test_dataset
from uni2ts.eval_util.plot import plot_next_multi
from uni2ts.model.moirai import MoiraiForecast, MoiraiMoEModule

SIZE = "small"  # model size: choose from {'small', 'base'}
CTX = 1000  # context length: any positive integer
BSZ = 32  # batch size: any positive integer

# Load dataset
test_data, metadata = get_gluonts_test_dataset(
    "electricity", prediction_length=None, regenerate=False
)
# Uncomment the below line to find other datasets
# print(sorted(dataset_recipes.keys()))

# Prepare model
model = MoiraiForecast(
    module=MoiraiMoEModule.from_pretrained(
        f"Salesforce/moirai-moe-1.0-R-{SIZE}",
    ),
    mode="autoregressive",
    prediction_length=metadata.prediction_length,
    context_length=CTX,
    patch_size=16,
    num_samples=100,
    target_dim=metadata.target_dim,
    feat_dynamic_real_dim=metadata.feat_dynamic_real_dim,
    past_feat_dynamic_real_dim=metadata.past_feat_dynamic_real_dim,
)

predictor = model.create_predictor(batch_size=BSZ)
forecasts = predictor.predict(test_data.input)

input_it = iter(test_data.input)
label_it = iter(test_data.label)
forecast_it = iter(forecasts)

# Visualize forecasts
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(25, 10))
plot_next_multi(
    axes,
    input_it,
    label_it,
    forecast_it,
    context_length=200,
    intervals=(0.5, 0.9),
    dim=None,
    name="pred",
    show_label=True,
)
```
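
As a hypothetical follow-up (not part of the original example), the forecasts can also be scored against the held-out labels. The snippet below assumes the predictor yields GluonTS `SampleForecast` objects, that each label entry exposes the ground truth under the `"target"` key, and that the series are univariate as in the electricity example; it calls `predict` again because the iterator above was partially consumed by the plotting loop.

```python
import numpy as np

# Hypothetical continuation: mean absolute error of the median forecast.
forecasts = predictor.predict(test_data.input)  # fresh iterator
maes = []
for forecast, label in zip(forecasts, test_data.label):
    median = forecast.quantile(0.5)                        # point forecast from the samples
    target = np.asarray(label["target"])[-len(median):]    # align to the forecast horizon
    maes.append(np.mean(np.abs(median - target)))
print(f"MAE over {len(maes)} series: {np.mean(maes):.4f}")
```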


## Results

Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios.

<p align="center">
<img src="./img/in-dist.png" height="250" alt="" align=center />
</p>

The figure above presents the in-distribution evaluation on 29 datasets from the Monash benchmark, where Moirai-MoE outperforms all competitors.

<p align="center">
<img src="./img/zero-shot.png" height="410" alt="" align=center />
</p>

The table above shows the zero-shot forecasting evaluation on 10 datasets, where Moirai-MoE-Base achieves the best performance.

We will soon release scripts to reproduce the results.
Binary file added project/moirai-moe-1/img/framework.png
Binary file added project/moirai-moe-1/img/in-dist.png
Binary file added project/moirai-moe-1/img/zero-shot.png
12 changes: 12 additions & 0 deletions src/uni2ts/common/torch_util.py
@@ -42,6 +42,18 @@ def packed_attention_mask(
    return attention_mask


def packed_causal_attention_mask(
    sample_id: Int[torch.Tensor, "*batch seq_len"],
    time_id: Int[torch.Tensor, "*batch seq_len"],
) -> Bool[torch.Tensor, "*batch seq_len seq_len"]:
    attention_mask = packed_attention_mask(sample_id)
    expanded_id1 = time_id.unsqueeze(-2)
    expanded_id2 = time_id.unsqueeze(-1)
    compare_res = expanded_id1 <= expanded_id2
    attention_mask = attention_mask * compare_res
    return attention_mask


def mask_fill(
    tensor: Float[torch.Tensor, "*batch dim"],
    mask: Bool[torch.Tensor, "*batch"],
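To illustrate what the new `packed_causal_attention_mask` computes, here is a small standalone example (not part of the diff). The `same_sample` term re-implements what `packed_attention_mask` is assumed to compute, namely a same-sample check over the packed sequence; the causal term then restricts attention to positions whose `time_id` is not later than the query's.

```python
import torch

# Two packed samples in one sequence: tokens 0-2 belong to sample 0,
# tokens 3-4 to sample 1; time ids restart within each sample.
sample_id = torch.tensor([[0, 0, 0, 1, 1]])
time_id = torch.tensor([[0, 1, 2, 0, 1]])

same_sample = sample_id.unsqueeze(-1) == sample_id.unsqueeze(-2)  # assumed role of packed_attention_mask
causal = time_id.unsqueeze(-2) <= time_id.unsqueeze(-1)           # key time <= query time
mask = same_sample & causal
print(mask.int())
# tensor([[[1, 0, 0, 0, 0],
#          [1, 1, 0, 0, 0],
#          [1, 1, 1, 0, 0],
#          [0, 0, 0, 1, 0],
#          [0, 0, 0, 1, 1]]])
```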
9 changes: 8 additions & 1 deletion src/uni2ts/model/moirai/__init__.py
@@ -16,6 +16,13 @@
from .finetune import MoiraiFinetune
from .forecast import MoiraiForecast
from .module import MoiraiModule
from .module_moe import MoiraiMoEModule
from .pretrain import MoiraiPretrain

__all__ = ["MoiraiFinetune", "MoiraiForecast", "MoiraiModule", "MoiraiPretrain"]
__all__ = [
    "MoiraiFinetune",
    "MoiraiForecast",
    "MoiraiModule",
    "MoiraiMoEModule",
    "MoiraiPretrain",
]
