GitHub - Ucas-HaoranWei/Vary-tiny-600k: Vary-tiny codebase built upon LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pages, English and Chinese)

Background

The Hugging Face version of Vary-tiny has potential issues that make the loss hard to converge when training for multiple epochs.
Many people have also asked about the training data of Vary.

Release

[2024/9/03] 🔥🔥🔥 We release a very strong and comprehensive OCR model GOT-OCR2.0.
[2024/4/21] 🔥🔥🔥 For OneChart, we have released the web demo in Project Page. Have fun!!
[2024/4/21] 🔥🔥🔥 We present a Vary-tiny LAVIS codebase and the Vary-600k dataset !!!

Install

Clone this repository and navigate to the LAVIS-main folder inside it

git clone https://github.com/Ucas-HaoranWei/Vary-tiny-600k.git
cd Vary-tiny-600k/LAVIS-main

Install Package

pip install -e .

Prepare Pretrained Weights and Data
- Download the OPT-125M weights here and the SAM-b weights here.
- Download Vary-600k here, with code "vary".
- Prepare the directories as follows:

Train

python -m torch.distributed.run --nproc_per_node=8 --master_port=29501 train.py --cfg-path lavis/projects/varytiny/train/pretrain.yaml

Or, for multi-machine training:

python -m torch.distributed.run --master_addr xxx --master_port xxx --node_rank xxx --nnodes xxx --nproc_per_node xxx  train.py --cfg-path lavis/projects/varytiny/train/pretrain.yaml
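
The two launch commands above differ only in the rendezvous flags, so it can help to generate them from one place. The following is a minimal sketch, not part of the repository: `build_launch_command` is a hypothetical helper, and the address `10.0.0.1` is a placeholder. It only assembles the flags already shown above (`--master_addr`, `--master_port`, `--node_rank`, `--nnodes`, `--nproc_per_node`, `--cfg-path`).

```python
import shlex
from typing import List

CFG_PATH = "lavis/projects/varytiny/train/pretrain.yaml"  # config path from the README


def build_launch_command(
    nproc_per_node: int,
    master_port: int,
    cfg_path: str = CFG_PATH,
    nnodes: int = 1,
    node_rank: int = 0,
    master_addr: str = "",
) -> List[str]:
    """Assemble the torch.distributed.run command as an argument list.

    Hypothetical helper: it only mirrors the flags used in the README commands.
    """
    cmd = ["python", "-m", "torch.distributed.run"]
    if nnodes > 1:
        # Multi-machine runs need the address of the rank-0 node and this node's rank.
        if not master_addr:
            raise ValueError("master_addr is required when nnodes > 1")
        cmd += [
            "--master_addr", master_addr,
            "--node_rank", str(node_rank),
            "--nnodes", str(nnodes),
        ]
    cmd += [
        "--nproc_per_node", str(nproc_per_node),
        "--master_port", str(master_port),
        "train.py",
        "--cfg-path", cfg_path,
    ]
    return cmd


# Single machine with 8 GPUs, as in the first command above.
print(shlex.join(build_launch_command(nproc_per_node=8, master_port=29501)))

# Second of two machines; the address is a placeholder, not a real host.
print(shlex.join(build_launch_command(
    nproc_per_node=8, master_port=29501,
    nnodes=2, node_rank=1, master_addr="10.0.0.1",
)))
```

Run the printed command once per machine, changing only `node_rank`; every node must use the same `master_addr` and `master_port`.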

If training goes smoothly, your loss at the end of each epoch should be similar to the following (2×8 H800):

Demo

Change the "pretrained" and "finetuned" paths to point to your checkpoints in "LAVIS-main/lavis/configs/models/varytiny/varytiny_inference.yaml", such as:

python tests/models/test_varytiny.py  --image-file  xxx.jpg

We also provide the weights of Vary-tiny trained from scratch on Vary-600k: Vary-tiny-600k.pth. Code: "Vary". You can use them to run inference directly.

Vary-600k

Vary-600k is a PDF image-text pair dataset with about 300k English and 300k Chinese pages.
The text is extracted using Fitz (PyMuPDF). A BERT model is used to merge sentences within paragraphs. Paragraphs are separated by "<lb>". We do not use "\n" as the separator because this codebase uses "\n" as the "EOS" token of OPT-125M.
You can use Vary-600k for pretraining, warm-up, and so on.
Note that Vary-600k is only a subset of the pretraining data used in the original Vary.
Download Vary-600k here. Code: "Vary"
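
To illustrate the "<lb>" convention described above, here is a minimal sketch of splitting a page's text into paragraphs and joining it back without any "\n". The function names and the sample string are hypothetical and not part of the codebase; the on-disk annotation format of Vary-600k is not assumed here, only the separator.

```python
from typing import List

PARAGRAPH_SEP = "<lb>"  # paragraph separator used in Vary-600k text, per the README


def split_paragraphs(text: str) -> List[str]:
    """Split one page's text into paragraphs, dropping empty pieces."""
    return [p.strip() for p in text.split(PARAGRAPH_SEP) if p.strip()]


def join_paragraphs(paragraphs: List[str]) -> str:
    """Join paragraphs into one target string that contains no newline.

    Whitespace inside each paragraph (including any stray newline) is
    collapsed to single spaces, since "\n" is reserved as EOS in this codebase.
    """
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return PARAGRAPH_SEP.join(cleaned)


# Hypothetical sample text, not taken from the dataset.
page_text = "First paragraph of the page.<lb>Second paragraph.<lb>Third."
paragraphs = split_paragraphs(page_text)
print(paragraphs)
print(join_paragraphs(paragraphs))
```

If you need human-readable output at inference time, replacing "<lb>" with a newline after decoding is one option.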

Acknowledgement

LAVIS: the codebase we built upon!

Citation

If you find our work useful in your research, please consider citing Vary:

@article{wei2023vary,
  title={Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models},
  author={Wei, Haoran and Kong, Lingyu and Chen, Jinyue and Zhao, Liang and Ge, Zheng and Yang, Jinrong and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2312.06109},
  year={2023}
}

@article{wei2024small,
  title={Small Language Model Meets with Reinforced Vision Vocabulary},
  author={Wei, Haoran and Kong, Lingyu and Chen, Jinyue and Zhao, Liang and Ge, Zheng and Yu, En and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2401.12503},
  year={2024}
}
