GitHub - Ucas-HaoranWei/Slow-Perception: Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step

Slow Perception: Let's Perceive Geometric Figures Step-by-step

Haoran Wei*, Youyang Yin*, Yumeng Li, Jia Wang, Liang Zhao, Jianjian Sun, Zheng Ge, Xiangyu Zhang

Accurate copying is the first step to visual o1!

Release

[2024/12/31]🔥🔥🔥 The paper is available on arXiv.
[2024/12/24]🔥🔥🔥 We release Slow Perception! The paper can be found here temporarily; we will submit it to arXiv after completing the appendix.

Install

The codebase is built on GOT-OCR2.0. If you have already installed the GOT environment, you can reuse that conda environment.
Clone this repository and navigate to the Slow-Perception-master folder.

git clone https://github.com/Ucas-HaoranWei/Slow-Perception.git
cd Slow-Perception/Slow-Perception-master

Install Package

conda create -n sp python=3.10 -y
conda activate sp
pip install -e .

Install Flash-Attention

pip install ninja
pip install flash-attn --no-build-isolation

Weights

Google Drive

Download SP-1/weights.zip to Slow-Perception-master and unzip it:

unzip weights.zip

We provide weights for the baseline and for the 4-length perceptual ruler.

Data-prepare

Google Drive

For training, download SP-1/train_sp1.zip and all SP-1/*.json files to Slow-Perception-master.

unzip train_sp1.zip

For evaluation, download SP-1/benchmarks.zip to Slow-Perception-master.

unzip benchmarks.zip

Note: The folder hierarchy is as follows:

  --Slow-Perception-master
      --SP-1
      --SP  
      --...
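
It can help to confirm that the unzipped files landed where the commands in this README expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the list of expected paths is only inferred from the eval commands in this README (`SP/demo/run_jihe_parsing.py`, `SP-1/weights/4ruler/`, `SP-1/benchmarks/val_set/`).

```python
from pathlib import Path

# Paths inferred from the eval commands in this README (an assumption);
# adjust the list if your layout differs.
EXPECTED = [
    "SP/demo/run_jihe_parsing.py",
    "SP-1/weights/4ruler",
    "SP-1/benchmarks/val_set",
]


def missing_paths(root, expected=EXPECTED):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from inside Slow-Perception-master.
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

If the script prints nothing, every listed path was found.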

Eval

python3 SP/demo/run_jihe_parsing.py  --model-name SP-1/weights/4ruler/  --image-file SP-1/benchmarks/val_set/

python3 calculate_f1.py
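
For reference, F1 is the harmonic mean of precision and recall. The sketch below only illustrates that formula. It is not the repository's `calculate_f1.py`, and how predictions are matched to ground truth (what counts as a true positive) is left to the caller as an assumption. The function name `f1_from_counts` is hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from true-positive, false-positive,
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```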

If you want to input a single image:

python3 SP/demo/run_jihe_parsing.py  --model-name SP-1/weights/4ruler/  --image-file results/jihe_demo.jpg

Train

Download the GOT weights.

deepspeed     SP/train/train_SP.py \
 --deepspeed   zero_config/zero2.json \
 --model_name_or_path /GOT_weights/   \
 --freeze_vision_tower False \
 --freeze_lm_model False  \
 --vision_select_layer -2 \
 --use_im_start_end True   \
 --fp16 True   \
 --gradient_accumulation_steps 2    \
 --evaluation_strategy "no"   \
 --save_strategy "steps"  \
 --save_steps 1000   \
 --save_total_limit 1   \
 --weight_decay 0.    \
 --warmup_ratio 0.003     \
 --lr_scheduler_type "cosine"    \
 --logging_steps 1    \
 --tf32 True     \
 --model_max_length 4096    \
 --gradient_checkpointing True   \
 --dataloader_num_workers 8    \
 --report_to none  \
 --per_device_train_batch_size 2    \
 --num_train_epochs 2  \
 --learning_rate 3e-5   \
 --datasets  SP-1 \
 --output_dir jihe_sp_4ruler/
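
With data-parallel training, the number of samples contributing to each optimizer step is the per-device batch size times the gradient accumulation steps times the number of GPUs. A small sketch of that arithmetic for the flags above: the GPU count is an assumption, since the command does not specify it, and `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_gpus):
    """Samples per optimizer step under plain data parallelism."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus


# 2 and 2 come from the training command; 8 GPUs is an assumed example.
print(effective_batch_size(2, 2, 8))  # 32
```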

Contact

Don't hesitate to contact me by email, weihaoran18@mails.ucas.ac.cn, if you have any questions.

Acknowledgement

GOT-OCR2.0: the codebase we built upon!

Citation

@article{wei2024slow,
  title={Slow Perception: Let's Perceive Geometric Figures Step-by-step},
  author={Wei, Haoran and Yin, Youyang and Li, Yumeng and Wang, Jia and Zhao, Liang and Sun, Jianjian and Ge, Zheng and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2412.20631},
  year={2024}
}
@article{wei2024general,
  title={General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model},
  author={Wei, Haoran and Liu, Chenglong and Chen, Jinyue and Wang, Jia and Kong, Lingyu and Xu, Yanming and Ge, Zheng and Zhao, Liang and Sun, Jianjian and Peng, Yuang and others},
  journal={arXiv preprint arXiv:2409.01704},
  year={2024}
}
