Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step

Ucas-HaoranWei/Slow-Perception


Haoran Wei*, Youyang Yin*, Yumeng Li, Jia Wang, Liang Zhao, Jianjian Sun, Zheng Ge, Xiangyu Zhang

     Accurate copying is the first step to visual o1!

Release

  • [2024/12/31]🔥🔥🔥 The paper is now available on arXiv.
  • [2024/12/24]🔥🔥🔥 We release Slow Perception! The paper can be found here temporarily, and we will submit it to arXiv after we complete the appendix.

Install

  1. The codebase is built on GOT-OCR2.0. If you have already installed the GOT environment, you can reuse that conda environment.
  2. Clone this repository and navigate to the Slow-Perception-master folder
git clone https://github.com/Ucas-HaoranWei/Slow-Perception.git Slow-Perception-master
cd 'Slow-Perception-master'
  3. Install the package
conda create -n sp python=3.10 -y
conda activate sp
pip install -e .
  4. Install Flash-Attention
pip install ninja
pip install flash-attn --no-build-isolation
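
After the install steps above, a quick check that the key packages resolve can save a failed run later. The sketch below is not part of this repository: the helper name `missing_packages` is hypothetical, and the module list (`torch`, `flash_attn`, `deepspeed`) is an assumption inferred from the commands in this README.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; the flash-attn package is imported as `flash_attn`.
    required = ["torch", "flash_attn", "deepspeed"]
    missing = missing_packages(required)
    print("missing:", missing if missing else "none")
```

An empty result suggests the `sp` environment is usable; any listed name points to the install step that needs to be repeated.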

Weights

  1. Download the SP-1/weights.zip to Slow-Perception-master
unzip weights.zip
  2. We provide both the baseline weights and the 4-length perceptual ruler weights.

Data-prepare

  1. Download SP-1/train_sp1.zip and all SP-1/*.json files to Slow-Perception-master for training.
unzip train_sp1.zip
  2. Download SP-1/benchmarks.zip to Slow-Perception-master for evaluation.
unzip benchmarks.zip

Note: The folder hierarchy is as follows:

  --Slow-Perception-master
      --SP-1
      --SP  
      --...
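
The layout above can be verified before running evaluation or training. This is a minimal sketch, not a script shipped with the repository: the function name `missing_dirs` is hypothetical, and it only checks the two folders named in the hierarchy (`SP-1` and `SP`).

```python
from pathlib import Path


def missing_dirs(root, expected=("SP-1", "SP")):
    """Return the expected sub-folders that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from inside Slow-Perception-master; an empty list means both folders exist.
    print(missing_dirs("."))
```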

Eval

python3 SP/demo/run_jihe_parsing.py  --model-name SP-1/weights/4ruler/  --image-file SP-1/benchmarks/val_set/
python3 calculate_f1.py

If you want to input a single image:

python3 SP/demo/run_jihe_parsing.py  --model-name SP-1/weights/4ruler/  --image-file results/jihe_demo.jpg
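
This README does not describe how `calculate_f1.py` matches predictions to ground truth, so the sketch below only recalls the F1 metric itself and is not a reproduction of that script. The function name and the count-based inputs are assumptions made for illustration.

```python
def f1_score(true_positives, predicted_total, ground_truth_total):
    """Compute F1 from counts; returns 0.0 when precision and recall are both zero."""
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / ground_truth_total if ground_truth_total else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 8 correct items out of 10 predicted and 10 in the ground truth.
    print(f1_score(8, 10, 10))
```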

Train

  1. Download the GOT weights, then launch training with --model_name_or_path pointing to them:
deepspeed     SP/train/train_SP.py \
 --deepspeed   zero_config/zero2.json \
 --model_name_or_path /GOT_weights/   \
 --freeze_vision_tower False \
 --freeze_lm_model False  \
 --vision_select_layer -2 \
 --use_im_start_end True   \
 --fp16 True   \
 --gradient_accumulation_steps 2    \
 --evaluation_strategy "no"   \
 --save_strategy "steps"  \
 --save_steps 1000   \
 --save_total_limit 1   \
 --weight_decay 0.    \
 --warmup_ratio 0.003     \
 --lr_scheduler_type "cosine"    \
 --logging_steps 1    \
 --tf32 True     \
 --model_max_length 4096    \
 --gradient_checkpointing True   \
 --dataloader_num_workers 8    \
 --report_to none  \
 --per_device_train_batch_size 2    \
 --num_train_epochs 2  \
 --learning_rate 3e-5   \
 --datasets  SP-1 \
 --output_dir jihe_sp_4ruler/
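
When adapting the command above to a different number of GPUs, it helps to keep track of the effective batch size, which is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below uses the two values from the command; the GPU count is an assumption, since this README does not state the hardware used.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus):
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus


if __name__ == "__main__":
    # 2 and 2 come from the training command; 8 GPUs is a hypothetical setup.
    print(effective_batch_size(per_device_batch=2, grad_accum_steps=2, num_gpus=8))
```

If the GPU count changes, adjusting `--gradient_accumulation_steps` is one way to keep this product constant.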

Contact

Don't hesitate to contact me by email, weihaoran18@mails.ucas.ac.cn, if you have any questions.

Acknowledgement

Citation

@article{wei2024slow,
  title={Slow Perception: Let's Perceive Geometric Figures Step-by-step},
  author={Wei, Haoran and Yin, Youyang and Li, Yumeng and Wang, Jia and Zhao, Liang and Sun, Jianjian and Ge, Zheng and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2412.20631},
  year={2024}
}
@article{wei2024general,
  title={General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model},
  author={Wei, Haoran and Liu, Chenglong and Chen, Jinyue and Wang, Jia and Kong, Lingyu and Xu, Yanming and Ge, Zheng and Zhao, Liang and Sun, Jianjian and Peng, Yuang and others},
  journal={arXiv preprint arXiv:2409.01704},
  year={2024}
}

