Project completed on 21/03/2023
This project is inspired by MC-OCR; more information about the competition is available here: Link
The dataset contains more than 1,500 receipts, combining those from the competition with ones we collected ourselves.
Receipt Information Extraction (RIE) is a task that involves extracting structured data from unstructured receipts. The goal is to identify and extract key information such as date, time, total amount, tax amount, items purchased, etc. from receipts in various formats and languages. This task can be useful for applications such as expense management, accounting, fraud detection and analytics.
Download the project
- Click here: RIE
- Unzip the main.zip file
- Navigate into the project folder:
cd path-to-the-project-folder
Clone with git: (NOT RECOMMENDED)
# very slow download speed
git clone https://github.com/HT0710/Receipt-Information-Extraction
cd Receipt-Information-Extraction
Using wget on Linux:
# faster download speed
wget https://github.com/HT0710/Receipt-Information-Extraction/archive/refs/heads/main.zip
unzip main.zip
cd Receipt-Information-Extraction-main
Python 3.8
Using Conda:
# Installation: https://docs.conda.io/en/latest/miniconda.html
conda create -n rie python=3.8
conda activate rie
- rembg = 2.0.30
- torch = 1.11
- torchvision = 0.12
- opencv-python = 4.7.0.72
- scikit-learn = 1.2.1
- scikit-image = 0.19.3
- scipy = 1.9.3
- imutils = 0.5.4
- PyYAML = 6.0
- einops = 0.6.0
- gdown = 4.6.4
pip install -r requirements.txt
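After installing, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the project: `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names, and the comparison is a simple prefix match (so a pin of 1.11 accepts 1.11.0), which is an assumption about how strict the pins need to be.

```python
from importlib import metadata

# Pins copied from the dependency list above.
PINNED = {
    "rembg": "2.0.30",
    "torch": "1.11",
    "torchvision": "0.12",
    "opencv-python": "4.7.0.72",
    "scikit-learn": "1.2.1",
    "scikit-image": "0.19.3",
    "scipy": "1.9.3",
    "imutils": "0.5.4",
    "PyYAML": "6.0",
    "einops": "0.6.0",
    "gdown": "4.6.4",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each package whose installed version does not start with its pin
    to the version that was found (None when it is not installed)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found is None or not found.startswith(wanted):
            mismatches[package] = found
    return mismatches


if __name__ == "__main__":
    for package, found in find_mismatches().items():
        print(f"{package}: expected {PINNED[package]}, found {found}")
```

An empty result from `find_mismatches()` means every listed package is installed at a version that starts with its pin.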
Note: CUDA is required if you want to run on a GPU. You can follow my instructions here
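A quick way to see whether PyTorch can reach a CUDA device is sketched below. `pick_device` is a hypothetical helper, not something the project provides; it relies only on `torch.cuda.is_available()` and falls back to the CPU when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device())
```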
Modify the configuration in config.yaml, then run:
python run.py
With CLI:
python run.py -h
- -i: Image path or Folder path
- -o: Output folder path
- -g: GPU to use | 0 for CPU | -1 for all GPUs (Default: -1)
- -mp: Maximum number of CPUs to use | -1 for 80% of your CPUs (Default: -1)
Caution: Using 100% of your CPU may crash your system!
Example:
python run.py -i data/test/test_1.jpg -o output -g 0 -mp 10
More configuration options are available in config.yaml
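To make the flag semantics concrete, here is a minimal argparse sketch of the options listed above. It is an illustration and not the code in run.py: `build_parser` and `resolve_workers` are hypothetical names, and rounding the 80% figure down (with a floor of one) is an assumption.

```python
import argparse
import os


def build_parser():
    """Build a parser that mirrors the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Sketch of the run.py flags")
    parser.add_argument("-i", dest="input", help="Image path or folder path")
    parser.add_argument("-o", dest="output", help="Output folder path")
    parser.add_argument("-g", dest="gpu", type=int, default=-1,
                        help="GPU to use | 0 for CPU | -1 for all")
    parser.add_argument("-mp", dest="max_procs", type=int, default=-1,
                        help="Maximum number of CPUs | -1 for 80%%")
    return parser


def resolve_workers(max_procs, cpu_count=None):
    """Turn the -mp value into a concrete count; -1 means 80% of the CPUs."""
    cpu_count = cpu_count or os.cpu_count() or 1
    if max_procs == -1:
        # Assumption: round down, but never go below one worker.
        return max(1, int(cpu_count * 0.8))
    if max_procs < 1:
        raise ValueError("-mp must be -1 or a positive integer")
    return max_procs


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.output, args.gpu, resolve_workers(args.max_procs))
```

Under these assumptions, a machine reporting 10 CPUs would resolve the default `-mp -1` to 8.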
- Remove the image background
- Input and output folders can be modified in background_remove.py
- Note: only runs with folder input
Execute:
python background_remove.py
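As a rough illustration of folder-based background removal, the sketch below walks an input folder and writes one PNG per image. It is not the content of background_remove.py: `IMAGE_SUFFIXES`, `process_folder`, and `remove_backgrounds` are hypothetical names, and the list of accepted extensions is an assumption. The only library call is `rembg.remove`, which accepts image bytes and returns PNG bytes.

```python
from pathlib import Path

# Assumption: which file extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def process_folder(input_dir, output_dir, transform):
    """Apply transform (bytes -> bytes) to every image file in input_dir and
    write each result to output_dir as a .png file. Returns the written paths."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.iterdir()):
        if not source.is_file() or source.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        target = output_dir / (source.stem + ".png")
        target.write_bytes(transform(source.read_bytes()))
        written.append(target)
    return written


def remove_backgrounds(input_dir, output_dir):
    """Run rembg over a whole folder; rembg is imported lazily because it is heavy."""
    from rembg import remove
    return process_folder(input_dir, output_dir, remove)
```

Passing the transform as a parameter keeps the folder loop testable without loading the rembg model.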
- Rotate horizontal or upside-down receipts upright and straighten skewed ones
- Input and output folders can be modified in rotate.py
- Note: only runs with folder input
Execute:
python rotate.py
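The two coarse rotations can be sketched with NumPy alone, as below. This is an assumption about what the step involves and is not the logic in rotate.py: `to_portrait` and `flip_upside_down` are hypothetical helpers, and the sketch covers neither detecting whether a receipt is inverted nor fine skew correction.

```python
import numpy as np


def to_portrait(image):
    """Rotate a landscape image (H x W or H x W x C array) by 90 degrees
    counter-clockwise so that its height is at least its width."""
    height, width = image.shape[:2]
    if width > height:
        return np.rot90(image)
    return image


def flip_upside_down(image):
    """Rotate an image by 180 degrees, for receipts found to be inverted."""
    return np.rot90(image, 2)
```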
- Extract the receipt information
- Input and output can be modified in extract_info.py
- Note: only runs with a single image as input; to extract from a folder, please use run.py
Execute:
python extract_info.py
You can find the paper here: RIE
This project is licensed under the MIT License. See LICENSE for more details.
Open an issue: New issue
Mail: pthung7102002@gmail.com