GitHub - tulasiram58827/Information-Extraction-From-Documents: This repository contains an implementation of the "Representation Learning for Information Extraction from Form-like Documents" paper.

Note: This implementation is still in progress. Use it at your own risk.

This repository contains an implementation of the paper "Representation Learning for Information Extraction from Form-like Documents".

Project setup

python -m virtualenv -p python3.8 venv
source venv/bin/activate
pip install -e .
gdown --id 10r9y17wg8Elo-3Zi61xA_8QDaKix8giN -O data.tar.xz
tar -xf data.tar.xz
gdown --id 16FzDxLOFxNmYi3JNXaYCmnZvR4x5T54I -O ocr_modified_files.tar.xz
tar -xf ocr_modified_files.tar.xz && mv ocr_modified_files data/

python data_processing.py

At this point your data directory should contain the following subdirectories: box, img, key, new_processed_files, and ocr_modified_files.
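
The layout check described above can be automated. The following is a minimal sketch, not part of the repository: the helper name missing_subdirs and the default path "data" are assumptions made for illustration, and the subdirectory names are the ones listed in this README.

```python
from pathlib import Path

# Subdirectory names as listed in this README; adjust if the layout changes.
EXPECTED_SUBDIRS = ["box", "img", "key", "new_processed_files", "ocr_modified_files"]


def missing_subdirs(data_dir="data", expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectory names that are absent under data_dir.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subdirs()
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("Data directory layout looks complete.")
```

Running this from the repository root after python data_processing.py should report any subdirectory that the download or processing steps did not produce.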

If you are interested in the paper or the implementation details, you can read this report published on Weights and Biases.

Folders and files

.gitignore
README.md
SROIE_Candidate_Generators.ipynb
SROIE_Data_Study.ipynb
data_processing.py
generators.py
layers.py
model.py
setup.py
sweep.yaml
train.py
utils.py
