Eynollah light integration #86

Merged
merged 69 commits into from
May 13, 2023
Changes from 67 commits
Commits
2736ddb
light version
vahidrezanezhad Mar 10, 2022
b8a5321
light version integration
vahidrezanezhad Mar 11, 2022
cf5ef8f
light version as option
Mar 14, 2022
c606391
flow from directory
Mar 29, 2022
2eacb9a
renaming the models
Apr 5, 2022
8d19c4c
updating readme
Apr 5, 2022
94c3b0f
updating readme
Apr 5, 2022
e564451
updating readme
Apr 5, 2022
3871e22
how the models are trained
Apr 11, 2022
3bbbeec
all options are enabled for light version
Apr 20, 2022
735abc4
option to ignore page extraction
Apr 27, 2022
cd9920e
extracting page
May 4, 2022
ae7c424
Update eynollah.py
vahidrezanezhad May 13, 2022
01bfc39
extracting page as an option
May 19, 2022
402c533
issue #77 is resolved
Jul 22, 2022
dbf9187
Adapt to new location of models
cneud Sep 13, 2022
07fe0d8
Update Makefile
cneud Sep 13, 2022
583cdce
new (hybrid cnn+transformer) textline model which can accelerate to e…
Sep 13, 2022
89e5891
new (hybrid cnn+transformer) textline model which can accelerate to e…
Sep 13, 2022
38bf0d8
solving issue by loading model by directory as input
Sep 13, 2022
000402f
Update README.md
cneud Sep 13, 2022
b75d8af
Update README.md
cneud Sep 13, 2022
ffc7f82
Update README.md
cneud Sep 13, 2022
5ca8570
Update README.md
cneud Sep 14, 2022
30ef006
Update README.md
cneud Sep 14, 2022
4807be1
Update requirements.txt
cneud Mar 28, 2023
4276417
Update README.md
cneud Mar 28, 2023
f37d324
Use renamed models in SavedModel format
cneud Mar 28, 2023
27834ce
update CI
cneud Mar 28, 2023
4642ccb
Update config.yml
cneud Mar 28, 2023
2c13f1b
Update README.md
cneud Mar 28, 2023
d21cc42
Update README.md
cneud Mar 28, 2023
58ca226
apply some fixes from main
cneud Mar 28, 2023
3d54719
fix import
cneud Mar 28, 2023
a078a18
issue #77 is resolved on main branch
Jul 22, 2022
73057d5
silentium!
bertsky Feb 11, 2023
1ac0a7e
try loading as TF SavedModel instead of HDF5
bertsky Feb 10, 2023
9849541
Update Makefile
cneud Mar 30, 2023
d4dd532
Update Makefile
cneud Mar 30, 2023
fb6d970
OCR-D wrapper: expose tables param
bertsky Feb 16, 2023
a9728bb
Update eynollah.py
cneud Mar 30, 2023
817e5a6
update docstring
cneud Mar 30, 2023
31be789
Makefile hack to rename model dir
cneud Mar 31, 2023
fd4c0ed
Update Makefile
cneud Mar 31, 2023
aecc2ea
Update README.md
cneud Mar 31, 2023
0279ebf
Update README.md
cneud Mar 31, 2023
22a8e93
Update README.md
cneud Apr 5, 2023
d3735b1
pushing commits 2d9ccac and 7345f6b into eynollah_light
Apr 11, 2023
abb0b29
use find_namespace_packages in setup.py
kba Apr 2, 2023
456fccb
use the SavedModel format
cneud Apr 12, 2023
63d9968
include 3.8 in GitHub Actions
cneud Apr 12, 2023
f264eaf
test CircleCI machine executor (more RAM?)
cneud Apr 13, 2023
0462ae0
Update config.yml
cneud Apr 13, 2023
cb8cfad
Update config.yml
cneud Apr 13, 2023
c251c4f
update badges
cneud Apr 14, 2023
50b9ce3
Update README.md
cneud Apr 14, 2023
d98689e
Update README.md
cneud Apr 14, 2023
000e39c
Update README.md
cneud Apr 14, 2023
fef7cf3
Update README.md
cneud Apr 14, 2023
1e172cc
Update README.md
cneud Apr 14, 2023
7078637
Update README.md
cneud Apr 14, 2023
cb5ffae
Update README.md
cneud Apr 14, 2023
529f2c0
set_memory_growth to all GPU devices alike
bertsky Apr 13, 2023
29e6ad0
renaming textline light model
vahidrezanezhad Apr 18, 2023
380f59a
let hybrid textline light model be loaded
Apr 18, 2023
d68f240
loading TensorFlow SavedModel format is now present
Apr 27, 2023
4c21701
textline light version -tll can not work without enabling -light option
Apr 27, 2023
1621532
Merge branch 'main' into eynollah_light
vahidrezanezhad May 8, 2023
48f2ce6
re-enable Action for Python 3.8
cneud May 13, 2023
34 changes: 28 additions & 6 deletions .circleci/config.yml
@@ -2,9 +2,9 @@ version: 2

jobs:

build-python36:
docker:
- image: python:3.6
build-python37:
machine:
- image: ubuntu-2004:2023.02.1
steps:
- checkout
- restore_cache:
@@ -16,13 +16,35 @@ jobs:
paths:
models_eynollah.tar.gz
models_eynollah
- run:
name: "Set Python Version"
command: pyenv install -s 3.7.16 && pyenv global 3.7.16
- run: make install
- run: make smoke-test

build-python38:
machine:
- image: ubuntu-2004:2023.02.1
steps:
- checkout
- restore_cache:
keys:
- model-cache
- run: make models
- save_cache:
key: model-cache
paths:
models_eynollah.tar.gz
models_eynollah
- run:
name: "Set Python Version"
command: pyenv install -s 3.8.16 && pyenv global 3.8.16
- run: make install
- run: make smoke-test

workflows:
version: 2
build:
jobs:
- build-python36
#- build-python37
#- build-python38 # no tensorflow for python 3.8
- build-python37
- build-python38
4 changes: 2 additions & 2 deletions .github/workflows/test-eynollah.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.6'] # '3.7'
python-version: ['3.7', '3.8']

steps:
- uses: actions/checkout@v2
@@ -33,4 +33,4 @@ jobs:
pip install .
pip install -r requirements-test.txt
- name: Test with pytest
run: make test
run: make test
8 changes: 6 additions & 2 deletions Makefile
@@ -22,10 +22,14 @@ help:
models: models_eynollah

models_eynollah: models_eynollah.tar.gz
tar xf models_eynollah.tar.gz
# tar xf models_eynollah_renamed.tar.gz --transform 's/models_eynollah_renamed/models_eynollah/'
# tar xf models_eynollah_renamed.tar.gz
tar xf 2022-04-05.SavedModel.tar.gz --transform 's/models_eynollah_renamed/models_eynollah/'

models_eynollah.tar.gz:
wget 'https://qurator-data.de/eynollah/models_eynollah.tar.gz'
# wget 'https://qurator-data.de/eynollah/2021-04-25/models_eynollah.tar.gz'
# wget 'https://qurator-data.de/eynollah/2022-04-05/models_eynollah_renamed.tar.gz'
wget 'https://ocr-d.kba.cloud/2022-04-05.SavedModel.tar.gz'

# Install with pip
install:
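
Note on the `--transform` expression in the new `tar` call above: it renames the top-level directory while unpacking, so that the archive's `models_eynollah_renamed` directory lands on disk as `models_eynollah`. A rough Python equivalent could look like the sketch below. The function names `transform_name` and `extract_with_rename` are hypothetical and not part of this PR. The sketch assumes the archive holds only regular files and directories, and it imitates only the first-match behaviour of a `s/old/new/` expression with a literal pattern.

```python
import tarfile
from pathlib import Path


def transform_name(name: str, old: str, new: str) -> str:
    """Replace the first occurrence of `old` in a member name,
    similar to a sed-style s/old/new/ without the g flag."""
    return name.replace(old, new, 1)


def extract_with_rename(archive, dest, old: str, new: str) -> None:
    """Extract every member of `archive` into `dest`, renaming paths on the fly.

    Assumption: plain files and directories only (no links to rewrite).
    """
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            # Changing member.name changes where tar.extract() writes it.
            member.name = transform_name(member.name, old, new)
            tar.extract(member, path=dest)


if __name__ == "__main__":
    # The archive name is taken from the Makefile hunk; it must exist locally.
    archive = Path("2022-04-05.SavedModel.tar.gz")
    if archive.exists():
        extract_with_rename(archive, ".", "models_eynollah_renamed", "models_eynollah")
```

This is only an illustration of what the Makefile recipe does; the PR itself relies on GNU tar.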
167 changes: 69 additions & 98 deletions README.md

Large diffs are not rendered by default.

61 changes: 52 additions & 9 deletions qurator/eynollah/cli.py
@@ -10,7 +10,6 @@
"-i",
help="image filename",
type=click.Path(exists=True, dir_okay=False),
required=True,
)
@click.option(
"--out",
@@ -19,6 +18,12 @@
type=click.Path(exists=True, file_okay=False),
required=True,
)
@click.option(
"--dir_in",
"-di",
help="directory of images",
type=click.Path(exists=True, file_okay=False),
)
@click.option(
"--model",
"-m",
@@ -49,6 +54,12 @@
help="if a directory is given, all plots needed for documentation will be saved there",
type=click.Path(exists=True, file_okay=False),
)
@click.option(
"--save_page",
"-sp",
help="if a directory is given, page crop of image will be saved there",
type=click.Path(exists=True, file_okay=False),
)
@click.option(
"--enable-plotting/--disable-plotting",
"-ep/-noep",
@@ -65,7 +76,13 @@
"--curved-line/--no-curvedline",
"-cl/-nocl",
is_flag=True,
help="if this parameter set to true, this tool will try to return contoure of textlines instead of rectabgle bounding box of textline. This should be taken into account that with this option the tool need more time to do process.",
help="if this parameter set to true, this tool will try to return contoure of textlines instead of rectangle bounding box of textline. This should be taken into account that with this option the tool need more time to do process.",
)
@click.option(
"--textline_light/--no-textline_light",
"-tll/-notll",
is_flag=True,
help="if this parameter set to true, this tool will try to return contoure of textlines instead of rectangle bounding box of textline with a faster method.",
)
@click.option(
"--full-layout/--no-full-layout",
@@ -92,11 +109,23 @@
help="if this parameter set to true, this tool would check the scale and if needed it will scale it to perform better layout detection",
)
@click.option(
"--headers-off/--headers-on",
"--headers_off/--headers-on",
"-ho/-noho",
is_flag=True,
help="if this parameter set to true, this tool would ignore headers role in reading order",
)
@click.option(
"--light_version/--original",
"-light/-org",
is_flag=True,
help="if this parameter set to true, this tool would use lighter version",
)
@click.option(
"--ignore_page_extraction/--extract_page_included",
"-ipe/-epi",
is_flag=True,
help="if this parameter set to true, this tool would ignore page extraction",
)
@click.option(
"--log-level",
"-l",
@@ -106,49 +135,63 @@
def main(
image,
out,
dir_in,
model,
save_images,
save_layout,
save_deskewed,
save_all,
save_page,
enable_plotting,
allow_enhancement,
curved_line,
textline_light,
full_layout,
tables,
input_binary,
allow_scaling,
headers_off,
light_version,
ignore_page_extraction,
log_level
):
if log_level:
setOverrideLogLevel(log_level)
initLogging()
if not enable_plotting and (save_layout or save_deskewed or save_all or save_images or allow_enhancement):
print("Error: You used one of -sl, -sd, -sa, -si or -ae but did not enable plotting with -ep")
if not enable_plotting and (save_layout or save_deskewed or save_all or save_page or save_images or allow_enhancement):
print("Error: You used one of -sl, -sd, -sa, -sp, -si or -ae but did not enable plotting with -ep")
sys.exit(1)
elif enable_plotting and not (save_layout or save_deskewed or save_all or save_page or save_images or allow_enhancement):
print("Error: You used -ep to enable plotting but set none of -sl, -sd, -sa, -sp, -si or -ae")
sys.exit(1)
elif enable_plotting and not (save_layout or save_deskewed or save_all or save_images or allow_enhancement):
print("Error: You used -ep to enable plotting but set none of -sl, -sd, -sa, -si or -ae")
if textline_light and not light_version:
print('Error: You used -tll to enable light textline detection but -light is not enabled')
sys.exit(1)
eynollah = Eynollah(
image_filename=image,
dir_out=out,
dir_in=dir_in,
dir_models=model,
dir_of_cropped_images=save_images,
dir_of_layout=save_layout,
dir_of_deskewed=save_deskewed,
dir_of_all=save_all,
dir_save_page=save_page,
enable_plotting=enable_plotting,
allow_enhancement=allow_enhancement,
curved_line=curved_line,
textline_light=textline_light,
full_layout=full_layout,
tables=tables,
input_binary=input_binary,
allow_scaling=allow_scaling,
headers_off=headers_off,
light_version=light_version,
ignore_page_extraction=ignore_page_extraction,
)
pcgts = eynollah.run()
eynollah.writer.write_pagexml(pcgts)
eynollah.run()
#pcgts = eynollah.run()
##eynollah.writer.write_pagexml(pcgts)

if __name__ == "__main__":
main()
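
Note on the option checks in `main()` above: the three consistency checks (plotting-related options without `-ep`, `-ep` without any plotting-related option, and `-tll` without `-light`) can be read as one small pure function. The sketch below is one way to express them so they can be tested without invoking the CLI. The function name `check_option_combinations` and the `plot_related_options_set` parameter are hypothetical and not part of this PR. The error messages are copied from the diff.

```python
from typing import Optional


def check_option_combinations(
    enable_plotting: bool,
    plot_related_options_set: bool,
    textline_light: bool,
    light_version: bool,
) -> Optional[str]:
    """Return an error message for an invalid combination, else None.

    `plot_related_options_set` stands for
    (save_layout or save_deskewed or save_all or save_page
     or save_images or allow_enhancement) in the CLI code.
    """
    if not enable_plotting and plot_related_options_set:
        return ("Error: You used one of -sl, -sd, -sa, -sp, -si or -ae "
                "but did not enable plotting with -ep")
    if enable_plotting and not plot_related_options_set:
        return ("Error: You used -ep to enable plotting but set none of "
                "-sl, -sd, -sa, -sp, -si or -ae")
    if textline_light and not light_version:
        return ("Error: You used -tll to enable light textline detection "
                "but -light is not enabled")
    return None


if __name__ == "__main__":
    # -tll without -light is rejected, matching the check added in this PR.
    print(check_option_combinations(False, False, True, False))
```

In the CLI, a non-`None` result would correspond to printing the message and calling `sys.exit(1)`.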