[Linux] Crash with pytesseract.pytesseract.TesseractError - Error opening data file #353

Closed
tio-trom opened this issue Jan 20, 2023 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@tio-trom

tio-trom commented Jan 20, 2023

This is the output

15:07:28 - CRITICAL - normcap.gui.utils:168 - Uncaught exception! Quitting NormCap!

System:

{  'cli_args': '/usr/bin/normcap',
   'config_directory': PosixPath('/home/REDACTED/.config/normcap'),
   'desktop_environment': <DesktopEnvironment.OTHER: 0>,
   'display_manager_is_wayland': False,
   'envs': {  'LD_LIBRARY_PATH': None,
              'TESSDATA_PREFIX': None,
              'TESSERACT_CMD': None,
              'TESSERACT_VERSION': None},
   'gnome_version': None,
   'is_flatpak_package': False,
   'is_prebuild_package': None,
   'normcap_version': '0.3.15',
   'platform': 'linux',
   'pyside6_version': '6.4.1',
   'qt_library_path': '/usr/lib/qt6/plugins, /usr/bin',
   'qt_version': '6.4.1',
   'screens': {  0: Screen(is_primary=True,
                           device_pixel_ratio=1.0,
                           geometry=Rect(left=0,
                                         top=0,
                                         right=1920,
                                         bottom=1200),
                           index=0,
                           screenshot=None),
                 1: Screen(is_primary=False,
                           device_pixel_ratio=1.0,
                           geometry=Rect(left=1920,
                                         top=0,
                                         right=3840,
                                         bottom=1080),
                           index=1,
                           screenshot=None)},
   'tessdata_path': None}

Variables:

   '_capture_to_ocr': {'language': (...,), 'self': 'REDACTED'},
   'image_to_data': {  'args': [...],
                       'config': '-c tessedit_create_tsv=1 --oem 3 --psm 1',
                       'image': <PIL.Image.Image image mode=RGB size=1209x428 at 0x7F33749B71F0>,
                       'lang': 'eng',
                       'nice': 0,
                       'output_type': 'dict',
                       'pandas_config': None,
                       'timeout': 30},
   'recognize': {  'image': <PIL.Image.Image image mode=RGB size=1209x428 at 0x7F33749B71F0>,
                   'languages': (...,),
                   'padding_size': 80,
                   'parse': True,
                   'resize_factor': 3.2,
                   'tess_args': TessArgs(path=None,
                                         lang='eng',
                                         oem=<OEM.DEFAULT: 3>,
                                         psm=<PSM.AUTO_OSD: 1>,
                                         version=<Version('5.3.0')>),
                   'tessdata_path': None},
   'run_and_get_output': {  'config': '-c tessedit_create_tsv=1 --oem 3 --psm '
                                      '1',
                            'extension': 'tsv',
                            'image': <PIL.Image.Image image mode=RGB size=1209x428 at 0x7F33749B71F0>,
                            'input_filename': '/tmp/tess_ertnipgv_input.PNG',
                            'kwargs': {...},
                            'lang': 'eng',
                            'nice': 0,
                            'return_bytes': False,
                            'temp_name': '/tmp/tess_ertnipgv',
                            'timeout': 30},
   'run_tesseract': {  'cmd_args': [...],
                       'config': '-c tessedit_create_tsv=1 --oem 3 --psm 1',
                       'error_string': b'Error opening data file /usr/share/t'
                                       b'essdata/eng.traineddata\nPlease make '
                                       b'sure the TESSDATA_PREFIX environment'
                                       b' variable is set to your "tessdata" '
                                       b"directory.\nFailed loading language '"
                                       b"eng'\nTesseract couldn't load any lan"
                                       b'guages!\nCould not initialize tessera'
                                       b'ct.\n',
                       'extension': 'tsv',
                       'input_filename': '/tmp/tess_ertnipgv_input.PNG',
                       'lang': 'eng',
                       'nice': 0,
                       'output_filename_base': '/tmp/tess_ertnipgv',
                       'proc': <Popen: returncode: 1 args: ['tesseract', '/tmp/tess_ertnipgv_input.PNG', '/...>,
                       'timeout': 30}}

Exception:

  pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /usr/share/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

Traceback:

  File "/usr/lib/python3.10/site-packages/normcap/gui/tray.py", line 287, in _capture_to_ocr
    ocr_result = ocr.recognize(
  File "/usr/lib/python3.10/site-packages/normcap/ocr/recognize.py", line 39, in recognize
    tsv_data = pytesseract.image_to_data(
  File "/usr/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 527, in image_to_data
    return {
  File "/usr/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 533, in <lambda>
    Output.DICT: lambda: file_to_dict(run_and_get_output(*args), '\t', -1),
  File "/usr/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 288, in run_and_get_output
    run_tesseract(**kwargs)
  File "/usr/lib/python3.10/site-packages/pytesseract/pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))

15:07:28 - CRITICAL - normcap.gui.utils:235 - Please open an issue with the output above on https://github.com/dynobo/normcap/issues
~ >>>                               

It lets me select a region on the screen, but then nothing happens. I am using the AUR package.
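For readers who hit the same error: the message says Tesseract could not open eng.traineddata in /usr/share/tessdata. A minimal Python sketch for checking whether a language's data file is present could look like the following. The helper name has_traineddata is hypothetical (it is not part of NormCap or pytesseract), and the default directory is simply the one from the error message above.

```python
import os
from pathlib import Path


# Hypothetical helper for illustration; not part of NormCap or pytesseract.
def has_traineddata(lang: str, tessdata_dir: str = "/usr/share/tessdata") -> bool:
    """Return True if <tessdata_dir>/<lang>.traineddata exists as a regular file."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    # Assumption: use TESSDATA_PREFIX when it is set, otherwise the directory
    # named in the error message.
    tessdata_dir = os.environ.get("TESSDATA_PREFIX") or "/usr/share/tessdata"
    for lang in ("eng", "afr"):
        state = "found" if has_traineddata(lang, tessdata_dir) else "missing"
        print(f"{lang}: {state} in {tessdata_dir}")
```

On a system like the one in this report, such a check would be expected to report eng as missing, which matches the "Error opening data file" message.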

@tio-trom tio-trom changed the title Cannot amke it work in XFCE: CRITICAL - normcap.gui.utils:235 Cannot make it work in XFCE: CRITICAL - normcap.gui.utils:235 Jan 20, 2023
@dynobo
Owner

dynobo commented Jan 21, 2023

Hi @tio-trom, thanks for reporting this issue!

Could you please run the following command in a terminal and post its output?

tesseract --list-langs

(This is to verify that tesseract is installed on your system correctly and find out which languages are available.)

@tio-trom
Author

For sure:

List of available languages in "/usr/share/tessdata/" (2):
afr
osd
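The same information can be collected from a script. The sketch below runs tesseract --list-langs and extracts the language codes from output shaped like the listing above. The function names are hypothetical. Dropping every line that contains a space (which removes the header) is an assumption about the output format, not documented behavior.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines()]
    # Assumption: language codes never contain spaces, while the
    # "List of available languages ..." header and any warnings do.
    return [line for line in lines if line and " " not in line]


def list_tesseract_langs(cmd: str = "tesseract") -> list[str]:
    """Run `<cmd> --list-langs` and return the parsed language codes."""
    result = subprocess.run(
        [cmd, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case the list goes to stderr
        text=True,
        check=True,
    )
    return parse_list_langs(result.stdout)


sample = 'List of available languages in "/usr/share/tessdata/" (2):\nafr\nosd\n'
print(parse_list_langs(sample))  # ['afr', 'osd']
```

pytesseract also provides get_languages(), which may be more convenient when pytesseract is already a dependency.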

@dynobo dynobo self-assigned this Jan 22, 2023
@dynobo dynobo added the bug Something isn't working label Jan 22, 2023
@dynobo
Owner

dynobo commented Jan 22, 2023

Oh, I think I see the problem! Tesseract and its language data are correctly installed on your system. The problem is that NormCap's language is set to eng (English) by default, without verifying whether it is available. Instead, it should check which languages are available, pick English if it is there, and otherwise fall back to a different available language. This fallback is currently not implemented.
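The fallback described here could be sketched roughly as follows. This is only an illustration of the idea, not NormCap's actual implementation. pick_language is a hypothetical name, and skipping osd is an assumption based on it being orientation and script detection data rather than a recognition language.

```python
def pick_language(available: list[str], preferred: str = "eng") -> str:
    """Return `preferred` if installed, otherwise the first usable language."""
    if preferred in available:
        return preferred
    # "osd" (orientation and script detection) is not a recognition language.
    candidates = [lang for lang in available if lang != "osd"]
    if not candidates:
        raise ValueError("no usable Tesseract language data installed")
    return candidates[0]


print(pick_language(["afr", "osd"]))         # afr
print(pick_language(["afr", "eng", "osd"]))  # eng
```

With the languages reported in this thread (afr and osd), the sketch would select afr instead of failing on the missing eng data.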

Could you please try the following workaround?

  1. Install the English language data via sudo pacman -Syu tesseract-data-eng
  2. (Re-)start NormCap
  3. In NormCap's settings (cogwheel icon in the top right), select your desired language, afr
  4. Select a region with text to see if it gets recognized correctly
  5. (Optionally) remove the eng language again with sudo pacman -R tesseract-data-eng

If this works for you, that would confirm the hypothesis, and I can start implementing a proper solution.

@dynobo dynobo changed the title Cannot make it work in XFCE: CRITICAL - normcap.gui.utils:235 [Linux, XFCE] Crash with pytesseract.pytesseract.TesseractError - Error opening data file Jan 22, 2023
@dynobo dynobo changed the title [Linux, XFCE] Crash with pytesseract.pytesseract.TesseractError - Error opening data file [Linux] Crash with pytesseract.pytesseract.TesseractError - Error opening data file Jan 22, 2023
@tio-trom
Author

Hi,

The issue now is that the AUR package I was using received an update yesterday, and I cannot open NormCap at all. I get this error:

https://aur.archlinux.org/packages/normcap#comment-898729

Maybe it is AUR-related, so I have to wait for a fix from them? I would be very happy to test it further.

@dynobo
Owner

dynobo commented Jan 22, 2023

Yeah, it's the AUR package. I get the same error. You either have to wait or try the AppImage in the meantime. Thanks for your help!

@tio-trom
Author

The AUR package got updated. Now all works perfectly fine! Awesome little application. Fantastic even. Thank you!
