Random numbers being added to text output #446

Closed
Victor239 opened this issue May 20, 2023 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@Victor239

What happened?

Using this image:

Produces this text:

After a visit to ruined Berlin, he wrote
his wife on July 21, 1945: Berlin
5	1	1	1	2	8	1938	289	246	93	96.806129	gave

How did you install NormCap?

FlatPak (Linux)

Operating System + Version?

Kubuntu 23.04

[Linux only] Display Server (DS) + Desktop environment (DE)?

DS: Wayland, DE: KDE Plasma

Debug log output?

flatpak run --command=normcap com.github.dynobo.normcap -v debug
12:56:44 - INFO    - normcap:29 - Start NormCap v0.4.1
12:56:44 - DEBUG   - normcap:84 - Set QT_QPA_PLATFORM=wayland
12:56:44 - DEBUG   - normcap.gui.tray:58 - System info:
{'cli_args': '/app/bin/normcap -v debug', 'is_briefcase_package': False, 'is_flatpak_package': True, 'platform': 'linux', 'pyside6_version': '6.4.2', 'qt_version': '6.4.2', 'qt_library_path': '/usr/share/runtime/lib/plugins, /app/lib/python3.10/site-packages/PySide6/Qt/plugins, /usr/bin', 'config_directory': PosixPath('/home/boss/.var/app/com.github.dynobo.normcap/config/normcap'), 'normcap_version': '0.4.1', 'ressources_path': PosixPath('/app/lib/python3.10/site-packages/normcap/resources'), 'tesseract_path': PosixPath('/app/bin/tesseract'), 'tessdata_path': PosixPath('/home/boss/.var/app/com.github.dynobo.normcap/config/normcap/tessdata'), 'envs': {'TESSDATA_PREFIX': '/app/share', 'LD_LIBRARY_PATH': ''}, 'desktop_environment': <DesktopEnvironment.KDE: 2>, 'display_manager_is_wayland': True, 'screens': [Screen(is_primary=True, device_pixel_ratio=1.0, rect=Rect(left=1920, top=0, right=3840, bottom=1080), index=0, screenshot=None), Screen(is_primary=False, device_pixel_ratio=1.0, rect=Rect(left=0, top=0, right=1920, bottom=1080), index=1, screenshot=None)]}
12:56:44 - DEBUG   - normcap.gui.tray:332 - Listen on local socket v0.4.1-normcap.
12:56:44 - DEBUG   - normcap.gui.settings:128 - Skip update of non existing setting (cli_mode: False)
12:56:44 - DEBUG   - normcap.screengrab:45 - Select capture method DBUS portal
12:56:44 - DEBUG   - normcap.screengrab.dbus_portal:194 - Request screenshot with interactive=False
12:56:44 - DEBUG   - normcap.screengrab.dbus_portal:77 - Request accepted
12:56:45 - DEBUG   - normcap.screengrab.dbus_portal:104 - Parse response
12:56:45 - DEBUG   - normcap.screengrab.utils:26 - Virtual geometry width: 3840
12:56:45 - DEBUG   - normcap.screengrab.utils:27 - Image width: 3840
12:56:45 - DEBUG   - normcap.screengrab.utils:28 - Resize ratio: 1.0
12:56:45 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1684580205.4574864_raw_screen0.png
12:56:45 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1684580205.5454798_raw_screen1.png
12:56:45 - DEBUG   - normcap.gui.window:131 - Create window for screen 0
12:56:45 - DEBUG   - normcap.gui.window:193 - Set window of screen 0 to fullscreen
12:56:45 - DEBUG   - normcap:211 - [QT] qtwarningmsg - qsystemtrayicon::setvisible: no icon set
12:56:45 - DEBUG   - normcap.gui.window:184 - Move window 0 to (left=1920, top=0, right=3840, bottom=1080)
12:56:45 - WARNING - normcap.gui.window:108 - Invalid dbus interface on KDE
12:56:45 - DEBUG   - normcap.gui.window:131 - Create window for screen 1
12:56:45 - DEBUG   - normcap.gui.window:193 - Set window of screen 1 to fullscreen
12:56:46 - DEBUG   - normcap.gui.window:184 - Move window 1 to (left=0, top=0, right=1920, bottom=1080)
12:56:46 - WARNING - normcap.gui.window:108 - Invalid dbus interface on KDE
12:56:46 - DEBUG   - normcap.ocr.tesseract:23 - Tesseract command output:
List of available languages in "/home/boss/.var/app/com.github.dynobo.normcap/config/normcap/tessdata/" (6):
ara
chi_sim
deu
eng
rus
spa
12:56:53 - DEBUG   - normcap.gui.tray:289 - Hide 2 windows
12:56:53 - INFO    - normcap.gui.tray:192 - Crop image to region (562, 370, 1247, 469)
12:56:53 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1684580213.1804812_cropped.png
12:56:53 - DEBUG   - normcap.gui.tray:217 - Start OCR
12:56:53 - DEBUG   - normcap.ocr.enhance:76 - Scale image x3.2
12:56:53 - DEBUG   - normcap.ocr.enhance:54 - Pad image by 80px
12:56:53 - DEBUG   - normcap.ocr.recognize:35 - Run Tesseract on image of size (2346, 476) with args:
TessArgs(tessdata_path=PosixPath('/home/boss/.var/app/com.github.dynobo.normcap/config/normcap/tessdata'), lang='ara+eng', oem=<OEM.DEFAULT: 3>, psm=<PSM.AUTO_OSD: 1>)
12:56:53 - DEBUG   - normcap.ocr.tesseract:23 - Tesseract command output:

12:56:53 - DEBUG   - normcap.ocr.recognize:44 - OCR result:
OcrResult(tess_args=TessArgs(tessdata_path=PosixPath('/home/boss/.var/app/com.github.dynobo.normcap/config/normcap/tessdata'), lang='ara+eng', oem=<OEM.DEFAULT: 3>, psm=<PSM.AUTO_OSD: 1>), words=[{'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 1, 'left': 139, 'top': 93, 'width': 277, 'height': 100, 'conf': 95.711533, 'text': 'After'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 2, 'left': 453, 'top': 124, 'width': 53, 'height': 68, 'conf': 95.219055, 'text': 'a'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 3, 'left': 546, 'top': 99, 'width': 218, 'height': 93, 'conf': 95.789696, 'text': 'visit'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 4, 'left': 802, 'top': 107, 'width': 109, 'height': 85, 'conf': 96.472221, 'text': 'to'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 5, 'left': 955, 'top': 96, 'width': 345, 'height': 97, 'conf': 96.493546, 'text': 'ruined'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 6, 'left': 1350, 'top': 94, 'width': 340, 'height': 115, 'conf': 96.121735, 'text': 'Berlin,'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 7, 'left': 1737, 'top': 95, 'width': 127, 'height': 97, 'conf': 96.665634, 'text': 'he'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 1, 'word_num': 8, 'left': 1904, 'top': 107, 'width': 319, 'height': 85, 'conf': 96.553108, 'text': 'wrote'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 2, 'word_num': 1, 'left': 150, 'top': 261, 'width': 144, 'height': 99, 'conf': 96.598412, 'text': 'his'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 2, 'word_num': 2, 'left': 333, 'top': 262, 'width': 226, 'height': 98, 'conf': 96.26091, 'text': 'wife'}, {'level': 5, 'page_num': 1, 
'block_num': 1, 'par_num': 1, 'line_num': 2, 'word_num': 3, 'left': 600, 'top': 289, 'width': 133, 'height': 71, 'conf': 96.51947, 'text': 'on'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 2, 'word_num': 4, 'left': 770, 'top': 261, 'width': 207, 'height': 120, 'conf': 96.355522, 'text': 'July'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 2, 'word_num': 5, 'left': 1017, 'top': 267, 'width': 158, 'height': 108, 'conf': 96.249504, 'text': '21,'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 2, 'word_num': 6, 'left': 1224, 'top': 267, 'width': 295, 'height': 93, 'conf': 95.926888, 'text': '1945:'}, {'level': 5, 'page_num': 1, 'block_num': 1, 'par_num': 1, 'line_num': 2, 'word_num': 7, 'left': 1566, 'top': 261, 'width': 365, 'height': 98, 'conf': 95.926888, 'text': 'Berlin\n5\t1\t1\t1\t2\t8\t1971\t289\t246\t92\t96.436195\tgave\n'}], image=<PySide6.QtGui.QImage(QSize(2346, 476),format=QImage::Format_RGB32,depth=32,devicePixelRatio=1,bytesPerLine=9384,sizeInBytes=4466784) at 0x7f819cc41780>, magic_scores={}, parsed='')
12:56:53 - INFO    - normcap.ocr.magics.email_magic:33 - 0 emails found 
12:56:53 - DEBUG   - normcap.ocr.magics.email_magic:41 - 0/104 (0.0) chars in emails
12:56:53 - INFO    - normcap.ocr.magics.url_magic:55 - 0 URLs found 
12:56:53 - DEBUG   - normcap.ocr.magics.url_magic:63 - 0/117 (0.0) chars in urls
12:56:53 - DEBUG   - normcap.ocr.magics.magic:70 - Magic scores:
{'SingleLineMagic': 0, 'MultiLineMagic': 50.0, 'ParagraphMagic': 0.0, 'EmailMagic': 0.0, 'UrlMagic': 0.0}
12:56:53 - DEBUG   - normcap.ocr.recognize:48 - Parsed text:
After a visit to ruined Berlin, he wrote
his wife on July 21, 1945: Berlin
5       1       1       1       2       8       1971    289     246     92      96.436195       gave

12:56:53 - DEBUG   - normcap.gui.utils:22 - Save debug image as /tmp/normcap/1684580213.755709_enhanced.png
12:56:53 - INFO    - normcap.gui.tray:235 - Text from OCR:
After a visit to ruined Berlin, he wrote
his wife on July 21, 1945: Berlin
5       1       1       1       2       8       1971    289     246     92      96.436195       gave

12:56:53 - DEBUG   - normcap.clipboard.linux:33 - Select clipboard method wl-copy
12:56:53 - DEBUG   - normcap.gui.tray:265 - Copy text to clipboard
12:56:53 - DEBUG   - normcap.gui.notifier:111 - Send notification via QT
12:56:59 - INFO    - normcap.gui.tray:497 - Exit normcap (notification sent delaying exit)
12:56:59 - DEBUG   - normcap.gui.tray:498 - Debug images saved in /tmp/normcap
@Victor239 Victor239 added the bug (Something isn't working) and triage (Needs confirmation and prioritization) labels May 20, 2023
@dynobo
Owner

dynobo commented May 21, 2023

@Victor239, I can reproduce the issue, thanks a lot for submitting this example! I'm pretty sure it has to do with the double quote ("). I'll bet it doesn't get escaped correctly, which leads to word metadata (position, accuracy, etc.) leaking into the output. I'm on it!
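
For readers following along, here is a minimal sketch of how an unescaped double quote could cause this kind of leak. It assumes the Tesseract TSV output is read with Python's `csv` module; the thread does not show NormCap's actual parsing code or the fix that was applied. The sample data, the column subset, and the helper `parse_words` are hypothetical.

```python
import csv
import io

# Hypothetical, shortened Tesseract-style TSV. The word in the third row
# starts with a double quote, as suspected for the reported screenshot.
TSV = (
    "level\tword_num\tconf\ttext\n"
    "5\t6\t95.9\t1945:\n"
    "5\t7\t95.9\t\"Berlin\n"
    "5\t8\t96.4\tgave\n"
)


def parse_words(tsv: str, quoting: int) -> list[str]:
    """Return the 'text' column of each TSV row using the given quoting mode."""
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=quoting)
    return [row["text"] for row in reader]


# Default quoting: the leading quote opens a quoted field that is never
# closed, so the rest of the input (including the metadata columns of the
# next row) is swallowed into the text of "Berlin".
naive = parse_words(TSV, csv.QUOTE_MINIMAL)

# QUOTE_NONE: the double quote is treated as an ordinary character,
# so every row stays intact.
safe = parse_words(TSV, csv.QUOTE_NONE)

print(naive)  # ['1945:', 'Berlin\n5\t8\t96.4\tgave\n']
print(safe)   # ['1945:', '"Berlin', 'gave']
```

The `naive` result has the same shape as the `'Berlin\n5\t1\t1\t1\t2\t8\t...\tgave\n'` entry in the OCR result logged above: one word whose text contains the following row's metadata.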

@dynobo dynobo removed the triage (Needs confirmation and prioritization) label May 21, 2023
@dynobo dynobo self-assigned this May 21, 2023
@dynobo
Owner

dynobo commented May 21, 2023

This should be fixed in the latest release, v0.4.2 (it might take a while to appear on Flathub, though).

Thanks again, and please feel free to re-open if the issue persists :-)

@dynobo dynobo closed this as completed May 21, 2023