Delay consumption of new files #46
If you scan the file somewhere else and move it into the consumption folder afterwards, does that work? If so, the consumer is probably trying to read the file while it's still being written, which would explain the error about no pages in the PDF.
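A minimal sketch of the "scan elsewhere, then move" workaround, assuming whatever produces the file can write into a staging directory on the same filesystem as the consumption directory. On POSIX, `os.replace` is a single atomic rename in that case, so the consumer should only ever see the finished file. The `hand_off` helper and the directory layout are hypothetical, not part of paperless.

```python
import os
import tempfile


def hand_off(data: bytes, staging_dir: str, consume_dir: str, name: str) -> str:
    """Write `data` completely in `staging_dir`, then move it into `consume_dir` in one step.

    Assumes both directories are on the same filesystem; otherwise os.replace
    raises OSError (EXDEV) instead of performing an atomic rename.
    """
    # Write under a temporary name so the partial file is never mistaken for a finished one.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before publishing the file
        final_path = os.path.join(consume_dir, name)
        os.replace(tmp_path, final_path)  # the file appears complete or not at all
        return final_path
    except BaseException:
        # Remove the partial file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the scanner cannot be pointed at a staging directory, the same idea works with a small script on the machine that receives the scans.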
Now it's not consuming any PDFs at all (not even new ones). I also restarted the containers just to be sure. The following exception occurs: `tuple index out of range : Traceback (most recent call last): During handling of the above exception, another exception occurred: Traceback (most recent call last):`
This is actually a pretty common issue when using watchdog, as it does not (and will not) expose `close_write` events (gorakhargosh/watchdog#184). Especially with larger files (in my case from a Brother document scanner), you will run into this with almost every document.
That's probably unrelated; I just created #47 for it.
@jayme-github, how did you solve this? I'm running a while loop that checks whether `file_size` is still increasing. It's still error-prone, because sometimes (large documents or network issues) my scanner apparently caches more data before writing, though most of the time less. I guess there are smarter ways? :-)
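A minimal sketch of a somewhat sturdier version of that polling loop, assuming a file counts as finished once its size and modification time stay unchanged across several consecutive polls. It is still a heuristic: a scanner that pauses longer than `interval * stable_checks` will still be caught mid-write. The function name and default values are hypothetical.

```python
import os
import time


def wait_until_stable(path: str, interval: float = 1.0, stable_checks: int = 3,
                      timeout: float = 300.0) -> bool:
    """Return True once `path` shows no change in size or mtime across
    `stable_checks` consecutive polls taken `interval` seconds apart.
    Return False if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last = None
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # Not there (yet), or renamed away: start counting from scratch.
            last, unchanged = None, 0
        else:
            current = (st.st_size, st.st_mtime_ns)
            if current == last:
                unchanged += 1
                if unchanged >= stable_checks:
                    return True
            else:
                # Still growing (or first observation): reset the counter.
                last, unchanged = current, 0
        time.sleep(interval)
    return False
```

Checking the mtime as well as the size catches writers that rewrite data in place without growing the file.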
I switched to inotify ;-)
Hmm, and then you just use `close_write`? Damn, I should use that instead of create/move too.
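A minimal, Linux-only sketch of the `close_write` approach. The thread does not say which inotify binding was used, so this calls the kernel's inotify API directly through `ctypes` and needs no third-party package. It reports files that were closed after being opened for writing (`IN_CLOSE_WRITE`) or renamed into the directory (`IN_MOVED_TO`, which covers the "scan elsewhere, then move" case). The helper names are hypothetical. Caveat: a client that opens and closes the file several times while writing (some network shares behave this way) will produce several events.

```python
import ctypes
import ctypes.util
import os
import select
import struct
from typing import List

# Flag values from <sys/inotify.h>.
IN_CLOSE_WRITE = 0x00000008  # a file opened for writing was closed
IN_MOVED_TO = 0x00000080     # a file was renamed into the watched directory

# struct inotify_event header: int wd; uint32_t mask, cookie, len; then `len` bytes of name.
_EVENT_HEADER = struct.Struct("iIII")

_libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6", use_errno=True)


def _raise_errno() -> None:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def open_watch(directory: str) -> int:
    """Create an inotify instance that watches `directory` for finished files."""
    fd = _libc.inotify_init1(0)
    if fd < 0:
        _raise_errno()
    wd = _libc.inotify_add_watch(fd, os.fsencode(directory), IN_CLOSE_WRITE | IN_MOVED_TO)
    if wd < 0:
        os.close(fd)
        _raise_errno()
    return fd


def read_finished_files(fd: int, timeout: float = 1.0) -> List[str]:
    """Return names of files finished since the last call, waiting up to `timeout` seconds."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return []
    buf = os.read(fd, 64 * 1024)
    names = []
    offset = 0
    while offset < len(buf):
        _wd, mask, _cookie, length = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        raw_name = buf[offset:offset + length].rstrip(b"\0")  # name is NUL-padded
        offset += length
        if mask & (IN_CLOSE_WRITE | IN_MOVED_TO) and raw_name:
            names.append(os.fsdecode(raw_name))
    return names
```

A consumer loop would call `read_finished_files` repeatedly and queue each returned name, instead of reacting to create events.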
I might be a little too nervous, but I'm just too excited. I just tested scanning directly into the consumption directory, which leads to the following warning in the log:
`11/25/20, 10:02 PM WARNING Thumbnail generation with ImageMagick failed, falling back to ghostscript. Check your /etc/ImageMagick-x/policy.xml!
11/25/20, 10:02 PM INFO Consuming doc20201125222904.pdf`
The stdout log of the webserver container shows the following:
`22:02:09 [Q] INFO Enqueued 1
22:02:09 [Q] INFO Process-1:1 processing [doc20201125222904.pdf]
Consuming doc20201125222904.pdf
Parser: RasterisedDocumentParser based on mime type application/pdf
Generating thumbnail for doc20201125222904.pdf...
Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -trim /usr/src/paperless/src/../consume/doc20201125222904.pdf[0] /tmp/paperless/paperless-3vtrk65p/convert.png
**** Error: Cannot find a 'startxref' anywhere in the file.
Output may be incorrect.
**** Error: An error occurred while reading an XREF table.
**** The file has been damaged. This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
**** However, the output may be incorrect.
**** Error: Trailer dictionary not found.
Output may be incorrect.
Requested FirstPage is greater than the number of pages in the file: 0
No pages will be processed (FirstPage > LastPage).
convert-im6.q16: no images defined
/tmp/paperless/paperless-3vtrk65p/convert.png' @ error/convert.c/ConvertImageCommand/3258.
Thumbnail generation with ImageMagick failed, falling back to ghostscript. Check your /etc/ImageMagick-x/policy.xml!
**** Error: Cannot find a 'startxref' anywhere in the file.
Output may be incorrect.
**** Error: An error occurred while reading an XREF table.
**** The file has been damaged. This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
**** However, the output may be incorrect.
**** Error: Trailer dictionary not found.
Output may be incorrect.
No pages will be processed (FirstPage > LastPage).
Execute: convert -density 300 -scale 500x5000> -alpha remove -strip -trim /tmp/paperless/paperless-3vtrk65p/gs_out.png /tmp/paperless/paperless-3vtrk65p/convert.png
convert-im6.q16: unable to open image
/tmp/paperless/paperless-3vtrk65p/gs_out.png': No such file or directory @ error/blob.c/OpenBlob/2874.
convert-im6.q16: no images defined `/tmp/paperless/paperless-3vtrk65p/convert.png' @ error/convert.c/ConvertImageCommand/3258.
Deleting directory /tmp/paperless/paperless-3vtrk65p
22:02:09 [Q] ERROR Failed [doc20201125222904.pdf] - Convert failed at ['convert', '-density', '300', '-scale', '500x5000>', '-alpha', 'remove', '-strip', '-trim', '/tmp/paperless/paperless-3vtrk65p/gs_out.png', '/tmp/paperless/paperless-3vtrk65p/convert.png'] : Traceback (most recent call last):
  File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 49, in get_thumbnail
    logging_group=self.logging_group)
  File "/usr/src/paperless/src/documents/parsers.py", line 107, in run_convert
    raise ParseError("Convert failed at {}".format(args))
documents.parsers.ParseError: Convert failed at ['convert', '-density', '300', '-scale', '500x5000>', '-alpha', 'remove', '-strip', '-trim', '/usr/src/paperless/src/../consume/doc20201125222904.pdf[0]', '/tmp/paperless/paperless-3vtrk65p/convert.png']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/src/paperless/src/documents/consumer.py", line 132, in try_consume_file
    thumbnail = document_parser.get_optimised_thumbnail()
  File "/usr/src/paperless/src/documents/parsers.py", line 168, in get_optimised_thumbnail
    return self.optimise_thumbnail(self.get_thumbnail())
  File "/usr/src/paperless/src/paperless_tesseract/parsers.py", line 73, in get_thumbnail
    logging_group=self.logging_group)
  File "/usr/src/paperless/src/documents/parsers.py", line 107, in run_convert
    raise ParseError("Convert failed at {}".format(args))
documents.parsers.ParseError: Convert failed at ['convert', '-density', '300', '-scale', '500x5000>', '-alpha', 'remove', '-strip', '-trim', '/tmp/paperless/paperless-3vtrk65p/gs_out.png', '/tmp/paperless/paperless-3vtrk65p/convert.png']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/django_q/cluster.py", line 436, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/usr/src/paperless/src/documents/tasks.py", line 68, in consume_file
    override_tag_ids=override_tag_ids)
  File "/usr/src/paperless/src/documents/consumer.py", line 138, in try_consume_file
    raise ConsumerError(e)
documents.consumer.ConsumerError: Convert failed at ['convert', '-density', '300', '-scale', '500x5000>', '-alpha', 'remove', '-strip', '-trim', '/tmp/paperless/paperless-3vtrk65p/gs_out.png', '/tmp/paperless/paperless-3vtrk65p/convert.png']
`
Is it possible to delay the consumption of new files until they have been completely written, to avoid this?
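The Ghostscript messages above ("Cannot find a 'startxref' anywhere in the file", "Trailer dictionary not found") are what a truncated PDF typically produces, because the cross-reference pointer and `%%EOF` marker are written at the very end. A minimal sketch of a cheap pre-check, assuming the consumer may simply skip a file and retry later: it requires the `%PDF-` header and a `startxref` followed by `%%EOF` near the end of the file. The helper name and the 1 KiB tail size are hypothetical choices, this is a heuristic rather than a full PDF validation, and it is not what paperless itself does.

```python
import os

TAIL_BYTES = 1024  # how much of the end of the file to inspect (generous, arbitrary choice)


def looks_complete_pdf(path: str, tail_bytes: int = TAIL_BYTES) -> bool:
    """Heuristic: a fully written PDF starts with '%PDF-' and has 'startxref'
    followed by '%%EOF' near its end. A half-written file usually fails this.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if f.read(5) != b"%PDF-":
            return False
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    # Use the last occurrences, since incrementally updated PDFs contain several.
    eof = tail.rfind(b"%%EOF")
    startxref = tail.rfind(b"startxref")
    return eof != -1 and startxref != -1 and startxref < eof
```

Combined with a stability check on size and mtime, this rejects most partially transferred scans before ImageMagick or Ghostscript ever sees them.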