-
Hello @mophilly! Sorry for the delay, I was busy with contract work. I'm going to take care of the rest now!
Hmm, that should not happen; it should be transparent as long as the content is not too complex.
Also, use Docling if possible, with vision enabled: everything gets added into the response and the results will be pretty much OK. If the completion strategies like paginate don't work, that's odd; it means the model is too complex. Don't worry, no money needed :)
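Something like this should work (a minimal sketch assuming ExtractThinker's `DocumentLoaderDocling` and `CompletionStrategy` names; exact import paths may differ by version, and `StatementContract` stands in for your model):

```python
from extract_thinker import Extractor, Contract, DocumentLoaderDocling
from extract_thinker.models.completion_strategy import CompletionStrategy

class StatementContract(Contract):
    ...  # your nested pydantic model goes here

extractor = Extractor()
extractor.load_document_loader(DocumentLoaderDocling())  # Docling copes well with dense, layout-heavy PDFs
extractor.load_llm("gpt-4o")  # any vision-capable model

# vision=True sends the page images along with the text, and PAGINATE
# splits the long document into chunks and merges the per-page results
result = extractor.extract(
    "statement.pdf",
    StatementContract,
    vision=True,
    completion_strategy=CompletionStrategy.PAGINATE,
)
```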
-
I will try Docling. The model is rather complex: I used classes with sub-classes to follow the structure of the source documents, so the pydantic model mirrors that using references. That may need to be refactored.
The transaction detail table is a sparse matrix, creating a spreadsheet-like presentation of sale amounts, deductions, and adjustments.
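For illustration, the shape is roughly this (simplified; the field names are invented and the real model has more levels of nesting):

```python
from typing import List, Optional
from pydantic import BaseModel

class TransactionRow(BaseModel):
    # Sparse matrix: most cells are empty on any given row,
    # so every amount column is Optional.
    description: str
    sale_amount: Optional[float] = None
    deduction: Optional[float] = None
    adjustment: Optional[float] = None

class Section(BaseModel):
    title: str
    transactions: List[TransactionRow]

class Statement(BaseModel):
    document_id: str
    sections: List[Section]  # nested to follow the source document's structure
```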
-
I have several sample PDF documents that are dense financial statements. Each is over 60 pages long, and each page has a header that repeats the document IDs and column headings.
I have a script that correctly processes smaller documents with the same layout, two or three pages long. Well, I am still wrangling with retrieving some values in the correct position.
For the longer documents the LLM hits token limits, so I tried the completion strategies, paginate and concatenate. With those, pydantic validation raises a great many errors and aborts the process.
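For reference, the call is roughly this (simplified from my script; loader and parameter names follow ExtractThinker's API as I understand it, and `StatementContract` is a stand-in for my real nested model):

```python
from extract_thinker import Extractor, Contract, DocumentLoaderPyPdf
from extract_thinker.models.completion_strategy import CompletionStrategy

class StatementContract(Contract):
    ...  # nested classes mirroring the statement structure

extractor = Extractor()
extractor.load_document_loader(DocumentLoaderPyPdf())
extractor.load_llm("gpt-4o")

# Fine on the two-to-three page samples; on the 60+ page statements
# both PAGINATE and CONCATENATE end in a flood of pydantic validation errors.
result = extractor.extract(
    "statement_60pp.pdf",
    StatementContract,
    completion_strategy=CompletionStrategy.PAGINATE,
)
```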
I am looking into increasing the limits with the AI providers, to see if that helps.
For now I am stuck. Any advice?
P.S. I am happy to buy you a coffee or send some $$ for one-on-one help.