Feature request
Currently, AmazonTextractPDFLoader uses a fixed default configuration to linearize the output from Textract.
It instantiates TextLinearizationConfig internally and gives the caller no way to pass in different settings.
It would be great if, when instantiating AmazonTextractPDFLoader, we could pass in a TextLinearizationConfig object. That would let us enable options such as ignoring headers and footers, along with any of the many other configuration options the linearizer supports.
Motivation
Once the document is loaded, I only get the text string for each page. At that point I can no longer customize anything on my end based on the raw Textract output, because that output is consumed during loading and is not stored anywhere for reference.
Proposal (If applicable)
Add a new optional parameter, linearization_config, to AmazonTextractPDFParser. If it is passed, it is used instead of the config the parser instantiates internally.
Add the same parameter to AmazonTextractPDFLoader and pass it down to AmazonTextractPDFParser when instantiating it.
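A minimal sketch of the proposed plumbing, assuming the design described above. The classes are hypothetical stand-ins, not LangChain's or textractor's actual code: LinearizationConfig models only two illustrative options (hide_header_layout and hide_footer_layout, assumed names), SketchTextractParser plays the role of AmazonTextractPDFParser, and SketchTextractLoader plays the role of AmazonTextractPDFLoader. The parser uses the caller's config when one is given and otherwise builds its own default; the loader only forwards the parameter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinearizationConfig:
    """Hypothetical stand-in for textractor's TextLinearizationConfig.

    Only two illustrative options are modeled; the real class has many more.
    """

    hide_header_layout: bool = False
    hide_footer_layout: bool = False


class SketchTextractParser:
    """Hypothetical stand-in for AmazonTextractPDFParser."""

    def __init__(self, linearization_config: Optional[LinearizationConfig] = None):
        # Prefer the caller's config; otherwise fall back to a fresh default,
        # which is what the parser would have built internally anyway.
        if linearization_config is None:
            linearization_config = LinearizationConfig()
        self.linearization_config = linearization_config


class SketchTextractLoader:
    """Hypothetical stand-in for AmazonTextractPDFLoader."""

    def __init__(
        self,
        file_path: str,
        linearization_config: Optional[LinearizationConfig] = None,
    ):
        self.file_path = file_path
        # Pass the (possibly None) config straight down to the parser.
        self.parser = SketchTextractParser(linearization_config=linearization_config)


if __name__ == "__main__":
    # Caller-side usage: ask the linearizer to drop headers and footers.
    custom = LinearizationConfig(hide_header_layout=True, hide_footer_layout=True)
    loader = SketchTextractLoader("example.pdf", linearization_config=custom)
    print(loader.parser.linearization_config)
```

Defaulting the parameter to None keeps the change backward compatible: callers that pass nothing still get the parser's built-in default, and no single config instance is shared across parser objects.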