0.16.21
Enhancements
- Use password to load PDF with all modes
- Use vectorized logic to merge inferred and extracted layouts. The layout merging logic is refactored around the new LayoutElements data structure and the numpy library, which improves compute performance and makes the logic clearer.
- Add PDF Miner configuration. PDF Miner can now be configured via the pdfminer_line_overlap, pdfminer_word_margin, pdfminer_line_margin, and pdfminer_char_margin parameters added to the partition method.
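
As a rough illustration of the PDF Miner options above, the sketch below maps the four pdfminer_* parameter names onto the layout-analysis field names used by pdfminer.six's LAParams (line_overlap, word_margin, line_margin, char_margin). The helper `to_laparams_kwargs` is hypothetical and not part of the library, and the one-to-one mapping by prefix is an assumption, not a description of the actual implementation.

```python
# Hypothetical helper, not part of the library: shows how the four
# pdfminer_* options could line up with pdfminer.six LAParams field names.
PDFMINER_OPTIONS = (
    "pdfminer_line_overlap",
    "pdfminer_word_margin",
    "pdfminer_line_margin",
    "pdfminer_char_margin",
)


def to_laparams_kwargs(**options):
    """Strip the pdfminer_ prefix and drop options left as None."""
    unknown = set(options) - set(PDFMINER_OPTIONS)
    if unknown:
        raise TypeError(f"unexpected options: {sorted(unknown)}")
    return {
        name[len("pdfminer_"):]: value
        for name, value in options.items()
        if value is not None  # None means "keep the default" in this sketch
    }


laparams_kwargs = to_laparams_kwargs(
    pdfminer_line_margin=0.3,
    pdfminer_char_margin=1.5,
    pdfminer_word_margin=None,
)
# With pdfminer.six installed, these could be passed on as
# LAParams(**laparams_kwargs).
```

Options left unset fall through to the defaults of whatever consumes them, so only the values a caller actually changes need to be spelled out.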
Features
Fixes
- Fix file type detection for NDJSON files. NDJSON files were being detected as JSON because both formats share the same MIME type.
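
Because the MIME type alone cannot separate the two formats, some content-based check is needed. The following is a minimal sketch of one such heuristic using only the standard library. The function name `classify_json_text` is hypothetical, and this is not necessarily how the fix is implemented.

```python
import json


def classify_json_text(text: str) -> str:
    """Return "json", "ndjson", or "unknown" for the given text."""
    # A regular JSON file parses as one document, even when pretty-printed.
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        pass

    # NDJSON: every non-empty line must be a JSON value on its own.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    try:
        for line in lines:
            json.loads(line)
    except json.JSONDecodeError:
        return "unknown"
    return "ndjson"
```

A file holding a single record on one line is valid as both JSON and NDJSON. This sketch reports it as "json", so a real detector would likely also consult the file extension.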