0.16.21

@plutasnyy released this on 17 Feb 16:01
3403db1

Enhancements

  • Use the supplied password to load PDFs in all modes

  • Use vectorized logic to merge inferred and extracted layouts. The layout-merging logic was refactored to use the new LayoutElements data structure and NumPy, which improves compute performance and makes the logic clearer.

  • Add PDF Miner configuration. PDF Miner can now be configured through the pdfminer_line_overlap, pdfminer_word_margin, pdfminer_line_margin and pdfminer_char_margin parameters, which have been added to the partition method.
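
The vectorized layout merge in the enhancements above replaces per-pair Python loops with array operations. The following is a minimal sketch of that idea. It is not the library's merge logic. It assumes boxes are NumPy arrays of (x1, y1, x2, y2) rows, and it uses a simplified rule with a hypothetical threshold: an extracted box is dropped when an inferred box already covers it. The names `pairwise_coverage` and `merge_layouts` are hypothetical.

```python
import numpy as np


def pairwise_coverage(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Fraction of each box in boxes_a that is covered by each box in boxes_b.

    Boxes are (x1, y1, x2, y2) rows. Returns an array of shape (len(a), len(b)).
    """
    # Column slices give shape (N, 1); row slices give shape (M,).
    # Broadcasting them together evaluates all N x M pairs with no Python loop.
    ax1, ay1, ax2, ay2 = (boxes_a[:, i:i + 1] for i in range(4))
    bx1, by1, bx2, by2 = (boxes_b[:, i] for i in range(4))
    inter_w = np.clip(np.minimum(ax2, bx2) - np.maximum(ax1, bx1), 0, None)
    inter_h = np.clip(np.minimum(ay2, by2) - np.maximum(ay1, by1), 0, None)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    # Guard against zero-area boxes so the division is always defined.
    return (inter_w * inter_h) / np.maximum(area_a, 1e-9)


def merge_layouts(inferred, extracted, threshold: float = 0.8) -> np.ndarray:
    """Hypothetical merge: keep every inferred box plus uncovered extracted boxes."""
    inferred = np.asarray(inferred, dtype=float).reshape(-1, 4)
    extracted = np.asarray(extracted, dtype=float).reshape(-1, 4)
    if len(inferred) == 0:
        return extracted
    # Shape (n_extracted, n_inferred): one coverage value per pair.
    coverage = pairwise_coverage(extracted, inferred)
    keep = coverage.max(axis=1) < threshold
    return np.vstack([inferred, extracted[keep]])
```

With one inferred box [0, 0, 10, 10], an extracted box fully inside it is dropped. Boxes that do not overlap it, or that overlap it only partially, are kept. The whole comparison is a single broadcasted computation instead of a nested loop.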

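The four pdfminer_* parameters correspond to fields of pdfminer.six's `LAParams` (line_overlap, word_margin, line_margin, char_margin). The sketch below shows one way such prefixed keyword arguments could be collected into `LAParams` keyword arguments. The helper `laparams_kwargs` is hypothetical and is not part of the library. It falls back to pdfminer.six's own `LAParams` defaults, and the library's actual defaults may differ.

```python
from typing import Any, Dict

# pdfminer.six LAParams defaults for the four fields this release exposes.
LAPARAMS_DEFAULTS: Dict[str, float] = {
    "line_overlap": 0.5,
    "char_margin": 2.0,
    "line_margin": 0.5,
    "word_margin": 0.1,
}

PREFIX = "pdfminer_"


def laparams_kwargs(**partition_kwargs: Any) -> Dict[str, float]:
    """Hypothetical helper: map pdfminer_* keyword args to LAParams keyword args.

    Unset or None values keep the default. Keyword args without the
    pdfminer_ prefix (for example, other partition options) are ignored.
    """
    params = dict(LAPARAMS_DEFAULTS)
    for key, value in partition_kwargs.items():
        if not key.startswith(PREFIX) or value is None:
            continue
        name = key[len(PREFIX):]
        if name not in LAPARAMS_DEFAULTS:
            raise ValueError(f"unsupported PDF Miner option: {key}")
        params[name] = float(value)
    return params
```

If pdfminer.six is installed, the result can be unpacked into its layout parameters, for example `LAParams(**laparams_kwargs(pdfminer_line_margin=0.3))`. According to the release note, callers of this version pass the same names directly to partition.
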
Features

Fixes

  • Fix file type detection for NDJSON files. NDJSON files were being detected as JSON because both were reported with the same MIME type.
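
When the MIME type cannot separate the two formats, the content itself can. The following is a minimal content-sniffing sketch. It is a hypothetical helper, not the library's detection code. A multi-record NDJSON file fails to parse as one JSON document, but each of its non-empty lines parses on its own.

```python
import json
from typing import Optional


def sniff_json_type(text: str) -> Optional[str]:
    """Classify text as "json", "ndjson", or None (neither).

    Hypothetical helper: it inspects the content instead of relying on the MIME type.
    """
    # A single JSON document (possibly pretty-printed over many lines).
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        pass
    # Otherwise, require every non-empty line to be a standalone JSON value.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    try:
        for line in lines:
            json.loads(line)
    except json.JSONDecodeError:
        return None
    return "ndjson"
```

A one-record NDJSON file is also valid JSON, so this sketch reports it as "json". A real detector would need a tiebreaker, such as the file extension, for that case.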