Releases: dhdaines/playa
PLAYA-PDF 0.2.9: Final (really) 0.2 release
What's Changed
- fix: Support the all-important empty name object
- feat!: Break the CLI again (ZeroVer YOLO) to better support page ranges
- feat: Support some limited and lossy text extraction in the CLI
- feat: Add necessary `.doc` property to page list
- fix: Correct type annotations for page list
Full Changelog: v0.2.8...v0.2.9
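The empty name fix is easier to picture with an example: in PDF syntax a lone `/` is a valid name object whose value is the empty string. The following is a minimal sketch of a name tokenizer that accepts it. It is not PLAYA's actual lexer, and `parse_name` is a hypothetical helper introduced only for illustration.

```python
import re

# Name characters: anything except PDF whitespace and delimiters.
# Using `*` rather than `+` is what lets a lone "/" match as the empty name.
NAME_RE = re.compile(rb"/([^\x00\t\n\x0c\r ()<>\[\]{}/%]*)")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def parse_name(data: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Parse a PDF name starting at `pos`; return (name, end position).

    Hypothetical helper, not part of PLAYA's API.
    """
    m = NAME_RE.match(data, pos)
    if m is None:
        raise ValueError(f"no name object at offset {pos}")
    # Decode #xx escapes, so that "A#20B" becomes b"A B"
    name = HEX_ESCAPE_RE.sub(lambda h: bytes([int(h.group(1), 16)]), m.group(1))
    return name, m.end()
```

With this sketch, `parse_name(b"/ /Type")` yields the empty name and stops right after the slash, so the following `/Type` is still parsed as its own token.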
PLAYA-PDF 0.2.8: Much Improved Parallelism (M. I. P.)
PLAYA-PDF 0.2.7: Definitive 0.2.x release
What's Changed
- Remove most uses of `typing.cast` by @dhdaines in #37
- Optimize text placement (some dare call it "rendering") by @dhdaines in #38
- Fix font size and rotated/skewed bounding boxes by @dhdaines in #39
- fix: deprecate layout in CLI right away and do other useful stuff by @dhdaines in #40
- Correctly implement ToUnicode according to the PDF standard and not that bogus technical note (that the PDF standard refers to...) by @dhdaines in #41
- feat: support slices and tuples in page list by @dhdaines in #42
- Optimize text extraction a bit more by @dhdaines in #43
- Make text less Lazy 😥 by @dhdaines in #47
- Treat marked content sections (more) correctly
- fix: recognize junk before header and compensate (fixes: #46) by @dhdaines in #48
Full Changelog: v0.2.6...v0.2.7
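To illustrate the "junk before header" fix: some files carry stray bytes before `%PDF-`, which shifts every byte offset recorded in the file. A minimal sketch of one way to compensate follows. It assumes the recorded offsets are relative to the header, and the function names and the 1024-byte search window are assumptions for illustration, not PLAYA's implementation.

```python
def find_header_offset(data: bytes, search_limit: int = 1024) -> int:
    """Return the position of "%PDF-" within the first `search_limit` bytes.

    The whole marker must lie inside that window (hypothetical helper).
    """
    offset = data.find(b"%PDF-", 0, search_limit)
    if offset == -1:
        raise ValueError("no PDF header found")
    return offset


def compensate(recorded_offset: int, header_offset: int) -> int:
    """Translate an offset recorded in the file into a real file position.

    Assumes recorded offsets were computed relative to the header.
    """
    return recorded_offset + header_offset
```

Every offset read from the cross-reference data would then pass through `compensate` before seeking.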
PLAYA-PDF 0.2.6: New year, new acronym
What's Changed
- ci: test on windows and mac by @dhdaines in #33
- Support parallel operations over pages by @dhdaines in #36
- Partially correct the handling of some types of CMaps (not fully correct though)
Full Changelog: v0.2.5...v0.2.6
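As a rough picture of what "parallel operations over pages" means, here is a minimal sketch that maps a function over a sequence of pages with a worker pool from the standard library. `map_pages` and its parameters are hypothetical stand-ins, the "pages" can be any objects, and nothing here reflects how PLAYA implements parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def map_pages(
    func: Callable[[P], R], pages: Iterable[P], max_workers: int = 2
) -> Iterator[R]:
    """Apply `func` to every page using a worker pool (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order,
        # even when workers finish out of order.
        yield from pool.map(func, pages)
```

Threads keep the sketch simple. For CPU-bound work in CPython a `ProcessPoolExecutor` is the usual substitute, at the cost of requiring `func` and the page objects to be picklable.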
PLAYA-PDF 0.2.5: Bug fixes and improvements
What's Changed
- Fix various bugs in the lazy API
- Add specialized `__len__` methods to `ContentObject` classes
- Clarify iteration over `ContentObject`
- Fix installation of playa-pdf[crypto]
- Fix attribute classes in structure tree elements
- Deprecate "user" device space to avoid confusion with user space
- Parse embedded CMaps (mostly)
- Update `pdfplumber` support
- Add parser for object streams and iterator over all indirect objects in a document
Full Changelog: v0.2.4...v0.2.5
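For readers unfamiliar with object streams: the decoded stream starts with `N` pairs of integers (object number, byte offset), and the objects themselves begin at byte `First`, with each offset relative to that point. A minimal sketch of splitting such a stream is shown below. `split_object_stream` is a hypothetical helper, it assumes the data is already decoded and the offsets are in increasing order, and it is not PLAYA's parser.

```python
def split_object_stream(data: bytes, n: int, first: int) -> list[tuple[int, bytes]]:
    """Split decoded object stream data into (object number, raw object bytes).

    `n` and `first` correspond to the stream dictionary's /N and /First entries.
    """
    numbers = [int(tok) for tok in data[:first].split()]
    if len(numbers) < 2 * n:
        raise ValueError("object stream index is truncated")
    # Even positions are object numbers, odd positions are offsets
    pairs = list(zip(numbers[0 : 2 * n : 2], numbers[1 : 2 * n : 2]))
    objects = []
    for i, (objnum, offset) in enumerate(pairs):
        # Each object runs until the next object's offset, or the end of the data
        end = first + pairs[i + 1][1] if i + 1 < len(pairs) else len(data)
        objects.append((objnum, data[first + offset : end].strip()))
    return objects
```

The raw bytes of each object would still need to be handed to an object parser. This sketch only recovers the boundaries.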
v0.2.4
PLAYA-PDF 0.2.3: Release early and often (before vacation)
What's Changed
- Require a newline before EI to fix various inline images by @dhdaines in #25
- Refactoring the CMap parser missed a very important corner case (which somehow mypy did not flag?)
- `structtree` property did not actually exist on `Document` and `Page` (oops!)
Full Changelog: v0.2.2...v0.2.3
PLAYA-PDF 0.2.2: Make it go fast again
PLAYA-PDF 0.2.1: Fix some bugs
What's Changed
- Fix the RLE implementation by @dhdaines in #19 (originally pdfminer/pdfminer.six#1055 by @helpmefindaname)
- Report the actual device space bounding box for rotated text by @dhdaines in #20
- Prevent endless looping on bogus stream length and other EOFs by @dhdaines in #21
Full Changelog: v0.2...v0.2.1
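For context on the RLE fix, the PDF `RunLengthDecode` filter works on a length byte: values 0 to 127 mean "copy the next length + 1 bytes literally", values 129 to 255 mean "repeat the next byte 257 - length times", and 128 marks end of data. The following is a minimal sketch of that scheme, not the code from the pull request, and `rle_decode` is a name chosen for illustration.

```python
def rle_decode(data: bytes) -> bytes:
    """Decode RunLengthDecode data (illustrative sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:
            break  # end-of-data marker
        if length < 128:
            # Literal run: copy the next length + 1 bytes as they are
            out += data[i : i + length + 1]
            i += length + 1
        else:
            if i >= len(data):
                break  # truncated input: no byte left to repeat
            # Repeated run: one byte repeated 257 - length times
            out += data[i : i + 1] * (257 - length)
            i += 1
    return bytes(out)
```

Because `i` always advances, the loop terminates on truncated input even when the end-of-data marker is missing.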
PLAYA-PDF 0.2: Break all the APIs
What's Changed
- Support TIFF predictor on image streams by @dhdaines in #18 (originally from pdfminer/pdfminer.six#1058 by @helpmefindaname)
- Support different "device spaces" (screen, page, and default user space)
- expose form XObjects on Page to allow getting only their contents
- expose form XObject IDs in LayoutDict
- make TextState conform to PDF spec (leading and line matrix) and document it
- expose more of TextState in LayoutDict (render mode in particular)
- do not try to map characters with no ToUnicode and no Encoding
- properly support Pattern color space (uncolored tiling patterns) the way pdfplumber expects it to work
- support marked content points as ContentObjects
- document ContentObjects
- make a proper schema for LayoutDict, document it, and communicate it to Polars
- separate color values and patterns in LayoutDict
Full Changelog: v0.1.2...v0.2
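To show what the TIFF predictor involves: with Predictor 2, each sample in a row is stored as the difference from the sample of the same color component in the previous pixel, so decoding is a running sum along each row. Here is a minimal sketch, assuming 8 bits per component. `undo_tiff_predictor` is a hypothetical name, the parameters mirror the `/Columns` and `/Colors` decode parameters, and this is not the code from the pull request.

```python
def undo_tiff_predictor(data: bytes, columns: int, colors: int = 1) -> bytes:
    """Undo TIFF Predictor 2, assuming 8 bits per component (sketch only)."""
    row_size = columns * colors
    out = bytearray(data)
    for row_start in range(0, len(out), row_size):
        row_end = min(row_start + row_size, len(out))
        # The first pixel of each row is stored as-is; later samples are
        # deltas against the same component of the previous pixel.
        for i in range(row_start + colors, row_end):
            out[i] = (out[i] + out[i - colors]) & 0xFF
    return bytes(out)
```

Other bit depths need the samples unpacked before summing, which this sketch deliberately leaves out.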