From e5fab217befef1e12e8dcbb8ed6714a1bd94758e Mon Sep 17 00:00:00 2001
From: Ronny H <138828701+ron-unstructured@users.noreply.github.com>
Date: Fri, 8 Mar 2024 09:54:13 -0800
Subject: [PATCH] Unstructured v0.12.6 release (#2626)

## 0.12.6

### Enhancements

* **Improve ability to capture embedded links in `partition_pdf()` for `fast` strategy** Previously, a threshold value that affects the capture of embedded links was set to a fixed value by default. Users can now specify the threshold value to improve capture.
* **Refactor `add_chunking_strategy` decorator to dispatch by name.** Add a `chunk()` function, used by the `add_chunking_strategy` decorator, that dispatches the chunking call based on a chunking-strategy name (which can be dynamic at runtime). This decouples chunking dispatch from only those chunkers known at "compile" time and enables runtime registration of custom chunkers.

### Features

* **Added Unstructured Platform Documentation** The Unstructured Platform is currently in beta. The documentation provides how-to guides for setting up workflow automation, job scheduling, and configuring source and destination connectors.

### Fixes

* **Partitioning raises on file-like object with `.name` not a local file path.** Previously, when partitioning a file using the `file=` argument, where `file` is a file-like object (e.g. `io.BytesIO`) having a `.name` attribute whose value is not a valid path to a file present on the local filesystem, `FileNotFoundError` was raised. This prevented use of the `file.name` attribute for downstream purposes, for example to describe the source of a document retrieved from a network location via HTTP.
* **Fix SharePoint dates with inconsistent formatting** Adds logic to conditionally support dates returned by office365 that may vary in date formatting or may be a datetime rather than a string.
* **Include warnings** about the potential risk of installing a version of `pandoc` that does not support RTF files, plus instructions that help resolve that issue.
* **Incorporate the `install-pandoc` Makefile recipe** into relevant stages of the CI workflow, ensuring it is a version that supports RTF input files.
* **Fix Google Drive source key** Allow passing a string for the source connector key.
* **Fix table structure evaluation calculations** Replaced the special value `-1.0` with `np.nan` and corrected the row filtering of file metrics accordingly.
* **Fix Sharepoint-with-permissions test** Ignore permissions metadata, update test.
* **Fix table structure evaluations for edge case** Fixes the case where the prediction does not contain any table; it no longer raises an error.
---
 CHANGELOG.md                | 2 +-
 unstructured/__version__.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9dac5a6702..51c184ff3e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-## 0.12.6-dev9
+## 0.12.6
 
 ### Enhancements
 
diff --git a/unstructured/__version__.py b/unstructured/__version__.py
index f072482913..9e269170ee 100644
--- a/unstructured/__version__.py
+++ b/unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.12.6-dev9" # pragma: no cover
+__version__ = "0.12.6" # pragma: no cover
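
As a reviewer's aid for the "dispatch by name" enhancement described in the commit message, the idea can be pictured with a small standalone sketch. This is not the library's code: the registry `_CHUNKERS`, the helper `register_chunker`, the toy chunker `pair_up`, and this simplified `chunk()` signature are all hypothetical and only illustrate looking up a chunker by a name supplied at runtime.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a strategy name to a chunker callable.
# Plain strings stand in for document elements to keep the sketch standalone.
Chunker = Callable[[List[str]], List[str]]
_CHUNKERS: Dict[str, Chunker] = {}


def register_chunker(name: str, chunker: Chunker) -> None:
    """Register a chunker under a name at runtime (illustrative helper)."""
    _CHUNKERS[name] = chunker


def chunk(elements: List[str], chunking_strategy: str) -> List[str]:
    """Look up the chunker registered under the given name and apply it."""
    try:
        chunker = _CHUNKERS[chunking_strategy]
    except KeyError:
        raise ValueError(
            f"unrecognized chunking strategy {chunking_strategy!r}"
        ) from None
    return chunker(elements)


def pair_up(elements: List[str]) -> List[str]:
    """Toy chunker: join consecutive pairs of text elements."""
    return [" ".join(elements[i : i + 2]) for i in range(0, len(elements), 2)]


# A custom chunker becomes available without changing the dispatcher.
register_chunker("pairs", pair_up)
result = chunk(["a", "b", "c"], "pairs")
```

Because the lookup happens when `chunk()` is called rather than when the module is imported, a caller can register a strategy after import and select it by name, which is the decoupling the changelog entry describes.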