0.12.4
Enhancements
Apply New Version of black formatting. The black library recently released a new major version with new formatting conventions. This change brings code in the unstructured repo into compliance with those conventions.
Move ingest imports to local scopes. Moved ingest dependencies into local scopes so that ingest connector classes can be imported without installing their external dependencies. This allows lightweight use of the classes (not the instances; to use the instances as intended, you still need the dependencies).
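The local-scope import pattern can be sketched as follows. This is a minimal illustration, not unstructured's connector code: `HypotheticalConnector` and the package `hypothetical_vendor_sdk` are made-up names. The class can be imported and constructed without the package; only calling the method that needs it triggers the import.

```python
class HypotheticalConnector:
    """Hypothetical connector; importing or constructing it needs no extra packages."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def upload(self, payload: dict) -> None:
        # The external dependency (hypothetical name) is imported only when an
        # instance actually does work, so a missing package surfaces here as
        # ModuleNotFoundError instead of at class-import time.
        import hypothetical_vendor_sdk

        hypothetical_vendor_sdk.send(self.endpoint, payload)
```

The trade-off is that a missing dependency is reported later, at first use, rather than when the module is imported.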
Add support for .p7s files. partition_email can now process .p7s files. The signature for the signed message is extracted and added to metadata.
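A .p7s signature normally travels as an `application/pkcs7-signature` part of a `multipart/signed` message. The sketch below shows one way to locate and decode such a part with Python's standard `email` package. It is not how partition_email is implemented. The helper name `extract_p7s_signature` is hypothetical, and the base64 payload is placeholder bytes, not a real signature.

```python
from email import message_from_string
from email.message import Message
from typing import Optional

# Toy signed message; the signature payload is placeholder data, not a real PKCS#7 blob.
SIGNED_EMAIL = """\
MIME-Version: 1.0
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-256; boundary="XYZ"
Subject: Signed hello

--XYZ
Content-Type: text/plain

Hello, world!
--XYZ
Content-Type: application/pkcs7-signature; name="smime.p7s"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="smime.p7s"

AAECAwQ=
--XYZ--
"""

SIGNATURE_TYPES = ("application/pkcs7-signature", "application/x-pkcs7-signature")


def extract_p7s_signature(msg: Message) -> Optional[bytes]:
    """Return the raw bytes of the first PKCS#7 signature part, or None."""
    for part in msg.walk():
        if part.get_content_type() in SIGNATURE_TYPES:
            # decode=True undoes the base64 Content-Transfer-Encoding.
            return part.get_payload(decode=True)
    return None


msg = message_from_string(SIGNED_EMAIL)
signature = extract_p7s_signature(msg)
```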
Fallback to valid content types for emails. If the user-selected content type does not exist on the email message, partition_email now falls back to another valid content type if one is available.
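The fallback behavior can be illustrated with the standard `email` package. This is a hypothetical sketch (`pick_body` is not an unstructured function). It assumes `text/plain` and `text/html` are the candidate body types, tries the requested type first, and then falls back to the other.

```python
from email.message import EmailMessage
from typing import Optional, Tuple

CANDIDATE_TYPES = ("text/plain", "text/html")  # assumed set of usable body types


def pick_body(msg: EmailMessage, preferred: str = "text/plain") -> Optional[Tuple[str, str]]:
    """Return (content_type, text) for the preferred type, else the first available fallback."""
    candidates = [preferred] + [t for t in CANDIDATE_TYPES if t != preferred]
    for content_type in candidates:
        for part in msg.walk():
            if not part.is_multipart() and part.get_content_type() == content_type:
                return content_type, part.get_content()
    return None  # no usable text body at all


plain_only = EmailMessage()
plain_only["Subject"] = "Plain only"
plain_only.set_content("Just plain text.")

# The caller asks for HTML, but the message only has a text/plain body.
chosen = pick_body(plain_only, preferred="text/html")
```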
Features
Add .heic file partitioning. .heic image files were previously unsupported and are now supported through partition_image().
Add the ability to use an alternate OCR implementation by implementing the OCRAgent interface and selecting it with the OCR_AGENT environment variable.
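The idea of a pluggable OCR backend chosen by an environment variable can be sketched as below. Everything here is hypothetical: the abstract method name `extract_text`, the toy `EchoOCRAgent`, and the name-to-class registry. The real OCRAgent interface and the values OCR_AGENT accepts (for example, a module path rather than a short name) may differ, so check the unstructured documentation.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Type


class OCRAgent(ABC):
    """Stand-in for an OCR agent interface (hypothetical method name and signature)."""

    @abstractmethod
    def extract_text(self, image_bytes: bytes, lang: str) -> str:
        ...


class EchoOCRAgent(OCRAgent):
    """Toy agent that 'recognizes' text by decoding the bytes; illustration only."""

    def extract_text(self, image_bytes: bytes, lang: str) -> str:
        return image_bytes.decode("utf-8")


# Hypothetical registry mapping OCR_AGENT values to implementations.
OCR_AGENTS: Dict[str, Type[OCRAgent]] = {"echo": EchoOCRAgent}


def get_ocr_agent() -> OCRAgent:
    """Instantiate the agent named by the OCR_AGENT environment variable."""
    name = os.environ.get("OCR_AGENT", "echo")
    try:
        return OCR_AGENTS[name]()
    except KeyError:
        raise ValueError(f"Unknown OCR_AGENT {name!r}; expected one of {sorted(OCR_AGENTS)}") from None
```

Selecting the implementation at run time keeps callers coded against the interface, so swapping OCR engines needs no changes at the call sites.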
Add Vectara destination connector. Adds support for writing partitioned documents into a Vectara index.
Fixes
Fix partition_pdf() not working when using the chipper model with the file parameter.
Handle common incorrect arguments for languages and ocr_languages. Users regularly receive errors from the API because they pass ocr_languages or languages with extra quotation marks, brackets, and similar mistakes. This update handles common incorrect arguments and raises an appropriate warning.
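The kind of cleanup described above can be sketched as follows. This is a hypothetical helper, not unstructured's actual code: `normalize_languages` strips stray quotes and brackets, splits on `+` and commas, and warns only when quotes or brackets were present. The exact rules in the library may differ.

```python
import re
import warnings
from typing import List, Optional, Union

MALFORMED_CHARS = re.compile(r"[\"'\[\]]")  # quotes and brackets that signal a mistake


def normalize_languages(value: Optional[Union[str, List[str]]]) -> Optional[List[str]]:
    """Hypothetical cleanup for arguments like '["eng", "fra"]' or '"eng+fra"'."""
    if value is None:
        return None
    items = value if isinstance(value, list) else [value]
    cleaned: List[str] = []
    for item in items:
        # Split Tesseract-style 'eng+fra' and comma-separated lists into tokens.
        for token in re.split(r"[+,]", item):
            token = token.strip().strip("\"'[]").strip()
            if token:
                cleaned.append(token)
    if any(MALFORMED_CHARS.search(item) for item in items):
        warnings.warn(f"Malformed languages argument {value!r}; interpreted as {cleaned!r}")
    return cleaned
```

For example, `normalize_languages('["eng", "fra"]')` returns `["eng", "fra"]` and emits a warning, while `normalize_languages(["eng"])` returns `["eng"]` silently.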
Default hi_res_model_name now relies on unstructured-inference. When no explicit hi_res_model_name is passed to partition or partition_pdf_or_image, the default model is picked from unstructured-inference's settings or the UNSTRUCTURED_HI_RES_MODEL_NAME environment variable. The function that picks this default now returns the same model name regardless of the value of infer_table_structure. That function will be deprecated in a future release, after which the default model name will rely solely on unstructured-inference and the environment variable will no longer be considered.
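The resolution order can be sketched like this. Two things are assumptions: that the environment variable takes precedence over the inference package's default, and the placeholder constant `INFERENCE_DEFAULT_MODEL`, which stands in for the real default defined in unstructured-inference. `resolve_hi_res_model_name` is a hypothetical name.

```python
import os
from typing import Optional

# Placeholder for unstructured-inference's built-in default (assumed; the real value lives there).
INFERENCE_DEFAULT_MODEL = "hypothetical-default-model"


def resolve_hi_res_model_name(hi_res_model_name: Optional[str] = None) -> str:
    """Explicit argument wins, then UNSTRUCTURED_HI_RES_MODEL_NAME, then the inference default."""
    if hi_res_model_name:
        return hi_res_model_name
    # infer_table_structure is deliberately not an input: the default no longer depends on it.
    # The variable is read at call time, so setting it after import still takes effect.
    return os.environ.get("UNSTRUCTURED_HI_RES_MODEL_NAME", INFERENCE_DEFAULT_MODEL)
```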
Remove Vectara requirements from setup.py, since the Vectara connector has no dependencies.
Add missing dependency files to package manifest. Updates the file path for the ingest dependencies and adds missing extra dependencies.
Add title to Vectara upload; the title was not separated out in the initial connector.
Change the OpenSearch port to avoid a potential conflict with Elasticsearch in the ingest test.