
0.12.3

ahmetmeleq released this 29 Jan 14:41
4e226d8

Enhancements

  • Driver for MongoDB connector. Adds driver information carrying the unstructured version to the
    MongoDB connector.

Features

  • Add Databricks Volumes destination connector. A Databricks Volumes connector has been added to the ingest CLI. Users can now use unstructured-ingest to write partitioned data to a Databricks Volumes storage service.

Fixes

  • Fix support for different Chipper versions and prevent running PDFMiner with Chipper.
  • Treat YAML files as text. Adds YAML MIME types to the file detection code and treats those
    files as text.
  • Fix FSSpec destination connectors check_connection. FSSpec destination connectors did not use check_connection, and listing (ls) the destination directory raised an error because that directory may not exist yet when the connector is created. check_connection now calls ls on the bucket root, and it is called when the destination connector is initialized.
  • Fix databricks-volumes extra location. setup.py pointed to the wrong location for the databricks-volumes extra requirements, which caused errors when building the wheel for unstructured. This change updates it to point to the correct path.
  • Fix uploading None values to Chroma and Pinecone. Removes keys with None values before writing to the Pinecone and Chroma destinations. Pins the Pinecone dependency.
  • Update documentation. (i) Documents the best practice for table extraction: use the 'skip_infer_table_types' param instead of 'pdf_infer_table_structure'; (ii) fixes CSS and RST issues and typos in the documentation.
  • Fix postgres storage of link_texts. Formatting of link_texts was breaking metadata storage.
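The YAML fix amounts to widening the set of MIME types that file detection routes to the text partitioner. The sketch below illustrates that idea only; the set `YAML_MIME_TYPES` and the function `is_text_like` are hypothetical names, not unstructured's actual detection code, and the exact MIME list it uses may differ.

```python
# Hypothetical list of YAML MIME types; unstructured's real list may differ.
YAML_MIME_TYPES = {
    "application/yaml",
    "application/x-yaml",
    "text/yaml",
    "text/x-yaml",
}


def is_text_like(mime_type: str) -> bool:
    """Return True if a file with this MIME type should be partitioned as text (hypothetical helper)."""
    # text/* is already text; YAML types are listed explicitly because
    # some of them are registered under application/*.
    return mime_type.startswith("text/") or mime_type in YAML_MIME_TYPES
```

With this check, `is_text_like("application/x-yaml")` is `True`, while `is_text_like("application/pdf")` stays `False`.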
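The FSSpec check_connection fix checks the bucket root instead of a destination directory that may not exist yet, and runs that check at initialization. A minimal sketch of the pattern follows. `DestinationConnector`, `FileSystem`, and `FakeFS` are hypothetical stand-ins written for illustration; they are not the unstructured or fsspec classes.

```python
from typing import Protocol


class FileSystem(Protocol):
    """Anything with an fsspec-like ls(path) method (hypothetical protocol)."""

    def ls(self, path: str) -> list: ...


class DestinationConnector:
    """Hypothetical FSSpec-style destination connector."""

    def __init__(self, fs: FileSystem, bucket_root: str, dest_dir: str) -> None:
        self.fs = fs
        self.bucket_root = bucket_root
        self.dest_dir = dest_dir  # may not exist yet; created on first write
        # Validate access as soon as the connector is initialized.
        self.check_connection()

    def check_connection(self) -> None:
        # List the bucket root, not dest_dir: the root must be reachable,
        # whereas dest_dir can legitimately be missing at this point.
        try:
            self.fs.ls(self.bucket_root)
        except Exception as exc:
            raise ConnectionError(f"cannot access {self.bucket_root!r}") from exc


class FakeFS:
    """In-memory stand-in: ls succeeds only for paths that 'exist'."""

    def __init__(self, existing: set) -> None:
        self.existing = existing

    def ls(self, path: str) -> list:
        if path not in self.existing:
            raise FileNotFoundError(path)
        return []
```

Here `DestinationConnector(FakeFS({"s3://bucket"}), "s3://bucket", "s3://bucket/out")` succeeds even though `s3://bucket/out` does not exist yet. An unreachable bucket root raises `ConnectionError` at construction time.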
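For the Chroma and Pinecone fix, "removes keys with None values" can be pictured as a dict filter applied to each element's metadata before upload. This is a sketch of the idea; `drop_none_values` is a hypothetical helper name, not the function used in unstructured.

```python
from typing import Any, Dict


def drop_none_values(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of metadata without top-level keys whose value is None (hypothetical helper)."""
    return {key: value for key, value in metadata.items() if value is not None}
```

Note the explicit `is not None` test: falsy values such as `0`, `""`, or `False` are kept, so a `page_number` of `0` is not silently dropped. Nested dicts are not filtered by this sketch.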
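The notes do not say how link_texts was formatted or how the fix works. One common pitfall when storing a list of strings in Postgres is building the array literal by hand without quoting, so a link text that contains a comma, quote, or brace corrupts the value. The sketch below assumes link_texts is written to a `text[]` column as a literal. `to_pg_text_array` is a hypothetical helper. With a driver such as psycopg2, passing the Python list as a query parameter and letting the driver adapt it is usually the simpler option.

```python
from typing import List, Optional


def to_pg_text_array(items: List[Optional[str]]) -> str:
    """Format strings as a Postgres text[] literal (hypothetical helper)."""
    parts = []
    for item in items:
        if item is None:
            parts.append("NULL")  # unquoted NULL is an SQL null element
            continue
        # Escape backslashes first, then double quotes, then wrap in quotes so
        # commas and braces inside the text are not read as delimiters.
        escaped = item.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'"{escaped}"')
    return "{" + ",".join(parts) + "}"
```

For example, `to_pg_text_array(["a", "x,y"])` produces `{"a","x,y"}`, keeping `x,y` as a single element.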