Releases: Unstructured-IO/unstructured

0.4.16

28 Feb 04:50
5eaf449

Enhancements

  • Fall back to using file extensions for filetype detection if libmagic is not present
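
The extension fallback described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: it assumes the python-magic binding (whose import fails when libmagic is missing), and `guess_mime_type` and `EXTENSION_TO_MIME` are hypothetical names with a deliberately tiny lookup table.

```python
import os

try:
    import magic  # assumed: python-magic; the import fails if libmagic is missing
except ImportError:
    magic = None

# Hypothetical, deliberately small extension table for illustration only.
EXTENSION_TO_MIME = {
    ".html": "text/html",
    ".md": "text/markdown",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
}


def guess_mime_type(filename):
    """Prefer libmagic when it is usable, otherwise fall back to the extension."""
    if magic is not None and os.path.isfile(filename):
        # Content-based detection needs a readable file on disk.
        return magic.from_file(filename, mime=True)
    extension = os.path.splitext(filename)[1].lower()
    # Returns None for extensions the table does not know about.
    return EXTENSION_TO_MIME.get(extension)
```

Extension-based detection is less reliable than inspecting file contents, which is why it is only used as a fallback in this sketch.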

Features

  • Added setup script for Ubuntu.
  • Added GitHub connector for ingest CLI.
  • Added partition_md partitioner.
  • Added Reddit connector for ingest CLI.

Fixes

  • Initializes connector properly in ingest.main::MainProcess
  • Restricts version of unstructured-inference to avoid multithreading issue

0.4.15

23 Feb 21:59
0d229f0

Enhancements

  • Added elements_to_json and elements_from_json for easier serialization/deserialization
  • convert_to_dict, dict_to_elements and convert_to_csv are now aliases for functions
    that use the ISD terminology.
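
A JSON round trip of the kind elements_to_json and elements_from_json provide can be sketched with the standard library alone. The `Element` dataclass and the `to_json` and `from_json` helpers below are hypothetical stand-ins, not the library's classes or signatures.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a document element."""
    type: str
    text: str


def to_json(elements):
    """Serialize a list of elements to a JSON string."""
    return json.dumps([asdict(element) for element in elements])


def from_json(payload):
    """Rebuild the list of elements from a JSON string."""
    return [Element(**item) for item in json.loads(payload)]


elements = [Element("Title", "Release notes"), Element("NarrativeText", "Hello.")]
restored = from_json(to_json(elements))
```

In this sketch, `restored` compares equal to `elements`, which is the property a serialization fix of this kind is meant to guarantee.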

Fixes

  • Update to ensure all elements are preserved during serialization/deserialization

0.4.14

23 Feb 17:25
354eff1

  • Automatically install nltk models in the tokenize module.

0.4.13

23 Feb 05:33
83f0454

  • Fixes the unstructured-ingest CLI.

0.4.12

23 Feb 03:54
80c0fab

  • Adds console_entrypoint for unstructured-ingest, along with other structure and doc updates related to ingest.
  • Add parser parameter to partition_html.

0.4.11

17 Feb 17:12
601f250

  • Adds partition_doc for partitioning Word documents in .doc format. Requires libreoffice.
  • Adds partition_ppt for partitioning PowerPoint documents in .ppt format. Requires libreoffice.
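
One plausible reason for the libreoffice requirement is that legacy .doc and .ppt files are first converted to a newer format. The sketch below shows such a conversion through LibreOffice's headless command line. It assumes the `soffice` executable is on PATH; `build_convert_command` and `convert_office_doc` are hypothetical helpers, and the notes above do not say the library works this way.

```python
import shutil
import subprocess


def build_convert_command(filename, target_format, output_dir):
    """Build the LibreOffice command line for a headless conversion."""
    return [
        "soffice",
        "--headless",
        "--convert-to",
        target_format,
        "--outdir",
        output_dir,
        filename,
    ]


def convert_office_doc(filename, target_format, output_dir):
    """Convert e.g. a .doc file to .docx, failing early if LibreOffice is absent."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice (LibreOffice) was not found on PATH")
    subprocess.run(
        build_convert_command(filename, target_format, output_dir),
        check=True,
    )
```

For example, `convert_office_doc("report.doc", "docx", "converted")` would write `converted/report.docx` when LibreOffice is installed.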

0.4.10

16 Feb 17:26
f5ff140

  • Fixes ElementMetadata so that it's JSON serializable when the filename is a Path object.
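
The underlying problem is that `json.dumps` cannot encode `pathlib.Path` objects. A minimal sketch of one way to handle it is shown below. The `Metadata` dataclass is a hypothetical stand-in, not the library's ElementMetadata.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class Metadata:
    """Hypothetical stand-in for element metadata."""
    filename: Optional[Union[str, Path]] = None
    page_number: Optional[int] = None

    def to_dict(self):
        data = dict(self.__dict__)
        # Path objects are not JSON serializable, so store them as strings.
        if isinstance(data["filename"], Path):
            data["filename"] = str(data["filename"])
        return data


serialized = json.dumps(Metadata(filename=Path("example.pdf"), page_number=1).to_dict())
```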

0.4.9

15 Feb 18:27
74e6b84

  • Added ingest modules and S3 connector
  • Default to url=None for partition_pdf and partition_image
  • Add ability to skip English-specific check by setting the UNSTRUCTURED_LANGUAGE env var to "".
  • Document Element objects now track metadata
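
The UNSTRUCTURED_LANGUAGE switch can be pictured as a simple environment lookup. In the sketch below, `english_check_enabled` is a hypothetical helper, and the default value of "en" when the variable is unset is an assumption made for illustration.

```python
import os


def english_check_enabled(environ=None):
    """Return False when UNSTRUCTURED_LANGUAGE is set to the empty string."""
    environ = os.environ if environ is None else environ
    # Assumption: an unset variable behaves like "en", so the check stays on.
    return environ.get("UNSTRUCTURED_LANGUAGE", "en") != ""
```

Running a script with `UNSTRUCTURED_LANGUAGE=""` in the environment would then disable the check in this sketch.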

0.4.8

13 Feb 19:32
a920e55

  • Modified XML and HTML parsers not to load comments.
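
The effect of skipping comments can be illustrated with the standard library's `html.parser`. This is a stand-in for illustration only, since the notes above do not describe how the library's own parsers are configured. `TextExtractor` is a hypothetical class.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and ignore HTML comments."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

    def handle_comment(self, data):
        # Deliberately a no-op: comment bodies never reach the output.
        pass


extractor = TextExtractor()
extractor.feed("<p>Hello</p><!-- internal note --><p>World</p>")
extracted = extractor.chunks
```

Here `extracted` contains only the two paragraph texts, so the comment cannot leak into partitioned output.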

0.4.7

10 Feb 16:40
962de78
  • Added the ability to pull an HTML document from a url in partition_html.
  • Added the ability to get file summary info from lists of filenames and lists
    of file contents.
  • Added optional page break to partition for .pptx, .pdf, images, and .html files.
  • Added to_dict method to document elements.
  • Include more unicode quotes in replace_unicode_quotes.
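
Quote normalization of the kind replace_unicode_quotes performs can be sketched as a small lookup table. The `replace_curly_quotes` function below is a simplified, hypothetical stand-in that covers only four characters. The set of quotes the library actually handles is not listed in these notes.

```python
# Hypothetical, minimal mapping from curly quotes to ASCII quotes.
QUOTE_REPLACEMENTS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def replace_curly_quotes(text):
    """Replace each curly quote in the table with its ASCII counterpart."""
    for unicode_quote, ascii_quote in QUOTE_REPLACEMENTS.items():
        text = text.replace(unicode_quote, ascii_quote)
    return text
```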