Releases: Unstructured-IO/unstructured

0.4.16

28 Feb 04:50

cragwolfe

5eaf449

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Enhancements

Fall back to file extensions for filetype detection if libmagic is not present.
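The fallback described above can be illustrated with a minimal sketch. This is not the library's implementation: `guess_mime_type` is a hypothetical helper, and the sketch assumes the optional `python-magic` binding is the way libmagic would be reached. When that import fails, or the lookup itself fails, it falls back to the standard library's extension-based `mimetypes` table.

```python
import mimetypes
from typing import Optional

try:
    # Optional dependency: python-magic wraps libmagic (assumption for this sketch).
    import magic
    LIBMAGIC_AVAILABLE = True
except ImportError:
    magic = None
    LIBMAGIC_AVAILABLE = False


def guess_mime_type(filename: str) -> Optional[str]:
    """Hypothetical helper: prefer libmagic, fall back to the file extension."""
    if LIBMAGIC_AVAILABLE:
        try:
            # Content-based detection; needs the file to exist on disk.
            return magic.from_file(filename, mime=True)
        except Exception:
            # Any libmagic failure drops through to the extension lookup.
            pass
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type
```

Extension-based detection is less reliable than inspecting file contents, which is presumably why it serves only as the fallback.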

Features

Added setup script for Ubuntu.
Added GitHub connector for the ingest CLI.
Added partition_md partitioner.
Added Reddit connector for the ingest CLI.

Fixes

Initializes connector properly in ingest.main::MainProcess.
Restricts version of unstructured-inference to avoid a multithreading issue.

0.4.15

23 Feb 21:59

MthwRobinson

0d229f0

Enhancements

Added elements_to_json and elements_from_json for easier serialization/deserialization.
convert_to_dict, dict_to_elements and convert_to_csv are now aliases for functions that use the ISD terminology.
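As a rough illustration of what a JSON round trip for document elements involves, here is a self-contained sketch. It does not call the library: `SimpleElement`, `dump_elements` and `load_elements` are hypothetical stand-ins, and the two-field element shape is an assumption made only for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SimpleElement:
    """Hypothetical stand-in for a document element (not the library's class)."""
    type: str
    text: str


def dump_elements(elements: List[SimpleElement]) -> str:
    """Serialize elements to a JSON string, one dict per element."""
    return json.dumps([asdict(element) for element in elements], indent=2)


def load_elements(payload: str) -> List[SimpleElement]:
    """Rebuild elements from the JSON produced by dump_elements."""
    return [SimpleElement(**item) for item in json.loads(payload)]


elements = [
    SimpleElement(type="Title", text="Release notes"),
    SimpleElement(type="NarrativeText", text="Added two serialization helpers."),
]
restored = load_elements(dump_elements(elements))
```

A round trip like this should return elements equal to the originals, which is the property the serialization fix in this release is concerned with.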

Fixes

Updated serialization/deserialization to ensure all elements are preserved.

0.4.14

23 Feb 17:25

MthwRobinson

354eff1

Automatically install nltk models in the tokenize module.

0.4.13

23 Feb 05:33

cragwolfe

83f0454

Fixes the unstructured-ingest CLI.

0.4.12

23 Feb 03:54

cragwolfe

80c0fab

Adds console_entrypoint for unstructured-ingest, along with other structure/doc updates related to ingest.
Adds parser parameter to partition_html.

0.4.11

17 Feb 17:12

MthwRobinson

601f250

Adds partition_doc for partitioning Word documents in .doc format. Requires libreoffice.
Adds partition_ppt for partitioning PowerPoint documents in .ppt format. Requires libreoffice.

0.4.10

16 Feb 17:26

MthwRobinson

f5ff140

Fixes ElementMetadata so that it's JSON serializable when the filename is a Path object.
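The underlying problem is that `json.dumps` cannot encode a `pathlib.Path`. A minimal sketch of one way to handle it follows; `SimpleMetadata` is a hypothetical stand-in, not the library's `ElementMetadata`, and converting the path to a string in `to_dict` is an assumed approach rather than the confirmed fix.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class SimpleMetadata:
    """Hypothetical stand-in for element metadata (not the library's class)."""
    filename: Optional[Union[str, Path]] = None

    def to_dict(self) -> dict:
        # json cannot encode Path objects, so convert them to plain strings.
        filename = str(self.filename) if isinstance(self.filename, Path) else self.filename
        return {"filename": filename}


metadata = SimpleMetadata(filename=Path("report.pdf"))
serialized = json.dumps(metadata.to_dict())
```

Without the conversion, `json.dumps({"filename": Path("report.pdf")})` raises a `TypeError`.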

0.4.9

15 Feb 18:27

MthwRobinson

74e6b84

Added ingest modules and S3 connector.
Default to url=None for partition_pdf and partition_image.
Add ability to skip English-specific checks by setting the UNSTRUCTURED_LANGUAGE env var to "".
Document Element objects now track metadata.
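The environment-variable switch mentioned above could work roughly as sketched here. `english_checks_enabled` is a hypothetical helper, and treating an unset variable as "en" is an assumption of this sketch; only the behavior for an empty string comes from the release note.

```python
import os


def english_checks_enabled(default_language: str = "en") -> bool:
    """Hypothetical helper: English-specific checks are skipped when the
    UNSTRUCTURED_LANGUAGE environment variable is set to an empty string."""
    # Assumption: an unset variable behaves like the default language.
    language = os.environ.get("UNSTRUCTURED_LANGUAGE", default_language)
    return language != ""
```

From a shell, this would correspond to running a script with `UNSTRUCTURED_LANGUAGE=""` set in its environment.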

0.4.8

13 Feb 19:32

MthwRobinson

a920e55

Modified XML and HTML parsers not to load comments.

0.4.7

10 Feb 16:40

MthwRobinson

962de78

Added the ability to pull an HTML document from a url in partition_html.
Added the ability to get file summary info from lists of filenames and lists of file contents.
Added optional page break to partition for .pptx, .pdf, images, and .html files.
Added to_dict method to document elements.
Include more unicode quotes in replace_unicode_quotes.
