-
Notifications
You must be signed in to change notification settings - Fork 822
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix: default to None for the languages metadata field (#1743)
### Summary Closes #1714 Changes the default value for `languages` to `None` for elements that don't have text or the language can't be detected. ### Testing ``` from unstructured.partition.auto import partition filename = "example-docs/handbook-1p.docx" elements = partition(filename=filename, detect_language_per_element=True) # PageBreak elements don't have text and will be collected here none_langs = [element for element in elements if element.metadata.languages is None] none_langs[0].text ``` --------- Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> Co-authored-by: Coniferish <Coniferish@users.noreply.github.com> Co-authored-by: cragwolfe <crag@unstructured.io>
- Loading branch information
1 parent
d0c84d6
commit 6d7fe3a
Showing
8 changed files
with
63 additions
and
121 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
138 changes: 28 additions & 110 deletions
138
...tured_ingest/expected-structured-output/local-single-file/UDHR_first_article_all.txt.json
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters