fix: improve false-positive Title elements on Chinese text #3836

Merged
6 commits merged into main from scanny/fix-3084-chinese-titles
Dec 18, 2024

Conversation

scanny
Collaborator

@scanny scanny commented Dec 17, 2024

Summary
Improve element-type mapping for Chinese text. Fixes a bug where Chinese text produced large numbers of false-positive Title elements.

Fixes #3084

@scanny scanny changed the title from "chore: bump CHANGELOG + __version__" to "fix: improve false-positive Title elements on Chinese text" Dec 17, 2024
@scanny scanny force-pushed the scanny/fix-3084-chinese-titles branch from d2b0d80 to e5a3459 December 17, 2024 18:40
@scanny scanny force-pushed the scanny/fix-3084-chinese-titles branch from 3f5ab19 to 6119550 December 17, 2024 19:52
scanny and others added 3 commits December 17, 2024 13:57
This pull request includes updated ingest test fixtures.
Please review and merge if appropriate.

Co-authored-by: scanny <scanny@users.noreply.github.com>
@scanny scanny force-pushed the scanny/fix-3084-chinese-titles branch from d170e2f to 915bceb December 17, 2024 21:58
@scanny scanny force-pushed the scanny/fix-3084-chinese-titles branch from 4082123 to df4ff7a December 17, 2024 23:33
@scanny scanny added this pull request to the merge queue Dec 18, 2024
Merged via the queue into main with commit 9ece0b5 Dec 18, 2024
41 checks passed
@scanny scanny deleted the scanny/fix-3084-chinese-titles branch December 18, 2024 01:54
Development

Successfully merging this pull request may close these issues.

bug/<Compatibility Issue with Chinese Text in Document Parsing>
3 participants