#5265 - Extract paragraph structure from PDF files #5266

Merged: 3 commits, Jan 28, 2025

Conversation

reckart
Member

@reckart commented Jan 27, 2025

What's in the PR

  • Remove some unused legacy classes
  • Set up a basic HTML structure in the CASes extracted from PDF files, based on the paragraph detection provided by PDFBox

How to test manually

  • Import a PDF file
  • Switch to Apache Annotator in the annotation editor

Automatic testing

  • PR includes unit tests

Documentation

  • PR updates documentation

@reckart added this to the 36.0 milestone Jan 27, 2025
@reckart self-assigned this Jan 27, 2025
@reckart force-pushed the feature/5265-Extract-paragraph-structure-from-PDF-files branch from 7a3ad3e to 483c81f Jan 27, 2025 21:33
@reckart merged commit 3d779a5 into main Jan 28, 2025
3 checks passed
@reckart deleted the feature/5265-Extract-paragraph-structure-from-PDF-files branch Jan 28, 2025 05:29