Pdfbox2.0 #146
org.apache.pdfbox.examples.util.RemoveAllText
- Temporarily set height
… string Add a test writer two tables for CSV output
Add more csv tests
# Conflicts:
#   src/test/resources/technology/tabula/json/schools.json
#   src/test/resources/technology/tabula/json/spanning_cells.json
#   src/test/resources/technology/tabula/json/spanning_cells_basic.json
#   src/test/resources/technology/tabula/json/twotables.json
add more tests
Yayyy!
Jeremy B. Merrill
On Mar 8, 2017 12:37 PM, "Melisa Bok" wrote:
Commit Summary
- Fix TextElement creation
- fix tabs
- Use the code from LegacyPDFStreamEngine to create the TextElements
- Fix removeText function using the example: org.apache.pdfbox.examples.util.RemoveAllText
- close the document
- close removed text document
- fix array serialization
- add spanning cells test with CSV format
- Remove capheight calculation
- Test writer two tables checking the json result object instead of the string
- Fix pageTransform when there is a rotation
- fix path iterator
- update json tests
- Merge branch 'master' into pdfbox2.0
- update json outputs
- upgrade pdfbox version
- back to the old implementation and catch the IndexOutOfBoundsException
- Remove hardcoded code
- Remove more hardcoded code
- test all the elements of the detected table
- Change the expected table top value
- Increase the threshold factor to support larger headings
- Fix rectangle comparator.
- fix wrong expected column size, 5 instead of 6.
- update expected table, more spaces are expected to respect the alignment.
- when the text value has length > 1, clean the spaces.
- clean code
- remove stack trace
- Merge remote-tracking branch 'upstream/pdfbox2.0' into pdfbox2.0
- add log error
File Changes
- *M* pom.xml
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-0> (4)
- *M* src/main/java/technology/tabula/ObjectExtractorStreamEngine.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-1> (541)
- *M* src/main/java/technology/tabula/TextChunk.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-2> (9)
- *M* src/main/java/technology/tabula/Utils.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-3> (2)
- *M* src/main/java/technology/tabula/detectors/NurminenDetectionAlgorithm.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-4> (72)
- *M* src/main/java/technology/tabula/writers/JSONWriter.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-5> (8)
- *M* src/test/java/technology/tabula/TestBasicExtractor.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-6> (48)
- *M* src/test/java/technology/tabula/TestCommandLineApp.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-7> (13)
- *M* src/test/java/technology/tabula/TestSpreadsheetExtractor.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-8> (62)
- *M* src/test/java/technology/tabula/TestWriters.java
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-9> (46)
- *A* src/test/resources/technology/tabula/csv/frx_2012_disclosure.csv
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-10> (90)
- *M* src/test/resources/technology/tabula/csv/indictb1h_14.csv
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-11> (72)
- *A* src/test/resources/technology/tabula/csv/schools.csv
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-12> (45)
- *A* src/test/resources/technology/tabula/csv/spanning_cells.csv
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-13> (25)
- *A* src/test/resources/technology/tabula/csv/twotables.csv
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-14> (32)
- *A* src/test/resources/technology/tabula/csv/us-020.csv
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-15> (50)
- *M* src/test/resources/technology/tabula/icdar2013-dataset/competition-dataset-eu/eu-027-reg.xml
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-16> (2)
- *M* src/test/resources/technology/tabula/json/schools.json
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-17> (3)
- *M* src/test/resources/technology/tabula/json/spanning_cells.json
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-18> (2)
- *M* src/test/resources/technology/tabula/json/spanning_cells_basic.json
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-19> (2)
- *M* src/test/resources/technology/tabula/json/twotables.json
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-20> (2)
- *A* src/test/resources/technology/tabula/us-020.pdf
<https://github.com/tabulapdf/tabula-java/pull/146/files#diff-21> (0)
Patch Links:
- https://github.com/tabulapdf/tabula-java/pull/146.patch
- https://github.com/tabulapdf/tabula-java/pull/146.diff
Yayy indeed! I've been looking at and playing with @melisabok's branch as she was working on it, and everything looks great. I'll review this PR in the coming days, merge to …
Hey @melisabok, I was wondering if you could take a quick look at this branch. From a clean clone, I'm getting a bunch of failed tests. Not sure what is going on. As far as I remember, the tests were passing. Thanks!
@melisabok, additional data point: all tests pass in your …
@jazzido I did a clean clone of the branch: https://github.com/tabulapdf/tabula-java/tree/pdfbox2.0 And I'm getting this result:
That was the starting point of my branch.
Apologies, @melisabok. This was a total brain fart on my part. I did not merge your PR into our … Sorry!