Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tabula-java allow specifying multiple areas #213

Conversation

asheeshrana
Copy link

@asheeshrana asheeshrana commented Mar 12, 2018

1942-update-tabula-java-for-muti-column-pdf (PolicyReporter/requests/#1942)

  • Allow multiple occurrences of -a parameter
  • Allow -a parameter to accept % values as well as absolute values
  • Add few test cases
  • Add test file

Misc Notes for reviewer:

  • Added a new class Pair. Created only a very basic implementation. Could be replaced in future by apache-commons-lang
  • Areas are stored in list of "Pairs" so that tables are extracted in the same order as specified in the commandline
  • Haven't put any limits on values for % option as specifying length/width which is bigger than the pdf page dimensions is allowed even now with absolute co-ordinates.

This change is Reviewable

Asheesh Rana added 2 commits March 12, 2018 12:54
…#1942)

- Allow multiple occurrences of -a parameter
- Allow -a parameter to accept % values as well as absolute values
- Add test cases
- Add test files
return getArea(area, mode);
}

public Page getArea(Rectangle area, int mode) {
Copy link
Contributor

@jazzido jazzido Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather leave this method as it was, and make the relative→absolute conversion in CommandLineApp.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The orginal method is there as is... I didn't modify it. I simply added another method.
Additionally based on the comment at the top of the file...

// TODO: this class should probably be called "PageArea" or something like that

it seemed to me that Page.java is the place where all area calculations should be housed and hence I decided the add that method.

I can anyways move the calculations out of Page.java... but just want a quick confirmation from you again if possible please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. I misread and thought you modified the original method. My bad.

In any case, I still feel that the conversion belongs in CommandLineApp.

if (pageArea != null) {
page = page.getArea(pageArea);
if (pageAreas != null) {
for (Pair<Integer, Rectangle> areaPair : pageAreas) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean here. You have access to the Page objects, so you can do the conversion at this point.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree... tried to explain the thinking behind why I put the calculations the Page.java as opposed to CommandLineApp.java file... Please do confirm once more before I send the updated code.

PS: I apologize for asking to reconfirm

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation is under the other review comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No apologies needed!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes done and committed.

@jazzido
Copy link
Contributor

jazzido commented Mar 12, 2018

This is awesome, @asheeshrana — Thanks a lot! The tests are very much appreciated :)

I've requested a couple minor changes, let me know what you think.

- Moved the constants out of page class
@jazzido jazzido merged commit 460ae29 into tabulapdf:master Mar 15, 2018
@jazzido
Copy link
Contributor

jazzido commented Mar 15, 2018

Merged! Thanks a lot, @asheeshrana !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants