inclusions and exclusions #30

Open
Ebsy opened this issue Jul 5, 2018 · 8 comments

@Ebsy

Ebsy commented Jul 5, 2018

Hi,
It goes without saying that the exclusions array in the HOCON file is incredibly useful.
Would it also be feasible to specify content areas that should be tested as well as ones that shouldn't?

An array of pages that should be tested would also be great. Possibly also a random selection of pages.

Don't get me wrong, I will also look into the source and see if these features are something I could contribute to the project. I just thought I'd ask first to see if any work has already started on them.

@finsterwalder
Collaborator

Do I understand you correctly:
You want to specify inclusion regions for one page or all pages.
When you do so, only those areas inside the inclusion regions are checked for differences.
Exclusions inside the inclusion regions are still obeyed.
Never thought about this...
I'll think about whether to put it in a different file or the same file...
I will get back to you...
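
The semantics described above (only areas inside an inclusion region are checked, and exclusions inside those regions are still obeyed) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `Region` and `should_compare` are hypothetical names, and treating an empty inclusion list as "everything is in scope" is an assumption made so that existing configurations keep their current behaviour.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangle with inclusive pixel bounds.

    page=None means the region applies to every page.
    """
    x1: int
    y1: int
    x2: int
    y2: int
    page: Optional[int] = None

    def contains(self, page: int, x: int, y: int) -> bool:
        on_page = self.page is None or self.page == page
        return on_page and self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def should_compare(
    page: int,
    x: int,
    y: int,
    inclusions: Sequence[Region],
    exclusions: Sequence[Region],
) -> bool:
    """Decide whether the pixel (x, y) on `page` takes part in the comparison."""
    # Assumption: with no inclusions configured, the whole document is in scope.
    in_scope = not inclusions or any(r.contains(page, x, y) for r in inclusions)
    # Exclusions always win, even inside an inclusion region.
    excluded = any(r.contains(page, x, y) for r in exclusions)
    return in_scope and not excluded


# One content box on all pages, one exclusion inside it on page 1 only.
inclusions = [Region(100, 100, 500, 700)]
exclusions = [Region(200, 200, 250, 250, page=1)]

print(should_compare(1, 150, 150, inclusions, exclusions))  # inside the content box
print(should_compare(1, 220, 220, inclusions, exclusions))  # inside the exclusion
print(should_compare(1, 50, 50, inclusions, exclusions))    # outside the content box
```

With this rule, a single content box replaces the ring of exclusion boxes that would otherwise be needed around it.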

@Ebsy
Author

Ebsy commented Jul 5, 2018

Yes exactly.

For example, I often need to compare multi-page PDFs, but checking pages 1, 3 and 5 (out of 100 pages) is enough.

config.conf
inclusions: [
  {
    page: [1, 3, 5] // or 'rand' would be great.
  }
]

As for the areas, many of the PDFs have variable data surrounding the pages (e.g. barcodes, service lines etc.). These don't need to be compared, only the content in the 'centre' does, so specifying one content box would be easier. Right now (since yesterday ;)) I simply add exclusion boxes for these variable elements.

@finsterwalder
Collaborator

What do you mean by "rand"?
The current workaround is to create enough exclusions, of course.
But I can see how inclusions could make it easier for those situations.

@Ebsy
Author

Ebsy commented Jul 5, 2018

'rand' being a random page, or 'rand(6)' to pick 6 random pages to compare.

@finsterwalder
Collaborator

What sense does it make to compare random pages?

@Ebsy
Author

Ebsy commented Jul 5, 2018

Well, wouldn't it be quicker to just compare a subset of pages rather than the entire document? At least in theory?

@finsterwalder
Collaborator

Quicker, yes. But you are only comparing a subset, so you are losing confidence.
When comparing only random pages, you also lose reproducibility. Your tests fail randomly as well.
A very bad trade-off if you ask me...

@Ebsy
Author

Ebsy commented Jul 6, 2018

When one is dealing with hundreds of thousands of pages spread over hundreds of documents, it's impracticable from a time/resources point of view to compare each and every one (outside of a dev/test environment). If a small subset is compared (and the specific pages reported in the output), then the test is, of course, reproducible, and 'spot checks' could be inserted into the production workflow without delaying the process too much.

At the end of the day, it's just a feature idea and not a deal-breaker!
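
The spot-check idea above (compare a small random subset and report which pages were picked) could look something like the sketch below. It is only an illustration of the reproducibility argument: `pick_pages` and its `seed` parameter are hypothetical and not part of the tool discussed here. The point is that a run can be repeated exactly as long as the seed, or the resulting page list, is written to the output.

```python
import random
from typing import List


def pick_pages(page_count: int, sample_size: int, seed: int) -> List[int]:
    """Return a sorted sample of 1-based page numbers.

    The same (page_count, sample_size, seed) always yields the same pages,
    so a failing spot check can be re-run on exactly the same subset.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    rng = random.Random(seed)            # private generator, no global state
    k = min(sample_size, page_count)     # never ask for more pages than exist
    return sorted(rng.sample(range(1, page_count + 1), k))


# Roughly what 'rand(6)' on a 100-page document might do.
seed = 20180706
pages = pick_pages(page_count=100, sample_size=6, seed=seed)
print(f"seed={seed} pages={pages}")      # log both so the run can be repeated
```

Logging the chosen pages addresses the reproducibility concern, but not the loss of confidence: pages that were not sampled remain unchecked.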
