inclusions and exclusions #30

Ebsy · 2018-07-05T07:43:43Z

Hi,
It goes without saying that the exclusions array in the HOCON file is incredibly useful.
Would it also be feasible to specify content areas that should be tested as well as ones that shouldn't?

An array of pages that should be tested would also be great. Possibly also a random selection of pages.

Don't get me wrong, I will also look into the source and see whether these features are something I could contribute to the project. I just thought I'd ask whether any work has started on them.

finsterwalder · 2018-07-05T10:23:58Z

Do I understand you correctly?
You want to specify inclusion regions for one page or all pages.
When you do so, only the areas inside the inclusion regions are checked for differences.
Exclusions inside the inclusion regions are still obeyed.
I never thought about this...
I'll think about whether to put it in a different file or the same file...
I will get back to you...
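
The semantics described above (compare only inside inclusion regions, still honour exclusions inside them) could be sketched roughly as follows. This is only an illustration: `RegionFilter` and `isCompared` are hypothetical names that do not exist in the library, and the sketch assumes regions are plain pixel rectangles on a single page.

```java
import java.awt.Rectangle;
import java.util.List;

// Hypothetical helper, not part of the library's API.
final class RegionFilter {
    private final List<Rectangle> inclusions; // assumption: an empty list means "whole page"
    private final List<Rectangle> exclusions;

    RegionFilter(List<Rectangle> inclusions, List<Rectangle> exclusions) {
        this.inclusions = inclusions;
        this.exclusions = exclusions;
    }

    // A pixel is compared only if it lies inside some inclusion region
    // (or no inclusions are configured) and inside no exclusion region.
    boolean isCompared(int x, int y) {
        boolean included = inclusions.isEmpty()
                || inclusions.stream().anyMatch(r -> r.contains(x, y));
        boolean excluded = exclusions.stream().anyMatch(r -> r.contains(x, y));
        return included && !excluded;
    }
}
```

With no inclusions configured, this falls back to the existing behaviour where only the exclusions matter.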

Ebsy · 2018-07-05T10:51:23Z

Yes exactly.

For example, I often need to compare multi-page PDFs, but checking pages 1, 3 and 5 (out of 100 pages) is enough.

config.conf
inclusions: [
    {
        page: [1, 3, 5] // or 'rand' would be great.
    }
]

As for the areas, many of the PDFs have variable data around the edges of the pages (e.g. barcodes, service lines etc.). These don't need to be compared, only the content in the 'centre', so specifying one content box would be easier. Right now (since yesterday ;)) I simply add exclusion boxes for these variable elements.

finsterwalder · 2018-07-05T14:37:16Z

What do you mean by "rand"?
The current workaround is to create enough exclusions, of course.
But I can see how inclusions could make it easier for those situations.
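
To illustrate the workaround: a single content box is equivalent to up to four exclusion rectangles covering the margins around it. The sketch below is an assumption-laden illustration, not library code. `MarginExclusions` and `around` are hypothetical names, coordinates are assumed to be pixels with the origin at the top-left, and the content box is assumed to lie within the page.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library's API.
final class MarginExclusions {
    // Returns rectangles that together cover everything on the page outside contentBox.
    static List<Rectangle> around(int pageWidth, int pageHeight, Rectangle contentBox) {
        Rectangle page = new Rectangle(0, 0, pageWidth, pageHeight);
        if (!page.contains(contentBox)) {
            throw new IllegalArgumentException("content box must be non-empty and inside the page");
        }
        int right = contentBox.x + contentBox.width;
        int bottom = contentBox.y + contentBox.height;
        List<Rectangle> result = new ArrayList<>();
        if (contentBox.y > 0) {          // strip above the box, full page width
            result.add(new Rectangle(0, 0, pageWidth, contentBox.y));
        }
        if (bottom < pageHeight) {       // strip below the box, full page width
            result.add(new Rectangle(0, bottom, pageWidth, pageHeight - bottom));
        }
        if (contentBox.x > 0) {          // strip left of the box, box height only
            result.add(new Rectangle(0, contentBox.y, contentBox.x, contentBox.height));
        }
        if (right < pageWidth) {         // strip right of the box, box height only
            result.add(new Rectangle(right, contentBox.y, pageWidth - right, contentBox.height));
        }
        return result;
    }
}
```

Each returned rectangle would then have to be written out as one exclusion entry per page, which is the repetition an inclusion box would avoid.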

Ebsy · 2018-07-05T15:06:02Z

'rand' being a random page, or 'rand(6)' to pick 6 random pages to compare.

finsterwalder · 2018-07-05T15:17:41Z

What sense does it make to compare random pages?

Ebsy · 2018-07-05T21:06:45Z

Well, wouldn't it be quicker to just compare a subset of pages rather than the entire document? At least in theory?

finsterwalder · 2018-07-05T21:26:30Z

Quicker, yes. But you are only comparing a subset, so you are losing confidence.
When comparing only random pages, you also lose reproducibility. Your tests fail randomly as well.
A very bad trade-off if you ask me...

Ebsy · 2018-07-06T07:27:33Z

When one is dealing with hundreds of thousands of pages spread over hundreds of documents, it's impracticable from a time/resources point of view to compare each and every one (outside of a dev/test environment). If a small subset is compared (and the specific pages are reported in the output), then the test is, of course, reproducible, and 'spot checks' could be inserted into the production workflow without delaying the process too much.

At the end of the day, it's just a feature idea and not a deal-breaker!
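
One way the spot-check idea could be made reproducible is to draw the pages from a seeded generator and report both the seed and the chosen pages. This is a minimal sketch under that assumption. `PageSampler` and `sample` are hypothetical names, not an existing feature, and page numbers are assumed to be 1-based as in the config example earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the library's API.
final class PageSampler {
    // Picks up to `count` distinct 1-based page numbers out of `pageCount`, sorted ascending.
    // The same seed and page count always yield the same pages, so a failing run can be repeated.
    static List<Integer> sample(int pageCount, int count, long seed) {
        List<Integer> pages = new ArrayList<>();
        for (int p = 1; p <= pageCount; p++) {
            pages.add(p);
        }
        Collections.shuffle(pages, new Random(seed));
        int n = Math.max(0, Math.min(count, pageCount));
        List<Integer> picked = new ArrayList<>(pages.subList(0, n));
        Collections.sort(picked);
        return picked;
    }

    public static void main(String[] args) {
        long seed = 42L; // in practice the seed would be logged with the comparison result
        System.out.println("seed=" + seed + " pages=" + sample(100, 6, seed));
    }
}
```

Logging the seed addresses the reproducibility concern, but not the loss of confidence from comparing only a subset.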

finsterwalder referenced this issue Jul 17, 2018

Allow to add Exclusions via API

f55b2c8

finsterwalder added the enhancement label Nov 19, 2019
