Fixed a bug causing duplicate object IDs #788

willswope · 2018-02-01T23:18:30Z

The ID values for new "Selection" objects were being generated by the line "String.fromCharCode(65 + Math.floor(Math.random() * 26)) + Date.now()", which could produce duplicate IDs and caused the Auto-detection feature to frequently skip pages in large documents. The unique IDs are now generated simply by converting Math.random() to a string.
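For illustration, the two ID schemes described above can be sketched as standalone functions. The function names oldSelectionId and newSelectionId are hypothetical and do not come from the Tabula source; only the expressions inside them follow the description, and the use of String() for the conversion is an assumption.

```javascript
// Hypothetical function names; only the expressions follow the PR description.

// Old scheme: one random uppercase letter (A-Z) followed by the current time in ms.
function oldSelectionId() {
  return String.fromCharCode(65 + Math.floor(Math.random() * 26)) + Date.now();
}

// New scheme: the string form of a random float in [0, 1).
// String(...) is assumed here; Math.random().toString() would behave the same.
function newSelectionId() {
  return String(Math.random());
}

console.log(oldSelectionId()); // a letter plus a millisecond timestamp
console.log(newSelectionId()); // a decimal string such as "0.73..."
```

The old scheme has only 26 possible values for any given millisecond, while the new one draws from the full range of values that Math.random() can return.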

jeremybmerrill · 2018-02-01T23:35:48Z

Thank you for your contribution! I'm excited that this may fix the problem you describe. The mechanism behind the problem is very interesting. Do you know why the previous version was creating ID collisions?

willswope · 2018-02-01T23:59:37Z

Hi Jeremy! The previous code was generating IDs by concatenating a single random alphabetical character with the system time in milliseconds, so when multiple pages were processed within the same millisecond, the only thing distinguishing their IDs was that one letter, and any two such pages had a 1-in-26 chance of receiving identical IDs.
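The collision mechanism can be reproduced in isolation with a small sketch. It fixes the timestamp to model pages processed within the same millisecond; makeOldId, countDuplicates, and the timestamp value are hypothetical and chosen only for this demonstration.

```javascript
// Hypothetical helpers for this sketch; not taken from the Tabula source.

// Old scheme with the timestamp passed in, so we can model "same millisecond".
function makeOldId(timestampMs) {
  return String.fromCharCode(65 + Math.floor(Math.random() * 26)) + timestampMs;
}

// Number of IDs that repeat an earlier ID in the list.
function countDuplicates(ids) {
  return ids.length - new Set(ids).size;
}

const timestampMs = 1517527110000; // arbitrary fixed value standing in for Date.now()
const count = 30; // more IDs than there are letters

const oldIds = Array.from({ length: count }, () => makeOldId(timestampMs));
const newIds = Array.from({ length: count }, () => String(Math.random()));

// With 30 IDs and only 26 possible values, at least 4 must be duplicates.
console.log("old scheme duplicates:", countDuplicates(oldIds));
console.log("new scheme duplicates:", countDuplicates(newIds));
```

In practice far fewer than 30 pages would share a millisecond, but even two pages in the same millisecond collide with probability 1/26 under the old scheme.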

The bug is easy to replicate: load a sufficiently large document (several hundred pages), use the "Repeat this selection" or Autodetect feature, and when you scroll through the document, a small number of pages will be missing selections. I was processing 300-page Medicare fee schedule PDFs when I first encountered the issue, but any long document does the trick (https://www.novitas-solutions.com/webcenter/portal/MedicareJH/FeeLookup if you want some easy test material).

jeremybmerrill · 2018-02-02T14:57:23Z

Ah, that makes sense. Thank you again for digging into this and submitting the pull request! This may fix #780.

jeremybmerrill · 2018-02-02T14:59:21Z

@jazzido do you remember if there's any reason we chose to do a single random letter plus a timestamp for these IDs? Or more precisely, is there a reason to add any of that back in addition to the larger random number IDs suggested here?

jazzido · 2018-02-02T15:13:49Z

No idea. In any case, Math.random() should be safer than our previous solution.

jeremybmerrill · 2018-02-02T16:30:19Z

Haha, okay cool. Thanks again @willswope!

willswope · 2018-02-02T20:20:52Z

@jeremybmerrill I'm glad I was able to help! Thank you for working on such a great project; it's saved me many hours of work.

Fixed a bug causing duplicate object IDs

jeremybmerrill merged commit 5cdcfa7 into tabulapdf:master Feb 2, 2018

jeremybmerrill added a commit that referenced this pull request Jun 26, 2018

Merge pull request #788 from willswope/master

3ca4abe

