PI / summary merge can break down #11

Closed
jdpye opened this issue Jun 5, 2024 · 3 comments

jdpye commented Jun 5, 2024

Got a fun error when processing all the OTN NSBS data (open, attached)

> otndo::make_tag_push_summary('nsbs_matched_detections_all.csv')
ℹ Asking OTN GeoServer for project information...Writing report...


processing file: make_tag_push_summary.qmd
  |.....................                     |  49% [station-summary-table]    
Quitting from lines  at lines 309-356 [station-summary-table] (make_tag_push_summary.qmd)
Error in `vecseq()`:
! Join results in 1729 rows; more than 1464 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
Backtrace:
 1. base::merge(station_summary, pis[, .(detectedby, PI)], by = "detectedby")
 2. data.table::merge.data.table(...)
 4. data.table::`[.data.table`(...)
 5. data.table:::vecseq(...)
                                                                                                                      
Execution halted

Looks like there are more results in the PI/station_summary aggregation than there are in the data file. Could this be a problem with our assumptions about the uniqueness of station names within a project, or is there another issue for us to chase down? I'm willing to bet it's a source data foible.

The source file was too big to attach; I'll ship it to you via Slack.

mhpob closed this as completed in 4f645a0 Jun 5, 2024

mhpob (Owner) commented Jun 5, 2024

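The HFX project had two instances of PI metadata: one from before and one from after the addition of a PI. Because `detectedby` was duplicated in `pis`, every HFX row in `station_summary` matched both instances, which caused the one-to-many match error.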

                                                                                                           contact_pi detectedby
                                                                                                               <char>     <char>
1:                                       Dave Hebert (david.hebert@dfo-mpo.gc.ca), Fred Whoriskey (fwhoriskey@dal.ca)        HFX
2: Dave Hebert (david.hebert@dfo-mpo.gc.ca), Robert Lennox (robert.lennox@dal.ca), Fred Whoriskey (fwhoriskey@dal.ca)        HFX
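
For anyone who wants to see the duplicate-key situation above in isolation, here is a minimal sketch. The table contents are made-up placeholders (the station names and PI strings are hypothetical, not taken from the NSBS file); only the column names `detectedby` and `PI` and the `merge()` call come from the backtrace. It is small enough that data.table's row-count guard does not trip, so it shows the row inflation rather than the error itself.

```r
library(data.table)

# Hypothetical stand-ins for the report's tables; values are placeholders.
station_summary <- data.table(
  station    = c("HFX001", "HFX002", "OTHER001"),
  detectedby = c("HFX", "HFX", "OTHER")
)
pis <- data.table(
  detectedby = c("HFX", "HFX", "OTHER"),
  PI         = c("A, C", "A, B, C", "D")
)

# Projects that appear more than once in the PI lookup table
dup_keys <- pis[, .N, by = detectedby][N > 1]
print(dup_keys)

# Each HFX station row matches both HFX rows in `pis`,
# so the merged table has more rows than `station_summary`.
merged <- merge(station_summary, pis[, .(detectedby, PI)], by = "detectedby")
print(nrow(merged) > nrow(station_summary))
```

Checking `dup_keys` before the merge is a cheap way to spot this kind of metadata drift in other projects.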

The fix checks whether multiple instances of PI information exist per project and, if so, combines those instances, extracts the unique character elements, and replaces the PI/POC/PI_emails/POC_emails fields with the combined instance.
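
A rough sketch of that idea follows. This is not the code from commit 4f645a0; it is an illustration under assumptions: the helper `combine_unique()` is hypothetical, the PI strings are placeholders, entries are assumed to be comma-separated, and only the `PI` column is handled (the same helper would presumably be applied to the POC and email fields).

```r
library(data.table)

# Hypothetical lookup with two PI strings for one project (placeholder values)
pis <- data.table(
  detectedby = c("HFX", "HFX", "OTHER"),
  PI         = c("A, C", "A, B, C", "D")
)

# Hypothetical helper: split each comma-separated string, pool the pieces,
# keep unique entries in first-seen order, and paste them back together.
combine_unique <- function(x) {
  parts <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  paste(unique(parts), collapse = ", ")
}

# One row per project, so a later merge on `detectedby` cannot multiply rows
pis_unique <- pis[, .(PI = combine_unique(PI)), by = detectedby]
print(pis_unique)
```

Collapsing to one row per `detectedby` before the merge keeps the join one-to-one from the lookup side, which is what the station summary table expects.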

jdpye (Author) commented Jun 6, 2024

That did the trick. Appreciated, Mike!

Changing PIs with time is a tricky proposition!
