Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INNER or LEFT joins for component parent material and geomorph data #31

Closed
dylanbeaudette opened this issue Sep 25, 2017 · 2 comments · Fixed by #289
Closed

INNER or LEFT joins for component parent material and geomorph data #31

dylanbeaudette opened this issue Sep 25, 2017 · 2 comments · Fixed by #289
Labels
NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions

Comments

@dylanbeaudette
Copy link
Member

dylanbeaudette commented Sep 25, 2017

Functions affected:

  • get_component_cogeomorph_data_from_NASIS_db() -- commit 8cb1201
  • get_component_copm_data_from_NASIS_db() -- commit 94aa357

INNER JOINS implicitly filter NULL records, LEFT JOINS retain them.

For example, before is based on the LEFT JOIN, after is based on the INNER JOIN:

image

@brownag brownag added the NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions label Jan 16, 2021
@brownag
Copy link
Member

brownag commented Jul 19, 2021

INNER JOIN is appropriate here if we do not want to return NULL values for components that lack data populated for child geomorph/pm tables.

LEFT JOIN will give a full list of component IDs and identifies the absence of the child tables in the result by filling in with empty rows (contain only coiid and up).

The question here is:
Should get_...*() methods return results that contain "complete" sets of parent table record IDs? Or should they only return records for IDs that have data? I generally find it easier to think about results that are 1:1 with one of the canonical parent tables e.g. component.

@dylanbeaudette
Copy link
Member Author

TL/DR: I'm not as concerned about this as much as I might have been in 2017.

Historically (2012--2015) Jay and I designed most of the fetchNASIS / fetchPedonPC stuff to use LEFT JOINS for 1:1 relationships with something close to the "parent" table. At the time we were doing a lot more of the work in the database, with only a couple of follow-up plyr::join statements in R. A lot has changed since then, and we are doing all kinds of splicing-in of data outside of the DB. Conceptually, both approaches are appealing: leaner results with the INNER JOIN (safely merged in R anyway) and the simplicity of a 1:1 relationship.

Flip a coin?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants