upgrades to profileApply #113

Merged
merged 4 commits into from
Jan 20, 2020


brownag
Member

@brownag brownag commented Jan 19, 2020

scaling up profileApply (#112)

The execution time of profileApply now scales linearly, rather than ~quadratically, with the number of pedons in the collection. This holds over a range from typical to slightly larger than typical SoilProfileCollection sizes (tens to hundreds of thousands).

Optimization of chunk.size still needs to be considered; one candidate is ~sqrt(length(spcobj)). Sensitivity analysis suggests the current default of 100 is a good all-around value for typical use.
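
The general idea of processing a collection in fixed-size chunks can be sketched in base R. This is only an illustration of chunking with a chunk.size parameter, not the aqp implementation: chunk_apply is a hypothetical helper, and a plain list stands in for a SoilProfileCollection.

```r
# Hypothetical helper (not part of aqp): apply FUN to every element of x,
# working through the indices in chunks of chunk.size.
chunk_apply <- function(x, FUN, chunk.size = 100) {
  n <- length(x)
  # assign each index to a chunk: 1,1,...,2,2,...
  chunk.id <- ceiling(seq_len(n) / chunk.size)
  chunks <- split(seq_len(n), chunk.id)
  # process each chunk independently
  res <- lapply(chunks, function(idx) lapply(x[idx], FUN))
  # flatten one level so the result has one element per input element
  unlist(res, recursive = FALSE, use.names = FALSE)
}

# a plain list stands in for a collection of 250 profiles
x <- as.list(1:250)
out <- chunk_apply(x, function(i) i^2, chunk.size = 100)

# the alternative chunk size mentioned above, ~sqrt(n)
chunk.alt <- round(sqrt(length(x)))
```

With 250 elements and chunk.size = 100, the work is split into three chunks (100, 100 and 50 elements), and the flattened result has the same length and order as the input.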

Before upgrade:

> devtools::load_all()
Loading aqp
This is aqp 1.18.4
> foo <- do.call('rbind', lapply(as.list(1:2000), random_profile))
> depths(foo) <- id ~ top + bottom
> system.time(profileApply(foo, simpleFunction))
user  system elapsed 
18.893   0.160  19.967 

After the "chunkApply" upgrade, processing 2000 pedons is more than 3x faster (19.967 s vs. 5.837 s elapsed):

> devtools::load_all()
Loading aqp
This is aqp 1.18.5
> foo <- do.call('rbind', lapply(as.list(1:2000), random_profile))
> depths(foo) <- id ~ top + bottom
> system.time(profileApply(foo, simpleFunction))
user  system elapsed 
5.528   0.057   5.837 

With frameify = TRUE:

> system.time(foo <- profileApply(foo, simpleFunction, frameify = TRUE))
user  system elapsed 
5.690   0.088   6.073 

Here is the definition of the "simple" function used above. It calculates horizon thickness and proportional thickness, and returns an ID'd data.frame.

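simpleFunction <- function(p) {
  # horizon data for the profile passed in
  hz <- horizons(p)
  # one row per horizon: profile ID, horizon ID, thickness
  res <- data.frame(profile_id(p), hzID(p), thk=(hz$bottom - hz$top))
  # thickness as a proportion of the summed horizon thickness
  res$hz_prop <- res$thk / sum(res$thk)
  # optional: return just second and third horizon to test merge result
  #res <- res[2:3,]
  # name the ID columns after the collection's idname and hzidname
  colnames(res) <- c(idname(p), hzidname(p), 'hz_thickness', 'hz_prop')
  return(res)
}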

frameify argument (#111)

In the simplest sense, frameify combines the list elements of a typical profileApply result into one large table. This presumes that each successive call to FUN returns a data.frame with identical columns (or a result that can otherwise be rbind-ed).
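
The simple case amounts to row-binding a list of data.frames. A minimal base R sketch, assuming a hand-built list as a stand-in for a profileApply result (the column names and values here are illustrative only):

```r
# Stand-in for a profileApply() result: one data.frame per profile,
# all with identical columns
res <- list(
  data.frame(id = "p1", hz_thickness = c(10, 20)),
  data.frame(id = "p2", hz_thickness = c(5, 15, 30))
)

# combine the list elements into one table
combined <- do.call(rbind, res)
```

Here combined has one row per horizon across both profiles. If the data.frames had different column names, rbind would stop with an error, which is why identical columns are presumed.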

However, a SoilProfileCollection contains several pieces of information that are relevant to data management, using the site and horizons as abstract representations of soil profile data. When the result has a particular, unambiguous format, it can be made easier to use by attaching the full set of IDs from the parent SPC. This behavior of frameify is invoked only when the result of FUN is a data.frame and idname and/or hzidname are present in it. The ID values in the result must also be at least a subset of the IDs in the parent SPC. Internal management of idname and hzidname by frameify makes it easier to take stock of potentially incomplete results. Merging back to the ID structure fills any gaps in the result with NA, producing unambiguous and complete ID sets that are safe to merge back into the parent object (e.g. using site<- or horizons<-) or to use in other downstream analysis.
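
The gap-filling step can be illustrated with a left join in base R. This is a sketch of the idea under stated assumptions, not the frameify code itself: the ids table is a hypothetical stand-in for the ID structure of a parent SPC, and the column names id and hzID are chosen for illustration.

```r
# Hypothetical full ID structure of a parent collection: 2 profiles, 5 horizons
ids <- data.frame(id = c("p1", "p1", "p2", "p2", "p2"),
                  hzID = as.character(1:5),
                  stringsAsFactors = FALSE)

# Incomplete result: values returned for only two horizons
partial <- data.frame(hzID = c("2", "4"),
                      hz_thickness = c(20, 15),
                      stringsAsFactors = FALSE)

# constraint: result IDs must be a subset of the parent IDs
stopifnot(all(partial$hzID %in% ids$hzID))

# left join keeps every parent ID; horizons without a result get NA
full <- merge(ids, partial, by = "hzID", all.x = TRUE, sort = FALSE)

# restore the parent's horizon order, since merge() does not guarantee it
full <- full[match(ids$hzID, full$hzID), ]
```

After the join, full has one row per horizon in the parent ID structure, in the parent's order, with NA in hz_thickness wherever no result was returned.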

@brownag brownag self-assigned this Jan 19, 2020
@dylanbeaudette dylanbeaudette merged commit e50b8c3 into master Jan 20, 2020
@dylanbeaudette
Member

Nice work, this solves a long-standing bottleneck with all functions that rely on profileApply. Thanks.

@brownag brownag deleted the iterators branch February 9, 2020 12:57