Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add future bioclim from CHELSA #24

Closed
wants to merge 19 commits into from
Closed

Add future bioclim from CHELSA #24

wants to merge 19 commits into from

Conversation

tpoisot
Copy link
Contributor

@tpoisot tpoisot commented Mar 3, 2021

See #18 - I am not super set on the interface, but I added a number of types for models and RCPs. This is mostly because I expect the models and RCP to pop up at different places, so we can use them to e.g. build URLs.

@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 4, 2021

As an FYI (maybe for @rafaqz to comment on), an alternative way to specify these models would be Future{BioClim, RCP26, CCSM4, 2050} or something like this. The issue is getting the info out of the type parameters...

@rafaqz
Copy link
Member

rafaqz commented Mar 4, 2021

That sounds good to me, although dates are an argument for other sources. You can add a helper method to get the type params, like

_model(::Type{<:Future{M,X}}) where M = M

The other option is we switch to objects, which just adds () brackets to the syntax everywhere, but also lets us use constructors and fields.

@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 4, 2021

I see your point about date - they have a slightly different meaning in climate scenarios, but we can keep them as an argument.

I don't think we need to add fields and constructors and all of that to the package - types work for what we do, and if we parameterize something as Future{BioClim, RCP26, CCSM4}, it's fine (for me). I'll edit my PR.

@rafaqz
Copy link
Member

rafaqz commented Mar 4, 2021

Sorry should it maybe be CHELSA{Future{RCP26,CSM24}} ??

Or are the multiple kinds of Future data? (I havent really read it)

@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 4, 2021

CHELSA has future raw data as well, like worldclim. An alternative would be CHELSA{FutureBioclim{RCP26, CCSM4}}} - but that's a little more unwieldy?

@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 4, 2021

Your point made me think about the fact that we can have future data from multiple things (e.g. landcover, species richness, etc). What I've done is separate the FutureClimate from the CHELSA{BioClim} part, so now the call is something like:

getraster(CHELSA{BioClim}, FutureClimate{CCSM4,RCP26}, 5)

and the date is an optional argument on top - I think it makes more sense as it keeps the same idea of sources within datasets, and the "future" aspect is just a complication on top.

@tpoisot tpoisot requested a review from rafaqz March 4, 2021 15:50
@tpoisot tpoisot marked this pull request as ready for review March 4, 2021 15:50
@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 4, 2021

I'm concurrently workin on PoisotLab/SimpleSDMLayers.jl#86 - this makes the API super easy to write. I am now fully ❤️ about the central raster location even for small files.

@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 5, 2021

So WorldClim has finally released the future BioClim data, and the code is ~ ready, but their zip files seem to be corrupted? I can see the contents, but can't extract them. Anyways as soon as I figure it out, we'll also get the scenarios for tmin, tmax, and prec, as they all follow the same naming convention.

Copy link
Member

@rafaqz rafaqz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is great.

src/types/futures.jl Outdated Show resolved Hide resolved
struct RCP85 <: RCP end

"""
Abstract type for Shared Socio-economic Pathways (SSPs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indented type as first line of docstring

src/types/futures.jl Show resolved Hide resolved
struct SSP585 <: SSP end

"""
A ClimateScenario can be a RCP or a SSP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a comment?

src/types/futures.jl Outdated Show resolved Hide resolved
export BCCCSM2MR, CNRMCM61, CNRMESM21, CanESM5, GFDLESM4, IPSLCM6ALR, MIROCES2L, MIROC6, MRIESM2

# Future datasets
export FutureClimate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just Future ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have FutureLandscape or FutureBiodiversity at some point. I'm trying to future-proof the package

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you can specify Landscape as the internal type, like Future{Landscape}, as with Future{BioClim}.

export SSP126, SSP245, SSP370, SSP585

# CC models from CMIP5 (used in CHELSA)
export ACCESS1, BNUESM, CCSM4, CESM1BGC, CESM1CAM5, CMCCCMS, CMCCCM, CNRMCM5,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about these being symbols instead? to reduce the exports? I was thinking of switching ALWB to symbols too - but it does require users typing one more confusing character (:), and a big elseif block... not sure anyway, just flagging this because I've been thinking about which is best to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dispatching on values is awkward - and it also requires the big elseif (like I have to do for years), so it's overall less maintainable. I think the exports are not too bad, since the names are really quite specific and it's unlikely that users would have variables like that.

The alternative is... a sub-module named FutureClimate and we call FutureClimate.CanEMS5? It seems a bit more verbose than we need.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, there are for and against for both.

return raster_path
end

rasterpath(T::Type{<:WorldClim{BioClim}}, ::Type{F}) where {F <: FutureClimate} = joinpath(rasterpath(T), "Future", _format(WorldClim, _scenario(F)), _format(WorldClim, _model(F)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need to use where, although we should recombine these types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have a suggestion about how to recombine them?

Copy link
Member

@rafaqz rafaqz Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about WorldClim{Future{BioClim}} ? reads like a modifier to BioClim, which it kind of is, and lets you have WorldClim{Future{Climate}} if they have the monthly climate data as well.

@@ -6,6 +6,7 @@ include("worldclim-weather.jl")
include("earthenv-landcover.jl")
include("earthenv-heterogeneity.jl")
include("chelsa-bioclim.jl")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests for WorldClim?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, because the zip files don't seem to contain any data... this is weird and I need to dig into it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok no worries, that sounds annoying. What kind of zip is it?

You could ould also move WorldClim Future to another PR if you want to get CHELSA merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might do that. Let me think about the types a bit more, and I'll get back with some infos tomorrow.

Without a layer argument, all layers will be downloaded, and a tuple of paths is returned.
If the data is already downloaded the path will be returned.
"""
function getraster(T::Type{WorldClim{BioClim}}, ::Type{F}, layer::Integer, date=Year(2050); res::String=defres(T)) where {F <: FutureClimate}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again probably better if this is a single type argument to keep things standardised

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be a big type though - do you have an idea of what the type would look like? I ended up picking this syntax because it makes sense to me from a semantic point of view (we want the data, and then we want them in the future).

Copy link
Member

@rafaqz rafaqz Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Its not the best, but I think method signature consistency is overall more important.

So:

getraster(T::Type{WorldClim{Future{BioClim}}}, layer::Integer, date=Year(2050); res::String=defres(T))

And this is actually less characters...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also makes writing the GeoData.jl wrapper easier, it's one function for future bioclim and bioclim, instead of 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have a few counter-arguments for that.

  1. I think it makes sense to separate two distinct elements, i.e. the type of data we want (bioclim from CHELSA) to the specific (no argument: extant data, argument: future data).
  2. I also think it makes sense because we can define the "future" part independently from the data source - multiple data sources will have future scenarios, and we want to be able to specify them correctly
  3. the type signature you give as an example isn't actually shorter in practice because there is not a valid default model for the future climate, so the user still needs to specify all of it when getting the data
  4. I get that this makes the GeoData wrapper simpler to write, but it's not going to be a majority of datasets at the moment
  5. I think having the data source contains sometimes a dataset and sometimes a future data specification is actually less consistent for the user - I agree that the date should be kept outside of it, tho

If you think of it in terms of path, as well, I think it would make sense that CHELSA{BioClim}, Future{X,Y} goes into CHELSA/BioClim/Future/X_Y... - there is an implicit semantics to the types that should also be present in the folder hierarchy.

Almost all functions of this type in Julia accept multiple inputs, I think that as long as it is properly documented, it would make things less confusing for users. An example is if you want to change from extant to future data: with a second type argument, you simply need to add it to the function call. With nested types, you need to go into the type and edit it. I'd rather pay the cost of writing more methods for the datasets who require it than have an akward typing, especially since we have multiple dispatch for this reason.

Copy link
Member

@rafaqz rafaqz Mar 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear where you are coming from here, so maybe I'll explain myself a little more. A lot of the work I did to this package was to introduce strict stardardiations. Partly for users, but mostly for devs and for interop and reducing workload downstream. Not just in GeoData.jl but every package and script that builds on this in future. This is the only way I can maintain 20 or so julia packages - simplicity and strict standardisations. It also helps everyone else building on this in future.

This PR means now instead of one consistent method signature, this package has 2. I'm really strongly against that happening for only aesthetic reasons. There is no technical reason 3 args is better than 2 here.

  1. I think it makes sense to separate two distinct elements, i.e. the type of data we want (bioclim from CHELSA) to the specific (no argument: extant data, argument: future data).

This could be argued either way.

  1. I also think it makes sense because we can define the "future" part independently from the data source - multiple data sources will have future scenarios, and we want to be able to specify them correctly

This is possible with Future how I defined it? That generality was the point of removing the ...Climate part... You can add as many params for it as you want, and use however many you need to in each case? The trailing params are just ignored.

  1. the type signature you give as an example isn't actually shorter in practice because there is not a valid default model for the future climate, so the user still needs to specify all of it when getting the data

Its still shorter, because future is just Future, not FutureClimate, and you have an extra space and comma. So 2 chars at minimum ;)

  1. I get that this makes the GeoData wrapper simpler to write, but it's not going to be a majority of datasets at the moment

Not just geodata, ever wrapper anyone wants to write. Now there are 2 methods instead of 1. Downstream packages and scripts shouldn't know or care if its bioclim or future bioclim being passed through. We need to consider ecosystem consequences.

  1. I think having the data source contains sometimes a dataset and sometimes a future data specification is actually less consistent for the user - I agree that the date should be kept outside of it, tho

I dont understand this point... but again its an aesthetic choice not a technical one. The user consistency I mean is argument number and placement, which this complicates.

How I organised this package was that besides source and layer all other args should be exposed as kw arguments, so as not to require more methods everywhere downstream. It can still be improved, but I think its fairly simple currently. But I think we should push to standardise it even more.

I can edit this into the PR, im just mostly here from my phone and busy currently.

@codecov-io
Copy link

Codecov Report

Merging #24 (5b00747) into master (5ba48c2) will decrease coverage by 18.30%.
The diff coverage is 25.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #24       +/-   ##
===========================================
- Coverage   77.51%   59.20%   -18.31%     
===========================================
  Files          13       16        +3     
  Lines         249      326       +77     
===========================================
  Hits          193      193               
- Misses         56      133       +77     
Impacted Files Coverage Δ
src/RasterDataSources.jl 100.00% <ø> (ø)
src/chelsa/bioclim.jl 100.00% <ø> (+8.33%) ⬆️
src/worldclim/shared.jl 17.24% <0.00%> (-82.76%) ⬇️
src/chelsa/shared.jl 10.00% <10.00%> (ø)
src/chelsa/futures.jl 100.00% <100.00%> (ø)
src/types/futures.jl 100.00% <100.00%> (ø)
src/awap/awap.jl 46.15% <0.00%> (-42.31%) ⬇️
src/geodata.jl 50.98% <0.00%> (-13.73%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ba48c2...5b00747. Read the comment docs.

@tpoisot tpoisot requested a review from rafaqz March 6, 2021 23:52
@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 6, 2021

@rafaqz I've made some changes - I disagree with your rationale and design decision, but respect your lead on the project. I will let you edit the PR as you see fit, as you seem to have a very set idea of how to express this.

Feel free to merge when you have setup the types in a way that matches your vision of how it should be done, my work on this branch is done at this point. When this is merged, I'll move on to #25 .

@rafaqz
Copy link
Member

rafaqz commented Mar 7, 2021

Great. I was thinking even if you really wanted 3 args I would still define a 2 arg version as well, instead of propagating that complexity into the ecosystem.

I honestly dont mind the syntax either way. But if they are not all the same we objectively create downstream work. This decision determines how many methods and tests will be needed in packages and scripts that use getraster.

That concern its central to all of my code and allways my first concern, hopefully you will come to see the sense in it ;)

@rafaqz
Copy link
Member

rafaqz commented Mar 7, 2021

Also I'll try to lay out my reasoning about package design more clearly in the docs.

@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 8, 2021

I think I'll keep the two args version downstream (i.e. in SimpleSDMLayers), it should be an easy "translation". If the assumption is that this package is going to be less user-facing as it's going to be used in other things, then a single arg is probably OK.

@rafaqz
Copy link
Member

rafaqz commented Mar 13, 2021

I will let you edit this PR as you see fit

I missed this. It probably means this PR wont be merged for quite a while, and is kind of unusual.

@tpoisot
Copy link
Contributor Author

tpoisot commented Mar 24, 2021

I will let you edit this PR as you see fit

I missed this. It probably means this PR wont be merged for quite a while, and is kind of unusual.

Just getting back to old issues - what I meant is, I'd rather let you do this one so I have no doubt about how you want to organize things. I've been suggesting a way you don't like, and I am OK with that. I'm used to finishing PRs from people when I'm really specific about how something should be done.

This is kinda blocking development on SimpleSDMLayers, but this isn't urgent. Worse case scenario I will add a temporary way to get the bioclim future data there while you decide how to implement this one.

@rafaqz
Copy link
Member

rafaqz commented Mar 24, 2021

Ok no problem. I usually finish PRs incorporating requested changes, especially if I need them for something.

Also I dont have to decide how to implement it, its straightforward and clear - use two arguments like the other sources.

@rafaqz
Copy link
Member

rafaqz commented Mar 24, 2021

How I would do it is something like this, pretty much as expressed earlier:

struct Future{A,B,C} end

function getraster(T::Type{Chelsa{Future{BioClim,B,C}}}, layer; kw...) where {B,C}
        path = rasterpath(T, layer)
        url = rasterurl(T, layer)
        # etc
end

So that:

source = CHELSA{Future{BioCLim,RCP26,CSM24}}
layer = 7
path = getraster(source, layer)

I'm on the road travelling and also trying to finish a methods paper and all the packages that it relies on. I won't need this functionality for a long time, so I wouldn't be able to prioritise getting this finished and tested to the point of merging.

rafaqz added a commit that referenced this pull request Sep 10, 2021
Refactor CHELSA future from #24, add climate and CMIP6
@rafaqz rafaqz closed this Sep 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants