(0.5.0) Metadata for JRA55 #286
base: main
sounds good |
Gotcha! I tried converting and then thought that changing the default was the way to go. I’m actually a bit confused about when to change the default. But I’ll drop it from the example and infer the conversion from the exchanger grid! |
This comes from
|
…ean.jl into ss/metadata-for-everything
Right, the backend has no effect on the data in the time series; it only indicates how the data is organized in memory. In main, one way to limit the time series is to pass time indices, for example for 10 elements:

time_indices = 1:10
JRA55PrescribedAtmosphere(time_indices; kw...)

In this PR we switch to dates instead:

start_date = DateTime(1990, 1, 1)
end_date = DateTime(1990, 2, 1)
dates = range(start_date, end_date, step = Hour(3))
JRA55PrescribedAtmosphere(; dates, kw...)

The source of truth for the dates associated with a particular dataset is all_dates(version, name), for example:

julia> all_dates(JRA55RepeatYear(), :temperature)
Dates.DateTime("1990-01-01T00:00:00"):Dates.Hour(3):Dates.DateTime("1990-12-31T21:00:00")
julia> all_dates(JRA55RepeatYear(), :river_freshwater_flux)
Dates.DateTime("1990-01-01T00:00:00"):Dates.Day(1):Dates.DateTime("1990-12-31T00:00:00")
julia> all_dates(JRA55MultipleYears(), :pressure)
Dates.DateTime("1958-01-01T00:00:00"):Dates.Hour(3):Dates.DateTime("2021-01-01T00:00:00")

This also applies to ECCO datasets:

julia> all_dates(ECCO4Monthly(), :salinity)
Dates.DateTime("1992-01-01T00:00:00"):Dates.Month(1):Dates.DateTime("2023-12-01T00:00:00")

which are also constructed with a range of dates:

julia> ECCOFieldTimeSeries(:temperature; dates = DateTime(1993, 1, 1):Month(1):DateTime(1993, 2, 1))
[ Info: Note: ECCO temperature data is in /Users/simonesilvestri/.julia/scratchspaces/0376089a-ecfe-4b0e-a64f-9c555d74d754/ECCO.
720×360×50×2 FieldTimeSeries{ClimaOcean.DataWrangling.ECCO.ECCONetCDFBackend} located at (Center, Center, Center) on Oceananigans.Architectures.CPU
├── grid: 720×360×50 LatitudeLongitudeGrid{Float32, Oceananigans.Grids.Periodic, Oceananigans.Grids.Bounded, Oceananigans.Grids.Bounded} on Oceananigans.Architectures.CPU with 7×7×3 halo and with precomputed metrics
├── indices: (:, :, :)
├── time_indexing: Cyclical(period=5.3568e6)
├── backend: ECCONetCDFBackend(1, 2)
└── data: 734×374×56×2 OffsetArray(::Array{Float32, 4}, -6:727, -6:367, -2:53, 1:2) with eltype Float32 with indices -6:727×-6:367×-2:53×1:2
└── max=31.2508, min=-1.98588, mean=3.33469

Before merging I can write more details in the description of the PR, and I am open to changes if any are suggested. |
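To make the switch from time indices to dates concrete, here is a minimal sketch that uses only the Dates standard library. It does not call ClimaOcean: `repeat_year_dates` is a hypothetical stand-in for the range that `all_dates(JRA55RepeatYear(), :temperature)` printed above, and the variable names are illustrative assumptions.

```julia
using Dates

# Stand-in for the range printed by all_dates(JRA55RepeatYear(), :temperature) above
repeat_year_dates = DateTime(1990, 1, 1):Hour(3):DateTime(1990, 12, 31, 21)

# Index-based selection (the style used in main): the first 10 snapshots
first_ten = repeat_year_dates[1:10]

# Date-based selection (the style proposed in this PR)
start_date = DateTime(1990, 1, 1)
end_date   = DateTime(1990, 2, 1)
dates = range(start_date, end_date, step = Hour(3))

# A quick sanity check: every requested date exists in the native range
all_available = all(d -> d in repeat_year_dates, dates)
```

With dates, the request stays meaningful even if the underlying dataset has a different sampling frequency, whereas a bare index range such as `1:10` only makes sense once the frequency is known.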
Gotcha. So in the papa example, I did this:
What would be a better way to do it in
? |
Ah, I didn't realize this was looping over the whole time series. I guess we could do something like

start_date = DateTime(1990, 1, 1)
end_date = DateTime(1990, 1, 31)
dates = range(start_date, end_date, step = Hour(3)) # 3 hours is the frequency of JRA55 data
atmosphere = JRA55PrescribedAtmosphere(longitude = λ★,
                                       latitude = φ★,
                                       dates = dates)

Another option would be

version = JRA55RepeatYear()
native_dates = all_dates(version)
simulation_days = 31
# Period / Period returns a Float64, so convert to Int before building the index range
snapshots_per_day = Int(Hour(24) / native_dates.step) # corresponding to JRA55's 3-hour frequency
time_indices = 1 : simulation_days * snapshots_per_day
dates = native_dates[time_indices]
atmosphere = JRA55PrescribedAtmosphere(longitude = λ★,
                                       latitude = φ★,
                                       version = version,
                                       dates = dates)

or maybe this is a bit better

version = JRA55RepeatYear()
native_dates = all_dates(version)
end_date_index = findfirst(x -> x == DateTime(1990, 1, 31), native_dates) # We end after 31 days
atmosphere = JRA55PrescribedAtmosphere(longitude = λ★,
                                       latitude = φ★,
                                       version = version,
                                       dates = native_dates[1:end_date_index])

I am also open to changing the name of the function |
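As a further variation on the options above, the selection can be written so that it does not depend on the step of the native dates. This is only a sketch using the Dates standard library: `first_days` is a hypothetical helper, not part of ClimaOcean, and the two ranges are stand-ins for what `all_dates` printed earlier in this thread.

```julia
using Dates

# Hypothetical helper: keep the dates that fall within the first `ndays` days of `native_dates`
function first_days(native_dates, ndays)
    cutoff = first(native_dates) + Day(ndays)
    last_index = findlast(d -> d < cutoff, native_dates)
    return native_dates[1:last_index]
end

# Stand-ins for the ranges that all_dates printed earlier in this thread
three_hourly = DateTime(1990, 1, 1):Hour(3):DateTime(1990, 12, 31, 21)
daily        = DateTime(1990, 1, 1):Day(1):DateTime(1990, 12, 31)

atmosphere_dates = first_days(three_hourly, 31)  # 31 days of 3-hourly snapshots
river_dates      = first_days(daily, 31)         # 31 daily snapshots
```

Because the cutoff is expressed as a date rather than a number of snapshots, the same call works for the 3-hourly and the daily variables.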
We could also think about extending the interface to pass |
Why aren't we using

@navidcy the size of the data in memory is displayed at the top of the show, e.g.

julia> atmosphere = JRA55PrescribedAtmosphere(longitude = λ★,
                                              latitude = φ★,
                                              backend = JRA55NetCDFBackend(2))
2×2×1×2920 PrescribedAtmosphere{Float32} on LatitudeLongitudeGrid:
├── times: 2920-element StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}
├── surface_layer_height: 10.0
└── boundary_layer_height: 600.0

Note, the time dimension of the in-memory data (given by the 4th element in the size) can differ from the length of |
@simone-silvestri I think that using By the way, I am still worried that the design of |
OK, I'll put here the Then when we merge it I can open a new PR that changes |
I see how this is a consequence. But what specifically is the downside? The crux of figuring out the best way to develop this abstraction is to understand this specific trade-off, so we have to articulate the pros and cons clearly. |
I think the difference is whether we want In the first case, it is nice to be able to represent a dataset composed of a version, a name, and a set of dates in a type (here we can probably go the In the latter case, there is no problem with the user interface, |
Ok, please clarify what you see as the trade-offs for the user interface. I think you are assuming a design, but it is not being explicitly stated. I can't respond to or judge the user interface you are referring to unless you state it explicitly. Part of the problem is that the proposal to define something like

struct Metadatum # represents a single file
    # properties
end
const Metadata = Vector{Metadatum}

has no specific implications for how The difference is mainly that we could define a new constructor for a single So in summary I don't see what changes would be required of the user interface. The difference is that we can expand the user interface to make more sense while leaving existing components unchanged. |
Here's a simple example, just one of many possibilities

# source
Metadata(name; version, dates) = [Metadatum(name; version, date) for date in dates]

datum = Metadatum(name; version, date)
data = Metadata(name; version, dates)

The idea of a "version" for general Metadata is weird to me, by the way. What concept are we expressing here? |
I was referring to achieving something similar avoiding a

struct Metadatum
name
version
date
dir
end
# Arguments follow the field order of Metadatum: name, version, date, dir
@propagate_inbounds Base.getindex(m::Metadata, i::Int) = Metadatum(m.name, m.version, m.dates[i], m.dir)
@propagate_inbounds Base.first(m::Metadata) = Metadatum(m.name, m.version, m.dates[1], m.dir)
@propagate_inbounds Base.last(m::Metadata) = Metadatum(m.name, m.version, m.dates[end], m.dir)
@inline function Base.iterate(m::Metadata, i=1)
    if (i % UInt) - 1 < length(m)
        return Metadatum(m.name, m.version, m.dates[i], m.dir), i + 1
    else
        return nothing
    end
end

Also in this way it would be possible to do

datum = Metadatum(name; version, date)
data = Metadata(name; version, dates) |
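For reference, the "one object that iterates into single-date items" idea can be sketched in a self-contained way. The types below are simplified assumptions for illustration only, not ClimaOcean's actual definitions: the field types, the `Base.length` and `Base.eltype` methods, and the `"some/dir"` placeholder are all hypothetical.

```julia
using Dates

# Hypothetical, simplified types for illustration; not ClimaOcean's definitions
struct Metadatum{V}
    name :: Symbol
    version :: V
    date :: DateTime
    dir :: String
end

struct Metadata{V, D<:AbstractVector{DateTime}}
    name :: Symbol
    version :: V
    dates :: D
    dir :: String
end

# Minimal iteration and indexing interface: each element is built on demand
Base.length(m::Metadata) = length(m.dates)
Base.eltype(::Type{<:Metadata{V}}) where V = Metadatum{V}
Base.getindex(m::Metadata, i::Int) = Metadatum(m.name, m.version, m.dates[i], m.dir)
Base.first(m::Metadata) = m[1]
Base.last(m::Metadata) = m[length(m)]

function Base.iterate(m::Metadata, i = 1)
    i > length(m) && return nothing
    return m[i], i + 1
end

dates = DateTime(1993, 1, 1):Month(1):DateTime(1993, 3, 1)
data = Metadata(:temperature, :ECCO4Monthly, dates, "some/dir")

# Materialize the lazy collection into a plain vector of single-date items
datums = collect(data)
```

In this sketch the container stores the name, version and directory once, so every element shares them by construction; mixing versions within one container is not possible.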
I guess I was seeing the ability to mix "versions" as a perk. For example JRA55 was originally generated up to 2018, but there are updates which continue the dataset past that. You may not be able to form a single consistent dataset (e.g. a single "version") that encompasses all dates, but it still could be valid to write something like

up_to_2018 = Metadata(name; dataset=original_dataset, dates=dates_till_2018)
past_2018 = Metadata(name; dataset=continuation, dates=dates_past_2018)
data = vcat(up_to_2018, past_2018)

By the way, what does "version" mean in the context of a general |
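The vector-based alternative, where each element carries its own dataset tag, could look roughly like the sketch below. Everything here is a hypothetical illustration: the `Metadatum` fields, the `metadata_series` helper, and the `:original` and `:continuation` tags are assumptions, not existing ClimaOcean names.

```julia
using Dates

# Hypothetical single-file description; each element carries its own dataset tag
struct Metadatum
    name :: Symbol
    dataset :: Symbol
    date :: DateTime
end

# Hypothetical helper that builds a plain Vector{Metadatum}
metadata_series(name; dataset, dates) = [Metadatum(name, dataset, date) for date in dates]

dates_till_2018 = DateTime(2018, 11, 1):Month(1):DateTime(2018, 12, 1)
dates_past_2018 = DateTime(2019, 1, 1):Month(1):DateTime(2019, 2, 1)

up_to_2018 = metadata_series(:temperature; dataset = :original, dates = dates_till_2018)
past_2018  = metadata_series(:temperature; dataset = :continuation, dates = dates_past_2018)

# Because both pieces are ordinary vectors, they concatenate without extra machinery
data = vcat(up_to_2018, past_2018)
```

The cost of this flexibility is that nothing enforces consistency between elements, so any check (for example that the dates are sorted) has to be done explicitly.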
I guess version is more suited in main where Metadata is You are right that that field name has to change in this PR, I like |
e.g.

struct ECCODataset
    version :: Symbol
end

Am I right that the path to a particular file is determined by a combination of |
yep, in general the full file path is determined by the whole |
Ruminating on "dataset": I think it could be a good term, because it expresses the concept of a "category of data". Which is what we mean here, a single Also just to offer an alternative: rather than metadatum / metadata, we could have

struct Metadata
    name
    dataset
end
const MetadataSeries = Vector{Metadata}

Semantically, MetadataSeries is easier to distinguish from |
This PR is an initial proposal to generalize ECCOMetadata to Metadata and rework the JRA55 module to use Metadata. In this way, we can have different JRA55 versions (repeat year and multiple year) and we can define a download_dataset function to download the dataset independently of using JRA55, as we can do for ECCO.

This PR also removes the ability to generate a JRA55FieldTimeSeries directly interpolated on the ocean grid, since we need to interpolate anyway when we compute fluxes.

closes #182