Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved xarray compatibility on function input and output #1353

Closed
wants to merge 14 commits into from
Closed

Improved xarray compatibility on function input and output #1353

wants to merge 14 commits into from

Conversation

jthielen
Copy link
Collaborator

@jthielen jthielen commented Apr 16, 2020

Description Of Changes

The big "xarray compatibility" PR for v1.0. This will consist of the following features:

  • Ability to automatically broadcast input xarray arguments
  • Wrapping function output as a DataArray with the coordinates of one of the inputs
  • Nearly all functions in library returning DataArrays when provided DataArrays
  • Automatic inclusion of grid parameters for kinematics calculations (e.g., dx, dy, f, latitude)
  • Dimension order independence when using DataArrays
  • Stretch Goal: Vectorization as many functions as possible when having DataArrays (hopefully only excluding those where 1D integration is required like parcel calculations using moist_lapse)

Items yet to be completed

  • Choose wrap_like argument for non-kinematics functions
  • Choose broadcast arguments for all functions where useful
  • Verify order of steps in main wrapper preprocess_and_wrap
  • Identify all functions where preprocess_and_wrap doesn't work (e.g., advection) and add custom xarray compatibility
    • If enough exceptions share common behavior (perhaps parcel calculations), can write alternative wrapper
    • After going through, it looks like advection is the only true special case that is worth handling now with custom xarray compatiblity (others seem better deferred)
  • Identify all currently-non-vectorized functions that should be able to be vectorized when leveraging xarray (e.g., get_layer and dependent functions), and write new/alternative implementations to do so? (Update: most deferred, but some easy implementations were made)
  • Add more tests of preprocess_and_wrap with different combinations of parameters
  • Add some form of systematic testing using xarray (perhaps using a simplified dataset based on the Sanders Analytic Model?)
  • New: add tests for new/significantly modified functions (*_as_dataset, all BV functions)
  • Update all docstrings to include xarray.DataArray and not just pint.Quantity?
  • Update the xarray with MetPy tutorial
  • Update any other tutorials where DataArrays are unnecessarily downcast to Quantities

Supersedes #1304.

Checklist

@jthielen jthielen added Type: Enhancement Enhancement to existing functionality Area: Calc Pertains to calculations Type: API Change Changes to how existing functionality works Area: Units Pertains to unit information Area: Xarray Pertains to xarray integration labels Apr 16, 2020
@jthielen jthielen added this to the 1.0 milestone Apr 16, 2020
dopplershift and others added 8 commits May 13, 2020 19:07
CartoPy 0.18 started validating scales for NaturalEarthFeature, which
breaks with the scales for our own map features. There's really no
reason to inherit from NaturalEarthFeature, so just make this inherit
Feature and add a few missing things.
Matplotlib (as of 3.2.2) does not properly clip text to paths. For now,
add a clipbox for the Axes bounding box, which will supplement the set
clip path.
0.18 started autoscaling map features and updated some gridline
handling, causing changes to our test images.
@jthielen jthielen requested a review from kgoebber May 26, 2020 15:11
@jthielen
Copy link
Collaborator Author

jthielen commented May 26, 2020

I've finally gotten back around to this PR! It has been brought up to date with recently implemented (and pending) changes. It is dependent on the unmerged #1378, which in turn dependent on #1372 and #1373. A preliminary review would be great at this point, if possible, but I would also understand if we need to get the dependent PRs and CI failures cleaned up first.

Here's the updated status:

Discussion Needed On

What do we do about:

  • lifted_index? (API doesn't make as much sense in context of xarray, but likely still useful)
  • 1d and grid interpolation functions? (API doesn't make as much sense in context of xarray, but definitely still useful functionality)
  • angle_to_direction? (the one function where output in DataArray may be desired, but can't use Quantities as intermediary because the output is strings, rather than numeric)

Can I defer remaining documentation updates to a follow-up PR so that this can be merged for 1.0.0rc2, and then doc updates added for 1.0.0 final? #1385 will be a big part of this.

Do the two added functions (parcel_profile_with_lcl_as_dataset and isentropic_interpolation_as_dataset) seem reasonable? Those were the ones where I would foresee wanting xarray output, but the existing API of the original functions seemed closely tied to lists of Quantities as output, and I think it would be awfully confusing to sometimes get list of quantities, sometimes Dataset depending on input types.

PR Tasks Remaining

  • Implement custom xarray handling for advection
  • Add tests for new/significantly modified functions
    • parcel_profile_with_lcl_as_dataset
    • isentropic_interpolation_as_dataset
    • all brunt vaisala
  • Add some "representative" tests of xarray wrapped functions (perhaps an example from each calc submodule)

Other Notes

There is a fair collection of functions that, at least as of right now, did not make sense to return xarray DataArrays even when provided them:

  • bunkers_storm_motion
  • bulk_shear
  • critical_angle
  • storm_relative_helicity
  • moist_lapse
  • lcl
  • lfc
  • el
  • parcel_profile_with_lcl
  • cape_cin
  • most_unstable_parcel
  • isentropic_interpolation
  • surface_based_cape_cin
  • most_unstable_cape_cin
  • mixed_layer_cape_cin
  • mixed_parcel
  • mixed_layer
  • thickness_hydrostatic
  • thickness_hydrostatic_from_relative_humidity
  • find_intersections
  • get_layer_heights
  • get_layer
  • lat_lon_grid_deltas
  • grid_deltas_from_dataarray

While I've added notes to the docstrings of all of these (where not already clear), they would probably be worth listing in the xarray & MetPy tutorial (in place of the currently supported list). Is it worth somehow documenting the rationale as to why each doesn't have xarray output? Some (like most parcel calculations) are because they take in 1D profiles and output scalars (suggesting that xarray output may be useful once gridded support is available), others (like thickness_hydrostatic) are because of the non-xarray-friendly nature of get_layer or other non-xarray-friendly APIs (suggesting xarray output may naturally arise once we have an xarray alternative to get_layer and related functions), and still others (lat_lon_grid_deltas and grid_deltas_from_dataarray) are specifically intended to always give Quantity output (and thus will never give xarray output).

Also, there are some functions that could see a improved API with xarray, but I'm not sure if its worth the effort at this point to implement over just have basic compatibility for now (e.g., still having to supply vertical coordinates, axis numbers, etc.):

  • static_stability
  • gradient_richardson_number
  • find_bounding_indicies
  • get_perturbation
  • tke
  • kinematic_flux
  • friction_velocity

@kgoebber
Copy link
Collaborator

Okay, high level take-aways at this point. I've run through all of the GEMPAK comparison tests (so scalar gridded output) and in general we are kosher and being able to only have to put in the main parameters is very nice!

I think your listing above of what to include versus not include is reasonable. In all of the tests that I have done to date with this PR, we've hit the >95% use threshold for gridded calculations giving back an Xarray DataArray. I have a few comments on a couple of the functions, the first being the most important, the rest not solely related to this particular PR.

  • Absolute Vorticity - need to broadcast the latitude for computation of Coriolis parameter. If using GFS output, latitude is only one dimension and will fail on the calculation. This also impacts the potential_vorticity_baroclinic calculation too.

  • Potential Vorticity - The broadcasting works well here with all of the different variables, except for the latitude due to the absolute vorticity. In testing, I overrode the latitude with a keyword argument and then it works as expected.

  • Frontogenesis - correction of deformation and divergence terms appear to be happening OR GEMPAK isn't using the spherical version of those terms for that calculation...more investigation needed.

  • Spherical Terms - Vorticity, Absolute Vorticity, three Deformation equations, and others still need the correction terms included.

  • Laplacian - Were not getting the same answer as GEMPAK (this was true before changes); need further investigation

  • Density - Were close to GEMPAK, but the difference is greater than it should be; need to look into the differing assumptions in those calculations to determine source of error.

@tjwixtrom
Copy link
Contributor

I will add that the implementation of isentropic_interpolation will likely require a re-write of interpolate_1d as this is written in much the same way. Also, the thickness calculations and layer calculations may be good to return as DataArrays to preserve horizontal coordinates. Finally, #1384 has some comments on leveraging Xarray's resampling for TKE and similar functions.

@dopplershift
Copy link
Member

Hey @tjwixtrom ! We've actually been having a lot of discussion around some of these issues and have come to a lot of those same conclusions. We've been doing some of this discussion on bi-weekly calls that I have done a really poor job of advertising more widely. So if you'd like to join in and discuss, our next one will be 1:30 MDT on 11 June on Google Meet.

Happy to have more input because I really want to make sure we get the interfaces for these right--they're important.

@tjwixtrom
Copy link
Contributor

@dopplershift I will try and be there!

@kgoebber
Copy link
Collaborator

kgoebber commented Jun 1, 2020

Okay, update on Vector Calculations...

Main thing is the need of broadcasting the latitudes and/or Coriolis while using DataArrays.

Question: Should we normalize all of the functions (that require Coriolis) to take just latitude (and not have some take latitudes while others take Coriolis parameter) as a way of standardizing? For example, geostrophic_wind and ageostrophic_wind both take f as an input, but inertial_advective_wind takes latitude. Or change all of the latitudes to f would be another option. Either way, I think it might be helpful to set those all to a similar thing if in the end they are all calculating the Coriolis parameter.

The following vector calculations have been tested: geostrophic_wind, 'ageostrophic_wind, q_vector, inertial_advective_wind, gradient, and converting units. All are good, except the inertial_advective_wind` we produce wildly different answers from GEMPAK. I am somewhat convinced that GEMPAK is not doing the cross product - will be digging more into this one. Other than that, use of DataArrays worked well for tested functions, outside of the Coriolis piece mentioned above.

@kgoebber
Copy link
Collaborator

kgoebber commented Jun 8, 2020

Found while cleaning up testing against GEMPAK calculations...

When testing the Laplacian, if I feed it a DataArray, I just get back a numpy array. If I feed it a unit array, then I get unit arrays out.

@jthielen
Copy link
Collaborator Author

jthielen commented Jun 8, 2020

Found while cleaning up testing against GEMPAK calculations...

When testing the Laplacian, if I feed it a DataArray, I just get back a numpy array. If I feed it a unit array, then I get unit arrays out.

Good catch, I suspect something got messed up when I changed how the optional arguments are handled. Does the same issue arise with gradient or not?

@kgoebber
Copy link
Collaborator

kgoebber commented Jun 9, 2020

Yes, gradient currently has the same behavior as Laplacian, where if given a DataArray gives back just a numpy array.

@jthielen
Copy link
Collaborator Author

Closing since I had to re-create my MetPy fork, which means this PR no longer has a valid reference. Will put up a new PR once the new version of #1378 is complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Calc Pertains to calculations Area: Units Pertains to unit information Area: Xarray Pertains to xarray integration Type: API Change Changes to how existing functionality works Type: Enhancement Enhancement to existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add standard data object
4 participants