Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize parameter name for table data inputs #1479

Closed
weiji14 opened this issue Sep 1, 2021 · 16 comments · Fixed by #1565
Closed

Standardize parameter name for table data inputs #1479

weiji14 opened this issue Sep 1, 2021 · 16 comments · Fixed by #1565
Labels
enhancement Improving an existing feature
Milestone

Comments

@weiji14
Copy link
Member

weiji14 commented Sep 1, 2021

Description of the desired feature

Originally posted by @meghanrjones in #1441 (comment)

This is somewhat related - I am not sure if it's worth a deprecation but I think we should decide on whether the parameter for tabular input is data or table for all modules moving forward. Currently, this is the distribution:

data parameter: surface, wiggle, velo, rose, plot3d, plot, contour
table parameter: blockmedian, blockmean, histogram
other: grdtrack (points), text (textfiles)

It would also be worthwhile to set a consistent order for the positional arguments x, y, [z], data, [outfile]

Are you willing to help implement and maintain this feature? Discuss first

@weiji14 weiji14 added question Further information is requested deprecation Deprecating a feature labels Sep 1, 2021
@weiji14
Copy link
Member Author

weiji14 commented Sep 1, 2021

One option to reduce the amount of deprecation notices is to keep two parameters, but with this rule:

  • data - use for the fig.* plotting functions (e.g. fig.plot, fig.contour, etc)
  • table - use for the data processing functions (e.g. blockmean, histogram, etc)

This way, we'll only need to do the renaming for surface ('data' -> 'table'), grdtrack ('points' -> 'table') and text ('textfiles' -> 'data'). I have some reservations with renaming text's 'textfiles' to 'data' since it's a bit non-standard (and need to wait for #1121), but otherwise, what do people think?

@maxrjones
Copy link
Member

One option to reduce the amount of deprecation notices is to keep two parameters, but with this rule:

* `data` - use for the `fig.*` plotting functions (e.g. `fig.plot`, `fig.contour`, etc)

* `table` - use for the data processing functions (e.g. `blockmean`, `histogram`, etc)

This way, we'll only need to do the renaming for surface ('data' -> 'table'), grdtrack ('points' -> 'table') and text ('textfiles' -> 'data'). I have some reservations with renaming text's 'textfiles' to 'data' since it's a bit non-standard (and need to wait for #1121), but otherwise, what do people think?

One complication is that many modules can be plotting or processing (e.g., histogram is mostly plotting but can return statistics with -I). My preference would be to use data moving forward and, if needed, deprecate table to data in blockmean, blockmedian, and histogram. grdtrack and text are non-standard enough for different names in my opinion.

@weiji14 weiji14 mentioned this issue Sep 3, 2021
10 tasks
@weiji14
Copy link
Member Author

weiji14 commented Sep 3, 2021

One complication is that many modules can be plotting or processing (e.g., histogram is mostly plotting but can return statistics with -I).

Those types of dual-purpose functions (e.g. histogram, rose, solar) might end up having two APIs, depending on how we decide at #896. E.g. a fig.histogram for plotting and pygmt.histogram for extracting data.

My preference would be to use data moving forward and, if needed, deprecate table to data in blockmean, blockmedian, and histogram. grdtrack and text are non-standard enough for different names in my opinion.

Ok, sounds like we need a vote. So the options are:

Place your vote on this comment. Only one vote per person 😛

@weiji14 weiji14 pinned this issue Sep 10, 2021
This was referenced Sep 10, 2021
@maxrjones
Copy link
Member

Can we also set the conventional order for the positional arguments as data/table, x, y, [z] or x, y, [z], data/table? My opinion is that this isn't worth deprecating any current syntax, but that #1434, #1429, #1418, #1379, #731 and #1491 should have the same order (if both data/table and x/y[/z] will be supported).

Here's the current count:

@weiji14
Copy link
Member Author

weiji14 commented Sep 15, 2021

Can we also set the conventional order for the positional arguments as data/table, x, y, [z] or x, y, [z], data/table?

Yes, good point. I would say just go with data, x, y, [z], so people can do pygmt.somefunction(data) if data is a pandas.DataFrame/geopandas.GeoDataFrame or any table-like input, similar to other Python-like plotting/processing code like plt.imshow. In this case, we would need to modify:

  • x, y, [z], data/table - (5 current functions) contour, plot3d, wiggle, surface, plot (Used in PRs Wrap nearneighbor #1379

Debating on whether we should go through a deprecation cycle (2 releases) for the x/y/z/data -> data/x/y/z parameter ordering change. Personally I try to always do parameter=arg in my code, but many people just pass in the arguments directly. So maybe best to raise a warning to give people time to switch over 🙃

@weiji14 weiji14 mentioned this issue Sep 15, 2021
5 tasks
@maxrjones
Copy link
Member

Can we also set the conventional order for the positional arguments as data/table, x, y, [z] or x, y, [z], data/table?

Yes, good point. I would say just go with data, x, y, [z], so people can do pygmt.somefunction(data) if data is a pandas.DataFrame/geopandas.GeoDataFrame or any table-like input, similar to other Python-like plotting/processing code like plt.imshow. In this case, we would need to modify:

  • x, y, [z], data/table - (5 current functions) contour, plot3d, wiggle, surface, plot (Used in PRs Wrap nearneighbor #1379

Debating on whether we should go through a deprecation cycle (2 releases) for the x/y/z/data -> data/x/y/z parameter ordering change. Personally I try to always do parameter=arg in my code, but many people just pass in the arguments directly. So maybe best to raise a warning to give people time to switch over 🙃

Sounds good. If we will make the change for the functions that have x/y/z/data, I think we should raise a warning.

@weiji14 weiji14 mentioned this issue Sep 20, 2021
5 tasks
@seisman
Copy link
Member

seisman commented Sep 22, 2021

Just want to confirm that we have reached a consensus that we will use data and the order data, x, y, z for table inputs. If yes, then we can start to update these functions.

If we will make the change for the functions that have x/y/z/data, I think we should raise a warning.

It's unclear to me how to raise a warning for this case. For users who do x=x, y=y, z=z, the parameter order doesn't matter, thus the warning is unnecessary and may be confusing. For users who do x, y, z, how can know they pass arguments directly without giving parameters?

@maxrjones
Copy link
Member

Just want to confirm that we have reached a consensus that we will use data and the order data, x, y, z for table inputs. If yes, then we can start to update these functions.

Yes, I think so.

If we will make the change for the functions that have x/y/z/data, I think we should raise a warning.

It's unclear to me how to raise a warning for this case. For users who do x=x, y=y, z=z, the parameter order doesn't matter, thus the warning is unnecessary and may be confusing. For users who do x, y, z, how can know they pass arguments directly without giving parameters?

If the user passes in x, y, z, then len(args) will be at least 3. If the user passes in x=x, y=y, z=y, then len(args) will be 0, using a decorator similar to deprecate_parameter.

@seisman
Copy link
Member

seisman commented Sep 23, 2021

Here are a list of modules that may need to rename table to data or reorder parameters like data, x, y, z:

@seisman
Copy link
Member

seisman commented Sep 24, 2021

If the user passes in x, y, z, then len(args) will be at least 3. If the user passes in x=x, y=y, z=y, then len(args) will be 0, using a decorator similar to deprecate_parameter.

It looks like a good solution. I'll work on it.

@seisman
Copy link
Member

seisman commented Sep 27, 2021

If the user passes in x, y, z, then len(args) will be at least 3. If the user passes in x=x, y=y, z=y, then len(args) will be 0, using a decorator similar to deprecate_parameter.

I'm rethinking the decorator implemented in #1541.

We're changing the parameter order from x, y, [z], data to data, x, y, [z]. Before making the change, users can pass positional parameters x, y, **kwargs or x, y, z, **kwargs. After making the change, users still can pass positional parameters, but they can only use it in this way data, **kwargs.

Thus, actually, we should raise the warning only if len(args) > 1 (i.e., x, y or x, y, z), for non-plotting functions.

For plotting functions, there is always the self parameter, thus, we should check if len(args) > 2 instead.

@maxrjones
Copy link
Member

If the user passes in x, y, z, then len(args) will be at least 3. If the user passes in x=x, y=y, z=y, then len(args) will be 0, using a decorator similar to deprecate_parameter.

I'm rethinking the decorator implemented in #1541.

We're changing the parameter order from x, y, [z], data to data, x, y, [z]. Before making the change, users can pass positional parameters x, y, **kwargs or x, y, z, **kwargs. After making the change, users still can pass positional parameters, but they can only use it in this way data, **kwargs.

Thus, actually, we should raise the warning only if len(args) > 1 (i.e., x, y or x, y, z), for non-plotting functions.

For plotting functions, there is always the self parameter, thus, we should check if len(args) > 2 instead.

In addition to len(args), we might actually want to check the type or shape of args[0] (for processing) or args[1] (for plotting). For example, in #1546 someone could pass in data, length, azimuth as positional arguments and should not get a warning even though len(args>2).

@seisman
Copy link
Member

seisman commented Sep 27, 2021

For example, in #1546 someone could pass in data, length, azimuth as positional arguments and should not get a warning even though len(args>2).

Users can pass data, data=data, or length=length, azimuth=azimuth, but they can't do data, length, azimuth because we have the data_kind function to make sure that users cannot pass both data and x,y.

@maxrjones
Copy link
Member

For example, in #1546 someone could pass in data, length, azimuth as positional arguments and should not get a warning even though len(args>2).

Users can pass data, data=data, or length=length, azimuth=azimuth, but they can't do data, length, azimuth because we have the data_kind function to make sure that users cannot pass both data and x,y.

OK, that was a bad example. An alternate example for plot: data, None, None, size. I don't know why anyone would want to do this, but they could.

@seisman
Copy link
Member

seisman commented Sep 28, 2021

We're changing the parameter order from x, y, [z], data to data, x, y, [z]. Before making the change, users can pass positional parameters x, y, **kwargs or x, y, z, **kwargs. After making the change, users still can pass positional parameters, but they can only use it in this way data, **kwargs.

Thus, actually, we should raise the warning only if len(args) > 1 (i.e., x, y or x, y, z), for non-plotting functions.

For plotting functions, there is always the self parameter, thus, we should check if len(args) > 2 instead.

I've implemented it in 95fa173 as part of PR #1548.

For example, in #1546 someone could pass in data, length, azimuth as positional arguments and should not get a warning even though len(args>2).

Users can pass data, data=data, or length=length, azimuth=azimuth, but they can't do data, length, azimuth because we have the data_kind function to make sure that users cannot pass both data and x,y.

OK, that was a bad example. An alternate example for plot: data, None, None, size. I don't know why anyone would want to do this, but they could.

I prefer NOT to check these complicated cases:

  1. Although users can pass data, None, None, size, I believe it's very rare. The current decorator only raise a warning, so users script won't stop working.
  2. The decorator is temporary, and likely will be removed after v0.7.0. It's six months if we strictly following the 3-month release cycle.

@seisman
Copy link
Member

seisman commented Oct 3, 2021

After #1565 is merged, all functions will use consistent parameter name data, and consistent parameter order data, x, y, [z]. There are few exceptions (see https://www.pygmt.org/dev/api/index.html):

  • image(self, imagefile, ...)
  • legend(self, spec=None, ...)
  • meca(self, spec, scale, ...)
  • rose(data=None, length=None, azimuth=None, ...)
  • text(textfiles=None, x=None, y=None, ...)
  • grdtrack(points, grid, ...)
  • x2sys_cross(tracks=None, outfile=None, ...)

I think these exceptions are reasonable and don't need more changes. Thus, we can close the issue after #1565 is merged. Agree?

@seisman seisman unpinned this issue Oct 4, 2021
@seisman seisman mentioned this issue Jan 3, 2022
5 tasks
weiji14 added a commit that referenced this issue Mar 12, 2022
weiji14 added a commit that referenced this issue Mar 14, 2022
Wrapping the triangulate function which does
"Delaunay triangulation or Voronoi partitioning
and gridding of Cartesian data".
Original GMT documentation can be found at
https://docs.generic-mapping-tools.org/6.3/triangulate.html.

Aliased outgrid (G), spacing (I), projection (J),
region (R), verbose (V), registration (r).

* Refactor triangulate to use virtualfile_from_data
* Refactor triangulate implementation to use pygmt.io.load_dataarray
* Alias binary(b), nodata(d), find(e), coltypes(f), header(h), incols(i)

* Rename the parameter 'table' to 'data'

As per #1479.

* Refactor test_triangulate to use Table_5_11_mean.xyz instead of tut_ship
* Refactor test_triangulate_with_outgrid to use xr.testing.assert_allclose
* Refactor test_triangulate_input_xyz to use pd.testing.assert_frame_equal

* Implement regular_grid and delaunay_triples staticmethod for triangulate
* Let list inputs to spacing (I) and incols (i) work

Use  I="sequence" and i="sequence_comma".

* Ensure triangulate.delaunay_triples output_type is valid

Must be either one of numpy, pandas or file

* Autocorrect output_type to 'file' if outfile parameter is set
* Allow only str or None inputs to outgrid parameter

Xref #1807

* Use gmt get GMT_TRIANGULATE to check whether Watson or Shewchuk is used
* State that Shewchuk is the default triangulation algorithm

As per GenericMappingTools/gmt#6438

* Actually document the output_type parameter for delaunay_triples

Co-authored-by: Will Schlitzer <schlitzer90@gmail.com>
Co-authored-by: Meghan Jones <meghanj@alum.mit.edu>
Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
sixy6e pushed a commit to sixy6e/pygmt that referenced this issue Dec 21, 2022
Wrapping the triangulate function which does
"Delaunay triangulation or Voronoi partitioning
and gridding of Cartesian data".
Original GMT documentation can be found at
https://docs.generic-mapping-tools.org/6.3/triangulate.html.

Aliased outgrid (G), spacing (I), projection (J),
region (R), verbose (V), registration (r).

* Refactor triangulate to use virtualfile_from_data
* Refactor triangulate implementation to use pygmt.io.load_dataarray
* Alias binary(b), nodata(d), find(e), coltypes(f), header(h), incols(i)

* Rename the parameter 'table' to 'data'

As per GenericMappingTools#1479.

* Refactor test_triangulate to use Table_5_11_mean.xyz instead of tut_ship
* Refactor test_triangulate_with_outgrid to use xr.testing.assert_allclose
* Refactor test_triangulate_input_xyz to use pd.testing.assert_frame_equal

* Implement regular_grid and delaunay_triples staticmethod for triangulate
* Let list inputs to spacing (I) and incols (i) work

Use  I="sequence" and i="sequence_comma".

* Ensure triangulate.delaunay_triples output_type is valid

Must be either one of numpy, pandas or file

* Autocorrect output_type to 'file' if outfile parameter is set
* Allow only str or None inputs to outgrid parameter

Xref GenericMappingTools#1807

* Use gmt get GMT_TRIANGULATE to check whether Watson or Shewchuk is used
* State that Shewchuk is the default triangulation algorithm

As per GenericMappingTools/gmt#6438

* Actually document the output_type parameter for delaunay_triples

Co-authored-by: Will Schlitzer <schlitzer90@gmail.com>
Co-authored-by: Meghan Jones <meghanj@alum.mit.edu>
Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants