-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul mechanism for passing data by duplication or reference #3532
Conversation
Not a bug but a "feature": subregion was not implemented for this import yet. So passing a grid as a matrix is the most common way for pyGMT to feed grids to gmt modules? If so we should make sure this works. |
Yes, PyGMT has two ways to pass a grid to GMT: (1) a file name, e.g., |
Thanks, I will have a look at the api_import_grid and api_export_grid functions for GMT_VIA_MATRIX. |
Wrap the `grdcut` function. Ideally it should take a grid file or a `xarray.DataArray` as input, and store output as a grid file or return a `xarray.DataArray`. Supported features: - [X] input: grid file; output: grid file - [X] input: grid file; output: `xarray.DataArray` - [ ] input: `xarray.DataArray`; output: grid file - [ ] input: `xarray.DataArray`; output: `xarray.DataArray` Passing `xarray.DataArray` to `grdcut` is not implenmented in the GMT 6.0.0. See GenericMappingTools/gmt#3532 for a feature request to the upstream GMT.
* Enable matrix subset requests See #3535 for context. This PR allows us to get a subset grid from an input matrix. * Update gmt_api.c * Update api.rst
OK, I think the first part of #3532 (GMT_IS_DUPLICATE|GMT_VIA_MATRIX) works. Passes your test. I noticed that test img/imgmap.sh fails with some tiny changes to the top image. In that case img2grd passes a grid to grdsample but it surely is not a matrix. Another thing did change though and that is when the user does pass a region (like your initial case), the code that did this only did this:
but now it also sets the region flag so it can actually be used to do subsets:
So I think the failure is harmless and just reflects the fact that this variable is true and it takes a slightly different path through the code related to BCs. Thus this works for you via pyGMT then I think this is solved. |
Discovered this comment. I was wondering why your GMT_IS_REFERENCE ends up being GMT_IS_DUPLICATE:
So I take that to mean the downstream was not implemented yet so we switch. I will loosen this and see what happens and if I can fix it. |
Very close to finalizing the reading via GMT_IS_REFERENCE. I already had tests to check that your matrix is in the same row-format as GMT grids, and that the storage unit is float. Letting that pass through and it works. It implies a zero pad size since matrices don't have pads (that was the first problem). The shrinking to a subset and turning the rest to a temporary pad works fine, and the subset is written to file correctly. What was stumping me for so long (the above comment) was undoing the padding before exiting. It forgot to check that it was a real grid and not a matrix. So we got some garbage pointer and crash. Now no longer crash, but apiusergrid.sh and apigrdpix.sh fail, probably because of grdimage plots of grids with no pad. Will trace through. |
The reason the two tests fail is this: Since days immemorial, when gmt writes a grid, the netcdf code that Florian wrote must strip off the pad. So since we "created" a pad to form a subset, when we write it out the grid is first shuffled to remove the pad, then we write the 2x2 subset out. Since the shuffling is done on the original read-only coord array, after grdcut returns the coord array is messed up. We then plot it a second time and it fails. So, what shall we do?
|
For me In MEX and Julia one has to already make a copy because of the transposition issue imposed by the different memory layouts, but that copy is made by GMT itself so it owns the memory and can change it at will. I have not yet explored the passing by matrix and and make use of the grid memory layouts but would like to that one day. So, if the idea is to save memory (easier from Python because it uses the same C memory layout as GMT), I clearly prefer option 2. The padding is beast that we could perhaps kill in more cases. Maybe except grid projections the other operations that require BC could be done without a grid pad. The solutions would require doing the operations that require BC in steps. First step process all the array but the two outer row/columns, then going around the four edges and instead of having the padding zone in memory use mirror of the outer 2 row/cols. Don't know if this would work for project but if we could get rid of the padding several things would simplify. |
Learning more every day about GMT code I wrote. I've been trying to make sure all the testapi stuff passes. Take GMT_Open_VirtualFile. We can pass direction GMT_IN/GMT_OUT and we can add GMT_IS_REFERENCE if we want to. But that is only partly listened to, and from the comments I pasted earlier (from 2018) I bypassed that and changed to GMT_IS_DUPLICATE for the matrix cases. However, DATASET got passed through as GMT_IS_REFERENCE. Now that I am fixing this so that it should always be GMT_IS_DUPLICATE for everything unless you give GMT_IS_REFERENCE (and then it only is allowed if, say, your matrix is float since grid is float), then I got errors because GMT_IS_DUPLICATE for GMT_IS_DATASET had not been implemented. OK, so fixed that and a bunch of new failures stopped. Then I got some when grids were passed around, like in grdfilter when we do highpass and must call grdsample. The Open_VirtualFile call did not pass GMT_IS_REFERENCE, but internally that just got changed to GMT_IS_REFERENCE anyway. While this worked, that is clearly either in response to a bug or bad implementation. I think we agree that if you don't pass GMT_IS_REFERENCE you don't want to be second-guessed inside GMT_Open_VirtualFile. So this case is perfect for actually adding GMT_IS_REFERENCE, since otherwise it fails for GMT_IS_DUPLICATE. I will trace to see why, but after than I will add GMT_IS_REFERENCE to those places that implicitly expected it. It is messy. |
I believe I have finalized this commit. Boy, lots of stuff, where to start:
Please give it a spin, including Python and Julia (I know @joa-quim is busy but I dont think you are affected by any of the above). |
Believe I cannot do #3528 until this is OK'ed. |
Tested this branch with PyGMT. If PyGMT uses If I understand you correctly, using |
Yes, GMT_IN means "we are going to duplicate what you give us and work on that instead". But it would be good to understand under what circumstance something crashes, so if you can trace that down it would be helpful. |
I don't have time to write a C code. Here is the Python code which crashes.
The Python code is equivalent to command line Here is the debug message:
|
Well, the new files will fix this I think. The xarray read results in -179.5/179.5/0.5/89.5 but that should now be -180/180/-90/90 I think. |
It still crashes, no matter the |
OK, if I find some spare time I will write a new testapi_* to read a grid and pass it as reference and see what I can learn. |
Meanwhile, I think this branch is sound and can be merged, no? |
Yes, good to me. Since I opened this PR, you need to approve it. |
It seems this PR breaks PyGMT. Passing an xarray to grdimage gives the following error:
|
OK, what is the command-line equivalent? With our without shading (-I). And this is GMT_IN only? |
PyGMT code:
command line:
|
Yes, so this is a good lesson and plays into @joa-quim's annoyance with the pads. Since you xarray has no pads, and since you pass by REFERENCE, GMT just takes the matrix as the grid. But the algorithm for map projection, resampling etc expects there to be a pad, and since there is not you get that error. Again, solving this is difficult during the read stage since we dont check what module is calling us. If we did, and knew that module will need the pad, we could switch to DUPLICATE and you would get no error. I suspect this is true for grdview, grdimage, grdproject, grdsample, grdtrack. Maybe others. I think rewriting GMT to not have the pads is way too much work (@joa-quim's suggestion) since we would take something that works and break it up. I think programmer time is much more expensive than RAM. So duplicating internally if we have to is clearly the cheapest solution. I will try to make a list of programs where padding is essential. Not all programs care and you would only get that message from those that do. |
Only grdimage calls gmt_grd_project which gave that message. I wonder what happens when you do grdtrack with a line that crosses outside. Does that give any grief? |
Perhaps related to this comment #3532 (comment) |
OK, can recreate this with testapi_matrix_plot by setting Create_Session pad to 0. Otherwise it works. Do you set pad to zero?
This works fine |
Instead of giving the no pad error I now simply duplicate the grid and add a 2-pad and use that grid instead inside gmt_grd_project and that works fine. PR coming up. |
I think PyGMT uses 2 ( |
I thought so too. So unsure why the testapi_matrix_plot with pad = 2 works fine while pad = 0 recreates the error you found. I tried -JW as well, no difference. Anyway, just running tests now with the new scheme. I think this is better: Instead lf throwing error we do what we can do to make it work - no questions asked. |
Wrap the `grdcut` function. Ideally, `grdcut` should take a grid file or a `xarray.DataArray` as input, and store output as a grid file or return a `xarray.DataArray`. Currently supported features: - [X] input: grid file; output: grid file - [X] input: grid file; output: `xarray.DataArray` - [ ] input: `xarray.DataArray`; output: grid file - [ ] input: `xarray.DataArray`; output: `xarray.DataArray` Passing `xarray.DataArray` to `grdcut` is not implemented in GMT 6.0.0, only is only possible for GMT>=6.1.0. See GenericMappingTools/gmt#3532 for the feature request to the upstream GMT. Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
One more issue. It seems some grid header information are lost (e.g., variable names for x, y and z). Information of raw data:
Grid information of command line
Grid Information of the equivalent PyGMT
|
It seems impossible to get the correct variables names, since when passing xarray data to GMT, the variable names are not passed to GMT. |
Yes, and worse, we do not recognized the grid as geographic because no names like longitude. Presumably, the xarray container has the names and the problem is that GMT_MATRIX has no space for them? if so, then we could add some more text variables for this. Then, when we read GRID VIA MATRIX we can copy those into the Grid header and things should work. What you think? Adding variables is backwards compatible (removing not so much...). |
PyGMT can pass
xarray has the names, perhaps PyGMT should convert xarray to GMT_GRID instead? |
That might be an even better solution, if that can be arranged. |
The
Why does grdtrack check if an output file exist? Lines 294 to 300 in ed50fc5
|
GMT_Get_FilePath, for GMT_OUT, only checks that a file name is given (if not, an error) then returns:
But if @ridge.txt cannot be found it will print that same message, but from gmt_remote.. However, unclear how it can print output.txt. Does pygmt.grdtrack translate outfile="output.txt" to the right argument? For command line that would be ->output.txt |
This is the
|
Here is the verbose message. It seems the error message is given after running the
|
How can we debug pyGMT? When debugging GMTMEX we start Matlab, then tell XCode to connect to process Matlab, then run the matlab script and Xcode stops at the point we said. Are you running pyGMT in a Notebook or a terminal running Python? Maybe it is possible to do the same? OK write that above then your latest came in. So seems like it continues and runs successfully? There is only two places where the messate "Cannot find file " is printed:
|
Yes, gmt_download_file_if_not_found could be called for output, so I am adding an if test on direction around it [in gmtapi_next_io_source]. Running tests before committing. Hopefully this makes it go away. |
Did you confirm that this PR did the trick with the example with output.txt? |
Yes. PyGMT now works well with the GMT master branch, using either |
@PaulWessel Here is another potential bug when calling the C API.
The code
src/testapi_matrix_as_grid.c
pass a 3x3 matrix/grid to GMT. I want to call grdcut to extract a subregion (-R0/1/0/1
, i.e., a 2x2 grid) from it. However, the output grid is still 3x3.