Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul mechanism for passing data by duplication or reference #3532

Merged
merged 8 commits into from
Jun 27, 2020

Conversation

seisman
Copy link
Member

@seisman seisman commented Jun 24, 2020

@PaulWessel Here is another potential bug when calling the C API.

The code src/testapi_matrix_as_grid.c pass a 3x3 matrix/grid to GMT. I want to call grdcut to extract a subregion (-R0/1/0/1, i.e., a 2x2 grid) from it. However, the output grid is still 3x3.

@PaulWessel
Copy link
Member

Not a bug but a "feature": subregion was not implemented for this import yet. So passing a grid as a matrix is the most common way for pyGMT to feed grids to gmt modules? If so we should make sure this works.

@seisman
Copy link
Member Author

seisman commented Jun 24, 2020

passing a grid as a matrix is the most common way for pyGMT to feed grids to gmt modules?

Yes, PyGMT has two ways to pass a grid to GMT: (1) a file name, e.g., @earth_relief_01m.grd; (2) an xarray.DataArray, i.e., a matrix with some attributes. Passing an xarray.DataArray matrix is the common way for PyGMT, which works well for grdimage, grdview, and some other implemented modules in PyGMT, but not for grdcut.

@PaulWessel
Copy link
Member

Thanks, I will have a look at the api_import_grid and api_export_grid functions for GMT_VIA_MATRIX.

seisman added a commit to GenericMappingTools/pygmt that referenced this pull request Jun 24, 2020
Wrap the `grdcut` function.

Ideally it should take a grid file or a `xarray.DataArray` as input, and
store output as a grid file or return a `xarray.DataArray`.

Supported features:

- [X] input: grid file; output: grid file
- [X] input: grid file; output: `xarray.DataArray`
- [ ] input: `xarray.DataArray`; output: grid file
- [ ] input: `xarray.DataArray`; output: `xarray.DataArray`

Passing `xarray.DataArray` to `grdcut` is not implenmented in the
GMT 6.0.0. See GenericMappingTools/gmt#3532 for
a feature request to the upstream GMT.
@seisman seisman mentioned this pull request Jun 24, 2020
9 tasks
* Enable matrix subset requests

See #3535 for context.  This PR allows us to get a subset grid from an input matrix.

* Update gmt_api.c

* Update api.rst
@PaulWessel
Copy link
Member

OK, I think the first part of #3532 (GMT_IS_DUPLICATE|GMT_VIA_MATRIX) works. Passes your test. I noticed that test img/imgmap.sh fails with some tiny changes to the top image. In that case img2grd passes a grid to grdsample but it surely is not a matrix. Another thing did change though and that is when the user does pass a region (like your initial case), the code that did this only did this:

		if (!full_region (wesn)) {	/* Must update subset selection */
			int item;
			if ((item = gmtlib_validate_id (API, family, in_ID, GMT_IN, GMT_NOTSET)) == GMT_NOTSET) {
				if (input) gmt_M_str_free (input);
				return_null (API, API->error);
			}
			gmt_M_memcpy (API->object[item]->wesn, wesn, 4, double);
		}

but now it also sets the region flag so it can actually be used to do subsets:

		if (!full_region (wesn)) {	/* Must update subset selection */
			int item;
			if ((item = gmtlib_validate_id (API, family, in_ID, GMT_IN, GMT_NOTSET)) == GMT_NOTSET) {
				if (input) gmt_M_str_free (input);
				return_null (API, API->error);
			}
			gmt_M_memcpy (API->object[item]->wesn, wesn, 4, double);
			API->object[item]->region = true;
		}

So I think the failure is harmless and just reflects the fact that this variable is true and it takes a slightly different path through the code related to BCs. Thus this works for you via pyGMT then I think this is solved.

@PaulWessel
Copy link
Member

Discovered this comment. I was wondering why your GMT_IS_REFERENCE ends up being GMT_IS_DUPLICATE:

/* We want to pass a matrix from the outside as a grid.  If it is a float matrix then should be possible to pass
 * as is but the test below changes REF to DUPLICATE.  Without this we crash.  Need to either explain why it cannot
 * work that way or make changes.  As it stands, it seems all REF methods are converted to DUPLICATE. PW 6/6/2018 */

So I take that to mean the downstream was not implemented yet so we switch. I will loosen this and see what happens and if I can fix it.

@PaulWessel
Copy link
Member

Very close to finalizing the reading via GMT_IS_REFERENCE. I already had tests to check that your matrix is in the same row-format as GMT grids, and that the storage unit is float. Letting that pass through and it works. It implies a zero pad size since matrices don't have pads (that was the first problem). The shrinking to a subset and turning the rest to a temporary pad works fine, and the subset is written to file correctly. What was stumping me for so long (the above comment) was undoing the padding before exiting. It forgot to check that it was a real grid and not a matrix. So we got some garbage pointer and crash. Now no longer crash, but apiusergrid.sh and apigrdpix.sh fail, probably because of grdimage plots of grids with no pad. Will trace through.

@PaulWessel
Copy link
Member

The reason the two tests fail is this: Since days immemorial, when gmt writes a grid, the netcdf code that Florian wrote must strip off the pad. So since we "created" a pad to form a subset, when we write it out the grid is first shuffled to remove the pad, then we write the 2x2 subset out. Since the shuffling is done on the original read-only coord array, after grdcut returns the coord array is messed up. We then plot it a second time and it fails.
There is more: There may be a scale/offset applied to fit into a certain format, there is likely a flip up/down call as well, and all of that is done on the array passed to GMT_Write_Data. When the nc file is written we are too deep in the stack to know that this array should be read only. However, it might be possible to do something about this higher in GMT_Write_Data, but that would mean duplication and then the point of passing by reference goes away.
So either we are OK with some operations changing the values of the array and we pass ref to save memory, or we insist the array should not be changed in any way and then we are back to requiring a copy. Finally, the shuffling/squeezing is only applied in the netcdf format. If I add =bf to the filename it writes a native binary (old-style) GMT grid and the plot passes. However, I am not sure if there are not other places in GMT where a module may change the values in the input grid. I know in a few places (grdgradient comes to mind) we do check and if read-only we duplicate a grid. So maybe this problem is restricted to the netcdf packing.

So, what shall we do?

  1. Add checks in modules that may change the data array so that a read-only array is never changed.
  2. Document in the API that passing a pointer to memory outside GMT may lead to changes in those arrays as the price to pay for reusing memory. If keeping things unchanged the user may as well not add GMT_IS_REFERENCE.
  3. Just add a check in GMT_Write_Data so if we are writing a netcdf file we insist on copying the array first.

What do you think, @seisman and @joa-quim ?

@joa-quim
Copy link
Member

For me In MEX and Julia one has to already make a copy because of the transposition issue imposed by the different memory layouts, but that copy is made by GMT itself so it owns the memory and can change it at will. I have not yet explored the passing by matrix and and make use of the grid memory layouts but would like to that one day. So, if the idea is to save memory (easier from Python because it uses the same C memory layout as GMT), I clearly prefer option 2.

The padding is beast that we could perhaps kill in more cases. Maybe except grid projections the other operations that require BC could be done without a grid pad. The solutions would require doing the operations that require BC in steps. First step process all the array but the two outer row/columns, then going around the four edges and instead of having the padding zone in memory use mirror of the outer 2 row/cols. Don't know if this would work for project but if we could get rid of the padding several things would simplify.

@PaulWessel
Copy link
Member

PaulWessel commented Jun 26, 2020

Learning more every day about GMT code I wrote. I've been trying to make sure all the testapi stuff passes. Take GMT_Open_VirtualFile. We can pass direction GMT_IN/GMT_OUT and we can add GMT_IS_REFERENCE if we want to. But that is only partly listened to, and from the comments I pasted earlier (from 2018) I bypassed that and changed to GMT_IS_DUPLICATE for the matrix cases. However, DATASET got passed through as GMT_IS_REFERENCE. Now that I am fixing this so that it should always be GMT_IS_DUPLICATE for everything unless you give GMT_IS_REFERENCE (and then it only is allowed if, say, your matrix is float since grid is float), then I got errors because GMT_IS_DUPLICATE for GMT_IS_DATASET had not been implemented. OK, so fixed that and a bunch of new failures stopped. Then I got some when grids were passed around, like in grdfilter when we do highpass and must call grdsample. The Open_VirtualFile call did not pass GMT_IS_REFERENCE, but internally that just got changed to GMT_IS_REFERENCE anyway. While this worked, that is clearly either in response to a bug or bad implementation. I think we agree that if you don't pass GMT_IS_REFERENCE you don't want to be second-guessed inside GMT_Open_VirtualFile. So this case is perfect for actually adding GMT_IS_REFERENCE, since otherwise it fails for GMT_IS_DUPLICATE. I will trace to see why, but after than I will add GMT_IS_REFERENCE to those places that implicitly expected it. It is messy.

@PaulWessel
Copy link
Member

I believe I have finalized this commit. Boy, lots of stuff, where to start:

  1. Now, when you open a virtual files on an object, it will consistently be duplicated internally, unless you add GMT_IS_REFERENCE to GMT_IN. I have updated the API documentation on that. This means that most places inside GMT modules where we need to pass something we create in one module into another, we do so by REFERENCE since we created it in the first place. External users (pyGMT, Julia, etc) who use GMT_Open_VirtualFile can safely call it without GMT_IS_REFERENCE and there is no corruption of their original data. If they know that passing their data into gmt is safe (i.e., it does not matter if values change), they can pass GMT_IS_REFERENCE|GMT_IN. If it cannot be done (e.g., passing a double matrix as a grid), we detect and revert to duplicate anyway. Finally, when you register a grid from memory as a new virtual file, we must reset any trace of the original being a subset; now done.
  2. Most of the files changes simply have a GMT_IN|GMT_IS_REFERENCE instead of just GMT_IN since we know what we are doing inside the modules.
  3. The biggest changes are in gmt_api.c (in addition to GMT_Open_VirtualFIle as described). The detour into GMT_IS_DUPLICATE pointed out a bug in GMT_Read_Data for grid via duplicate as we duplicated the full array when only the header was requested (this lead to ugly DEBUG memory leak messages). I also found that we tried to pass back the pointer to a memory grid without checking that GMT_IS_REFERENCE is in effect (if GMT_IS_DUPLICATE we must duplicate...) Then it is the original problem that started this PR: Reading subset of a matrix passed as a grid. This is tested and works for me in the testapi*.c programs, but need @seisman to confirm from the Python side. Oh, and duplicating from a dataset in rec-by-rec reading failed since there was no check for GMT_IS_DUPLICATE in that situation.
  4. Finally, at the end of the day I had to update 3 PostScript files: ex34.ps, highpass.ps, and imgmap.ps. I have vague memories of going back and forth with these in the past. The differences seem tiny and I spent a good deal of time trying to figure out why they are differnt around the padding. Looks all fine to me and all tests pass (well, the ones who do in master).

Please give it a spin, including Python and Julia (I know @joa-quim is busy but I dont think you are affected by any of the above).

@PaulWessel
Copy link
Member

Believe I cannot do #3528 until this is OK'ed.

@seisman
Copy link
Member Author

seisman commented Jun 26, 2020

Tested this branch with PyGMT. If PyGMT uses GMT_IN|GMT_IS_REFERENCE, some tests crashes. If PyGMT uses GMT_IN, all tests look good.

If I understand you correctly, using GMT_IN is the simplest and safest way for PyGMT, right?

@PaulWessel
Copy link
Member

Yes, GMT_IN means "we are going to duplicate what you give us and work on that instead". But it would be good to understand under what circumstance something crashes, so if you can trace that down it would be helpful.

@PaulWessel PaulWessel changed the title WIP: Call grdcut in external programs Overhaul mechanism for passing data by duplication or reference Jun 26, 2020
@seisman
Copy link
Member Author

seisman commented Jun 26, 2020

I don't have time to write a C code. Here is the Python code which crashes.

import pygmt
import xarray as xr

grid = xr.open_dataarray("output.nc")
pygmt.grdtrack("@ridge.txt", grid, outfile="output.txt", V='d')

The Python code is equivalent to command line gmt grdtrack @ridge.txt -Goutput.nc > output.txt, except that PyGMT first read the grid file into memory, then pass it to GMT. output.nc is a subset of the 01d earth relief data, generated by gmt grdcut @earth_relief_01d -R-180/180/0/90 -Goutput.nc.

Here is the debug message:

grdtrack [DEBUG]: Remote file @ridge.txt exists locally as /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Replace file @ridge.txt with /Users/seisman/.gmt/cache/ridge.txt
grdtrack [INFORMATION]: Processing input grid(s)
grdtrack [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
grdtrack [DEBUG]: gmtapi_import_grid: Passed ID = 0 and mode = 1
grdtrack [DEBUG]: Grid/Image dimensions imply w/e/s/n = -179.5/179.5/0.5/89.5, inc = 1/1, gridline registration, n_layers = 1
grdtrack [INFORMATION]: Cartesian input grid
grdtrack [DEBUG]: Chosen boundary condition for all edges: natural
grdtrack [INFORMATION]: Cartesian input grid, yet x cells span exactly 360 and -90 <= y <= 90.
grdtrack [INFORMATION]:      To make sure the grid is recognized as geographical and global, use the -fg option
grdtrack [DEBUG]: Object ID 1 : Registered Grid Memory Reference 7fc1f9ac3ed0 as an Input resource with geometry Surface [n_objects = 2]
grdtrack [DEBUG]: Successfully created a new Grid container
grdtrack [DEBUG]: Set_Object for family: 1
==> 2 API Objects at end of GMT_Create_Data
--------------------------------------------------------
K.. ID RESOURCE.... FAMILY.... ACTUAL.... DIR... S O M L
--------------------------------------------------------
* 0  0 7fc1f9164af0 Grid       Matrix     Input  0 Y N 0
* 1  1 7fc1f9ac3ed0 Grid       Grid       Input  0 Y N 1
--------------------------------------------------------
grdtrack [DEBUG]: GMT_End_IO: Input resource access is now disabled
grdtrack [DEBUG]: Set_Object for family: 1
==> 2 API Objects at end of GMT_Read_Data
--------------------------------------------------------
K.. ID RESOURCE.... FAMILY.... ACTUAL.... DIR... S O M L
--------------------------------------------------------
* 0  0 7fc1f9164af0 Grid       Matrix     Input  0 Y N 0
* 1  1 7fc1f9ac3ed0 Grid       Grid       Input  0 Y N 1
--------------------------------------------------------
grdtrack [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
grdtrack [DEBUG]: gmtapi_import_grid: Passed ID = 0 and mode = 2
grdtrack [INFORMATION]: Referencing grid data from user memory location
grdtrack [DEBUG]: GMT_End_IO: Input resource access is now disabled
grdtrack [DEBUG]: Set_Object for family: 1
==> 2 API Objects at end of GMT_Read_Data
--------------------------------------------------------
K.. ID RESOURCE.... FAMILY.... ACTUAL.... DIR... S O M L
--------------------------------------------------------
* 0  0 7fc1f9164af0 Grid       Matrix     Input  2 Y N 0
* 1  1 7fc1f9ac3ed0 Grid       Grid       Input  2 Y N 1
--------------------------------------------------------
grdtrack [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Point
grdtrack [DEBUG]: gmt_get_filename: In: /Users/seisman/.gmt/cache/ridge.txt Out: /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Given full path to file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: gmt_get_filename: In: /Users/seisman/.gmt/cache/ridge.txt Out: /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: gmt_get_filename: In: /Users/seisman/.gmt/cache/ridge.txt Out: /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Found readable file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Object ID 2 : Registered Data Table File /Users/seisman/.gmt/cache/ridge.txt as an Input resource with geometry Point [n_objects = 3]
grdtrack [DEBUG]: gmtapi_init_import: Added 1 new sources
grdtrack [DEBUG]: GMT_Init_IO: Returned first Input object ID = 2
grdtrack [DEBUG]: GMT_Begin_IO: Mode value 1 not considered (ignored)
grdtrack [DEBUG]: GMT_Begin_IO: Initialize record-by-record access for Input
grdtrack [DEBUG]: gmtapi_next_io_source: Selected object 2
grdtrack [DEBUG]: gmt_get_filename: In: /Users/seisman/.gmt/cache/ridge.txt Out: /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Given full path to file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Found readable file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Found readable file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [INFORMATION]: Reading Data Table from file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: GMT_Begin_IO: Input resource access is now enabled [record-by-record]
grdtrack [DEBUG]: gmtapi_init_export: Passed family = Data Table and geometry = Point
grdtrack [DEBUG]: Object ID 3 : Registered Data Table File output.txt as an Output resource with geometry Point [n_objects = 4]
grdtrack [DEBUG]: gmtapi_init_export: Added 1 new destination
grdtrack [DEBUG]: GMT_Init_IO: Returned first Output object ID = 3
grdtrack [DEBUG]: GMT_Begin_IO: Mode value 1 not considered (ignored)
grdtrack [DEBUG]: GMT_Begin_IO: Initialize record-by-record access for Output
grdtrack [DEBUG]: gmtapi_next_io_source: Selected object 3
grdtrack [DEBUG]: gmt_get_filename: In: output.txt Out: output.txt
grdtrack [DEBUG]: Found readable file output.txt
grdtrack [DEBUG]: Replace file output.txt with path output.txt
grdtrack [INFORMATION]: Writing Data Table to file output.txt
grdtrack [DEBUG]: GMT_Begin_IO: Output resource access is now enabled [record-by-record]
grdtrack [DEBUG]: Source col types: (Number,Number)
grdtrack [DEBUG]: ASCII source scanned: Numerical columns: 2, Trailing text: N, Record type: Numerical only
Assertion failed: (node < G->header->size), function gmt_bcr_get_z_fast, file ../../src/gmt_bcr.c, line 256.
[1]    6679 abort      python test.py

@PaulWessel
Copy link
Member

Well, the new files will fix this I think. The xarray read results in -179.5/179.5/0.5/89.5 but that should now be -180/180/-90/90 I think.

@seisman
Copy link
Member Author

seisman commented Jun 27, 2020

It still crashes, no matter the output.nc is generated from a gridline or pixel-registered grid. Not a big issue. Passing GMT_IN and everything works.

@PaulWessel
Copy link
Member

OK, if I find some spare time I will write a new testapi_* to read a grid and pass it as reference and see what I can learn.

@PaulWessel
Copy link
Member

Meanwhile, I think this branch is sound and can be merged, no?

@seisman
Copy link
Member Author

seisman commented Jun 27, 2020

Yes, good to me. Since I opened this PR, you need to approve it.

@PaulWessel PaulWessel merged commit d7d1701 into master Jun 27, 2020
@PaulWessel PaulWessel deleted the grdcut-api branch June 27, 2020 07:24
@seisman
Copy link
Member Author

seisman commented Jun 27, 2020

It seems this PR breaks PyGMT. Passing an xarray to grdimage gives the following error:

grdimage [ERROR]: gmt_grd_project: Input grid does not have sufficient (2) padding

@PaulWessel
Copy link
Member

OK, what is the command-line equivalent? With our without shading (-I). And this is GMT_IN only?

@seisman
Copy link
Member Author

seisman commented Jun 27, 2020

PyGMT code:

import pygmt
from pygmt.datasets import load_earth_relief

grid = load_earth_relief('30m')
fig = pygmt.Figure()
fig.grdimage(grid, cmap='geo', projection='W0/6i')
fig.savefig("grdimage.png")

command line: gmt grdimage @earth_relief_30m -Cgeo -JW0/6i -png map

GMT_IN works, but GMT_IN|GMT_IS_REFERENCE gives the error.

@PaulWessel
Copy link
Member

Yes, so this is a good lesson and plays into @joa-quim's annoyance with the pads. Since you xarray has no pads, and since you pass by REFERENCE, GMT just takes the matrix as the grid. But the algorithm for map projection, resampling etc expects there to be a pad, and since there is not you get that error. Again, solving this is difficult during the read stage since we dont check what module is calling us. If we did, and knew that module will need the pad, we could switch to DUPLICATE and you would get no error. I suspect this is true for grdview, grdimage, grdproject, grdsample, grdtrack. Maybe others. I think rewriting GMT to not have the pads is way too much work (@joa-quim's suggestion) since we would take something that works and break it up. I think programmer time is much more expensive than RAM. So duplicating internally if we have to is clearly the cheapest solution. I will try to make a list of programs where padding is essential. Not all programs care and you would only get that message from those that do.

@PaulWessel
Copy link
Member

Only grdimage calls gmt_grd_project which gave that message. I wonder what happens when you do grdtrack with a line that crosses outside. Does that give any grief?

@seisman
Copy link
Member Author

seisman commented Jun 27, 2020

I wonder what happens when you do grdtrack with a line that crosses outside. Does that give any grief?

Perhaps related to this comment #3532 (comment)

@PaulWessel
Copy link
Member

OK, can recreate this with testapi_matrix_plot by setting Create_Session pad to 0. Otherwise it works. Do you set pad to zero?

	API = GMT_Create_Session ("test", 2U, GMT_SESSION_EXTERNAL, NULL);
	M = GMT_Read_Data (API, GMT_IS_MATRIX, GMT_IS_FILE, GMT_IS_SURFACE, GMT_READ_NORMAL, NULL, "@matrix_grid.txt", NULL);
	GMT_Open_VirtualFile (API, GMT_IS_GRID|GMT_VIA_MATRIX, GMT_IS_SURFACE, GMT_IN|GMT_IS_REFERENCE, M, input);
	sprintf (args, "%s -JM6i -P -Baf -Ei", input);
	GMT_Call_Module (API, "grdimage", GMT_MODULE_CMD, args);

This works fine

@PaulWessel
Copy link
Member

Instead of giving the no pad error I now simply duplicate the grid and add a 2-pad and use that grid instead inside gmt_grd_project and that works fine. PR coming up.

@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

can recreate this with testapi_matrix_plot by setting Create_Session pad to 0. Otherwise it works. Do you set pad to zero?

I think PyGMT uses 2 (GMT_PAD_DEFAULT).

@PaulWessel
Copy link
Member

I thought so too. So unsure why the testapi_matrix_plot with pad = 2 works fine while pad = 0 recreates the error you found. I tried -JW as well, no difference. Anyway, just running tests now with the new scheme. I think this is better: Instead lf throwing error we do what we can do to make it work - no questions asked.

seisman added a commit to GenericMappingTools/pygmt that referenced this pull request Jun 28, 2020
Wrap the `grdcut` function.

Ideally, `grdcut` should take a grid file or a `xarray.DataArray` as input, and
store output as a grid file or return a `xarray.DataArray`.

Currently supported features:

- [X] input: grid file; output: grid file
- [X] input: grid file; output: `xarray.DataArray`
- [ ] input: `xarray.DataArray`; output: grid file
- [ ] input: `xarray.DataArray`; output: `xarray.DataArray`

Passing `xarray.DataArray` to `grdcut` is not implemented in GMT 6.0.0, 
only is only possible for GMT>=6.1.0.
See GenericMappingTools/gmt#3532 for
the feature request to the upstream GMT.

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

One more issue. It seems some grid header information are lost (e.g., variable names for x, y and z).

Information of raw data:

earth_relief_01d_g.grd: Title: Earth Relief at 01 arc degree
earth_relief_01d_g.grd: Command: grdfilter SRTM15+V2.1.nc -Fg111.2 -D1 -I01d -rg -Gearth/earth_relief/earth_relief_01d_g.grd=ns+s0.5+o0 --IO_NC4_DEFLATION_LEVEL=9 --IO_NC4_CHUNK_SIZE=4096 --PROJ_ELLIPSOID=Sphere
earth_relief_01d_g.grd: Remark: Obtained by Gaussian Cartesian filtering (111.2 km fullwidth) from SRTM15+V2.1.nc [Tozer et al., 2019; http://dx.doi.org/10.1029/2019EA000658]
earth_relief_01d_g.grd: Gridline node registration used [Geographic grid]
earth_relief_01d_g.grd: Grid file format: ns = GMT netCDF format (16-bit integer), CF-1.7
earth_relief_01d_g.grd: x_min: -180 x_max: 180 x_inc: 1 name: longitude n_columns: 361
earth_relief_01d_g.grd: y_min: -90 y_max: 90 y_inc: 1 name: latitude n_rows: 181
earth_relief_01d_g.grd: z_min: -8592.5 z_max: 5559 name: elevation (m)
earth_relief_01d_g.grd: scale_factor: 0.5 add_offset: 0 packed z-range: [-17185,11118]
earth_relief_01d_g.grd: format: netCDF-4 chunk_size: 181,181 shuffle: on deflation_level: 9

Grid information of command line gmt grdcut @earth_relief_01d -R0/30/0/50 -Goutput.nc:

output.nc: Title: Produced by grdcut
output.nc: Command: grdcut @earth_relief_01d_p -R0/30/0/50 -Goutput.nc
output.nc: Remark: Obtained by Gaussian Cartesian filtering (111.2 km fullwidth) from SRTM15+V2.1.nc [Tozer et al., 2019; http://dx.doi.org/10.1029/2019EA000658]
output.nc: Pixel node registration used [Geographic grid]
output.nc: Grid file format: nf = GMT netCDF format (32-bit float), CF-1.7
output.nc: x_min: 0 x_max: 30 x_inc: 1 name: longitude n_columns: 30
output.nc: y_min: 0 y_max: 50 y_inc: 1 name: latitude n_rows: 50
output.nc: z_min: -4852 z_max: 2228.5 name: elevation (m)
output.nc: scale_factor: 1 add_offset: 0
output.nc: format: classic

Grid Information of the equivalent PyGMT grdcut() output:

output.nc: Title: Produced by grdcut
output.nc: Command: grdcut @GMTAPI@-S-I-G-M-G-N-000000 -Goutput.nc -R0/180/0/90
output.nc: Remark:
output.nc: Gridline node registration used [Cartesian grid]
output.nc: Grid file format: nf = GMT netCDF format (32-bit float), CF-1.7
output.nc: x_min: -0.5 x_max: 179.5 x_inc: 1 name: x n_columns: 181
output.nc: y_min: -0.5 y_max: 89.5 y_inc: 1 name: y n_rows: 91
output.nc: z_min: -8182 z_max: 5651.5 name: z
output.nc: scale_factor: 1 add_offset: 0
output.nc: format: netCDF-4 chunk_size: 181,91 shuffle: on deflation_level: 3

@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

It seems impossible to get the correct variables names, since when passing xarray data to GMT, the variable names are not passed to GMT.

@PaulWessel
Copy link
Member

Yes, and worse, we do not recognized the grid as geographic because no names like longitude. Presumably, the xarray container has the names and the problem is that GMT_MATRIX has no space for them? if so, then we could add some more text variables for this. Then, when we read GRID VIA MATRIX we can copy those into the Grid header and things should work. What you think? Adding variables is backwards compatible (removing not so much...).

@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

we do not recognized the grid as geographic because no names like longitude.

PyGMT can pass GMT_GRID_IS_GEO to GMT_Create_Data, and it seems work, right?

Presumably, the xarray container has the names and the problem is that GMT_MATRIX has no space for them?

xarray has the names, perhaps PyGMT should convert xarray to GMT_GRID instead?

@PaulWessel
Copy link
Member

That might be an even better solution, if that can be arranged.

@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

The grdtrack example above seems working now, possibly fixed by one of your recent changes. No crashes anymore, but it gives the following error:

grdtrack [ERROR]: Cannot find file output.txt

Why does grdtrack check if an output file exist?

gmt/src/grdtrack.c

Lines 294 to 300 in ed50fc5

case '>': /* Specified output file */
if (n_files++ > 0) {n_errors++; continue; }
Ctrl->Out.active = true;
if (opt->arg[0]) Ctrl->Out.file = strdup (opt->arg);
if (GMT_Get_FilePath (GMT->parent, GMT_IS_DATASET, GMT_OUT, GMT_FILE_LOCAL, &(Ctrl->Out.file))) n_errors++;
break;

@PaulWessel
Copy link
Member

PaulWessel commented Jun 28, 2020

GMT_Get_FilePath, for GMT_OUT, only checks that a file name is given (if not, an error) then returns:

if (direction == GMT_OUT) return GMT_NOERROR;

But if @ridge.txt cannot be found it will print that same message, but from gmt_remote.. However, unclear how it can print output.txt. Does pygmt.grdtrack translate outfile="output.txt" to the right argument? For command line that would be ->output.txt

@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

This is the args passed to GMT_Call_Module:

@ridge.txt -G@GMTAPI@-S-I-G-M-G-N-000000 ->output.txt

@seisman
Copy link
Member Author

seisman commented Jun 28, 2020

Here is the verbose message. It seems the error message is given after running the parse() function:

grdtrack [INFORMATION]: Processing input grid(s)
grdtrack [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
grdtrack [DEBUG]: gmtapi_import_grid: Passed ID = 0 and mode = 1
grdtrack [DEBUG]: Grid/Image dimensions imply w/e/s/n = -179.5/179.5/0.5/89.5, inc = 1/1, gridline registration, n_layers = 1
grdtrack [INFORMATION]: Cartesian input grid
grdtrack [DEBUG]: Chosen boundary condition for all edges: natural
grdtrack [INFORMATION]: Cartesian input grid, yet x cells span exactly 360 and -90 <= y <= 90.
grdtrack [INFORMATION]:      To make sure the grid is recognized as geographical and global, use the -fg option
grdtrack [DEBUG]: Object ID 1 : Registered Grid Memory Reference 7fdc1a2f2e90 as an Input resource with geometry Surface [n_objects = 2]
grdtrack [DEBUG]: Successfully created a new Grid container
grdtrack [DEBUG]: GMT_End_IO: Input resource access is now disabled
grdtrack [DEBUG]: gmtapi_begin_io: Input resource access is now enabled [container]
grdtrack [DEBUG]: gmtapi_import_grid: Passed ID = 0 and mode = 2
grdtrack [INFORMATION]: Referencing grid data from user memory location
grdtrack [DEBUG]: GMT_End_IO: Input resource access is now disabled
grdtrack [DEBUG]: gmtapi_init_import: Passed family = Data Table and geometry = Point
grdtrack [DEBUG]: gmt_get_filename: In: /Users/seisman/.gmt/cache/ridge.txt Out: /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Given full path to file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: gmt_get_filename: In: /Users/seisman/.gmt/cache/ridge.txt Out: /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: gmt_get_filename: In: /Users/seisman/.gmt/cache/ridge.txt Out: /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Found readable file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Object ID 2 : Registered Data Table File /Users/seisman/.gmt/cache/ridge.txt as an Input resource with geometry Point [n_objects = 3]
grdtrack (gmtapi_init_import): tried to free unallocated memory
grdtrack [DEBUG]: gmtapi_init_import: Added 1 new sources
grdtrack [DEBUG]: GMT_Init_IO: Returned first Input object ID = 2
grdtrack [DEBUG]: GMT_Begin_IO: Mode value 1 not considered (ignored)
grdtrack [DEBUG]: GMT_Begin_IO: Initialize record-by-record access for Input
grdtrack [DEBUG]: gmtapi_next_io_source: Selected object 2
grdtrack [DEBUG]: gmt_get_filename: In: /Users/seisman/.gmt/cache/ridge.txt Out: /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Given full path to file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Found readable file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: Found readable file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [INFORMATION]: Reading Data Table from file /Users/seisman/.gmt/cache/ridge.txt
grdtrack [DEBUG]: GMT_Begin_IO: Input resource access is now enabled [record-by-record]
grdtrack [DEBUG]: gmtapi_init_export: Passed family = Data Table and geometry = Point
grdtrack [DEBUG]: Object ID 3 : Registered Data Table File output.txt as an Output resource with geometry Point [n_objects = 4]
grdtrack [DEBUG]: gmtapi_init_export: Added 1 new destination
grdtrack [DEBUG]: GMT_Init_IO: Returned first Output object ID = 3
grdtrack [DEBUG]: GMT_Begin_IO: Mode value 1 not considered (ignored)
grdtrack [DEBUG]: GMT_Begin_IO: Initialize record-by-record access for Output
grdtrack [DEBUG]: gmtapi_next_io_source: Selected object 3
grdtrack [DEBUG]: gmt_get_filename: In: output.txt Out: output.txt
grdtrack [DEBUG]: Look for file output.txt in /Users/seisman/.gmt
grdtrack [DEBUG]: Look for file output.txt in /Users/seisman/.gmt/cache
grdtrack [DEBUG]: Look for file output.txt in /Users/seisman/.gmt/server
grdtrack [ERROR]: Cannot find file output.txt
grdtrack [INFORMATION]: Writing Data Table to file output.txt
grdtrack [DEBUG]: GMT_Begin_IO: Output resource access is now enabled [record-by-record]
grdtrack [DEBUG]: Source col types: (Number,Number)
grdtrack [DEBUG]: ASCII source scanned: Numerical columns: 2, Trailing text: N, Record type: Numerical only
grdtrack [DEBUG]: GMT_End_IO: Input resource access is now disabled
grdtrack [DEBUG]: GMT_End_IO: Output resource access is now disabled
grdtrack [WARNING]: Some input points were outside the grid domain(s).
grdtrack [INFORMATION]: Sampled 1537 points from grid @GMTAPI@-S-I-G-M-G-N-000000 (360 x 90)
grdtrack [DEBUG]: GMT_Destroy_Data: freed memory for a Grid for object 1
grdtrack [DEBUG]: gmtlib_unregister_io: Unregistering object no 1 [n_objects = 3]
grdtrack [DEBUG]: gmtlib_unregister_io: Object no 1 has non-NULL resource pointer
grdtrack [DEBUG]: gmtlib_unregister_io: Unregistering object no 2 [n_objects = 2]
grdtrack [DEBUG]: gmtlib_unregister_io: Unregistering object no 3 [n_objects = 1]
grdtrack (gmtlib_free_tmp_arrays): tried to free unallocated memory

@PaulWessel
Copy link
Member

How can we debug pyGMT? When debugging GMTMEX we start Matlab, then tell XCode to connect to process Matlab, then run the matlab script and Xcode stops at the point we said. Are you running pyGMT in a Notebook or a terminal running Python? Maybe it is possible to do the same?

OK write that above then your latest came in. So seems like it continues and runs successfully? There is only two places where the messate "Cannot find file " is printed:

  1. gmt_api.c: GMT_Get_FilePath. However, that happens after the check if (direction == GMT_OUT) return GMT_NOERROR; so seems impossible.
  2. gmt_remote.c: gmt_download_file_if_not_found. Looks like it could hppeng in gmtapi_next_io_source. Let me check.

@PaulWessel
Copy link
Member

Yes, gmt_download_file_if_not_found could be called for output, so I am adding an if test on direction around it [in gmtapi_next_io_source]. Running tests before committing. Hopefully this makes it go away.

@PaulWessel
Copy link
Member

Did you confirm that this PR did the trick with the example with output.txt?

@seisman
Copy link
Member Author

seisman commented Jun 29, 2020

Yes. PyGMT now works well with the GMT master branch, using either GMT_IN or GMT_IN|GMT_IS_REFERENCE. Currently, PyGMT uses GMT_IN|GMT_IS_REFERENCE, but I believe we will change it to GMT_IN to avoid possible crashes in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants