/vsisubfile reads GRIB2 files inconsistently #10214

Closed
hrodmn opened this issue Jun 14, 2024 · 5 comments · Fixed by #10215

@hrodmn

hrodmn commented Jun 14, 2024

What is the bug?

I have been working with some GRIB2 files recently and discovered that I could use /vsisubfile to read individual bands using the published byte ranges for each band. This is super useful because these files have hundreds of bands. It works great except for the first band. When I try to read the first band, GDAL interprets it as a multi-band dataset instead of a single-band dataset like all of the other bands.

Steps to reproduce the issue

You can run gdalinfo on an entire GRIB2 file like this:

gdalinfo /vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf02.grib2

This properly describes all of the bands!

You can run gdalinfo on a slice in the middle of the file like this:

gdalinfo /vsisubfile/345203_103748,/vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf02.grib2
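A minimal sketch of how such a path could be assembled programmatically, assuming the `/vsisubfile/<offset>_<size>,<filename>` form used in the command above. The helper name `vsisubfile_dsn` is hypothetical and is not part of GDAL.

```python
def vsisubfile_dsn(offset: int, size: int, path: str) -> str:
    """Build a /vsisubfile/ path exposing `size` bytes of `path` from `offset`.

    Hypothetical helper: it only formats the string and does not open anything.
    """
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if size <= 0:
        raise ValueError("size must be > 0")
    return f"/vsisubfile/{offset}_{size},{path}"


# Remote file from the report, accessed through /vsicurl/.
GRIB_PATH = (
    "/vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/"
    "hrrr.20240510/conus/hrrr.t12z.wrfsfcf02.grib2"
)

# Offset and size taken from the command above.
dsn = vsisubfile_dsn(345203, 103748, GRIB_PATH)
print(dsn)
```

The resulting string can be passed to `gdalinfo` or to any other tool that accepts a GDAL dataset name.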

When you run gdalinfo on the first slice (start byte = 0), GDAL reads it as a dataset with the full number of bands but only loads data for the first (correct) band:

gdalinfo /vsisubfile/0_345203,/vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf02.grib2

The result can be loaded and used, but the structure is not what you expect from the operation.

I don't know a lot about the internal structure of the GRIB files but I assume that this is happening because the full dataset metadata is getting loaded somewhere around byte 0.

Versions and provenance

Linux Ubuntu 24.04
GDAL 3.8.4, released 2024/02/08

Additional context

No response

@dbaston
Member

dbaston commented Jun 14, 2024

FWIW, these files have a .idx sidecar, so GDAL should be able to read a single band directly without the need for /vsisubfile, e.g.

gdal_translate "/vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf02.grib2" -b 50 /tmp/out.tif
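For context, byte ranges like those passed to /vsisubfile in the report can be derived from such a sidecar. The sketch below assumes the sidecar follows the wgrib2-style inventory layout (`message:offset:d=date:variable:level:forecast:`), which is not confirmed in this thread. The sample lines are illustrative and not copied from the real .idx file, and `parse_idx` is a hypothetical helper.

```python
from typing import List, NamedTuple, Optional


class Message(NamedTuple):
    number: int
    offset: int
    size: Optional[int]  # None when the end of the message is unknown
    label: str


def parse_idx(text: str, file_size: Optional[int] = None) -> List[Message]:
    """Turn inventory lines into (offset, size) byte ranges.

    Assumes each line looks like 'N:OFFSET:d=DATE:VAR:LEVEL:FORECAST:'.
    The size of a message is the distance to the next message's offset;
    the last message needs `file_size` to be known.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        number, offset, _date, rest = line.split(":", 3)
        rows.append((int(number), int(offset), rest.rstrip(":")))

    messages = []
    for i, (number, offset, label) in enumerate(rows):
        end = rows[i + 1][1] if i + 1 < len(rows) else file_size
        size = None if end is None else end - offset
        messages.append(Message(number, offset, size, label))
    return messages


# Illustrative sample only; offsets chosen to match the ranges in the report.
SAMPLE_IDX = """\
1:0:d=2024051012:REFC:entire atmosphere:2 hour fcst:
2:345203:d=2024051012:RETOP:cloud top:2 hour fcst:
3:448951:d=2024051012:VIL:entire atmosphere:2 hour fcst:
"""

messages = parse_idx(SAMPLE_IDX)
for m in messages:
    print(m.number, m.offset, m.size, m.label)
```

With these sample lines, the first two messages get sizes 345203 and 103748, which correspond to the `0_345203` and `345203_103748` ranges used in the report.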

@rouault
Member

rouault commented Jun 14, 2024

@dbaston's advice is good. Reading a specific band when the .idx is available should be efficient and should not involve reading other parts of the file.
If using /vsisubfile/, you can also use the USE_IDX=NO open option to avoid trying to read the .idx file. Current GDAL versions actually apply /vsisubfile/ to the .idx file itself, with the same range as for the .grib2 file, which explains the weird behavior you get. For a non-zero offset, GDAL tries to read the .idx file at an offset beyond its end, so it is ignored, and you get the expected result. For the zero offset, the .idx file is read in its entirety, and GDAL tries to instantiate all messages referenced in it as bands.
In #10215, I've implemented consistent reading of the .idx file when using /vsisubfile/.
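The behavior described above can be illustrated with a toy model. This is a sketch only and is not GDAL's code: `subfile_view` is a hypothetical stand-in for a byte window over a file, and the sidecar content is made up. It shows why a window that starts past the end of a small sidecar yields nothing, while a window starting at 0 yields the whole sidecar. The `gdalinfo_args_without_idx` helper, also hypothetical, builds the workaround command line using the `USE_IDX=NO` open option mentioned above.

```python
from typing import List


def subfile_view(data: bytes, offset: int, size: int) -> bytes:
    """Toy model of a byte window: return at most `size` bytes from `offset`."""
    return data[offset:offset + size]


# Made-up sidecar content; a real .idx is far smaller than the GRIB2 file.
IDX_BYTES = (
    b"1:0:d=2024051012:REFC:entire atmosphere:2 hour fcst:\n"
    b"2:345203:d=2024051012:RETOP:cloud top:2 hour fcst:\n"
)

# Same ranges as used for the GRIB2 file in the report.
middle = subfile_view(IDX_BYTES, 345203, 103748)  # starts past the end
first = subfile_view(IDX_BYTES, 0, 345203)        # covers the whole sidecar

print(len(middle))         # empty window: nothing to parse
print(first == IDX_BYTES)  # whole sidecar visible: all messages listed


def gdalinfo_args_without_idx(dsn: str) -> List[str]:
    """Build a gdalinfo command line that passes the USE_IDX=NO open option."""
    return ["gdalinfo", "-oo", "USE_IDX=NO", dsn]
```

The argument list can be handed to `subprocess.run` on a machine where GDAL is installed.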

@hrodmn
Author

hrodmn commented Jun 14, 2024

@rouault thank you for the quick fix! For my use case, I am hoping to encode everything I need to read the right subset directly in the dsn. /vsisubfile lets me do that really easily so this change should take care of it.

@dbaston
Member

dbaston commented Jun 14, 2024

I am hoping to encode everything I need to read the right subset directly in the dsn

The vrt:// syntax can also help with this,

gdalinfo "vrt:///vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf02.grib2?bands=50"
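A small sketch of building such a connection string in code, assuming only the `vrt://<path>?bands=<n>` form shown in the command above. The helper name `vrt_band_dsn` is hypothetical, and the sketch does not handle paths that themselves contain `?` or `&`.

```python
def vrt_band_dsn(path: str, band: int) -> str:
    """Build a vrt:// connection string selecting a single 1-based band.

    Hypothetical helper: it only formats the string.
    """
    if band < 1:
        raise ValueError("band numbers start at 1")
    return f"vrt://{path}?bands={band}"


GRIB_PATH = (
    "/vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/"
    "hrrr.20240510/conus/hrrr.t12z.wrfsfcf02.grib2"
)

vrt_dsn = vrt_band_dsn(GRIB_PATH, 50)
print(vrt_dsn)
```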

@hrodmn
Author

hrodmn commented Jun 17, 2024

The vrt:// syntax can also help with this,

gdalinfo "vrt:///vsicurl/https://noaahrrr.blob.core.windows.net/hrrr/hrrr.20240510/conus/hrrr.t12z.wrfsfcf02.grib2?bands=50"

Thanks @dbaston, I did not know about the vrt:// syntax. That is really useful!
