Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance gdal capabilities for complex datasets #142

Closed

Conversation

jmichel-otb
Copy link

This pull request creates a new metadata domain, called DERIVED_SUBDATASETS, for all datasets supported by gdal.

This new metadata domain reports derived subdatasets that actually are derived vrt raster band using some of the pixel functions from Valentino. Those functions where integrated in another pull request, that has been merged in this one (see #141)

Exposed derived datasets are described in a single array, which makes it rather easy to add new derived datasets in the future.

Here is an example on Sentinel1 dataset :

$ gdalinfo -nogcp -mdd DERIVED_SUBDATASETS s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
Driver: GTiff/GeoTIFF
Files: s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
Size is 17663, 31106
Coordinate System is `'
Metadata:
AREA_OR_POINT=Area
TIFFTAG_DATETIME=2015:06:20 01:52:30
TIFFTAG_IMAGEDESCRIPTION=Sentinel-1A SM SLC L1
TIFFTAG_SOFTWARE=Sentinel-1 IPF 002.45
Metadata (DERIVED_SUBDATASETS):
DERIVED_SUBDATASET_0_NAME=DERIVED_SUBDATASET:AMPLITUDE:s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_0_DESC=Amplitude of input bands from s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_1_NAME=DERIVED_SUBDATASET:PHASE:s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_1_DESC=Phase of input bands from s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_2_NAME=DERIVED_SUBDATASET:REAL:s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_2_DESC=Real part of input bands from s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_3_NAME=DERIVED_SUBDATASET:IMAG:s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_3_DESC=Imaginary part of input bands from s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_4_NAME=DERIVED_SUBDATASET:CONJ:s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_4_DESC=Conjugate of input bands from s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_5_NAME=DERIVED_SUBDATASET:INTENSITY:s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_5_DESC=Intensity (squared amplitude) of input bands from s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_6_NAME=DERIVED_SUBDATASET:LOGAMPLITUDE:s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
DERIVED_SUBDATASET_6_DESC=log10 of amplitude of input bands from s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
Image Structure Metadata:
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 0.0, 0.0)
Lower Left ( 0.0,31106.0)
Upper Right (17663.0, 0.0)
Lower Right (17663.0,31106.0)
Center ( 8831.5,15553.0)
Band 1 Block=17663x1 Type=CInt16, ColorInterp=Gray

And :

$ gdalinfo -nogcp DERIVED_SUBDATASET:AMPLITUDE:s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
Driver: COMPLEXDERIVED/Complex derived bands
Files: s1a-s6-slc-vv-20150619t195043-20150619t195101-006447-00887d-001.tiff
Size is 17663, 31106
Coordinate System is `'
Origin = (0.000000000000000,0.000000000000000)
Pixel Size = (1.000000000000000,1.000000000000000)
Metadata:
AREA_OR_POINT=Area
TIFFTAG_DATETIME=2015:06:20 01:52:30
TIFFTAG_IMAGEDESCRIPTION=Sentinel-1A SM SLC L1
TIFFTAG_SOFTWARE=Sentinel-1 IPF 002.45
Corner Coordinates:
Upper Left ( 0.0000000, 0.0000000)
Lower Left ( 0.000, 31106.000)
Upper Right ( 17663.000, 0.000)
Lower Right ( 17663.000, 31106.000)
Center ( 8831.500, 15553.000)
Band 1 Block=128x128 Type=Float64, ColorInterp=Undefined

Things to be discussed :

  • Proper use of gdal coding style / naming conventions
  • Tests are available in autotest, are they sufficient ?
  • Since derived datatets are available for all datasets with at least one raster band, they are not specifically made for complex pixel type. Hence, we might want to change the driver name.
  • Some pixel function only make sense when there is at least one complex pixel type band. Shall we filter them ? They will work with real pixel type, but will be pretty much useless (like conj or imag).
  • Driver documentation html page is still to be written

avalentino and others added 30 commits April 13, 2014 11:47
@jmichel-otb
Copy link
Author

Many tests are failing because of the new reported metadata domain list. Typical case is checking against a hard-coded metadata domain list, or metadata domain list size (good example in autotest/jp2openjpeg.py). I was thinking maybe we could add an environment variable to explicitly disable the report of derived subdatatsets, and set it in tests ?

currentDomainList = CSLAddString(currentDomainList,"DERIVED_SUBDATASETS");

// Ensure that we do not duplicate DERIVED domain
if(CSLFindString(currentDomainList,"DERIVED_SUBDATASETS")==-1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which situation would DERIVED_SUBDATASETs already be in oMDMD.GetDomainList() ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some of the tests from jp2openjpeg.py (the ones with vsimem). It seems that the dataset is a duplicate from another dataset (and that the whole metadata domain list is already ported during duplication).

@rouault
Copy link
Member

rouault commented Jul 8, 2016

Many tests are failing because of the new reported metadata domain list. Typical case is checking against a hard-coded metadata domain list, or metadata domain list size (good example in autotest/jp2openjpeg.py). I was thinking maybe we could add an environment variable to explicitly disable the report of derived subdatatsets, and set it in tests ?

Hum, I'm not a big fan of this solution. Looking at https://s3.amazonaws.com/archive.travis-ci.org/jobs/143320336/log.txt, there are only 6 tests failing, so updating them shouldn't be that a big deal. And interestingly some failures are not expected. For example in ogr_wfs.py and ogr_gpkg.py, it's a pure OGR dataset, so no raster at all. In that case the new metadata domain shouldn't be reported at all. A if GetRasterCount() != 0 check should likely be added.

@jmichel-otb
Copy link
Author

jmichel-otb commented Jul 8, 2016

Only 3 tests failed :
------------ Failures ------------
Script: gdrivers/jp2openjpeg.py
TEST: jp2openjpeg_37 ... fail
line 1796: fail
TEST: jp2openjpeg_42 ... fail
line 2098: fail
Script: gdrivers/gpkg.py
TEST: gpkg_21 ... fail
line 2073: fail


@jmichel-otb
Copy link
Author

Remaining failing tests :

------------ Failures ------------
Script: gcore/pixfun.py
TEST: pixfun_log10_c ... fail
TEST: pixfun_dB_r ... fail (blowup)
TEST: pixfun_dB_c ... fail (blowup)
Script: gdrivers/pixfun.py
TEST: pixfun_log10_r ... fail
line 760: Unable to open "data/pixfun_log10_r.vrt" dataset.
TEST: pixfun_log10_c ... fail
line 788: Unable to open "data/pixfun_log10_c.vrt" dataset.
TEST: pixfun_dB_r ... fail
line 815: Unable to open "data/pixfun_dB_r.vrt" dataset.
TEST: pixfun_dB_c ... fail
line 843: Unable to open "data/pixfun_dB_c.vrt" dataset.
Script: gdrivers/jp2openjpeg.py
TEST: jp2openjpeg_37 ... fail
line 1796: fail


Regarding pixel functions tests, I do not get what is wrong : they pass on my local configuration, and they also pass in trunk (note that I merged trunk into the branch of the pull request).

Regarding jp2openjpeg_37, this relates to the following warning :
TEST: jp2openjpeg_37 ... WARNING[INSPIRE_TG]: "xml " box not at expected index
The validate() function in jp2opnjpeg will fail if there is a single warning in gdal/swig/python/samples/validate_jp2.py validate() function, but looking at the code I do not see how this pull request can modify the order of jp2 boxes that are parsed from XML ...

@rouault
Copy link
Member

rouault commented Jul 12, 2016

Regarding jp2openjpeg_37 failure, it might be a bug in validate_jp2, but this is a sign that something has changed in the generated JP2 file. I suspect GDALJP2Metadata::CreateGDALMultiDomainMetadataXML() in gcore/gdaljp2metadata.cpp should be modified to exclude the DERIVED_SUBDATASET domain (there's already a list of skipped metadata domain).

Which makes me thing that we could have other side effects : could you check the content of a VRT produced by "gdal_translate some.tif out.vrt -of VRT" ? I don't think we would want the DERIVED_SUBDATASET domain to be serialized. Similarly if you do "gdal_translate some.tif out.png -of PNG", does the new domain appears in the out.png.aux.xml ?

Regarding pixfun failures, I suspect an issue with the merge or perhaps not using the latest trunk ?

@jmichel-otb
Copy link
Author

gdal_translate some.tif out.vrt -of VRT : no trace of the DERIVED mdd in output VRT.
gdal_translate some.tif out.png -of PNG : no trace of the DERIVED mdd in out.png.aux.xml

Will check the GDALJP2Metadata::CreateGDALMultiDomainMetadataXML() function. I am pretty sure that I am synced with trunk, but let see the outcome of the current build.

@jmichel-otb
Copy link
Author

------------ Failures ------------
Script: gcore/pixfun.py
TEST: pixfun_log10_c ... fail
TEST: pixfun_dB_r ... fail (blowup)
TEST: pixfun_dB_c ... fail (blowup)
Script: gdrivers/pixfun.py
TEST: pixfun_log10_r ... fail
line 760: Unable to open "data/pixfun_log10_r.vrt" dataset.
TEST: pixfun_log10_c ... fail
line 788: Unable to open "data/pixfun_log10_c.vrt" dataset.
TEST: pixfun_dB_r ... fail
line 815: Unable to open "data/pixfun_dB_r.vrt" dataset.
TEST: pixfun_dB_c ... fail
line 843: Unable to open "data/pixfun_dB_c.vrt" dataset.


Almost there ...

@jmichel-otb
Copy link
Author

Beware of git merge when files have been moved ...

@rouault
Copy link
Member

rouault commented Jul 12, 2016

So it seems we're almost done. Just doc missing ? And afterwards I'd advise doing a "git rebase -i trunk" and squash all commits into a single one to ease porting that back into SVN.

@jmichel-otb
Copy link
Author

Documentation pushed.

@@ -1139,6 +1139,14 @@
</td><td> Yes
</td></tr>

<tr><td> <a href="frmt_derived.html">Derived</a>
</td><td> DERIVED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To move towards the top: the list is sorted by driver short names

@jmichel-otb
Copy link
Author

Rebase & squash has been quite painful, had to make a lot of merge decisions regarding files unrelated to the PR ... Shall I push it ? Is it reversible ?

@rouault
Copy link
Member

rouault commented Jul 13, 2016

Rebase & squash has been quite painful, had to make a lot of merge decisions regarding files unrelated to the PR ... Shall I push it ? Is it reversible ?

Hum that might be due to having merge trunk back to your branch. If you are unsure about the result, create another branch after the merge with git checkout -b another_branch and push that one.

@rouault
Copy link
Member

rouault commented Jul 13, 2016

Was merged per #144. Closing

@rouault rouault closed this Jul 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants