GNIP 86 - metadata parsing and storing #7263

Closed
3 of 5 tasks
etj opened this issue Apr 8, 2021 · 2 comments
Labels
gnip A GeoNodeImprovementProcess Issue

@etj

etj commented Apr 8, 2021

GNIP 86 - metadata parsing and storing

Overview

Metadata parsing is not easily customizable.

The proposed changes make it possible to:

  • define one or more custom parsers, in order to handle metadata documents that follow specific profiles not supported by owslib;
  • add a hook to handle custom DB objects on layer creation.

Other tasks included:

  • fix keyword parsing

Proposed By

@etj (Emanuele Tajariol)

Assigned to Release

This proposal is for GeoNode .

State

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Motivation

I want to be able to provide one or more metadata parsers that can override the mappings implemented by the default parser.
Examples:

  • extend the owslib XML mapping: some ISO19115/19139 profiles store thesaurus keywords in gmx:Anchor rather than in gco:CharacterString, and the current owslib parser cannot parse them. We could of course fix owslib, but that would be a much longer process, and there is no guarantee that a particular profile is of general interest, so we need to fine-tune the parsing in GeoNode itself.
  • fix the XML mapping: ISO19115 constraints should be grouped at the gmd:MD_LegalConstraints level, while owslib loops over gmd:MD_RestrictionCodes and gmd:otherConstraints separately.
  • extend the GeoNode model: if a GeoNode customization needs to store additional metadata, it will need a custom parser.

In this last case (model extension), we may also want to save the extra parsed data, so we need a way to call some logic that knows how to deal with it.
If the customization has extended the base model, the parser will put the added values into the vals dict, and the default logic will save the new fields along with the "official" ones.
If the customization has added a 1:1 relationship to another table, all the parsed values will be passed to the custom logic.
This custom logic for other DB objects must be called explicitly: we cannot rely on signals, because a signal handler has no way to receive the parsed data.

Proposal

Metadata parsing

A new setting, METADATA_PARSERS, will be added.
It is a dict whose keys are MD_Metadata, metadata and Record: these values are taken from metadata.py and are the root element names of the related metadata formats. Since the proposed implementation is dynamic, you will be able to support a brand-new metadata format just by implementing a parser function and declaring it in this setting.
The value for each key is a list containing:

  • optionally, and only as the first element, the fixed string "__DEFAULT__", which stands for the default parser (the existing one, if one is already defined for that metadata type);
  • references to parser functions.
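A minimal sketch of how one METADATA_PARSERS entry could be turned into an ordered list of callables. It assumes that parser references may be given either as callables or as dotted import paths, that `defaults` maps a root element name to the existing built-in parser, and that "__DEFAULT__" is silently skipped for formats with no built-in parser. `resolve_parsers` and `_load` are hypothetical helpers, not GeoNode's actual API.

```python
import importlib
from typing import Callable, Dict, List, Union

DEFAULT_MARKER = "__DEFAULT__"

# A parser reference: either the function itself or a dotted path to it (assumption)
ParserRef = Union[str, Callable]


def _load(ref: ParserRef) -> Callable:
    """Return ref if it is already callable, otherwise import it from 'pkg.module.func'."""
    if callable(ref):
        return ref
    module_name, _, attr = ref.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def resolve_parsers(config: Dict[str, List[ParserRef]], root_el: str,
                    defaults: Dict[str, Callable]) -> List[Callable]:
    """Build the ordered parser chain for the metadata root element root_el."""
    parsers: List[Callable] = []
    for i, ref in enumerate(config.get(root_el, [])):
        if ref == DEFAULT_MARKER:
            # "__DEFAULT__" is only allowed as the optional first element
            if i != 0:
                raise ValueError("__DEFAULT__ is only allowed as the first element")
            # Hypothetical choice: skip it when no built-in parser exists for this format
            if root_el in defaults:
                parsers.append(defaults[root_el])
            continue
        parsers.append(_load(ref))
    return parsers
```

With a setting such as `{"MD_Metadata": ["__DEFAULT__", "myapp.parsers.anchor_keywords"]}` (the second entry being a hypothetical custom parser), the built-in ISO parser would run first and the custom one would then refine its output.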

Parser functions must return a 5-element tuple:

  • uuid, as it is now
  • vals, as it is now, a dict holding ResourceBase fields
  • regions, as it is now
  • keywords, as it is now
  • custom, a dict where each entry has
    • key: an id identifying the parser itself
    • value: a dict of the values parsed by that parser

A parser function takes as parameters:

  • exml, the input document as a parsed XML object
  • uuid, the uuid produced in the previous step
  • vals, the vals produced in the previous step
  • regions, the regions produced in the previous step
  • keywords, the keywords produced in the previous step
  • custom, the custom dict produced in the previous step

A parser function may alter (refine or improve) the content of any of these params, and then returns them.
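As an illustration of the gmx:Anchor case from the Motivation, here is a sketch of a custom parser following the signature above. It assumes exml is an xml.etree.ElementTree element (the real object may be an lxml element), that keywords is a flat list of strings, and it uses the hypothetical id "anchor_keywords" as its key in custom.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmx": "http://www.isotc211.org/2005/gmx",
}


def anchor_keywords_parser(exml, uuid, vals, regions, keywords, custom):
    """Hypothetical parser: add keywords encoded as gmx:Anchor inside gmd:keyword."""
    anchors = []
    for kw in exml.iter(f"{{{NS['gmd']}}}keyword"):
        anchor = kw.find("gmx:Anchor", NS)
        if anchor is not None and anchor.text:
            anchors.append(anchor.text.strip())
    # Refine the keywords found by earlier parsers instead of replacing them
    merged = list(keywords)
    for k in anchors:
        if k not in merged:
            merged.append(k)
    # Record what this parser found under its own id
    custom = {**custom, "anchor_keywords": {"keywords": anchors}}
    return uuid, vals, regions, merged, custom


doc = ET.fromstring(
    '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" '
    'xmlns:gmx="http://www.isotc211.org/2005/gmx" '
    'xmlns:gco="http://www.isotc211.org/2005/gco">'
    "<gmd:keyword><gmx:Anchor>Hydrography</gmx:Anchor></gmd:keyword>"
    "<gmd:keyword><gco:CharacterString>rivers</gco:CharacterString></gmd:keyword>"
    "</gmd:MD_Metadata>"
)
print(anchor_keywords_parser(doc, "abc", {}, [], ["rivers"], {})[3])  # ['rivers', 'Hydrography']
```

The parser leaves the gco:CharacterString keywords to the default parser and only adds what owslib would miss, which is the kind of incremental refinement the chained design is meant to allow.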

In pseudocode, the configured functions would be called like this (error checking and default assignments omitted):

parsers = config['METADATA_PARSERS'][root_el]
uuid = None
vals = {}
regions = []
keywords = []
custom = {}
# each parser receives the output of the previous one and may refine it
for f in parsers:
    uuid, vals, regions, keywords, custom = f(exml, uuid, vals, regions, keywords, custom)
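A self-contained, runnable version of the loop above, assuming the parser list has already been resolved to callables. The length check on each result is an addition of this sketch, not something the proposal specifies; `run_parsers`, `base` and `refine` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# (uuid, vals, regions, keywords, custom), as described in the proposal
ParseResult = Tuple[Optional[str], Dict[str, Any], List[Any], List[Any], Dict[str, Dict[str, Any]]]


def run_parsers(exml: Any, parsers: List[Callable[..., ParseResult]]) -> ParseResult:
    uuid: Optional[str] = None
    vals: Dict[str, Any] = {}
    regions: List[Any] = []
    keywords: List[Any] = []
    custom: Dict[str, Dict[str, Any]] = {}
    for f in parsers:
        # Each parser receives the output of the previous one and may refine it
        result = f(exml, uuid, vals, regions, keywords, custom)
        if len(result) != 5:
            raise ValueError(f"{getattr(f, '__name__', f)!r} must return a 5-element tuple")
        uuid, vals, regions, keywords, custom = result
    return uuid, vals, regions, keywords, custom


def base(exml, uuid, vals, regions, keywords, custom):
    return "abc-123", {**vals, "title": "Rivers"}, regions, keywords + ["water"], custom


def refine(exml, uuid, vals, regions, keywords, custom):
    return (uuid, {**vals, "title": vals["title"].upper()}, regions + ["Italy"],
            keywords, {**custom, "refine": {"done": True}})


print(run_parsers(None, [base, refine]))
# ('abc-123', {'title': 'RIVERS'}, ['Italy'], ['water'], {'refine': {'done': True}})
```

Because `refine` reads `vals["title"]` set by `base`, the order of the list in the setting matters: later parsers see, and can override, whatever earlier ones produced.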

The current parsing is invoked, for instance, in https://github.com/GeoNode/geonode/blob/3.1/geonode/upload/upload.py#L845:

layer_uuid, vals, regions, keywords = set_metadata( ...xml file ...)

Storing

At the end of final_step(), the layer and all the parsed information will be passed to every function listed in the METADATA_STORERS setting.

storer_list = config['METADATA_STORERS']
for s in storer_list:
    s(layer, uuid, vals, regions, keywords, custom)
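A minimal runnable sketch of the storer loop, assuming settings are passed as a plain dict (a real implementation would presumably read them from django.conf.settings) and that a missing METADATA_STORERS setting simply means there is nothing to call. `run_storers` and `audit_storer` are hypothetical.

```python
from typing import Any, Callable, Dict, List


def run_storers(settings: Dict[str, Any], layer: Any, uuid: str, vals: Dict[str, Any],
                regions: List[Any], keywords: List[Any],
                custom: Dict[str, Dict[str, Any]]) -> None:
    """Call every configured storer, in order, once the layer has been saved."""
    # Assumption: no setting means no custom storing logic
    storers: List[Callable[..., None]] = settings.get("METADATA_STORERS", [])
    for s in storers:
        s(layer, uuid, vals, regions, keywords, custom)


calls = []


def audit_storer(layer, uuid, vals, regions, keywords, custom):
    # Records which layer was stored and which parsers contributed custom data
    calls.append((layer, uuid, sorted(custom)))


run_storers({"METADATA_STORERS": [audit_storer]}, "layer-1", "abc", {}, [], [], {"processes": {}})
print(calls)  # [('layer-1', 'abc', ['processes'])]
```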

For example, a parser could extract all the "process steps" from the metadata and store them in custom['processes'].
A storer function would then use the Layer and custom['processes'] to create new DB records that reference the layer through a foreign key, with other text fields holding the process step details.
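A sketch of this process-step example, assuming exml is an xml.etree.ElementTree element and that each step's text sits in gmd:LI_ProcessStep/gmd:description/gco:CharacterString. `ProcessStep` is a hypothetical stand-in for a Django model with a foreign key to Layer, a plain list stands in for its DB table, and a SimpleNamespace with an `id` stands in for the Layer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from types import SimpleNamespace
from typing import List

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


@dataclass
class ProcessStep:
    """Hypothetical stand-in for a model with a foreign key to Layer."""
    layer_id: int
    description: str


PROCESS_STEP_TABLE: List[ProcessStep] = []  # stands in for the DB table


def process_steps_parser(exml, uuid, vals, regions, keywords, custom):
    """Collect the description of every gmd:LI_ProcessStep into custom['processes']."""
    steps = []
    for step in exml.iter(f"{{{GMD}}}LI_ProcessStep"):
        desc = step.find(f"{{{GMD}}}description/{{{GCO}}}CharacterString")
        if desc is not None and desc.text:
            steps.append(desc.text.strip())
    return uuid, vals, regions, keywords, {**custom, "processes": {"steps": steps}}


def process_steps_storer(layer, uuid, vals, regions, keywords, custom):
    """Create one record per parsed step, linked to the layer by its id."""
    for text in custom.get("processes", {}).get("steps", []):
        PROCESS_STEP_TABLE.append(ProcessStep(layer_id=layer.id, description=text))


doc = ET.fromstring(
    f'<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}">'
    "<gmd:LI_ProcessStep><gmd:description>"
    "<gco:CharacterString>Digitised from paper maps</gco:CharacterString>"
    "</gmd:description></gmd:LI_ProcessStep>"
    "</gmd:MD_Metadata>"
)
parsed = process_steps_parser(doc, "abc", {}, [], [], {})
process_steps_storer(SimpleNamespace(id=7), *parsed)
print(PROCESS_STEP_TABLE)  # [ProcessStep(layer_id=7, description='Digitised from paper maps')]
```

The parser only fills custom['processes'] and the storer only reads it, so the parsing and the persistence of the extra data stay decoupled, as the proposal describes.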

Alternatively, to avoid passing so many params to the storer functions, they could take only the layer and custom params, since all the other values have already been stored in the Layer instance.

Sub-issues

Other topics that came up while considering this GNIP have been moved to standalone issues:

Backwards Compatibility

The new logic will not be implemented for the synchronous uploader, i.e. the one configured with

UPLOADER = {
    'BACKEND': os.getenv('DEFAULT_BACKEND_UPLOADER', 'geonode.rest'),
    # ... other options omitted
}

since it is going to be deprecated.

Future evolution

Explain what the possible future evolutions are.

Feedback

Update this section with relevant feedback, if any.

Voting

Project Steering Committee:

  • Alessio Fabiani:
  • Francesco Bartoli:
  • Giovanni Allegri:
  • Simone Dalmasso:
  • Toni Schoenbuchner:
  • Florian Hoedt:


mattiagiupponi added a commit to mattiagiupponi/geonode that referenced this issue Apr 9, 2021
@gannebamm

@stefmec Please take a look at this proposal and check if it would solve some of our current issues with metadata ingestion.

@afabiani

afabiani commented Apr 9, 2021

+1

mattiagiupponi added a commit to mattiagiupponi/geonode that referenced this issue Apr 12, 2021
@t-book t-book added the gnip A GeoNodeImprovementProcess Issue label Apr 13, 2021
mattiagiupponi added a commit to mattiagiupponi/geonode that referenced this issue Apr 13, 2021
mattiagiupponi added a commit to mattiagiupponi/geonode that referenced this issue Apr 14, 2021
mattiagiupponi added a commit to mattiagiupponi/geonode that referenced this issue Apr 15, 2021
@afabiani afabiani added this to the 3.2 milestone Apr 15, 2021
afabiani pushed a commit that referenced this issue Apr 16, 2021
* test

* rollback

* [Fixes #7263] Add test for current set_metadata

* [Fixes #7263] Add test for current set_metadata

* [Fixes #7263] Add dummy xml for tests

* [Fixes #7263] Add multiple metadata_parser, fix all currencies of set_metadata to accept 5 output value instead of 5

* [Fixes #7263] Add smoke test description, METADATA_PARSERS from dict to list, replace all occurences of set_metadata with parse_metadata

* [Fixes #7263] Add thesaurus to test xml, fix flake8 issue and rename underscores with custom, first scheleton of keyword handler

* [Fixes #7263] Keyword handler moted to object

* [Fixes #7263] KeywordHandler assign keyword to object

* [Fixes #7279] Keyword Handler for metadata upload, remove keyword from geoserver step and add tests

* [Fixes #7279] Add test for set_metadata_function cd /opt/geonode ; /usr/bin/env /home/mattia/.virtualenvs/local-geonode/bin/python /home/mattia/.vscode-server/extensions/ms-python.python-2021.3.680753044/pythonFiles/lib/python/debugpy/launcher 41039 -- /opt/geonode/manage.py test geonode.layers.tests.TestSetMetadata -v 2 --settings=geonode.local_settings --keepdb

* [Fixes #7279] Test coverage for set_metadata with ISO xml dummy

* [Fixes #7279] Add keyword handler in upload

* [Fixes #7279] Flake8 indentation

* [Fixes #7279] Fix KeywordHandler descripiton

* [Fixes #7279] Add tests for convert_keyword

* [Fixes #7263]cleanup wrong code

* Merge with ISSUE_7279

* [Fixes #7288] Layer information handling moved to final step instead of geoserver_finalize_upload

* [Fixes #7288] rollback metadata file

* [Fixes #7288] flake8 indentation

* [Fixes #7263] Add custom field in parsers

* [Fixes #7263] Add metadata_storer feature

* [Fixes #7263] Fix flake8 indentation

* [Fixes #7263] Rename storer

* [Fixes #7288] Remove xml reading

* [Fixes #7288] Fix flake8 warnings

* [Fixes #7263] Fix parser evaluator

* - Double check "gs_resource" is not None

* - Avoid unneeded attribute value assignments

* - Updaye GeoServer resource anyaway

* - Add more checks on Keywors "dict" and upload error messages

* Use KeywordHandler

* [Fixes #7263] Removed duplicated def

* [Fixes #7263] Fix flake8 and tests alignement

* [Fixes #7263] Fix flake8 and tests alignement

* [Fixes #7263] Fix flake8 and tests alignement

* [Fixes #7263] Fix set_metadata return variable number

* - Metadata file exception error handling

Co-authored-by: Alessio Fabiani <alessio.fabiani@geo-solutions.it>
Co-authored-by: afabiani <alessio.fabiani@gmail.com>
(cherry picked from commit fc9d81a)
afabiani pushed a commit that referenced this issue Apr 16, 2021

Co-authored-by: mattiagiupponi <51856725+mattiagiupponi@users.noreply.github.com>