
Defining predecessor node of a process #107

Closed
bgoesswe opened this issue Jan 13, 2020 · 6 comments

Comments

@bgoesswe
Member

This came up when I tried to make the examples work again for GEE:
Currently, when a process is added to the Imagecollection/datacube it automatically sets the last node as input for the new process, which is fairly intuitive. The problem is that there are process graphs that cannot be created this way,
e.g. the NDVI example process graph from the GEE web editor:

  1. load_collection
  2. filter the red band, using the load_collection node as input
  3. filter the NIR band, using the load_collection node as input
  4. apply the NDVI process, using the filter_red and filter_nir nodes as input

I know that in this particular example there is the Imagecollection.band method, but there might be use cases where no predefined function exists to solve this (and I could not get it to run on GEE with this band method anyway).

Just an idea:

datacube = con.imagecollection("COPERNICUS/S2")
datacube = datacube.filter_bbox(...)
datacube_filter = datacube.filter_daterange(...)

datacube_red = datacube_filter.filter_bands("B4")
datacube_nir = datacube_filter.filter_bands("B8A")
datacube_ndvi = datacube_filter.normalized_difference(datacube_red, datacube_nir)

The last call would merge the process graphs of datacube_red and datacube_nir so that there are no duplicate nodes in the process graph of datacube_ndvi. I am not sure yet how to make this transparent for users, so this is open for discussion.
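
Roughly, the merging could look something like this (just a sketch, not actual client code; names are hypothetical):

# Hypothetical sketch: merge two flat process graphs, reusing nodes that
# appear in both (e.g. the shared load_collection node).
def merge_process_graphs(graph_a: dict, graph_b: dict) -> dict:
    merged = dict(graph_a)
    for node_id, node in graph_b.items():
        if node_id in merged and merged[node_id] != node:
            raise ValueError("Conflicting definitions for node {!r}".format(node_id))
        merged[node_id] = node
    return merged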

@bgoesswe bgoesswe changed the title Defining Predecessor node of a process Defining predecessor node of a process Jan 13, 2020
@soxofaan
Member

I'm not sure I understand the issue here, because what you describe is already the approach in the Python client.

when a process is added to the Imagecollection/datacube it automatically sets the last node as input for the new process

If you use the methods on ImageCollectionClient (e.g. filter_bbox, band, ndvi, ...), you create new Python objects each time (process graphs might be partially shared, but that is an implementation detail), and each ImageCollectionClient object has an individual property .node_id pointing to the "last" node of the graph. However, you can still "access" the nodes of previously created objects.

s2 = con.load_collection("S2")
a = s2.filter_bbox(...)
b = a.filter_temporal(...)
c = b.apply_kernel(...)

At this point you can still build different process "paths" from the a and b objects; it's not that c is the only thing that can be built upon.
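
For example (an illustrative sketch, arguments omitted), you could still branch off a and b:

d = a.filter_temporal(...)   # builds on `a`, independent of `c`
e = b.apply_kernel(...)      # another branch, starting from `b`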

@bgoesswe
Member Author

So yes, this is possible, but the issue is that it is not possible to use one node as the input of two nodes in the same process graph. For example:

s2 = con.load_collection("S2")
a = s2.filter_bands(...)
b = s2.filter_bands(...)
c = s2.normalized_difference(a, b)

This should result in a process graph similar to:

{ "load_collection1": {}, 
  "filter_bands1": {... "from_node": "load_collection1" },
  "filter_bands2": {... "from_node": "load_collection1" },
  "normalized_difference1": {... "band1": { "from_node": "filter_bands1"}, "band2": { "from_node": "filter_bands2"}}
   ...
}

At the moment, the EVI in the phenology example is created with a new process graph and a reducer (using the band() method), which currently fails on GEE.
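
Roughly, that band()-based pattern looks like this (just a sketch; band names are illustrative):

red = s2.band("B4")
nir = s2.band("B8")
ndvi = (nir - red) / (nir + red)  # band math builds a reducer callback; combining the two bands is where load_collection ends up duplicated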

Complete NDVI example for GEE, which cannot be reconstructed with the Python client at the moment:

{
  "load_collection_IQCXY5622P": {
    "process_id": "load_collection",
    "arguments": {
      "id": "COPERNICUS/S2",
      "spatial_extent": {
        "west": 11.2792,
        "south": 46.4643,
        "east": 11.4072,
        "north": 46.5182
      },
      "temporal_extent": [
        "2018-06-04",
        "2018-06-23"
      ],
      "bands": [
        "B4",
        "B8"
      ]
    }
  },
  "filter_bands_TMGHB5086J": {
    "process_id": "filter_bands",
    "arguments": {
      "data": {
        "from_node": "load_collection_IQCXY5622P"
      },
      "bands": [
        "B4"
      ]
    }
  },
  "normalized_difference_SGZTT5944D": {
    "process_id": "normalized_difference",
    "arguments": {
      "band1": {
        "from_node": "filter_bands_SNUEL8205U"
      },
      "band2": {
        "from_node": "filter_bands_TMGHB5086J"
      }
    }
  },
  "reduce_VVIRY8830F": {
    "process_id": "reduce",
    "arguments": {
      "data": {
        "from_node": "normalized_difference_SGZTT5944D"
      },
      "reducer": {
        "callback": {
          "max_JPPTE5228K": {
            "process_id": "max",
            "arguments": {
              "data": {
                "from_argument": "data"
              }
            },
            "result": true
          }
        }
      },
      "dimension": "temporal"
    }
  },
  "apply_DUEGA7716B": {
    "process_id": "apply",
    "arguments": {
      "data": {
        "from_node": "reduce_VVIRY8830F"
      },
      "process": {
        "callback": {
          "linear_scale_range_NROJC6099D": {
            "process_id": "linear_scale_range",
            "arguments": {
              "x": {
                "from_argument": "x"
              },
              "inputMin": -1,
              "inputMax": 1,
              "outputMin": 0,
              "outputMax": 255
            },
            "result": true
          }
        }
      }
    }
  },
  "save_result_ELJKH7975C": {
    "process_id": "save_result",
    "arguments": {
      "data": {
        "from_node": "apply_DUEGA7716B"
      },
      "format": "png"
    },
    "result": true
  },
  "filter_bands_SNUEL8205U": {
    "process_id": "filter_bands",
    "arguments": {
      "data": {
        "from_node": "load_collection_IQCXY5622P"
      },
      "bands": [
        "B8"
      ]
    }
  }
}

@soxofaan
Member

OK, I understand better now.
I made a notebook gist showing the current behavior of the client: https://gist.github.com/soxofaan/91e31e72a4f1446fbf952e8617158867
The load_collection node is indeed needlessly duplicated when combining multiple bands.

@soxofaan
Member

This relates somewhat to #102, which is also a result of the current way the process graph is built and managed in the client. @jdries and I have already discussed better approaches, and this merging issue should certainly be part of that exercise.

@soxofaan
Member

Closing this ticket (for now).
The main ticket to address the underlying issue is #117.

soxofaan added a commit to soxofaan/openeo-python-client that referenced this issue Mar 13, 2020
soxofaan added a commit to soxofaan/openeo-python-client that referenced this issue Mar 16, 2020
…Open-EO#117 Open-EO#125)

- break down `accept` in ProcessGraphVisitor in smaller part to allow
  finegrained overloading
- avoid in-place editing in GraphFlattener
- fix issue Open-EO#107: unnecessary duplicated nodes
- fix handling of aggregate_spatial and apply along the way
soxofaan added a commit to soxofaan/openeo-python-client that referenced this issue Mar 17, 2020
…Open-EO#117 Open-EO#125)

- break down `accept` in ProcessGraphVisitor in smaller part to allow
  finegrained overloading
- avoid in-place editing in GraphFlattener
- fix issue Open-EO#107: unnecessary duplicated nodes
- fix handling of aggregate_spatial and apply along the way
@soxofaan
Member

#126 was merged; the issue is fixed in the new 1.0-API-style DataCube.

E.g. see this unit test:

def test_merge_issue107(con100):
    """https://github.com/Open-EO/openeo-python-client/issues/107"""
    s2 = con100.load_collection("S2")
    a = s2.filter_bands(['B02'])
    b = s2.filter_bands(['B04'])
    c = a.merge(b)
    flat = c.graph
    # There should be only one `load_collection` node (but two `filter_bands` ones)
    processes = sorted(n["process_id"] for n in flat.values())
    assert processes == ["filter_bands", "filter_bands", "load_collection", "merge_cubes"]
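
The original NDVI use case with band math should benefit in the same way; a rough sketch (band names illustrative, assuming the 1.0-style DataCube.band() and overloaded arithmetic):

s2 = con100.load_collection("S2")
red = s2.band("B04")
nir = s2.band("B08")
ndvi = (nir - red) / (nir + red)  # with the reworked graph building, both operands should point back to a single load_collection node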
