Incorrect URL endpoints on birds #14

Open
tlvu opened this issue Jan 29, 2020 · 8 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@tlvu
Collaborator

tlvu commented Jan 29, 2020

Migrated from old PAVICS https://github.com/Ouranosinc/PAVICS/issues/98, lots of discussions, too much to summarize. Just action items.

Most (if not all?) GetCapabilities responses advertise URL endpoints that are invalid on any bird-server (pluvier, colibri, etc.) for multiple bird-services (flyingpigeon, catalog, etc.). It seems the twitcher-url wasn't properly set up when they were deployed.

Ex:

GET
https://pluvier.crim.ca:/twitcher/ows/proxy/flyingpigeon/wps?service=WPS&version=1.0.0&request=GetCapabilities

returns:

[...]
<ows:HTTP>
     <ows:Get xlink:href="https://pluvier.crim.ca/ows/proxy/flyingpigeon"/>
     <ows:Post xlink:href="https://pluvier.crim.ca/ows/proxy/flyingpigeon"/>
</ows:HTTP>
[...]

but the real URL is: (see twitcher)

https://pluvier.crim.ca/twitcher/ows/proxy/flyingpigeon

Action item:

Yes. MagpieAdapter would need to add purl here:
https://github.com/Ouranosinc/Magpie/blob/master/magpie/adapter/magpieservice.py#L76-L78
Then function replace_caps_url in Twitcher should handle the rest further down the processing chain.
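
To illustrate the kind of rewrite involved, here is a minimal sketch. It is not Twitcher's actual replace_caps_url implementation; it assumes the fix amounts to swapping the URL the service advertises for the public proxy URL. The function name rewrite_caps_urls and its parameters are hypothetical.

```python
def rewrite_caps_urls(caps_xml: str, advertised_url: str, public_url: str) -> str:
    """Replace every occurrence of the advertised base URL with the public one.

    Hypothetical helper for illustration only; not part of Twitcher or Magpie.
    """
    # Drop trailing slashes so both URLs are compared in the same form.
    advertised = advertised_url.rstrip("/")
    public = public_url.rstrip("/")
    return caps_xml.replace(advertised, public)


# Fragment taken from the GetCapabilities response quoted above.
caps = '<ows:Get xlink:href="https://pluvier.crim.ca/ows/proxy/flyingpigeon"/>'

fixed = rewrite_caps_urls(
    caps,
    advertised_url="https://pluvier.crim.ca/ows/proxy/flyingpigeon",
    public_url="https://pluvier.crim.ca/twitcher/ows/proxy/flyingpigeon",
)
print(fixed)
```

A plain string replacement is enough for this case because the advertised URL is not a substring of the public one, so applying the rewrite twice leaves the result unchanged.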

@fmigneault fmigneault added the bug Something isn't working label Jan 22, 2021
@fmigneault fmigneault added the good first issue Good for newcomers label Feb 8, 2023
@tlvu
Collaborator Author

tlvu commented Sep 20, 2023

https://pywps.readthedocs.io/en/latest/configuration.html

We can try

[server]
url = the URL of the WPS service endpoint

@tlvu
Collaborator Author

tlvu commented Sep 20, 2023

@fmigneault Oh you already started working on this in commit fc00111

Will you eventually PR it?

Something that still eludes me: how does the WPS know about the /ows/proxy/ part, yet only miss the /twitcher prefix of /twitcher/ows/proxy/?

@fmigneault
Collaborator

@tlvu
I was stuck with the older birds that were using buildout, but since they shouldn't be used anymore, I think that branch could be used after an update of paths for deprecated flyingpigeon.

@fmigneault
Collaborator

Something that still eludes me: how does the WPS know about the /ows/proxy/ part, yet only miss the /twitcher prefix of /twitcher/ows/proxy/?

My guess is that it takes the incoming request URL.
Once /twitcher has been resolved to the Twitcher service, Twitcher itself uses the path /ows/proxy/....
The proxied request probably only sees the remaining part 🤷‍♂️
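
The guess above can be modelled with a small sketch. It only simulates a front proxy that routes on a path prefix and drops it before forwarding; whether the deployed proxy actually behaves this way is an assumption, and the function name upstream_view is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def upstream_view(public_url: str, proxy_prefix: str) -> str:
    """Return the URL as a backend would see it once the front proxy
    has routed on `proxy_prefix` and removed it from the path."""
    parts = urlsplit(public_url)
    path = parts.path
    if path.startswith(proxy_prefix):
        # Keep at least "/" so the result is still a valid path.
        path = path[len(proxy_prefix):] or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


seen = upstream_view(
    "https://pluvier.crim.ca/twitcher/ows/proxy/flyingpigeon",
    proxy_prefix="/twitcher",
)
print(seen)
```

If the service builds its advertised endpoints from the request it receives, it would reproduce exactly the URL missing /twitcher that appears in the GetCapabilities response.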

@fmigneault
Collaborator

@tlvu

I found that the following patches "work", in the sense that the url = ... setting is applied.

diff --git a/birdhouse/config/raven/docker-compose-extra.yml b/birdhouse/config/raven/docker-compose-extra.yml
index 920f1b6d..0e35da07 100644
--- a/birdhouse/config/raven/docker-compose-extra.yml
+++ b/birdhouse/config/raven/docker-compose-extra.yml
@@ -15,7 +15,10 @@ services:
       PYWPS_CFG: /wps.cfg
       GEO_URL: "${RAVEN_GEO_URL}"
     volumes:
+      # override all paths such that whichever one the code resolves leads to our config!
       - ./config/raven/wps.cfg:/wps.cfg
+      - ./config/raven/wps.cfg:/opt/wps/.custom.wps
+      - ./config/raven/wps.cfg:/opt/wps/raven/default.cfg
       - /tmp
     restart: always
     logging: *default-logging
diff --git a/birdhouse/config/raven/wps.cfg.template b/birdhouse/config/raven/wps.cfg.template
index 3ee21076..9249fd7f 100644
--- a/birdhouse/config/raven/wps.cfg.template
+++ b/birdhouse/config/raven/wps.cfg.template
@@ -1,4 +1,5 @@
 [server]
+url = https://${PAVICS_FQDN_PUBLIC}${TWITCHER_PROTECTED_PATH}/raven
 outputurl = https://${PAVICS_FQDN_PUBLIC}/wpsoutputs/raven
 outputpath = /data/wpsoutputs/raven
 
@@ -13,3 +14,27 @@ level = INFO
 database=postgresql://${POSTGRES_PAVICS_USERNAME}:${POSTGRES_PAVICS_PASSWORD}@postgres/raven
 
 ${EXTRA_PYWPS_CONFIG}
+
+[metadata:main]
+identification_title = Raven
+identification_abstract = Raven offers processes related to hydrological model building, and in particular, the handling of geospatial data and processing.
+identification_keywords = PyWPS, WPS, OGC, processing, birdhouse, raven, demo
+identification_keywords_type = theme
+# identification_fees =
+identification_accessconstraints = https://github.com/bird-house/raven/blob/master/LICENSE.txt
+provider_name = Ouranos
+provider_url = https://github.com/Ouranosinc/raven
+contact_name = Ouranos
+contact_position = developer
+# contact_address = NONE
+# contact_city =
+# contact_stateorprovince =
+# contact_postalcode =
+# contact_country =
+# contact_phone =
+# contact_fax =
+# contact_email =
+contact_url = https://github.com/Ouranosinc/raven/issues
+# contact_hours =
+contact_instructions = Submit a new issue
+contact_role = originator

However, this part of the code
https://github.com/Ouranosinc/raven/blob/main/raven/cli.py#L80-L94
and
https://github.com/Ouranosinc/raven/blob/main/raven/cli.py#L34-L47
basically force the PyWPS [server].url value to be the same as the address of the "local" werkzeug server running the PyWPS app, which fails for anything other than the default http://localhost:9099/wps because of how the Docker container is configured.

I have tried overriding the https://github.com/Ouranosinc/raven/blob/main/raven/cli.py and https://github.com/Ouranosinc/raven/blob/main/raven/templates/pywps.cfg files to enforce some value combinations, but even then, somewhere in the startup procedure, the werkzeug app process and the pywps server end up using the same URL.

There would need to be two separate options, but I'm not sure how to set them up properly among all the config shuffling going on in there (and there is a lot, with all the .cfg overrides/dynamic rewrites! 🤯). @Zeitsperre would probably have a better chance looking at it.

I know however it should be possible because Weaver only loads PyWPS configs here:
https://github.com/crim-ca/weaver/blob/master/weaver/wps/utils.py#L397
(important part about [server].url here: https://github.com/crim-ca/weaver/blob/master/weaver/wps/utils.py#L501)
which are kept separate from the WSGI/Werkzeug app configurations used to start the PyWPS via:
https://github.com/crim-ca/weaver/blob/master/weaver/wps/app.py#L13-L20
while the app runs with:
pserve (gunicorn) bind 0.0.0.0:4001
https://github.com/crim-ca/weaver/blob/master/config/weaver.ini.example#L192-L196

The result is correctly obtained here:
https://hirondelle.crim.ca/weaver/wps?service=WPS&request=GetCapabilities (requires admin)

[...]
<ows:Operation name="GetCapabilities">
  <ows:DCP>
    <ows:HTTP>
      <ows:Get xlink:href="https://hirondelle.crim.ca/weaver/wps" />
    </ows:HTTP>
  </ows:DCP>
</ows:Operation>
[...]
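
A quick way to check what a service advertises is to pull the xlink:href values out of the capabilities document and compare them with the expected public prefix. The sketch below uses only the standard library and an inline sample instead of a live request. It assumes the OWS 1.1 namespace used by WPS 1.0.0; the function name advertised_endpoints is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for a WPS 1.0.0 capabilities document.
OWS_NS = "http://www.opengis.net/ows/1.1"
XLINK_NS = "http://www.w3.org/1999/xlink"


def advertised_endpoints(caps_xml: str) -> list[str]:
    """Collect the xlink:href of every ows:Get and ows:Post element."""
    root = ET.fromstring(caps_xml)
    hrefs = []
    for tag in ("Get", "Post"):
        for element in root.iter(f"{{{OWS_NS}}}{tag}"):
            href = element.get(f"{{{XLINK_NS}}}href")
            if href:
                hrefs.append(href)
    return hrefs


sample = """<ows:Operation xmlns:ows="http://www.opengis.net/ows/1.1"
    xmlns:xlink="http://www.w3.org/1999/xlink" name="GetCapabilities">
  <ows:DCP>
    <ows:HTTP>
      <ows:Get xlink:href="https://hirondelle.crim.ca/weaver/wps" />
    </ows:HTTP>
  </ows:DCP>
</ows:Operation>"""

endpoints = advertised_endpoints(sample)
expected_prefix = "https://hirondelle.crim.ca/weaver/"
# Any endpoint outside the expected public prefix points to a misconfigured URL.
mismatched = [url for url in endpoints if not url.startswith(expected_prefix)]
print(endpoints, mismatched)
```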

@Zeitsperre
Contributor

@fmigneault

I can't say for sure I know where all the server URL overrides occur. For Raven/RavenPy in particular, I know that there are a handful of hard-coded addresses to our GeoServer instance within the code itself (not good practice; I was learning/I'm sorry). There are probably some instances within the Makefile and within the notebooks themselves that can be blamed too.

What I can do is try to convert some of those configurations so that they can be set via environment variables. There's no guarantee that this would fix it, but at least it would remove a few potential sources of the problem. Let me give it a look.

@tlvu
Collaborator Author

tlvu commented Sep 21, 2023

https://github.com/Ouranosinc/raven/blob/ecdee95efdaf5a5cc87bff9178c0ad6a36a0ee18/raven/cli.py#L34-L36

def get_host():
    url = configuration.get_config_value("server", "url")
    url = url or "http://localhost:9099/wps"

If the config value [server].url is set, it should take precedence.

https://github.com/Ouranosinc/raven/blob/ecdee95efdaf5a5cc87bff9178c0ad6a36a0ee18/raven/cli.py#L76C1-L81C34

def _run(application, bind_host=None, daemon=False):
    from werkzeug.serving import run_simple

    # call this *after* app is initialized ... needs pywps config.
    host, port = get_host()
    bind_host = bind_host or host

bind_host is initially None, so technically the host from get_host(), which comes from [server].url, should take precedence.
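
The quoted excerpts stop before showing how get_host() turns the URL into a host and port. As a sketch of why this coupling matters, assume the host and port are parsed straight out of the URL. The helper host_port_from_url below is hypothetical and is not Raven's code.

```python
from urllib.parse import urlsplit

# Fallback ports when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_port_from_url(url: str) -> tuple[str, int]:
    """Derive a (host, port) pair from a URL, as a bind address would be
    derived if it were taken directly from the advertised service URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname or "", port


local = host_port_from_url("http://localhost:9099/wps")
# Placeholder public URL; the real one comes from the deployment's config.
public = host_port_from_url("https://pavics.example.org/twitcher/ows/proxy/raven")
print(local, public)
```

Under this assumption the default URL yields a local bind address, while a public URL yields the public hostname and port, which a process inside the container most likely cannot bind to.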

So maybe the problem is somewhere else?

Does the same workaround also fail for Finch? If it works for Finch but not for Raven, then something is fishy with Raven. If it works for neither, then maybe we are looking at the wrong piece of code?

@fmigneault
Collaborator

fmigneault commented Sep 21, 2023

I don't think it's a matter of precedence, but that it should be distinct values.
The Werkzeug app should run on "localhost" with some bind host/port, while server.url should only be a value used for output locations and responses.
I've tried overriding manually both these locations with hardcoded values, but still was unable to have both working simultaneously.

(Note: bind_host is passed via a hardcoded command line, as seen when inspecting the Docker container.)
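
The separation described here could look like the sketch below, which keeps the advertised URL and the bind address as independent settings. The option names bind_host and bind_port are hypothetical; nothing in this thread confirms that PyWPS or Raven reads them, so treat this as an illustration of the idea rather than a working patch.

```python
from configparser import ConfigParser
from dataclasses import dataclass


@dataclass
class ServerSettings:
    public_url: str  # advertised in GetCapabilities and output links
    bind_host: str   # address the local WSGI server listens on
    bind_port: int


def load_settings(cfg_text: str) -> ServerSettings:
    """Read the advertised URL and the bind address as separate options."""
    parser = ConfigParser()
    parser.read_string(cfg_text)
    server = parser["server"]
    return ServerSettings(
        public_url=server.get("url", fallback="http://localhost:9099/wps"),
        # Hypothetical options: not known to exist in PyWPS or Raven.
        bind_host=server.get("bind_host", fallback="0.0.0.0"),
        bind_port=server.getint("bind_port", fallback=9099),
    )


# Placeholder values; the public URL would come from the deployment's config.
cfg = """
[server]
url = https://pavics.example.org/twitcher/ows/proxy/raven
bind_host = 0.0.0.0
bind_port = 9099
"""

settings = load_settings(cfg)
print(settings)
```

With settings split this way, the local server would listen on bind_host and bind_port, while url would only be used when building responses.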
