Request that OS-Climate ODH host a datasette instance #418

Closed
MichaelClifford opened this issue Oct 11, 2021 · 12 comments

Comments

@MichaelClifford
Member

Please see os-climate/os_c_data_commons#79 for full details.

In short, we would like to host an instance of Datasette on the os-climate cluster. What would be the best approach to make this happen?

@MichaelTiemannOSC

MichaelTiemannOSC commented Oct 11, 2021

This page of the Datasette documentation gives instructions for creating a Docker container: https://docs.datasette.io/en/stable/installation.html#installation

I'm sure that if there were a good OpenShift recipe, @simonw could add that to the documentation.

Note that Datasette's concept is to be a per-database HTTPS server, meaning that if I want to browse a number of databases, I want to open a number of Datasette instances.

@4n4nd

4n4nd commented Oct 11, 2021

cc @erikerlandson @HumairAK

@MichaelTiemannOSC

MichaelTiemannOSC commented Oct 11, 2021

Actually, datasette is installed via pip install. And it's not thrilled about trying to connect jupyter-server 1.6.4 with anyio 3.3.3. There's a bit of a dance I can do manually to downgrade enough things to be compatible. But where should I actually file this as an issue?

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-server 1.6.4 requires anyio<3,>=2.0.2, but you have anyio 3.3.3 which is incompatible.

And what are the chances that this is already fixed but the dependencies have not yet been updated: jupyter-server/jupyter_server#490
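To make the conflict in the pip message concrete, here is a minimal sketch that checks a version against the range `anyio<3,>=2.0.2`. The helper names `parse_version` and `satisfies` are made up for illustration, and the comparison only handles plain dotted numbers; real resolvers follow the fuller PEP 440 rules.

```python
def parse_version(text):
    """Turn a dotted version such as '3.3.3' into a comparable tuple (3, 3, 3)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, lower_inclusive, upper_exclusive):
    """Return True when lower_inclusive <= installed < upper_exclusive."""
    version = parse_version(installed)
    return parse_version(lower_inclusive) <= version < parse_version(upper_exclusive)


# jupyter-server 1.6.4 requires anyio<3,>=2.0.2 (taken from the pip error).
anyio_ok = satisfies("3.3.3", lower_inclusive="2.0.2", upper_exclusive="3")
print(anyio_ok)  # False: 3.3.3 is outside the allowed range
```

Any anyio release from 2.0.2 up to, but not including, 3 would pass this check, which is why downgrading anyio makes the warning go away.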

@MichaelTiemannOSC

I attempted to install a docker image:

https://hub.docker.com/r/datasetteproject/datasette/

It seemed to go well, but the image resulted in a non-responsive application:

https://console-openshift-console.apps.odh-cl1.apps.os-climate.org/k8s/ns/wri-pp/services/datasette/

@erikerlandson

erikerlandson commented Oct 12, 2021

https://console-openshift-console.apps.odh-cl1.apps.os-climate.org/k8s/ns/wri-pp/pods/wri-pp-demo-git-54794f4cc4-s6wq2/logs

@MichaelTiemannOSC what was your oc command (and/or yaml)? Did you specify the command to run, either via something like oc run ... --command .... or in a yaml file?

Also, what details do we know about this image https://hub.docker.com/r/datasetteproject/datasette/ ? Is it designed to run using an anonymous random UID?

@MichaelTiemannOSC

MichaelTiemannOSC commented Oct 12, 2021

I'm lacking key skills in the container area. I see this in the log file:

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)

However, I don't know how to upload a database file to the OpenShift environment, and I don't know why the route I get (which is just an http: route, not https:) doesn't reach the application and bring up a web server that says hello. It's all well and good that datasette is running on localhost, but the URL I'm trying to reach is http://datasette-wri-pp.apps.odh-cl1.apps.os-climate.org/ and that doesn't have a very 8001-looking port.
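A likely explanation for the unreachable route is the listen address in the log: a server bound to 127.0.0.1 only accepts connections that originate inside the same container, so traffic forwarded by a route never reaches it. The sketch below, with illustrative helper names `listen_address` and `is_loopback_only`, pulls the address out of the Uvicorn log line and checks whether it is a loopback address. It assumes the host in the log is a numeric IP.

```python
import ipaddress
from urllib.parse import urlsplit


def listen_address(log_line):
    """Extract (host, port) from a 'Uvicorn running on <url>' log line."""
    url = next(word for word in log_line.split() if word.startswith("http"))
    parts = urlsplit(url)
    return parts.hostname, parts.port


def is_loopback_only(host):
    """True when the address accepts connections only from the local machine."""
    return ipaddress.ip_address(host).is_loopback


line = "INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)"
host, port = listen_address(line)
print(host, port, is_loopback_only(host))  # 127.0.0.1 8001 True
```

Starting the server with `-h 0.0.0.0` makes it listen on all interfaces, and `is_loopback_only("0.0.0.0")` returns False.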

@MichaelTiemannOSC

MichaelTiemannOSC commented Oct 12, 2021

Here's what I think I need to finish this task:

  • A Dockerfile for datasette, kept in GitHub so that it can be checked out from there. (Does it need to explicitly EXPOSE 8001, or is that handled by the routing info?) The Dockerfile pulls datasetteproject/datasette.
  • A YAML file that has the datasette command [ 'serve', '/mnt/corp data.db', '--cors', '--setting', 'base_url', '/corp-datasette/', '-p', '8001', '-h', '0.0.0.0', '--setting', 'sql_time_limit_ms', '500', '--setting', 'default_facet_size', '60', '--setting', 'facet_time_limit_ms', '500', '--metadata', '/mnt/corp-metadata.json' ] (or similar), with each argument as its own list element. I cannot add command parameters after creating a pod.
  • A volume loaded with corp data.db
  • Mounting the volume to the pod's /mnt directory (which is referenced in the command parameter)

Am I missing anything?
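When a command is written as a list rather than as a shell string, no shell is involved, so every option name and every value has to be a separate element, and the filename containing a space stays a single element. As a rough sketch, Python's `shlex.split` applies POSIX shell quoting rules and shows how the datasette command line would break into arguments:

```python
import shlex

# The command as it would be typed in a shell; the double quotes protect
# the space in the database filename.
command_line = (
    'serve "/mnt/corp data.db" --cors '
    "--setting base_url /corp-datasette/ "
    "-p 8001 -h 0.0.0.0 "
    "--setting sql_time_limit_ms 500 "
    "--setting default_facet_size 60 "
    "--setting facet_time_limit_ms 500 "
    "--metadata /mnt/corp-metadata.json"
)

# One list element per argument, exactly as the program would receive them.
args = shlex.split(command_line)
for arg in args:
    print(repr(arg))
```

The resulting list can be pasted into a YAML `args:` entry or a Dockerfile exec-form instruction.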

@HumairAK
Member

Okay, I'm not too familiar with datasette, but as I understand it, we just need to openshift-ify this image, meaning create a deployment, a route, and so on.

Can someone expand on the volume for the *.db file? Is this something that will need to be updated frequently? Does datasette provide a UI mechanism to update it? Also, can I get access to a copy to mount when starting out?

@MichaelTiemannOSC

The corp data.db file is in this directory: https://github.com/os-climate/corp_data_browser-ingestion-pipeline/tree/main/data/raw

@MichaelTiemannOSC

MichaelTiemannOSC commented Oct 12, 2021

In a perfect world, the corp data.db file would live in a place where it is as easy for the ingest notebook to overwrite it as it is for the datasette instance to read it. That means some kind of stand-alone storage space that can be attached to both environments for their respective purposes. datasette is a read-only SQL server that formats its outputs in browser-friendly ways.

In this particular case we also want to drag along corp-metadata.json and pint-definitions.txt to that middle-ground storage location, with notebooks able to update and datasette able to read.
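One way to let a writer overwrite the file while a reader uses it is to build the new database under a temporary name in the same directory and then rename it over the old one, so a reader never opens a half-written file. This is only a sketch of that idea, not something the thread settled on: the function names, the `companies` table, and the temporary directory standing in for the shared volume are all hypothetical, and atomic rename behaviour depends on the underlying storage.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def publish_database(build, target):
    """Build a SQLite file next to `target`, then swap it into place in one step."""
    target = Path(target)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    os.close(fd)
    try:
        conn = sqlite3.connect(tmp_name)
        try:
            build(conn)
            conn.commit()
        finally:
            conn.close()
        os.replace(tmp_name, target)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def open_read_only(path):
    """Open the published file the way a read-only consumer would."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def build_example(conn):
    # Hypothetical table and row, for illustration only.
    conn.execute("CREATE TABLE companies (name TEXT)")
    conn.execute("INSERT INTO companies VALUES ('Example Corp')")


shared_dir = tempfile.mkdtemp()  # stands in for the shared volume
db_path = os.path.join(shared_dir, "corp data.db")
publish_database(build_example, db_path)

reader = open_read_only(db_path)
rows = reader.execute("SELECT name FROM companies").fetchall()
reader.close()
print(rows)
```

A long-running reader that keeps the old file open may need to reopen it, or be restarted, before it sees the new contents.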

@MichaelTiemannOSC

Well, I did everything I thought I needed to do and I still don't have a working datasette. My URL is here:

http://datasette-datasette-osc.apps.odh-cl1.apps.os-climate.org/

I have a PVC mounted at /mnt that holds the appropriate files. I think I am really close, but not quite there.

@MichaelTiemannOSC

There were two final impediments to completing the task:

  1. Despite the fact that openshift offered to provide a publicly exposed port to the application, the container remained unexposed until I manually edited the container's YAML file.
  2. I needed to put this into my Dockerfile on GitHub:

ENTRYPOINT /usr/local/bin/datasette serve "/mnt/corp data.db" --cors --setting base_url /corp-datasette/ -p 8001 -h 0.0.0.0 --setting sql_time_limit_ms 500 --setting default_facet_size 60 --setting facet_time_limit_ms 500 --metadata /mnt/corp-metadata.json

Done!
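For anyone repeating this, the pieces discussed in the thread (container port, command arguments, PVC mount, service, and route) can also be declared together instead of being edited by hand afterwards. The sketch below builds such a manifest as JSON, which `oc apply -f` accepts alongside YAML. It is an assumption-laden illustration rather than the configuration actually used: the resource name `datasette` and the claim name `datasette-data` are hypothetical, and it has not been checked whether the upstream image runs under the random UID that OpenShift assigns.

```python
import json

APP = "datasette"         # hypothetical resource name
CLAIM = "datasette-data"  # hypothetical PVC holding the files under /mnt
IMAGE = "datasetteproject/datasette"
PORT = 8001

# Same arguments as the ENTRYPOINT line, one list element per argument.
ARGS = [
    "serve", "/mnt/corp data.db", "--cors",
    "--setting", "base_url", "/corp-datasette/",
    "-p", str(PORT), "-h", "0.0.0.0",
    "--setting", "sql_time_limit_ms", "500",
    "--setting", "default_facet_size", "60",
    "--setting", "facet_time_limit_ms", "500",
    "--metadata", "/mnt/corp-metadata.json",
]

container = {
    "name": APP,
    "image": IMAGE,
    "command": ["/usr/local/bin/datasette"],
    "args": ARGS,
    "ports": [{"containerPort": PORT}],  # the port that had to be exposed by hand
    "volumeMounts": [{"name": "data", "mountPath": "/mnt"}],
}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {
                "containers": [container],
                "volumes": [
                    {"name": "data", "persistentVolumeClaim": {"claimName": CLAIM}}
                ],
            },
        },
    },
}

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {"selector": {"app": APP}, "ports": [{"port": PORT, "targetPort": PORT}]},
}

route = {
    "apiVersion": "route.openshift.io/v1",
    "kind": "Route",
    "metadata": {"name": APP},
    "spec": {"to": {"kind": "Service", "name": APP}, "port": {"targetPort": PORT}},
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, service, route]}
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it in the target namespace would create all three resources in one step, assuming the PVC already exists.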


5 participants