
Initial implementation design #1

Open
jonathansick opened this issue Mar 15, 2017 · 4 comments

@jonathansick
Member

jonathansick commented Mar 15, 2017

Ticket: DM-9818

Background

The purpose of DocHub is to make LSST's information artifacts, which are currently spread across many platforms, available and searchable from a single website. I did some research on DocHub in https://sqr-013.lsst.io last November, and that technote will provide useful background on what DocHub will (hopefully) become. But what you'll be building here is an initial prototype for DocHub. Rather than a sophisticated API+React app with JSON-LD metadata modeling, what we're looking for here is:

  • A static website published with LSST the Docs to the www product so that its URL will be www.lsst.io (we can alias lsst.io to www.lsst.io too).
  • There's no need for persistence yet in building the initial static site; all data can be obtained during build time from the keeper.lsst.codes API and from metadata.yaml files in the GitHub repositories of projects.
  • The ltd-dasher project is similar to what you'll build here (Jinja2 templates, with data populated from APIs), except that there's no need to make dochub-prototype a server application (at least at this stage). LTD Keeper doesn't need to trigger a DocHub rebuild every time a new LSST the Docs build is pushed. I think that hourly builds will be sufficient. The reason I'm cautious about making this a server app is that the build will take a significant amount of time, so any client would time out unless we build a background task queue. But if we design the entire thing to run as an asynchronous job that can be triggered by cron or launched as a Kubernetes Job, then we get that task queue feature for free.

Python package

I think the core implementation can just be a standard Python package dochubproto (it can even be deployed to PyPI). Inside the package will be a templates directory with the Jinja2 templates and Python modules that handle website rendering (getting data from APIs and actually rendering the templates).
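
To make that concrete, here's a minimal sketch of what the rendering module might look like, assuming a dochubproto/templates/ layout and Jinja2's PackageLoader. The module, template, and function names here are illustrative, not settled:

```python
# dochubproto/render.py (illustrative module name)
import os

from jinja2 import Environment, PackageLoader, select_autoescape


def render_site(technotes, output_dir="_build"):
    """Render the static DocHub pages into output_dir.

    ``technotes`` is whatever structure the API-harvesting code produces;
    here it's assumed to be a list of dicts, one per technote.
    """
    env = Environment(
        loader=PackageLoader("dochubproto", "templates"),
        autoescape=select_autoescape(["html"]),
    )
    os.makedirs(output_dir, exist_ok=True)

    template = env.get_template("index.html")  # assumed template name
    html = template.render(technotes=technotes)
    with open(os.path.join(output_dir, "index.html"), "w") as f:
        f.write(html)
```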

There can be a dochub-render.py executable for triggering a render.
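
Something this simple would do for the executable, assuming a data-gathering function and the render function above exist in the package (both names are placeholders):

```python
#!/usr/bin/env python
"""dochub-render.py: one-shot build trigger for the DocHub prototype."""
import argparse

from dochubproto.data import get_technote_data  # placeholder module/function names
from dochubproto.render import render_site


def main():
    parser = argparse.ArgumentParser(description="Render the DocHub static site.")
    parser.add_argument("--output-dir", default="_build",
                        help="Directory to write the rendered HTML into.")
    args = parser.parse_args()

    technotes = get_technote_data()
    render_site(technotes, output_dir=args.output_dir)


if __name__ == "__main__":
    main()
```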

Like ltd-dasher, you can use ltd-conveyor to upload the built HTML/CSS/whatever to LSST the Docs with all the appropriate caching headers.
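
I won't pin down the ltd-conveyor calls here (check its docs for the actual interface), but just to illustrate the cache-header idea with plain boto3 (bucket name, key, and max-age are placeholders):

```python
import boto3  # illustration only; in practice ltd-conveyor wraps this kind of upload

s3 = boto3.client("s3")
s3.upload_file(
    "_build/index.html",
    "example-ltd-bucket",   # placeholder bucket
    "www/index.html",       # placeholder key
    ExtraArgs={
        "ContentType": "text/html",
        "CacheControl": "max-age=3600",  # placeholder caching policy
    },
)
```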

Dockerizing

If you want, you can Dockerize and deploy dochub-prototype with Kubernetes. I was thinking of doing this as a Kubernetes Job resource. Once CronJob is available we can switch to that. The nice thing about this is that we could then build a lightweight api.lsst.codes microservice that triggers a DocHub rebuild just by deploying the DocHub manifest. Again, this helps keep us from having to build our own task queue with Celery.
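
Roughly what I have in mind for the Job manifest (the image name, command, and env are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: dochub-build
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: dochub
          image: lsstsqre/dochub-prototype:latest  # placeholder image
          command: ["dochub-render.py"]
          env:
            - name: LTD_KEEPER_URL
              value: "https://keeper.lsst.codes"
```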

If you can set up a Jenkins job or Travis job to run this every hour, that's great. But I think we can still close the epic without nailing the operational infrastructure 100%.

The index.html information content and API sources

The MVP for the sqre-s17-doceng epic is to list all technotes on www.lsst.io. We could also list LDMs and user guides (pipelines.lsst.io, firefly.lsst.io, developer.lsst.io, ltd-keeper.lsst.io, ltd-mason.lsst.io, ltd-conveyor.lsst.io), but I think that shipping just a list of DMTNs, SQRs, and SMTNs would be sufficient and also useful.

Without getting into front-end design, you can treat the DMTN, SQR and SMTN sections (either all on the homepage, or as separate HTML pages) as ul lists of technote template partials.

The template partials should provide the following information for each technote (there's a sketch of such a partial after this list):

  • Title (without the handle) (either from keeper.lsst.codes or metadata.yaml)
  • The document handle (either from keeper.lsst.codes or metadata.yaml)
  • The URL (from keeper.lsst.codes)
  • The GitHub repo URL (from keeper.lsst.codes)
  • Link to the edition dashboard (compute as https://product.lsst.io/v). For bonus points, use the GitHub API to state whether there are open PRs.
  • Date last updated (from keeper.lsst.codes)
  • The author list (from metadata.yaml)
  • The description (from metadata.yaml, if available)
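
A rough sketch of such a partial, assuming the fields above end up on a technote object; the file name and field names are illustrative:

```html
{# _technote.html — illustrative partial; field names are placeholders #}
<li class="technote">
  <a href="{{ technote.url }}">{{ technote.handle }}: {{ technote.title }}</a>
  <span class="authors">{{ technote.authors | join(', ') }}</span>
  {% if technote.description %}<p>{{ technote.description }}</p>{% endif %}
  <p>
    <a href="{{ technote.github_url }}">GitHub</a> ·
    <a href="https://{{ technote.slug }}.lsst.io/v">Edition dashboard</a> ·
    Updated {{ technote.date_updated }}
  </p>
</li>

{# index.html (or a per-series page) would then wrap these in a ul per series: #}
<ul class="technote-list">
  {% for technote in dmtn_technotes %}
    {% include "_technote.html" %}
  {% endfor %}
</ul>
```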

Getting data from keeper.lsst.codes is straightforward as you know. You can use the GitHub API to obtain the metadata.yaml file from technote repositories.
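
A rough sketch of that build-time harvesting, assuming public repos, a metadata.yaml at the repository root on master, and the endpoints/fields described in the LTD Keeper docs (verify field names like products, slug, doc_repo, title, and published_url against https://ltd-keeper.lsst.io before relying on them):

```python
import requests
import yaml

KEEPER_URL = "https://keeper.lsst.codes"


def get_technote_data():
    """Collect technote info from keeper.lsst.codes and raw.githubusercontent.com."""
    technotes = []
    product_urls = requests.get(KEEPER_URL + "/products/").json()["products"]
    for product_url in product_urls:
        product = requests.get(product_url).json()
        slug = product["slug"]
        if not slug.startswith(("dmtn-", "sqr-", "smtn-")):
            continue  # only the technote series for the MVP

        # Fetch metadata.yaml straight from raw.githubusercontent.com so no
        # GitHub authentication is needed (repos are assumed public); the URL
        # munging here is deliberately simplistic.
        repo = product["doc_repo"].replace("https://github.com/", "").rstrip("/")
        raw_url = "https://raw.githubusercontent.com/{}/master/metadata.yaml".format(repo)
        metadata = yaml.safe_load(requests.get(raw_url).text)

        technotes.append({
            "slug": slug,
            "handle": slug.upper(),
            "title": product["title"],
            "url": product["published_url"],
            "github_url": product["doc_repo"],
            "authors": metadata.get("authors", []),
            "description": metadata.get("description"),
        })
    return technotes
```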

One trick is that not all technotes are on LSST the Docs. Some of the originals are on Read the Docs, but still have metadata.yaml files. You can either work around that, or (probably better) just list technotes in LSST the Docs and I'll actually get around to porting the old technotes over.

@athornton
Member

Looks straightforward enough. I will dive into implementing it tomorrow.

@athornton
Member

Date last updated is not available from the main document fetch at keeper, so I am omitting it for now. As it stands we only need the get-all-the-products GET plus two GETs per product: one to keeper and one to raw.githubusercontent.com (which avoids the overhead of authenticating to GitHub or using their API at all, provided the repos are public).

@jonathansick
Member Author

@athornton You can use the keeper.lsst.codes API to find the main Edition for a product, and then use the date_rebuilt field of the edition. https://ltd-keeper.lsst.io/editions.html#get--editions-(int-id)
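
Something like this sketch, assuming the editions listing and edition resources match the docs linked above (it can take more than two GETs if a product has several editions):

```python
import requests


def get_date_rebuilt(product_url):
    """Return the date_rebuilt of a product's main edition, or None."""
    edition_urls = requests.get(product_url + "/editions/").json()["editions"]
    for edition_url in edition_urls:
        edition = requests.get(edition_url).json()
        if edition.get("slug") == "main":
            return edition.get("date_rebuilt")
    return None
```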

But if you want to avoid adding two more GETs per product at this point that's fine.

@jonathansick
Member Author

GraphQL will make this so much easier!
