Improving metadata handling #1669

Closed
erquhart opened this issue Aug 24, 2018 · 5 comments · Fixed by #3292 or #3316

@erquhart
Contributor

erquhart commented Aug 24, 2018

Below is a short proposal on how to make our metadata handling less error-prone by using it less and making it more obvious. The purpose of this proposal is to get feedback from the community.

Overview

Netlify CMS has a few kinds of metadata, and they're all called "metadata", unfortunately. This issue deals with the highest level of metadata, which is used to provide state for the editorial workflow and is only supported by the GitHub backend (currently).

This metadata is kept in a _netlify_cms-prefixed orphan ref and consists of a separate JSON file for each editorial workflow item.

How it works

A current metadata entry will look something like this (prettified):

{
  "type": "PR",
  "pr": {
    "number": 1514,
    "head": "7189948a4a21811bd6aea42f356084e0b9245760"
  },
  "user": "Phil Hawksworth",
  "status": "draft",
  "branch": "cms/netlify-cms-2-0-launches-with-bitbucket-support-and-a-new-monorepo-architecture",
  "collection": "blog",
  "title": "Netlify CMS 2.0 launches with BitBucket support and a new monorepo architecture",
  "description": "Announcing the release of Netlify CMS v2.0, with new BitBucket support and an improved project architecture designed to ease contribution and the extension of features.",
  "objects": {
    "entry": {
      "path": "website/site/content/blog/netlify-cms-2-0-launches-with-bitbucket-support-and-a-new-monorepo-architecture.md",
      "sha": "a16615141244f35717f8c97c163b1d5dfc0d8241"
    },
    "files": []
  },
  "timeStamp": "2018-07-25T17:59:26.494Z"
}

Netlify CMS knows which editorial workflow entries exist by checking for branches prefixed with cms/ and then checking for corresponding metadata files that look like the one above. Most of the metadata above is just copied from GitHub's API response for the pull request's data.

What needs fixin'

The problem with this approach is that the metadata files must be kept in sync with the pull requests they represent. We've seen this break in two ways:

  1. Someone manually edits a pull request
  2. A bug is introduced in metadata handling

Manual pull request edits should not cause any issues at all, and bugs in metadata handling shouldn't be able to cause problems that a subsequent fix can't recover from.
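
To make the sync problem concrete, here is a rough Python sketch of a drift check between one metadata file and the pull request it describes. It assumes the pull request is a dict shaped like GitHub's REST API response (number, head.ref, head.sha) and that the metadata's pr.head field holds the head sha. The function name find_drift and the short branch name are made up for illustration, not part of Netlify CMS.

```python
from typing import Any


def find_drift(metadata: dict[str, Any], pull_request: dict[str, Any]) -> list[str]:
    """Return the names of metadata fields that no longer match the pull request."""
    # Each check is (field name, value stored in the metadata file, value from the PR).
    checks = [
        ("pr.number", metadata["pr"]["number"], pull_request["number"]),
        ("pr.head", metadata["pr"]["head"], pull_request["head"]["sha"]),
        ("branch", metadata["branch"], pull_request["head"]["ref"]),
    ]
    return [name for name, stored, actual in checks if stored != actual]


# Hypothetical data: the metadata file as written by the CMS.
metadata = {
    "pr": {"number": 1514, "head": "7189948a4a21811bd6aea42f356084e0b9245760"},
    "branch": "cms/example-entry",
}
# Someone pushed to the PR by hand, so its head sha has moved on.
pull_request = {
    "number": 1514,
    "head": {"ref": "cms/example-entry", "sha": "0000000000000000000000000000000000000000"},
}

print(find_drift(metadata, pull_request))  # ['pr.head']
```

Every field in that list is a copy of something the pull request already knows, which is why the proposal below tries to stop storing it at all.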

Proposal

It'd be cool if we could:

  1. track critical info (like sha, branch name, etc) in only one place
  2. use inferred metadata when possible
  3. use explicit metadata that even makes sense apart from Netlify CMS when inferred isn't an option
  4. limit hidden metadata that's hyper-specific to Netlify CMS to non-critical data that can be automatically regenerated

Also keep in mind that, while this kind of metadata currently only serves the editorial workflow, it could serve lots of purposes in the future.

Inferred metadata

Most of our current metadata can be inferred directly from a given PR. Inferring metadata for a workflow entry in GitHub, for example, can look like:

  1. Get all open pull requests with branches starting with cms/
  2. Filter out any with a base branch other than the CMS configured branch

That's it! The pull request data gives us what we need to infer the collection and title, and there's no metadata to keep synced. The only thing we can't track this way is workflow status.
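
The two steps above could be sketched roughly like this in Python. The input is assumed to be a list of dicts shaped like GitHub's REST API pull request objects (state, title, head.ref, head.sha, base.ref). Deriving a slug from the branch name and taking the entry title from the PR title are assumptions for illustration, and infer_workflow_entries is a hypothetical name.

```python
from typing import Any

CMS_BRANCH_PREFIX = "cms/"


def infer_workflow_entries(
    pull_requests: list[dict[str, Any]], base_branch: str
) -> list[dict[str, Any]]:
    """Infer editorial workflow entries from pull request data alone."""
    entries = []
    for pr in pull_requests:
        branch = pr["head"]["ref"]
        # Step 1: only open pull requests whose branch starts with cms/.
        if pr["state"] != "open" or not branch.startswith(CMS_BRANCH_PREFIX):
            continue
        # Step 2: drop anything not targeting the CMS configured branch.
        if pr["base"]["ref"] != base_branch:
            continue
        entries.append({
            "number": pr["number"],
            "branch": branch,
            "sha": pr["head"]["sha"],
            "slug": branch[len(CMS_BRANCH_PREFIX):],  # assumption: slug follows the prefix
            "title": pr["title"],  # assumption: PR title doubles as entry title
        })
    return entries


# Hypothetical pull request data for illustration.
pulls = [
    {"number": 1, "state": "open", "title": "Hello world",
     "head": {"ref": "cms/hello-world", "sha": "aaa"}, "base": {"ref": "master"}},
    {"number": 2, "state": "open", "title": "Fix typo",
     "head": {"ref": "fix/typo", "sha": "bbb"}, "base": {"ref": "master"}},
    {"number": 3, "state": "open", "title": "Old post",
     "head": {"ref": "cms/old-post", "sha": "ccc"}, "base": {"ref": "legacy"}},
    {"number": 4, "state": "closed", "title": "Done",
     "head": {"ref": "cms/done", "sha": "ddd"}, "base": {"ref": "master"}},
]

entries = infer_workflow_entries(pulls, base_branch="master")
print([entry["slug"] for entry in entries])  # ['hello-world']
```

Nothing here is written back anywhere, so a manual edit to the pull request simply shows up the next time the list is fetched.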

Explicit metadata

Arbitrary data such as editorial workflow entry status can be handled by an explicit metadata concept we can call "annotations". For now I'd expect this annotations concept to only apply to unpublished entries. In GitHub, they would ideally be expressed as pull request labels. By default we'd use something like netlifycms/draft, netlifycms/review, etc. A user could manually add and remove these labels from GitHub without consequence, and the CMS can automatically fix any idiosyncrasies it finds (like having two conflicting status labels). These labels could also be customized via config.

The risk of someone changing a label should not be viewed as a risk at all - it's a feature. If this kind of ephemeral metadata is lost, the damage is limited as all unpublished entries are still in place, and only need their statuses updated. It's also not expected that annotations would be easily lost in the first place.
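
As a sketch of how the CMS might read a status from labels and clean up conflicts, consider the Python below. The netlifycms/ prefix and the draft and review names come from the text above. The third status ("ready"), the default of draft when no label is present, and the rule that the earliest stage wins on conflict are all assumptions made for illustration.

```python
LABEL_PREFIX = "netlifycms/"
# Ordered from earliest to latest stage; "ready" is a hypothetical third status.
STATUSES = ["draft", "review", "ready"]


def resolve_status(labels: list[str]) -> tuple[str, list[str]]:
    """Return (status, labels_to_remove) for the label names on one pull request."""
    found = [
        label[len(LABEL_PREFIX):]
        for label in labels
        if label.startswith(LABEL_PREFIX) and label[len(LABEL_PREFIX):] in STATUSES
    ]
    if not found:
        # Assumption: an entry without a status label is treated as a draft.
        return STATUSES[0], []
    # Assumption: on conflict keep the earliest stage, so nothing moves forward by accident.
    status = min(found, key=STATUSES.index)
    to_remove = [LABEL_PREFIX + name for name in found if name != status]
    return status, to_remove


print(resolve_status(["bug", "netlifycms/review", "netlifycms/draft"]))
# ('draft', ['netlifycms/review'])
```

Labels that don't carry the prefix are ignored, so users stay free to label these pull requests however they like.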

Hidden metadata

Hidden metadata shouldn't be necessary yet, but it would apply for performance operations like creating and caching thumbnails, because Netlify CMS can recreate and push up new thumbnails if they ever disappear. That kind of metadata doesn't need to be explicit or visible because it isn't critical.

@erquhart
Contributor Author

cc/ @Benaiah @tech4him1 @talves

@timaschew

That makes a lot of sense, I mean not creating commits to update metadata.
Labels seem to be a good place.

In case you need to store more complex data, you could use the description of the pull/merge request.
I've used this field to store some data for release notes, wrapped in front matter. This works for GitLab. On GitHub, front matter in the pull request doesn't work, but using code syntax would work there.

@Benaiah
Contributor

Benaiah commented May 22, 2019

To @timaschew's point, there is a technique you can use to append data to a PR description that is visible only when editing the description. In fact, we're already using it in our templates: HTML comments. A tagged JSON object in an HTML comment in the (already auto-generated) PR description would seem to be a good fit for storing metadata. It's visible to the users when necessary, but hidden when irrelevant. The amount of metadata we'd need to store would be reduced if it's stored directly on the PR, since a lot of the existing metadata only serves to help identify which PR corresponds to each unpublished entry, or is an outright duplicate of information already provided by requesting the PR (e.g. usernames when using the GitHub backend). There would be no irrecoverable errors, since state would be explicitly editable (and each version of the description is already stored by GitHub). This is in stark contrast to the current implementation, where finding the orphan ref in the GitHub UI is a chore even when you know it's possible, and recovering to the right state is not straightforward.
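
A minimal Python sketch of the tagged-JSON-in-a-comment idea follows. The tag name netlify-cms-metadata and both function names are hypothetical, and the choice to treat a hand-edited, invalid comment as missing is an assumption rather than anything decided in this thread.

```python
import json
import re
from typing import Any, Optional

TAG = "netlify-cms-metadata"  # hypothetical tag name
_PATTERN = re.compile(r"<!--\s*" + re.escape(TAG) + r"\s*(\{.*?\})\s*-->", re.DOTALL)


def embed_metadata(description: str, metadata: dict[str, Any]) -> str:
    """Add the tagged comment to a PR description, or replace it if already present."""
    # Escape ">" so a string value containing "-->" cannot close the comment early.
    payload = json.dumps(metadata).replace(">", "\\u003e")
    comment = f"<!-- {TAG} {payload} -->"
    if _PATTERN.search(description):
        # A function replacement avoids backslash processing of the payload.
        return _PATTERN.sub(lambda _: comment, description, count=1)
    return f"{description.rstrip()}\n\n{comment}\n"


def extract_metadata(description: str) -> Optional[dict[str, Any]]:
    """Read the tagged JSON object back out, or return None if absent or invalid."""
    match = _PATTERN.search(description)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # Assumption: a hand-edited comment that is no longer valid JSON counts as missing.
        return None


body = embed_metadata("Example description", {"status": "draft", "note": "a --> b"})
print(extract_metadata(body))  # {'status': 'draft', 'note': 'a --> b'}
```

Because the comment lives in the description, a later call to embed_metadata on the same body replaces the existing comment instead of appending a second one.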

We may have to introduce more HTTP requests to check cached things like the title and description if we want to minimize the amount of data we store in the PR description, but we will also be able to eliminate many requests for metadata since the metadata will already be in the PR description.

Additionally, for PRs from forks, descriptions (as well as the contents of branches in PRs with "allow edits by maintainers" checked) have the unique property of being editable by both the user who created them and the maintainers of the repo. This is not true of labels, which are only editable by users who have write access to the repo.

The only case this approach doesn't address is unpublished changes without a corresponding PR, which don't exist currently and will only exist in the fork workflow (drafts in the fork workflow don't create PRs right away). I think this case can be handled either with an approach similar to our current metadata (but much simplified) or we can simply infer or request all the required metadata for that edge case.

@heyakyra

heyakyra commented Jul 2, 2019

Eager to see this pave the way for #568

@stale

stale bot commented Oct 29, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
