Improving metadata handling #1669

erquhart · 2018-08-24T21:32:12Z

Below is a short proposal on how to make our metadata handling less error prone by using it less and making it more obvious. The purpose of this proposal is to get feedback from the community.

Overview

Netlify CMS has a few kinds of metadata, and they're all called "metadata", unfortunately. This issue deals with the highest level of metadata, which is used to provide state for the editorial workflow and is only supported by the GitHub backend (currently).

This metadata is kept in a _netlify_cms-prefixed orphan ref and consists of a separate JSON file for each editorial workflow item.

How it works

A current metadata entry will look something like this (prettified):

{
  "type": "PR",
  "pr": {
    "number": 1514,
    "head": "7189948a4a21811bd6aea42f356084e0b9245760"
  },
  "user": "Phil Hawksworth",
  "status": "draft",
  "branch": "cms/netlify-cms-2-0-launches-with-bitbucket-support-and-a-new-monorepo-architecture",
  "collection": "blog",
  "title": "Netlify CMS 2.0 launches with BitBucket support and a new monorepo architecture",
  "description": "Announcing the release of Netlify CMS v2.0, with new BitBucket support and an improved project architecture designed to ease contribution and the extension of features.",
  "objects": {
    "entry": {
      "path": "website/site/content/blog/netlify-cms-2-0-launches-with-bitbucket-support-and-a-new-monorepo-architecture.md",
      "sha": "a16615141244f35717f8c97c163b1d5dfc0d8241"
    },
    "files": []
  },
  "timeStamp": "2018-07-25T17:59:26.494Z"
}

Netlify CMS knows which editorial workflow entries exist by checking for branches prefixed with cms/ and then checking for corresponding metadata files that look like the one above. Most of the metadata above is simply copied from GitHub's API response for the pull request.
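
As a rough illustration of the lookup described above, here is a minimal TypeScript sketch. It is not the CMS's actual implementation: the `WorkflowMetadata` interface only mirrors the fields visible in the example entry, and `pairBranchesWithMetadata` and its inputs are hypothetical names chosen for this sketch.

```typescript
// Shape of one metadata file, mirroring only the fields in the example entry.
interface WorkflowMetadata {
  type: string;
  pr: { number: number; head: string };
  user: string;
  status: string;
  branch: string;
  collection: string;
  title: string;
  description: string;
  objects: { entry: { path: string; sha: string }; files: unknown[] };
  timeStamp: string;
}

const CMS_BRANCH_PREFIX = "cms/";

// Hypothetical helper: pair every cms/ branch with its metadata file and
// report branches that have no metadata, one way the two can fall out of sync.
function pairBranchesWithMetadata(
  branchNames: string[],
  metadataByBranch: Record<string, WorkflowMetadata>,
): { entries: WorkflowMetadata[]; missing: string[] } {
  const entries: WorkflowMetadata[] = [];
  const missing: string[] = [];
  for (const name of branchNames) {
    // Skip branches that are not editorial workflow branches.
    if (name.indexOf(CMS_BRANCH_PREFIX) !== 0) continue;
    const metadata = metadataByBranch[name];
    if (metadata) {
      entries.push(metadata);
    } else {
      missing.push(name);
    }
  }
  return { entries, missing };
}
```

A non-empty `missing` list is exactly the kind of drift between branches and metadata files that the rest of this proposal tries to make impossible.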

What needs fixin'

The problem with this approach is that the metadata files must be kept in sync with the pull requests they represent. We've seen this break in two ways:

Someone manually edits a pull request
A bug is introduced in metadata handling

Manual pull request edits should not cause any issues at all, and bugs in metadata handling shouldn't be able to cause problems that a subsequent fix can't recover from.

Proposal

It'd be cool if we could:

track critical info (like sha, branch name, etc) in only one place
use inferred metadata when possible
use explicit metadata that even makes sense apart from Netlify CMS when inferred isn't an option
limit hidden metadata that's hyper-specific to Netlify CMS to non-critical data that can be automatically regenerated

Also keep in mind that, while this kind of metadata currently only serves the editorial workflow, it could serve lots of purposes in the future.

Inferred metadata

Most of our current metadata can be inferred directly from a given PR. Inferring metadata for a workflow entry in GitHub, for example, can look like:

Get all open pull requests with branches starting with cms/
Filter out any with a base branch other than the CMS configured branch

That's it! The pull request data gives us what we need to infer the collection and title, and there's no metadata to keep synced. The only thing we can't track this way is workflow status.
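
The two steps above could be sketched as a pure function over pull request data. This is only an illustration under stated assumptions: `PullRequest` lists a small subset of the fields GitHub's REST API returns for a pull request, while `InferredEntry` and `inferWorkflowEntries` are hypothetical names. It also assumes the branch is named cms/ followed by the entry slug, as in the example metadata entry, and that the PR title can stand in for the entry title.

```typescript
// Subset of the fields GitHub's REST API returns for a pull request.
interface PullRequest {
  number: number;
  title: string;
  state: string; // "open" or "closed"
  head: { ref: string; sha: string };
  base: { ref: string };
}

// Hypothetical shape of what can be inferred for one workflow entry.
interface InferredEntry {
  prNumber: number;
  branch: string;
  slug: string;
  title: string;
  sha: string;
}

function inferWorkflowEntries(
  pulls: PullRequest[],
  configuredBranch: string,
  prefix: string = "cms/",
): InferredEntry[] {
  return pulls
    // Step 1: open pull requests whose branch starts with the prefix.
    .filter((pr) => pr.state === "open" && pr.head.ref.indexOf(prefix) === 0)
    // Step 2: drop any whose base is not the CMS configured branch.
    .filter((pr) => pr.base.ref === configuredBranch)
    .map((pr) => ({
      prNumber: pr.number,
      branch: pr.head.ref,
      // Assumption: the branch is the prefix followed by the entry slug.
      slug: pr.head.ref.slice(prefix.length),
      // Assumption: the PR title is usable as the entry title.
      title: pr.title,
      sha: pr.head.sha,
    }));
}
```

Nothing here is stored anywhere, so a manual edit to the pull request simply changes what is inferred on the next load.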

Explicit metadata

Arbitrary data such as editorial workflow entry status can be handled by an explicit metadata concept we can call "annotations". For now I'd expect this annotations concept to only apply to unpublished entries. In GitHub, they would ideally be expressed as pull request labels. By default we'd use something like netlifycms/draft, netlifycms/review, etc. A user could manually add and remove these labels from GitHub without consequence, and the CMS can automatically fix any idiosyncrasies it finds (like having two conflicting status labels). These labels could also be customized via config.

The risk of someone changing a label should not be viewed as a risk at all - it's a feature. If this kind of ephemeral metadata is lost, the damage is limited as all unpublished entries are still in place, and only need their statuses updated. It's also not expected that annotations would be easily lost in the first place.
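
To make the label idea concrete, here is a hedged sketch of how a status could be read from a pull request's labels and how conflicts might be repaired. The proposal only names netlifycms/draft and netlifycms/review, so the third status ("ready"), the fallback to draft, and the "keep the earliest status" conflict policy are all assumptions of this sketch, as are the names `resolveStatus` and `StatusResolution`.

```typescript
// Assumed label prefix and status order; "ready" is hypothetical.
const LABEL_PREFIX = "netlifycms/";
const STATUSES = ["draft", "review", "ready"];

interface StatusResolution {
  status: string;
  labelsToAdd: string[];
  labelsToRemove: string[];
}

// Decide the workflow status from a PR's label names and list the label
// changes needed to leave exactly one status label on the PR.
function resolveStatus(labels: string[]): StatusResolution {
  // Collect status labels present on the PR, ignoring unrelated labels.
  const found: string[] = [];
  for (const status of STATUSES) {
    if (labels.indexOf(LABEL_PREFIX + status) !== -1) found.push(status);
  }
  // No status label at all: fall back to the first status and add its label.
  if (found.length === 0) {
    return {
      status: STATUSES[0],
      labelsToAdd: [LABEL_PREFIX + STATUSES[0]],
      labelsToRemove: [],
    };
  }
  // Conflict policy (an assumption): keep the earliest status, drop the rest.
  return {
    status: found[0],
    labelsToAdd: [],
    labelsToRemove: found.slice(1).map((s) => LABEL_PREFIX + s),
  };
}
```

For example, a PR labeled with both netlifycms/review and netlifycms/draft would resolve to draft under this policy, with netlifycms/review queued for removal. A real implementation would read the label names from config, as suggested above.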

Hidden metadata

Hidden metadata shouldn't be necessary yet, but it would apply for performance operations like creating and caching thumbnails, because Netlify CMS can recreate and push up new thumbnails if they ever disappear. That kind of metadata doesn't need to be explicit or visible because it isn't critical.

erquhart · 2018-08-24T21:37:31Z

cc/ @Benaiah @tech4him1 @talves

timaschew · 2019-03-11T21:36:25Z

That makes a lot of sense, especially not creating commits just to update metadata.
Labels seem to be a good place for it.

In case you need to store more complex data, you could use the description of the pull/merge request.
I've used this field to store some data for release notes, wrapped in front matter. This works for GitLab. On GitHub, front matter in the pull request doesn't work, but using code syntax would work there.

Benaiah · 2019-05-22T21:34:33Z

To @timaschew's point, there is a technique you can use to append data to a PR description which is visible only when editing the description. In fact, we're already using it in our templates - HTML comments. A tagged JSON object in an HTML comment in the (already auto-generated) PR description would seem to be a good fit for storing metadata. It's visible to the users when necessary, but hidden when irrelevant. The amount of metadata we'd need to store would be reduced if it's stored directly on the PR, since a lot of the existing metadata only serves to help identify which PR corresponds to each unpublished entry, or is an outright duplicate of information already provided by requesting the PR (e.g. usernames when using the github backend). There would be no irrecoverable errors, since state would be explicitly editable (and each version of the description is already stored by GitHub). This is in stark contrast to the current implementation - finding the orphan ref in the GitHub UI is a chore even when you know it's possible, and recovering to the right state is not straightforward.
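
A minimal sketch of the HTML comment technique described above, assuming a tag name of netlify-cms-metadata. The tag and the function names `writeMetadata` and `readMetadata` are hypothetical and not taken from the CMS codebase.

```typescript
// Hypothetical tag that marks an HTML comment as CMS metadata.
const TAG = "netlify-cms-metadata";
// Matches <!-- netlify-cms-metadata {...} --> and captures the JSON payload.
const METADATA_PATTERN = new RegExp("<!--\\s*" + TAG + "\\s+([\\s\\S]*?)\\s*-->");

// Return the description with the metadata comment added or replaced.
function writeMetadata(description: string, data: object): string {
  // Escape "-->" inside JSON strings so the payload cannot end the comment early.
  const json = JSON.stringify(data).replace(/-->/g, "--\\u003e");
  const comment = "<!-- " + TAG + " " + json + " -->";
  return METADATA_PATTERN.test(description)
    ? description.replace(METADATA_PATTERN, () => comment)
    : description + "\n\n" + comment;
}

// Return the parsed metadata, or null if it is absent or not valid JSON.
function readMetadata(description: string): unknown {
  const match = METADATA_PATTERN.exec(description);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (error) {
    // A hand-edited comment may no longer be valid JSON; treat it as missing.
    return null;
  }
}
```

Because a hand edit can leave the comment malformed, `readMetadata` returns null rather than throwing, so the CMS could fall back to inferred values and rewrite the comment on the next save.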

We may have to introduce more HTTP requests to check cached things like the title and description if we want to minimize the amount of data we store in the PR description, but we will also be able to eliminate many requests for metadata since the metadata will already be in the PR description.

Additionally, for PRs from forks, descriptions (as well as the contents of branches in PRs with "allow edits by maintainers" checked) have the unique property of being editable by both the user who created them and the maintainers of the repo. This is not true of labels, which are only editable by users who have write access to the repo.

The only case this approach doesn't address is unpublished changes without a corresponding PR, which don't exist currently and will only exist in the fork workflow (drafts in the fork workflow don't create PRs right away). I think this case can be handled either with an approach similar to our current metadata (but much simplified) or we can simply infer or request all the required metadata for that edge case.

heyakyra · 2019-07-02T03:59:47Z

Eager to see this pave the way for #568

stale · 2019-10-29T07:46:37Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

erquhart mentioned this issue Dec 4, 2018

Pull request creation and link in editor UI #1929

Open

erquhart mentioned this issue Dec 17, 2018

feat(backend-github): workflow improve metadata handling #1961

Closed

erquhart mentioned this issue Mar 11, 2019

Editorial Workflow support for GitLab #1817

Closed

erquhart mentioned this issue Jun 3, 2019

Editorial workflow breaks if a branch named "cms" exists #2331

Closed

stale bot added the wontfix label Oct 29, 2019

erezrokah added status: stale and removed wontfix labels Oct 29, 2019

erquhart added kind: discussion pinned and removed status: stale labels Nov 8, 2019

erezrokah mentioned this issue Nov 14, 2019

fix(backend-github): prepend collection name #2878

Merged

This was referenced Dec 18, 2019

Manage any PR with editable content through editorial workflow #2977

Open

Widget validation: unique constraint #1069

Open

erezrokah mentioned this issue Dec 23, 2019

Cannot save changes (API_ERROR: Reference update failed - "UNPUBLISHED_ENTRY_PERSIST_FAILURE") #2544

Closed

erezrokah mentioned this issue Jan 12, 2020

Feat: editorial workflow bitbucket gitlab #3014

Merged

6 tasks

erezrokah self-assigned this Feb 4, 2020

This was referenced Feb 20, 2020

Feat: Align GitHub metadata handling with other backends #3292

Merged

feat(github): allow issue/pr labels api netlify/git-gateway#51

Merged

erquhart closed this as completed in #3292 Feb 22, 2020

erezrokah reopened this Feb 24, 2020

erezrokah mentioned this issue Feb 24, 2020

Feat: Align GitHub metadata handling with other backends #3316

Merged

erquhart closed this as completed in #3316 Feb 24, 2020
