Improvements to document versioning #527

Closed
eduardoboucas opened this issue Jan 8, 2019 · 1 comment

eduardoboucas commented Jan 8, 2019

Overview

Currently, API implements document versioning by creating a full copy of a document every time it is updated and storing that copy in a history collection. This is resource intensive, because each update costs as much storage as the whole document, however small the change. The method for retrieving previous versions is also limited: it involves querying a document with includeHistory=true and getting all the previous versions attached to it in a _history array, so the response body keeps growing as the history collection grows.

This is a proposal for storing versions more efficiently, as diffs. It also introduces a more convenient method for retrieving previous versions of a document, allowing users to specify exactly which version they want to roll back to, as opposed to getting the full history in bulk.

Storing diffs

When a collection has history enabled (i.e. settings.enableHistory: true), an auxiliary collection is created, with a name defaulting to {collection-name}Versions (e.g. articlesVersions).

Whenever a document is updated, we use JSON Operational Transformations to compute a diff between the before and after states of the document. That diff is stored in the versions collection, alongside a reference to the main document, a timestamp and an optional description message that annotates the reason for the update, similar to a Git commit message.

{
  "_id" : "5c334a9f139c7e48eb44a9bd",
  "_document" : "5c334a60139c7e48eb44a9bb",
  "_createdAt" : 1546865311629,
  "_diff" : [ { "p" : [ "surname" ], "oi" : "Lambie" } ],
  "_description" : "add surname"
}

The main collection always stores the fully-constructed latest version of documents, whilst the versions collection stores a series of diffs that, when applied to the latest state of the document, allow a previous state to be reconstructed. We always work backwards from the current state, meaning that there's no additional overhead required when performing normal queries to documents.
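As a rough illustration of the storage step, the sketch below builds a version record shaped like the one above. It is not the actual implementation: `diff_top_level` and `make_version_record` are hypothetical helper names, the diff only compares top-level fields, and the `oi`/`od` keys follow the JSON0 operation format that the example diff appears to use.

```python
import time
from typing import Any, Dict, List, Optional

Op = Dict[str, Any]


def diff_top_level(before: Dict[str, Any], after: Dict[str, Any]) -> List[Op]:
    """Return JSON0-style ops that turn `before` into `after`.

    Simplification for this sketch: only top-level keys are compared.
    """
    ops: List[Op] = []
    for key in sorted(set(before) | set(after)):
        if key not in before:
            ops.append({"p": [key], "oi": after[key]})  # key added
        elif key not in after:
            ops.append({"p": [key], "od": before[key]})  # key removed
        elif before[key] != after[key]:
            # key replaced: record both the old and the new value
            ops.append({"p": [key], "od": before[key], "oi": after[key]})
    return ops


def make_version_record(
    document_id: str,
    before: Dict[str, Any],
    after: Dict[str, Any],
    description: Optional[str] = None,
    now_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the record that would be written to the versions collection."""
    record: Dict[str, Any] = {
        "_document": document_id,
        "_createdAt": now_ms if now_ms is not None else int(time.time() * 1000),
        "_diff": diff_top_level(before, after),
    }
    if description is not None:
        record["_description"] = description  # optional, like a commit message
    return record


record = make_version_record(
    "5c334a60139c7e48eb44a9bb",
    {"name": "first name"},
    {"name": "first name", "surname": "Lambie"},
    description="add surname",
    now_ms=1546865311629,
)
```

Because a replace op keeps both the old value (`od`) and the new one (`oi`), a record like this carries enough information to be applied in either direction.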

Listing versions

A new /versions endpoint is available, allowing users to get a list of all the previous versions available for a given document.

GET https://api.somedomain.tech/vjoin/testdb/users/5c334a60139c7e48eb44a9bb/versions

{
    "results": [
        {
            "_id": "5c334a68139c7e48eb44a9bc",
            "_createdAt": 1546865256487,
            "_description": "update first name"
        },
        {
            "_id": "5c334a9f139c7e48eb44a9bd",
            "_createdAt": 1546865311629,
            "_description": "add surname"
        }
    ]
}

The _id property uniquely identifies each version, akin to how a SHA identifies a Git commit.
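A consumer could use that listing to pick a version by date rather than by ID. The following is a minimal client-side sketch, assuming the response shape shown above; `versions_url` and `version_at` are hypothetical helper names, and no request is actually sent.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import quote


def versions_url(collection_url: str, document_id: str) -> str:
    """Build the /versions URL for a document in a collection."""
    return f"{collection_url.rstrip('/')}/{quote(document_id, safe='')}/versions"


def version_at(
    results: List[Dict[str, Any]], timestamp_ms: int
) -> Optional[Dict[str, Any]]:
    """Return the newest version created at or before `timestamp_ms`, if any."""
    candidates = [v for v in results if v["_createdAt"] <= timestamp_ms]
    return max(candidates, key=lambda v: v["_createdAt"], default=None)


url = versions_url(
    "https://api.somedomain.tech/vjoin/testdb/users", "5c334a60139c7e48eb44a9bb"
)

# The `results` array from the example response above.
results = [
    {"_id": "5c334a68139c7e48eb44a9bc", "_createdAt": 1546865256487,
     "_description": "update first name"},
    {"_id": "5c334a9f139c7e48eb44a9bd", "_createdAt": 1546865311629,
     "_description": "add surname"},
]
chosen = version_at(results, 1546865300000)
```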

Rolling back to a previous version

To retrieve a specific version of a document, add the version's ID to a normal query as the version URL parameter.

GET https://api.somedomain.tech/vjoin/testdb/users/5c334a60139c7e48eb44a9bb?version=5c334a68139c7e48eb44a9bc

{
    "results": [
        {
            "_apiVersion": "vjoin",
            "_createdAt": 1546865248196,
            "_createdBy": "testClient",
            "_id": "5c334a60139c7e48eb44a9bb",
            "_lastModifiedAt": 1546865311615,
            "_lastModifiedBy": "testClient",
            "name": "first name"
        }
    ],
    "metadata": {
        "limit": 40,
        "page": 1,
        "fields": {},
        "sort": {
            "name": 1
        },
        "offset": 0,
        "totalCount": 1,
        "totalPages": 1,
        "version": "5c334a68139c7e48eb44a9bc"
    }
}

When such a request is made, API pulls all the versions between the current state and the version with the given ID, and applies the corresponding diffs sequentially until the resulting object reflects the state of the document at the time of the given version.
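The backwards walk can be sketched as follows. This is an illustration under stated assumptions, not the code from the implementation: it assumes each stored diff is a forward JSON0-style object op (`oi` insert, `od` delete, both for replace), that rolling back means applying the inverse of every diff newer than the requested version, and that the requested version's own diff stays applied. The helper names are hypothetical.

```python
import copy
from typing import Any, Dict, List

Op = Dict[str, Any]


def invert(op: Op) -> Op:
    """Swap insert and delete so that applying the result undoes `op`."""
    inverse: Op = {"p": op["p"]}
    if "oi" in op:
        inverse["od"] = op["oi"]
    if "od" in op:
        inverse["oi"] = op["od"]
    return inverse


def apply_op(doc: Dict[str, Any], op: Op) -> None:
    """Apply one object insert, delete or replace op to `doc` in place."""
    *parents, key = op["p"]
    target = doc
    for part in parents:
        target = target[part]
    if "oi" in op:
        target[key] = op["oi"]  # insert or replace
    elif "od" in op:
        target.pop(key, None)  # delete


def document_at_version(
    current: Dict[str, Any], versions: List[Dict[str, Any]], version_id: str
) -> Dict[str, Any]:
    """Rebuild the document as it was at `version_id`, starting from `current`."""
    doc = copy.deepcopy(current)  # never mutate the live document
    newest_first = sorted(versions, key=lambda v: v["_createdAt"], reverse=True)
    for version in newest_first:
        if version["_id"] == version_id:
            return doc  # assumption: the target's own diff stays applied
        for op in reversed(version["_diff"]):
            apply_op(doc, invert(op))
    raise KeyError(version_id)


current = {"name": "first name", "surname": "Lambie"}
versions = [
    {"_id": "5c334a68139c7e48eb44a9bc", "_createdAt": 1546865256487,
     # "original name" is an illustrative value, not taken from the proposal
     "_diff": [{"p": ["name"], "od": "original name", "oi": "first name"}]},
    {"_id": "5c334a9f139c7e48eb44a9bd", "_createdAt": 1546865311629,
     "_diff": [{"p": ["surname"], "oi": "Lambie"}]},
]
restored = document_at_version(current, versions, "5c334a68139c7e48eb44a9bc")
```

The cost of this walk grows with the number of versions between the current state and the requested one, while ordinary queries never touch the versions collection.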

Performing updates

The two existing methods for updating a document remain the same: PUT /collection/document-ID with the update in the body, or PUT /collection with a query and update properties in the body. We're now extending the latter to allow users to specify a message that describes the update, which will get stored alongside the diff in the versions collection.

PUT https://api.somedomain.tech/vjoin/testdb/users

{
  "query": {
    "_id": "5c334a60139c7e48eb44a9bb"
  },
  "update": {
    "name": "Superman",
    "surname": null
  },
  "description": "Update name and remove surname"
}

This could be leveraged by a consumer application, like Publish, to provide a more meaningful history of a document.
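For instance, a consumer might assemble the request body like this. The sketch only builds the JSON payload shown above and sends nothing; `build_update_body` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Optional


def build_update_body(
    query: Dict[str, Any],
    update: Dict[str, Any],
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the body for PUT /collection, with an optional update message."""
    body: Dict[str, Any] = {"query": query, "update": update}
    if description:
        body["description"] = description  # stored alongside the diff
    return body


payload = json.dumps(
    build_update_body(
        {"_id": "5c334a60139c7e48eb44a9bb"},
        {"name": "Superman", "surname": None},  # None serialises to JSON null
        description="Update name and remove surname",
    )
)
```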

@eduardoboucas

Closed via #532.
