Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for getting diffs #60

Merged
merged 5 commits into from
Jun 1, 2017
Merged

Add API for getting diffs #60

merged 5 commits into from
Jun 1, 2017

Conversation

Mr0grog
Copy link
Member

@Mr0grog Mr0grog commented May 31, 2017

Fixes #8.

This is currently implemented on top of @b5’s simple diff service: https://github.com/edgi-govdata-archiving/go-calc-diff
…but should theoretically work with any HTTP differ that accepts requests with two query arguments:

  • a: A URL to the content of the starting version
  • b: A URL to the content of the ending version

e.g: http://diff-it.com?a=http%3A%2F%2Fexample.com%2Fv1&b=http%3A%2F%2Fexample.com%2Fv2

Diffs are retrieved by requesting the URL:

/api/v0/pages/{page_id}/changes/{version_a_id}..{version_b_id}/diff/{diff_type}?{query_params}

(Meant to mirror the annotations API)

query_params are simply passed along to the underlying diff service. For example, go-calc-diff can take a format argument that determines how the response is formatted (JSON, HTML document, HTML snippet):

/api/v0/pages/xxx/changes/yyy..zzz/diff/source?format=json

That returns a response like:

{
  "data": {
    "page_id": "3f28a7c3-65f2-481e-9722-efccc1f38462",
    "from_version_id": "51896700-93d7-42df-8a3c-aa0282e1c8e8",
    "to_version_id": "16d82acd-01ef-4e39-a5d2-ab11daa6540e",
    "diff_service": "DIFF SERVICE NAME",
    "diff_service_version": "?",
    "content": "RESPONSE FROM DIFF SERVICE HERE"
  }
}

If the response was JSON data, content will actually an object. Otherwise, it will be a string.

We only support one diff_type right now (source), but the system is designed to be expandable. Under the hood, there is a registry that maps diff type names to actual diff service implementations. Implementations only need to respond to one method:

def diff (change_model, options = {})
  # return a string or hash that will be used in the `content` field
end

For now, we just have one SimpleDiff class, which is instantiated with a URL. It can be used to interact with any diff service that matches the same protocol as go-calc-diff.

@Mr0grog
Copy link
Member Author

Mr0grog commented May 31, 2017

Hoping to get this worked out before Thursday so Global Sprinters have a working example of diffing.

@Mr0grog
Copy link
Member Author

Mr0grog commented May 31, 2017

Side note: this includes a wonky patch for an encoding issue in HTTParty (jnunemaker/httparty#542), which was also causing import issues (if raw version content had to be parsed, which is not the case for most imports).

require_dependency 'differ/simple_diff'

if ENV['DIFFER_SOURCE']
Differ.register(:source, Differ::SimpleDiff.new(ENV['DIFFER_SOURCE']))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be too clever to allow arbitrary many differs with arbitrary names to be configured through ENV by picking up all vars with names that match DIFFER_*? Or will new differs likely need something more complex that SimpleDiff? Offhand I'm not sure what they would need so the approach I'm floating seems doable -- but maybe less than clean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow arbitrary many differs with arbitrary names to be configured through ENV by picking up all vars with names that match DIFFER_*?

Ha! I thought about the same idea and then said “nah, let’s not get ahead of ourselves.” I think I would at least change the var to DIFFER_SIMPLE_* if we did that.

I can imagine some differs being more complex (or maybe SimpleDiff will just grow)—we can‘t meaningfully diff PDFs with this, for example, but we really should eventually have that capability. There’s a lot of PDF content that is “tracked” but has no real tooling to support analyzing it. Versionista is also hitting some audio files and potentially things like CSVs. I haven't really done a full audit of content types. By nature, I’m sure PF is grabbing who-knows-what variety of content types.

Copy link
Contributor

@danielballan danielballan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one stray single comment. I read this through twice and the tests look good. I agree with the general API. Let's merge and iterate.

Similar route to annotations, since both are about a change between two versions:

  GET /api/v0/page/{page id}/changes/{from id}..{to id}/diff/{diff type}
There is a registry of diffing services that can be set with:

    Differ.register(type, service)

And retrieved with:

    Differ.for_type(type) # returns nil for unknown type
    Differ.for_type!(type) # raises for unknown type

A diffing service need only respond to the `diff(change_model, options_hash)` method.

Right now there is only one diffing service: SimpleDiff. It's configurable with a URL and simply requests the URL with the query args:
- `a`: the uri for the content of the start version
- `b`: the uri for the content of the end version
It turns out HTTParty has a pretty bad bug with response encodings: jnunemaker/httparty#542
This is hopefully just a temporary patch.
Make Rubocop happy. This syntax is only meaningfully different when modules are used with classes, which we aren't doing. For our use, they're equivalent.
@Mr0grog
Copy link
Member Author

Mr0grog commented Jun 1, 2017

Merging so we’ve got something for Global Sprint. Once we’ve got more diff services, we can come back around and consider iterating through DIFFER_SIMPLE_* to auto-create differ instances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants