Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP APIs that accept binary data as query string args #7958

Open
achingbrain opened this issue Mar 2, 2021 · 10 comments
Open

HTTP APIs that accept binary data as query string args #7958

achingbrain opened this issue Mar 2, 2021 · 10 comments
Assignees
Labels
kind/enhancement A net-new feature or improvement to an existing feature status/ready Ready to be worked topic/rpc-api Issues related to Kubo RPC API at /api/v0

Comments

@achingbrain
Copy link
Member

#7939 exposed two issues with the HTTP API. A solution for one was proposed, the other was discussed at triage and I wanted to capture the scope of the work here.

Some of our API endpoints accept query string arguments that contain strings which will be interpreted as binary data. This causes problems for browsers and other JS clients in that the built in URL encoding functions accept strings and do UTF8 multi-byte code point conversions for byte values above 127/0x7F, so 0x80 becomes %EF%BF%BD instead of %80.

Instead we should allow sending multibase encoded strings for these arguments removing any ambiguity over their contents. The proposal was to add an additional flag that indicates to the API that the received string should be interpreted as a multibase encoded string and not a byte array.

I've done a quick pass over the HTTP API docs and these endpoints currently interpret an argument as a byte array, the good news is there aren't many:

/api/v0/dht/findpeer - arg needs encoding
/api/v0/dht/findprovs - arg needs encoding
/api/v0/dht/provide - arg needs encoding
/api/v0/dht/put - arg needs encoding
/api/v0/dht/query - arg needs encoding
/api/v0/pubsub/pub - second arg needs encoding (also accepts payload in message body?)

These endpoints accept arguments that by convention would be B58 encoded multibase strings containing CIDs, but it is not explicit in the docs so could use a note saying the value 'should be an IPFS path or multibase encoded string containing a CID' or similar depending on context.

/api/v0/filestore/ls - arg may need encoding note?
/api/v0/filestore/verify - arg may need encoding note?
/api/v0/id - arg may need encoding note?
/api/v0/name/resolve - arg may need encoding note?
/api/v0/object/diff - arg may need encoding note?
/api/v0/object/patch/add-link - arg may need encoding note?
/api/v0/object/patch/append-data - arg may need encoding note?
/api/v0/object/patch/rm-link - arg may need encoding note?
/api/v0/object/patch/set-data - arg may need encoding note?
/api/v0/ping - arg may need encoding note?
/api/v0/stats/bw - peer may need encoding note?
/api/v0/swarm/connect - arg may need encoding note?
/api/v0/swarm/disconnect - arg may need encoding note?

There may be other affected endpoints.

@achingbrain achingbrain added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Mar 2, 2021
@BigLep
Copy link
Contributor

BigLep commented Mar 29, 2021

@Stebalien : to link to the relevant project proposals that would fix the right way.

@lidel lidel added the topic/rpc-api Issues related to Kubo RPC API at /api/v0 label Mar 29, 2021
@Stebalien Stebalien added need/analysis Needs further analysis before proceeding kind/enhancement A net-new feature or improvement to an existing feature and removed need/triage Needs initial labeling and prioritization kind/bug A bug in existing code (including security flaws) labels Apr 13, 2021
@Stebalien
Copy link
Member

The tricky part is mixing multibase-encoded arguments along with non-multibase-encoded arguments.

A possible solution is to reserve something like @... to mean "the following is multibase encoded". That would be a breaking change, but might be less terrible than some of the other options.

@aschmahmann
Copy link
Contributor

but might be less terrible than some of the other options.

There are a bunch of options available here. One of these perhaps more terrible options, that's similar to the proposal above but trades off verbosity for transparency is having binary options get two flags --field-name and one of {--field-name-binary, --field-name-multibase, or --field-name-uses-multibase}. We'd need to do something similar (although perhaps not exactly the same) for arguments too.

@mikeal
Copy link

mikeal commented Apr 13, 2021

Whatever we go with, it would be good to put “multibase” in the name of the option/feature. Implying multibase in options marked “binary” is going to be more confusing for users, if we say “multibase” they’ll know what to Google to figure out what that means.

@lidel
Copy link
Member

lidel commented Apr 13, 2021

The tricky part is mixing multibase-encoded arguments along with non-multibase-encoded arguments.

I don't think we should allow mixing. When it comes to /api/v0/pubsub/pub both parameters should be in multibase because people use blobs in topic names as well (pretty sure I've seen it in the wild a while ago).

In the past we solved a variant of this problem in /api/v0/object/get by adding support for &data-encoding=base64, but it was scoped to a specific field in request body (object/get.js).

food for thought

To remove surface for errors like this from the entire HTTP API we need to add support for multibase envelope to arg params AND body in both request AND response. Not sure if we can do less, and avoid the problem coming back in the future.

proposal: single flag

I suggest a single flag &multibase-foobar=base64url (multibase-rpc? multibase-io? multibase-envelope?)
that informs the /api/v0/ endpoint that:

  • all arg parameters and body present in request need to be unwrapped using multibase
  • response body should also be wrapped in multibase envelope (using encoding from the flag)

Flag would be off by default (to maintain backward-compatibility), but js-ipfs-http-client should use this flag by default on /api/v0/pubsub/pub and similar endponts, just like it already does in object/get.js, quietly solving the problem for everyone who uses the lib. For everyone else experiencing issues when pushing binary instead of text, we would add a section at the top of https://docs.ipfs.io/reference/http/api/ (by updating ipfs/http-api-docs).

ps. Does not need to be base64url, but it is space-efficient and the safest default, so we should use it in examples and docs.
There is a potential footgun with base64 when used in URL (and we would use it in arg). – we should detect it and return an error suggesting base64url instead.

@achingbrain
Copy link
Member Author

How about:

proposal: use versioning

Make it /api/v1/pubsub/pub, /api/v1/dht/put etc and demand multibase for all binary sent on the query string or as fields in JSON instead of adding additional arguments which add complexity and the potential for mis-use.

No need to rev the version number for all endpoints at once, just a few at a time as and when they need upgrading.

@lidel
Copy link
Member

lidel commented Apr 14, 2021

Versioning via /api/v1 would also work, but introduction of a new API endpoint root not only complicates our code, but also complicates implementation of http client libraries in other languages, and may even require people to update their reverse proxy configs (if they only exposed v0).

At the end of the day we want to provide an easy fix for people, and a new query param feels like less work for the end user of the API: one can add it to the map passed to the API call, in some languages without even modifying the underlying library.

Not feeling strongly about this, but I feel we should pick the least expensive option for the existing ecosystem.

@achingbrain
Copy link
Member Author

achingbrain commented Apr 14, 2021

introduction of a new API endpoint root not only complicates our code

Kind of the opposite - I think it simplifies our code as with fewer options you have fewer branches to test. The more options you allow, the more execution paths there are in the code, and the more tests you need due to the combinatorial explosion.

We also don't have to keep on top of making options uniform across different APIs - see --datafieldenc, --data-encoding, --inputenc, --encoding etc for examples of this which makes our lives easier and our APIs more pleasant.

but also complicates implementation of http client libraries in other languages, and may even require people to update their reverse proxy configs (if they only exposed v0).

This doesn't seem like a deal breaker to me, just switch to use the new endpoint and encode your input args, no need for the user of the http client to change anything. You might have to add an extra line or a wildcard to your nginx config in the second case.

People using the API without a client will need to change how they use the API anyway, so we can take the opportunity to reduce our maintenance overhead by having fewer code paths to maintain and test. v0 of a given endpoint could even be extracted into a module that's rarely if ever updated.

a new query param feels like less work for the end user of the API

It usually does, and it's a quick hack for the implementer of the API as well, but over time adding options results in increased maintenance overhead for the implementer due to the resulting spaghetti code and conceptual overhead for the user as they now have to understand more things about the API in order to use it.

@BigLep BigLep added the status/blocked Unable to be worked further until needs are met label May 5, 2021
@BigLep
Copy link
Contributor

BigLep commented May 5, 2021

This is blocked on side conversations that are being had to determine the way forward. See https://hackmd.io/cwWkQyXrRo-JZGUyAqXFvw

@lidel lidel added status/ready Ready to be worked and removed need/analysis Needs further analysis before proceeding status/blocked Unable to be worked further until needs are met labels May 6, 2021
@lidel
Copy link
Member

lidel commented May 6, 2021

Decision summary after today's call:

Potential low priority tasks here:

  • Remaining arg issue with dht/* commands is a lower priority, because it does not seem to impact real world use.
    If there is a bug filled for this, we will add &arg-multibase=true flag to enable passing of binary keys as multibase string, but for now leave things as-is.
  • All the other commands mentioned in original issue could use --help text clarification, but also a low priority (PR welcome).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement A net-new feature or improvement to an existing feature status/ready Ready to be worked topic/rpc-api Issues related to Kubo RPC API at /api/v0
Projects
None yet
Development

No branches or pull requests

6 participants