Skip to content
This repository has been archived by the owner on Oct 5, 2023. It is now read-only.

fix: make Block().* return correct ABI based ipld.ErrNotFound errors #156

Merged
merged 9 commits into from
Apr 5, 2022

Conversation

Jorropo
Copy link
Contributor

@Jorropo Jorropo commented Mar 27, 2022

This fixes the tests for ipfs/kubo#8815

There are probably other APIs that needs the same treatment however we do not have failing tests for them yet.

Blocking PR and replace rules to be removed:

@Jorropo Jorropo force-pushed the fix/ipld-ErrNotFound branch 3 times, most recently from e5b3ccf to 507bb8c Compare March 27, 2022 12:24
@Jorropo Jorropo force-pushed the fix/ipld-ErrNotFound branch from 507bb8c to c15609a Compare March 27, 2022 12:51
@BigLep BigLep requested review from aschmahmann and guseggert March 28, 2022 04:23
@BigLep BigLep added this to the go-ipfs 0.13 milestone Mar 28, 2022
abyfy_errors.go Outdated

// This file handle parsing and returning the correct ABI based errors from error messages
//lint:ignore ST1008 this function is not using the error as a mean to return failure but it massages it to return the correct type
func abyfyIpldNotFound(msg string) (error, bool) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty complicated and brittle. Does every impl of this HTTP client have to do this? Should we consider changing the API to return structured errors, or do this server side, so clients don't have to resort to these contortions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some summarizing from the issue as discussed in https://discord.com/channels/806902334369824788/956547968385822730/957037148853395496.

IIUC from the interface tests previously we did have some weak contract on the CoreAPI that functions like Block().Get() and Block().Rm() also needed to return things like blockstore.ErrNotFound (or really that the error message would be something like blockstore: block not found ipfs/interface-go-ipfs-core@01ee941. Now we've broken the weak contract by changing the text and are figuring out the best way to restore it.

Does every impl of this HTTP client have to do this?

Not necessarily, it depends what they're trying to do. In this case to for an HTTP API to be compatible with interface-go-ipfs-core's CoreAPI interfaces it needs to do some sort of error checking to figure out what type of error it received. Maybe we could drop that requirement, but idk that we have to.

Should we consider changing the API to return structured errors

You're right that it might be worth changing the HTTP API to have some contract on the types of errors it returns. I think for me the question boils down to the value tradeoffs here, i.e. the number of server and client implementations of this API that need to care about errors vs how much effort it is to return structured errors. Given the low number of servers and the low number of clients that care strongly about the error types it seems like we can probably not do this right now.

or do this server side, so clients don't have to resort to these contortions?

We can make some of this easier server side, although currently there are two implementers of this API (go-ipfs and js-ipfs) with most other IPFS implementations choosing to not implement this HTTP API at all. Some contortions will still be necessary because go-ipfs and js-ipfs don't have the same error messages at the moment (especially since we just changed them in go-ipfs with resolving ipfs/kubo#7074).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with @aschmahmann

Should we consider changing the API to return structured errors, or do this server side, so clients don't have to resort to these contortions?

I belive what surfaced from our discussion (see adin's discord link) is that we can do that later.

Right now it's fine because there is only one error type that needs that treatment. But I think that if we spread that pattern in the future I think we will use a list of error IDs (or automatically generated ones).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how these are used and whether they care about this or not.

My thought is that grep.app would've shown more code caring about the previous error text if code built around those clients cared. Since there doesn't seem to be much there it appears to be safe to change the error text and that others aren't relying on knowing the error types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, here are some active community client impls

I don't think it matters to them, as this is required if the http clients want to be compatible with the CoreAPI objects (extended weak contract, if you only use errors to show your users error messages you don't care about that for example).

Object which doesn't exists in other languages and even if it does (maybe JS ?) it doesn't know about ipld.ErrNotFound.

abyfy_errors.go Outdated Show resolved Hide resolved
abyfy_errors_test.go Outdated Show resolved Hide resolved
abyfy_errors.go Outdated

// This file handle parsing and returning the correct ABI based errors from error messages
//lint:ignore ST1008 this function is not using the error as a mean to return failure but it massages it to return the correct type
func abyfyIpldNotFound(msg string) (error, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some summarizing from the issue as discussed in https://discord.com/channels/806902334369824788/956547968385822730/957037148853395496.

IIUC from the interface tests previously we did have some weak contract on the CoreAPI that functions like Block().Get() and Block().Rm() also needed to return things like blockstore.ErrNotFound (or really that the error message would be something like blockstore: block not found ipfs/interface-go-ipfs-core@01ee941. Now we've broken the weak contract by changing the text and are figuring out the best way to restore it.

Does every impl of this HTTP client have to do this?

Not necessarily, it depends what they're trying to do. In this case to for an HTTP API to be compatible with interface-go-ipfs-core's CoreAPI interfaces it needs to do some sort of error checking to figure out what type of error it received. Maybe we could drop that requirement, but idk that we have to.

Should we consider changing the API to return structured errors

You're right that it might be worth changing the HTTP API to have some contract on the types of errors it returns. I think for me the question boils down to the value tradeoffs here, i.e. the number of server and client implementations of this API that need to care about errors vs how much effort it is to return structured errors. Given the low number of servers and the low number of clients that care strongly about the error types it seems like we can probably not do this right now.

or do this server side, so clients don't have to resort to these contortions?

We can make some of this easier server side, although currently there are two implementers of this API (go-ipfs and js-ipfs) with most other IPFS implementations choosing to not implement this HTTP API at all. Some contortions will still be necessary because go-ipfs and js-ipfs don't have the same error messages at the moment (especially since we just changed them in go-ipfs with resolving ipfs/kubo#7074).

abyfy_errors.go Outdated Show resolved Hide resolved
@Jorropo Jorropo force-pushed the fix/ipld-ErrNotFound branch 3 times, most recently from adc28ba to f3052c4 Compare March 31, 2022 22:24
@Jorropo Jorropo requested a review from guseggert March 31, 2022 22:24
@Jorropo Jorropo force-pushed the fix/ipld-ErrNotFound branch from f3052c4 to 48b2867 Compare March 31, 2022 22:34
errors.go Outdated
Comment on lines 102 to 108
// Assume that CIDs only contain a-zA-Z0-9 characters.
// This is true because go-ipld-format use go-cid#Cid.String which use base{3{2,6},58}.
postIndex = strings.IndexFunc(msgPostKey, notAsciiLetterOrDigits)
if postIndex < 0 {
postIndex = len(msgPostKey)
}

var err error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the complexity brought by this? Can't we just assume the CID is separated by a space, instead? (if we have to assume anything at all)

Copy link
Contributor Author

@Jorropo Jorropo Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a space seems incomplete, I would like to include " too.

And at this point instead of searching a list of things I want to break on, I thought it was simpler to do a list of things I do not want to break on.

That is very static code that should never need an update (unless this parsing code is removed) and has a ~100% test coverage. So it's not really adding maintainment costs.

Edit: this code could be made simpler by removing the LUT, however not breaking on a-zA-Z0-9 seems usefull to me.

Copy link

@schomatis schomatis Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code could be made simpler by removing the LUT

Yes, that.

however not breaking on a-zA-Z0-9 seems usefull to me.

I'm not sure how this keeps something from not breaking, will need to take a closer look. In general hard-coding a basic alphanumeric detector along with its tests sounds like unneeded complexity added, if not for maintaining costs (though more code seems to always imply that) at least for my selfish reviewing costs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this keeps something from not breaking

I don't meant breaking code, I meant breaking as control flow, breaking out of the IndexFunc loop.

Since you are two with gus raising the same point #156 (comment) I've replaced it by breaking on " \t\n\r\v\f;\"" (whitespaces or semicolon (produced by multierr when concatenating) or " for %q and %#v).

go.mod Outdated
Comment on lines 24 to 27

replace github.com/ipfs/go-ipld-format => github.com/Jorropo/go-ipld-format v0.3.2-0.20220330014726-942265d1aca7

replace github.com/ipfs/interface-go-ipfs-core => github.com/Jorropo/interface-go-ipfs-core v0.6.2-0.20220331215619-b98f8571cf6b

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Contributor Author

@Jorropo Jorropo Apr 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW from https://discord.com/channels/806902334369824788/956547968385822730/959649999988338762

The go-ipfs and go-ipfs-http-cient PRs are circularly dependent, how you want to handle that ?
The tests works on my machine when I use both branches together (unhelpfull as it is).

Do I make temporary replaces and custom checkout that I'll remove before merging so you can see a green CI light ?
An other option is that since go-ipfs's dependency is only in a testing workflow, we can merge go-ipfs with a custom checkout there, that unblocks go-ipfs-http-client merge it and then do a new PR that just cleanup go-ipfs's checkout.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay that makes sense, thanks

Copy link

@guseggert guseggert Apr 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect CI runs to fully pass at this point, or is the current test failure expected?

Copy link
Contributor Author

@Jorropo Jorropo Apr 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect CI runs to fully pass at this point, or is the current test failure expected?

It is the one expected. And it works on my machine.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay SGTM, the go-ipfs PR looks like it's ready to go right? So can we go ahead and remove the replace directives then so we can merge this?

errors_test.go Outdated Show resolved Hide resolved
errors.go Outdated
Comment on lines 102 to 103
// Assume that CIDs only contain a-zA-Z0-9 characters.
// This is true because go-ipld-format use go-cid#Cid.String which use base{3{2,6},58}.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a brittle assumption to make, as there already exist multibase encodings (and additional proposals) outside of this range. Would prefer not to make assumptions about the CID encoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced by breaking on " \t\n\r\v\f;\"" (whitespaces or semicolon (produced by multierr when concatenating) or " for %q and %#v). Hopefully no CID we will see will include thoses.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see so this is actually just a bug on the server side, because there doesn't seem to be a way to parse all valid CIDs from these error strings, am I understanding that correctly?

We should add some documentation about the fact that this only supports a subset of valid CIDs, and also open an issue about this bug.

Anyway, this seems like the least-awful approach. Thanks for taking the time to fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see so this is actually just a bug on the server side, because there doesn't seem to be a way to parse all valid CIDs from these error strings, am I understanding that correctly?

Currently AFAIK, no multibase that we support use thoses codepoints so it's valid. If you use a CID that use thoses codepoints the client wouldn't know how to parse it anyway (without bumping it's go-multibase dep).

The "proper" fix, is encoding ipld.ErrNotFound.Cid in a separate whatever field (LEB, json field, ...) so we can unambiguously know where they terminate and fetch them correctly, but that fixing an issue that doesn't exists yet since no CID we know how to parse use that (I would like to see a test failing before fixing it).

Copy link

@guseggert guseggert Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra chars the encodings have outside 0-9A-Za-z:

  • base32pad: =
  • base32padupper: =
  • base32hexpadupper: =
  • base64pad: =, +, /
    • ex f00003E00003F00 => MAAA+AAA/AA==
  • base64urlpad: =, -, _
    • ex f00003E00003F00 => UAAA-AAA_AA==
  • base64: +, /
    • ex f00003E00003F00 => mAAA+AAA/AA
  • base64url: -, _
    • ex f00003E00003F00 => uAAA-AAA_AA

If a new encoding is added, ideally it would just work (would be the case if the cid were properly delimited rather than guessing about delimiters, so we could just blindly pass it to go-multibase). Second-best would be that an existing test would fail when we add support for the new encoding (doesn't seem to be the case here?). Least ideal is that this parsing code gets confused and parses the CID incorrectly (how it's currently implemented). I'm not sure what the implications are of that...annoying for users at the least, potentially insecure?

Copy link
Contributor Author

@Jorropo Jorropo Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra chars the encodings have outside 0-9A-Za-z

I meant " \t\n\r\v\f;\"".

parsing code gets confused and parses the CID incorrectly

It parse incorrectly see that the CID is invalid (either header is truncated or multihash length doesn't match) and giveup and use a text error instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potentially insecure

@guseggert

I'll change it to return a blockstoreNotFoundMatchingIPLDErrNotFound instead of a text error when it gives up (the difference is that ipld.IsNotFound(blockstoreNotFoundMatchingIPLDErrNotFound{}) return true). However the CID will still not be recoverable (so if someone expects to reads the CID out of that they can't).

But no code currently do that (because in the past only text errors was there) and I don't plan anyone to write such code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we decoding CIDs in the first place? Is that part of the spirit of what we're testing here? It seems what we want to identify is the type of error returned (and sadly we need to do that by string manipulation), but we already know what CID we're requesting (it's our test) so we can anticipate the the CID string and just match against it.

@Jorropo Jorropo force-pushed the fix/ipld-ErrNotFound branch from 48b2867 to 5635257 Compare April 2, 2022 01:26
Copy link

@schomatis schomatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy this no longer uses the alphanumeric detection scheme.

@Jorropo Jorropo force-pushed the fix/ipld-ErrNotFound branch from 5635257 to d7208ce Compare April 5, 2022 18:18
@Jorropo Jorropo merged commit fdbee7c into master Apr 5, 2022
@Jorropo Jorropo deleted the fix/ipld-ErrNotFound branch April 5, 2022 18:19
@github-actions
Copy link

github-actions bot commented Apr 5, 2022

Suggested version: v0.3.0
Comparing to: v0.2.0 (diff)

Changes in go.mod file(s):

diff --git a/go.mod b/go.mod
index 3ef46c5..ce3ed95 100644
--- a/go.mod
+++ b/go.mod
@@ -5,11 +5,11 @@ require (
 	github.com/ipfs/go-cid v0.0.7
 	github.com/ipfs/go-ipfs-cmds v0.6.0
 	github.com/ipfs/go-ipfs-files v0.0.8
-	github.com/ipfs/go-ipld-format v0.2.0
-	github.com/ipfs/go-merkledag v0.4.0
+	github.com/ipfs/go-ipld-format v0.4.0
+	github.com/ipfs/go-merkledag v0.6.0
 	github.com/ipfs/go-path v0.1.1
 	github.com/ipfs/go-unixfs v0.2.5
-	github.com/ipfs/interface-go-ipfs-core v0.5.2
+	github.com/ipfs/interface-go-ipfs-core v0.6.2
 	github.com/ipfs/iptb v1.4.0
 	github.com/ipfs/iptb-plugins v0.3.0
 	github.com/libp2p/go-libp2p-core v0.8.6

gorelease says:

# summary
Suggested version: v0.3.0

gocompat says:

(empty)

Copy link

@schomatis schomatis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

34cc489 looks good.

Jorropo added a commit to ipfs/kubo that referenced this pull request May 30, 2023
…rNotFound

fix: make Block().* return correct ABI based ipld.ErrNotFound errors

This commit was moved from ipfs/go-ipfs-http-client@fdbee7c
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants