Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving clarity and error handling around absent values #360

Open
warpfork opened this issue Feb 14, 2022 · 8 comments
Open

Improving clarity and error handling around absent values #360

warpfork opened this issue Feb 14, 2022 · 8 comments
Assignees
Labels
P2 Medium: Good to have, but can wait until someone steps up

Comments

@warpfork
Copy link
Collaborator

warpfork commented Feb 14, 2022

As of v0.14ish (and considerably before), the exact contract that the datamodel.Node interface has for values that are not found in a map (or are out of bounds index in a list) is not super well documented, nor is a defacto convention well adhered-to in implementations. This needs to be rectified. The current situation makes writing correct code for handling data generically very difficult, if absence of values needs to be handled distinctly from other potential error paths.

(Why is this an issue now, and not earlier? It's always been an issue -- but sometimes it's avoidable. The typed Node implementations have generally had clearer behavior for this. In many cases, a data transformation that's not conditional on the existing data hits no speedbumps here. But in completely generic data transforms -- such as a Patch implementation, which I'm working on now, it's becoming quite keening indeed.)

At the same time: I'm finding that I experience 'nil' considerably more often than I'm happy with while using this library. One of the biggest sources of this is when nil is used where an Absent value might also be semantically reasonable.

To address both of these problems, here is what I think we should do:

  • Node implementations should return Absent for the value when doing lookups in a map or list, if that value is indeed absent. (No longer should this ever use 'nil'.)
  • Nodes should not return datamodel.ErrNotExists anymore. Returning Absent as the value and nil as an error should be the new style. (Trying to handle value absence as an error path has turned out to be considerably painful, and often produces confusing branching.)
  • ErrNotFound: this might still exist, but it's should only used by APIs that are doing more than one stride at a time (e.g. traversal stuff).
  • While we're at it: In typed nodes, structs also should return Absent for both a field which is defined but was absent, and also for a field which is not defined. (Previously, structs returned somewhat stronger error for a field that is not defined: schema.ErrNoSuchField. But I've come to think this is probably on net unhelpful; the general learning from working with our type system over time seems be uncovering that the less the type lens makes things diverge from the basic data model, the better.)

This is likely to be a breaking change in a few cases. Worse, it's the kind that the compiler won't detect. Nonetheless, I think pulling the bandaid on this one will not become easier over time, so it might as well be done soon or now.

Previous review: #74 (no final resolution reached, but covers the same problem).
Possibly also relevant (though tangentally): #191

We might consider how to migrate this smoothly. For example, perhaps there could be an intermediate time where we return both Absent and ErrNotExists. However, it's not immediately clear that this would actually result in greater smoothness of transition. AMEND: Let's not atttempt this particular "migration" strategy. It's sufficiently un-idiomatic golang that it seems like a very poor idea, even temporarily.

@rvagg
Copy link
Member

rvagg commented Feb 14, 2022

I experience 'nil' considerably more often than I'm happy with

Is this just a smell thing? Or are there practical reasons why this is a problem?

In typed nodes, structs also should return Absent for both a field which is defined but was absent, and also for a field which is not defined

This seems like the most controversial part of this to me; it seems like you're proposing a loss of information for typed nodes. There'll be no easy way to distinguish between "this type has no such field" and "this field's value is currently absent", will there? I assume this is to make it so typed nodes are more easily interchangeable with basic nodes and don't have these weird behaviours. But is this a case of removing some of the utility of typed nodes in the process of making them generic?

@mvdan
Copy link
Contributor

mvdan commented Feb 16, 2022

I basically arrived at the same thoughts that Rod brought up. I also have a worry that the transition will be painful, but given that ipld-prime is a v0, I imagine we can get away with it.

As for "not found" returning the current nil, ErrNotExists versus a future Absent, nil: I have a slight preference for the current form, because it's more difficult to misuse. For instance:

// compiler error! err declared and not used
node, err := parent.LookupByString("foo")
use(node)

Or:

// suspicious; explicitly ignored error
node, _ := parent.LookupByString("foo")
use(node)

Users today will instead write something like:

node, err := parent.LookupByString("foo")
if err != nil {
    handle(err) // handle any error, but you can also handle ErrNotExists separately here
}
use(node)

However, this will not be enough with the new Absent, nil form, and none of the static analysis or syntax cues will help you. You'll instead want something like:

node, err := parent.LookupByString("foo")
if err != nil {
    handle(err) // handle other errors
}
if node.IsAbsent() {
    // handle the absent case
}
use(node)

@mvdan
Copy link
Contributor

mvdan commented Feb 16, 2022

I should note that returning a nil error for "not found" can be very reasonable if such a case is entirely normal, and using the result normally won't cause problems. For instance, if the return parameters are (*SomeStruct, error), and the methods on *SomeStruct don't panic when called on nil. That's not our case here, because we return an interface, which will panic when methods are called on its nil value.

@warpfork
Copy link
Collaborator Author

I think the main distinction is, while with absent values, one may write:

node, err := parent.LookupByString("foo")
if err != nil {
    handle(err) // handle other errors
}
if node.IsAbsent() {
    // handle the absent case
}
use(node)

... one may also simply not have that IsAbsent branch sometimes: it's a fairly normal value and you can often pass it on.

Whereas currently, one generally must write:

node, err := parent.LookupByString("foo")
if err != nil {
    handle(err) // handle other errors
}
if node == nil {
    // handle the absent case
}
use(node)

... because passing forward a nil Node generally doesn't produce sensible results, and is very prone to producing nil panics (which are then in turn a PITA to debug because they're very uninformative).

@rvagg
Copy link
Member

rvagg commented Feb 28, 2022

Returning a valid value and an err response doesn't seem very idiomatic Go to me, is this something that's regularly done? Are there common APIs that employ this?

@warpfork
Copy link
Collaborator Author

I agree, returning both a valid value and an error response is not very idiomatic. I'll strike that suggestion from the original post.

@warpfork
Copy link
Collaborator Author

This seems like the most controversial part of this to me; it seems like you're proposing a loss of information for typed nodes. [...] I assume this is to make it so typed nodes are more easily interchangeable with basic nodes and don't have these weird behaviours. But is this a case of removing some of the utility of typed nodes in the process of making them generic?

Yes and yes.

Agree that this is controversial. I'm not entirely convinced of it myself. But the reasoning is as you say: I'd like the schema system to be even less special than it is. (It's already minimally special! These bits about absent fields are one of the remaining noticeable specialnesses.)

We could crowbar this choice out of this issue (and probably should). (It should probable affect the decision about how "length" works on structs with optional fields too, and that'd be kind of a bigger change.)

There'll be no easy way to distinguish between "this type has no such field" and "this field's value is currently absent", will there?

Right. Not through the data model universal interface.

One would need to shift gears into some kind of introspective mode that's outside the data model.

That doesn't currently exist in a very standardized way. (It's present in the golang code! But if we wanted to make a declarative/message-passing serial API about it, and put serial fixtures about its behavior in the specs/meta repo, we'd have to invent a lot of things that aren't invented yet.) But we could make that. And it might be useful and reasonable to do so.

Meanwhile: I think it's rather questionable whether being able to make that distinction through the data model view is actually useful in practice. I thought it would be! But I don't know if I can think of any supporting evidence or major use of it, now that we've had that ability for a while. (More data welcome.)

@rvagg
Copy link
Member

rvagg commented Mar 1, 2022

I think it's rather questionable whether being able to make that distinction through the data model view is actually useful in practice. I thought it would be! But I don't know if I can think of any supporting evidence or major use of it, now that we've had that ability for a while.

True .. I can't think of a good case or even a good hypothetical .. maybe with time.

@rvagg rvagg added the P2 Medium: Good to have, but can wait until someone steps up label May 17, 2022
@rvagg rvagg self-assigned this May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Medium: Good to have, but can wait until someone steps up
Projects
Status: 🗄️ Backlog
Development

No branches or pull requests

3 participants