Principles: Error handling #84

Closed
wants to merge 13 commits into from
238 changes: 238 additions & 0 deletions docs/project/principles/error_handling.md
@@ -0,0 +1,238 @@
# Principles: Error handling

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

<!-- toc -->

- [Principles](#principles)
    - [Programming errors are not recoverable](#programming-errors-are-not-recoverable)
        - [Examples](#examples)
    - [Memory exhaustion is not a recoverable error](#memory-exhaustion-is-not-a-recoverable-error)
        - [Examples](#examples-1)
        - [Caveats](#caveats)
    - [Recoverable errors are explicit in function declarations](#recoverable-errors-are-explicit-in-function-declarations)
    - [Recoverable errors are explicit at the callsite](#recoverable-errors-are-explicit-at-the-callsite)
    - [Error propagation must be straightforward](#error-propagation-must-be-straightforward)
    - [No universal error categories](#no-universal-error-categories)
- [Other resources](#other-resources)

<!-- tocstop -->

## Principles

### Programming errors are not recoverable

The Carbon language and standard library will not use recoverable
error-reporting mechanisms to report programming errors. Furthermore, Carbon's
design will not prioritize use cases involving recovery from programming errors.

Comment on lines +30 to +31

Contributor

Suggested change:

> error-reporting mechanisms to report programming errors, i.e. errors caused by
> incorrect user code. Furthermore, Carbon's design will not prioritize use cases
> involving recovery from programming errors.

Contributor Author

This is what I had originally, but I changed it after @jonmeow pointed out it violated our style guide: https://developers.google.com/style/abbreviations#dont-use

Contributor

As a reminder, you can trivially replace "i.e." with the literal meaning of "that is".

Contributor

Which I'm emphasizing here because Matt and Dmitri are suggesting a change in wording, not simply the addition of Latin.


Recovering from an error generally consists of discarding or reverting any state
that might be invalidated by the original cause of the error, and then
transferring control to a point that doesn't depend on the discarded state. For
example, a function that reads data from a file and validates a checksum might
avoid modifying any nonlocal state until validation is successful, and return
early if validation fails. This recovery strategy relies on the fact that the
likely causes of the failure are known and bounded (probably a malformed input
file or an I/O error), which allows us to put a bound on the state that might
have been invalidated.
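To make that recovery strategy concrete, here is a minimal C++17 sketch (an analogy, not Carbon code). It assumes a hypothetical record format, payload bytes followed by one checksum byte equal to the payload's byte sum modulo 256, and reads from an in-memory string instead of a file so that it stays self-contained; `Config` and `LoadRecord` are illustrative names. All parsing happens on locals, and caller-visible state changes only after validation passes, so the early return has nothing to revert.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical record format: payload bytes followed by one checksum byte,
// where the checksum is the sum of the payload bytes modulo 256.
struct Config {
  std::string payload;
};

// On failure, returns false and leaves `out` exactly as it was.
bool LoadRecord(const std::string& data, Config& out) {
  if (data.empty()) return false;  // Malformed: no checksum byte at all.

  // Parse and validate using local state only.
  std::string payload = data.substr(0, data.size() - 1);
  const auto expected = static_cast<std::uint8_t>(data.back());
  std::uint8_t actual = 0;
  for (unsigned char c : payload) {
    actual = static_cast<std::uint8_t>(actual + c);
  }
  if (actual != expected) return false;  // Early return: nothing to revert.

  // Validation passed, so it is now safe to touch caller-visible state.
  out.payload = std::move(payload);
  return true;
}
```

The bound on invalidated state comes from the structure of the function: the only nonlocal write is the final assignment, and it happens after every check.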

A _programming error_ is an error caused by incorrect user code, such as failing
to satisfy the preconditions of an operation. When such an error is detected,
it's not possible for the program to know, or even plausibly guess, what the
original cause is. For example, dereferencing a dangling pointer is
unambiguously a programming error, but it can have many possible causes. The
author of the code might have forgotten to check some condition before
dereferencing, or the caller might have passed a dangling pointer into the
function, or some other code might have released the memory too early, or any
number of other possibilities. Without more information, it's impossible to
know, so the only way to somewhat reliably recover from a programming error is
to discard the entire address space and terminate the program.

Contributor

I suggest deleting this whole paragraph, and probably also the preceding paragraph. My rationale:

(1) Everyone already has an informal understanding of what a programming error is. I don't think anything in this proposal depends on making that understanding more precise, and I also don't think it's possible to be precise.

(2) These two paragraphs both depend on the distinction between cases where it is and where it isn't practical to know what the original cause of an error is. I agree that that distinction makes sense, but I don't think it lines up at all cleanly with things that are and aren't programming errors. Consider "file not found" versus "square root of a negative number": I don't think there's any significant difference between the two in how easy it is to find the original cause.

(3) The point about dereferencing a dangling pointer is well taken, but it's better put below as one of the examples.

Contributor Author

Presumably you'd also recommend deleting "Thus, we expect that supporting recovery from programming errors would provide little or no benefit" from the following paragraph? That would leave this principle without any discussion of the purported benefit of recovering from user error. I think that would be a serious omission: at least for me, the fact that I expect that benefit to be small is a key part of the rationale for this principle. I would be much more reluctant to adopt it if I thought that recovery from programming errors was a generally viable software engineering practice.

> (1) Everyone already has an informal understanding of what a programming error is. I don't think anything in this proposal depends on making that understanding more precise, and I also don't think it's possible to be precise.

The first sentence of this paragraph is this document's only attempt to define "programming error". I don't intend it to make "programming error" precise, but only to make sure the reader and I are on the same page regarding the intuitive meaning of the term. I gather you agree, since you've suggested adding a similar definition on lines 30-31. If you're suggesting I define the term there instead of here, that's fine with me, assuming the style issues can be worked out.

These two paragraphs are primarily concerned not with defining "programming errors", but with explaining why recovering from those errors is unlikely to be practical.

> (2) These two paragraphs both depend on the distinction between cases where it is and where it isn't practical to know what the original cause of an error is. I agree that that distinction makes sense, but I don't think it lines up at all cleanly with things that are and aren't programming errors. Consider "file not found" versus "square root of a negative number": I don't think there's any significant difference between the two in how easy it is to find the original cause.

The issue isn't "how easy it is to find the original cause", it's how feasible it is to anticipate the original cause when writing the code that will eventually handle that error. And in that respect, I think "file not found" is very different from "square root of a negative number": I find it very hard to imagine situations where the programmer can correctly anticipate that a "square root of negative number" error may occur, and correctly understand the cause of that error, but can't more easily just intervene to prevent that error from occurring in the first place.

I've revised to try to make that clearer; does that help?

Thus, we expect that supporting recovery from programming errors would provide
little or no benefit. Furthermore, it would be harmful to several of Carbon's
primary goals:

- [Performance-critical software](/docs/project/goals.md#performance-critical-software):
It would impose a pervasive performance overhead, because recoverable error
handling is never free, and a programming error can occur anywhere.
- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write):
Because potential programming errors are pervasive, they would have to
propagate invisibly, which makes code harder to understand (see
[below](#recoverable-errors-are-explicit-at-the-callsite)).
- [Software and language evolution](/docs/project/goals.md#both-software-and-language-evolution):
It would inhibit evolution of Carbon libraries, and the Carbon language, by
preventing them from changing how they respond to incorrect code.
- [Practical safety guarantees and testing mechanisms](/docs/project/goals.md#practical-safety-guarantees-and-testing-mechanisms):
Similarly, it would prevent Carbon users from choosing different
performance/safety tradeoffs for handling programming errors: if an
out-of-bounds array access is required to throw an exception, users can't
disable bounds checks, regardless of their risk tolerance, because code might
rely on those exceptions being thrown.
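The bounds-checking point can be illustrated with a C++ analogy (not Carbon code). `CheckedGet` and the `SKETCH_DISABLE_BOUNDS_CHECKS` macro are hypothetical: the accessor treats a bad index as a programming error and terminates, so a build that compiles the check out changes nothing for correct programs. `GetOrDefault` instead relies on `std::vector::at` throwing `std::out_of_range`, which is exactly the kind of dependency that makes the check impossible to remove.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <vector>

// Treats an out-of-bounds index as a programming error: report, then
// terminate. No caller can observe or recover from the failure, so a build
// may drop the check (via the hypothetical SKETCH_DISABLE_BOUNDS_CHECKS
// macro) without changing the behavior of any correct program.
template <typename T>
const T& CheckedGet(const std::vector<T>& v, std::size_t i) {
#ifndef SKETCH_DISABLE_BOUNDS_CHECKS
  if (i >= v.size()) {
    std::fprintf(stderr, "index %zu out of bounds (size %zu)\n", i, v.size());
    std::abort();
  }
#endif
  return v[i];
}

// By contrast, std::vector::at reports the same mistake as a recoverable
// std::out_of_range exception. Code like this depends on that exception,
// so the check can never be disabled without breaking it.
int GetOrDefault(const std::vector<int>& v, std::size_t i) {
  try {
    return v.at(i);
  } catch (const std::out_of_range&) {
    return -1;
  }
}
```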

#### Examples

If Carbon supports contract checking or other forms of assertions, it will not
permit callers to detect and handle assertion failures, even as an optional
build mode. Assertion failures will only be presented in ways that don't alter
the program state, such as logging, terminating the program, or trapping into a
debugger.

Comment on lines +91 to +92

Contributor

Suggested change:

> debugger.
> Dereferencing a dangling or null pointer will not be reported as a
> recoverable error. Doing so would impose significant performance
> overhead. It also wouldn't be useful; the original bug that resulted
> in a bad pointer could have been anywhere, so the only reliable way
> to recover from this situation is to discard the entire address space
> and terminate the program.

Contributor

This goes along with my suggestion of deleting the paragraph saying that the only reliable way to recover from programmer error is to terminate the whole program. I'm not convinced that's true in general, but I do think it's useful to have dereferencing a bad pointer as an explicit example.

### Memory exhaustion is not a recoverable error

The Carbon standard library's common-case APIs will not go out of their way to
support treating memory exhaustion as a recoverable error.

Memory exhaustion is not a programming error, and it is feasible to write code
that can successfully recover from it. However, the available evidence indicates
that very little C++ code actually does so correctly (for example, see section
4.3 of
[this paper](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0709r4.pdf)),
which suggests that very little C++ code actually needs to do so, and we see no
reason to expect Carbon's users to differ in this respect.

Supporting recovery from memory exhaustion would impose many of the same harms
as supporting recovery from programming errors, and for the same basic reason:
memory allocation is pervasive, and so a mechanism for recovering from it would
have to be similarly pervasive. Furthermore, experience with C++ has shown that
attempting to support recovery from memory exhaustion can seriously deform the
design of an API.

#### Examples

The `pop` operation on a Carbon queue will return the value removed from the
queue. This is in contrast to C++'s `std::queue::pop()`, which does not return
the value popped from the queue, because
[that would not be exception-safe](https://isocpp.org/blog/2016/06/quick-q-why-doesnt-stdqueuepop-return-value)
due to the possibility of an out-of-memory error while copying that value.
Instead, the user must first examine the front of the queue, and then pop it as
a separate operation. Not only is this awkward for users, it means that
concurrent queues cannot match the API of non-concurrent queues, because
separate `front()` and `pop()` calls would create a race condition.
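A minimal C++17 sketch of the alternative API, using only the standard library; `ConcurrentQueue`, `Push`, and `Pop` are illustrative names, not a proposed Carbon API. `Pop` removes and returns the front element under a single lock. That shape is only reasonable if a failure while moving the value out is not treated as recoverable, which is the trade-off this principle accepts.

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <utility>

// A minimal thread-safe queue whose Pop removes and returns the front value
// in one operation, so no other thread can interleave between "look at the
// front" and "remove it".
template <typename T>
class ConcurrentQueue {
 public:
  void Push(T value) {
    std::lock_guard<std::mutex> lock(mu_);
    items_.push_back(std::move(value));
  }

  // Returns std::nullopt if the queue is empty.
  std::optional<T> Pop() {
    std::lock_guard<std::mutex> lock(mu_);
    if (items_.empty()) return std::nullopt;
    T value = std::move(items_.front());
    items_.pop_front();
    return value;
  }

 private:
  std::mutex mu_;
  std::deque<T> items_;
};
```

Two threads calling `Pop` concurrently can never both receive the same element, which a separate `front()`/`pop()` pair cannot guarantee without external locking.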

#### Caveats

Carbon will probably provide a low-level way to allocate heap memory that makes
allocation failure recoverable, because doing so appears to have few drawbacks.
However, users may need to build their own libraries on top of it, rather than
relying on the Carbon standard library, if they want to take advantage of it.

Contributor

I feel like both of these statements are pushing a bit far into details and specifics that haven't materialized yet. I think they're more intended to be examples, but as written feel a bit sweeping in scope.

For example, I think we might work to enable parts of the standard library to take advantage of different allocation strategies like this if we can find a clean way to incorporate it into the design. But it is a big "if", and I'm totally down with not overpromising. I just don't want to discourage too sharply either or preclude still open design exploration.

As I mentioned above, maybe we can replace specific caveats with a more general statement around working to explore and find ways of addressing the fundamental requirements of constrained systems programming which don't have as dramatic of an effect on the overall language and API design.

Contributor Author

"Working to explore" those use cases is pretty different from having them be an explicit goal (which you seem to be suggesting above), so I'm not sure what you're looking for here.

Contributor

I think "may need to build their own libraries on top of it" covers this adequately: it does leave open the possibility of a standard library that includes recovery from memory allocation failure.

I do want to avoid over-promising in the other case too, though: saying we may provide heap allocation that allows recovery from allocation failure rather than saying we definitely will.

There probably will not be a way to recover from _stack_ exhaustion, because
there is no known way of doing that without major drawbacks, and users who can't
tolerate crashing due to stack overflow can normally prevent it using static
analysis.
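For comparison, C++ already offers a low-level allocation path of the kind the heap-allocation caveat describes: the `std::nothrow` form of `new` returns a null pointer on failure instead of throwing `std::bad_alloc`. A minimal sketch, with the hypothetical helper name `TryAllocateBuffer`, of code that opts into handling that failure explicitly while the rest of the program stays oblivious to it:

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper: returns an empty pointer if the allocation fails,
// instead of throwing std::bad_alloc. Only callers that can do something
// useful on failure (shed load, drop a cache) need to check the result.
std::unique_ptr<char[]> TryAllocateBuffer(std::size_t size) {
  return std::unique_ptr<char[]>(new (std::nothrow) char[size]);
}
```

A caller would test the result with `if (auto buf = TryAllocateBuffer(n))` and take its recovery path in the `else` branch.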

### Recoverable errors are explicit in function declarations

Carbon functions that can emit recoverable errors will always be explicitly
marked in all function declarations, either as part of the return type or as a
separate property of the function.

The possibility of emitting recoverable errors is nearly as fundamental to a
function's API as its return type, and so Carbon APIs will be substantially
clearer to read, and safer to use, if we require consistent, compiler-checked
documentation of that property. Furthermore, as noted above, the mechanisms for
emitting a recoverable error always impose some performance overhead, so the
compiler must be able to distinguish the functions that need that overhead from
the ones that do not.

The default should be that functions do not emit errors, because that's the
simpler and more efficient behavior, and we also expect it to be the common
case.
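A rough C++17 analogy of marking failure in the declaration. `Error` and `ErrorOr` are hypothetical types (similar in spirit to C++23's `std::expected`), not Carbon's design. A function that cannot fail keeps a plain return type, which is the default; a function that can fail says so in its signature, and `[[nodiscard]]` on the class lets the compiler warn when a caller silently drops the result.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical error payload.
struct Error {
  std::string message;
};

// Hypothetical result wrapper: the possibility of failure is part of the
// declared return type, so it is visible in every declaration.
template <typename T>
class [[nodiscard]] ErrorOr {
 public:
  ErrorOr(T value) : v_(std::move(value)) {}
  ErrorOr(Error error) : v_(std::move(error)) {}

  bool ok() const { return std::holds_alternative<T>(v_); }
  const T& value() const { return std::get<T>(v_); }
  const Error& error() const { return std::get<Error>(v_); }

 private:
  std::variant<T, Error> v_;
};

// Cannot fail: the plain return type says so.
int Add(int a, int b) { return a + b; }

// Can fail: the declaration says so.
ErrorOr<int> ParseDigit(char c) {
  if (c < '0' || c > '9') return Error{"not a digit"};
  return c - '0';
}
```

Because the failure mode is in the type, the compiler also knows which calls need error-handling code generated and which do not.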

### Recoverable errors are explicit at the callsite

Operations that can emit recoverable errors will always be explicitly marked at
the point of use.

If errors can propagate silently, as with exceptions in most languages,
functions that they propagate through will have control flow paths that are not
visible to the reader. It is extremely difficult to reason about procedural code
when you aren't aware of all control flow paths, so this approach makes code
harder to understand, maintain, and debug, especially in large codebases where
readers may not be familiar with the code above and below them in the call
stack.

Conversely, if errors can be silently ignored, as with error return codes in
many languages, it creates a major risk of accidentally resuming normal
execution without actually recovering from the error (that is, without
discarding invalidated state). This, too, would make it extremely difficult to
reason correctly about Carbon code.

Either possibility would also allow code to evolve in unsafe ways. Changing a
function to allow it to emit errors is semantically a breaking change: client
code must now contend with a previously-impossible failure case. Requiring
errors to be marked at the callsite ensures that this breakage manifests at
build time.

### Error propagation must be straightforward

Carbon will provide a means to propagate recoverable errors from any function
call to the caller of the enclosing function, with minimal textual overhead.

In our experience, it is very common for C++ code to propagate errors across
multiple layers of the call stack. C++ exceptions support this natively, and
programmers in environments without exceptions usually develop a lightweight way
to propagate errors explicitly, typically by using a macro containing a
conditional `return`. In some cases they even resort to using nonstandard
language extensions in order to be able to use this operation within
expressions, rather than only at the statement level.

Given the ubiquity of this use case, Carbon must provide support for it that can
be used with minimal changes to the structure of the code, and without making the
non-error-case logic less clear.
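The macro idiom described above looks roughly like this in C++. `Status`, `CheckPositive`, `CheckAll`, and `SKETCH_RETURN_IF_ERROR` are hypothetical names; many codebases define their own equivalent. Each use of the macro is a visible, explicit propagation point, and the non-error path reads straight down the function.

```cpp
#include <string>

// Hypothetical status type: an empty message means success.
struct Status {
  std::string error;
  bool ok() const { return error.empty(); }
};

// A macro containing a conditional return: if `expr` fails, return its
// Status from the enclosing function; otherwise fall through.
#define SKETCH_RETURN_IF_ERROR(expr)                 \
  do {                                               \
    Status sketch_status_ = (expr);                  \
    if (!sketch_status_.ok()) return sketch_status_; \
  } while (false)

Status CheckPositive(int x) {
  if (x <= 0) return Status{"not positive"};
  return Status{};
}

// Propagates the first failure to its caller; otherwise continues.
Status CheckAll(int a, int b) {
  SKETCH_RETURN_IF_ERROR(CheckPositive(a));
  SKETCH_RETURN_IF_ERROR(CheckPositive(b));
  return Status{};
}
```

Because the macro expands to a statement, it cannot appear inside an expression. GNU statement expressions (`({ ... })`) are one nonstandard extension that can lift that restriction.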

### No universal error categories

Carbon will not establish an error hierarchy or other reusable error vocabulary,
and will not prioritize use cases that involve classifying and reacting to the
properties of a propagated error.

Contributor

I think this conflates two questions.

(1) Does Carbon itself, either in the core language or in the standard library, establish an error hierarchy?

(2) Does Carbon allow/encourage/require users to define their own hierarchy?

The text itself mainly answers question 1, but the argument about brittle code also applies to question 2. I believe it's important to address both.

Contributor Author

I agree this conflates those questions, but I think that conflation is correct: the two questions are aspects of one underlying question, namely whether classifying propagated errors is a programming practice that Carbon will encourage. I've tweaked part of the next paragraph to be less specific to (1); are there other places that you think put too much emphasis on (1), or not enough emphasis on (2)?

Contributor

"or other reusable error vocabulary" seems a bit over-broad to me. Go's error interface seems pretty harmless to me (in particular it doesn't require so many type shenanigans as Rust's Error trait), and your arguments about the downside of hierarchy and classification don't seem to apply to it.

Contributor Author

I phrased this poorly; I didn't intend to exclude things like that. Better?

Some languages attempt to impose a hierarchy or some other global classification
scheme for errors, in order to allow code to respond differently to different
kinds of errors, even after the errors have propagated some distance from the
function that originally raised them. However, this practice tends to be quite
brittle, because it almost inevitably requires relying on implementation
details: if a function's contract gives different meanings to different errors
it emits, it generally can't satisfy that contract by blindly propagating errors
from the functions it calls. Conversely, if it doesn't have such a contract, its
callers normally can't differentiate among the errors it emits without depending
on its implementation details.

Contributor

This seems like an argument against including an easy to use construct to propagate errors, rather than an argument against universal error classification APIs.

Contributor Author

Well, it's an argument against having both convenient error propagation and universal error classification. But as argued above, I think we need convenient error propagation, so classification has to be what we drop.
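The contract argument in the paragraph above can be sketched in C++17. Everything here is hypothetical: `ReadRecord` stands in for a storage call with its own error enum, and `LoadUserName` promises callers only the two outcomes in `LookupError`. To keep that promise it has to translate each callee error explicitly; forwarding `StorageError` unchanged would expose an implementation detail that callers could come to depend on.

```cpp
#include <string>
#include <variant>

// Hypothetical low-level errors from a storage layer.
enum class StorageError { kNotFound, kPermissionDenied, kCorrupt };

// The only errors that LoadUserName's own contract distinguishes.
enum class LookupError { kNoSuchUser, kUnavailable };

// Hypothetical callee, standing in for a real storage lookup.
std::variant<std::string, StorageError> ReadRecord(int id) {
  if (id == 1) return std::string("alice");
  if (id == 2) return StorageError::kPermissionDenied;
  return StorageError::kNotFound;
}

// Maps each callee error into this function's own vocabulary instead of
// propagating StorageError blindly.
std::variant<std::string, LookupError> LoadUserName(int id) {
  auto result = ReadRecord(id);
  if (auto* name = std::get_if<std::string>(&result)) return *name;
  switch (std::get<StorageError>(result)) {
    case StorageError::kNotFound:
      return LookupError::kNoSuchUser;
    case StorageError::kPermissionDenied:
    case StorageError::kCorrupt:
      return LookupError::kUnavailable;
  }
  return LookupError::kUnavailable;  // Unreachable; silences warnings.
}
```

If `ReadRecord` later gains a new error, only the `switch` in `LoadUserName` changes; its callers still see the same two outcomes.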


It may make sense to distinguish certain categories of errors, if any layer of
the stack can in principle respond to those errors, and the appropriate response
requires only local knowledge. For example, any layer of the stack can respond
to an out-of-memory error by releasing any unused caches. Similarly, any layer
of the stack can respond to thread cancellation by ceasing any new computational
work and propagating the signal _even if_ it could otherwise continue despite a
failure at that point.
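Cancellation handled with only local knowledge might look like the following C++ sketch. `CancellationToken` and `ProcessAll` are hypothetical; a real design might use C++20's `std::stop_token` rather than a bare `std::atomic<bool>`. Each layer checks the flag before starting new work and reports cancellation to its caller, even though it could otherwise keep going.

```cpp
#include <atomic>
#include <vector>

// Hypothetical cancellation flag shared by every layer of a computation.
// Checking it requires no knowledge of what the other layers are doing.
struct CancellationToken {
  std::atomic<bool> cancelled{false};
};

// Returns false if cancelled before finishing; `out` then holds only the
// results produced before cancellation was observed.
bool ProcessAll(const std::vector<int>& items, const CancellationToken& token,
                std::vector<int>& out) {
  for (int item : items) {
    if (token.cancelled.load()) return false;  // Stop starting new work.
    out.push_back(item * 2);
  }
  return true;
}
```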

However, such cases are caught between the horns of a dilemma: any error that's
universal enough to be meaningful across arbitrary levels of the call stack is
likely to be too pervasive for explicitly-marked propagation to be tolerable.
Both of the above examples have that problem; we've already ruled out
propagating out-of-memory errors because of their pervasiveness, and
cancellation is likely to pose similar challenges, although cancellation can be
ignored, which may simplify the problem somewhat.

It is certainly possible to structure a codebase so that you can reliably
propagate errors across multiple layers of the stack so long as you control
those layers, and Carbon will support those use cases. However, it will do so as
a byproduct of general-purpose programming facilities such as pattern matching;
Carbon will not provide a separate sugar syntax for pattern-matching error
metadata, especially if that syntax can encompass multiple potentially-failing
operations. For example, if Carbon supports `try`/`catch` statements, they will
always have a single `catch` block, which will be invoked for any error that
escapes the `try` block.

feels like this could use a justification

Contributor Author

Can you be more specific? This is supposed to be a corollary of the general principle, which the previous three paragraphs are supposed to provide justification for.

Comment on lines +236 to +238

Contributor

I think this is becoming too specific for a principles doc. Allowing only a single catch block and asking users to use a match statement within it to distinguish errors vs. allowing multiple catch blocks and making try-catch-catch-catch resemble match-case-case-case sounds like a purely syntactic choice to me that should be discussed in the actual error handling proposal.

Contributor Author

This is just supposed to be an example application of the principle, and examples are supposed to be specific. And I don't think it's purely syntactic: providing syntactic sugar for a particular pattern is a way of encouraging that pattern, and the point of this principle is we don't want to encourage that pattern.

Contributor

I think "will always" here is too absolute; it sounds like approving this proposal would put this specific hard constraint on future designs, whereas I think your intention is instead that this should be used as guidance only.

Maybe softening this a little would help:

Suggested change:

> operations. For example, if Carbon supports `try`/`catch` statements, the
> `catch` statements should not invent a new mechanism for dispatching on the
> kind of the exception.


## Other resources

Several other groups of language designers have arrived at similar principles.
For example, see Swift's
[error handling rationale](https://github.com/apple/swift/blob/master/docs/ErrorHandlingRationale.rst),
[Joe Duffy's account](http://joeduffyblog.com/2016/02/07/the-error-model) of
Midori's error model, and Herb Sutter's
[pending proposal](http://wg21.link/P0709) for a new approach to exceptions in
C++.
29 changes: 29 additions & 0 deletions proposals/p0084.md
@@ -0,0 +1,29 @@
# Principles: Error handling

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/84)

## Table of contents

<!-- toc -->

- [Problem](#problem)
- [Proposal](#proposal)

<!-- tocstop -->

## Problem

Contributor

I think it'd be useful to have kept the "background" section here and collect all of the links about error handling that you and others have been surveying and referring to?

Contributor Author

I'd sort of rather keep that in the main principles doc (see the "Other resources" section), because I expect that to be what most people read.

Error handling is a pervasive aspect of language and library design, and Carbon
will need a consistent approach to it.

Contributor

What if we had a central design for how error handling should work in Carbon? Would there still be a need for a separate principle?

Contributor Author

Yes, I think so. For example, the first principle affects the design of every language feature that can be used incorrectly (hence the example involving pointer dereferencing), and the second affects the design of quite a lot of the standard library. Some of the other principles have narrower applicability, but it's unclear exactly which language features they will apply to.

## Proposal

Introduce a set of
[principles for error handling](/docs/project/principles/error_handling.md). See
that document for details.