-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: conventions for placement of unsafe APIs #240
Changes from 3 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
- Start Date: (fill me in with today's date, 2014-09-15) | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
|
||
This is a *conventions RFC* for settling the location of `unsafe` APIs relative | ||
to the types they work with, as well as the use of `raw` submodules. | ||
|
||
The brief summary is: | ||
|
||
* Unsafe APIs should be made into methods or static functions in the same cases | ||
that safe APIs would be. | ||
|
||
* `raw` submodules should be used only to *define* explicit low-level | ||
representations. | ||
|
||
# Motivation | ||
|
||
Many data structures provide unsafe APIs either for avoiding checks or working | ||
directly with their (otherwise private) representation. For example, `string` | ||
provides: | ||
|
||
* An `as_mut_vec` method on `String` that provides a `Vec<u8>` view of the | ||
string. This method makes it easy to work with the byte-based representation | ||
of the string, but thereby also allows violation of the utf8 guarantee. | ||
|
||
* A `raw` submodule with a number of free functions, like `from_parts`, that | ||
constructs a `String` instances from a raw-pointer-based representation, a | ||
`from_utf8` variant that does not actually check for utf8 validity, and so | ||
on. The unifying theme is that all of these functions avoid checking some key | ||
invariant. | ||
|
||
The problem is that currently, there is no clear/consistent guideline about | ||
which of these APIs should live as methods/static functions associated with a | ||
type, and which should live in a `raw` submodule. Both forms appear throughout | ||
the standard library. | ||
|
||
# Detailed design | ||
|
||
The proposed convention is: | ||
|
||
* When an unsafe function/method is clearly "about" a certain type (as a way of | ||
constructing, destructuring, or modifying values of that type), it should be a | ||
method or static function on that type. This is the same as the convention for | ||
placement of safe functions/methods. So functions like | ||
`string::raw::from_parts` would become static functions on `String`. | ||
|
||
* `raw` submodules should only be used to *define* low-level | ||
types/representations (and methods/functions on them). Methods for converting | ||
to/from such low-level types should be available directly on the high-level | ||
types. Examples: `core::raw`, `sync::raw`. | ||
|
||
The benefits are: | ||
|
||
* *Ergonomics*. You can gain easy access to unsafe APIs merely by having a value | ||
of the type (or, for static functions, importing the type). | ||
|
||
* *Consistency and simplicity*. The rules for placement of unsafe APIs are the | ||
same as those for safe APIs. | ||
|
||
The perspective here is that marking APIs `unsafe` is enough to deter their use | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @thestinger Perhaps the RFC text is unclear, but this is not what I'm proposing: this RFC proposes that |
||
in ordinary situations; they don't need to be further distinguished by placement | ||
into a separate module. | ||
|
||
There are also some naming conventions to go along with unsafe static functions | ||
and methods: | ||
|
||
* When an unsafe function/method is an unchecked variant of an otherwise safe | ||
API, it should be marked using an `_unchecked` suffix. | ||
|
||
For example, the `String` module should provide both `from_utf8` and | ||
`from_utf8_unchecked` constructors, where the latter does not actually check | ||
the utf8 encoding. The `string::raw::slice_bytes` and | ||
`string::raw::slice_unchecked` functions should be merged into a single | ||
`slice_unchecked` method on strings that checks neither bounds nor utf8 | ||
boundaries. | ||
|
||
* When an unsafe function/method produces or consumes a low-level representation | ||
of a data structure, the API should use `raw` in its name. Specifically, | ||
`from_raw_parts` is the typical name used for constructing a value from e.g. a | ||
pointer-based representation. | ||
|
||
* Otherwise, *consider* using a name that suggests *why* the API is unsafe. In | ||
some cases, like `String::as_mut_vec`, other stronger conventions apply, and the | ||
`unsafe` qualifier on the signature (together with API documentation) is | ||
enough. | ||
|
||
The unsafe methods and static functions for a given type should be placed in | ||
their own `impl` block, at the end of the module defining the type; this will | ||
ensure that they are grouped together in rustdoc. (Thanks @kballard for the | ||
suggestion.) | ||
|
||
# Drawbacks | ||
|
||
One potential drawback of these conventions is that the documentation for a | ||
module will be cluttered with rarely-used `unsafe` APIs, whereas the `raw` | ||
submodule approach neatly groups these APIs. But rustdoc could easily be | ||
changed to either hide or separate out `unsafe` APIs by default, and in the | ||
meantime the `impl` block grouping should help. | ||
|
||
More specifically, the convention of placing unsafe constructors in `raw` makes | ||
them very easy to find. But the usual `from_` convention, together with the | ||
naming conventions suggested above, should make it fairly easy to discover such | ||
constructors even when they're supplied directly as static functions. | ||
|
||
More generally, these conventions give `unsafe` APIs more equal status with safe | ||
APIs. Whether this is a *drawback* depends on your philosophy about the status | ||
of unsafe programming. But on a technical level, the key point is that the APIs | ||
are marked `unsafe`, so users still have to opt-in to using them. *Ed note: from | ||
my perspective, low-level/unsafe programming is important to support, and there | ||
is no reason to penalize its ergonomics given that it's opt-in anyway.* | ||
|
||
# Alternatives | ||
|
||
There are a few alternatives: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One unlisted alternative is the design I gave in another issue (I forget where) that uses associated items (RFC PR #195) to provide e.g. impl String {
/// Raw operations
static raw: StringRaw = StringRaw;
}
struct StringRaw;
impl StringRaw {
unsafe fn from_parts(buf: *mut u8, length: uint, capacity: uint) -> String;
// ...
} Personally, I think the ergonomics of this are nice. It's basically providing a tiny module structure for the static methods of a type, similar to how we use a module structure in general for types/functions. The biggest downside from a usability perspective is that it's not immediately obvious in rustdoc how this is supposed to be used. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kballard I'm sorry I left this out -- I'll add a reference to it from the RFC text. |
||
|
||
* Rather than providing unsafe APIs directly as methods/static functions, they | ||
could be grouped into a single extension trait. For example, the `String` type | ||
could be accompanied by a `StringRaw` extension trait providing APIs for | ||
working with raw string representations. This would allow a clear grouping of | ||
unsafe APIs, while still providing them as methods/static functions and | ||
allowing them to easily be imported with e.g. `use std::string::StringRaw`. | ||
On the other hand, it still further penalizes the raw APIs (beyond marking | ||
them `unsafe`), and given that rustdoc could easily provide API grouping, it's | ||
unclear exactly what the benefit is. | ||
|
||
* ([Suggested by @kballard](https://github.com/rust-lang/rfcs/pull/240#issuecomment-55635468)): | ||
|
||
> Use `raw` for functions that construct a value of the type without checking | ||
> for one or more invariants. | ||
|
||
The advantage is that it's easy to find such invariant-ignoring functions. The | ||
disadvantage is that their ergonomics is worsened, since they much be | ||
separately imported or referenced through a lengthy path: | ||
|
||
```rust | ||
// Compare the ergonomics: | ||
string::raw::slice_unchecked(some_string, start, end) | ||
some_string.slice_unchecked(start, end) | ||
``` | ||
|
||
* Use `raw` submodules to group together *all* manipulation of low-level | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
You keep stating that, but where are you getting that from? As far as I'm aware there was no consensus at all, because there had been no real discussion about this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was plenty of real discussion about methods. You would have to rewrite history to ignore the work myself, @cmr, @huonw and @Kimundi spent implementing the consensus from the mailing list. There were also many issues about this on the bug tracker like rust-lang/rust#6045 and rust-lang/rust#2868 but most of the discussion occurred on rust-dev. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @thestinger Why are you being so antagonistic? I'm not trying to "rewrite history". I've asked repeatedly for you to provide some source for your claim that there has been general consensus, and this is the first time you've actually provided any source whatsoever, and it doesn't even include the alleged rust-dev conversation. rust-lang/rust#6045 is a very general "don't use functions where methods are appropriate" issue. It's completely non-controversial and doesn't say anything in particular about the As for the rust-dev conversation, you still haven't actually linked it. If you want to use a rust-dev conversation to support your claim, you need to actually link it so we can read it. Also, FWIW, 4 people does not a "general consensus" make. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You only have to look in the git history to see that we migrated away from If you don't believe what I'm saying, then I have no interest in continuing to talk. I'm not going to waste time digging up all of the mailing list threads about functions vs. methods and UFCS to satisfy you so you can move on to nitpicking or misrepresenting something else I said. You can confirm it yourself with a search engine. |
||
representations. No module in `std` currently does this; existing modules | ||
provide some free functions in `raw`, and some unsafe methods, without a clear | ||
driving principle. The ergonomics of moving *everything* into free functions | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The ergonomics of moving anything into free functions is poor. Downgrading more methods is worse, but doing it to a few is still more verbose and harder to navigate - especially due to re-exports. It also means introducing type prefixes in modules exposing more than one type. |
||
in a `raw` submodule are quite poor. | ||
|
||
# Unresolved questions | ||
|
||
The `core::raw` module provides structs with public representations equivalent | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
to several built-in and library types (boxes, closures, slices, etc.). It's not | ||
clear whether the name of this module, or the location of its contents, should | ||
change as a result of this RFC. The module is a special case, because not all of | ||
the types it deals with even have corresponding modules/type declarations -- so | ||
it probably suffices to leave decisions about it to the API stabilization | ||
process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are problems with using a
raw
module that are not addressed. There are modules arebtree
treemap
,trie
exposing more than one type and araw
module would fail to distinguish between these. Araw
module also interacts very poorly with re-exports and adds verbosity to usage of the API.The motivation for making
unsafe
code uglier in these cases is not laid out. The point ofunsafe
is to draw a boundary between safe andunsafe
code without the need for conventions to distinguish between these. The rule about construction from raw pointer representation is very obscure, and doesn't change the fact that it makes the API less consistent and more painful to use without gaining anything in return.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thestinger
I'm a little confused by this comment. The RFC recommends making unsafe functions into methods or static functions in the same cases you'd do so for safe APIs. It cites as an explicit benefit that importing/using these APIs becomes easier by doing so:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You keep asserting that this makes the API harder to use, but that's extremely subjective. I've always found the
raw
modules instr
/slice
/string
/vec
make the API much easier to use, because any time I need to construct a value from raw pointers, I know precisely where to look.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aturon: Sorry, I misinterpreted the proposed guidelines. The reason I didn't find the details about how to handle those edge cases and a detailed rationale is because it's not what you're proposing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think much of the "know where to look" problem would be fixed by segmenting the things in the source, and also allowing rustdoc to divide functions/methods into user-defined subsections. This would mean it is clear both in the .rs file and in the docs.