Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: conventions for placement of unsafe APIs #240

Merged
merged 4 commits into from
Oct 7, 2014
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
157 changes: 157 additions & 0 deletions active/0000-unsafe-api-location.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
- Start Date: (fill me in with today's date, 2014-09-15)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary

This is a *conventions RFC* for settling the location of `unsafe` APIs relative
to the types they work with, as well as the use of `raw` submodules.

The brief summary is:

* Unsafe APIs should be made into methods or static functions in the same cases
that safe APIs would be.

* `raw` submodules should be used only to *define* explicit low-level
representations.

# Motivation

Many data structures provide unsafe APIs either for avoiding checks or working
directly with their (otherwise private) representation. For example, `string`
provides:

* An `as_mut_vec` method on `String` that provides a `Vec<u8>` view of the
string. This method makes it easy to work with the byte-based representation
of the string, but thereby also allows violation of the utf8 guarantee.

* A `raw` submodule with a number of free functions, like `from_parts`, that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are problems with using a raw module that are not addressed. There are modules are btree treemap, trie exposing more than one type and a raw module would fail to distinguish between these. A raw module also interacts very poorly with re-exports and adds verbosity to usage of the API.

The motivation for making unsafe code uglier in these cases is not laid out. The point of unsafe is to draw a boundary between safe and unsafe code without the need for conventions to distinguish between these. The rule about construction from raw pointer representation is very obscure, and doesn't change the fact that it makes the API less consistent and more painful to use without gaining anything in return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thestinger

There are problems with using a raw module that are not addressed. There are modules are btree treemap, trie exposing more than one type and a raw module would fail to distinguish between these. A raw module also interacts very poorly with re-exports and adds verbosity to usage of the API. The motivation for making unsafe code uglier in these cases is not laid out.

I'm a little confused by this comment. The RFC recommends making unsafe functions into methods or static functions in the same cases you'd do so for safe APIs. It cites as an explicit benefit that importing/using these APIs becomes easier by doing so:

The benefit to moving unsafe APIs into methods (resp. static functions) is the usual one: you can gain easy access to these APIs merely by having a value of the type (resp. importing the type).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You keep asserting that this makes the API harder to use, but that's extremely subjective. I've always found the raw modules in str/slice/string/vec make the API much easier to use, because any time I need to construct a value from raw pointers, I know precisely where to look.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aturon: Sorry, I misinterpreted the proposed guidelines. The reason I didn't find the details about how to handle those edge cases and a detailed rationale is because it's not what you're proposing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think much of the "know where to look" problem would be fixed by segmenting the things in the source, and also allowing rustdoc to divide functions/methods into user-defined subsections. This would mean it is clear both in the .rs file and in the docs.

constructs a `String` instances from a raw-pointer-based representation, a
`from_utf8` variant that does not actually check for utf8 validity, and so
on. The unifying theme is that all of these functions avoid checking some key
invariant.

The problem is that currently, there is no clear/consistent guideline about
which of these APIs should live as methods/static functions associated with a
type, and which should live in a `raw` submodule. Both forms appear throughout
the standard library.

# Detailed design

The proposed convention is:

* When an unsafe function/method is clearly "about" a certain type (as a way of
constructing, destructuring, or modifying values of that type), it should be a
method or static function on that type. This is the same as the convention for
placement of safe functions/methods. So functions like
`string::raw::from_parts` would become static functions on `String`.

* `raw` submodules should only be used to *define* low-level
types/representations (and methods/functions on them). Methods for converting
to/from such low-level types should be available directly on the high-level
types. Examples: `core::raw`, `sync::raw`.

The benefits are:

* *Ergonomics*. You can gain easy access to unsafe APIs merely by having a value
of the type (or, for static functions, importing the type).

* *Consistency and simplicity*. The rules for placement of unsafe APIs are the
same as those for safe APIs.

The perspective here is that marking APIs `unsafe` is enough to deter their use
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thestinger Perhaps the RFC text is unclear, but this is not what I'm proposing: this RFC proposes that from_raw_parts should be a static function on e.g. String.

in ordinary situations; they don't need to be further distinguished by placement
into a separate module.

There are also some naming conventions to go along with unsafe static functions
and methods:

* When an unsafe function/method is an unchecked variant of an otherwise safe
API, it should be marked using an `_unchecked` suffix.

For example, the `String` module should provide both `from_utf8` and
`from_utf8_unchecked` constructors, where the latter does not actually check
the utf8 encoding. The `string::raw::slice_bytes` and
`string::raw::slice_unchecked` functions should be merged into a single
`slice_unchecked` method on strings that checks neither bounds nor utf8
boundaries.

* When an unsafe function/method produces or consumes a low-level representation
of a data structure, the API should use `raw` in its name. Specifically,
`from_raw_parts` is the typical name used for constructing a value from e.g. a
pointer-based representation.

* Otherwise, *consider* using a name that suggests *why* the API is unsafe. In
some cases, like `String::as_mut_vec`, other stronger conventions apply, and the
`unsafe` qualifier on the signature (together with API documentation) is
enough.

The unsafe methods and static functions for a given type should be placed in
their own `impl` block, at the end of the module defining the type; this will
ensure that they are grouped together in rustdoc. (Thanks @kballard for the
suggestion.)

# Drawbacks

One potential drawback of these conventions is that the documentation for a
module will be cluttered with rarely-used `unsafe` APIs, whereas the `raw`
submodule approach neatly groups these APIs. But rustdoc could easily be
changed to either hide or separate out `unsafe` APIs by default, and in the
meantime the `impl` block grouping should help.

More specifically, the convention of placing unsafe constructors in `raw` makes
them very easy to find. But the usual `from_` convention, together with the
naming conventions suggested above, should make it fairly easy to discover such
constructors even when they're supplied directly as static functions.

More generally, these conventions give `unsafe` APIs more equal status with safe
APIs. Whether this is a *drawback* depends on your philosophy about the status
of unsafe programming. But on a technical level, the key point is that the APIs
are marked `unsafe`, so users still have to opt-in to using them. *Ed note: from
my perspective, low-level/unsafe programming is important to support, and there
is no reason to penalize its ergonomics given that it's opt-in anyway.*

# Alternatives

There are a few alternatives:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One unlisted alternative is the design I gave in another issue (I forget where) that uses associated items (RFC PR #195) to provide e.g. String::raw::from_parts(). This solves the ergonomics of needing to import something to get access to the values. This would look like

impl String {
    /// Raw operations
    static raw: StringRaw = StringRaw;
}

struct StringRaw;

impl StringRaw {
    unsafe fn from_parts(buf: *mut u8, length: uint, capacity: uint) -> String;
    // ...
}

Personally, I think the ergonomics of this are nice. It's basically providing a tiny module structure for the static methods of a type, similar to how we use a module structure in general for types/functions. The biggest downside from a usability perspective is that it's not immediately obvious in rustdoc how this is supposed to be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kballard I'm sorry I left this out -- I'll add a reference to it from the RFC text.


* Rather than providing unsafe APIs directly as methods/static functions, they
could be grouped into a single extension trait. For example, the `String` type
could be accompanied by a `StringRaw` extension trait providing APIs for
working with raw string representations. This would allow a clear grouping of
unsafe APIs, while still providing them as methods/static functions and
allowing them to easily be imported with e.g. `use std::string::StringRaw`.
On the other hand, it still further penalizes the raw APIs (beyond marking
them `unsafe`), and given that rustdoc could easily provide API grouping, it's
unclear exactly what the benefit is.

* ([Suggested by @kballard](https://github.com/rust-lang/rfcs/pull/240#issuecomment-55635468)):

> Use `raw` for functions that construct a value of the type without checking
> for one or more invariants.

The advantage is that it's easy to find such invariant-ignoring functions. The
disadvantage is that their ergonomics is worsened, since they much be
separately imported or referenced through a lengthy path:

```rust
// Compare the ergonomics:
string::raw::slice_unchecked(some_string, start, end)
some_string.slice_unchecked(start, end)
```

* Use `raw` submodules to group together *all* manipulation of low-level

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The raw modules that exist in the standard library were mostly recent additions without consensus based on the same push as this RFC. The previous consensus was to turn everything into methods, and only a few old raw:: functions remained because no one had taken the time to port away from them yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous consensus was to turn everything into methods

You keep stating that, but where are you getting that from? As far as I'm aware there was no consensus at all, because there had been no real discussion about this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was plenty of real discussion about methods. You would have to rewrite history to ignore the work myself, @cmr, @huonw and @Kimundi spent implementing the consensus from the mailing list. There were also many issues about this on the bug tracker like rust-lang/rust#6045 and rust-lang/rust#2868 but most of the discussion occurred on rust-dev.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thestinger Why are you being so antagonistic? I'm not trying to "rewrite history". I've asked repeatedly for you to provide some source for your claim that there has been general consensus, and this is the first time you've actually provided any source whatsoever, and it doesn't even include the alleged rust-dev conversation.

rust-lang/rust#6045 is a very general "don't use functions where methods are appropriate" issue. It's completely non-controversial and doesn't say anything in particular about the raw case. rust-lang/rust#2868 is even less relevant, that's just "don't have functions that duplicate methods".

As for the rust-dev conversation, you still haven't actually linked it. If you want to use a rust-dev conversation to support your claim, you need to actually link it so we can read it.

Also, FWIW, 4 people does not a "general consensus" make.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only have to look in the git history to see that we migrated away from raw modules as part of replacing free functions with methods. It wasn't treated as a special case because no one thought unsafe was inadequate. I was only the listing some of the people involved in implementing the consensus, not everyone involved in the discussion.

If you don't believe what I'm saying, then I have no interest in continuing to talk. I'm not going to waste time digging up all of the mailing list threads about functions vs. methods and UFCS to satisfy you so you can move on to nitpicking or misrepresenting something else I said. You can confirm it yourself with a search engine.

representations. No module in `std` currently does this; existing modules
provide some free functions in `raw`, and some unsafe methods, without a clear
driving principle. The ergonomics of moving *everything* into free functions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ergonomics of moving anything into free functions is poor. Downgrading more methods is worse, but doing it to a few is still more verbose and harder to navigate - especially due to re-exports. It also means introducing type prefixes in modules exposing more than one type.

in a `raw` submodule are quite poor.

# Unresolved questions

The `core::raw` module provides structs with public representations equivalent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Box, Closure and Procedure will end up being removed since they're becoming obsolete. That will only leave behind Slice (could just be in the slice module) and TraitObject so it probably does need to be replaced.

to several built-in and library types (boxes, closures, slices, etc.). It's not
clear whether the name of this module, or the location of its contents, should
change as a result of this RFC. The module is a special case, because not all of
the types it deals with even have corresponding modules/type declarations -- so
it probably suffices to leave decisions about it to the API stabilization
process.