RFC: CStr, the dereferenced complement to CString #592

Merged
merged 28 commits into from
Feb 18, 2015
Commits
4caece6
Added RFC: dereferenced complement to CString
mzabaluev Jan 18, 2015
ce1668e
Added a question regarding `Cow`s
mzabaluev Jan 18, 2015
b04c835
A small proofreading fix
mzabaluev Jan 18, 2015
d8cfda0
Explained the assertion policy on CStr-from-static-data
mzabaluev Jan 19, 2015
e6a9bbd
Put the static data helpers into the CStr impl
mzabaluev Jan 19, 2015
c9cf40c
Added a section detailing API for returning C string references
mzabaluev Jan 19, 2015
7305d7a
Described the DST alternative for `CStr`
mzabaluev Jan 19, 2015
0f26194
Editorial on alternatives
mzabaluev Jan 19, 2015
d64f14f
Updated the definition of c_str!
mzabaluev Jan 19, 2015
234e2ee
Added a paragraph concerning the bogus size to drawbacks
mzabaluev Jan 21, 2015
9565b47
Link to the proposal for truly unsized types.
mzabaluev Jan 22, 2015
34a8dbb
Removed the lifetime anchor on CStr::from_raw_ptr
mzabaluev Jan 30, 2015
a08cc3a
Dropped the paragraph on OwnedCString
mzabaluev Jan 30, 2015
e54f3da
Dropped macro c_str!
mzabaluev Feb 3, 2015
3c904f3
Renamed the new functions
mzabaluev Feb 3, 2015
c317681
Reference to the lifetime RFC
mzabaluev Feb 3, 2015
4e47d19
Added a question about deprecating c_str_to_bytes
mzabaluev Feb 3, 2015
7204d12
Reworded the motivation to better outline the need for a special type
mzabaluev Feb 3, 2015
9cf5ca4
Addressed the drawback of losing the length on deref from CString
mzabaluev Feb 3, 2015
7e5cb6f
Drop CStr::from_static_str
mzabaluev Feb 3, 2015
e65afae
Renamed from_raw to from_ptr
mzabaluev Feb 4, 2015
dbbbb11
Dropped CStr::from_static_bytes
mzabaluev Feb 4, 2015
2bc7582
Promoted deprecation of `c_str_to_bytes` into the proposed changes
mzabaluev Feb 4, 2015
1deb4f8
Updated the links to unsized types RFC
mzabaluev Feb 9, 2015
7e36016
Minor editorial
mzabaluev Feb 9, 2015
96e7abf
RFC 556 is now approved
mzabaluev Feb 9, 2015
2b4d09c
Modified into a two-stage transition plan involving DST
mzabaluev Feb 13, 2015
e4b047c
c_string now has CStr as a pseudo-DST
mzabaluev Feb 14, 2015
164 changes: 164 additions & 0 deletions text/0000-c-str-deref.md
@@ -0,0 +1,164 @@
- Start Date: 2015-01-17
- RFC PR:
- Rust Issue:

# Summary

Make `CString` dereference to a token type `CStr`, which designates
null-terminated string data.

```rust
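// Type-checked to only accept C strings
fn safe_puts(s: &CStr) {
    unsafe { libc::puts(s.as_ptr()) };
}

fn main() {
    // `from_slice` takes a byte slice, so pass the string's bytes
    let s = CString::from_slice("A Rust string".as_bytes());
    // Borrow the `CString`; the reference dereferences to `&CStr`
    safe_puts(&s);
}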
```

# Motivation

The type `std::ffi::CString` is used to prepare string data for passing
as null-terminated strings to FFI functions. This type dereferences to a
DST, `[libc::c_char]`. The slice type as it is, however, is a poor choice
for representing borrowed C string data, since:

1. A slice does not express the C string invariant at compile time.
Safe interfaces wrapping FFI functions cannot take slice references as is
without dynamic checks (when null-terminated slices are expected) or
building a temporary `CString` internally (in this case plain Rust slices
must be passed with no interior NULs).
2. An allocated `CString` buffer is not the only desired source for
borrowed C string data. Specifically, it should be possible to interpret
a raw pointer, unsafely and at zero overhead, as a reference to a
null-terminated string, so that the reference can then be used safely.
However, in order to construct a slice (or a dynamically sized newtype
wrapping a slice), its length has to be determined, which is unnecessary
for the consuming FFI function that will only receive a thin pointer.
Other likely data sources are string and byte string literals: provided
that a static string is null-terminated, there should be a way to pass it
to FFI functions without an intermediate allocation in `CString`.
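
As a rough illustration of the second point, the sketch below borrows a C string from a null-terminated byte string literal without going through `CString`. It assumes the present-day standard library, where `CStr::from_bytes_with_nul` exists; that constructor is not part of this RFC text. The names `GREETING`, `greeting`, and `thin_pointer` are hypothetical.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// A static, null-terminated byte string literal: no heap allocation involved.
static GREETING: &[u8] = b"hello from a literal\0";

/// Checks the C string invariant once (a single NUL, located at the end)
/// and returns a borrowed C string.
/// Assumption: uses `CStr::from_bytes_with_nul` from today's std.
fn greeting() -> &'static CStr {
    CStr::from_bytes_with_nul(GREETING).expect("literal must end with a single NUL")
}

/// Hypothetical stand-in for the argument an FFI call would receive:
/// only a thin pointer, with no length attached.
fn thin_pointer(s: &CStr) -> *const c_char {
    s.as_ptr()
}
```

The invariant is checked once at the conversion site, and everything downstream can rely on the `&CStr` type instead of re-validating a slice.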

As a pattern of owned/borrowed type pairs has been established
throughout other modules (see e.g.
[path reform](https://github.com/rust-lang/rfcs/pull/474)),
it makes sense that `CString` gets its own borrowed counterpart.

# Detailed design

This proposal introduces `CStr`, a type to designate a null-terminated
string. This type does not implement `Sized`, `Copy`, or `Clone`.
References to `CStr` can be safely obtained only by dereferencing `CString`
or through a few other helper methods, described below. A `CStr` value should provide
no size information, as there is intent to turn `CStr` into an
[unsized type](https://github.com/rust-lang/rfcs/issues/813),
pending resolution on that proposal.

## Stage 1: CStr, a DST with a weight problem

As current Rust does not have unsized types that are not DSTs, at this stage
`CStr` is defined as a newtype over a character slice:

```rust
#[repr(C)]
pub struct CStr {
    chars: [libc::c_char]
}

impl CStr {
    pub fn as_ptr(&self) -> *const libc::c_char {
        self.chars.as_ptr()
    }
}
```

`CString` is changed to dereference to `CStr`:

```rust
impl Deref for CString {
    type Target = CStr;
    fn deref(&self) -> &CStr { ... }
}
```

In implementation, the `CStr` value needs a length for the internal slice.
This RFC provides no guarantees that the length will be equal to the length
of the string, or be any particular value suitable for safe use.
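
The "weight problem" can be observed directly: a reference to a slice-backed newtype is a fat pointer that carries a length, while the pointer handed to C is thin. The following is a minimal sketch, assuming a stand-in type named `SliceBackedCStr` (hypothetical, not the proposed `CStr` itself) and `std::os::raw::c_char` in place of `libc::c_char`.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

// Hypothetical stand-in with the same shape as the Stage 1 definition:
// a newtype whose only field is a character slice, making it a DST.
#[allow(dead_code)]
struct SliceBackedCStr {
    chars: [c_char],
}

/// A reference to the slice-backed type is two words: data pointer plus length.
fn reference_is_fat() -> bool {
    size_of::<&SliceBackedCStr>() == 2 * size_of::<usize>()
}

/// The raw pointer an FFI function receives is a single word.
fn raw_pointer_is_thin() -> bool {
    size_of::<*const c_char>() == size_of::<usize>()
}
```

The extra word is the length slot that Stage 1 has to fill with some value even though consumers of `as_ptr()` never read it.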

## Stage 2: unsized CStr

If unsized types are enabled later one way or another, the definition
of `CStr` would change to an unsized type with statically sized contents.
The authors of this RFC believe this would constitute no breakage to code
using `CStr` safely. With a view towards this future change, it's recommended
to avoid any unsafe code depending on the internal representation of `CStr`.

## Returning C strings

In cases when an FFI function returns a pointer to a non-owned C string,
it might be preferable to wrap the returned string safely as a 'thin'
`&CStr` rather than scan it into a slice up front. To facilitate this,
conversion from a raw pointer should be added (with an inferred lifetime
as per [the established convention](https://github.com/rust-lang/rfcs/pull/556)):
```rust
impl CStr {
    pub unsafe fn from_ptr<'a>(ptr: *const libc::c_char) -> &'a CStr {
        ...
    }
}
```
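
A wrapper around a C function that returns a borrowed string might use `from_ptr` roughly as follows. This is a sketch under stated assumptions: `LIB_VERSION`, `lib_version_ptr`, and `lib_version` are hypothetical, a static byte string stands in for data owned by a C library, and `std::os::raw::c_char` replaces `libc::c_char` so the example needs no external crate.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Stand-in for string data owned by a C library; it is static, so it
// outlives every borrower.
static LIB_VERSION: &[u8] = b"1.2.3\0";

/// Hypothetical stand-in for an FFI function returning a non-owned C string.
fn lib_version_ptr() -> *const c_char {
    LIB_VERSION.as_ptr() as *const c_char
}

/// Safe wrapper. The lifetime of the result is inferred from the return type,
/// so choosing `'static` here is a promise made by the wrapper's author.
fn lib_version() -> &'static CStr {
    // SAFETY: the pointer is non-null and points to NUL-terminated static data.
    unsafe { CStr::from_ptr(lib_version_ptr()) }
}
```

Because the lifetime is unconstrained at the call site, the safe wrapper is the place to pin it to something that matches the C library's ownership rules.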

For getting a slice out of a `CStr` reference, the method `to_bytes` is
provided. The name is preferred over `as_bytes` to reflect the linear cost
of calculating the length.
```rust
impl CStr {
    pub fn to_bytes(&self) -> &[u8] { ... }
    pub fn to_bytes_with_nul(&self) -> &[u8] { ... }
}
```

An odd consequence is that it is valid, if wasteful, to call `to_bytes` on
a `CString` via auto-dereferencing.
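
The auto-dereferencing case can be sketched as below. The example assumes the present-day `CString::new` constructor rather than the `from_slice` used elsewhere in this text, and `byte_views` is a hypothetical helper.

```rust
use std::ffi::CString;

/// Returns the lengths of the two byte views: without and with the terminator.
fn byte_views(owned: &CString) -> (usize, usize) {
    // `to_bytes` and `to_bytes_with_nul` are `CStr` methods; calling them on a
    // `CString` works through auto-dereferencing.
    (owned.to_bytes().len(), owned.to_bytes_with_nul().len())
}
```

For example, `byte_views(&CString::new("abc").unwrap())` yields `(3, 4)`.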

## Remove c_str_to_bytes

The functions `c_str_to_bytes` and `c_str_to_bytes_with_nul`, with their
problematic lifetime semantics, are deprecated and eventually removed
in favor of composition of the functions described above:
`c_str_to_bytes(&ptr)` becomes `CStr::from_ptr(ptr).to_bytes()`.
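
Code that used `c_str_to_bytes` in many places could wrap the composition in a small helper during migration. The sketch below is one possible shape; `bytes_from_ptr` is a hypothetical name, and `std::os::raw::c_char` stands in for `libc::c_char`.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical migration helper composing `from_ptr` and `to_bytes`.
///
/// Safety: `ptr` must be non-null and point to NUL-terminated data that
/// stays valid and unmodified for the caller-chosen lifetime `'a`.
unsafe fn bytes_from_ptr<'a>(ptr: *const c_char) -> &'a [u8] {
    let s: &'a CStr = unsafe { CStr::from_ptr(ptr) };
    // Linear-time scan for the terminator, excluded from the result.
    s.to_bytes()
}
```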

## Proof of concept

The described interface changes are implemented in crate
[c_string](https://github.com/mzabaluev/rust-c-str).

# Drawbacks

The change of the deref target type is another breaking change to `CString`.
In practice the main purpose of borrowing from `CString` is to obtain a
raw pointer with `.as_ptr()`; for code which only does this and does not
expose the slice in type annotations, parameter signatures and so on,
the change should not be breaking since `CStr` also provides
this method.

Making the deref target unsized throws away the length information
intrinsic to `CString` and makes it less useful as a container for bytes.
This is countered by the fact that there are general purpose byte containers
in the core libraries, whereas `CString` addresses the specific need to
convey string data from Rust to C-style APIs.

# Alternatives

If the proposed enhancements or other equivalent facilities are not adopted,
users of Rust can turn to third-party libraries for better convenience
and safety when working with C strings. This may result in proliferation of
incompatible helper types in public APIs until a dominant de-facto solution
is established.

# Unresolved questions

Is a `Cow` needed?