Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for CString::{from_ptr, into_ptr} #27769

Closed
alexcrichton opened this issue Aug 12, 2015 · 13 comments · Fixed by #28339
Closed

Tracking issue for CString::{from_ptr, into_ptr} #27769

alexcrichton opened this issue Aug 12, 2015 · 13 comments · Fixed by #28339
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Milestone

Comments

@alexcrichton
Copy link
Member

This is a tracking issue for the unstable cstr_memory feature in the standard library. These methods are likely ready to go as-is modulo naming. Their names should be consistent with Box::{from_raw, into_raw} (currently inconsistent).

cc #27768

@alexcrichton alexcrichton added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Aug 12, 2015
@alexcrichton
Copy link
Member Author

This feature is now entering its final comment period for stabilization.

I will post a PR soon for renaming these function to from_raw and into_raw.

@shepmaster
Copy link
Member

renaming these function to from_raw and into_raw

Awesome! That makes sense to me.

@bluss
Copy link
Member

bluss commented Aug 18, 2015

This rename is important because CStr::from_ptr and CString::from_ptr are so different, but with similar names.

@bluss
Copy link
Member

bluss commented Aug 18, 2015

If you use CString::from_ptr on a non-rust allocated thing by mistake, you end up trying to reallocate string literals and things like that. See #27878.

@bluss
Copy link
Member

bluss commented Aug 18, 2015

We need to learn from these bugs. Premature dropping of CString leading to dangling pointers, and that some raw pointers transfer ownership and others are "borrowed". Can we use the type system better?

@shepmaster
Copy link
Member

@bluss an interesting question. In my mind, once you hit a raw pointer, all bets are off and you have to manage everything yourself. Do you have any thoughts on how to improve that with types?

alexcrichton added a commit to alexcrichton/rust that referenced this issue Aug 18, 2015
This commit renames the `CString::{into_ptr, from_ptr}` methods to `into_raw`
and `from_raw` to mirror the corresponding methods on `Box` and the naming of
"raw" for `from_raw_parts` on slices and vectors.

cc rust-lang#27769
bors added a commit that referenced this issue Aug 18, 2015
This commit renames the `CString::{into_ptr, from_ptr}` methods to `into_raw`
and `from_raw` to mirror the corresponding methods on `Box` and the naming of
"raw" for `from_raw_parts` on slices and vectors.

cc #27769
@geofft
Copy link
Contributor

geofft commented Sep 3, 2015

You can still get some type-safety with raw pointers, though you don't get ownership tracking. I can see a potential API along these lines:

#[repr(i8)] enum RustAllocChar {...}

impl CString {
    fn into_raw(self) -> *const RustAllocChar;
    unsafe fn from_raw(ptr: *const RustAllocChar) -> CString;
}

which lets you write something like

extern {
    fn do_stuff(string: *mut RustAllocChar, free: fn(*mut RustAllocChar));
}

extern fn free(ptr: *mut RustAllocChar) {
    drop(CString::from_raw(ptr));
}

fn main() {
    let s = CString::new("Hello!").unwrap();
    unsafe {do_stuff(s.into_raw(), free)};
}

or for a library

#[no_mangle] // char *mylib_do_stuff(void);
extern fn mylib_do_stuff() -> *mut RustOwnedChar {...}
#[no_mangle] // void mylib_free(char *s);
extern fn mylib_free(s: *mut RustOwnedChar) {...}

So RustAllocChar (or whatever you want to call it -- maybe CChar?) is as valid a binding of C's char as libc::c_char is, but it clearly states that the memory is allocated by Rust: into_raw() no longer returns a type that is indistinguishable from foreign-allocated, otherwise-owned, or static strings. So it's harder to leak it by passing it somewhere where *const c_char was expected, and it's harder to pass the result of .as_ptr() (which is only valid until the CString goes out of scope) where you really wanted .into_ptr(), since they have different types. And it's harder to accidentally free things that shouldn't be freed, because they won't have type *const RustAllocChar.

@geofft
Copy link
Contributor

geofft commented Sep 3, 2015

Also, while I was writing the above, I realized -- shouldn't into_raw return a mutable pointer, not a constant one? It is indeed mutable, in that it's an owned, heap-allocated array of bytes: C code can safely modify it within its length. It is also semantically common for C APIs to use mutable pointers vs. const pointers to indicate ownership: for instance, void free(void *ptr) takes a mutable pointer, so if you only have a constant pointer, you know you can't free it and therefore you don't own it. So if you're "transfer[ring] ownership of the string to a C caller", as our docs say, the C code is probably expecting a mutable pointer regardless of whether it intends to mutate it.

Similarly, from_raw should take a mutable pointer, since C APIs are probably happy to free the string by passing it back as a mutable pointer (since the usual way they free strings is free). This would encourage good API design and also discourage things like from_rawing a string literal, but it's not strictly necessary.

[EDIT: Box::from_raw and into_raw return and take mutable pointers, respectively, so this change would match their API.]

@shepmaster
Copy link
Member

but it clearly states that the memory is allocated by Rust

I disagree. There's no reason to assume that a *const Foo was allocated by Rust or the system allocator or any other allocator.

a type that is indistinguishable from foreign-allocated, otherwise-owned, or static strings

The problem is that we are dealing with C /FFI and these types are indistinguishable.

I think that the additional cognitive overhead of having to remember that a RustAllocChar can be coerced to a *const c_char would also be annoying.

It also sound like you are trying to use these methods inside of Rust code to call FFI code. That's definitely not the intended use case, as you don't want to have a one-way transfer of ownership due to the reallocation requirements.

Box::from_raw and into_raw return and take mutable pointers, respectively

That's a good reason to change them, and it was probably a mistake on my part originally. Honestly, the fluidity of const / mut with raw pointers always makes me treat it as a second-class citizen. ^_^

@geofft
Copy link
Contributor

geofft commented Sep 3, 2015

I disagree. There's no reason to assume that a *const Foo was allocated by Rust or the system allocator or any other allocator. [...] The problem is that we are dealing with C /FFI and these types are indistinguishable.

Maybe I was unclear, but this depends specifically on annotating types with a *const TheseBytesWereAllocatedByRust, not any old *const Foo, and have a guarantee that TheseBytesWereAllocatedByRust is only instantiated by CString (sorta like how we have a guarantee that, e.g., str is UTF-8, despite it having the exact same representation as [u8]). This static guarantee is powered by it being a non-instantiable enum, so only unsafe code can generate or consume pointers to it.

Obviously this requires defining the FFI interfaces to use these types when appropriate, and not use them when inappropriate. I don't think that's too hard (we just need to pick a less stupid name than the ones I've come up with).

To be fair, it is confusing that I'm distinguishing the type of the pointed-to thing, not the type of the pointer, when it is semantically a different sort of (raw) pointer to the same c_char. It would be nice if the FFI bindings could take a RustAllocPtr<c_char>, or perhaps even a CString directly. But I think there's no way to make that FFI-safe since you'd be wrapping the pointer with a struct, and struct {char *} and char * aren't interchangeable (they're passed differently on the x86-32 System V ABI, in particular). If there's a way to newtype a primitive type without wrapping it with a struct, that would be awesome. But short of that, I'm doing a newtype on the target of the pointer, since all data pointers have the same ABI (struct {char} * and char * are interchangeable on all platforms Rust is portable to).

I think that the additional cognitive overhead of having to remember that a RustAllocChar can be coerced to a *const c_char would also be annoying.

I'm not clear on when you'd need to coerce it, in Rust code. The only purpose of a *mut RustAllocChar is to be immediately converted to/from CString. If you need a *const c_char (which represents some sort of borrow), in Rust, go to the CString first and then borrow from it. While transmuting from *mut RustAllocChar to *const c_char is safe to do, it probably represents a memory leak.

It also sound like you are trying to use these methods inside of Rust code to call FFI code. That's definitely not the intended use case, as you don't want to have a one-way transfer of ownership due to the reallocation requirements.

Yeah, drop my first example with the callback because that's more confusing than it's worth. What I have in mind is implementing a C API contract that says something like, "This function returns a pointer. When you are done with it, free it by calling this other function," as you would do if you were implementing a C-ABI plugin spec, or writing a C-ABI library in Rust. Your example from #25777 does this. If I added both proper const usage (in Rust and C) and the newtype I'm suggesting (in Rust), you'd get:

#[no_mangle]
pub extern fn reverse(s: *const libc::c_char) -> *mut RustAllocChar {
    let s = unsafe { CStr::from_ptr(s) };
    let s2 = s.to_str().unwrap();
    let s3: String = s2.chars().rev().collect();
    let s4 = CString::new(s3).unwrap();
    s4.into_ptr()
}

#[no_mangle]
pub extern fn cleanup(s: *mut RustAllocChar) {
    unsafe { CString::from_ptr(s) };
}
#include <stdio.h>

extern char *reverse(const char *);
extern void cleanup(char *);

int main() {
  char *s = reverse("Hello, world!");
  printf("%s\n", s);
  cleanup(s);
}

which, at least on the Rust side, makes it clear what pointers are owned and which are borrowed. On the C side, the API docs or header files would probably have a comment on reverse() saying something like "The caller owns the resulting pointer and must free it by calling cleanup()", which conveys the same thing as the type, except in prose (which is basically how life in C goes).

fluidity of const / mut with raw pointers

Only when you're defining the FFI. Rust requires you to do an unsafe cast to turn const to mut, and C compilers give you a warning if you do it as an implicit cast. (This is what allows C programmers to have the discipline of owned-pointers-are-mutable-pointers + make-every-pointer-const-if-possible, since you can only implicitly cast mutable pointers to void * when calling free.)

Incidentally, the fact that C literals have type char * instead of const char * is a stupid historical artifact dating from C's need to be backwards-compatible with code from literally before const was invented. Good C code should treat literals as const char * as far as possible.

@alexcrichton
Copy link
Member Author

@geofft

I agree with changing {into,from}_raw to taking *mut T instead of a *const T, but I'm less sure about adding a custom type to represent a Rust-owned character. You mention that "only unsafe code can generate or consume pointers to it" but can't you have safe (in terms of Rust) code that does 1234 as *mut RustAllocChar? In that sense these sorts of raw pointers can be manufactured at any time it seems like.

In general I feel the extra cognitive overhead of having an extra type just for this return value may not be worth the little bit of extra safety we get. For example there's still the case where Box::into_raw will return types like *mut i32 which can then be passed down into C (for example), and we probably wouldn't want to wrap that up an an extra wrapper.

@geofft
Copy link
Contributor

geofft commented Sep 3, 2015

You mention that "only unsafe code can generate or consume pointers to it" but can't you have safe (in terms of Rust) code that does 1234 as *mut RustAllocChar? In that sense these sorts of raw pointers can be manufactured at any time it seems like.

... sigh, yes, you're right. OK, I withdraw that.

So if the answer is that there's no practical way to increase the type-safety of these functions, that's fine. And from_raw taking a *mut c_char gets us most of the way there, anyway.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Sep 3, 2015
Conventionally in C `*mut T` is a transfer of ownership where `*const T` is a
loan, so `*mut T` is likely the more appropriate return type for these
functions. Additionally, this more closely mirrors the APIs on `Box` for this
sort of functionality.

cc rust-lang#27769
bors added a commit that referenced this issue Sep 9, 2015
Conventionally in C `*mut T` is a transfer of ownership where `*const T` is a
loan, so `*mut T` is likely the more appropriate return type for these
functions. Additionally, this more closely mirrors the APIs on `Box` for this
sort of functionality.

cc #27769
@aturon
Copy link
Member

aturon commented Sep 9, 2015

@gankro Some of the above discussion makes me think about your plans with Owned/Shared -- did you have a draft for those?

alexcrichton added a commit to alexcrichton/rust that referenced this issue Sep 13, 2015
The FCP is coming to a close and 1.4 is coming out soon, so this brings in the
libs team decision for all library features this cycle.

Stabilized APIs:

* `<Box<str>>::into_string`
* `Arc::downgrade`
* `Arc::get_mut`
* `Arc::make_mut`
* `Arc::try_unwrap`
* `Box::from_raw`
* `Box::into_raw`
* `CStr::to_str`
* `CStr::to_string_lossy`
* `CString::from_raw`
* `CString::into_raw`
* `IntoRawFd::into_raw_fd`
* `IntoRawFd`
* `IntoRawHandle::into_raw_handle`
* `IntoRawHandle`
* `IntoRawSocket::into_raw_socket`
* `IntoRawSocket`
* `Rc::downgrade`
* `Rc::get_mut`
* `Rc::make_mut`
* `Rc::try_unwrap`
* `Result::expect`
* `String::into_boxed_slice`
* `TcpSocket::read_timeout`
* `TcpSocket::set_read_timeout`
* `TcpSocket::set_write_timeout`
* `TcpSocket::write_timeout`
* `UdpSocket::read_timeout`
* `UdpSocket::set_read_timeout`
* `UdpSocket::set_write_timeout`
* `UdpSocket::write_timeout`
* `Vec::append`
* `Vec::split_off`
* `VecDeque::append`
* `VecDeque::retain`
* `VecDeque::split_off`
* `rc::Weak::upgrade`
* `rc::Weak`
* `slice::Iter::as_slice`
* `slice::IterMut::into_slice`
* `str::CharIndices::as_str`
* `str::Chars::as_str`
* `str::split_at_mut`
* `str::split_at`
* `sync::Weak::upgrade`
* `sync::Weak`
* `thread::park_timeout`
* `thread::sleep`

Deprecated APIs

* `BTreeMap::with_b`
* `BTreeSet::with_b`
* `Option::as_mut_slice`
* `Option::as_slice`
* `Result::as_mut_slice`
* `Result::as_slice`
* `f32::from_str_radix`
* `f64::from_str_radix`

Closes rust-lang#27277
Closes rust-lang#27718
Closes rust-lang#27736
Closes rust-lang#27764
Closes rust-lang#27765
Closes rust-lang#27766
Closes rust-lang#27767
Closes rust-lang#27768
Closes rust-lang#27769
Closes rust-lang#27771
Closes rust-lang#27773
Closes rust-lang#27775
Closes rust-lang#27776
Closes rust-lang#27785
Closes rust-lang#27792
Closes rust-lang#27795
Closes rust-lang#27797
bors added a commit that referenced this issue Sep 13, 2015
The FCP is coming to a close and 1.4 is coming out soon, so this brings in the
libs team decision for all library features this cycle.

Stabilized APIs:

* `<Box<str>>::into_string`
* `Arc::downgrade`
* `Arc::get_mut`
* `Arc::make_mut`
* `Arc::try_unwrap`
* `Box::from_raw`
* `Box::into_raw`
* `CStr::to_str`
* `CStr::to_string_lossy`
* `CString::from_raw`
* `CString::into_raw`
* `IntoRawFd::into_raw_fd`
* `IntoRawFd`
* `IntoRawHandle::into_raw_handle`
* `IntoRawHandle`
* `IntoRawSocket::into_raw_socket`
* `IntoRawSocket`
* `Rc::downgrade`
* `Rc::get_mut`
* `Rc::make_mut`
* `Rc::try_unwrap`
* `Result::expect`
* `String::into_boxed_slice`
* `TcpSocket::read_timeout`
* `TcpSocket::set_read_timeout`
* `TcpSocket::set_write_timeout`
* `TcpSocket::write_timeout`
* `UdpSocket::read_timeout`
* `UdpSocket::set_read_timeout`
* `UdpSocket::set_write_timeout`
* `UdpSocket::write_timeout`
* `Vec::append`
* `Vec::split_off`
* `VecDeque::append`
* `VecDeque::retain`
* `VecDeque::split_off`
* `rc::Weak::upgrade`
* `rc::Weak`
* `slice::Iter::as_slice`
* `slice::IterMut::into_slice`
* `str::CharIndices::as_str`
* `str::Chars::as_str`
* `str::split_at_mut`
* `str::split_at`
* `sync::Weak::upgrade`
* `sync::Weak`
* `thread::park_timeout`
* `thread::sleep`

Deprecated APIs

* `BTreeMap::with_b`
* `BTreeSet::with_b`
* `Option::as_mut_slice`
* `Option::as_slice`
* `Result::as_mut_slice`
* `Result::as_slice`
* `f32::from_str_radix`
* `f64::from_str_radix`

Closes #27277
Closes #27718
Closes #27736
Closes #27764
Closes #27765
Closes #27766
Closes #27767
Closes #27768
Closes #27769
Closes #27771
Closes #27773
Closes #27775
Closes #27776
Closes #27785
Closes #27792
Closes #27795
Closes #27797
@steveklabnik steveklabnik added this to the 1.3 milestone Sep 18, 2015
@steveklabnik steveklabnik modified the milestones: 1.4, 1.3 Oct 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants