-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: CStr, the dereferenced complement to CString #592
Conversation
👍 to the general idea: some kind of I think this could be added as a motivating usecase: // C library:
const char* f1(); // return value has static lifetime
void f2(const char*); /// Safe rust bindings:
fn f1() -> &'static CStr { ... }
fn f2(s: &CStr) { ... }
// Caller using only safe code:
f2(f1()) If I also like the 'irrelevantly-sized type': my approach using a DST CStr would involve an unnecessary However, this proposal seems to be missing some details around the type:
|
@dgrunwald, thanks for your comments.
These address a special case: they are non-copying ways to get
In this proposal, I view unsafe fn c_str_from_raw_ptr<'a, T: ?Sized>(ptr: *const libc::c_char,
life_anchor: &'a T)
-> &'a CStr
Yes, and trivially. This, however, is not essential to the RFC and can be added later.
I'm hesitant about these, and I don't quite see the purpose of An alternative is to make |
NB: std::ffi is annoyingly vague, perhaps std::ffi::c_str would not need disambiguation on function names.
How do you prevent instances of CStr being created on the stack without making it unsized? |
@Diggsey, you can't safely move a borrowed non- |
It still seems like there's potential for problems: for example, if DerefMut is ever implemented for CString (not unlikely, as it doesn't enforce UTF8 afaict) then you'll have to special-case "swap" to prevent it being used with CStr. You're allowing the programmer to bypass any "Sized" constraints on type parameters: for example, "sizeof" would give incorrect results when applied to CStr. What's really needed is an unsized type without fat references: maybe there should be an "Unsized" marker class, or a way to negatively implement the Sized trait for a type, which doesn't affect its in-memory representation. |
FWIW
Are there any examples of safe code demonstrating how this could be unintentionally misused? My understanding is that, as long as you can't safely initialize, mutate, copy, or move a value, its size should not be important. I also use this technique when given a public view into an object that has a private part allocated alongside: #[repr(C)]
pub struct Object {
raw: ffi::RawObject, // we know it's not all there is
no_copy: std::marker::NoCopy
}
impl Object {
// an example with a static ref, just for the sake of brevity
pub fn get_static() -> &'static Object {
unsafe {
let ptr = ffi::raw_object_get_static();
std::mem::copy_lifetime("", &*(ptr as *const Object))
}
}
} Being able to opt out of |
Example with safe code: let a: &'a mut CStr = foo.as_c_str();
let b: &'b mut CStr = bar.as_c_str();
mem::swap(a, b); This swaps the actual |
If we don't allow any method to create a mutable #[repr(C)]
pub struct CStringRef<'a> {
raw: core::nonzero::NonZero<*const libc::c_char>,
marker: std::marker::ContravariantLifetime<'a>,
} That is, instead of trying to use an If |
CStr is not the only reason to want unsized types with thin references: the PIMPL idiom in C++ requires a similar ability. You can get away with just not having a dereferenced type (like quantheory described), but it makes for some weird semantics: CStr and CString are both reference types, and yet CString dereferences to a CStr? Either that or you have to duplicate CStr functionality on a CString. |
That's not great for
Aside from the |
I don't think it's that strange, any more than
However, I think that this pretty much kills the idea. I don't know why I didn't think of that problem, since I've had the exact same issue with the |
Some ideation on the forum: http://discuss.rust-lang.org/t/unsized-types-distinct-from-dsts/1385 |
Unsized types definitely the way to go for this. Though I think dynamic sized [lib::char] wrapper enforcing C invariant could be useful too. Your deallocation ideas should be subsumed by an allocator API, whenever we get it. |
RFC PR rust-lang#556 is likely to get discarded.
This would need the allocator API to get resolved, as pointed out by John Ericson: rust-lang#592 (comment)
There is now a tracking issue for the postponed RFC.
I'm late to this party, but this is a great RFC! I love the idea of introducing a slice type for null-terminated data. I think the various motivations are well-covered in the RFC and this thread, and I buy them. This will really tighten up the That said, I agree with the original stance that @mzabaluev took that So that said, I'm interested in some means of exposing this type as unsized from the get-go. (Whereas the main proposal here exposes that |
@aturon, if |
I think only for unsafe code that depended on the exact representation. I will try to get some second opinions on this though, it's an important question.
I don't think so -- we want to stabilize |
I think in general using Rust types in FFI is pretty hazardous and I don't think either way that we should start defining FFI functions like: extern {
fn foo() -> &'static CStr;
} I would be ok changing the runtime representation of |
That's not an issue for me; I don't think any code that would e.g. transmute the reference into a slice could expect any guarantees on backwards compatibility. What's the general stance on binary compatibility in Rust 1.0 timeframe? Does anyone care? |
Completely off the table. ABI stability is going to be a very long road. |
@alexcrichton I don't suggest
|
@mzabaluev would you be ok with the |
@alexcrichton Yes, I'm convinced. I will update the RFC soon. |
I've got another opportunistic change to suggest. Having the conversion functions panic on inner NULs is a little... severe. How about |
@mzabaluev I would be in favor of such a change. We'll probably still panic in common cases when using |
I personally see the panic on inner nul as a contract of the |
@alexcrichton A potential problem is, it is very rare that a real-life string has NUL characters, and at the same time it is perfectly valid in the Rust world and in Unicode text in general. So developers may be blissfully forgetful of the panic until it hits their software in the field. DoS possibilities galore. |
I personally see the fact that it's so rare as a justification for panicking as it's otherwise that much more difficult to deal with for everyone using |
I posted a discuss thread on the topic of panicking here. |
As suggested by Aaron Turon: rust-lang#592 (comment)
I have updated the proposal to @aturon's suggestions, adding a disclaimer that the inner slice is not guaranteed to have any particular size. This way we can still avoid the |
As decided in process of developing the CStr RFC [1], this is the way to go with CStr as an unsized type. The length of the internal slice is bogus; it only provides safe access to the first byte for the implementation of .is_empty(). [1] rust-lang/rfcs#592
@mzabaluev Thanks again for this RFC! After discussion on this thread, on the discuss forum, and amongst the team, it seems that almost everyone is in favor of this change. We have decided to move forward, and try to land these revisions ahead of alpha2 if possible. |
Harmonization with the changes in std::ffi landed as the result of rust-lang/rfcs#592
Change the
Deref
target ofCString
to a token typeCStr
.Add helper functions and macros to produce
CStr
references out of static string data.Rendered