-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a way to return allocated strings to C and then free them later #25777
Conversation
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override) |
let len_with_nul = len as usize + 1; | ||
let vec = Vec::from_raw_parts(ptr as *mut u8, len_with_nul, len_with_nul); | ||
CString::from_vec_unchecked(vec) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function needsa warning itself that it only handles precisely the pointers returned from into_raw_ptr. Or otherwise explain the preconditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely.
One question is whether this should be included in the higher-level documentation, as an example of how to return allocated strings via FFI. |
Also note I've left all stability attributes off these methods, as I'm not sure what is most appropriate. |
Interesting! I agree with @bluss that this needs some documentation bolstering to outline some of the key aspects of this API:
In terms of naming this may want to consider either Stability-wise these just need explicit At a higher level, I'm curious what you were doing to want to do this kind of operation? I would not expect this to be a common operation at the FFI boundary (due to the need to free the resources at some point). |
I've not done anything specific myself, but I've helped various people want to do something similar. It's a straightforward next step after an FFI example that adds two numbers; one that concats two strings.
Perhaps I'm missing something, but how would you foresee any library that does heavy string processing to be called? One of the concepts that I'm excited most for Rust is writing a core library in Rust and then exposing C bindings. We can then use that library in other languages (Ruby is of particular interest to me). I feel like
|
FWIW, I see |
I suppose it definitely depends on the exact library in play, but I would expect some other owned
Yeah I do agree that this seems like a good tool to have in the toolbox when dealing with FFI, I'm just somewhat hesitant because memory management is often quite difficult when it comes to an FFI boundary and I don't think this will necessarily open up the door to easy management of strings per se. For example in the question you linked to the returned string would need to be deallocated at some point (assuming the incoming string is changed to Not that any of that should stop this at the gates, just something to keep in mind! |
But you'd have the same issue if you return the |
ff86ab6
to
12151bc
Compare
@bluss @alexcrichton I've updated the docs a smidge and added the stability attributes. |
If the caller were to free the strings, wouldn't they want to use |
Generally, no. Many larger C libraries have their own allocators, and you need to use them to prevent passing invalid pointers between allocators. GLib and libxml2 are two examples that come to mind. You can potentially compile these libraries with alternate allocation functions — perhaps you have your own allocator that does extra profiling for example.
I don't know how, as I haven't seen anything parameterized by allocator yet. However, were this to ever exist, it could be neat to use |
@shepmaster, @gkoz yeah I agree that we want to not recommend using I think this all looks good to me, thanks @shepmaster! I realize now though that the sibling function of cc @rust-lang/libs, with the renaming of |
// determinable from the string length, and shrinking to fit | ||
// is the only way to be sure. | ||
let mut vec = self.inner; | ||
vec.shrink_to_fit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that shrink_to_fit
does not guarantee that the capacity will be exactly equal to length. Isn't this important here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation in Vec guarantees that vec.capacity()
== vec.len()
, so the vector has no more information to than ptr + length either at that point. Maybe the docs should update to guarantee that(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, Vec::shrink_to_fit
says:
Shrinks the capacity of the vector as much as possible.
It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a quote from the docs:
It will drop down as close as possible to the length but the allocator may still inform the vector that there is space for a few more elements.
Maybe it should state more clearly what actually happens there or removed altogether if it does not hold anymore. Also vec.capacity()
== vec.len()
does not mean that the actually allocated memory will be equal to vec.len()
, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't. The vector needs the value capacity
as a cookie to give to the allocator when freeing the memory. If len == capacity is fine in the vector internally, that's going be a compatible cookie value to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this actually switch to using into_boxed_slice actually
@alexcrichton that was originally my first attempt, but I would have needed to change CString, as you noticed. I can give that refactoring a shot to start with, and then add this commit on top of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we already depend on shrink_to_fit to actually "shrink to fit"
Should the docs for this method be updated? Or maybe some non-doc-comments added that denote that it's being used to truly shrink-to-fit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave the shrink_to_fit
docs as-is for now, but layering on a second commit sounds good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The note on allocators is, to my knowledge (I'm pretty sure I wrote that doc comment), a reservation for future behaviour that is not implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular it might happen once we get local allocators which naturally might favour a more coarse-grain than full-blown jemalloc.
cc me (i'm wondering about potential overlap with allocator API's that should be coming in the future). |
@alexcrichton I actually deliberately chose to not call it that, because |
r? @alexcrichton (Transferring reviewership, I haven't been keeping track at all...) |
@shepmaster ah yes indeed! I think I would personally err on the side of |
12151bc
to
a425a42
Compare
@alexcrichton OK, updated with the change to |
pub unsafe fn from_ptr(ptr: *const libc::c_char) -> CString { | ||
let len = libc::strlen(ptr); | ||
let len_with_nul = len as usize + 1; | ||
let vec = Vec::from_raw_parts(ptr as *mut u8, len_with_nul, len_with_nul); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of going through Vec
, I think it may be best to go directly to a Box<[u8]>
because it's where the pointer originally came from, for example:
let len = libc:;strlen(ptr) + 1;
let slice = slice::from_raw_parts(ptr, len as usize);
CString { inner: mem::transmute(slice) }
Going through Vec
and then possibly hitting shrink_to_fit
may be adding a bit more indirection than necessary (and may also be somewhat less robust)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah as one final note, could you add to the docs that this method will calculate the length of the string specified by ptr
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcrichton both concerns addressed! Let me know if the added doc line conveys what you intended.
Looks great to me, thanks @shepmaster! |
a425a42
to
e20a6db
Compare
As far as I was able to determine, it's currently *impossible* to allocate a C NUL-terminated string in Rust and then return it to C (transferring ownership), without leaking memory. There is support for passing the string to C (borrowing). To complicate matters, it's not possible for the C code to just call `free` on the allocated string, due to the different allocators in use. `CString` has no way to recreate itself from a pointer. This commit adds one. This is complicated a bit because Rust `Vec`s want the pointer, size, and capacity. To deal with that, another method to shrink and "leak" the `CString` to a `char *` is also provided. We can then use `strlen` to determine the length of the string, which must match the capacity. **TODO** - [x] Improve documentation - [x] Add stability markers - [x] Convert to `Box<[u8]>` ### Example code With this example code: ```rust #![feature(libc)] #![feature(cstr_to_str)] #![feature(c_str_memory)] extern crate libc; use std::ffi::{CStr,CString}; #[no_mangle] pub extern fn reverse(s: *const libc::c_char) -> *const libc::c_char { let s = unsafe { CStr::from_ptr(s) }; let s2 = s.to_str().unwrap(); let s3: String = s2.chars().rev().collect(); let s4 = CString::new(s3).unwrap(); s4.into_ptr() } #[no_mangle] pub extern fn cleanup(s: *const libc::c_char) { unsafe { CString::from_ptr(s) }; } ``` Compiled using `rustc --crate-type dylib str.rs`, I was able to link against it from C (`gcc -L. -l str str.c -o str`): ```c #include <stdio.h> extern char *reverse(char *); extern void cleanup(char *); int main() { char *s = reverse("Hello, world!"); printf("%s\n", s); cleanup(s); } ``` As well as dynamically link via Ruby: ```ruby require 'fiddle' require 'fiddle/import' module LibSum extend Fiddle::Importer dlload './libstr.dylib' extern 'char* reverse(char *)' extern 'void cleanup(char *)' end s = LibSum.reverse("hello, world!") puts s LibSum.cleanup(s) ```
Isn't there a naming inconsistency now between slice and Box that provide "from/into_raw" and CString with "from/into_ptr"? |
Kind of, but there was already one with |
@gkoz not that it can't be changed of course, but at least we thought about it some 😺 |
@shepmaster I'd ask @alexcrichton to take another look at this. The Maybe |
@gkoz thanks for pointing this out, I hadn't considered the |
For what it's worth, I think |
Apparently some APIs do want to When writing a Zabbix loadable module:
Corroborated by https://github.com/zabbix/zabbix/blob/trunk/include/module.h#L109-114 /* NOTE: always allocate new memory for val! DON'T USE STATIC OR STACK MEMORY!!! */
#define SET_STR_RESULT(res, val) \
( \
(res)->type |= AR_STRING, \
(res)->str = (char *)(val) \
) So there might be some demand for a I wonder how the RFC: Allow changing the default allocator is going to impact this. Looks like |
As far as I was able to determine, it's currently impossible to allocate a C NUL-terminated string in Rust and then return it to C (transferring ownership), without leaking memory. There is support for passing the string to C (borrowing).
To complicate matters, it's not possible for the C code to just call
free
on the allocated string, due to the different allocators in use.CString
has no way to recreate itself from a pointer. This commit adds one. This is complicated a bit because RustVec
s want the pointer, size, and capacity.To deal with that, another method to shrink and "leak" the
CString
to achar *
is also provided.We can then use
strlen
to determine the length of the string, which must match the capacity.TODO
Box<[u8]>
Example code
With this example code:
Compiled using
rustc --crate-type dylib str.rs
, I was able to link against it from C (gcc -L. -l str str.c -o str
):As well as dynamically link via Ruby: