-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GoString and CString functions to match C package API #121
Comments
Does
How does a user call free with purego? |
Yes. That's what the C package does. Matching it's behavior would be the least surprising
And freeHandle can be gotten using |
Sure, I'm fine if this is explicitly documented. EDIT: I missed that the comment already said so: "Cstring converts a Go string to a null-terminated C string." Thanks!
So do we dlopen libc? |
I don't think purego should export free nor libc in its API. How someone wants to free it is up to the user |
Doesn't this depend on how |
If |
That is true. I'm not sure what signature to use though so leaving it up to the user seems best. Does it take a *byte since that's what Here is an example of what I was thinking. func getSystemLibrary() string {
switch runtime.GOOS {
case "darwin":
return "/usr/lib/libSystem.B.dylib"
case "linux":
return "libc.so.6"
default:
panic(fmt.Errorf("GOOS=%s is not supported", runtime.GOOS))
}
}
var malloc func(size int) *byte
func init() {
libc, err := purego.Dlopen(getSystemLibrary(), purego.RTLD_LAZY|purego.RTLD_GLOBAL)
if err != nil {
panic(fmt.Errorf("could not open libc: %w", err))
}
purego.RegisterLibFunc(&malloc, libc, "malloc")
}
// CString converts a Go string to a null-terminated C string.
// The C string is allocated in the C heap using malloc.
// It is the caller's responsibility to arrange for it to be
// freed, such as by calling free.
func CString(s string) *byte {
if len(s) == 0 {
return nil
}
size := len(s) +1
c := unsafe.Slice(malloc(size), size)
copy(c, s)
return &c[0]
} |
There is no way for users to know how to free the allocated memory, other than knowing what DLL/so is used, right? I don't think this is a good API... By the way, does CString have to allocate the memory on C heap, not Go heap? If so, I was wondering why. |
Perhaps then returning a func CString(s string) (cStr *byte, free func())
It's the same reason for Cgo. If the C code retains the pointer but the Go code thinks it needs to be garbage collected (since it can't see into the C code) then memory corruption can happen. It's safe to pass a Go string into C if the C code doesn't keep the pointer. |
I think this makes sense, though this would vary from the original Cgo's CString.
As the users have to control freeing the value, would this work for example? func CString(s string) (*byte, func()) {
bs := []byte(s)
if len(bs) == 0 || bs[len(bs)-1] == 0 {
bs = append(bs, 0)
}
return &bs[0], func() { runtime.KeepAlive(bs) }
} |
True. I'm not sure there is a nice way to free while keeping the original API.
I don't see why not since either way it is the responsibility of the caller to ensure the string is freed properly. Although what happens if the free function is ignored ( Perhaps, something like this? func CString(str string) (*byte, func()) {
var b []byte
if len(str) == 0 {
b = []byte{0x00}
} else if hasSuffix(str, "\x00") {
b = []byte(str)
} else {
b = make([]byte, len(str)+1)
copy(b, str)
}
return &b[0], func(){ runtime.KeepAlive(b) }
} Also, do we want the implementation to turn an empty string into a C string that only has null character instead of returning nil? |
Although. What if the C function never releases its access to the string? Shouldn't it than be allocated on the C heap? |
I see. Then what about keeping the string in a global slice or a global map in Go?
My code snippet is just a prototype so if there is a more efficient way, let's adopt this :-)
Not sure. Let's sync with the original CString.
I don't see the point. Freeing is still the user's responsibility, right? |
For example: func CString(str string) (*byte, func()) {
var b []byte
if len(str) == 0 {
b = []byte{0x00}
} else if hasSuffix(str, "\x00") {
b = []byte(str)
} else {
b = make([]byte, len(str)+1)
copy(b, str)
}
key := uintptr(unsafe.Pointer(&b[0]))
// TODO: Need to protect this by a mutex
globalStringMap[key] = b
return &b[0], func(){ delete(globalStringMap, key) }
} |
I just remembered why the string needs to be on the C heap. The reason is because of Go's growing and shrinking stack. If C saves the string and then later Go grows the stack the C string pointer will point to random memory. This is bad. |
Does this problem occur even if a Go string is allocated in a Go heap? |
Hmm maybe it isn't an issue cuz the heap isn't affected by the stack growth. |
My guess is that allocated memory by the original CString might be freed in the C side, though I have never seen such usage. Unfortunately, in our purego, it might be impossible to guarantee that the memory allocator by libc selected by purego matches with the library's allocator. |
Well, Cgo assumes the default implementation that is used by the system libc. So if they assume that can't we? |
Maybe yes? Anyway, I'm not so strongly against using C allocators, but I feel like using Go allocators is easier and more efficient. |
This will be 4 allocations due to the closure, won't it? re: auto-libc/malloc/free, I think if people have strong opinions about how they want their C memory allocated, they are likely to import the library of their choice themselves, I think having a common default malloc/free available in purego would be uncontroversial. Personally, I find that mallocs tend to be more efficient than go allocations and so if I'm working in an environment where I have to manually free memory anyway and it wont' confuse the garbage collector, I'll use C memory. Passing go memory you had to allocate anyway into C instead of C memory is a performance win, but if you're already having to allocate a separate string, and you already have to manually free it, and you're already having to play games with the garbage collector, I'd say just allocate it in C and be done with it. |
I see. Allocating memory in the C world seems better in terms of the amount of memory allocations. I'm fine with it now. |
Another possible suggestion is to use syscall.Mmap, but I don't think this is feasible |
codepackage purego_test
import (
"fmt"
"runtime"
"strings"
"testing"
"unsafe"
"github.com/ebitengine/purego"
)
const str = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce scelerisque fringilla ligula ut finibus. Aenean iaculis convallis erat, ornare tristique turpis. Fusce scelerisque dolor quis faucibus varius. In condimentum risus pharetra augue laoreet convallis. Quisque bibendum consequat ante sit amet molestie. Curabitur vel quam posuere, condimentum turpis sit amet, dapibus massa. In egestas tellus sed ante laoreet tempor.
Fusce ipsum velit, viverra in porttitor in, ultrices quis augue. Nulla urna ante, tincidunt eu ante nec, tincidunt sagittis felis. Curabitur ut imperdiet augue. Etiam nec ligula convallis, convallis leo ut, pulvinar tellus. Proin imperdiet augue ac ipsum scelerisque bibendum. Quisque id egestas turpis. Sed nec elit convallis, lacinia mi a, sagittis purus. Morbi euismod ut nunc at tempus. Cras rhoncus ut nisi at consequat. Suspendisse pellentesque erat arcu, nec volutpat sapien gravida ut.
Vestibulum elementum et nulla vitae malesuada. Suspendisse ut diam mi. Nullam non urna dignissim, cursus ipsum sed, pellentesque enim. Aliquam erat volutpat. Morbi id maximus ligula. Fusce lacinia est et nulla congue euismod. Quisque pulvinar ante magna, sed dictum nibh porta non. Etiam interdum nisl sed feugiat lobortis. Aenean in nunc nunc. Praesent mollis dapibus velit nec porttitor. Ut dictum sem nec condimentum vehicula. Nullam tempor lectus vitae venenatis euismod. Suspendisse potenti. Phasellus consectetur tincidunt nulla id aliquet. Pellentesque sem lorem, commodo at nisi a, iaculis aliquam nisi. In quis lacinia purus, ut rhoncus orci.
Nam varius pellentesque eros ac ullamcorper. Aenean nec nisl ultricies, iaculis sem vel, iaculis ex. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Sed vel hendrerit dolor. Suspendisse blandit venenatis iaculis. Phasellus vel convallis purus. Sed gravida sit amet ipsum id fringilla. In at risus vel lorem faucibus imperdiet. Curabitur id mattis est. Cras cursus egestas turpis, a aliquet dolor semper nec. Praesent dictum orci non diam convallis consectetur. Duis imperdiet, ante non tempus lacinia, nunc nisi feugiat elit, at euismod urna erat a nibh. Vestibulum vehicula vestibulum nulla sed dignissim. Sed quis nisi mauris. Morbi euismod tellus est, eget aliquam justo lacinia sed. Donec id dignissim dolor.
Cras blandit facilisis nulla, ut tincidunt purus vehicula vel. Suspendisse potenti. Maecenas porta vehicula tortor, ac porta eros lobortis vitae. Donec egestas diam velit, nec posuere enim vulputate eu. Nulla vulputate tempus dolor, eu convallis purus pellentesque sit amet. Mauris feugiat quam libero, sed molestie neque maximus nec. Mauris finibus vehicula eros sed euismod.`
var globalStringMap = map[uintptr][]byte{}
//go:noinline
func CStringMap(str string) (*byte, func()) {
var b []byte
if len(str) == 0 {
b = []byte{0x00}
} else if strings.HasSuffix(str, "\x00") {
b = []byte(str)
} else {
b = make([]byte, len(str)+1)
copy(b, str)
}
key := uintptr(unsafe.Pointer(&b[0]))
// TODO: Need to protect this by a mutex
globalStringMap[key] = b
return &b[0], func() { delete(globalStringMap, key) }
}
//go:noinline
func CStringGo(str string) (*byte, func()) {
var b []byte
if len(str) == 0 {
b = []byte{0x00}
} else if strings.HasSuffix(str, "\x00") {
b = []byte(str)
} else {
b = make([]byte, len(str)+1)
copy(b, str)
}
return &b[0], func() { runtime.KeepAlive(b) }
}
func libc() string {
switch runtime.GOOS {
case "darwin":
return "/usr/lib/libSystem.B.dylib"
case "linux":
return "libc.so.6"
default:
panic(fmt.Errorf("GOOS=%s is not supported", runtime.GOOS))
}
}
var (
malloc func(size int) *byte
free func(*byte)
)
func init() {
libc, err := purego.Dlopen(libc(), purego.RTLD_LAZY|purego.RTLD_GLOBAL)
if err != nil {
panic(fmt.Errorf("could not open libc: %w", err))
}
purego.RegisterLibFunc(&malloc, libc, "malloc")
purego.RegisterLibFunc(&free, libc, "free")
}
//go:noinline
func CStringC(s string) (*byte, func()) {
size := len(s) + 1
c := unsafe.Slice(malloc(size), size)
copy(c, s)
return &c[0], func() { free(&c[0]) }
}
func BenchmarkCStringMap(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_, free := CStringMap(str)
free()
}
}
func BenchmarkCStringGo(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_, free := CStringGo(str)
free()
}
}
func BenchmarkCStringC(b *testing.B) {
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_, free := CStringC(str)
free()
}
} Instead of guessing which is the fastest implementation I went ahead and wrote benchmarks for each of the discussed implementations. Here are my results:
Both the map and Go version only have two allocs: one for the slice and the other for the closure. The C version is significantly slower probably due to the overhead of RegisterLibFunc. Perhaps its performance could be improved but I doubt it will get anywhere near the two go versions. Personally, I like the plain Go version because it is easy to understand. The only reason that we were leaning away from it is because if the free function is ignored the GC could remove the pointer at any time. It would only be a problem if one wanted to allocate a C string and then never free it later (for a global). Either way, map is still quite fast and doesn't have the downside of undefined behavior when ignoring the free function. I just thought of another possibility. Say, the C string is allocated in Go but then later some C code will choose to free it. With this mechanism of allocating in the Go heap that won't work. |
This rewrites part of the RegisterFunc to have less code escape to the heap. Running the benchmark inside ebitengine#121 shows a speed increase and alloc decrease. goos: darwin goarch: arm64 pkg: github.com/ebitengine/purego BenchmarkCStringC BenchmarkCStringC-10 2294838 500.3 ns/op 472 B/op 8 allocs/op PASS
This rewrites part of the RegisterFunc to have less code escape to the heap. Running the benchmark inside #121 shows a speed increase and alloc decrease. goos: darwin goarch: arm64 pkg: github.com/ebitengine/purego BenchmarkCStringC BenchmarkCStringC-10 2294838 500.3 ns/op 472 B/op 8 allocs/op PASS
From #161 (comment)
Right, I was misunderstanding. Thanks. So now (arguments) (returns) is my understanding. Is this correct? |
Yes your understanding is correct |
When using the the special C package there are a few functions that convert from Go types to C types and vice versa. When porting code that uses C it's annoying not having an equivalent to these functions. Especially,
GoString
andCString
which convert a C string to a Go string and vice versa. Even ebitengine had to have it's own implementation and writing these over and over is begging for mistakes. Purego already uses it's own internal versions.I purpose adding these two functions to purego with the following signatures.
I believe these are the most used functions so should be added first. If there is strong demand later for the other ones a new issue can be created.
The text was updated successfully, but these errors were encountered: