why not an NTuple? #6
Good question! I didn't explicitly code up the NTuple approach, but I did consider it when contemplating various approaches to the representation (ShortStrings.jl, the original idea for this package, just wrapped UInts for the representation). The answer for primitive vs. NTuple is that it's much easier to "work" with primitive types than with NTuple for the operations I needed. In particular, doing the byte-swap operation is just a call to the native LLVM instruction.
I was also pleasantly surprised to learn that the native LLVM instructions "just work" on any size primitive, even up to the InlineString255 type.
I think if Julia had native support for "updating immutables" (I think Keno started a PR around this idea at one point), then it would probably be easier to use NTuple, but as of the current state of things, I think it ends up being a very simple, clever solution to have these string types be so "close to the metal".
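To make the point about primitive types concrete, here is a minimal sketch, not code taken from InlineStrings.jl. The 128-bit type `Bytes16` and the helper `swapbytes` are hypothetical names introduced only for this illustration; `Core.Intrinsics.bswap_int` is the Julia intrinsic that, as far as I know, also backs `Base.bswap` for the built-in integer types.

```julia
# Hypothetical 128-bit primitive type, defined only for this illustration.
primitive type Bytes16 128 end

# Byte-swap by calling the intrinsic directly on the custom primitive type.
swapbytes(x::Bytes16) = Core.Intrinsics.bswap_int(x)

raw = 0x000102030405060708090a0b0c0d0e0f   # a 32-digit hex literal is a UInt128
x = reinterpret(Bytes16, raw)              # same bits, viewed as Bytes16
swapped = swapbytes(x)                     # still a Bytes16

# Compare against the ordinary integer byte swap on the same bits.
println(reinterpret(UInt128, swapped) == bswap(raw))
```

The same one-line definition would apply to a primitive type of a different width, which is the convenience described above; an NTuple-based type would instead have to build a new reversed tuple.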
One big advantage of an
I don't see why you would need byte-swapping at all if you used NTuples; byte-swapping seems like an artifact of trying to represent byte sequences by multibyte
I started to implement a
Since there's both Jacob's answer above and now also the NStrings.jl package exploring the
Feel free to comment if there's a reason to keep this open. And thanks for the good question!
Update: NStrings.jl is now https://github.com/mkitti/StaticStrings.jl
I'm interested to see how the NTuple approach works @mkitti; it'd be great to compare benchmark notes at some point to see InlineString vs. StaticString vs. Base.String. There are additional bit-level optimizations I'd like to do for InlineStrings, but I just haven't found the bandwidth. In general, there are some strong wins for InlineString vs. Base.String, but a couple of corner cases where it ends up being just on par or slightly slower. Nothing significantly worse yet, though. I've also sketched out an idea where you'd essentially have a string type like:
struct HybridString{T <: InlineString}
    str::Union{T, Base.String}
end
so essentially you have a string that might be inlined, but might also point to a Base.String. The idea here is that in big data workloads, you could work with a single "string" column type, like
Anyway, I like seeing more experimentation with alternative string types in Julia, and I think eventually we'll be able to improve Julia's
It would be good to reconcile the two packages at some point. There is currently some endian weirdness:
julia> using InlineStrings, StaticStrings
julia> is = String15("hello")
"hello"
julia> ss = StaticString{16}("hello")
"hello\0\0\0\0\0\0\0\0\0\0\0"
julia> reinterpret(StaticString{16}, [is])
1-element reinterpret(StaticString{16}, ::Vector{String15}):
"\x05\0\0\0\0\0\0\0\0\0\0olleh"
julia> reinterpret(UInt8, [ss])
16-element reinterpret(UInt8, ::Vector{StaticString{16}}):
0x68
0x65
0x6c
0x6c
0x6f
0x0a
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x00
julia> reinterpret(UInt8, [is])
16-element reinterpret(UInt8, ::Vector{String15}):
0x06
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x00
0x0a
0x6f
0x6c
0x6c
0x65
0x68
Yes, we store strings in the opposite endianness so that sorting an array of InlineStrings can use radix sort and be super fast (we have our own radix sort implementation in InlineStrings).
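A rough sketch of why that byte order helps, assuming a simplified layout rather than the actual InlineStrings.jl internals: if the first character sits in the most significant byte and the length in the lowest byte (which matches the `reinterpret(UInt8, [is])` dump above on a little-endian machine), then comparing the raw integers gives the same order as comparing the strings. The helper `pack` is hypothetical and only handles strings of up to 7 bytes.

```julia
# Pack up to 7 bytes of `s` into a UInt64: the first byte of the string goes
# into the most significant byte, unused bytes stay zero, and the length goes
# into the lowest byte. `pack` is a hypothetical helper for this sketch only.
function pack(s::String)
    n = ncodeunits(s)
    n <= 7 || throw(ArgumentError("string too long for this sketch"))
    x = UInt64(n)
    for (i, b) in enumerate(codeunits(s))
        x |= UInt64(b) << (8 * (8 - i))
    end
    return x
end

words = ["pear", "apple", "fig", "banana", "app"]

# Sorting by the packed integer should agree with ordinary string sorting.
by_integer = sort(words; by = pack)
println(by_integer)
println(by_integer == sort(words))
```

Because the keys are plain unsigned integers, a sort that works on the raw bits (such as a radix sort) can be applied to them directly.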
At least the
julia> using InlineStrings, StaticStrings
julia> is = String15("hello\n")
"hello\n"
julia> css = CStaticString{16}("hello\n")
"hello\n"
julia> CStaticString(is)
"hello\n"
julia> String15(css)
"hello\n"
We could likely optimize the interface between the packages with a byte swap. One of my primary motivations for StaticStrings.jl is compatible storage with C's
julia> ccall(:printf, Cint, (Ptr{Nothing},), Ref(css))
hello
6
julia> ccall(:printf, Cint, (Ptr{Nothing},), Ref(is))
1
I know that the following works due to
julia> ccall(:printf, Cint, (Ptr{Cchar},), is)
hello
6
julia> Base.cconvert(Ptr{Cchar}, is) |> typeof
String
julia> @which Base.cconvert(Ptr{Cchar}, is)
cconvert(::Type{Ptr{Int8}}, s::AbstractString) in Base at pointer.jl:63
This is good though. The packages are optimized for distinct purposes, yet are interoperable.
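The fallback shown by `@which` above can be tried without either package. This is a small sketch using only Base: a `SubString` stands in for "some `AbstractString` that is not a `String`" (an assumption for illustration), and `strlen` from the C library is used in place of `printf` so the result is easy to check.

```julia
# A SubString is an AbstractString that is not a String.
sub = SubString("hello world", 1, 5)

# For a Ptr{Cchar} argument, Base converts an AbstractString to a String first.
converted = Base.cconvert(Ptr{Cchar}, sub)
println(typeof(converted))

# The same conversion happens implicitly inside ccall, so C sees a
# NUL-terminated copy of the five bytes "hello".
n = ccall(:strlen, Csize_t, (Ptr{Cchar},), sub)
println(Int(n))
```

The cost of this path is the copy into a fresh `String`, whereas passing `Ref(css)` hands C the stored bytes directly.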
I was looking at the source code here, and I was a little confused about why you chose to use a primitive type. Wouldn't it be simpler to work with a definition like: similar to StaticArrays.jl?
I'm guessing you already discussed/tried this option and found it problematic for some reason, but I couldn't find any information so I wanted to double-check.