
ByteVec: immediate / remote immutable byte vectors + intrinsics. #8964

Closed
wants to merge 29 commits

Conversation

StefanKarpinski
Member

I ended up going with @vtjnash's suggestion of representing this as an Int128 since that makes the code generation easier: you can cast the Int128 to various vector types like UInt8 x 16 or Int64 x 2, which are both handy for writing the bytevec_len and bytevec_ref intrinsics.

The code generation is already pretty decent, e.g.:

julia> s = Str("Hello")
"Hello"

julia> @code_native sizeof(s)
    .section    __TEXT,__text,regular,pure_instructions
Filename: bytes.jl
Source line: 64
    push    RBP
    mov RBP, RSP
Source line: 64
    mov RAX, QWORD PTR [RDI + 8]
    vmovq   XMM0, QWORD PTR [RAX + 16]
    vmovq   XMM1, QWORD PTR [RAX + 8]
    vpunpcklqdq XMM0, XMM1, XMM0 ## xmm0 = xmm1[0],xmm0[0]
    vpextrq RAX, XMM0, 1
    neg RAX
    vpextrb ECX, XMM0, 15
    test    CL, CL
    cmovns  RAX, RCX
    pop RBP
    ret

One area to investigate for optimization is making sure that the checks involved in loading a single byte can be hoisted out of loops. Ideally, instead of a loop with a branch inside it, we want a branch with a loop inside each path, but that's a trickier optimization than I think we're doing now. With that, code that iterates over strings could really fly.

Next steps are to implement bytevec_eq and bytevec_cmp intrinsics. I know these aren't strictly necessary, but I suspect that the built-in versions can be made very efficient.
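To make the layout concrete, here is a hedged C sketch of the two representations and the length computation that the assembly above implements. Field and function names here are illustrative, not the PR's exact definitions, and this assumes a 64-bit little-endian machine:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the 16-byte ByteVec: short vectors live
 * "here" (immediately, with the length in the top byte), long ones
 * "there" (data pointer plus negated length).  On little-endian
 * hardware the sign bit of the top byte distinguishes the cases. */
typedef union {
    struct {
        uint8_t data[15];
        int8_t  length;     /* >= 0 marks the immediate case */
    } here;
    struct {
        uint8_t *data;
        int64_t  neglen;    /* negated length: always < 0 */
    } there;
} bytevec_t;

/* Mirrors the generated code above: take the top byte; if it is
 * non-negative it is the length, otherwise negate the high word. */
static int64_t bytevec_len(const bytevec_t *b)
{
    return b->here.length >= 0 ? b->here.length
                               : -b->there.neglen;
}

/* 0-based element access, no bounds checking. */
static uint8_t bytevec_ref(const bytevec_t *b, int64_t i)
{
    return b->here.length >= 0 ? b->here.data[i] : b->there.data[i];
}
```

Because a negative `neglen` puts its sign bit exactly where `here.length` sits, one sign test on the top byte discriminates the two layouts, which is what the `test`/`cmovns` pair in the native code is doing.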

	} here;
	struct {
	    uint8_t *data;
	    long neglen;
Member

If you want a signed integer type that is big enough to hold the length of an arbitrary string (up to the factor of 2 lost by the sign bit), shouldn't this be intptr_t or ptrdiff_t? The long type always scares me because I'm never quite sure what it means for an arbitrary compiler.

Member Author

Given the rampant assumptions we've made, I suspect that size_t would do but, yes, long may be a bit sketchy.

@StefanKarpinski
Member Author

The first issue standing in the way of SIMD vectorization was that the getindex methods could throw exceptions. I've moved the bounds check into the bytevec_ref intrinsic and made it respect @inbounds, which makes the bounds-check-free code pretty lean. But there's still a branch in there, which seems unavoidable since one of the branches does a memory fetch that would be invalid in the other. As a result, vectorization does not seem to kick in here. What we need is an LLVM pass that can move the loop inside each side of the branch; then each loop could easily be vectorized separately.
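The transformation wanted here, loop unswitching, can be illustrated by hand in C. This is a hypothetical sketch, with `imm` standing in for the loop-invariant "bytes are immediate" test:

```c
#include <assert.h>
#include <stddef.h>

/* Before: one loop with a loop-invariant branch inside.  The branch
 * blocks vectorization because the two arms load different memory. */
long sum_branchy(const unsigned char *here, const unsigned char *there,
                 int imm, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += imm ? here[i] : there[i];
    return s;
}

/* After unswitching: a branch with a loop inside each path, so each
 * loop body is branch-free and can be vectorized independently. */
long sum_unswitched(const unsigned char *here, const unsigned char *there,
                    int imm, size_t n)
{
    long s = 0;
    if (imm)
        for (size_t i = 0; i < n; i++) s += here[i];
    else
        for (size_t i = 0; i < n; i++) s += there[i];
    return s;
}
```

The two functions compute the same result; the second form is what LLVM's loop-unswitch pass would ideally produce from the first, since `imm` never changes inside the loop.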

I noticed that the clever trick that I used to avoid a branch in
the fast path for bytevec_ref when the bytes are immediate was
causing a needless dependency on the index in the branch condition
even in the case with no bounds checking. This could, of course,
prevent certain optimization transforms, so I got rid of it.
@StefanKarpinski
Member Author

@JeffBezanson, @ArchRobison, @vtjnash, @Keno, @jakebolewski, do any of you have thoughts on coaxing LLVM into doing such a transformation? This is the current code for length(s::Str):

julia> @code_llvm length(s)

define i64 @"julia_length;41343"(%jl_value_t*) {
top:
  %1 = getelementptr inbounds %jl_value_t* %0, i64 1, i32 0, !dbg !1017
  %2 = load %jl_value_t** %1, align 8, !dbg !1017, !tbaa %jtbaa_immut
  %3 = getelementptr %jl_value_t* %2, i64 1, !dbg !1017
  %4 = bitcast %jl_value_t* %3 to i128*, !dbg !1017
  %5 = load i128* %4, align 8, !dbg !1017, !tbaa %jtbaa_immut, !julia_type !1020
  %6 = bitcast i128 %5 to <16 x i8>, !dbg !1017
  %7 = extractelement <16 x i8> %6, i32 15, !dbg !1017
  %8 = bitcast i128 %5 to <2 x i64>, !dbg !1017
  %9 = extractelement <2 x i64> %8, i32 1, !dbg !1017
  %10 = icmp slt i8 %7, 0, !dbg !1017
  %11 = sub i64 0, %9, !dbg !1017
  %12 = zext i8 %7 to i64, !dbg !1017
  %13 = select i1 %10, i64 %11, i64 %12, !dbg !1017
  %14 = icmp slt i64 %13, 1, !dbg !1017
  br i1 %14, label %L3, label %L.preheader, !dbg !1017

L.preheader:                                      ; preds = %top
  %15 = load i128* %4, align 8, !dbg !1017, !tbaa %jtbaa_immut, !julia_type !1020
  %16 = bitcast i128 %15 to <16 x i8>, !dbg !1017
  %17 = extractelement <16 x i8> %16, i32 15, !dbg !1017
  %18 = icmp sgt i8 %17, -1, !dbg !1017
  %19 = bitcast i128 %15 to <2 x i64>, !dbg !1021
  %20 = extractelement <2 x i64> %19, i32 1, !dbg !1021
  %21 = icmp slt i8 %17, 0, !dbg !1021
  %22 = sub i64 0, %20, !dbg !1021
  %23 = zext i8 %17 to i64, !dbg !1021
  %24 = select i1 %21, i64 %22, i64 %23, !dbg !1021
  %25 = extractelement <2 x i64> %19, i32 0, !dbg !1017
  br label %L, !dbg !1017

L:                                                ; preds = %L.preheader, %cont
  %"#s480.0" = phi i64 [ %33, %cont ], [ 1, %L.preheader ]
  %n.0 = phi i64 [ %37, %cont ], [ 0, %L.preheader ]
  %26 = add i64 %"#s480.0", -1, !dbg !1017
  br i1 %18, label %here, label %there, !dbg !1017

here:                                             ; preds = %L
  %27 = trunc i64 %26 to i32, !dbg !1017
  %28 = extractelement <16 x i8> %16, i32 %27, !dbg !1017
  br label %cont, !dbg !1017

there:                                            ; preds = %L
  %29 = add i64 %25, %26, !dbg !1017
  %30 = inttoptr i64 %29 to i8*, !dbg !1017
  %31 = load i8* %30, align 1, !dbg !1017
  br label %cont, !dbg !1017

cont:                                             ; preds = %there, %here
  %32 = phi i8 [ %28, %here ], [ %31, %there ], !dbg !1017, !julia_type !1022
  %33 = add i64 %"#s480.0", 1, !dbg !1017
  %34 = and i8 %32, -64, !dbg !1021, !julia_type !1022
  %35 = icmp ne i8 %34, -128, !dbg !1021
  %36 = zext i1 %35 to i64, !dbg !1021
  %37 = add i64 %36, %n.0, !dbg !1021
  %38 = icmp slt i64 %24, %33, !dbg !1021
  br i1 %38, label %L3, label %L, !dbg !1021

L3:                                               ; preds = %cont, %top
  %n.1 = phi i64 [ 0, %top ], [ %37, %cont ]
  ret i64 %n.1, !dbg !1023
}

The native code mirrors this branch structure but is less readable. This is pretty good, but it seems like it could be really tight if the two loops were separated and each was vectorized.

@ArchRobison
Contributor

I think the relevant LLVM pass is Loop Unswitch, and it's in Julia's pass list; I don't know why it didn't kick in, so we'll need to step through it to figure that out. It's likely a cost/benefit estimate issue, and there's a knob we can play with. Here's the LLVM 3.5.0 source for the knob:

// The specific value of 100 here was chosen based only on intuition and a
// few specific examples.
static cl::opt<unsigned>
Threshold("loop-unswitch-threshold", cl::desc("Max loop size to unswitch"),
          cl::init(100), cl::Hidden);

@StefanKarpinski
Member Author

Thanks, @ArchRobison. I will try to figure out how to step through that and see what it's doing. Or maybe just see if bumping up the threshold does it first. It's possible that it doesn't think that this transformation is worthwhile – and it may be right since branch prediction here may be perfect and thus this can execute quite fast.

	Value *lo_word = builder.CreateExtractElement(words, ConstantInt::get(T_int32, 0));
	Value *addr = builder.CreateAdd(lo_word, i);
	Value *ptr = builder.CreateIntToPtr(addr, T_pint8);
	Value *there_byte = builder.CreateLoad(ptr, false);
Contributor

This should have a tbaa_decorate(tbaa_user, ...) around it, if the load is always from user-modifiable storage. Check the hierarchy described in src/codegen.cpp, near the comment // type-based alias analysis nodes, for more details.

Member Author

Thanks for pointing that out – this memory should never change and so should be decorated as tbaa_const, I believe, but @JeffBezanson may have something to say about that. One of the major goals of this rewrite is to much more heavily leverage the fact that strings are immutable.

@StefanKarpinski
Member Author

@JeffBezanson, do I need to add t_func entries for these intrinsics in inference.jl? I tried it, but it doesn't seem to affect code generation at all (which leads me to wonder why so many intrinsics do have t_func entries).

We should maybe not even force this to go through Int32 since that's
not what the LLVM instructions need anyway. I worry that this may
cause weird extra ops that aren't really necessary.
This took some careful tweaking, but I managed to get this to generate
the same code I was trying to get an intrinsic to produce. This makes
me wonder if I shouldn't try to do the same thing with bytevec_eq and
maybe some of the other bytevec intrinsics.
It turns out I can actually generate better, slightly faster code
for the length of a ByteVec using bitshifts in Julia.
This one also turns out to be shorter and faster.
This speeds up s[i] significantly, but reveals that endof(s) is a
pretty significant bottleneck as it is.
@StefanKarpinski
Member Author

Ok, it seems that if I pepper src/codegen.cpp with FPM->add(createEarlyCSEPass()); in various places, I can get rid of that redundancy and significantly simplify this code:

index 47eeeb5..d93a39c 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -4662,8 +4662,10 @@ static void init_julia_llvm_env(Module *m)
     FPM->add(createLoopRotatePass());           // Rotate loops.
     // LoopRotate strips metadata from terminator, so run LowerSIMD afterwards
     FPM->add(createLowerSimdLoopPass());        // Annotate loop marked with "simdloop" as LLVM parallel loop
+    FPM->add(createEarlyCSEPass()); //// ****
+    FPM->add(createJumpThreadingPass());
     FPM->add(createLICMPass());                 // Hoist loop invariants
-    FPM->add(createLoopUnswitchPass());         // Unswitch loops.
+    FPM->add(createLoopUnswitchPass(500));      // Unswitch loops.
     // Subsequent passes not stripping metadata from terminator
 #ifndef INSTCOMBINE_BUG
     FPM->add(createInstructionCombiningPass());
@@ -4683,6 +4685,7 @@ static void init_julia_llvm_env(Module *m)
 #ifndef INSTCOMBINE_BUG
     FPM->add(createInstructionCombiningPass()); // Clean up after the unroller
 #endif
+    FPM->add(createEarlyCSEPass()); //// ****
     FPM->add(createGVNPass());                  // Remove redundancies
     //FPM->add(createMemCpyOptPass());            // Remove memcpy / form memset
     FPM->add(createSCCPPass());                 // Constant prop with SCCP
@@ -4699,6 +4702,7 @@ static void init_julia_llvm_env(Module *m)

     FPM->add(createAggressiveDCEPass());         // Delete dead instructions
     //FPM->add(createCFGSimplificationPass());     // Merge & remove BBs
+    FPM->add(createEarlyCSEPass()); //// ****

     FPM->doInitialization();
 }

Still haven't coaxed the loop unswitching into happening, but it's a bit closer...

@StefanKarpinski
Member Author

So the one that seems to have mattered is this:

diff --git a/src/codegen.cpp b/src/codegen.cpp
index 47eeeb5..0513916 100644
--- a/src/codegen.cpp
@@ -4663,6 +4663,7 @@ static void init_julia_llvm_env(Module *m)
     // LoopRotate strips metadata from terminator, so run LowerSIMD afterwards
     FPM->add(createLowerSimdLoopPass());        // Annotate loop marked with "simdloop" as LLVM parallel loop
     FPM->add(createLICMPass());                 // Hoist loop invariants
+    FPM->add(createEarlyCSEPass());
     FPM->add(createLoopUnswitchPass());         // Unswitch loops.
     // Subsequent passes not stripping metadata from terminator
 #ifndef INSTCOMBINE_BUG

Unfortunately, while the LLVM code for this looks nicer, it is about 2x slower. Sigh.

@StefanKarpinski
Member Author

Just kidding, that was a different data set. This change improves the code but has no measurable impact on performance. I'm still hoping that coaxing the loop to unswitch might have a positive impact on performance.

This doesn't cause loop unswitching to kick in but it does produce
cleaner code in some cases. The CSE pass could probably be placed
elsewhere but this spot seems to work well enough.
This isn't really sufficient to handle Latin-1 data smoothly since
most non-ASCII Latin-1 characters are not UTF-8 continuation bytes.
For that, you need to check if the decoded UTF-8 value is valid.
@ArchRobison
Contributor

I tried applying my PR #6271, and with -O and LLVM 3.5, it appeared to not generate the duplicate icmp and the code looked a little cleaner. But still not unswitched, and no significant performance impact.

The code difference arises from including BasicAliasAnalysisPass in the passes. I've noticed before that EarlyCSEPass is easily confused by loads unless BasicAliasAnalysisPass is in the pass list. But the intermediate pass dumper kept crashing for the combined PRs, so I can't be sure if that's the case here.

@StefanKarpinski
Member Author

Interesting – thanks for checking that out, @ArchRobison. I've decided that for now the best way forward is to just replace the data::Vector{UInt8} fields of UTF8String and ASCIIString with a ByteVec. That way ASCII decoding will remain as fast as it is now and UTF-8 decoding will only be 20% slower.

I used the sumchars example for simple benchmarking:

	function sumchars{S<:String}(a::Array{S})
	    t = Uint32(0)
	    @inbounds for s in a, c in s
	        t += Uint32(c)
	    end
	    return t
	end

With this change benchmarking with

	median([ @elapsed sumchars(words) for _ in 1:100 ])

I found the following performance characteristics:

	* Str vs. ASCIIString with    `@inbounds` – same speed
	* Str vs. ASCIIString without `@inbounds` – 14% slowdown
	* Str vs. UTF8String  with    `@inbounds` – 2.55x speedup
	* Str vs. UTF8String  without `@inbounds` – 75% speedup

This strikes me as good enough to replace both string types with a
single string type – the artist currently known as `Str`.
@stevengj
Member

I would prefer to just drop the ASCIIString/UTF8String distinction, which I thought Jeff had also suggested some time ago (see also #8872). This is currently one of the most painful and confusing parts of Julia string handling. Not only does it lead to type instability, but also one gets lots of inadvertently overtyped arrays, e.g. comprehensions yielding ASCIIString[...] where ByteString[...] was intended. Yes, iteration over characters is a bit faster if we know in advance that the data is ASCII, but how important is that operation, really?

Or is that planned as a separate PR after this one lands? I'm confused because this PR includes a Str type, and I'm not sure what that's for.

	{
	    jl_bytevec_struct_t b;
	    if (n < 2*sizeof(void*)) {
	        memcpy(b.here.data, data, n);
Member

If n == 2*sizeof(void*) - 1, then it looks like the string data cannot be NUL-terminated. Julia currently guarantees NUL-termination (in array.c) so that strings can be passed to external C code expecting NUL-terminated strings.

Or is the plan to implement this on top of ByteVecs, similar to how it is implemented for UTF-16 and UTF-32 strings (i.e. str.data is actually a bytevec of length 1 more than the length of the string, and contains an explicit NUL terminator)? This might be cleaner, although it is a subtle breaking change for any code that currently looks at the data field of strings. (We could rename all of the string-type data fields to data0 in order to make the breakage noisy.)
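The boundary case can be checked with a little capacity arithmetic. This is a sketch under the layout assumed in this review thread (16 immediate bytes on a 64-bit machine, one of them holding the length), with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Immediate form: 2*sizeof(void*) bytes total, the top byte holds
 * the length, leaving 15 data bytes on x86-64. */
#define IMMEDIATE_BYTES (2 * sizeof(void *))   /* 16 on x86-64 */
#define IMMEDIATE_DATA  (IMMEDIATE_BYTES - 1)  /* 15: top byte is the length */

/* n bytes go in the immediate form iff n < IMMEDIATE_BYTES, which is
 * the test in the snippet above ... */
static int fits_immediate(size_t n) { return n < IMMEDIATE_BYTES; }

/* ... but n data bytes plus a NUL terminator fit only when
 * n + 1 <= IMMEDIATE_DATA, i.e. n <= 14, so n == 15 is exactly the
 * case where NUL termination is lost. */
static int fits_with_nul(size_t n) { return n + 1 <= IMMEDIATE_DATA; }
```

So guaranteeing NUL termination in the immediate form would mean tightening the threshold by one byte, or taking the bytevec-of-length-n+1 route described above.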

@ScottPJones
Contributor

What about doing the following, more like Python 3:

- Standard strings are not NUL-terminated.
- Standard strings could be one of: ASCII, Unicode1 (really ANSI Latin-1), Unicode2 (i.e. UTF-16 but with no surrogate pairs, as Python does), or UTF-32. This makes lots of operations much faster (O(1) instead of O(n)) and generally saves a lot of space, since most strings will be representable as either ASCII or Unicode1.
- Conversion operations between the types don't require any checking of the contents: you can do very fast widening/narrowing between 1, 2, or 4 bytes per character (optimized to use operations that may work on up to 64 bytes at a time).
- Conversions to/from UTF-8 and UTF-16 (very important) will also be fast. In the very frequent case where a string is marked internally as ASCII, no conversion is required to go to UTF-8; for strings marked ASCII or Unicode1, going to UTF-16 or UTF-32 is just a widening operation; Unicode2 -> UTF-16 is a no-op. Only Unicode1 to UTF-8 requires a bit more work, but that can be made faster by checking chunks of bytes with a simple & operation, then converting 0x80-0xff to 0xC2 0x8/9x or 0xC3 0x8/9x while copying blocks without any high bits set unchanged.
- For conversions from UTF-8 to standard strings, my UTF-8 validator (written in Julia!) very quickly gets all the information needed to select one of the 4 internal representations and what sort of conversion operations are needed (e.g. if the UTF-8 string only has ASCII values, it can just be copied in). The same holds for UTF-16 and UTF-32: the validator gives the information needed to determine the smallest internal type and the number of characters to allocate.
- Substring operations, which I don't think are performant now (don't you have to create a new string object and tack on that \0, unless the substring goes to the end of the string?), become O(1) operations instead of the O(n) mess now for UTF8 & UTF16 strings.
- Even handling \0 at the end is not really much of an issue. Keep a flag in the internal representation that says whether the string has a trailing \0. [BTW, I very much like this ByteVec stuff and how it packs short strings; I think this approach would work well with ByteVec.] I assume string allocation rounds memory up to some 4-, 8-, or 16-byte boundary, correct? So in most cases you already have room to just stick a \0 byte or a 2/4-byte word at the end (a 75%, 87.5%, or 93.75% chance for ASCII/Unicode1; 50%/75%/87.5% for Unicode2; 0%/50%/75% for UTF-32). If there isn't room already, you could either clear the NUL-terminated flag or allocate an extra 4/8/16 bytes to store a 0. The Cstring/Cwstring types would of course still need to add a \0 for substrings that weren't already \0-terminated, but that would be pretty easy, checked by a simple flag in the substring type to see whether anything actually needs to be done.

You wouldn't get rid of UTF8String or UTF16String; they would still be available for people who need to convert. But string literals would not generate UTF8String; they would be one of the above 4 types (which include the current ASCIIString, and are all subtypes of DirectIndexString).

So: less memory, better performance all around... do you guys like it? I'd love to work with @StefanKarpinski and @JeffBezanson to make this happen and give Julia first-class string processing performance.
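The Latin-1 to UTF-8 widening step described above can be sketched in C (a hypothetical helper, not code from this PR): bytes below 0x80 copy through unchanged, while 0x80-0xFF become the two-byte sequences 0xC2/0xC3 plus a continuation byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n bytes of Latin-1 to UTF-8.  `out` must have room for up
 * to 2*n bytes; returns the number of UTF-8 bytes written. */
size_t latin1_to_utf8(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = in[i];
        if (b < 0x80) {
            out[j++] = b;                  /* ASCII: copy unchanged */
        } else {
            out[j++] = 0xC0 | (b >> 6);    /* lead byte: 0xC2 or 0xC3 */
            out[j++] = 0x80 | (b & 0x3F);  /* continuation byte */
        }
    }
    return j;
}
```

A block-at-a-time fast path, as suggested above, would OR whole words together and take the byte-for-byte copy whenever no high bit is set in the chunk.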

@catawbasam
Contributor

I think this sounds interesting. DirectIndexes are nice and simple, and operations that require working back from the end of a string are a little painful with UTF-8.

Unicode 2 is UCS2, is that right? If so, could we call it that? Some folks might still require a full implementation of UTF-16.

Presumably file and stream IO would still generally need to be UTF-8.

You've probably seen the very nice work done to support unicode characters like math symbols, e.g. in the REPL and in IJulia. How would your strings play there? Some folks also seem to like emojis; even pizza, ahem.

Presumably you're getting accustomed to thinking about type stability as you write Julia. It might be worth thinking through how that would work with your approach.

Look forward to seeing more!

@ScottPJones
Contributor

Yes, Unicode2 could be considered UCS-2, you just have to remember that UCS-2 doesn't allow the characters between 0xd800 and 0xdfff. (I didn't call it that on purpose, because many people confuse UCS-2 with UTF-16). Unicode1 is also ANSI Latin1.... I wanted to indicate that it was really just a 1-byte subset of Unicode, as Unicode2 (UCS-2) is a 2-byte subset of Unicode.
File and stream I/O - it depends, that is really usually whatever encoding you need... UTF-8 is very common for the web (I think the most common now, over 50%), but a lot is ANSI Latin1 or CP1252...
I/O is really a separate issue from the internal encoding of strings...
The math symbols are all in the BMP, there are a few simple emoticons in the BMP, and emoji are all non-BMP, so they would be encoded with UTF-32.
The emoji take 4 bytes in all encodings, currently, or with my scheme...

I believe that the experience with Python 3 was that this scheme saved space, and greatly improved performance...

Note: I'm not talking about removing the UTF8String or UTF16String types, just not using the combination of ASCIIString & UTF8String to encode string literals, and instead only use DirectIndexString types, i.e. ASCIIString, Latin1String, UCS2String, and UTF32String.
(do you like Latin1String & UCS2String better?)

Yes - I think there is probably a bit less problem with type stability...
currently, a string literal can be either:
ASCIIString, which is <: DirectIndexString <: AbstractString, or UTF8String, which is <: AbstractString.
With my scheme, all string literals are of types <: DirectIndexString.

@nalimilan
Member

@ScottPJones I'm not sure this is the best place to discuss this. Better open a new thread.

Anyway, my two cents: the default string type must be able to handle all Unicode chars, to avoid the current situation where you end up with an ASCIIString or a UTF8String depending on its contents (which is annoying because e.g. a concretely-typed array of strings is only able to contain one of those types). So it cannot be ASCII nor Latin-1 (no idea why you call that "Unicode1"...), nor even a more complete Unicode subset like UCS-2. Thus, if you want a fixed-width encoding, all you can use is UTF-32, which leads to a waste of memory for most cases. This is why UTF-8, despite its complexity, is such a compelling choice.

That said, all kinds of custom string types which are more efficient in specific use cases can get first-class support in Julia. I don't see why you care so much about the type used by string literals.

@ScottPJones
Contributor

@nalimilan Happy to open a new thread - but I'm rather new to GitHub - do you mean make a new issue? I'll explain my reasoning to you then.

@ScottPJones
Contributor

@nalimilan Just one thing though - do you insist on only a single number type? That's where your reasoning leads...

@pao
Member

pao commented May 1, 2015

Yes, please open a new issue.

(I don't think that's a valid reductio ad absurdum; there are only two default number types in Julia as it is. It doesn't preclude the existence or use of other types.)

@ScottPJones
Contributor

@pao, I'm sorry, but that's not at all what I've seen in Julia, and what is worse, the default numeric types are not even consistent in their behavior!
There are two default string types from literals, and seven! from numeric types...

	typeof(0x0) -> UInt8
	typeof(0xfff) -> UInt16
	typeof(0xfffff) -> UInt32
	typeof(0xfffffffff) -> UInt64
	typeof(0xffffffffffffff) -> UInt64
	typeof(0xfffffffffffffffffff) -> UInt128
	typeof(0xfffffffffffffffffffffffffffffff) -> UInt128
	typeof(0xffffffffffffffffffffffffffffffffffffffff) -> Base.GMP.BigInt
	typeof(0) -> Int64
	typeof(123412341234213423) -> Int64
	typeof(12341234123421342312341234234) -> Int128
	typeof(1234123412342134231234123423412341234123423412342134) -> Base.GMP.BigInt
	typeof(123.0) -> Float64
	typeof(123123412342134234.0) -> Float64
	typeof(1231234123421342312341234123412344.0) -> Float64
	typeof(123123412342134231234123412341234124123412344.0) -> Float64
	typeof(123123412342134231234123412341234124123412344124123432.0) -> Float64

	typeof("abc") -> ASCIIString
	typeof("\uff") -> UTF8String
	typeof("\uffff") -> UTF8String
	typeof("\U10ffff") -> UTF8String

@ScottPJones
Contributor

Oh, and by the way, I think those inconsistencies can lead to hard to spot bugs...

	~0 -> -1
	~0x0 -> 0xff
	~0x00 -> 0xff
	~0x000 -> 0xffff
	~0x0000 -> 0xffff
	~0x00000 -> 0xffffffff
	~0x000000000 -> 0xffffffffffffffff
	~0x00000000000000000 -> 0xffffffffffffffffffffffffffffffff

and the last two cases are really fun!

	~0x00000000000000000000000000000000 -> 0xffffffffffffffffffffffffffffffff
	~0x000000000000000000000000000000000 -> -1

@nalimilan
Member

@ScottPJones Please, move this to yet another issue or mailing list thread. This is completely unrelated.

@ScottPJones
Contributor

@nalimilan Was that the right way? Thanks!

@StefanKarpinski
Member Author

0x000000000000000000000000000000000 should probably be a syntax error.
