-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bitfields #1616
Bitfields #1616
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1616 +/- ##
==========================================
+ Coverage 64.44% 64.44% +<.01%
==========================================
Files 1002 1002
Lines 326730 326852 +122
Branches 13202 13216 +14
==========================================
+ Hits 210550 210656 +106
- Misses 113321 113331 +10
- Partials 2859 2865 +6
|
0789834
to
a280e5c
Compare
Now essentially complete and ready for review. |
How is it possible that merging this reduces global code coverage by about 14 percent? -- I would guess that adding such little feature shouldn't change code coverage in any significant way(?) |
@Stefan-Kohl https://codecov.io/gh/gap-system/gap/pull/1616 reports coverage at 64.01% - I think the comment above stays from some earlier test which failed (so less code covered) and then codecov failed to update that comment by some reason. |
The coverage data above is indeed meaningless, as some of the tests failed. Ultimately, this is still a bug in codecov (it should report something like "unable to generate coverage percentage" or so). But one we can easily recognize and live with. |
Yes, maskset.tst fails with
|
lib/masksets.gd
Outdated
## This function sets up the machinery for a set of bitfields of the given | ||
## widths. The total of the widths must not exceed 60 bits on 64-bit architecture | ||
## or 28 bits on a 32-bit architecture and the number of widths must not exceed | ||
## 10 on a 64-bit aerchitecture or 6 on a 32-bit architecture. Also for performance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo "aerchitecture"
lib/masksets.gd
Outdated
## or 28 bits on a 32-bit architecture and the number of widths must not exceed | ||
## 10 on a 64-bit aerchitecture or 6 on a 32-bit architecture. Also for performance | ||
## reasons some checks that one might wish to do are ommitted. In particular, | ||
## the builder and setter functions do not check if the value[s] passed to them are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uneven offset of the comment text
lib/masksets.gd
Outdated
## the builder and setter functions do not check if the value[s] passed to them are | ||
## negative or too large. | ||
## | ||
## You can tell which architecture you are running on by acccessing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"which architecture" -> "which type architecture" (or "kind" of whatever)
lib/masksets.gd
Outdated
## | ||
## <Description> | ||
## | ||
## This function sets up the machinery for a set of bitfields of the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why not call this a "bitfield" then? The name is well-known, at least for people who know C/C++. Whereas "MaskSet" is not (and I don't think I would have guessed what it means here).
lib/masksets.gd
Outdated
## one of the fields from an immediate integer</Item> | ||
## <Mark><C>setters</C></Mark> <Item> a list of <M>n</M> functions of two arguments each of which returns the packed value | ||
## in which one of the fields has been replaced by the given value | ||
## note that this does NOT modify the immediate integer.</Item> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not immediately clear what "the" immediate integer here is. Perhaps say: "... a list of n functions, each taking two arguments, and ; the return value of the i-th setter is a new packed value, with the value of the i-th field set to ."
All in all, I think it would be a good idea to pick a specific set of terms for the various "parts" involved here, and stick to them. E.g. referring to the same thing sometimes as "immediate integer" and sometimes as "packed value" is confusing. I'd explain the connection once, then stick to one term.
src/intfuncs.c
Outdated
static Obj doMaskGetter(Obj self, Obj data) { | ||
UInt mask = NLOC_FUNC(self); | ||
UInt start = get_start(mask); | ||
UInt len = get_len(mask); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funky formatting here.
Perhaps you might be willing to consider using clang-format
?
lib/masksets.gd
Outdated
## 10 on a 64-bit aerchitecture or 6 on a 32-bit architecture. Also for performance | ||
## reasons some checks that one might wish to do are ommitted. In particular, | ||
## the builder and setter functions do not check if the value[s] passed to them are | ||
## negative or too large. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... what happens then? Is the value cut off? Are bits outside of the fields modified?
At the very least say: "If the caller does this, the result is completely undefined"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: I think this implicitly says: "the field values are non-negative values", but this is never said explicitly (or am I just overlooking it?)
src/intfuncs.c
Outdated
|
||
/* | ||
* For the builders we also repurpose BODY_FUNC to hold a GAP integer telling | ||
* us how many fields there are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"also"? As in: something else was "also" repurposed? But what?
It would be really nice if there was a documentation comment before this code which gave at least a brief overview of what is stored where and why...
src/intfuncs.c
Outdated
*/ | ||
|
||
#ifdef SYS_IS_64_BIT | ||
#define LENSTART_WIDTH 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These days, we try to use enums for constants like this. They give better compiler diagnostics, and one is likely to shoot oneself into the foot, e.g. with #define VALUE 1+2 ... x = VALUE * 2;
.
src/intfuncs.c
Outdated
#else | ||
#define LENSTART_WIDTH 8 | ||
#define WIDTH_WIDTH 5 | ||
#define MAX_FIELDS 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are there at most 6 fields? I have not yet tried to understand the code after this, but I just looked at it, and it seems quite complicated... But why is that necessary? In my imagination, it would be sufficient to do this: For each field, the getter/setter need exactly two things: the offset of the field, and a mask (or alternatively, the number of bits in the fields, but I think the mask is more efficient).
The mask takes up one Obj, the offset another (but the offset fits in a single byte, so it might be easy to stash it somewhere else). If we are really concerned about space usage, we can store an offset and field with in 6 bit each.
Assuming we stored the mask and offset, the generic setter andgetter function would look like this (in untested pseudo code, but I hope it's enough to get the basic idea across); note that this does not use INTOBJ_INT or INT_INTOBJ at, because if we setup the mask and offset suitably, they are not necessary.
Obj SetterFunc(Obj self, Obj packedValue, Obj fieldValue)
{
GAP_ASSERT(IS_INTOBJ(packedValue));
GAP_ASSERT(IS_INTOBJ(fieldValue));
UInt mask = extractMaskFromFunc(self);
UInt offset = extractOffsetFromFunc(self);
UInt v = ((UInt)fieldValue ^ 1) << offset;
UInt newVal = ((((UInt)packedValue) & ~mask) | v;
return (Obj)newVal;
}
Obj GetterFunc(Obj self, Obj packedValue)
{
GAP_ASSERT(IS_INTOBJ(packedValue));
UInt mask = extractMaskFromFunc(self);
UInt offset = extractOffsetFromFunc(self);
UInt newVal = ((((UInt)packedValue) & mask) >> offset) | 1;
return (Obj)newVal;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have plenty of fields to store things in: ENVI_FUNC, NLOC_FUNC, BODY_FUNC, FEXS_FUNC
Renamed to bitfields and improved comments and documentation. Still a few points to deal with. |
I think I've addressed everything from @fingolfin's verty helpful review except one point. This concerns how the getter and setter functions store and use the information that tells them about the specific bitfield they are getting or setting. I use a single word which stores the offset (low 8 bits) and length (rest) of the field. This is compatible with the BEXTR instruction, which we, or a very smart compiler could use for the getter. @fingolfin suggests using two words (which we have available) one to store the offset and the other a mask to select out the desired bits. This makes the C getter and setter code a bit simpler, and probably also the generated assembler, but doesn't fit with BEXTR and does involve accessing one extra word of memory, although probably not a new cache line. I'm not sure which approach is better. |
Do you already have a variant of your AVL tree code that uses these bitfields (as a showcase if nothing else)? It seems to be the time for me to do some nitpicking: I think every other place in the GAP kernel uses a capital |
I have a variant of the union-find structure. Runs about as fast and uses half the memory Look at https://github.com/stevelinton/datastructures/tree/union-find (in gap/union-find2.gi and in src/uf.c in the functions with 2 in their names. |
ffecdb9
to
24b80a5
Compare
@markuspf I've now got my AVL tree code with threading etc. working as far as adding and iterating over the tree. I haven't attacked delete yet, or done the kernel accelerators. The main conclusion for the bitfields is that we really want support for Boolean fields -- that is one bit fields whose getters and setters deal with True and False instead of 1 and 0. Shouldn't make any performance difference worth the mention, and it will make a lot of code more readable. |
The test failure is due to the the manual example not being "clean" |
I have some doubts about focusing on I also don't understand why there is this limitation on 6 resp. 10 fields -- at least with the approach I suggested, there is absolutely no reason for that, so if you want to have 60 1-bit fields on 64bit archs, you can have them. |
Aha, I guess the limitation comes from the desire to have a "builder" function which creates a single "bitfield" given a sequence of field values. I somehow missed this when reviewing the previous version of this PR (or perhaps it was not yet documented?) This all still feels like it is some highly specialized code, only of interest for relatively low-level work. Is it right really that useful beyond the datastructures package? Perhaps it should be in there? Perhaps it is useful in general, I just have hard time seeing it; to me, it would be much more attractive if GAP had bit manipulation functions which allow users to XOR, OR, AND, NOT immediate ints (and perhaps do bit test/set/clear). Of course that may be somewhat slower than this highly dedicated code for your applications; but OTOH, it is also vastly more general, and easy to exaplain to most programmers. |
For the union find code, builders seem to be only used in constructors (so not time critical), so at least there, a more naive implementation which takes a list of widths (which can be of arbitrary width), and builds up everything from it should be fine. This could be implemented as follows: have a kernel function which takes a list of width, and list of field values (they should be of the same length), and uses them to build the desired immediate values. Then, in MakeBitfield := function(arg)
local bf;
bf := _kernel_MakeBitfield(arg);
bf.builder := function(values...) return _kernel_build(bf.widths, values); end
return bf;
end; |
OK. I'm persuaded to change the data in the function header to mask and offset, and yes, I find myself using builders much less often than I expected (in my avl tree code under development they are used once when a new node is added. |
On the question of generality. Adding all the necessary operations to the language would probably be better, but a lot more work. Also you'd end up wanting to write getter and setter functions for any non-trivial cases anyway and then you're paying GAP function calling overhead for something like get_subtree_size := word -> word & 0xFFFFFFFFFFFFFF0 >> 4; whereas with the code I have you essentially get that function directly and it runs is 10-20 ns. I don't really know how wide an application it will have, but there are plenty of bits of code in the library that use whole words for things that only have a few possible values in large mathematical data structures. |
6cdbc75
to
e0f26d9
Compare
e0f26d9
to
3eed2bf
Compare
This version uses the mask and offset setup and gets rid of the limit on the number of fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now -- but it better should be, as it feels I kind of bullied Steve into them :/. Anyway...
My remaining remarks all are unimportant quibbles about spaces and indentions, and could be ignored.
src/intfuncs.c
Outdated
static inline void SET_OFFFSET_BITFIELD_FUNC(Obj func, UInt offset) | ||
{ | ||
return SET_FEXS_FUNC(func, INTOBJ_INT(offset)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove empty line?
src/intfuncs.c
Outdated
{ | ||
UInt mask = MASK_BITFIELD_FUNC(self); | ||
UInt offset = OFFSET_BITFIELD_FUNC(self); | ||
if (!ARE_INTOBJS(data, val)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the differing indentation here and elsewhere?
(BTW, if you have it installed, you can use git clang-format
to automatically format your commits)
src/intfuncs.c
Outdated
static Obj FuncBUILD_BITFIELDS(Obj self, Obj args) | ||
{ | ||
GAP_ASSERT(IS_PLIST(args)); | ||
GAP_ASSERT(LEN_PLIST(args) >= 1 && ELM_PLIST(args,1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra space before &&
?
src/intfuncs.c
Outdated
UInt i; | ||
for (i = nfields; i > 0; i--) { | ||
GAP_ASSERT(ISB_LIST(widths, i)); | ||
Obj y = ELM_LIST(widths,i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insert space after comma?
src/intfuncs.c
Outdated
Obj y = ELM_LIST(widths,i); | ||
GAP_ASSERT(IS_INTOBJ(y)); | ||
x <<= INT_INTOBJ(y); | ||
GAP_ASSERT(ELM_PLIST(args, i+1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That GAP_ASSERT
is somewhat redundant, as the IS_INTOBJ
test later on will simply fail in this case. But of course it doesn't hurt to have here.
src/intfuncs.c
Outdated
Obj bgetters = NEW_PLIST(T_PLIST + IMMUTABLE, nfields); | ||
for (UInt i = 1; i <= nfields; i++) { | ||
UInt mask = (1L << starts[i]) - (1L << starts[i - 1]); | ||
Obj s = NewFunctionCT(T_FUNCTION, SIZE_FUNC, "<field setter>", 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also just NewFunctionC
here and elsewhere (makes it slightly easier to read the code, IMO, but that's perhaps taste)
tst/testinstall/bitfields.tst
Outdated
@@ -0,0 +1,47 @@ | |||
gap> START_TEST("bitfields.tst"); | |||
|
|||
# Test correct behaviour for a variety of numbers of fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove extra space before "numbers"?
tst/testinstall/bitfields.tst
Outdated
> Print("Bad return from getter",i," ",j," ",x,"\n"); | ||
> fi; od; | ||
> if bf.booleanGetters[1](x) <> (vals[2] = 1) then | ||
> Print("Bad return from Boolean Getter"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is "Boolean Getter", below it is "boolean setter" -- why the differing case?
Still would be kind of nice if @markuspf or @ChrisJefferson could have a quick look at this, too. |
Bitfields provide a fast (by GAP standards) and tidy way of viewing an immediate integer as a set of bitfields. Handy mainly for compact data structures.
3eed2bf
to
951edc6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only minor thing is of course now that one would want a file in dev/Updates to reflect this change, but maybe I should just hit "merge" for now to not drag this on.
This is work in progress on an implementation of fast access to bitfields within small integers.
It seems to work, but is not thoroughly tested. As currently committed it needs a CPU with BEXTR (Haswell or later). I'll set up autoconf for that, of course.
Performance seems pretty decent. My laptop does a loop with 200 million gets in about 1.8s and 200 million sets in about 2.5s.
Relates to issue #1611