Instruction formats #530
Comments
Makes sense.
By this I suppose you want to be able to have a nonzero initial value for the counter in the marshal form?
Like this?
struct {
    char *instr_format;
} opcode_format[256] = {
    [NOP] = {"I_"},
    ...
    [LOAD_CLOSURE] = {"IB"},
    ...
};
PS. I cleaned up some backticks in your post. |
Why special-case the counter (first cache entry)? Is it so it can be included in the marshal format? The |
I could easily add this to python/cpython#100735, assuming the info is only needed by the compiler. |
The formats should be an enum for speed and compactness, but use the above format in the names for readability.
enum {
    OPCODE_FORMAT_I_,
    OPCODE_FORMAT_IB,
    ...
};
uint8_t opcode_format[256] = {
    [NOP] = OPCODE_FORMAT_I_,
    ...
    [LOAD_CLOSURE] = OPCODE_FORMAT_IB,
    ...
}; |
Yes, for marshal. There is no data, but marshal needs to initialize it to
Endianness shouldn't be a problem, just store the top bits first. |
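For illustration, a minimal sketch of the "store the top bits first" idea. The helper names `put_u16_be` and `get_u16_be` are hypothetical and are not taken from marshal.c; the point is only that writing the high byte first gives the same byte stream on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write a 16-bit value into buf, high byte first,
 * so the serialized bytes do not depend on host endianness. */
static void put_u16_be(uint8_t *buf, uint16_t value)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(value >> 8);     /* top bits first */
    buf[1] = (uint8_t)(value & 0xFF);   /* then the low bits */
}

/* Hypothetical helper: read back a value written by put_u16_be. */
static uint16_t get_u16_be(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint16_t)(((unsigned)buf[0] << 8) | buf[1]);
}
```

Because both helpers work on individual bytes with shifts, a value written on a little-endian machine reads back unchanged on a big-endian one.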
Okay, sounds good. I will put everything in the same array of opcode metadata, and marshal.c can |
So there's a potentially endless amount of variation in this. |
Another (minor) thing is that the cases generator currently doesn't read opcode.py, so it doesn't know which opcodes have an oparg ( |
Maybe the generated code could initialise the format conditionally:
|
I'd rather add syntax to the DSL so we can make bytecodes.c the source of truth and start generating opcode.py from it. In any case it can wait. E.g.
Inside the |
Yes. Please use an enum, we can't switch on strings. I wouldn't worry too much about the number of different formats. The generator can rely on |
Okay, thanks for the guidance. Will work on that. |
We have an enum as of GH-100895.
That's only 6 variants, but it doesn't use The longest appears to be 6 bytes but that's also incorrect -- for legacy instructions it doesn't know how much cache there is. So, caveat emptor, but now we can at least address the remaining issues incrementally. |
There are now 8 variants:
At Brandt's recommendation I changed from using I also have a proposal for how to do the instruction decoding and arg extension; see #540 (and a PR linked from there, #539). Maybe we can prototype this using a legacy instruction? (But which one???) |
I'm having fun combining LOAD_CONST and MAKE_FUNCTION into MAKE_FUNCTION_FROM_CODE. |
This was completed a while ago |
Currently we have one instruction format, plus caches.
For instrumentation, I want to combine test-and-branch instructions, e.g. COMPARE_OP; POP_JUMP_IF_FALSE would become COMPARE_AND_BRANCH.
For the register interpreter we want to have instructions with up to 4 operands, but not waste space for instructions with fewer operands.
We also want 16 bit values in the cache, which marshal does not currently support, so we need a wasteful quickening step for all code, even if it is run only once.
Changes needed
The format of an instruction is already described in bytecodes.c. The interpreter generator should output a table mapping the opcode of an instruction to its format.
Marshalling needs to know about 16 bit values, and caches. This is probably the largest change.
See python/cpython#99555
Generated code already knows the length of the instruction, so there is no change there.
The bytecode compiler, particularly the assembler, will need to understand formats, so that it emits the correct format. write_instr and computing jump offsets will get more complex, but the rest of the compiler should be unchanged.
What formats do we need?
Currently there is only one format, but with some instructions having caches.
If we include caches in the format, there are 6 formats, with cache sizes of 0, 1, 2, 4, 5 and 9.
I would like to add 16 bit operands as well, and we will need between zero and three 8 bit operands.
Expressing formats.
I: the instruction (opcode)
B: 8 bit operand
_: unused 8 bits (UPDATE: changed to X)
H: 16 bit (one code unit) operand
C: 16 bit first cache entry (the counter)
0: zeroed 16 bit entry
Existing examples:
RETURN_VALUE: I_ (UPDATE: IX)
LOAD_FAST: IB
LOAD_ATTR: IBC00000000
Hypothetical examples:
COMPARE_AND_BRANCH: IBHC0
BINARY_OP: IBBBHC
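To make the notation concrete, here is a rough sketch that computes how many 16 bit code units a format string occupies. The function name `format_code_units` is hypothetical, and the rule that 16 bit fields (H, C, 0) must start on a code unit boundary is my assumption rather than something stated above. It accepts both `_` and `X` for unused bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size of an instruction format in 16-bit code units.
 * Returns -1 for an unknown letter, a misaligned 16-bit field, or a format
 * that does not fill a whole number of code units.
 * Assumption: H, C and 0 must begin on a code unit boundary. */
static int format_code_units(const char *fmt)
{
    assert(fmt != NULL);
    int bytes = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        switch (*p) {
        case 'I': case 'B': case '_': case 'X':
            bytes += 1;                 /* 8-bit fields */
            break;
        case 'H': case 'C': case '0':
            if (bytes % 2 != 0) {
                return -1;              /* 16-bit field is misaligned */
            }
            bytes += 2;                 /* 16-bit fields */
            break;
        default:
            return -1;                  /* unknown format letter */
        }
    }
    return (bytes % 2 == 0) ? bytes / 2 : -1;
}
```

Read this way, LOAD_ATTR (IBC00000000) takes 10 code units, one for the instruction and nine for the cache, and both hypothetical formats (IBHC0 and IBBBHC) take 4.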
Generating all formats as an enum will ensure that we get a compiler warning for any switch that misses a case.
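A small sketch of what that buys us. The enum members and the function `format_cache_entries` below are made up for illustration (the real list would be generated). The switch has no default label, so GCC and Clang's -Wswitch diagnostic reports any enumerator that is not handled.

```c
#include <assert.h>

/* Hypothetical subset of generated formats, named after their layout. */
typedef enum {
    OPCODE_FORMAT_IX,
    OPCODE_FORMAT_IB,
    OPCODE_FORMAT_IBC,
} opcode_format_t;

/* Hypothetical helper: number of 16-bit cache entries in a format.
 * There is deliberately no default label: adding a new enumerator without
 * a matching case makes -Wswitch warn here. */
static int format_cache_entries(opcode_format_t fmt)
{
    switch (fmt) {
    case OPCODE_FORMAT_IX:
        return 0;
    case OPCODE_FORMAT_IB:
        return 0;
    case OPCODE_FORMAT_IBC:
        return 1;
    }
    assert(0 && "unreachable: invalid opcode_format_t value");
    return -1;
}
```

With strings there is nothing for the compiler to check, which is the other reason to prefer the enum.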