Introduce codegen IR #94
…; no need for the dependency on the state interface here.
Nodes here currently correspond to FSM states, although that need not be true in the future, especially if identifying parallel walks becomes an option. The intention is for decisions about code generation to be made explicit in this IR, so that the code generation outputs things roughly as given. This way, I hope various optimisations can be shared across multiple output languages, and that the code output parts become simpler.
This explicitly states ranges for erroring, as opposed to leaving them to a `default:` clause. Then the `default:` clause can be used for a dominant mode to transition to another state. The main situation I have in mind is for regexps like `/[^abc]/`, where writing out every matching symbol is a lot more cumbersome than writing out every symbol which doesn't match. Thus we can generate code like:

```
; ./build/bin/re -plc '[^abc][xyz]'
int
fsm_main(int (*fsm_getc)(void *opaque), void *opaque)
{
	int c;

	assert(fsm_getc != NULL);

	enum {
		S0, S1, S2
	} state;

	state = S0;

	while (c = fsm_getc(opaque), c != EOF) {
		switch (state) {
		case S0: /* start */
			switch ((unsigned char) c) {
			case 'a':
			case 'b':
			case 'c': return TOK_UNKNOWN;
			default: state = S1; break;
			}
			break;

		case S1: /* e.g. "d" */
			switch ((unsigned char) c) {
			case 'x':
			case 'y':
			case 'z': state = S2; break;
			default: return TOK_UNKNOWN;
			}
			break;

		case S2: /* e.g. "dx" */
			return TOK_UNKNOWN;

		default:
			; /* unreached */
		}
	}

	/* end states */
	switch (state) {
	case S2: return 0x1; /* "[^abc][xyz]" */
	default: return EOF; /* unexpected EOF */
	}
}
```
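As a rough illustration of the "dominant mode" idea, here is a minimal sketch that picks the destination shared by the most input bytes for a single state. The names `dominant_mode`, `SYMBOLS` and `NO_STATE` are hypothetical and not taken from this PR; the sketch assumes each state has exactly one destination (or an error) per byte.

```c
#include <assert.h>
#include <stddef.h>

#define SYMBOLS  256
#define NO_STATE (-1) /* hypothetical marker: no transition, i.e. an error */

/*
 * Hypothetical helper: given one destination per input byte, return the
 * destination shared by the most bytes, and store how many bytes share it
 * in *freq. Ties go to the destination of the lowest byte. The quadratic
 * scan is fine for 256 symbols and keeps the sketch short.
 */
static int
dominant_mode(const int dst[SYMBOLS], size_t *freq)
{
	size_t best = 0;
	int mode = NO_STATE;
	int i, j;

	assert(dst != NULL);
	assert(freq != NULL);

	for (i = 0; i < SYMBOLS; i++) {
		size_t n = 0;

		for (j = 0; j < SYMBOLS; j++) {
			if (dst[j] == dst[i]) {
				n++;
			}
		}

		if (n > best) {
			best = n;
			mode = dst[i];
		}
	}

	*freq = best;

	return mode;
}
```

For the start state of `[^abc][xyz]` the mode would be the transition to S1 (253 bytes), so only `'a'`, `'b'` and `'c'` need to be written out; for S1 the mode is the error, so only `'x'`, `'y'` and `'z'` are written out.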
*/

enum ir_strategy {
	IR_NONE = 1 << 0,
Based on the comment below, is this simultaneously being used as a type tag for the `struct ir_state` union, but also as a set of allowed strategies in `make_ir`?
No, just the former. I was originally thinking of adding a mask of which strategies to allow, but I tried that and didn't like it much, and currently I think it'd make more sense to have options like "always make a table" set in `struct fsm_options` instead.
#include "lx/ast.h"
#include "lx/print.h"

/* XXX: abstraction */
int
-fsm_print_cfrag(FILE *f, const struct fsm *fsm,
+fsm_print_cfrag(FILE *f, const struct ir *ir, const struct fsm_options *opt,
Once it's working with the IR, what is `struct fsm_options *opt` still needed for? Could that be stored within the IR instead?
For various rendering options, like "always hex". I could perhaps duplicate those into an equivalent `struct ir_options` containing just the relevant subset. I actually tried splitting the .c files here such that they don't include any of the FSM structs at all, but I decided there wasn't any benefit, since this is all internal anyway.
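For a sense of what a rendering-only option like "always hex" might control, here is a small sketch. `struct ex_render_options`, its `always_hex` field and `ex_render_char` are hypothetical names for illustration, not the options structure used by libfsm; the point is that the option changes how a symbol is spelled, not which transitions exist.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical subset of rendering options; the field name is an assumption. */
struct ex_render_options {
	int always_hex; /* render every symbol as a hex escape */
};

/*
 * Write a C character literal for c into buf.
 * Returns whatever snprintf returns (the length that was needed).
 */
static int
ex_render_char(char *buf, size_t size, unsigned char c,
	const struct ex_render_options *opt)
{
	assert(buf != NULL);
	assert(opt != NULL);

	/* quote and backslash would need escaping, so fall through to hex */
	if (!opt->always_hex && isprint(c) && c != '\'' && c != '\\') {
		return snprintf(buf, size, "'%c'", c);
	}

	return snprintf(buf, size, "'\\x%02x'", (unsigned) c);
}
```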
This PR introduces an IR data structure for libfsm, which sits after the DFA and before the code generation output.
The main idea is that the decisions for code generation are done here, rather than when printing code. This leaves the code generation to simply walk the IR, printing what's there, without mixing in logic. There are still a few choices which might be made during the printing of code, but those ought to be cosmetic only.
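A minimal sketch of what "walk the IR, printing what's there" could look like for one node follows. The `ex_edge` and `ex_node` structures and `ex_print_node` are hypothetical simplifications, not the structures from this PR; in particular this sketch assumes every listed symbol is a printable character that needs no escaping, and it does not model explicitly listed error ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified IR; not the structures from this PR. */
struct ex_edge {
	char sym;    /* input symbol, assumed printable */
	unsigned to; /* destination node */
};

struct ex_node {
	const struct ex_edge *edges; /* explicitly listed transitions */
	size_t n;
	int has_mode;  /* decided when the IR was built, not while printing */
	unsigned mode; /* destination for the default: clause, if has_mode */
};

/* Print one node as a case of the outer switch. No decisions are made here. */
static void
ex_print_node(FILE *f, unsigned id, const struct ex_node *node)
{
	size_t i;

	assert(f != NULL);
	assert(node != NULL);

	fprintf(f, "case S%u:\n", id);
	fprintf(f, "\tswitch ((unsigned char) c) {\n");

	for (i = 0; i < node->n; i++) {
		fprintf(f, "\tcase '%c': state = S%u; break;\n",
			node->edges[i].sym, node->edges[i].to);
	}

	if (node->has_mode) {
		fprintf(f, "\tdefault: state = S%u; break;\n", node->mode);
	} else {
		fprintf(f, "\tdefault: return TOK_UNKNOWN;\n");
	}

	fprintf(f, "\t}\n");
	fprintf(f, "\tbreak;\n");
}
```

Because the node already records whether a default transition exists and where it goes, a printer for another output language would only need to change the `fprintf` format strings.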
This IR is intended for producing code only. Some of the output from libfsm expresses state machines verbatim (especially libfsm/print/dot.c which attempts to show an FSM compactly but verbatim). Those still walk the FSM directly.
IR nodes correspond approximately to states in the FSM. This may change over time, as various transformations may combine IR nodes.
There are several node types, for the various "strategies" of generating code for a particular state. Here's an example showing most of them:
and its corresponding DFA: