Introduce codegen IR #94

katef · 2018-07-13T13:43:04Z

This PR introduces an IR data structure for libfsm, which sits after the DFA and before the code generation output.

The main idea is that decisions about code generation are made here, rather than while printing code. That leaves code generation to simply walk the IR and print what's there, without mixing in logic. A few choices may still be made during printing, but those ought to be cosmetic only.
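To illustrate the split, here is a minimal sketch of a printer that only walks a node and emits what it finds. The `ex_range` and `ex_node` types and the `ex_print_node()` function are hypothetical, made up for this example; they are not the declarations from ir.h. The point is that whether a state gets a catch-all destination was decided when the node was built, so the printer has nothing left to decide.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical IR types for illustration; not the declarations from ir.h. */
struct ex_range {
	unsigned char lo, hi; /* inclusive range of input symbols */
	unsigned to;          /* index of the destination state */
};

struct ex_node {
	const struct ex_range *ranges;
	size_t n;
	int has_default;      /* decided when the node was built */
	unsigned default_to;  /* only meaningful if has_default is set */
};

/*
 * Print the body for one state. No decisions are made here:
 * every choice is read straight from the node.
 */
void
ex_print_node(FILE *f, const struct ex_node *node)
{
	size_t i;
	unsigned c;

	assert(f != NULL);
	assert(node != NULL);

	fprintf(f, "switch ((unsigned char) c) {\n");

	for (i = 0; i < node->n; i++) {
		/* c is wider than unsigned char, so this terminates for hi == 255 */
		for (c = node->ranges[i].lo; c <= node->ranges[i].hi; c++) {
			fprintf(f, "case 0x%02x:\n", c);
		}
		fprintf(f, "\tstate = S%u; break;\n", node->ranges[i].to);
	}

	if (node->has_default) {
		fprintf(f, "default: state = S%u; break;\n", node->default_to);
	} else {
		fprintf(f, "default: return TOK_UNKNOWN;\n");
	}

	fprintf(f, "}\n");
}
```

Printing labels as hex rather than as character literals is the kind of cosmetic choice that could reasonably stay in the printer.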

This IR is intended for producing code only. Some of the output from libfsm expresses state machines verbatim (especially libfsm/print/dot.c which attempts to show an FSM compactly but verbatim). Those still walk the FSM directly.

IR nodes correspond approximately to states in the FSM. This may change over time, as various transformations may come to combine IR nodes.

There are several node types, for the various "strategies" of generating code for a particular state. Here's an example showing most of them:

; ./build/bin/re -pl ir -z '[^a-z].' 'm+[0-9a-f]([^a-zA-Z]A|[a-z]B|.B)?' \
    | dot -Tpng -o /tmp/x.png

and its corresponding DFA:


Nodes here currently correspond to FSM states, although that need not hold in future, especially if identifying parallel walks becomes an option. The intention is for decisions about code generation to be made explicit in this IR, such that the code generation outputs things roughly as given. This way, I hope various optimisations can be shared across multiple output languages, and that the code output parts become simpler.

This explicitly states ranges for erroring, as opposed to leaving them to a `default:` clause. Then the `default:` clause can be used for a dominant mode to transition to another state. The main situation I have in mind is for regexps like `/[^abc]/`, where writing out every matching symbol is a lot more cumbersome than writing out every symbol which doesn't match. Thus we can generate code like:

```
; ./build/bin/re -plc '[^abc][xyz]'
int
fsm_main(int (*fsm_getc)(void *opaque), void *opaque)
{
	int c;

	assert(fsm_getc != NULL);

	enum {
		S0, S1, S2
	} state;

	state = S0;

	while (c = fsm_getc(opaque), c != EOF) {
		switch (state) {
		case S0: /* start */
			switch ((unsigned char) c) {
			case 'a':
			case 'b':
			case 'c': return TOK_UNKNOWN;
			default:  state = S1; break;
			}
			break;

		case S1: /* e.g. "d" */
			switch ((unsigned char) c) {
			case 'x':
			case 'y':
			case 'z': state = S2; break;
			default:  return TOK_UNKNOWN;
			}
			break;

		case S2: /* e.g. "dx" */
			return TOK_UNKNOWN;

		default:
			; /* unreached */
		}
	}

	/* end states */
	switch (state) {
	case S2: return 0x1; /* "[^abc][xyz]" */
	default: return EOF; /* unexpected EOF */
	}
}
```
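As a rough sketch of how a dominant mode could be chosen, assuming each state's transitions are available as a flat array of destinations indexed by symbol: the function below returns whichever destination is shared by the most symbols. The names `dominant_mode()`, `SYMBOLS` and `NO_DST` are hypothetical and not taken from this PR, and the quadratic count is for brevity rather than a claim about how the IR is actually built.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SYMBOLS (UCHAR_MAX + 1)
#define NO_DST  ((size_t) -1) /* hypothetical marker: no edge, i.e. an error */

/*
 * Given the destination state for every input symbol (NO_DST where
 * the symbol is an error), return the destination shared by the most
 * symbols. That destination is the candidate for the default: clause;
 * everything else would be written out as explicit cases.
 *
 * Ties go to the destination which appears first in the array.
 */
size_t
dominant_mode(const size_t dst[SYMBOLS])
{
	size_t best;
	unsigned best_count;
	unsigned i, j;

	assert(dst != NULL);

	best = dst[0];
	best_count = 0;

	for (i = 0; i < SYMBOLS; i++) {
		unsigned count = 0;

		for (j = 0; j < SYMBOLS; j++) {
			if (dst[j] == dst[i]) {
				count++;
			}
		}

		if (count > best_count) {
			best_count = count;
			best = dst[i];
		}
	}

	return best;
}
```

For S0 in the output above, 253 symbols lead to S1 and three are errors, so S1 would be the mode and the errors are listed explicitly. For S1 the situation is reversed: errors dominate, so the function would return `NO_DST` and the `default:` clause stays as the error case.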

silentbicycle · 2018-07-14T16:01:35Z

src/libfsm/print/ir.h

+ */
+
+enum ir_strategy {
+	IR_NONE     = 1 << 0,


Based on the comment below, is this being used both as a type tag for the struct ir_state union and as a set of allowed strategies in make_ir?

No, just the former. I was originally thinking of adding a mask of which strategies to allow, but I tried that and didn't like it much, and currently I think it'd make more sense to have options like "always make a table" set in struct fsm_options instead.
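For readers unfamiliar with the pattern, a type tag over a union looks roughly like the sketch below. All names here (`tag_strategy`, `tag_state`, `tag_default_dst()` and the enumerators) are hypothetical, and the set of strategies is invented for illustration; only the bit-flag style of the values mirrors the diff above. Exactly one bit is set per state, and it says which union member is live.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; only the bit-flag style mirrors the diff. */
enum tag_strategy {
	TAG_NONE     = 1 << 0, /* no destination at all */
	TAG_SAME     = 1 << 1, /* every symbol leads to one state */
	TAG_DOMINANT = 1 << 2  /* most symbols lead to one state */
};

struct tag_state {
	enum tag_strategy strategy; /* the tag: selects the live union member */
	union {
		struct { size_t to;   } same;
		struct { size_t mode; } dominant;
	} u;
};

/*
 * Store the state's catch-all destination in *to and return 1,
 * or return 0 (leaving *to untouched) if the state has none.
 */
int
tag_default_dst(const struct tag_state *s, size_t *to)
{
	assert(s != NULL);
	assert(to != NULL);

	switch (s->strategy) {
	case TAG_SAME:     *to = s->u.same.to;       return 1;
	case TAG_DOMINANT: *to = s->u.dominant.mode; return 1;
	case TAG_NONE:     return 0;
	}

	return 0; /* unreached for a well-formed state */
}
```

Used purely as a tag, the field is compared for equality and never tested with a bitwise AND, which is the distinction the question above is asking about.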


silentbicycle · 2018-07-14T16:06:51Z

src/lx/print/c.c


 #include "lx/ast.h"
 #include "lx/print.h"

 /* XXX: abstraction */
 int
-fsm_print_cfrag(FILE *f, const struct fsm *fsm,
+fsm_print_cfrag(FILE *f, const struct ir *ir, const struct fsm_options *opt,


Once it's working with the IR, what is struct fsm_options *opt still needed for? Could that be stored within the IR instead?

For various rendering options, like "always hex". I could perhaps duplicate those to an equivalent struct ir_options containing just the relevant subset. I actually tried splitting the .c files here such that they don't include any of the FSM structs at all, but I decided there wasn't any benefit, since this is all internal anyway.

katef added 22 commits June 23, 2018 08:48

The presence of a state opaque pointer is equivalent to fsm_isend(); no need for the dependency on the state interface here.

4e25db9

Pass opaque pointers directly to leaf callbacks, rather than an FSM state.

7306162

Retrofit existing C code generation behind the codegen IR.

48a2a50

Populate examples.

06cf32c

Free IR.

9eca171

Pretty-printing for human readable range labels.

bcc1307

Group together IR ranges by destination state.

4c2134b

Regenerated for grouped range fall-through.

e85079d

Centralise state name printing.

768f342

Some commentary and error handling.

1447455

Escaping for example strings.

8d62347

Merge with master.

a8033ba

Merge branch 'master' into codegen-ir

0c684b9

Convert to centralised escaping.

6bb3ccd

Add IR output to json.

40aa7ba

Regression tests for IR construction.

c8b76df

Switch the IR from pointers to indicies.

df573f3

Provide an IR stratgy explicitly for _complete_ states.

ca7b529

Slightly kinder naming.

46f32eb

Free some things.

291e08a

silentbicycle reviewed Jul 14, 2018


Clarification.

c335979

katef merged commit dfc5b8a into master Jul 14, 2018

katef deleted the codegen-ir branch July 14, 2018 16:34

katef mentioned this pull request Mar 11, 2019

Simple lx -l json printer #118

Open

katef mentioned this pull request Dec 12, 2019

re generates wrong JSON #186

Closed
