The IR design, type checking, and pre-optimizing #11

NalaGinrut · 2020-05-01T18:30:48Z

The current Rust compiler (rustc) has two IRs before the backend IR (LLVM IR in rustc; GENERIC in our case):

HIR for type-checking
MIR for borrow-checking and some pre-optimizing.

My previous experience is mostly with compilers for functional programming languages. Those are relatively easier to write, since there's pattern matching and there are few (or even no) side effects, so optimizing is pretty easy: find the matching pattern, inline the function or closure, then apply the rewriting rules. This process covers many common optimizations, such as constant folding, dead-variable elimination, and dead-function elimination. I think rustc does similar rewriting, but I need to research it more.
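To make the rewriting idea concrete, here is a minimal sketch in Rust over a hypothetical, side-effect-free toy expression language. `Expr`, `rewrite`, `subst`, and `occurs` are illustrative names, not rustc or gccrs APIs. The pass works bottom-up and applies three rules: constant folding, inlining of literal `let` bindings, and dead-variable elimination.

```rust
// A toy, side-effect-free expression language (hypothetical, for illustration).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Lit(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    /// `let name = value in body`
    Let(String, Box<Expr>, Box<Expr>),
}

use Expr::*;

/// Does `name` occur free in `e`?
fn occurs(name: &str, e: &Expr) -> bool {
    match e {
        Lit(_) => false,
        Var(v) => v == name,
        Add(a, b) | Mul(a, b) => occurs(name, a) || occurs(name, b),
        // An inner `let` of the same name shadows `name` inside its body.
        Let(v, val, body) => occurs(name, val) || (v != name && occurs(name, body)),
    }
}

/// Replace free occurrences of `name` in `e` with the literal `lit`.
/// Only literals are substituted, so variable capture cannot happen.
fn subst(e: Expr, name: &str, lit: &Expr) -> Expr {
    match e {
        Var(v) if v == name => lit.clone(),
        Add(a, b) => Add(Box::new(subst(*a, name, lit)), Box::new(subst(*b, name, lit))),
        Mul(a, b) => Mul(Box::new(subst(*a, name, lit)), Box::new(subst(*b, name, lit))),
        Let(v, val, body) => {
            let val = subst(*val, name, lit);
            // Stop at a binding that shadows `name`.
            let body = if v == name { *body } else { subst(*body, name, lit) };
            Let(v, Box::new(val), Box::new(body))
        }
        other => other,
    }
}

/// One bottom-up pass: simplify the children, then try each rewrite rule.
fn rewrite(e: Expr) -> Expr {
    match e {
        Add(a, b) => match (rewrite(*a), rewrite(*b)) {
            // Rule: constant folding (wrapping, to avoid overflow panics).
            (Lit(x), Lit(y)) => Lit(x.wrapping_add(y)),
            (x, y) => Add(Box::new(x), Box::new(y)),
        },
        Mul(a, b) => match (rewrite(*a), rewrite(*b)) {
            (Lit(x), Lit(y)) => Lit(x.wrapping_mul(y)),
            (x, y) => Mul(Box::new(x), Box::new(y)),
        },
        Let(name, val, body) => {
            let val = rewrite(*val);
            let body = rewrite(*body);
            if matches!(val, Lit(_)) {
                // Rule: inline a literal binding, then simplify the result again.
                rewrite(subst(body, &name, &val))
            } else if !occurs(&name, &body) {
                // Rule: dead-variable elimination (sound only because the language is pure).
                body
            } else {
                Let(name, Box::new(val), Box::new(body))
            }
        }
        other => other,
    }
}

fn main() {
    // let x = 2 + 3 in x * 4
    let e = Let(
        "x".to_string(),
        Box::new(Add(Box::new(Lit(2)), Box::new(Lit(3)))),
        Box::new(Mul(Box::new(Var("x".to_string())), Box::new(Lit(4)))),
    );
    println!("{:?}", rewrite(e)); // Lit(20)
}
```

Dropping an unused `let` is only valid here because the toy language is pure; in a language with side effects, that rule would first need to check that the bound expression has no effects.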

The Rust compiler is written in Rust, so it has pattern matching. I guess we will have to write more code for tree-node matching; after all, pattern matching is just syntactic sugar that expands into code we would have to write by hand in C++.

I'm not sure we can follow the exact design of HIR and MIR, since C++ may not be able to match Rust's expressiveness exactly; it may be better to design a similar IR that takes advantage of C++ features. I'm just guessing, and I need more research before drawing a conclusion.

So I think the plan could be:

Implementing HIR according to Rust's design
Type-checking in HIR
Implementing MIR
HIR->MIR
Borrow-checking
Other pre-optimizing
MIR->GENERIC
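The ordering in this plan can be sketched as a chain of typed stages. This is a minimal skeleton under the assumption that each IR is its own type; every struct and function name below is a hypothetical placeholder, and the stage bodies are stubs. What it shows is the data flow: MIR lowering consumes the HIR together with the type-checking results, and borrow checking runs on MIR before GENERIC is produced, so an error in either check stops the pipeline.

```rust
// Hypothetical placeholder types, one per representation in the plan.
struct Ast; // parser output
struct Hir; // lowered, name-resolved tree
struct TypeTable; // results of type checking
struct Mir; // control-flow-graph IR
struct Generic; // what would be handed to GCC

// Stub stages: real implementations would do the actual work.
fn lower_to_hir(_ast: Ast) -> Hir { Hir }
fn type_check(_hir: &Hir) -> Result<TypeTable, String> { Ok(TypeTable) }
fn lower_to_mir(_hir: Hir, _types: &TypeTable) -> Mir { Mir }
fn borrow_check(_mir: &Mir) -> Result<(), String> { Ok(()) }
fn optimize(mir: Mir) -> Mir { mir }
fn lower_to_generic(_mir: Mir) -> Generic { Generic }

/// Runs the stages in the order listed in the plan; `?` stops at the first error.
fn compile(ast: Ast) -> Result<Generic, String> {
    let hir = lower_to_hir(ast);
    let types = type_check(&hir)?;
    let mir = lower_to_mir(hir, &types);
    borrow_check(&mir)?;
    let mir = optimize(mir);
    Ok(lower_to_generic(mir))
}

fn main() {
    match compile(Ast) {
        Ok(_) => println!("pipeline finished"),
        Err(e) => eprintln!("compilation failed: {}", e),
    }
}
```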

That's a rough plan; there are more things, including memory management, library interfaces, exception handling, etc., but I'm not sure where to put them in the pipeline, so I've just listed them.

Comments?

philberty · 2020-05-01T19:28:38Z

We effectively have HIR already in @SimplyTheOther's AST classes, which are expressive enough to do all the resolution and static analysis we need. That ticks off the top three items.

For HIR->MIR, I think using the Backend.h wrapper over GCC's GENERIC trees will work for now. I am not concerned about extra optimizations at this stage, but there is the borrow checking, and gccgo does its own escape analysis at this level, so we will have to do that too.

I just want to avoid any other IRs because I think the AST and the Backend IR are enough for the front-end, at least for now.

NalaGinrut · 2020-05-01T19:56:34Z

OK, then we may change the first item to "name resolution".
Do you mean we do the borrow checking and escape analysis on the GENERIC tree? I have no idea whether it's good enough for us, but we can try.

philberty · 2020-05-01T20:35:57Z

My only concern is that when we bring things down to the Backend abstraction over GENERIC (although that's what we feed GCC to get output), I think we get a lot of concepts similar to MIR. It's not quite the same, and it's still fairly high level, but I would rather get this first project out of the way and then look at it again, when it could very well make sense to add another IR.

NalaGinrut · 2020-05-01T21:22:31Z

Agreed.

SimplyTheOther · 2020-05-02T14:10:45Z

According to the rustc dev guide and associated links, rustc used to do AST-based borrow checking (and presumably type checking, since that comes before borrow checking), but they abandoned that approach because of implementation difficulties (and, for the borrow checker, to allow "non-lexical lifetimes" borrow checking, though that may not be a problem yet since we have no borrow checker).
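For readers unfamiliar with the term, the standard illustration of non-lexical lifetimes (NLL) is below; it is a generic example, not one taken from the rustc dev guide. The old lexical borrow checker kept the shared borrow `first` alive until the end of its scope and rejected `v.push(4)`; NLL ends the borrow at its last use, so current rustc accepts the program.

```rust
/// Compiles with NLL-based borrow checking (Rust 2018 and later);
/// the pre-NLL lexical borrow checker rejected it with error E0502.
fn nll_demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow of `v` starts here
    println!("first = {}", first); // last use of `first`: under NLL the borrow ends here
    v.push(4); // needs a mutable borrow; a lexically scoped `first` would still be live
    v
}

fn main() {
    println!("{:?}", nll_demo()); // [1, 2, 3, 4]
}
```

Deciding that `first` is dead before `v.push(4)` needs liveness information over control flow, which is why rustc implemented NLL on MIR rather than on the tree's lexical scopes.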

For now, though, without complex features like borrow checking, I think using the AST alone as the IR could suffice (though it may make some things difficult).

NalaGinrut · 2020-05-02T18:37:47Z

@philberty @SimplyTheOther OK, let's try to do everything with the AST. If we find something too difficult, we'll be in a better position if we have to introduce a special IR later.

philberty · 2020-12-03T16:13:32Z

I've been considering a lot of what @bjorn3 mentioned about the end architecture for the compiler. Working with the AST, we could in theory squeeze out GIMPLE for now, but I fear the compiler could become hard to maintain at that point, in terms of generating all the glue necessary for everything to work correctly without MIR.

Even now, doing type resolution on the AST, I have butchered some of the AST classes with extra fields to hold the data we need, and created duplicated scope classes for lookups. I am starting to look at implementing HIR, which does seem to map very closely to what the AST looks like after type resolution right now. It would also help clean up the code a lot and create a common reference point.

bjorn3 · 2020-12-03T16:38:20Z

Rustc does translation to HIR before typechecking. It stores the typecheck results in a side-table (or rather query result) as the HIR is immutable.
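A minimal sketch of the side-table approach, assuming a hypothetical, heavily simplified HIR in which every node carries an id. The type checker only reads the tree and records each node's type in a separate map keyed by that id. `HirId`, `HirExpr`, `TypeTable`, and `check` are illustrative names here, not rustc's actual API; the point is that no type fields need to be added to the tree classes themselves.

```rust
use std::collections::HashMap;

// Hypothetical node id; every HIR node gets one during lowering.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct HirId(u32);

// Heavily simplified, immutable HIR: nodes carry ids but no type fields.
#[allow(dead_code)]
#[derive(Debug)]
enum HirExpr {
    IntLit { id: HirId, value: i64 },
    BoolLit { id: HirId, value: bool },
    Add { id: HirId, lhs: Box<HirExpr>, rhs: Box<HirExpr> },
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    I64,
    Bool,
}

/// Side table: type-checking results stored beside the HIR, keyed by node id.
#[derive(Default)]
struct TypeTable {
    node_types: HashMap<HirId, Ty>,
}

/// Reads the HIR (note `&HirExpr`) and records each node's type in `table`.
fn check(e: &HirExpr, table: &mut TypeTable) -> Result<Ty, String> {
    let (id, ty) = match e {
        HirExpr::IntLit { id, .. } => (*id, Ty::I64),
        HirExpr::BoolLit { id, .. } => (*id, Ty::Bool),
        HirExpr::Add { id, lhs, rhs } => {
            let l = check(lhs, table)?;
            let r = check(rhs, table)?;
            if l != Ty::I64 || r != Ty::I64 {
                return Err(format!("cannot add {:?} and {:?}", l, r));
            }
            (*id, Ty::I64)
        }
    };
    table.node_types.insert(id, ty);
    Ok(ty)
}

fn main() {
    // `1 + 2`, with ids assumed to come from a lowering step.
    let hir = HirExpr::Add {
        id: HirId(2),
        lhs: Box::new(HirExpr::IntLit { id: HirId(0), value: 1 }),
        rhs: Box::new(HirExpr::IntLit { id: HirId(1), value: 2 }),
    };
    let mut table = TypeTable::default();
    let ty = check(&hir, &mut table).unwrap();
    println!("{:?}, {} nodes typed", ty, table.node_types.len());
}
```

Because `check` takes `&HirExpr`, the compiler itself guarantees that type checking cannot mutate the tree; later passes look types up by id instead.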

philberty · 2021-03-15T14:52:37Z

We are loosely following the rustc pipeline.

The only missing piece is MIR; the abstraction over GCC GENERIC is very similar to MIR, so we may not need MIR.


NalaGinrut added the plan label May 1, 2020

philberty linked a pull request May 17, 2020 that will close this issue

Phil/compilation simple #24

Merged

philberty closed this as completed in #24 May 17, 2020

philberty reopened this Jun 13, 2020

philberty added this to the Core Datastructures milestone Dec 18, 2020

philberty removed this from the Core Datastructures milestone Jan 6, 2021

philberty closed this as completed Mar 15, 2021

philberty mentioned this issue Jul 26, 2021

Can't call extern functions #421

Closed
