The IR design, type checking, and pre-optimizing #11

Closed
NalaGinrut opened this issue May 1, 2020 · 9 comments · Fixed by #24
@NalaGinrut
Contributor

NalaGinrut commented May 1, 2020

The current Rust compiler (rustc) uses two IRs ahead of the stage where we would emit GENERIC:

  • HIR for type-checking
  • MIR for borrow-checking and some pre-optimizing.

My previous experience is mostly with compilers for functional programming languages. Those are relatively easy to write, since there is pattern matching and there are few (or even no) side effects, so optimizing is pretty easy: find the correct pattern, inline the function or closure, then apply the rewriting rules. This process covers many common optimizations, such as constant folding, dead-variable elimination, and dead-function elimination. For Rust, I think they're doing similar rewriting, but I need to do more research.

The Rust compiler is written in Rust, so it has pattern matching available. I guess we will have to write more code for tree-node matching. After all, pattern matching is just syntactic sugar, and in C++ we have to write out by hand the code it would expand to.

I'm not sure we can follow the exact design of HIR and MIR, since C++ may not be able to match their expressiveness exactly, so it may be better to design a similar IR that takes advantage of C++ features. I'm just guessing, and I need to do more research before reaching a conclusion.
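To make the "more code for tree-node matching" point concrete, here is a minimal sketch of constant folding over a toy expression tree in C++17. The `Lit`, `Var`, `Add`, and `Expr` types and the `fold` function are hypothetical illustrations, not part of rustc or of this project. In Rust the core rule would be a single match arm on an addition of two literals; in C++ the same rule is spelled out with `std::variant` and `std::get_if`.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <variant>

// Hypothetical toy IR: an expression is a literal, a variable, or an addition.
struct Expr;
using ExprPtr = std::unique_ptr<Expr>;

struct Lit { long value; };
struct Var { char name; };
struct Add { ExprPtr lhs, rhs; };

struct Expr {
  std::variant<Lit, Var, Add> node;
};

// Small constructors so trees can be built tersely.
ExprPtr lit(long v) { return std::make_unique<Expr>(Expr{Lit{v}}); }
ExprPtr var(char n) { return std::make_unique<Expr>(Expr{Var{n}}); }
ExprPtr add(ExprPtr l, ExprPtr r) {
  return std::make_unique<Expr>(Expr{Add{std::move(l), std::move(r)}});
}

// Rewrite rule, applied bottom-up: Add(Lit a, Lit b) becomes Lit(a + b).
ExprPtr fold(ExprPtr e) {
  assert(e != nullptr);
  if (auto *a = std::get_if<Add>(&e->node)) {
    a->lhs = fold(std::move(a->lhs));
    a->rhs = fold(std::move(a->rhs));
    // This is the hand-written "pattern": both children must be literals.
    auto *l = std::get_if<Lit>(&a->lhs->node);
    auto *r = std::get_if<Lit>(&a->rhs->node);
    if (l && r) return lit(l->value + r->value);
  }
  return e;  // literals, variables, and non-constant additions are kept
}
```

Calling `fold(add(lit(1), add(lit(2), lit(3))))` collapses the whole tree into one literal, while `fold(add(var('x'), add(lit(2), lit(3))))` only folds the right-hand subtree. Each extra rewrite rule would need its own explicit `get_if` checks, which is the extra code the comment above anticipates.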

So I think the plan could be:

  • Implementing HIR according to Rust's design
  • Type-checking in HIR
  • Implementing MIR
  • HIR->MIR
  • Borrow-checking
  • Other pre-optimizing
  • MIR->GENERIC

That's a rough plan. There are more things to cover, including memory management, library interfaces, exception handling, etc., but I'm not sure where to put them in the pipeline, so I have only listed them here.
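The ordering in the rough plan can be sketched with one placeholder type per IR, so that the compiler itself rejects an out-of-order pass (for example, borrow-checking something that has not been lowered to MIR yet). Everything here is hypothetical: the `Hir`, `Mir`, and `Generic` structs only carry a log of pass names, and none of the function names come from rustc or GCC.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical placeholder IRs: each one only records which passes have run.
struct Hir { std::vector<std::string> log; };
struct Mir { std::vector<std::string> log; };
struct Generic { std::vector<std::string> log; };

// Each pass takes one IR and returns the IR the next pass expects.
Hir type_check(Hir h) { h.log.push_back("type-check"); return h; }
Mir lower_to_mir(Hir h) {
  h.log.push_back("HIR->MIR");
  return Mir{std::move(h.log)};
}
Mir borrow_check(Mir m) { m.log.push_back("borrow-check"); return m; }
Mir pre_optimize(Mir m) { m.log.push_back("pre-optimize"); return m; }
Generic lower_to_generic(Mir m) {
  m.log.push_back("MIR->GENERIC");
  return Generic{std::move(m.log)};
}

// The plan from the list above, read innermost call first.
Generic compile(Hir h) {
  assert(h.log.empty());  // start from a freshly built HIR
  return lower_to_generic(
      pre_optimize(borrow_check(lower_to_mir(type_check(std::move(h))))));
}
```

With this shape, `borrow_check(type_check(hir))` does not compile, because `borrow_check` only accepts a `Mir`. Items such as memory management or exception handling are left out on purpose, since the plan does not yet say where they belong.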

Comments?

@NalaGinrut NalaGinrut added the plan label May 1, 2020
@philberty
Member

With @SimplyTheOther's AST classes we effectively have HIR: they are expressive enough for all the resolution and static analysis we need. That ticks off the top three items.

HIR->MIR: I think for now using the Backend.h wrapper over GCC's GENERIC trees will work. I am not concerned about extra optimizations at this stage, but there is the borrow checking, and gccgo does its own escape analysis at this level, so we will have to do that too.

I just want to avoid any other IRs, because I think the AST and the Backend IR are enough for the front-end, at least for now.

@NalaGinrut
Contributor Author

OK, then we may change the first item to "name resolution".
Do you mean we do the borrow-checking and escape analysis on the GENERIC tree? I have no idea whether it's good enough for us, but we can try.

@philberty
Member

My only point is that when we bring things down to the Backend abstraction over GENERIC, although that is what we feed GCC to get output, I think we get a lot of concepts similar to MIR. It is not quite the same, and it is still fairly high level, but I would rather get this first project out of the way and then look again at where another IR could fit in.

@NalaGinrut
Contributor Author

Agreed.

@SimplyTheOther
Member

According to the rustc dev guide and associated links, rustc used to have AST-based borrow-checking (and presumably type checking since it comes before borrow-checking), but they abandoned that approach due to difficulties with implementation (and for the borrow-checker, to allow "non-lexical lifetime" borrow checking, but that may not be a problem at this point with no borrow checker).

For now though, without complex features like borrow checking, I think the AST for IR alone could suffice (though it may make some things difficult).

@NalaGinrut
Contributor Author

@philberty @SimplyTheOther OK, let's try to do everything with the AST. If we find something too difficult, that is useful in itself, because it will guide us if we have to introduce a special IR later.

@philberty philberty linked a pull request May 17, 2020 that will close this issue
@philberty philberty reopened this Jun 13, 2020
@philberty
Member

I've been considering a lot of what @bjorn3 mentioned about the end architecture for the compiler. Working with the AST for now, in theory we could squeeze out GIMPLE, but I fear the compiler could be hard to maintain at that point, given all the glue we would have to generate for everything to work correctly without MIR.

Even now, doing type resolution on the AST, I have butchered some of the AST classes with extra fields to hold the data we need, and created duplicated scope classes for lookups. I am starting to look at implementing HIR, which does seem to map very closely to what the AST looks like after type resolution right now. It would also help clean up the code a lot and create a common reference point.

@bjorn3

bjorn3 commented Dec 3, 2020

Rustc does translation to HIR before typechecking. It stores the typecheck results in a side-table (or rather query result) as the HIR is immutable.
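As an illustration of the side-table idea, here is a minimal C++ sketch in which the tree stays immutable and type-check results are stored in a separate map keyed by node id, instead of in extra fields on the node classes. The names `HirId`, `HirExpr`, `Ty`, `TypeckResults`, and `check` are hypothetical, and the "inference" is a toy rule; this is not how rustc's query system is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

using HirId = std::uint32_t;  // hypothetical node identifier

// Immutable node: nothing is written back into it after construction.
struct HirExpr {
  const HirId id;
  const std::string text;
};

enum class Ty { I32, Bool };

// Side table: type-check results live next to the tree, not inside it.
class TypeckResults {
 public:
  void record(HirId id, Ty ty) {
    bool inserted = types_.emplace(id, ty).second;
    assert(inserted && "each node should be typed exactly once");
    (void)inserted;  // silence unused warning when NDEBUG is defined
  }

  std::optional<Ty> type_of(HirId id) const {
    auto it = types_.find(id);
    if (it == types_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::unordered_map<HirId, Ty> types_;
};

// Toy checker (assumption): "true"/"false" are Bool, anything else is I32.
void check(const HirExpr &e, TypeckResults &out) {
  bool is_bool = (e.text == "true" || e.text == "false");
  out.record(e.id, is_bool ? Ty::Bool : Ty::I32);
}
```

A later pass would call `results.type_of(expr.id)` rather than reading a field on the node, so the node classes do not need extra members for each analysis.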

@philberty philberty added this to the Core Datastructures milestone Dec 18, 2020
@philberty philberty removed this from the Core Datastructures milestone Jan 6, 2021
@philberty
Member

We are loosely following the rustc pipeline:

(image: diagram of the rustc pipeline)

The only missing piece is MIR. The abstraction over GCC GENERIC is very similar to MIR, so we may not need MIR.

bors bot pushed a commit that referenced this issue Jan 25, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.