feat: add assembly SourceLocation #861

Merged
merged 8 commits into from
Apr 25, 2023

vlopes11
Contributor

@vlopes11 vlopes11 commented Apr 17, 2023

This commit introduces [SourceLocation], a structure linked to a [Token] via [TokenStream].

It will make it possible to link a MASM source to a parsed [Operation].

Related issue: #857
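Here is a rough sketch of the linkage described above, from a token to its location to a parsed operation. Everything in it is hypothetical. The `SourceLocation` fields (1-based line, byte-based column), the `TokenStream` layout, the two-variant `Operation` enum and the `parse` helper are stand-ins, not the actual types in the `assembly` crate.

```rust
/// Hypothetical 1-based position of a token in a MASM source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SourceLocation {
    pub line: u32,
    pub column: u32,
}

/// Hypothetical subset of operations, just enough for the example.
#[derive(Debug, PartialEq, Eq)]
pub enum Operation {
    Push(u64),
    Add,
}

/// Hypothetical token stream: each token is stored with its location.
pub struct TokenStream<'a> {
    tokens: Vec<(&'a str, SourceLocation)>,
    pos: usize,
}

impl<'a> TokenStream<'a> {
    pub fn new(source: &'a str) -> Self {
        let mut tokens = Vec::new();
        for (line_idx, line) in source.lines().enumerate() {
            for tok in line.split_whitespace() {
                // `tok` is a subslice of `line`, so the pointer difference
                // is its byte offset within the line.
                let offset = tok.as_ptr() as usize - line.as_ptr() as usize;
                let location = SourceLocation {
                    line: line_idx as u32 + 1,
                    column: offset as u32 + 1,
                };
                tokens.push((tok, location));
            }
        }
        Self { tokens, pos: 0 }
    }

    /// Returns the next token and its location, or `None` at the end.
    pub fn next_token(&mut self) -> Option<(&'a str, SourceLocation)> {
        let item = *self.tokens.get(self.pos)?;
        self.pos += 1;
        Some(item)
    }
}

/// Parses the source, keeping the location of the token each operation came from.
pub fn parse(source: &str) -> Result<Vec<(Operation, SourceLocation)>, String> {
    let mut stream = TokenStream::new(source);
    let mut ops = Vec::new();
    while let Some((tok, loc)) = stream.next_token() {
        let op = match tok {
            "add" => Operation::Add,
            _ => match tok.strip_prefix("push.").and_then(|v| v.parse::<u64>().ok()) {
                Some(value) => Operation::Push(value),
                None => {
                    return Err(format!("unknown token `{}` at {}:{}", tok, loc.line, loc.column))
                }
            },
        };
        ops.push((op, loc));
    }
    Ok(ops)
}
```

Each operation carries the location of the token it was parsed from. This lets an error message, or later a debugger, point back to the exact line and column in the MASM source.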

@vlopes11 vlopes11 mentioned this pull request Apr 17, 2023
Contributor

@bobbinth bobbinth left a comment

Thank you! I left a couple of test-related comments inline, but I have a more general question:

What is the reason for introducing the TokenLines and SourceToken structs? It seems to me that to support mapping tokens to source locations, we just need to replace these lines with some parse_line_tokens() function that returns a list of tokens together with their SourceLocations. Or does this complicate things for some reason?
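The alternative suggested in the comment could look roughly like the following. `parse_line_tokens()` is only the placeholder name from the comment. Its signature, the `SourceLocation` fields and the whitespace-only tokenization are assumptions made for illustration.

```rust
/// Hypothetical 1-based position of a token within a source file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SourceLocation {
    pub line: u32,
    pub column: u32,
}

/// Hypothetical `parse_line_tokens()`: splits one line on whitespace and
/// returns each token with its location. `line_number` is 1-based.
pub fn parse_line_tokens(line: &str, line_number: u32) -> Vec<(&str, SourceLocation)> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    // A trailing sentinel space at `line.len()` flushes the final token.
    let chars = line.char_indices().chain(std::iter::once((line.len(), ' ')));
    for (i, c) in chars {
        if c.is_whitespace() {
            if let Some(s) = start.take() {
                let location = SourceLocation { line: line_number, column: s as u32 + 1 };
                tokens.push((&line[s..i], location));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    tokens
}
```

A caller would invoke it once per line, for example over `source.lines().enumerate()`, passing `index + 1` as the line number. That is the point of the suggestion: the location travels with each token without a separate lines-level type.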

Inline comments on assembly/src/tokens/lines.rs (3 threads, outdated, resolved)
@vlopes11 vlopes11 changed the title from feat: add assembly SourceLocation, SourceToken & TokenLines to feat: add assembly SourceLocation & SourceToken Apr 18, 2023
@vlopes11 vlopes11 changed the base branch from next to vlopes11-tokenizer April 18, 2023 16:12
@vlopes11 vlopes11 force-pushed the vlopes11-tokenizer branch 2 times, most recently from 3cd7652 to 46619ee April 21, 2023 22:28
Base automatically changed from vlopes11-tokenizer to next April 21, 2023 22:54
@vlopes11 vlopes11 requested a review from bobbinth April 21, 2023 23:07
@vlopes11 vlopes11 changed the title from feat: add assembly SourceLocation & SourceToken to feat: add assembly SourceLocation Apr 21, 2023
Contributor

@bobbinth bobbinth left a comment

Thank you! Looks good! I left a couple of small comments inline.

Inline comments on assembly/src/lib.rs (resolved) and assembly/src/tokens/stream.rs (2 threads, outdated, resolved)
@vlopes11 vlopes11 requested a review from bobbinth April 24, 2023 17:12
@bitwalker
Contributor

bitwalker commented Apr 25, 2023

Just wanted to throw my two cents in. I've implemented this kind of infrastructure a few times before, including already for the compiler, and @grjte and I discussed extracting that crate out of the miden-ir repo into its own repo and using it with AirScript as well, so that we have a common set of diagnostics infrastructure for all our frontends. It may be worth exploring using it here too.

For some additional context on how this generally gets used with a frontend: this parsing infrastructure builds on top of that diagnostics crate and can be used with lalrpop or any hand-written parser. Neither the instantiation of the DiagnosticsHandler nor the diagnostics it emits are shown here, but the result is rustc-like diagnostics emitted to the terminal, with all the fancy bits you'd expect.

At a high level, the diagnostics crate provides a few things:

  • Tracking of SourceFiles in a CodeMap, de-duplicated and each assigned a unique SourceId. Source spans (i.e. SourceSpan) are byte ranges paired with a SourceId, which lets us reconstruct the original source span or convert it to a filename plus line/column number when emitting diagnostics.
  • Recording of diagnostics emitted by the program via the DiagnosticsHandler. A Diagnostic has a severity (error/warn/etc.), an optional error code, a message, and one or more associated spans with optional notes.
  • Customizable emission of diagnostics (i.e. they can be recorded in memory, printed, or ignored).
  • A Span<T> wrapper type, a Spanned trait, and proc macro support for deriving Spanned types.
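The first bullet can be made more concrete with a minimal, std-only sketch of a de-duplicating code map. The names follow the comment (`CodeMap`, `SourceId`, `SourceSpan`). The fields and methods are assumptions, not the diagnostics crate's actual API, and de-duplication here is keyed on the file name only.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical unique identifier for a registered source file.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SourceId(u32);

/// A byte range within the file identified by `source_id`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SourceSpan {
    pub source_id: SourceId,
    pub range: Range<usize>,
}

struct SourceFile {
    name: String,
    content: String,
}

#[derive(Default)]
pub struct CodeMap {
    files: Vec<SourceFile>,
    by_name: HashMap<String, SourceId>,
}

impl CodeMap {
    /// Registers a file. If the name is already known, the existing id is
    /// returned and the new content is ignored (de-duplication by name).
    pub fn add(&mut self, name: &str, content: &str) -> SourceId {
        if let Some(&id) = self.by_name.get(name) {
            return id;
        }
        let id = SourceId(self.files.len() as u32);
        self.files.push(SourceFile { name: name.to_string(), content: content.to_string() });
        self.by_name.insert(name.to_string(), id);
        id
    }

    /// Returns the file name registered for `id`.
    pub fn name(&self, id: SourceId) -> &str {
        &self.files[id.0 as usize].name
    }

    /// Recovers the original source text covered by a span.
    pub fn text(&self, span: &SourceSpan) -> &str {
        &self.files[span.source_id.0 as usize].content[span.range.clone()]
    }
}
```

Because a span is only a `SourceId` plus two offsets, it stays small and cheap to copy into every token or AST node. The text and file name are looked up in the code map only when a diagnostic is actually emitted.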

All of the above is oriented primarily towards the frontend of a compiler/interpreter (i.e. used during parsing/compilation). However, it can easily be integrated into the emission of a more stable debug info format in various ways, depending on which properties you want the debug info to have. For example, you could serialize the entire CodeMap to preserve all of the original sources, or you could simply emit a map from SourceId to the corresponding filenames, which would be much smaller but slightly less convenient to work with. You could also convert each SourceSpan to a more stable Location equivalent containing filename/line/column, at the cost of increased memory usage.
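The "more stable Location equivalent" mentioned above could look roughly like this. A span's starting byte offset is resolved once, against the file's text, into an owned filename/line/column record that no longer depends on a live code map. `Location` and `to_location` are hypothetical names, and columns here count characters rather than bytes.

```rust
/// A self-contained position that remains valid without the original code map.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Location {
    pub filename: String,
    pub line: u32,
    pub column: u32,
}

/// Resolves a byte offset in `source` to a 1-based line/column `Location`.
/// Panics if `offset` is past the end of `source` or not on a char boundary.
pub fn to_location(filename: &str, source: &str, offset: usize) -> Location {
    let before = &source[..offset];
    // Start of the line containing `offset`.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    Location {
        filename: filename.to_string(),
        line: before.matches('\n').count() as u32 + 1,
        column: before[line_start..].chars().count() as u32 + 1,
    }
}
```

This makes the memory trade-off from the comment visible. Each `Location` owns a copy of the filename, so a large table of them costs more than storing compact spans plus a single SourceId-to-filename map.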

Emitting Diagnostics

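// Assumes `DiagnosticsHandler`, `Severity`, `SourceSpan` and the `Spanned`
// derive are imported from the diagnostics crate; `bar()` stands in for any
// fallible step returning `anyhow::Result<()>`.
#[derive(Spanned)]
struct Foo {
    #[span]
    span: SourceSpan,
}

impl Foo {
    fn compile(&self, diagnostics: &DiagnosticsHandler) -> anyhow::Result<()> {
        // ... other compilation steps elided ...
        match bar() {
            ok @ Ok(_) => ok,
            Err(err) => {
                // Record an error diagnostic pointing at this Foo's source span,
                // then propagate the error to the caller.
                diagnostics
                    .diagnostic(Severity::Error)
                    .with_message(format!("failed to bar: {}", &err))
                    .with_primary_label(self.span(), "in this foo")
                    .emit();
                Err(err)
            }
        }
    }
}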
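For readers without the crate at hand, the following hand-written, std-only stand-ins show roughly what a `Spanned` trait and a `Span<T>` wrapper provide. The `#[derive(Spanned)]` / `#[span]` attributes in the snippet above are assumed to produce something equivalent to the manual impl for `Foo` here. These definitions are assumptions rather than the crate's real ones, and `SourceSpan` is simplified to a bare byte range.

```rust
use std::ops::{Deref, Range};

/// Simplified stand-in for SourceSpan: just a byte range.
pub type SourceSpan = Range<usize>;

/// Anything that knows which part of the source it came from.
pub trait Spanned {
    fn span(&self) -> SourceSpan;
}

/// Wraps a value together with its span.
#[derive(Debug, Clone, PartialEq)]
pub struct Span<T> {
    span: SourceSpan,
    item: T,
}

impl<T> Span<T> {
    pub fn new(span: SourceSpan, item: T) -> Self {
        Self { span, item }
    }

    pub fn into_inner(self) -> T {
        self.item
    }
}

impl<T> Spanned for Span<T> {
    fn span(&self) -> SourceSpan {
        self.span.clone()
    }
}

/// Lets a `Span<T>` be used wherever a `&T` is expected.
impl<T> Deref for Span<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.item
    }
}

/// Manual counterpart of `#[derive(Spanned)]` with a `#[span]` field.
pub struct Foo {
    span: SourceSpan,
}

impl Spanned for Foo {
    fn span(&self) -> SourceSpan {
        self.span.clone()
    }
}
```

The wrapper attaches a location to values that have no natural place for one, such as identifiers or literals. The trait gives diagnostics code a single way to ask any node for its span.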

Contributor

@bobbinth bobbinth left a comment

All looks good! Thank you!

@bobbinth
Contributor

@bitwalker, this looks cool! It might be too big of a lift to integrate this into the assembler in a single go, but I think we'll need to use something like CodeMap relatively soon, so there might be a good way to move to this type of structure in stages.

@vlopes11
Contributor Author

@bitwalker yes, using SourceId seems to be the right path. I'll proceed with merging this PR for these initial steps, but in the long run we should aim for a structure like the one you described.

We should also aim for a centralized implementation, but let's see a bit further down the road.

@vlopes11 vlopes11 merged commit f3a7131 into next Apr 25, 2023
@vlopes11 vlopes11 deleted the vlopes11-source-location branch April 25, 2023 21:13
3 participants