Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SourceLocation for parsing error & debug #857

Closed
Tracked by #866
vlopes11 opened this issue Apr 16, 2023 · 2 comments
Closed
Tracked by #866

Add SourceLocation for parsing error & debug #857

vlopes11 opened this issue Apr 16, 2023 · 2 comments
Assignees
Labels
assembly Related to Miden assembly

Comments

@vlopes11
Copy link
Contributor

Currently, we display a parsing error based on the line number of a compilation. This is introduced by #854

However, this is very simplistic and we might need a better representation that is flexible enough for advanced functionality, such as source mapping.

It is a common standard to use as location a combination of path, line, column. These 3 combined can uniquely map to a string that can be displayed to the user, to give him feedback.

One possibility is to have the following struct:

pub struct SourceLocation {
    pub path: String,
    pub line: u32,
    pub column: u32,
}

However, for the parsing process, this SourceLocation would exist for every Token. This might become very expensive for large compilations as we would greatly increase the memory requirements with the additional path. In fact, this information would be mostly duplicated, as normally we have a couple of files for compilation, and a far greater tokens count than files count. One alternative is to create a bidirectional map path <-> id, and use this id as part of the source location.

pub struct SourceLocation {
    pub path_id: usize,
    pub line: u32,
    pub column: u32,
}

This would introduce the requirement to have a common context for compilation that would hold the bidirectional map. Such context could look like

pub struct SourcePathRepository {
    path_ids: BTreeMap<String, usize>,
    paths: Vec<String>,
}

The path_ids is an efficient lookup to avoid duplicated paths, and paths would hold the indexes defined by the ids. This way, we can efficiently append new paths, as we would first lookup the path_ids and insert only if it doesn't exist, and we have O(1) to return a path, given an id.

There is also another possibility to further improve this context. The SourceLocation assumes the path exists, but it might be that it will not always be readily available. This might happen if we compile a source with a bundled, pre-compiled library. We could have a set of predefined sources that will be available for any application that will read these source locations and display them to the user:

pub enum Source {
    StandardLibrary {
        module: String,
        version: String,
    },
    AbsolutePath(String),
    RawContents(String),
}

And replace the simple string by this enum

pub struct SourcePathRepository {
    path_ids: BTreeMap<Source, usize>,
    paths: Vec<Source>,
}

This SourcePathRepository can be easily extended into a solution that will support source mapping for debug purposes.

Originally posted by @bobbinth in #854 (comment)

@bobbinth bobbinth added the assembly Related to Miden assembly label Apr 17, 2023
@bobbinth
Copy link
Contributor

I agree with the above, though I think the should start simple and implement SourceLocation just as

pub struct SourceLocation {
    pub line: u32,
    pub column: u32,
}

Once this is working, we can add path_id to SourceLocation. In our case, path would probably be a logical path to a module (rather than a file path) - e.g., std::math::u64 - and the mapping of logical paths to files would need to be provided separately.

vlopes11 added a commit that referenced this issue Apr 17, 2023
This commit introduces new data structures that will associate metadata
with a token.

The intent of such metadata is to allow the creation of source mapping,
as well as provide more insightful parsing error report.

The introduces TokenLines instruction will encapsulate the process of
reading strings & associating the token metadata with the yielded
tokens. It will not replace the TokenStream struct, as the latter is
responsible of interfacing with MASM. TokenLines will have the single
responsibility of fetching tokens, and will not offer token navigation
as TokenStream does.

The introduced SourceLocation struct will allow the link between a
source path; either from a library or a disk MASM file, into a parsed
token - hence, also to its Instruction, Operation, and AST location.

SourceLocation is split from SourceToken because the former is intended
to be used globally, and the latter is specific to the parsing process.

related issue: #857
vlopes11 added a commit that referenced this issue Apr 18, 2023
This commit introduces [Tokenizer], a tokens iterator that will decouple
tokenizing logic from the parse logic of [TokenStream].

It will create the foundation to bind tokens to their location,
regardless of parse constraints.

This commit aims to unblock the source mapping work.

related issue: #857
vlopes11 added a commit that referenced this issue Apr 18, 2023
This commit introduces new data structures that will associate metadata
with a token.

The introduced SourceLocation struct will allow the link between a
source path; either from a library or a disk MASM file, into a parsed
token - hence, also to its Instruction, Operation, and AST location.

SourceLocation is split from SourceToken because the former is intended
to be used globally, and the latter is specific to the parsing process.

related issue: #857
vlopes11 added a commit that referenced this issue Apr 18, 2023
This commit introduces new data structures that will associate metadata
with a token.

The introduced SourceLocation struct will allow the link between a
source path; either from a library or a disk MASM file, into a parsed
token - hence, also to its Instruction, Operation, and AST location.

SourceLocation is split from SourceToken because the former is intended
to be used globally, and the latter is specific to the parsing process.

related issue: #857
vlopes11 added a commit that referenced this issue Apr 21, 2023
This commit introduces [LinesStream], a lines iterator that will
pre-process the strings for [TokensStream].

It aims to unblock a link between a token location on a MASM fileand a
parsed operation.

related issue: #857
vlopes11 added a commit that referenced this issue Apr 21, 2023
This commit introduces [LinesStream], a lines iterator that will
pre-process the strings for [TokensStream].

It aims to unblock a link between a token location on a MASM fileand a
parsed operation.

related issue: #857
vlopes11 added a commit that referenced this issue Apr 21, 2023
This commit introduces [SourceLocation], a structure linked to a [Token]
via [TokenStream].

It will allow the link between a MASM source and a parsed [Operation].

related issue: #857
vlopes11 added a commit that referenced this issue Apr 21, 2023
This commit introduces [SourceLocation], a structure linked to a [Token]
via [TokenStream].

It will allow the link between a MASM source and a parsed [Operation].

related issue: #857
@bobbinth
Copy link
Contributor

Closed by #865 and #861.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assembly Related to Miden assembly
Projects
None yet
Development

No branches or pull requests

2 participants