Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[P4fmt]: attaching comments to IR Nodes #4845

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

snapdgn
Copy link
Contributor

@snapdgn snapdgn commented Aug 1, 2024

No description provided.

backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved
@snapdgn snapdgn force-pushed the comments-handling branch 3 times, most recently from 82edb45 to 37344eb Compare August 8, 2024 13:13
@snapdgn snapdgn marked this pull request as ready for review August 8, 2024 13:14
@snapdgn snapdgn marked this pull request as draft August 8, 2024 14:33
@snapdgn snapdgn marked this pull request as ready for review August 8, 2024 14:43
@snapdgn snapdgn marked this pull request as draft August 8, 2024 14:49
@snapdgn
Copy link
Contributor Author

snapdgn commented Aug 8, 2024

This takes care of attachment of comments to basic parent Node types(Typedef, Type_Name, control blocks, structs etc).
The Node List is not exhaustive and underdeveloped, and support for further nodes will be added gradually.
It currently attempts to store the comments related to each node in two lists, 'before' (holds comments that are directly above the nodes) and after(contains comments that are on the same line as the nodes).
Plan is to hold all the comments related to a node and attach them at all at appropriate positions based on column information.
The comment list will store a list of code points, with each entry containing a pair of the SourcePosition and the corresponding comment.

backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
}

//////////////////////// TYPES ////////////////////
bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the criterion for a supported node here? Otherwise I would use IR::Statement, IR::Type_Declaration, or IR::StatOrDecl here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the algorithm presented in the blog post (the Bazel approach) is that the preorder traversal will visit all AST nodes, and it can pick the right AST node to attach comments to automatically. The first node with a comment right before it will get that comment attached to it, and that comment will be removed from the list of comments to be attached, so that any child AST node on the same line won't get the same comment attached again.

This algorithm can guarantee a pretty reasonable attachment of line comments. And a similar algorithm can be applied in a postorder traversal to attach the inline trailing/suffix comments.

@fruffy I don't have enough knowledge on this, but is there a more convenient way in P4C to express that I want to do the same thing for all AST nodes regardless of its type? Maybe as you mentioned in another comment, adding IR::Node to a preorder is the way to go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think enumerating all/most node types is a good idea (the fact the actual implementation is the same is a good hint for that!). It defeats the purpose of having the nodes in a class hierarchy and it would make this very fragile to IR node addition. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *) and postoder(Node *). Comment attachment does not really seem to dependent on whether the node is e.g. an addition or subtraction, it even does not seem depend on whether it is e.g. an expression or a type definition. Even in cases of comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ... one would probably get a good enough comment attachment (the only question is really if the trailing comment belongs to the assignment or the call (where postorder would likely place it. But I guess that can't be really said in general and either will be good enough.

Copy link
Contributor

@vlstill vlstill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the point of the compiler in general, I think there should not the the mutable in the SourceInfo and I am concern if this could impact compiler performance.

Other than that, I don't think naming all the node types is a good approach.

lib/source_file.h Outdated Show resolved Hide resolved
lib/source_file.h Outdated Show resolved Hide resolved
Comment on lines 219 to 223
mutable std::vector<Util::Comment *> before;
mutable std::vector<Util::Comment *> after;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point -- this make the SourceInfo, which is ubiquitous in the compiler, quite a bit larger (by 6 pointers with the common impl of vector). It would be good to have some benchmarks to see this does not unduly show down the compiler (not p4fmt backend) or increase its memory consumption too much.

Copy link
Contributor

@asl asl Aug 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we'd go this way, I'd strongly suggest using small vectors that could hold up to 1 entry inline. But overall I concur that enlarging every IR node by this amount should be carefully benchmarked. We do create lots of nodes. There is huge malloc traffic everywhere. And while another 48 bytes might look small, it is an overhead paid on every node. Even if there are zero comments.

Also, comments are quite rare and here the price is paid by every node. Maybe the better solution is to use the technique similar to TrailingObjects in LLVM / clang.

Copy link
Contributor Author

@snapdgn snapdgn Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of storing those prefix and suffix comments as a part of SourceInfo, I was thinking how about storing that extra object in a hashmap, Hashmap<node-id, extra-comment-object>. This would avoid the unnecessary overhead to the compiler.
Would this be a good and straightforward solution for the time being?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your plan to update this side map on IR change? Overall, side long-living maps should be discouraged as they could easily go stale

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to add some more context. We were thinking about using this side map only in p4fmt. With this constraint, there probably isn't any IR change expected I guess.

@snapdgn has also looked into using TrailingObjects. I agree that would be a better solution. My suggestion is to go with a simple solution for now, to unblock further p4fmt experimentation, as long as the solution doesn't influence the rest of P4C. Then switching to the TrailingObjects technique could be left as a future optimization.

@asl What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this side map is local, then certainly we're fine.

lib/source_file.h Outdated Show resolved Hide resolved
lib/source_file.h Outdated Show resolved Hide resolved
lib/source_file.h Outdated Show resolved Hide resolved
lib/source_file.h Show resolved Hide resolved
}

//////////////////////// TYPES ////////////////////
bool Attach::preorder(const IR::Type_Bits *t) { return preorderImpl(t); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think enumerating all/most node types is a good idea (the fact the actual implementation is the same is a good hint for that!). It defeats the purpose of having the nodes in a class hierarchy and it would make this very fragile to IR node addition. From what @qobilidop described as the high-level algorithm, it might even work to have just a single preorder(Node *) and postoder(Node *). Comment attachment does not really seem to dependent on whether the node is e.g. an addition or subtraction, it even does not seem depend on whether it is e.g. an expression or a type definition. Even in cases of comments inside statements like x = foo(/* block size */ a + /* block count */ b); // get ... one would probably get a good enough comment attachment (the only question is really if the trailing comment belongs to the assignment or the call (where postorder would likely place it. But I guess that can't be really said in general and either will be good enough.

@snapdgn snapdgn force-pushed the comments-handling branch 4 times, most recently from 2aff0aa to da4b314 Compare August 20, 2024 08:30
@snapdgn snapdgn marked this pull request as ready for review August 21, 2024 05:52
@snapdgn
Copy link
Contributor Author

snapdgn commented Aug 22, 2024

Comments are stored in a side map for now, as discussed.

Copy link
Collaborator

@fruffy fruffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick round of comments.

It would be good to have a reference output here. We can use the checker that you wrote. This way we can see the current behavior.

lib/source_file.cpp Outdated Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
lib/source_file.h Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Show resolved Hide resolved
backends/p4fmt/attach.cpp Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
@qobilidop
Copy link
Member

There are merge conflicts with the main branch. Please rebase/merge and resolve them.

@snapdgn snapdgn force-pushed the comments-handling branch 4 times, most recently from b87606d to 0dfc44a Compare August 26, 2024 10:04
@snapdgn
Copy link
Contributor Author

snapdgn commented Sep 5, 2024

Would be grateful for any updated feedback on this.

backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
frontends/parsers/parserDriver.h Show resolved Hide resolved
lib/source_file.h Outdated Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
@snapdgn
Copy link
Contributor Author

snapdgn commented Sep 7, 2024

Almost all of the suggestions and comments have been addressed.

I'm planning on taking care of optimizations in a separate PR. (Sorting comments for easy lookup, etc).

I would appreciate a final review to confirm if this is ready for merging.

Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>

Signed-off-by: Nitish <snapdgnn@proton.me>
Signed-off-by: Nitish <snapdgnn@proton.me>
@qobilidop
Copy link
Member

qobilidop commented Sep 12, 2024

@asl @vlstill Any remaining concerns? I'll try to merge this if no new issues are raised.

backends/p4fmt/attach.cpp Outdated Show resolved Hide resolved
backends/p4fmt/attach.h Outdated Show resolved Hide resolved

namespace P4::P4Fmt {

std::optional<std::pair<const IR::P4Program *, const Util::InputSources *>> parseProgram(
const ParserOptions &options) {
auto *file = fopen(options.file.c_str(), "r");
Copy link
Contributor

@asl asl Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of fopen here? Just for "no such file" diagnostics as file is unused below? What does parseProgramSources reports in case no input file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to std::filesystem::exists , if that helps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need this? What does parseProgramSources repors in case non-existent input file?

Copy link
Contributor Author

@snapdgn snapdgn Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is necessary as parseProgramSources doesn't report anything if there's no input file, it just fails to parse and returns null. There are no other checks to validate this.

backends/p4fmt/p4fmt.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@asl asl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. There are no tests as well.

backends/p4fmt/attach.h Outdated Show resolved Hide resolved
@asl
Copy link
Contributor

asl commented Sep 12, 2024

@asl @vlstill Any remaining concerns? I'll try to merge this if no new issues are raised.

@qobilidop See my comments on concerns

Signed-off-by: Nitish <snapdgnn@proton.me>
@snapdgn
Copy link
Contributor Author

snapdgn commented Sep 13, 2024

See comments. There are no tests as well.

The tests are intended to be included as part of this PR.

Signed-off-by: Nitish <snapdgnn@proton.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants