-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Spanned
to retrieve source locations on AST nodes
#1435
base: main
Are you sure you want to change the base?
Conversation
Spanned
to retrieve sourcec locations on AST nodesSpanned
to retrieve source locations on AST nodes
@alamb Hey, we've started using this functionality internally with pretty great success so far. It's still a draft for now, because of the missing todo!'s, but I would appreciate feedback on the overall design and if you think this can get merged in the foreseeable future once issues are addressed. 😄 |
How is this |
Alright, I have now gotten rid of all the todo!s and warnings and un-drafted the PR. There is a lot of missing implementations of spans and I have documented those (in hindsight maybe a derive macro would have been the way to go here, someone better at writing those than me is free to take a shot at it 😅 ). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tackling this @Nyrox! took a quick look and left some comments inline, I'll make some time to do another pass
where | ||
S: Into<String>, | ||
{ | ||
assert!(quote == '\'' || quote == '"' || quote == '`' || quote == '['); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the assert I think we can remove to avoid panicking in any case. if we want to restrict the quote char the fn would rather return a Result<Self>
I imagine, though in this case I think it could make more sense to let the parser accept any char
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing it sounds good to me, I just wanted to have the same behaviour as Ident::with_quote
which still has the check, so it should prob. be removed in both places then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the verdict here? Remove or not? :P I am not sure it's in the scope of this PR tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I didn't realise the behavior exists already, it sounds reasonable to follow suit. The current behavior feels unusual to me though, and I agree that it'd be out of scope for this PR to change it. I can take a closer look at it and we can have the behavior here match the outcome
Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Nyrox! The changes look reasonable to me overall given the discussion in the GH issue. Left some comments, one mostly wondering around the equality behavior now that the token location is embedded within the AST
@@ -9805,6 +9853,7 @@ fn parse_map_access_expr() { | |||
#[test] | |||
fn parse_connect_by() { | |||
let expect_query = Select { | |||
select_token: TokenWithLocation::wrap(Token::make_keyword("SELECT")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, wondering, I'm not sure if the behavior to include tokenlocation in comparisons is desirable, thinking rather comparing any two select statements or ast nodes should have the same behavior as today, ignoring any metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree it can be a bit tricky. For this particular case it doesn't matter I think, because TokenWithLocation's custom Eq implementation ignoring spans. So as long as selects actually have a select token they will compare.
What actually is the use case of having Eq on ast nodes? I guess it's so you can throw them into HashMaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no my mistake I missed that the span is being ignored in the comparison here as well. the actual token isn't currently ignored however? thinking that potentially makes it awkward needing to figure out what a token on a property should be, and not sure if there are guarantees provided by the parser on its value? (I'm assuming having the token included in the node is useful)
Not only Eq, but PartialEq as well, basically thinking as with today, any two subexpressions should remain equal regardless of what part of the tree they show up in (for example required when taking fingerprints of sql and comparing them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah. I mean "Token" in most of these cases is going to be Keyword, so there is some certainty to them being predictable. I.e. if two queries have different tokens, I would very much expect them to actually be different queries.
But I also get if it opens too many cans of worms, so we could just omit it from the trait implementations again, it's just a bit annoying then that we would have to start writing custom implementations for every type that is storing a Token.
The Token is useful, because you need to store it (or a at least it's Span) so you can include it for reported spans.
I.e. select a from b
, if you don't store the token for 'select' can't include that as part of it's span :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree it can become unweildy to try to workaround this by needing custom impls for each type, also as you mention its probably a reasonable assumption that the queries are different if they have different tokens (given that they come from the same parser)
Bump 😅 |
Hi @Nyrox! We had an internal discussion to try to upstream the location changes we use in SYNQ. I found some time to finish rebase today and then noticed your PR. (#1480) I like your suggested changes, as they are very noninvasive for the downstream consumers. If this could be included in the main, that would be absolutely amazing. Having location inside every Ident is probably best. In my PR, I wrapped all Idents with location, but our fork uses them only when we need them. Having it always available inside of It would also be good to design and include wrapping of nodes. It is essential, especially for enum cases where adding location to each of the enum cases would be too verbose. Also in some places those "smaller" locations just don't make much sense. In my PR, I'm using fc8e44a#diff-fc6a8d66b8cb6bd48a119a70e489cbcef82e7f551e7cf08e8e972ef4e774ef49R12259-R12308 to align captured span to a non-whitespace portion of the query based on token indexes. It is very fast to add such spans with accurate capture this way. |
Hi @lustefaniak, thanks I remember seeing your fork when initially researching this 😅 |
Hey all, think we've gotten quite far here - is there anything actively blocking this from getting approved? Are there any actions I can take to help you guys or do you just need the time to get to this item? 😊 |
This PR adds a new trait
Spanned
to retrieve source spans on AST nodes, see #161 , by recursively traversing the AST and combining spans. This approach is in contrast to the one taken in #839 and #790 by trying to minimise the amount of breaking changes for downstream users, by avoiding wrapping everything inWithSpan<T>
.Main Changes
The general philosophy of the PR is to be "good enough" without breaking things. As a result certain expressions will have broken or incorrect spans. I imagine these can be cleaned up in future PRs (which might require breaking changes).
f.e. many expressions do not include keywords in their span i.e.
will have it's source span simply reported as
<expr>::span
and there is many such cases, some of which are easier to fix than others. For expressions we can't generate spans for, I useSpan::EMPTY
as a sort of sentinel value to indicate missing information.With this approach the only downstream changes a user should have to do to upgrade, should be adding additional fields when matching on AST nodes.
Future Work
ast::value::Value
. This seems like a breaking change to me, which would require aWithSpan<T>
like type