Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parser: detect mismatched/misleading indentation #13593

Closed
wants to merge 1 commit into from

Conversation

hryx
Copy link
Contributor

@hryx hryx commented Nov 19, 2022

Fixes #35

If any line in a block has indentation, then all lines in the block must have the same indentation.

Additionally, the indentation level of a block must be greater than or equal to indentation level of the parent block.

In the spirit of alerting the author to ambiguities or mistakes instead of presumptively fixing it, I added this to the parsing phase. This is similar to how mismatched whitespace around binary operators causes a parse error instead of being reformatted.

Notes

The original issue is light on details, but my interpretation is that potential programmer mistakes arise from sequences of statements, not container decls. So I made this indentation error totally independent per container. Blocks are only considered relative to its container; a sub-container and its contents may be out-dented relative to its parent.

If it is better to have statements in nested containers require monotonic indentation, that is an easy fix, just let me know!

This feature also ignores decls and fields, since that is not likely a source of ambiguity/bugs.

This adds one new field to Parser.

@hryx hryx force-pushed the mismatched-indentation branch from 5d6e8f4 to 5417cb2 Compare November 19, 2022 09:00
@SuperAuguste
Copy link
Contributor

SuperAuguste commented Nov 19, 2022

A major concern with this change is that it would kill any use of zig fmt for reparing broken indentation (e.g. from pasting, editors poorly auto-indenting, etc.) as the parser would die instead of ignoring and, upon formatting, fixing broken indentation. I believe this would negatively contribute to the Zig programmer experience.

Though on the positive side of things, this would stop pesky

if (a)
  ...;
  ...;

issues, which might be well worth it.

@hryx
Copy link
Contributor Author

hryx commented Nov 19, 2022

Since I just implemented this tonight, I haven't gotten a realistic picture of how painful this will actually be to use. But one annoyance I foresee is that some editors (including the one I use) are bad at issuing the correct indentation level for the next line when you hit "enter". Which means some annoying friction despite the IDE experience.

Looking forward to giving this a shot and seeing what it's like, regardless.

@sebastianhoffmann
Copy link

If running zig fmt on save can still fix the indentation and hence make this basically invisible for most users, it's imo just a net positive feature with the biggest impact likely being zig fmt becoming almost mandatory.

However, if zig fmt can't fix this as this error will prevent it from running as well, like @SuperAuguste suggests, this would have a much greater negative impact on developer experience than even the unused variables story has for debug builds / experimentation.

@IntegratedQuantum
Copy link
Contributor

In order to avoid friction for developers this should allow as many indentation styles as possible.
So here is a list of things that I consider important:

  1. Does this work with tabs?

  2. Does this work with mixed indentation?
    For example mixed tabs/spaces as used in "tabs for indentation, spaces for alignment".
    And also when many people work on the same document there sometimes is code with mixed 4/2 spaces indentation or mixed spaces/tabs. While I agree that this is ugly, I think this should be allowed regardless.

  3. Does this work well with parenthesis?
    I sometimes do this when my line gets too long:

function(
    x,
    y,
    z
);
  1. Does this work with line wrapping?
    This is another way of dealing with long lines:
x = a
  + b
  + c;
  1. What about empty lines?
    Some editors silently remove indentation of empty lines.

  2. What about comment lines?
    When commenting out code I find it nicer to comment it at zero indentation. (For me as a tab user this keeps the commented-out code at the same position)

{
	x= y;
//	y = z
}
  1. I think it should also work when there is multiple blocks declared in one line. (This is one of your error test cases)
    I have seen people do this when declaring a function inside another function:
const lambda = struct { fn xyz() void {
    code();
}};

@hryx
Copy link
Contributor Author

hryx commented Nov 19, 2022

We should probably keep discourse on the feature itself on the original issue (#35) and keep this PR for implementation/review.

But to quickly answer your questions @IntegratedQuantum:

Does this work with tabs?

Column is counted as byte offset after a newline or file start, so yes.

Does this work with mixed indentation?

No, because there is no way to know tab width.

Does this work well with parenthesis?
Does this work with line wrapping?
What about empty lines?
What about comment lines?

These are all unaffected. This only affects indentation of statements relative to each other, and your examples are all either single statements or not statements at all.

I think it should also work when there is multiple blocks declared in one line. (This is one of your error test cases)

I think you should look again at the code and test cases, your example is allowed by this change.


const parent_block_tok = p.current_block_indent_defining_token;
defer p.current_block_indent_defining_token = parent_block_tok;
const parent_indent = p.tokenColumn(parent_block_tok);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor performance note: This leads to some redundant scanning for the token column, but knowing the token (not just column) is useful for the error note. An alternative is to have Parser store a tuple of token, column like var sibling below.

@andrewrk andrewrk self-requested a review November 21, 2022 00:36
@hryx
Copy link
Contributor Author

hryx commented Nov 24, 2022

@andrewrk Let me know if you want me to rebase to get the new CI checks, or if you plan to wait till after 0.10.1 before reviewing this. No rush of course :)

@andrewrk andrewrk force-pushed the mismatched-indentation branch from 5417cb2 to a21726e Compare December 12, 2022 02:50
@andrewrk
Copy link
Member

Thanks for working on this!

Here are the next steps to getting it merged:

  • Rebase against master branch and fix conflicts
  • Add a new compile error test
  • Open new pull request against master branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

error for incorrectly indented code / misleading indentation
5 participants