Attendees:
- Jordan Harband (JHD)
- Richard Gibson (RGN)
- Justin Ridgewell (JRL)
- Shu-yu Guo (SYG)
SYG: Before we start, did the champion group decide on syntax vs API function (and if API function, composable or not)?
JRL: Planning to bring it up at the next meeting. Functionally, all the questions would apply equally regardless of API vs syntax
JRL: First question is what is considered whitespace? Spaces, tabs, or unicode whitespace? For whitespace, we're excluding newlines because they can't be used for indentation. The real question is do we consider spaces/tabs or all unicode whitespace? I'm leaning towards all unicode whitespace even if it's not currently all used.
JHD: Seems reasonable to me. Theoretically no one will run into the question in practice, if they do it seems way less weird to have it be \s than special spaces or tabs.
SYG: I'm a unicode newb, so does this mean we are tying ourselves to a live whitespace definition?
JRL: We already do in RegExp.
SYG: SGTM
JRL: Cool. Next question: how do we decide what indentation to remove? (Presents #13) Three different possibilities:
- Take indentation of the first line. Remove that from all of the lines.
- As presented of the last meeting, the common indentation is determined and used. This one is complicated but has produced the best results so far.
- Always take the closing line as indentation.
For 1 or 3, this gives explicit errors because you'd have to have at least the first (or closing) line's level of indentation on all lines
JHD: Why does it matter how wide your indentation is?
JRL: Because of spaces vs tabs, we'll talk about that in a moment.
JHD: What I mean is, I imagine this to work by drawing a vertical line where on the left everything is chopped off.
JRL: Well, imagine your first line or closing line is at a different level of indentation than the rest of the line. What do you define the vertical line to be?
JHD: Not the first or last line.
JRL: That's essentially common indentation.
SYG: To clarify, common indentation actually means "least indented"?
JRL: Would it be less confusing if I renamed it least instead of common?
SYG, JHD: Yes.
JRL: The difference among the three options is the explicitness of the error conditions. With least indentation, you could have a line in the middle of the template that's not indented correctly and would skew the indentation removal from actually happening. For the other options it's clear which line drives the indentation.
JHD: Taking the indentation of any one line seems pretty magical to me.
JRL: You can always see which line determined the indentation but the question is whether we want to make this an explicit error condition or built-in behavior.
JHD: I like least indentation.
SYG: I'm fine with least indentation as well. This would require at least 2 passes, but it's still linear. Interestingly this has interplay with the API vs syntax question. If syntax, maybe the lexer/parser can track the last indentation during parsing itself. Another question: for least indentation, how do you envision users figuring out the offending line that's messing up the indentation?
JRL: Visual inspection is the main one, or a regexp to find out which line has no indentation in the output.
JRL: Next question: what happens for empty/whitespace-only lines? Easiest answer is we don't do anything about it. If you want two whitespace characters removed, then every line needs those two whitespace chars and you just remove them. All lines are significant in this answer, even empty/whitespace-only ones. Another alternative is we don't consider empty lines in the indentation computation. Another alternative is to ignore whitespace-only lines. I think the 2nd is the most intuitive to people but leads to really weird edge cases. If we do ignore a whitespace-only line, if the least indentation has fewer/more whitespace characters than a whitespace-only line, do we try to remove what's there, or ignore it?
JHD: The way I'd define whitespace is the whitespace from the caret to the first non-whitespace character. Once the maximum common whitespace is computed, up to that amount is removed from every line. If that’s two spaces, then a line of 30 spaces in source would end up with 28. Like vertical block selection in MacOS.
SYG: That's an interesting point, have you looked at what happens with vertical block selection in popular editors?
JHD: I haven't. I've been basing this on other implementations, particularly Python's.
JRL: Seems like people are leaning towards removing up to the max amount possible of whitespace after the least indentation whitespace computation. But now we have a problem. Suppose we have a line that's a \Space\Tab. What do you do with the tab?
JHD: I'd leave the tab. Tabs don't have inherent width unless the user told you.
JRL: Let's backstep, I forgot to ask another question. When we match the least indentation, is it the total number of whitespace characters, or the exact sequence of whitespace characters? For example, \Space\Tab vs \Space\Space on two lines. Is the least indentation \Space (exact sequence) or \Space\Tab (any two whitespace characters)?
JHD: In this formulation "common" makes more sense now.
SYG: I'm leaning towards tracking the exact whitespace characters.
JRL: So \Space\Tab and \Space\Space, the common indentation would be just \Space. Going back to ignoring lines question, if the common indentation is \Space\Space, and we have a whitespace-only line that's \Space\Tab\Newline, what happens? If \Space and \Tab aren't interchangeable. We can remove the most possible out of the least indentation, i.e. one space, leaving the tab. I think that approach is viable. If we go with the other definition where we only ignore empty lines, we don't run into this edge case.
SYG, JRL, JHD: Leaning towards ignoring whitespace/empty lines in the computation-of-least-indentation phase, and in the removal phase try to remove as many characters of the computed indentation as possible.
JRL: Final question, do we treat opening/closing lines as special, say they can only have whitespace, or as regular lines?
JHD: Do you have an example where treating them as regular lines would be weird?
JRL: Since we chose to ignore whitespace only lines in computation of indentation, treating them as regular lines actually works out fine. If we had chosen to ignore empty/everything significant, then the opening/closing lines become harder to deal with.
SYG: That the opening/closing lines fall out of ignoring whitespace only lines in indentation computation is supporting evidence to me that that is the right choice.
JRL: That answers all my questions, thanks.