Skip to content
This repository has been archived by the owner on Sep 6, 2021. It is now read-only.

Fix #3057: Toggle Block Comment doesn't work if the open/close delimiters are the same #3060

Closed
wants to merge 13 commits into from
Closed

Fix #3057: Toggle Block Comment doesn't work if the open/close delimiters are the same #3060

wants to merge 13 commits into from

Conversation

TomMalbran
Copy link
Contributor

This hopefully fixes all the problems mentioned in #3057.

@TomMalbran
Copy link
Contributor Author

@peterflynn Want to give this a review? The block comment on entire lines didn't worked because it was find suffix was finding the prefix since it was on its own line so it matched the expression. The other problem was that the line prefix started in the same way as the block prefix so I added an extra check.

@ghost ghost assigned peterflynn Mar 8, 2013
@@ -264,7 +274,7 @@ define(function (require, exports, module) {
}
} else {
prefixPos = _findCommentStart(startCtx, prefixExp);
suffixPos = _findCommentEnd(startCtx, suffixExp, suffix.length);
suffixPos = _findCommentEnd(startCtx, suffixExp, suffix.length, prefixPos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the code already worked like this, but... I'm finding it a little confusing that both lines here mutate startCtx (if I'm reading _findCommentEnd() correctly). So by the end of the 2nd line here startCtx is actually pointing to the end of the block comment, no? (Which is also potentially confusable with endCtx, i.e. the end of the selection). Seems maybe worth leaving some comments up where these are declared, at least, indicating what they'll mean (or whether they're just scratch vars).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I believe it should always use the ctx, since this is the one used at the end to check if there might be another block comment, making startCtx and endCtx stay always at the start and end of the selection. I did the change and all the tests pass.

@peterflynn
Copy link
Member

Three other notes:

  • Looks like it might need a merge with master? (I didn't get a conflict on my local copy of your branch, but GH seems to disagree...)
  • If you have a few extra min it would be awesome to add a couple unit tests for these cases.
  • Since this fixes the CoffeeScript case, want to roll in the languages.json addition of "###" here too?

@peterflynn
Copy link
Member

@TomMalbran done reviewing. Thanks for the fix!

@TomMalbran
Copy link
Contributor Author

  1. My pull request about multiple line prefixes landed this weekend, so I knew it would make some conflicts on this branch.
  2. Sure I can do something like I did with multiple line prefixes, with a special new language. I could work using CoffeeScript, but just to make sure it still works if the languages are removed from the core.
  3. Will do.

@TomMalbran
Copy link
Contributor Author

I ended doing a complete new fix, since I didn't consider the case where the prefix could actually be the suffix. This change now made it possible to uncomment a block comment when part of the selection is inside a block comment and part is inside a line comment, but I don't think is a bad thing, so I left it like that.

I also added a new fix here where doing a line comment on a block comment in languages like CoffeeScript or Lua, would remove part of the block comment prefix/suffix.

I'll try to do the test tomorrow, if the fix finally works.

@@ -212,16 +247,17 @@ define(function (require, exports, module) {
* @private
* Moves the token context to the token that ends the block-comment. Ctx starts in a block-comment.
* Returns the position of the sufix or null if gets to the end of the document and didn't found it.
* @param {!{editor:{CodeMirror}, pos:{ch:{string}, line:{number}}, token:{object}}} ctx - token context
* @param {!{editor:{CodeMirror}, pos:{ch:{number}, line:{number}}, token:{object}}} ctx - token context
* @param {!RegExp} suffixExp - a valid regular expression
* @param {!number} suffixLen - length of the suffix
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to update this JSDocs. Will do it on the next commit

@TomMalbran
Copy link
Contributor Author

I added some tests. 2 for normal cases and 6 for edge cases. Let me know if you think it might need more tests.

@@ -110,7 +128,7 @@ define(function (require, exports, module) {
for (i = startLine; i <= endLine; i++) {
line = editor.document.getLine(i);
// A line is commented out if it starts with 0-N whitespace chars, then a line comment prefix
if (line.match(/\S/) && !_matchExpressions(line, lineExp)) {
if ((line.match(/\S/) && !_matchExpressions(line, lineExp)) || _matchExpressions(line.trim(), blockExp)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the comment above this could use some updating. It's not immediately obvious why finding a block comment delimiter means there's an uncommented out line. Don't you need to know (a) whether it's an open vs. close delimiter, and (b) whether any non-blank lines exist on the "outside" side of the delimiter (and still within the startLine-endLine range)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as a corollary, it seems like with symmetrical delimiters (like "###"), you can only know "(a)" by looking at the tokens...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just here so when you want to toggle line comment a line starting with 0-N whitespaces and then a block comment prefix or suffix where at least one line comment prefix is the prefix of either one, then it shouldn't remove part of the block comment prefix, and instead just comment it. This doesn't check block comments and try to to be smart about them.

The case I am trying to avoid is when you use toggle block comment in the first line of the next code:

  ###
  block comment
  ###

that will turn into:

  ##
  block comment
  ###

Which breaks the block comment and doesn't do either a line comment or uncomment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think that makes sense. In that case, maybe the name should be changed to _containsNotLineComment() -- since block comments will cause it to return true too, and since that's actually what the code calling it wants to know.

Or if that name is too awkward you could invert the sense of it and use a name like _isAllLineCommented().

result = TokenUtils.moveSkippingWhitespace(TokenUtils.moveNextToken, ctx);
while (result && !ctx.token.string.match(blockExp[1])) {
result = TokenUtils.moveSkippingWhitespace(TokenUtils.moveNextToken, ctx) &&
_isBlockComment(ctx, blockExp, lineExp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a line comment within a block comment, wouldn't this cause us to halt early and never find the end of the block comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually same about the reverse direction in _findCommentStart()... I wonder if it's actually enough just to have the one _isBlockComment() check at the top outside of the loop, and leave the loop code itself unchanged -- if the token we start from is inside a block comment, we shouldn't care about line comments we find in between there and the block comment delimiter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, we should add a couple unit tests for these nesting cases. E.g. try to block-uncomment a block comment containing a line comment -- right now that does nothing, I think because if the issues described above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I didn't thought about those cases, was trying to cover the line comments right before or after the block comment. That is a good idea, to leave it just outside, but it might still check the line comment inside the block comment, and break early or if we remove the line comment prefix check and the token starts in a line comment it could still confuse a block suffix with a block prefix as before.

@peterflynn
Copy link
Member

Partway through reviewing -- will try to finish up tonight.

@TomMalbran
Copy link
Contributor Author

@peterflynn There are mainly 3 cases now:

  1. Is the selection covering line comments? If it is, are we inside a block comment? If not, line uncomment if all the lines in the selection are line comments, if not block comment.
  2. Is the selection all white space characters? If it is, it finds the prefix and suffix and determines if the selection is inside the block comemnt.
  3. Are we in a comment token? (we already now that is not a line comment) Then find the prefix and suffix
    The last step is to find if there is another block comment in the selection.

But CoffeeScript is making the find prefix/suffix fails since it always finds the same when they are alone in one line, so it requires an extra check to find if we found the prefix or suffix and find the other one, which is the part where I am having problem with in some special cases.

I'll try to see in another branch if a refactor could make this simpler, and if it does, I can then try to add a fix for this here.

Actually @redmunds is right, there are some cases where it might be impossible to determine if we are in a block comment or not, without checking from the start to the current position, like the following case:

#line1
###
#comment 1
###
#line2
###
#comment 2
###
#line3

If you are in line 2, you can fail to find the prefix and suffix, without checking from the start, since every line is a comment. Now I was thinking, what if we in the case of finding a commented line trying to determine if is inside a block comment or not, we would uncomment, wait for the tokenizer to be done, check if it is still a block comment or not, and comment it again, and trying to keep this as a single edition? Might not be great, but can solve this problem without going to the hackier solution that I mentioned in my Edit2 before, or checking the entire file.

To @peterflynn last comment, what if the tokenizer would give an extra key: commentType with the values of line or block, and then make the tokenizers functions return this information to CodeMirror as an object/array, only for the comment tokens and CodeMirror can add this information. This would make it backwards compatible, and the mode changes should be that hard to make, since is just replacing a few lines in each mode.

@TomMalbran
Copy link
Contributor Author

I used the special regular expression for the line prefixes, and that made the code a lot simpler in lots of cases.

I tried to fix most of the cases mentioned and the ones I end up finding and added more tests for those cases. The only problem I found is in my example, trying to toggle uncomment the middle block comment, but works when there are fewer line comments, and that seems to be an extreme case where the only solution is to parse the document from the start to the position of the cursor.

I'll still might try to refactor it, but I don't find the code that complex yet.

@TomMalbran
Copy link
Contributor Author

@peterflynn I went ahead I did the code refactor, since I been thinking about it for the last couple of days and got a nice idea about the algorithm. The idea is similar to your proposal, but with a some variations to better cover every case and avoid unnecessary loops, the idea is:

  1. If the token is at a whitepsace, move to the the first non-ws token.
  2. Find the first block comment inside the selection. Here I added a flag to distinguish later from comments at the start of the selection or not.
  3. If we found a comment:
    • If this is a line comment over the complete line, and is at the start of the selection, then use the sub function to find if is a block comment or a line comment.
    • If this is a line comment over the complete line, but not at the start of the selection, we know that is a line comment.
    • If this is a line comment, but not over the complete line, then we can just comment.
    • If is not a line comment, then is a block comment. It might still be a block delimiter alone in the line, so for languages with equal block delimiters, I used the same function as before to find is a suffix or prefix and moved the token so that it is inside the block comment.
  4. If is a block comment.
    • From the token result find the prefix and then the suffix. Notice that since it always starts from inside a block comment, it wont find the same block delimiter 2 times.
    • Find if there is another block comment inside the selection and no-op if there is.
    • Since at the beginning we moved away from the ws, we should check if the selection is actually inside the block comment found. We might have started at a ws before a block comment prefix.
  5. If it is a line comment, we can just check if we can uncomment them when all the lines in the selection are commented, since we covered all the other cases.
  6. Make the edition as before.

The sub function tries to determine if there is a block comment that starts before the ctx given and ends at the position of the ctx or after, so this is why it can be used for both cases mentioned before. What it does is moves backwards while the line is a line comment, and when the loop ends it checks the result.

  1. If it is a comment (and we haven't reached the start of the document), then if no prefix or suffix match the string, is a block comment.
  2. If we found another block delimiter alone and we can't tell which one it is, we call recursively the same function and return the opposite value.
  3. If the suffix matches, then we found the end of a block comment, so the next line isn't a block comment.
  4. If the prefix matches, then we found the start of the block comment, so it returns true.
  5. If it wasn't a comment or we reached the start of the document, we can return false. Which covers the base case of the recursive case.

The code seems a lot more straight forward now and covers every case (in the tests and tested) even the one mentioned earlier with all the line comments, although in that case it will have to search from the cursor position to the start of the document.

@TomMalbran
Copy link
Contributor Author

There was a recent change in CodeMirror with the comment tokenizer as from codemirror/codemirror5@02ea26c, it now treats leading spaces inside a block comment as comment.

I haven't checked if it also makes the empty lines inside a block comment have a comment class too, if it does, we can remove the first movement to get out of the ws tokens, if not, we should only move when starting on an empty line, and for both cases, we can remove the the special case in the block comment section that checks if we moved when we didn't needed to. It will even make it easier to identify block comments, since if there is a line comment inside, the tokenizer string should now start with white spaces and then the line comment prefix, but it would still be a problem with line comments with no ws before the prefix.

Once we update the CodeMirror and if this is still open, we should check this cases, and fix them here, since I did a code refactor. If not, I'll fix them in a new request.

@njx
Copy link

njx commented Apr 16, 2013

This seems valuable, but would be nontrivial work to finish and ensure it works properly. Marking [PM] to consider moving to backlog.

@njx
Copy link

njx commented Apr 16, 2013

(I think the criteria for the backlog item should basically be that it should be possible to make line/block commenting for CoffeeScript and similar languages work the same way as it does for other languages.)

@TomMalbran
Copy link
Contributor Author

@njx What do you mean? The code here already works for CoffeeScript and every other language suppoerted. It actually passes ever test and any additional testing I did. The code is a lot simpler right now which makes it easier to maintain and fix if needed in the future. The current implementation isn't that great.

@TomMalbran
Copy link
Contributor Author

Any updates here? As mentioned before all the work is done, it only requires a new review of the updated code, fixes if required and then we can merge and fix the issue. It looks like @njx got the wrong idea of the status of this request. I can just merge with master, fix the merge issues and have someone to review it. If wanted I can rebase it too :)

@njx
Copy link

njx commented Jul 15, 2013

Sorry this is languishing. Unfortunately this was long enough ago that I've forgotten why I thought that. @peterflynn - do you recall whether we had other concerns here?

peterflynn added a commit that referenced this pull request Oct 31, 2013
migrate existing clients of TokenUtils. Simplifies code that skips
whitespace but doesn't change much else yet, beyond a few specific cleanups:
- add & clarify docs in EditorCommandHandlers
- in blockCommentPrefixSuffix(), make start/endStream effectively immutable
  per TomM in #3060
- in CSSUtils, simplify how streams are cloned & remove unneeded pos cloning
- in HTMLUtils.getTagInfo() remove unneeded 2nd iterator
@neojski
Copy link

neojski commented Nov 5, 2013

Anything new here? I read some of this long discussion but for me even the "lineComment": ["#"] (even if it breaks block comments) for coffeescript in languages.json is quite useful.

@TomMalbran
Copy link
Contributor Author

Once i have more time, I will grab this PR and make a new one clean.

@njx
Copy link

njx commented Mar 8, 2014

@TomMalbran I had totally forgotten about this pull request when I was doing the work in cmv4 to make the commenting stuff work with multiple selections - we should have landed this first. Looking at the changes here, it seems like conceptually it shouldn't be bad to merge what you have with the multiselection changes, because most of the changes were around how the edits are actually applied and selections are tracked when multiple selections are being edited, whereas most of your changes are in the code that leads up to that point. However, the actual merge would likely be messy.

My suggestion would be either:
(1) if you have time sometime soon, update this pull request against master (but we won't merge it yet), and then after we land cmv4 and it stabilizes, I can update it to work with multiple selections and land it; or
(2) if you feel like looking into the changes for multiple selections in the cmv4 branch and figuring out how to apply them to this PR, you could open a new PR against the cmv4 branch - we probably wouldn't land it until after cmv4 lands, but we could start re-reviewing it at least.

Sorry for all the churn around this PR. I'll add a card for it to make sure we don't lose track of it again (and remove the PM tag).

@TomMalbran
Copy link
Contributor Author

Sure I can remake this PR against master, since I am now working on the brackets repo and not in my fork.

@TomMalbran
Copy link
Contributor Author

@njx I went ahead and remade this PR, which was a lot easier than expected. New PR #7135.

@peterflynn
Copy link
Member

Closing since this was obsoleted by the new PR.

@TomMalbran Sorry again btw that this fell off the radar for such an abysmally long time

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants