WIP: use dedent for de-indentation in lexer, fix #15184 #16699

Closed
wants to merge 3 commits into from

a-mr
Contributor

@a-mr a-mr commented Jan 12, 2021

When the first line of a doc comment is empty, lexer.nim records the indentation at that line and keeps one leading space on the following lines:

##
## x

is tokenized as a comment with tok.literal =


 x

(with a space before x), which RST treats as a block quote.

@a-mr
Contributor Author

a-mr commented Jan 12, 2021

Bummer: there is no dedent in csources, because csources is at version 0.20.0.

It worked so well in my local copy :-(

@timotheecour
Member

timotheecour commented Jan 13, 2021

@a-mr
you're hitting #16646, see workaround mentioned there which works:

in strutils, change as follows:

since (1, 3):
  func indentation*(s: string): Natural = ...
  func dedent*(s: string, count: Natural = indentation(s)): string {.rtl,
      extern: "nsuDedent".} = ...

after that you'll hit timotheecour#521,
for which there's also a workaround:
add this to compiler/lexer.nim:

proc dedent(a: string): string =
  # pending https://github.com/timotheecour/Nim/issues/521
  let b = a
  let i = indentation(b)
  dedent(a, i)

then this works

@timotheecour timotheecour reopened this Jan 13, 2021
@Araq
Member

Araq commented Jan 13, 2021

Why can't we fix the existing logic instead...

@timotheecour
Member

To avoid duplicating code. Code reuse is good.

@a-mr
Contributor Author

a-mr commented Jan 13, 2021

@Araq, @timotheecour: here is another edge case to consider. Assume we really want to start a comment with a quote:

proc f* =
  ##   Quote
  ## Paragraph
  discard

The current logic sets the indentation to 0 at "Quote", then re-adapts it to 0 again at "Paragraph", so tok.literal wrongly becomes:

Quote
Paragraph

To get the indentation right in such cases, we need to look ahead through the entire string and find the non-whitespace character with minimal indentation. That is what dedent does, so we are back to reinventing dedent.

I think we can avoid the workarounds proposed by timotheecour and just copy a short (few lines long) implementation of dedent into lexer.nim.
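As a rough illustration of the look-ahead described above, here is a minimal sketch of a local dedent. It is not the strutils implementation and not the code from this PR; the names minIndent and dedentLocal are hypothetical, and it assumes indentation is made of spaces only and that the "##" markers have already been stripped from each line.

```nim
import std/strutils

proc minIndent(s: string): int =
  ## Smallest number of leading spaces over all non-blank lines (0 if none).
  result = -1
  for line in s.splitLines():
    if line.strip().len == 0:
      continue                      # blank lines do not constrain the indentation
    var n = 0
    while n < line.len and line[n] == ' ':
      inc n
    if result < 0 or n < result:
      result = n
  if result < 0:
    result = 0                      # only blank lines were seen

proc dedentLocal(s: string): string =
  ## Removes the common indentation from every line of `s`.
  ## Whitespace-only lines become empty lines.
  let common = minIndent(s)
  var first = true
  for line in s.splitLines():
    if not first:
      result.add '\n'
    first = false
    if line.strip().len > 0:
      result.add line.substr(common)
```

With this sketch, the body of the "Quote"/"Paragraph" example keeps its relative indentation, because the minimum is taken over all lines before anything is cropped.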

Alternative solution

You can specify that zero or one space after the comment marker corresponds to zero indentation, and that indentation grows linearly from there. This avoids any de-indentation except for a trivial crop that can be done on each line separately: "#x" -> "x", "# x" -> "x", "#  x" -> " x", etc.

| Initial indentation (spaces after ##) | Resulting indentation |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | 1 |
| .. | .. |
| n | n-1 |

The side effect of this decision is that a block quote will appear if, for example, 2 spaces follow the "#":

const pi = 3.14 ##  I made a mistake and put 2 spaces accidentally and would get a block quote all of a sudden :-)
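The per-line crop from the alternative solution could be sketched as follows. This is only an illustration under stated assumptions: cropCommentLine is a hypothetical helper, not an existing lexer.nim proc, and it assumes each input line starts directly at the comment marker.

```nim
proc cropCommentLine(line: string): string =
  ## Hypothetical helper: drops the leading '#' characters and at most one
  ## following space; any further spaces are kept as indentation.
  var i = 0
  while i < line.len and line[i] == '#':
    inc i                           # skip the comment marker ("#" or "##")
  if i < line.len and line[i] == ' ':
    inc i                           # zero or one space maps to zero indentation
  result = line.substr(i)
```

Each line is handled independently, so no look-ahead over the whole comment is needed. The cost is the side effect noted above: two spaces after the marker leave one space of indentation.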

@Araq
Member

Araq commented Jan 14, 2021

> To avoid duplicating code. Code reuse is good.

So do it in a way that doesn't slow down things. Also: I avoid helpers from the stdlib nowadays because you never know if it gets "fixed" in an incompatible manner because it's "inconsistent with Python/Unix/some other proc somewhere else".
