Implement token list API #1829

ksss · 2024-05-24T14:03:06Z

This is a PR intended for discussion.

I want a token list.

I am implementing a RuboCop extension for RBS. RuboCop adjusts indentation and spacing based on the positions of individual tokens, such as comments and "(".

Problem

Building these features on top of the results of RBS::Parser requires complex processing: I have to recover token positions from RBS::Location objects, and I have to scan every location to find the positions of end-of-line comments.

Example 1: Checking the space before a block's opening brace.

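# Find the block's opening brace `{`.
# This assumes no `{` appears earlier in the source, e.g. inside a literal.
lbrace_index = method_type.location.source.index('{')

# Find the last non-whitespace character before the `{`.
# Start at lbrace_index - 1; otherwise rindex matches the `{` itself.
char_before_lbrace_index = method_type.location.source.rindex(/\S/, lbrace_index - 1)

# Exactly one space is expected between that character and the `{`.
if char_before_lbrace_index + 2 != lbrace_index
  add_offense(...)
end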

Example 2: Checking the space between arbitrary tokens.

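require 'strscan'

# Hand-rolled tokenizer: every token kind needs its own branch.
scanner = StringScanner.new(source)
tokens = []
until scanner.eos?
  pos = scanner.pos # start offset of the token about to be scanned
  case
  when scanner.scan('[')
    tokens << [:pLBRACKET, pos]
  when scanner.scan(']')
    tokens << [:pRBRACKET, pos]
  when ...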

I understand that my use case is unique, so I believe there is little need to modify the existing parsing process.

It would be helpful to have a method to obtain a sequence of tokens as a new API.
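
To illustrate why a token list would help, here is a minimal sketch of the Example 1 check written against a flat token sequence. The Token struct, its start_pos and end_pos fields, and the lbrace_spacing_offenses helper are hypothetical names chosen for illustration; they are not the API of RBS or RuboCop. The token type names are assumed to follow the style of :pLBRACKET in Example 2, and the sketch assumes whitespace is not emitted as a token.

```ruby
# Hypothetical token shape: a type plus start/end character offsets
# (end_pos is exclusive).
Token = Struct.new(:type, :start_pos, :end_pos)

# Returns the start offset of every `{` that is not separated from the
# previous token by exactly one character.
def lbrace_spacing_offenses(tokens)
  tokens.each_cons(2).filter_map do |prev, cur|
    next unless cur.type == :pLBRACE

    gap = cur.start_pos - prev.end_pos
    cur.start_pos unless gap == 1
  end
end

# Hand-written tokens for the prefix "()  {" (two spaces before the brace).
tokens = [
  Token.new(:pLPAREN, 0, 1),
  Token.new(:pRPAREN, 1, 2),
  Token.new(:pLBRACE, 4, 5)
]
offenses = lbrace_spacing_offenses(tokens)
```

With positions attached to each token, the check no longer needs to search the source text, so a `{` inside a literal cannot be mistaken for the block's brace.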

Use case of token list in RuboCop

https://github.com/rubocop/rubocop/blob/12fd014e255617a08b7b42aa5df0745e7382af88/lib/rubocop/cop/layout/extra_spacing.rb

Proposal for token list API

Low level

I propose a low-level API called RBS::Parser._lex, following the example of _parse_signature and similar methods. This low-level API aims to obtain the necessary information for a sequence of tokens using minimal C code.
It is desirable to be able to obtain all tokens, including comments.

High level

I propose a high-level API called RBS::Parser.lex. The name lex is inspired by Prism.lex. This high-level API will wrap the sequence of tokens obtained from _lex, making it more convenient to handle.
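
As a rough sketch of this layering, the high-level method could map the raw output of the low-level call into small objects. Everything here is assumed for illustration: the raw format (an array of [type, start, end] triples), the SketchToken class, and the wrap_tokens helper are hypothetical and are not taken from this PR's diff. The token type names in the sample data are also assumptions.

```ruby
# Hypothetical wrapper around one raw lexer entry.
class SketchToken
  attr_reader :type, :range

  def initialize(type, range, source)
    @type = type
    @range = range
    @source = source
  end

  # Text of the token, sliced out of the original source.
  def value
    @source[@range]
  end
end

# Wraps raw [type, start, end] triples (an assumed low-level format,
# with an exclusive end offset) into SketchToken objects.
def wrap_tokens(source, raw)
  raw.map do |type, start_pos, end_pos|
    SketchToken.new(type, start_pos...end_pos, source)
  end
end

source = "[Integer]"
raw = [[:pLBRACKET, 0, 1], [:tUIDENT, 1, 8], [:pRBRACKET, 8, 9]]
tokens = wrap_tokens(source, raw)
values = tokens.map(&:value)
```

Keeping the low-level result as plain arrays keeps the C side small, while the wrapper gives callers named accessors such as type and value.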

ksss · 2024-05-27T06:13:55Z

If possible, it would be nice to have a line break token as in Prism.

soutaro · 2024-05-28T07:41:23Z

@ksss Can you fix the steep type check failure? I plan to implement support for line breaks and comment tokens on top of this PR.

ksss · 2024-05-29T01:52:20Z

Thank you for reviewing. I fixed type checking.

I plan to implement support for line breaks and comment tokens on top of this PR.

GREAT! THANKS!

soutaro

🎉

ksss added 3 commits May 24, 2024 18:57

Implement token list API

6477215

Ruby API

c7451e3

Add alloc_lexer

a7ac349

soutaro added this to the RBS 3.5 milestone May 28, 2024

ksss added 2 commits May 29, 2024 10:45

Move class to each file

3396e52

Add Type and Document for Parser.lex

a274107

soutaro approved these changes May 29, 2024


soutaro added this pull request to the merge queue May 29, 2024

Merged via the queue into ruby:master with commit 0831489 May 29, 2024
17 checks passed

ksss deleted the lex branch May 29, 2024 02:42

soutaro mentioned this pull request May 29, 2024

Include trivia tokens to lex result #1831

Merged

soutaro added the Released PRs already included in the released version label Jun 6, 2024
