-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix R code cell parser and add # %%
support (#3709)
#3858
Conversation
- R parser would end cells after first blank line; this is now removed - `# %%` is now recognized as a valid cell demarcation in R - cellDecorationSetting now matches python (only highlights currently active cell - Tests added to test for these changes
All contributors have signed the CLA ✍️ ✅ |
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks for the PR! Tests pass locally and it works well in a dev build 🎉 :
Screen.Recording.2024-07-05.at.09.58.30.mov
Just one question about the regexp before I approve.
BTW I couldn't find a spec on the R #'
syntax and it seems I misunderstood it. Do you know if a spec exists?
Also, could you please share more about:
The main reason this would be a substantial change is that the R parser would have to be specialized and the parser would no longer be "generic" as it seems like is currently planned for it.
I arrived at the current interface as the simplest possible thing to support what was needed by both Python and R – although you've pointed out that the R implementation was wrong on my end.
We can totally change the CellParser
interface as needed to support these requirements. We can even change which component is implemented by each language (doesn't have to be the parser). The main goal here is that, in future, any language can very easily add "code cell" support. Do you have any suggestions?
Sorry this is so long. Did a bit of a deep dive and trying to be complete :-) From what I can tell (looking at R, python, julia, and stata), there are two designs for code cells:
knitr::spin(knit = FALSE, text = "
1 + 1
#' # Header
#'
#' Paragraph
2 + 2
") |> cat(sep = "\n")
#>
#> ```{r}
#> 1 + 1
#> ```
#>
#> # Header
#>
#> Paragraph
#>
#> ```{r}
#> 2 + 2
#> ``` There are two bugs I think that can happen with a regex based approach, which is why parsing the language could be potentially necessary. First, you are within a string and it just so happens you have a # %% at the start of a new line: # %%
"""
String
# %%
String continues
"""
Second, R has # %%
#' The length of a string
#'
#' Technically this returns the number of "code points", in a string. One
#' code point usually corresponds to one character, but not always. For example,
#' an u with a umlaut might be represented as a single character or as the
#' combination a u and an umlaut.
#'
#' @inheritParams str_detect
#' @return A numeric vector giving number of characters (code points) in each
#' element of the character vector. Missing string have missing length.
#' @seealso [stringi::stri_length()] which this function wraps.
#' @export
#' @examples
#' str_length(letters)
#' str_length(NA)
#' str_length(factor("abc"))
#' str_length(c("i", "like", "programming", NA))
str_length <- function(string) {
}
# %%
code ...
I believe a parser that could do both syntaxes would go like this:
create_cells:
Overall, I think I could code up a draft parser that works for both cases, but not sure how to make this logic generic. Alternatively, could ask each LSP to be able to parse a document and return As an aside, |
Co-authored-by: Wasim Lorgat <mwlorgat@gmail.com> Signed-off-by: Kyle F Butts <buttskyle96@gmail.com>
@seeM, is it possible for the parser to observe the |
@kylebutts Thank you for your contributions and for the detailed write-up, amazing! 😄 Note that CI seems to be failing on the latest run. SummaryJust to double-check my understanding, is this an accurate summary of where we are right now:
Moving forwardHere's my suggested path forward:
|
Addresses #3709. Good to merge! Adding to Moving Forward, I also think the extension could do a better job at caching the code cell computation work. As it stands, the code is parsed multiple times every time the cursor is moved (add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Merged, thanks again @kylebutts! 😄 I agree re caching! |
Changes:
# %%
is now recognized as a valid cell demarcation in RRemaining enhancement:
Complete R support would recognize
#'
as markdown cells. This feature is fully supported inknitr::spin
andquarto render
for a script.This would be require more substantial changes (that I would be happy to implement). For example:
This example has three cells: code cell, markdown, and code cell. But the latter one requires seeing the markdown comment, ending the first code cell, and starting the second.
The
knitr::spin
/quarto render
treat anything after a markdown comment as a new cell, so this behavior should probably be matched. In this exampleoutput: false
does not apply to 2 + 2, but might seem like it will because running the code would run both 1 + 1 and 2 + 2.The main reason this would be a substantial change is that the R parser would have to be specialized and the parser would no longer be "generic" as it seems like is currently planned for it.