Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-line strings #414

Merged
merged 24 commits into from
Dec 13, 2022
Merged

Multi-line strings #414

merged 24 commits into from
Dec 13, 2022

Conversation

jdidion
Copy link
Collaborator

@jdidion jdidion commented Nov 4, 2020

The ability to have multi-line strings has been requested several times. It would also make easier several of the other proposals under consideration, such as in-line Dockerfile and evaluation of arbitrary expressions. I went with Python/Scala-style rather than HEREDOC style as that seems most idiomatic for WDL.

This is implemented in the v2 ANTLR4 grammar in a wdlTools branch: https://github.com/dnanexus-rnd/wdlTools/tree/multiline-strings/src/main/antlr4/v2

Checklist

  • Pull request details were added to CHANGELOG.md

@jdidion jdidion added enhancement An enhancement to the WDL language. spec change A suggested change to the WDL specification. labels Nov 4, 2020
@jdidion jdidion changed the title Update SPEC.md Multi-line strings Nov 4, 2020
@patmagee
Copy link
Member

patmagee commented Nov 4, 2020

@jdidion This is amazing. I believe this will be considered a highly welcome addition to the WDL specification!

@cjllanwarne
Copy link
Contributor

  • IMO the "least surprising" behavior would be that all whitespace in the multi-line string are preserved by default.
  • I like the idea of starting a line with | to indicate "strip margin" but in practice, people might try to use this to generate bash commands, in which case erasing |s and > feels dangerous. Could that be a function (eg like strip_margin(multiline_string) rather than default behavior)?

@cjllanwarne
Copy link
Contributor

cjllanwarne commented Nov 4, 2020

We could also have a remove_newlines(multiline_string) to replace the "we remove all newlines by default" behavior currently proposed

@jdidion
Copy link
Collaborator Author

jdidion commented Nov 4, 2020

My feeling was that the most common usage for multi-line strings will be to break up long strings. If you want to write bash, you put that in the command block, no?

I don't think we need a special remove_newlines function - just use sub(str, "\n", " ").

@patmagee patmagee added this to the v2.0 milestone Nov 4, 2020
@cjllanwarne
Copy link
Contributor

Coming from scala might be distorting my sense of "what people expect" w.r.t. newlines in multi-line strings.

Usually the "what would people expect" tie-breaker in cases of different aesthetic preferences is "what would python do", which seems appropriate here too.

@cjllanwarne
Copy link
Contributor

As far as I can tell, python also preserves newlines in multi-line string literals (https://www.techbeamers.com/python-multiline-string/). Unless there's a really strong case against it, I think that should be what we do.

Yes, I think bash commands usually go in command blocks, but I still think having a syntax that erases very common bash-style syntax by default in multi-line string literals feels dangerous to me

@jdidion
Copy link
Collaborator Author

jdidion commented Nov 4, 2020

Personally, I find it very annoying to have to litter my Scala code with .stripMargin.replaceAll("\n", " ").

I agree that "least surprising" is a good approach, but I also feel it can be over-ridden by the desire to simplify the most common use case. We can add more examples to make it clear that if you want to write a multi-line bash string (or whatever) that preserves newlines you have to use | or > prefixes.

That said, I admit I may be wrong about what will be the most common use-case.

@cjllanwarne
Copy link
Contributor

I'd argue with your premise there though that the most common use case is "I want to split a single line string across many lines". I think the most common usage would be "I want a to write a multi-line string directly in my WDL". In which case the simplest way to provide that is to make it the default.

But we're arguing our own personal preferences here - which will never have a definitive answer. I think "choose the options that's most similar to python" is the most objective possible way to resolve these discussions.

@jdidion
Copy link
Collaborator Author

jdidion commented Nov 4, 2020

Fair enough. Let's wait to get a bit more feedback from others. I'm happy to update the spec and grammar if the consensus is to leave in whitespace by default.

@cjllanwarne
Copy link
Contributor

Agreed! An n > 2 is always a good idea...

@patmagee
Copy link
Member

patmagee commented Nov 4, 2020

Personally, I find the least surprising thing to be represent the string "as-is" with new lines intact. for me that is the main point of a multiline string, is to write across multiple lines. The tricky part I see is around preserving indents or removing them. I believe in python (since its tab/space delimeted) this is easy, since you simply strip out leading tabs corresponding to the current indent level. In WDL this is a bit trickier. we could

  1. Not touch anything
  2. Strip the least number of common tabs
  3. Define a syntax (ie | ) for retaining those indents

@jdidion
Copy link
Collaborator Author

jdidion commented Nov 4, 2020

Ok, how about:

  • By default, leave newlines and strings alone
  • If a line is prefixed by |, automatically strip all the leading whitespace before it (as well as the | itself)
    • The | and whitespace is left in tact if it is escaped
  • If the user wants to remove newlines they have to sub(str, "\n", " ")

@cjllanwarne
Copy link
Contributor

I really worry about strings like

String command_prefix = """echo hi \
  | grep foo \
  | wc -l"""

@jdidion
Copy link
Collaborator Author

jdidion commented Nov 4, 2020

Oh, good point.

In that case I guess we may need a trim_leading_whitespace function. By default this would strip the least number of common spaces/tabs after each newline. We could have an optional second argument that would do any/all of the following:

  • If it is an integer, strip (up to) that many leading whitespace characters
  • If it is a whitespace-only string, strip exactly that whitespace from the beginning of each line (and leave in-tact lines that don't begin with exactly that whitespace)
  • If it is a single non-whitespace character, strip all leading whitespace up to and including that character any time that character is the first non-whitespace character in the line

@jdidion
Copy link
Collaborator Author

jdidion commented Nov 4, 2020

A simpler alternative would be a dedent function (akin to the one in python textwrap https://docs.python.org/3/library/textwrap.html#textwrap.dedent) that just removes leading common whitespace, and not try to have a syntax for margin. It could take a second boolean parameter which says whether to also strip out newlines.

@cjllanwarne
Copy link
Contributor

Those ideas all sound reasonable to me.

@pshapiro4broad
Copy link
Contributor

Have you looked at the spec for Java text blocks? The java spec supports both automatically stripping newlines and preserving relative indent and doesn't require a lot of extra syntax. https://openjdk.java.net/jeps/378

@cjllanwarne
Copy link
Contributor

I do like that java text block spec's interpretation of "intrinsic whitespace", which matches very closely the command block behavior.

@patmagee
Copy link
Member

patmagee commented Nov 4, 2020

Ive just started using Java's implementaiton of text blocks and they are very intuitive, so I give a big +1 to that!

@patmagee
Copy link
Member

is the discussion on how to handle whitespace still live or do we have general consensus?

@jmesterh
Copy link

This would be really helpful, we have a large CWL pipeline we want to convert to WDL which uses YAML text blocks extensively for SQL queries. Not having an equivalent in WDL has been frustrating, as the SQL needs to be squashed into a single line. This feature would be greatly appreciated.

@patmagee
Copy link
Member

@jdidion any update on the status of this? It would be great to get this to voting

@jmesterh
Copy link

FWIW I'd vote for the Java text block syntax. Since my last post we've resorted to a yaml → wdl converter which isn't fun.

@hkeward
Copy link

hkeward commented Oct 20, 2022

👍 to the usefulness of this feature!

@jdidion
Copy link
Collaborator Author

jdidion commented Oct 20, 2022

I am good with Java text block syntax. I haven't read the spec in detail but it looks to me like the important things are:

  • Strings start and end with three matching quote ("""...""" or '''...''')
  • The minimum indentation in all non-empty lines is stripped from all lines
  • Newlines are preserved unless they are escaped by a \ as the last character in the line

The only thing that is not clear to me is how to handle mixed indentation. For example, is "<space><tab><space>" longer or shorter than "<space><space><space><space>"? I think the two most reasonable options for mixed indentation are:

  1. Raise an error
  2. Treat each whitespace character as equal (i.e. a tab and a space both count as 1).

Preference?

jdidion added a commit that referenced this pull request Oct 20, 2022
versions/development/SPEC.md Outdated Show resolved Hide resolved
versions/development/SPEC.md Outdated Show resolved Hide resolved
mlin added a commit to chanzuckerberg/miniwdl that referenced this pull request Nov 24, 2022
@mlin
Copy link
Member

mlin commented Nov 25, 2022

Included this feature in miniwdl v1.8.0 version development; I believe it's up-to-date with the spec as of this comment, but def invite anyone interested to hammer on it.

@jdidion
Copy link
Collaborator Author

jdidion commented Nov 28, 2022

@mlin great! I'm personally satisfied with the spec for this feature. I'll tag people for reviews.

Copy link
Contributor

@rhpvorderman rhpvorderman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks excellent. Great work @jdidion !

@geoffjentry
Copy link
Member

LGTM

@patmagee
Copy link
Member

This looks great! Amazing effort from everyone on getting to a design that feels perfect!

Copy link
Contributor

@illusional illusional left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor formatting issue, but otherwise 👍

This is a
multi-line string!
>>>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing closing ``` backticks here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks!

@jdidion
Copy link
Collaborator Author

jdidion commented Dec 12, 2022

Last call for comments. Plan to merge later this week.

Copy link
Member

@mlin mlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@jdidion
Copy link
Collaborator Author

jdidion commented Dec 13, 2022

🫡

@jdidion jdidion merged commit 06bbbb8 into main Dec 13, 2022
@patmagee patmagee modified the milestones: v2.0, 1.2 Mar 10, 2023
@jdidion jdidion deleted the multiline-strings-2 branch December 8, 2023 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement to the WDL language. spec change A suggested change to the WDL specification. voting active Changes for which voting is active.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants