Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Text proto format could support multi-line strings #1297

Closed
riannucci opened this issue Mar 5, 2016 · 35 comments
Closed

FR: Text proto format could support multi-line strings #1297

riannucci opened this issue Mar 5, 2016 · 35 comments

Comments

@riannucci
Copy link

In a text proto file like:

something: <
  description: "this\nwould\n naturally be a\n multiline string"
  ...
>

It would be awesome if we could e.g.:

something: <
  description: `
this
would
naturally be a
 multiline string
`
  ...
>

This could be used for e.g. attaching markdown documentation to configuration entries which could then be displayed nicely. I'm not 100% on backticks (implies quoting in the parser)... Other things like a bash style <<EOM .... EOM demarcation could also work and would be easier to parse.

Thoughts?

@bwendling
Copy link

+1 this idea (whatever syntax it takes). Having the ability to write free-flowing text in the text protobuf is great for people who want to use protobufs but still have it be human readable when you have multiline text.

@riannucci
Copy link
Author

FWIW, we did actually implement this as a crappy pre-processor over text proto (using the <<EOM ... EOM syntax). The tests ("examples") are here: https://github.com/luci/luci-go/blob/master/common/proto/multiline_test.go

It also will do a python-style textwrap.dedent on it, so you can do:

something: <
  description: <<EOM
    this
    would
    naturally be a
      multiline string
  EOM
  ...
>

Which would be the equivalent of:

something: <
  description: "this\nwould\nnaturally be a\n  multiline string"
  ...
>

(note the beautiful indentation and lack of weird prefix spaces in the translated version 😉)

@bwendling
Copy link

Hmm...it does appear to be an option, but it's discouraged. I don't know the reasoning for it.

@aselle
Copy link

aselle commented May 5, 2017

This would be a great feature, we're looking at needing to do large blocks of text in a proto for api documentation, and it would be great to do this! In lieu of this we'll probably make a preprocessor as @riannucci is using, but that is much less elegant. Thanks!

@laike9m
Copy link

laike9m commented Mar 15, 2018

+1 for this.

@DonGar
Copy link

DonGar commented Mar 23, 2018

+1 again.

@maguro
Copy link

maguro commented May 3, 2018

So, what's the process to getting this implemented? Does someone simply submit a PR?

@maguro
Copy link

maguro commented May 4, 2018

I'm going to take a stab at this. I'm going to use backticks. Not sure when, or if, I'll get this done. Feel free to ping me.

@maguro
Copy link

maguro commented May 16, 2018

So, using backticks was a breeze to implement, but I realized it’s a huge pain for people who want to embed multi-line markdown strings. The current tokenizer only looks ahead one character; I suspect that this is why there's been no movement on this feature request. I’m extending it to look ahead three characters to accommodate triple single and double quotes.

@xfxyjwf
Copy link
Contributor

xfxyjwf commented May 16, 2018

Multi-line string value is already supported as far as I can tell, see:
https://github.com/google/protobuf/blob/ed4321d1cb33199984118d801956822842771e7e/src/google/protobuf/text_format_unittest.cc#L748

You will need to format it like this:

something: <
  description:
    "this\n"
    "would\n"
    " naturally be a\n"
    " multiline string"
  ...
>

Probably not as convenient as the proposed ones, but it's also less magical (i.e., don't magically remove indentation and line wraps here and there).

@maguro Before you put more work into it, please make sure someone on protobuf-team is supportive of this feature (+a few other team members @acozzette @liujisi ).

@maguro
Copy link

maguro commented May 20, 2018

The unit test TEST_F(TextFormatTest, ParseConcatenatedString) ensures that multi-part strings concatenate correctly. I don't think that it has anything to do with multi-line strings.

I understand that that one can use a list of multi-part strings with each part suffixed with \n. It's not just inconvenient, it's brittle and makes things pretty illegible. Looking at the comments and emoticons posted on this issue, it seems that the wider community has a very poor opinion on that workaround as well.

My proposal is not to remove indentation, etc. Whatever is in the string is what you get; any formatting can and should be done after compilation. It also is backward compatible.

I posted my comments are several weeks ago and the ticket is several years old. How can we proceed in a more timely manner?

@xfxyjwf
Copy link
Contributor

xfxyjwf commented May 22, 2018

For changes that will affect multiple languages, there needs to be a formal proposal and it needs to be approved before proceeding. There is no fast path and the process is Google internal only. This change unfortunately falls into this category and has to be carried through by a Google engineer. It doesn't need to be a protobuf team member though. If you are a Google engineer and is motivated enough to push this feature through all the proto text parsers in Google, feel free to send us a proposal doc. When the proposal is approved, we will accept code changes (from anyone, internally or on github). For external contributors, your best bet is probably to convince someone in Google to go through the process on your behalf.

My personal take on this proposal is that it doesn't provide enough benefit to justify the effort. To me multi-line strings aren't so different from multi-part strings, and there are a lot of text proto parsers we need to update if we are to support it.

@maguro
Copy link

maguro commented May 22, 2018

What are "text proto parsers"?

@xfxyjwf
Copy link
Contributor

xfxyjwf commented May 22, 2018

@maguro Protobuf implementations that support text format apparently have implemented such parsers. Some implementations are opensourced (as in this github repository), but many of them are not and only exist inside Google. Also protobuf implementations are not the only text proto parsers. We have tools that process/convert/format text protos. For example, if you check-in a text proto file in our internal code base, a tool will parse and format the file according to a formatting guideline, which by the way has specific rules about how to format multi-part strings.

@maguro
Copy link

maguro commented May 22, 2018

So, these tools parse raw proto source files? If that is the case, then if a proto file did not use the new multi-line string, then they should be fine, correct?

@xfxyjwf
Copy link
Contributor

xfxyjwf commented May 22, 2018

@maguro If nobody uses the new multi-line string, they will be fine.

@maguro
Copy link

maguro commented May 23, 2018

So, there doesn't seem to be a problem on that front then; if there is then one could add a flag to turn it off. I'll try to find a Google engineer to shepherd this on your side.

FWIW, I really don't how one would prefer

string scope = 2 [(niantic.cfg.fld) = {
      description: "To get an ID Token to authenticate a user, you are required to specify\n"
                   "the scope identifier `openid`. For example: `scope=openid`.\n"
                   "\n"
                   "Additionally, to access private user data from the Yahoo APIs, include\n"
                   "the relevant [API scope\n"
                   "identifiers](https://developer.yahoo.com/oauth2/guide/yahoo_scopes/index.html#scope-identifiers).\n"
                   "The scopes can be delimited by a\n"
                   "space or comma. In the example below, the scope identifier is specified\n"
                   "for requesting the ID Token and an Access Token that provides read\n"
                   "access to the Yahoo Mail API:\n"
                   "\n"
                   "* `scope=openid mail-r`\n"
                   "* `scope=openid,mail-r`"
  }];

over the much cleaner

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   To get an ID Token to authenticate a user, you are required to specify
                   the scope identifier `openid`. For example: `scope=openid`.

                   Additionally, to access private user data from the Yahoo APIs, include
                   the relevant [API scope
                   identifiers](https://developer.yahoo.com/oauth2/guide/yahoo_scopes/index.html#scope-identifiers).
                   The scopes can be delimited by a
                   space or comma. In the example below, the scope identifier is specified
                   for requesting the ID Token and an Access Token that provides read
                   access to the Yahoo Mail API:

                   * `scope=openid mail-r`
                   * `scope=openid,mail-r`
                   """
  }];

@maguro
Copy link

maguro commented Jun 6, 2018

Good news. I got a Google manager to shepherd this. I was also able to maintain the zero-copy stream semantics. I'll try to get the patch submitted later this week, more likely the weekend.

@Goodwine
Copy link

@maguro

over the much cleaner [...]

From your example, what would this become?

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   foo
                   """

Would the description be: foo or \n\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\sfoo\n\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s

And what about this?

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   foo
      bar
                   """

@maguro
Copy link

maguro commented Aug 18, 2018

It would be the latter. The compiler should not make any asssumptions about the intentions of the author.

@zzhang2019
Copy link

What is the conclusion on this? Does this feature get implemented?

@maguro
Copy link

maguro commented Apr 15, 2019

Yeah, let me dust this off and chat w/ our lawyers.

@maguro
Copy link

maguro commented Apr 23, 2019

Looking good:

Screen Shot 2019-04-23 at 1 03 49 AM

@maguro
Copy link

maguro commented Apr 23, 2019

The plugin is aware of two kinds of strings, ones that are for string values and ones that are for string names, e.g. import paths:

Screen Shot 2019-04-23 at 6 57 47 AM

When a string value is used somewhere a string name is required, the proper error occurs:

Screen Shot 2019-04-23 at 6 58 30 AM

@maguro
Copy link

maguro commented Apr 28, 2019

Ok, as you can see above, it supports both backtick and triple-double-quotes to delimit multiline strings. I'm just making that clear.

@amauryfa
Copy link

amauryfa commented May 3, 2019

Coming from a Python background, I was surprised by the backtick syntax. Then I read about the JavaScript Template Strings. But wouldn't it be confusing for users, since this is not really a template?

@maguro
Copy link

maguro commented May 11, 2019

Coming from a Python background, I was surprised by the backtick syntax. Then I read about the JavaScript Template Strings. But wouldn't it be confusing for users, since this is not really a template?

One is free to use """ if happens to be stuck in a JavaScript universe and gets confused as to when they are editing a proto file vs a JavaScript file. :)

Seriously though, I'm not married to backticks and I'm happy to remove them if that's the consensus here.

@mikelnrd
Copy link

Interested in this. @maguro was this agreed on internally inside of google in the end?

For others, more info here: #1297 (comment)

@BSBandme
Copy link
Contributor

Please first sign cla for #6078 to move further.

@jhump
Copy link
Contributor

jhump commented Jul 24, 2019

@maguro, I'm a little confused that what you are describing, as well as the PR (#6078) is about adding a new syntax for multi-line string literals in the protobuf source syntax. However, this issue is about adding multi-line string literal support to the protobuf text format.

I guess you only looked at changing the compiler since that is the only piece that can be changed without impacting all of the languages/runtimes (which all have their own implementation of the text format)?

@maguro
Copy link

maguro commented Aug 15, 2019

@maguro, I'm a little confused that what you are describing, as well as the PR (#6078) is about adding a new syntax for multi-line string literals in the protobuf source syntax. However, this issue is about adding multi-line string literal support to the protobuf text format.

I guess you only looked at changing the compiler since that is the only piece that can be changed without impacting all of the languages/runtimes (which all have their own implementation of the text format)?

Ahh, yeah. Sorry about the confusion; mea culpa. The problem I need to solve is on the compiler end.

@Closius
Copy link

Closius commented May 18, 2022

What is the status of this issue? <<EOM ... EOM syntax is not working for me

image

I get an error main.proto:140:82: Error while parsing option value for "openapiv2_operation": Expected string, got: <

@fowles
Copy link
Contributor

fowles commented May 18, 2022

The status is unchanged from xfxyjwf's comment earlier.

This would need a dedicated person internal to Google to drive it through the design and review process. The benefits are unlikely to outweigh the costs for our team to directly push this.

@dzou
Copy link

dzou commented Aug 10, 2022

Would be a great value add in developer experience to have this feature!

@deannagarcia
Copy link
Member

As stated above, unfortunately this work will not be prioritized by the protobuf team in the near future.

tesujimath added a commit to tesujimath/beancount-parser-lima that referenced this issue Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests