FR: Text proto format could support multi-line strings #1297

riannucci · 2016-03-05T00:59:26Z

In a text proto file like:

something: <
  description: "this\nwould\n naturally be a\n multiline string"
  ...
>

It would be awesome if we could e.g.:

something: <
  description: `
this
would
naturally be a
 multiline string
`
  ...
>

This could be used for e.g. attaching markdown documentation to configuration entries which could then be displayed nicely. I'm not 100% on backticks (implies quoting in the parser)... Other things like a bash style <<EOM .... EOM demarcation could also work and would be easier to parse.

Thoughts?

The text was updated successfully, but these errors were encountered:

bwendling · 2016-03-22T23:49:24Z

+1 this idea (whatever syntax it takes). Having the ability to write free-flowing text in the text protobuf is great for people who want to use protobufs but still have it be human readable when you have multiline text.

riannucci · 2016-03-23T00:29:56Z

FWIW, we did actually implement this as a crappy pre-processor over text proto (using the <<EOM ... EOM syntax). The tests ("examples") are here: https://github.com/luci/luci-go/blob/master/common/proto/multiline_test.go

It also will do a python-style textwrap.dedent on it, so you can do:

something: <
  description: <<EOM
    this
    would
    naturally be a
      multiline string
  EOM
  ...
>

Which would be the equivalent of:

something: <
  description: "this\nwould\nnaturally be a\n  multiline string"
  ...
>

(note the beautiful indentation and lack of weird prefix spaces in the translated version 😉)

bwendling · 2016-03-23T00:51:00Z

Hmm...it does appear to be an option, but it's discouraged. I don't know the reasoning for it.

aselle · 2017-05-05T19:25:49Z

This would be a great feature, we're looking at needing to do large blocks of text in a proto for api documentation, and it would be great to do this! In lieu of this we'll probably make a preprocessor as @riannucci is using, but that is much less elegant. Thanks!

laike9m · 2018-03-15T06:38:31Z

+1 for this.

DonGar · 2018-03-23T00:31:34Z

+1 again.

maguro · 2018-05-03T19:20:07Z

So, what's the process to getting this implemented? Does someone simply submit a PR?

maguro · 2018-05-04T21:42:26Z

I'm going to take a stab at this. I'm going to use backticks. Not sure when, or if, I'll get this done. Feel free to ping me.

maguro · 2018-05-16T13:44:09Z

So, using backticks was a breeze to implement, but I realized it’s a huge pain for people who want to embed multi-line markdown strings. The current tokenizer only looks ahead one character; I suspect that this is why there's been no movement on this feature request. I’m extending it to look ahead three characters to accommodate triple single and double quotes.

xfxyjwf · 2018-05-16T19:26:05Z

Multi-line string value is already supported as far as I can tell, see:
https://github.com/google/protobuf/blob/ed4321d1cb33199984118d801956822842771e7e/src/google/protobuf/text_format_unittest.cc#L748

You will need to format it like this:

something: <
  description:
    "this\n"
    "would\n"
    " naturally be a\n"
    " multiline string"
  ...
>

Probably not as convenient as the proposed ones, but it's also less magical (i.e., don't magically remove indentation and line wraps here and there).

@maguro Before you put more work into it, please make sure someone on protobuf-team is supportive of this feature (+a few other team members @acozzette @liujisi ).

maguro · 2018-05-20T17:39:00Z

The unit test TEST_F(TextFormatTest, ParseConcatenatedString) ensures that multi-part strings concatenate correctly. I don't think that it has anything to do with multi-line strings.

I understand that that one can use a list of multi-part strings with each part suffixed with \n. It's not just inconvenient, it's brittle and makes things pretty illegible. Looking at the comments and emoticons posted on this issue, it seems that the wider community has a very poor opinion on that workaround as well.

My proposal is not to remove indentation, etc. Whatever is in the string is what you get; any formatting can and should be done after compilation. It also is backward compatible.

I posted my comments are several weeks ago and the ticket is several years old. How can we proceed in a more timely manner?

xfxyjwf · 2018-05-22T00:29:19Z

For changes that will affect multiple languages, there needs to be a formal proposal and it needs to be approved before proceeding. There is no fast path and the process is Google internal only. This change unfortunately falls into this category and has to be carried through by a Google engineer. It doesn't need to be a protobuf team member though. If you are a Google engineer and is motivated enough to push this feature through all the proto text parsers in Google, feel free to send us a proposal doc. When the proposal is approved, we will accept code changes (from anyone, internally or on github). For external contributors, your best bet is probably to convince someone in Google to go through the process on your behalf.

My personal take on this proposal is that it doesn't provide enough benefit to justify the effort. To me multi-line strings aren't so different from multi-part strings, and there are a lot of text proto parsers we need to update if we are to support it.

maguro · 2018-05-22T20:41:51Z

What are "text proto parsers"?

xfxyjwf · 2018-05-22T21:20:07Z

@maguro Protobuf implementations that support text format apparently have implemented such parsers. Some implementations are opensourced (as in this github repository), but many of them are not and only exist inside Google. Also protobuf implementations are not the only text proto parsers. We have tools that process/convert/format text protos. For example, if you check-in a text proto file in our internal code base, a tool will parse and format the file according to a formatting guideline, which by the way has specific rules about how to format multi-part strings.

maguro · 2018-05-22T22:18:07Z

So, these tools parse raw proto source files? If that is the case, then if a proto file did not use the new multi-line string, then they should be fine, correct?

xfxyjwf · 2018-05-22T22:38:07Z

@maguro If nobody uses the new multi-line string, they will be fine.

maguro · 2018-05-23T00:58:34Z

So, there doesn't seem to be a problem on that front then; if there is then one could add a flag to turn it off. I'll try to find a Google engineer to shepherd this on your side.

FWIW, I really don't how one would prefer

string scope = 2 [(niantic.cfg.fld) = {
      description: "To get an ID Token to authenticate a user, you are required to specify\n"
                   "the scope identifier `openid`. For example: `scope=openid`.\n"
                   "\n"
                   "Additionally, to access private user data from the Yahoo APIs, include\n"
                   "the relevant [API scope\n"
                   "identifiers](https://developer.yahoo.com/oauth2/guide/yahoo_scopes/index.html#scope-identifiers).\n"
                   "The scopes can be delimited by a\n"
                   "space or comma. In the example below, the scope identifier is specified\n"
                   "for requesting the ID Token and an Access Token that provides read\n"
                   "access to the Yahoo Mail API:\n"
                   "\n"
                   "* `scope=openid mail-r`\n"
                   "* `scope=openid,mail-r`"
  }];

over the much cleaner

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   To get an ID Token to authenticate a user, you are required to specify
                   the scope identifier `openid`. For example: `scope=openid`.

                   Additionally, to access private user data from the Yahoo APIs, include
                   the relevant [API scope
                   identifiers](https://developer.yahoo.com/oauth2/guide/yahoo_scopes/index.html#scope-identifiers).
                   The scopes can be delimited by a
                   space or comma. In the example below, the scope identifier is specified
                   for requesting the ID Token and an Access Token that provides read
                   access to the Yahoo Mail API:

                   * `scope=openid mail-r`
                   * `scope=openid,mail-r`
                   """
  }];

maguro · 2018-06-06T14:23:56Z

Good news. I got a Google manager to shepherd this. I was also able to maintain the zero-copy stream semantics. I'll try to get the patch submitted later this week, more likely the weekend.

Goodwine · 2018-08-15T18:10:03Z

@maguro

over the much cleaner [...]

From your example, what would this become?

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   foo
                   """

Would the description be: foo or \n\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\sfoo\n\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s\s

And what about this?

string scope = 2 [(niantic.cfg.fld) = {
      description: """
                   foo
      bar
                   """

maguro · 2018-08-18T11:35:48Z

It would be the latter. The compiler should not make any asssumptions about the intentions of the author.

zzhang2019 · 2019-04-03T16:39:00Z

What is the conclusion on this? Does this feature get implemented?

maguro · 2019-04-15T20:42:37Z

Yeah, let me dust this off and chat w/ our lawyers.

maguro · 2019-04-23T08:07:17Z

Looking good:

maguro · 2019-04-23T14:03:30Z

The plugin is aware of two kinds of strings, ones that are for string values and ones that are for string names, e.g. import paths:

When a string value is used somewhere a string name is required, the proper error occurs:

maguro · 2019-04-28T14:58:16Z

Ok, as you can see above, it supports both backtick and triple-double-quotes to delimit multiline strings. I'm just making that clear.

amauryfa · 2019-05-03T22:10:34Z

Coming from a Python background, I was surprised by the backtick syntax. Then I read about the JavaScript Template Strings. But wouldn't it be confusing for users, since this is not really a template?

maguro · 2019-05-11T14:12:19Z

Coming from a Python background, I was surprised by the backtick syntax. Then I read about the JavaScript Template Strings. But wouldn't it be confusing for users, since this is not really a template?

One is free to use """ if happens to be stuck in a JavaScript universe and gets confused as to when they are editing a proto file vs a JavaScript file. :)

Seriously though, I'm not married to backticks and I'm happy to remove them if that's the consensus here.

mikelnrd · 2019-05-15T10:41:30Z

Interested in this. @maguro was this agreed on internally inside of google in the end?

For others, more info here: #1297 (comment)

BSBandme · 2019-06-11T23:44:48Z

Please first sign cla for #6078 to move further.

jhump · 2019-07-24T19:33:40Z

@maguro, I'm a little confused that what you are describing, as well as the PR (#6078) is about adding a new syntax for multi-line string literals in the protobuf source syntax. However, this issue is about adding multi-line string literal support to the protobuf text format.

I guess you only looked at changing the compiler since that is the only piece that can be changed without impacting all of the languages/runtimes (which all have their own implementation of the text format)?

maguro · 2019-08-15T23:03:59Z

@maguro, I'm a little confused that what you are describing, as well as the PR (#6078) is about adding a new syntax for multi-line string literals in the protobuf source syntax. However, this issue is about adding multi-line string literal support to the protobuf text format.

I guess you only looked at changing the compiler since that is the only piece that can be changed without impacting all of the languages/runtimes (which all have their own implementation of the text format)?

Ahh, yeah. Sorry about the confusion; mea culpa. The problem I need to solve is on the compiler end.

Closius · 2022-05-18T08:16:29Z

What is the status of this issue? <<EOM ... EOM syntax is not working for me

I get an error main.proto:140:82: Error while parsing option value for "openapiv2_operation": Expected string, got: <

fowles · 2022-05-18T15:59:57Z

The status is unchanged from xfxyjwf's comment earlier.

This would need a dedicated person internal to Google to drive it through the design and review process. The benefits are unlikely to outweigh the costs for our team to directly push this.

dzou · 2022-08-10T21:07:01Z

Would be a great value add in developer experience to have this feature!

deannagarcia · 2022-09-02T18:42:20Z

As stated above, unfortunately this work will not be prioritized by the protobuf team in the near future.

protocolbuffers/protobuf#1297

xfxyjwf added the enhancement label Jul 23, 2016

rpoyner-tri mentioned this issue Jan 26, 2017

Add Python protobufs to the Bazel build RobotLocomotion/drake#4912

Merged

BSBandme assigned xfxyjwf Jun 8, 2018

xfxyjwf added c++ P3 labels Jun 8, 2018

zzhang2019 unassigned xfxyjwf Apr 3, 2019

maguro mentioned this issue Apr 23, 2019

Add support for multi-line strings protostuff/protostuff-compiler#60

Open

maguro mentioned this issue Apr 23, 2019

Add support for multiline strings ksprojects/protobuf-jetbrains-plugin#139

Closed

This was referenced Apr 28, 2019

Add support for support multiline strings #6078

Closed

Add support for multiline strings protostuff/protostuff-compiler#61

Open

Add support for multiline string values ksprojects/protobuf-jetbrains-plugin#140

Closed

BSBandme assigned acozzette Jun 11, 2019

BSBandme added the wait for user action label Jun 11, 2019

habernal mentioned this issue Aug 13, 2019

Fixing wrong line break in documentation tensorflow/tensorflow#31443

Merged

artemgavrilov mentioned this issue Sep 16, 2021

PMM-8706: Add specific error codes for incompatible service percona/pmm#777

Merged

elharo added text format and removed c++ labels Apr 7, 2022

deannagarcia closed this as completed Sep 2, 2022

tesujimath added a commit to tesujimath/beancount-parser-lima that referenced this issue Feb 29, 2024

Why are multiline strings so lame in Protobuf Text Format?

098b4c2

protocolbuffers/protobuf#1297

FR: Text proto format could support multi-line strings #1297

FR: Text proto format could support multi-line strings #1297

Comments

riannucci commented Mar 5, 2016

bwendling commented Mar 22, 2016

riannucci commented Mar 23, 2016

bwendling commented Mar 23, 2016

aselle commented May 5, 2017

laike9m commented Mar 15, 2018

DonGar commented Mar 23, 2018

maguro commented May 3, 2018

maguro commented May 4, 2018

maguro commented May 16, 2018 • edited Loading

xfxyjwf commented May 16, 2018

maguro commented May 20, 2018

xfxyjwf commented May 22, 2018

maguro commented May 22, 2018

xfxyjwf commented May 22, 2018

maguro commented May 22, 2018

xfxyjwf commented May 22, 2018

maguro commented May 23, 2018

maguro commented Jun 6, 2018

Goodwine commented Aug 15, 2018

maguro commented Aug 18, 2018

zzhang2019 commented Apr 3, 2019

maguro commented Apr 15, 2019

maguro commented Apr 23, 2019

maguro commented Apr 23, 2019

maguro commented Apr 28, 2019

amauryfa commented May 3, 2019

maguro commented May 11, 2019

mikelnrd commented May 15, 2019

BSBandme commented Jun 11, 2019

jhump commented Jul 24, 2019

maguro commented Aug 15, 2019

Closius commented May 18, 2022 • edited Loading

fowles commented May 18, 2022 • edited Loading

dzou commented Aug 10, 2022

deannagarcia commented Sep 2, 2022

maguro commented May 16, 2018 •

edited

Loading

Closius commented May 18, 2022 •

edited

Loading

fowles commented May 18, 2022 •

edited

Loading