Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding: provide canonical output format #1121

Open
bufdev opened this issue May 10, 2020 · 47 comments
Open

encoding: provide canonical output format #1121

bufdev opened this issue May 10, 2020 · 47 comments

Comments

@bufdev
Copy link

bufdev commented May 10, 2020

https://go-review.googlesource.com/c/protobuf/+/151340 added internal/detrand to denote to users that they should not rely on the stability of output. This is used for example in encoding/protojson to add a space between json key/value pairs depending on hash of the binary.

Users often want to rely on some outputs having the same format, for example:

  • Some users store JSON versions of Protobuf messages on disk. Adding a space to random serializations will result in the resulting files being different, causing differences on commits for example.
  • Testing on output becomes impossible. Although this is an explicit goal in the release notes, this is recognized in protobuf-go directly as internally, detrand.Disable is frequently called in testing:
internal/filedesc/desc_test.go:28:	detrand.Disable()
internal/encoding/json/encode_test.go:20:func init() { detrand.Disable() }
internal/encoding/text/encode_test.go:20:func init() { detrand.Disable() }
internal/msgfmt/format_test.go:30:	detrand.Disable()
encoding/prototext/encode_test.go:27:	detrand.Disable()
encoding/protojson/encode_test.go:33:func init() { detrand.Disable() }
testing/protocmp/xform_test.go:21:	detrand.Disable()

In our own tests, we had output switch between:

{"path":"a.proto","start_line":11,"start_column":10,"end_line":11,"end_column":13,"type":"FIELD_LOWER_SNAKE_CASE","message":"Field name \"Baz\" should be lower_snake_case, such as \"baz\"."}

And:

{"path":"a.proto", "start_line":11, "start_column":10, "end_line":11, "end_column":13, "type":"FIELD_LOWER_SNAKE_CASE", "message":"Field name \"Baz\" should be lower_snake_case, such as \"baz\"."}

This took us about 45 minutes to figure out, we thought our code was broken, and we found this package eventually.

I am not aware of other JSON packages (stdlib, otherwise) that do this in general. It's reasonable to allow users to disable this if they want - can detrand be made into a public package, or at least detrand.Disable be exposed in a public package?

@dsnet
Copy link
Member

dsnet commented May 12, 2020

My analysis of thousands upon thousands of usages of text and JSON at Google indicates that approximately 90% of usages do not care about stability at all, 9.9% are golden tests that should be written in a different way (more below), and only 0.1% are cases that truly require some degree of stability because they have some system that checks in serialized messages into source control (for non-testing purposes).

Golden tests are undoubtedly convenient to write, but they impose a cost on the overall ecosystem. The reason why the old text and JSON implementations could not wrap the new text and JSON implementations was because the slightly different output broke thousands of golden tests and it was not worth the effort to clean all of those up. Even worse, during the lifetime of the old implementation, because of brittle tests, we were unable to improve the output (or even fix bugs) for the benefit of the majority. We are not going let that situation arise again.

The deliberate instability is intended to prevent the creation of brittle tests (or even worse, production code that uses a serialized message to produce what they thought was a stable fingerprint). For the 9.9% cases that are golden tests, it forces those tests to semantically compare the structured messages, rather syntactically compare the serialized messages. Semantic comparisons are more maintainable than syntactic comparisons.

This took us about 45 minutes to figure out, we thought our code was broken, and we found this package eventually.

I'm sorry to hear that. After investigating the usages at Google, I am convinced more than ever that some form of deliberate instability is the right direction. The way that instability is manifest could be improved (i.e., either more often so it manifests earlier and more easily, or less often such as for every point release of the module), but it's here to stay. Any ability to disable it defeats the point of even having it.

Output Stability of Go Protocol Buffer Serialization

The Go implementation of protocol buffers provides functionality to serialize as several different formats (e.g., binary wire format, JSON format, and text format).

Each format only specifies a grammar for what is valid, but none of the formats specify a "canonical" representation for how a marshaler ought to exactly format a given message. The lack of a canonical format means that it is generally inappropriate to compare the serialized bytes of a message to determine equality or to rely on the serialized bytes to produce a stable hash of the message. Functionally, protobuf serialization is unstable (a limitation that is not specific to Go; e.g., the C++ maintainers repeatedly warn users that the output of their implementation is not stable).

To avoid giving the illusion that the output is stable, the Go implementations of JSON and text marshaling deliberately introduce minor differences so that byte-for-byte comparisons are likely to fail. While the implementation of wire marshaling does not currently have any deliberate instability, the output is still not guaranteed to be stable. Instability is added to prevent the addition of new code that relies on a property of stability that we do not guarantee. This protects users from fatal bugs that may manifest in the future and provides the Go protobuf authors the freedom to make future changes to the output as necessary for the benefit of most users.

I am not aware of other JSON packages (stdlib, otherwise) that do this in general.

The reality is that most packages do not guarantee any form of stability (with the exception of fmt and strconv, which are the backbones to implement nearly every other format). Most packages in the standard library (including encoding/json) are not stable and have changed their output throughout their lifetimes. Every time it does so, there's always users who complain that the changed output broke their use case, but the problem is that the user code was relying on a property that the libraries never promised.

In fact, you're actually well aware of part of Go with deliberate instability: maps. The Go specification has always left the exact order of map iteration to be unspecified. Unfortunately, users incorrectly assume that the output is stable and write code that is built upon that wrong assumption. Every time the implementation of maps was changed, it broke all these tests, making improvements difficult. Thus, in Go 1.3 the iteration order of maps was pseudo-randomized to prevent the creation of brittle tests. Many users complained at first, but eventually users learned to write their tests in less brittle ways, and the whole Go ecosystem is way better off for it today.

As the maintainer for a number of standard packages (e.g., archive/tar, encoding/json, compress/flate, etc), I would love to have introduced some degree of deliberate instability like we did with maps. However, it is very hard to introduce deliberate instability after years of presenting the illusion that the output is stable (even if it is not). As a maintainer of these packages, there are times where we would like to make a beneficial change to the output, but choose not to because it breaks users (even if the compatibility agreement technically gives us the right to).

As for protojson and prototext, we had the opportunity to make this decision from the beggining. The decision is guided by years of experience in seeing what a heavy cost brittle tests can impose on authors of libraries and their freedom to make changes that benefit the majority.

Golden tests

The most common cause of brittle targets are golden tests, where the serialized form of a protobuf message is compared byte-for-byte against a snapshot of a serialized message produced at some point in history. While golden tests are convenient to write, they age poorly and become maintenance burdens in the long run. We recommended against them if any part of the golden output is produced by a dependency not authored by the owner, which is the case for output containing serialized messages.

There are many testing patterns that can be classified as a golden test. The following is one such example where the text serialized form of a message is compared byte-for-byte against a text serialized message stored in a file. It incorrectly assumes that prototext.Marshal will output the exact sequence of bytes for all time for the same message.

func TestGolden(t *testing.T) {
    gotMsg := LoadConfig(...)
    gotBuf, err := prototext.Marshal(gotMsg)
    if err != nil {
        ...
    }
    wantBuf, err := ioutil.ReadFile(...)
    if err != nil {
        ...
    }
    // BAD: Syntactic comparison performed on the serialized message.
    if diff := cmp.Diff(wantBuf, gotBuf); diff != "" {
        t.Errorf("LoadConfig() mismatch (-want +got):\n%s", diff)
    }
}

Instead of syntactically comparing the serialized form of a message, the test should semantically compare the structured form of a message. This is done by parsing the golden test data as a structured message and performing a semantic comparison on that data using functionality that knows how to properly compare protobuf messages.

func TestGolden(t *testing.T) {
    gotMsg := LoadConfig(...)
    wantBuf, err := ioutil.ReadFile(...)
    if err != nil {
        ...
    }
    wantMsg := new(configpb.Config)
    if err := prototext.Unmarshal(wantBuf, wantMsg); err != nil {
        ...
    }
    // GOOD: Semantic comparison performed on the structured message.
    if diff := cmp.Diff(wantMsg, gotMsg, protocmp.Transform()); diff != "" {
        t.Errorf("LoadConfig() mismatch (-want +got):\n%s", diff)
    }
}

If the serialized output is part of a larger blob, it can be difficult to isolate and parse out the serialized output for a semantic comparison.

func TestGolden(t *testing.T) {
    // MyFormat is the function under test, which produces a custom format.
    // Part of the output depends on prototext.Format, which is unstable.
    // As a result, this is not naively suitable for a golden test.
    got := MyFormat(...)
    want := `[Name="Bob", Proto={foo:5 bar:"hello"}, Date=2009-11-10]`
    // BAD: Syntactic comparison performed on a larger string containing
    // a serialized message (i.e., the `foo:5 bar:"hello"` portion).
    if diff := cmp.Diff(want, got); diff != "" {
        t.Errorf("MyFormat() mismatch (-want +got):\n%s", diff)
    }
}

An alternative is to reformat the serialized message at runtime to ensure that it is always serialized with the current implementation.

// mustReformatText reformats the serialized text output of a message so that
// it is agnostic towards the exact implementation today and into the future.
func mustReformatText(m proto.Message, s string) string {
    if err := prototext.Unmarshal([]byte(s), m); err != nil {
        panic(err)
    }
    return prototext.Format(m)
}

func TestGolden(t *testing.T) {
    got := MyFormat(...)
    // GOOD: The unstable portion is reformatted so that it uses the exact same
    // implementation of prototext.Format as what MyFormat is using.
    want := `[Name="Bob", Proto={` + mustReformatText(new(foopb.MyMessage), `foo:5 bar:"hello"`) + `}, Date=2009-11-10]`
    if diff := cmp.Diff(want, got); diff != "" {
        t.Errorf("MyFormat() mismatch (-want +got):\n%s", diff)
    }
}

This technique relies on serialization being deterministic even if does not guarantee stability. The Go implementations of text and JSON are guaranteed to provide deterministic output (i.e., within the same binary, the exact same message will be serialized in the exact same way).

Stored output (non-testing)

A rare, but valid need for output stability is generated output (e.g., code or configurations) where instability causes an unacceptable number of frequent differences (perhaps because these generated outputs are stored into source control or elsewhere). For this use case that requires output stability, the output of serialization should be passed through a formatter to normalize the data to a degree:

WARNING: The use of a formatter only provides a degree of stability. It does not (and fundamentally can not) guarantee perfect stability due to the lack of a canonical format specification. Formatters themselves also do not necessarily guarantee that their output will be stable for all time (e.g., gofmt, the popular tool used to format Go source code is not stable). However, the maintainers of these tools know that frequent changes induce potentially large amounts of churn and keep the number of changes to a minimum.

@dsnet
Copy link
Member

dsnet commented May 12, 2020

internally, detrand.Disable is frequently called in testing

It is appropriate for the module itself to disable instability because this module fully controls every aspect of how the output is produced. It is not appropriate for users of the module to do so precisely because they do not control the implementation of the module.

@neild
Copy link
Contributor

neild commented May 12, 2020

It is appropriate for the module itself to disable instability because this module fully controls every aspect of how the output is produced. It is not appropriate for users of the module to do so precisely because they do not control the implementation of the module.

In addition, the places where instability is disabled are mostly (hopefully all) tests of the encoder. This is a fundamentally different case than a test which uses the encoder. If we change the encoder in a way that breaks the existing tests, we will change the tests in the same commit. This form of tight coupling between the code under test and the test itself is expected and normal.

@kentonv
Copy link

kentonv commented May 13, 2020

The reason why the old text and JSON implementations could not wrap the new text and JSON implementations was because the slightly different output broke thousands of golden tests and it was not worth the effort to clean all of those up.

Interesting. I'm really curious: what were the subtle differences between the implementations? It kind of seems to me that there's only one obvious way to write the JSON (order fields by number, serialize values using as few characters as possible without losing precision, omit all unnecessary whitespace).

@dsnet
Copy link
Member

dsnet commented May 13, 2020

I'm really curious: what were the subtle differences between the implementations?

Between the old and new text implementation for Go the largest difference was the switch from "<>" delimiters to "{}", and the removal of trailing whitespace characters. There are other changes that individually add up to many noticeable changes.

It kind of seems to me that there's only one obvious way to write the JSON (order fields by number,

I'm in support for a well-defined canonical format and have argued for one on various occasions, but that's not my decision to make. Until one is specified, we're not going to pretend like our implementation is canonical in any way.

serialize values using as few characters as possible without losing precision

Floats are the most challenging, and it's more complicated than that. For example, "1e2" and "100" represent the same number, but both occupy the same number of characters. It's certainly possible to define a canonical formatting of floats, but someone needs to do it.

@kentonv
Copy link

kentonv commented May 13, 2020

Between the old and new text implementation for Go the largest difference was the switch from "<>" delimiters to "{}", and the removal of trailing whitespace characters.

Oh man, I'd totally forgotten that <> were allowed as block delimiters by protobuf text format. That dubious "feature" goes way back to proto1 IIRC, though by the time of proto2, {} was well-established as being the preferred block delimiter...

But, you specifically mentioned JSON format, which has far fewer of these options. I was curious about the differences there.

Floats are the most challenging

Fair point. Actually, it occurred to me that that is probably the one case that gets tricky.

When I implemented proto2 text format the first time, a major difference from proto1 was that it actually represented floats precisely with as few digits as possible, whereas proto1 used a printf() formatter that either used too many or too few digits (I forget which). Some people did grumble about that but generally they accepted that the change was an improvement.

I think that's the only time I remember making an actual change to text format output in C++.

I too used to harangue people for comparing against golden output in unit tests, insisting they should use a diff library instead -- I'm probably in part responsible for creating this bit of protobuf culture. In retrospect I'm not sure if it was the right call. Diff libraries are cool but forcing people to use them is forcing people to spend time learning a thing that usually won't benefit them enough to repay that cognitive investment. And, these heavy-handed tactics probably lead to some resentment of the Protobuf platform.

I'm in support for a well-defined canonical format and have argued for one on various occasions, but that's not my decision to make.

Ah. So, clearly users want this, and it doesn't seem technically difficult to commit to a stable output, and you like the idea, but it's a matter of getting agreement across protobuf implementations on the One True Format. And you're afraid to settle on something for Go specifically because it might turn out not to match the universal format later?

That makes sense, but it's kind of sad to have value to users blocked by inability to get consensus.

This might be a situation where you can create de facto consensus without actually formally getting agreement, though. If the protobuf ecosystem were ever to come to agreement on this, it seems very likely that they'd coalesce on matching the output of the C++ implementation, don't you think? If you made the Go implementation exactly match the C++ implementation, then that would only make the argument stronger that that should then be enshrined as the canonical format.

Worst case, the ecosystem decides on something else, and you need to provide a flag that makes your implementation output the right format. (But the C++ implementation will almost certainly have to provide a flag as well, so you're in good company.)

It may be worth taking the risk, to make life better for users. Just my opinion, though.

@puellanivis
Copy link
Collaborator

Eh… I’ve seen way too many tests that are really poorly written, either brittle, or unbreakable. I don’t think cutting off golden tests is a bad idea, even if some people will be frustrated at the cognitive load necessary to learn them. Promoting people towards thinking about comparing the semantics of their data rather than comparing some syntax of their data is going to be a benefit, even if some people might be resistant or even hostile to the change.

And say we go with your proposed canonical JSON encoding, assuming that some solution for floats exist. What does your golden test produce? (I’ve limited this to only 10 elements, but I’ve had to deal with much more in some production scenarios):

FAIL test_against_golden: got {"0":"A","1":"B","2":"C","3":"D","4":"E","5":"F","6":"G","7":"H","8":"I","9":"J"}, wanted {"0":"A","1":"B","2":"C","3":"D","4":"E","5":"F","6":"G","7":"H","8":"l","9":"J"}

Where’s the error? What is wrong? Nothing is immediately clear from the output available from the test against the golden standard except that it’s wrong.

Although, it occurs to me that I am probably preaching to the choir here. We all probably agree that golden tests are a bad idea (outside of testing the encoder itself), we just disagree about if we should force people out of them. I’m inclined towards a “yes”, because golden tests are just themselves such a bad idea (outside of testing the encoder itself).

@kentonv
Copy link

kentonv commented May 14, 2020

@puellanivis That was my argument 10 years ago. I now think I was arrogant -- I was trying to force them to do things my way "for their own good", rather than letting them decide for themselves. Better to make sure users are aware of their options, and then let them decide.

Really the best thing I remember happening related to this is when someone finally went and wrote a gtest add-on library for comparing protobufs, and it was actually easier to use the macros than it was to compare bytes. That was unambiguously great for users.

Anyway, there's a lot of reasons other than golden tests that stable output is valuable. Caching, signatures, ...

@neild
Copy link
Contributor

neild commented May 14, 2020

There is unquestionably value in stable output. However, the value is only there if the output truly is stable. Incidental stability, where output appears stable but can change unexpectedly in the future, is hazardous.

For example, a cache depending on a stable key can easily cause a service outage if that key changes unexpectedly. Restart all tasks with the new key, and the cache hit rate drops to zero. Even a safer rolling restart will increase contention and drop the hit rate. In a service provisioned with the expectation of some amount of caching, this can be fatal. (Not theoretical: I've worked on such a service, and cache key changes needed to be handled with great care.)

Far worse, a change in a signature used as a database key can lead to an irrevocably corrupt database.

We're happy to support a stable serialization form, but it needs to be stable--guaranteed unchanging. It also needs to be portable--supported by other major protobuf implementations. A fundamental goal of protobufs is to be a language-neutral, platform-neutral interchange format. The place for innovation in serialization forms is not in a single implementation.

@bufdev
Copy link
Author

bufdev commented May 14, 2020

This sounds like something that would be great in documentation, and probably should be added there. Then, as users will be informed of the choice, let users decide what they want to do.

There's a big difference between a library such as encoding/json having unstable output between Golang versions, and a library creating unstable output depending on the binary hash for the same pinned version. In one case, given a pinned Golang version and pinned libraries, output is stable, and if users are writing golden tests, fixing them for a given pin is a one-time operation.

I think that while the intent here is good and obvious, the result will be quite different. Disabling users of this widely-used library to make an informed decision will not lead to users refactoring their golden tests. Instead, users will just work around it using json.Compact, which is not optimal, and it will lead to frustration from the Protobuf community.

Also, as @kentonv points out:

This might be a situation where you can create de facto consensus without actually formally getting agreement, though. If the protobuf ecosystem were ever to come to agreement on this, it seems very likely that they'd coalesce on matching the output of the C++ implementation, don't you think? If you made the Go implementation exactly match the C++ implementation, then that would only make the argument stronger that that should then be enshrined as the canonical format.

Worst case, the ecosystem decides on something else, and you need to provide a flag that makes your implementation output the right format. (But the C++ implementation will almost certainly have to provide a flag as well, so you're in good company.)

@kentonv
Copy link

kentonv commented May 14, 2020

For example, a cache depending on a stable key can easily cause a service outage if that key changes unexpectedly. Restart all tasks with the new key, and the cache hit rate drops to zero.

That's a great example. But isn't the use of detrand in fact extremely likely to cause such an outage, since the randomization only changes when the binary changes? It seems like it would be extremely hard to catch such a problem in testing.

@dsnet
Copy link
Member

dsnet commented May 14, 2020

That was my argument 10 years ago. I now think I was arrogant -- I was trying to force them to do things my way "for their own good", rather than letting them decide for themselves. Better to make sure users are aware of their options, and then let them decide.

It's not so much my way vs your way. It's a question of what's best for the collective good, which is no longer directly about you or me, but everyone.

In an earlier comment, I alluded to the tragedy of the commons:

The tragedy of the commons is a situation in a shared-resource system where individual users, acting independently according to their own self-interest, behave contrary to the common good of all users by depleting or spoiling the shared resource through their collective action.

Applied to the dynamics between library users and authors:

  • the "shared-resource" is a given software library
  • the "individual users" are the users of that library
  • the "common good" are the benefits that a library (and their authors) can provide to all users
  • "spoiling the shared resource" are the unintentional effects that users can have on preventing the library (and their authors) from making changes. When there are a sufficiently number of improper usages of a library, it binds the hands of authors from making beneficial changes (lest they break too many users). This is a tragedy for the commons.

The interaction between authors and users of a library is through an interface, which is a set of behavior that the library promises to provide users. In general, there are 3 categories:

  • Specified behavior. This is what the library intends to provide. The "backward compatibility" principle means that a library cannot modify, change, or remove anything in this category without being a major version change (which is functionally equivalent to providing entirely new library).
  • Unspecified behavior. This is behavior that the library happens to have today, but the library author is technically under no obligation to uphold this behavior. Thus, there is no guarantee that it remains the same tomorrow or in the future.
  • Impossible behavior. This is what's impossible to do with the library, whether by simply not having a feature (in which case it can't be held wrong), or by relying on the language's type system to formally prevent a certain pattern (e.g., using io.Reader makes it impossible for the user to accidentally pass an io.Writer to flate.NewReader).

Too often, people define "software correctness" as simply being software that happens to work today (e.g., passes my tests and provides the desired functionality). However, that definition is problematic because it fails to take into consideration the passage of time on software maintainance. It permits the use of both specified and unspecified behavior, which leads to problems when unspecified behavior changes.

Titus Winters defines "software engineering" as "programming integrated over time". The key distinction between “programming” and “engineering” is the ability to think about the future. Thus, a better definition of "software correctness" is not merely whether code works today, rather whether it continues to work tomorrow, next year, and or even the next decade. This definition permits only the use of specified behavior and forbids the use of unspecified behavior (which can change).

The challenge with unspecified behavior is that it's easy to accidentally rely on it without realizing it. Stable output is a property that most libraries do not provide, but it's easy to accidentally rely on it as stable if most releases of that library do not change the output where it presents an illusion of stability (that it does not provide). In the best case, a user only relied on stability for testing purposes, so a changed output broke their tests. In the worst case, a user relied on stability for production purposes (e.g., hashing it as a key for a database), so a changed output is now fatal to their production service.

Hyrum's law states:

With a sufficient number of users of an API, it does not matter what you promised in the contract, all observable behaviors of your interface will be dependend upon by somebody.

Per Hyrum's law, it means that at scale, there will eventually be many users who rely on unspecified behavior. If there is a sufficient number of users exercising an unspecified property of the library (e.g., the illusion of stability), then the library author's hand is functionally forced to include that behavior as "specified" even if they don't want to admit it. This is what happened with the old text and JSON implementations. It's also what's functionally happened to various degrees for several standard library packages like encoding/json, encoding/csv, archive/tar, etc.

The purpose of deliberate instability is to move the illusion of stability out of the "unspecified" category and closer to the "impossible" category. It pushes users of it in tests to perform semantic comparisons instead of syntactic comparisons. It pushes users of it in production to seek other solutions that are more maintainable and protect them from potentially fatal bugs in production that cannot be easily fixed (short of resilvering all of their database data or forking their dependencies and freezing it). It provides library authors the freedom to make changes to the output (for the common good) since there should be far less (or even no) users that could be relying on the output as stable.

The issue of brittle tests is not a problem specific to protobufs. I maintain several standard library packages, and was (for a while) responsible for rolling out each Go toolchain release for Google. I had the opportunity (or the curse) of observing first-hand the effects of Hyrum's law applied at scale. Golden tests that make incorrect assumptions that their dependencies are stable are the number 1 reason (by far) why breakages occur when trying to switch over to the new Go toolchain. This is why it's the first topic I address in my talk about "Forward Compatible Go Code".

Three notable examples (among many):

  1. A team assumed that the output of compress/gzip was stable and bit-for-bit identical for all time. They computed a hash of the GZIP'd file and used that as the key into a database. When compress/gzip was optimized to be both faster and provide better compressed output, it broke that team's production logic. In the end, the team needed to fork the compression library since they needed a property of stability that compress/gzip wasn't going to provide since the benefit of faster compression and better compressed output outweighs the benefit of this team's use-case.

  2. Another team assumed that the output of archive/tar was stable for similar reasons. In the end, the upstream changes to the tar package were not merged (and never merged IIRC) to avoid breaking this team's production logic. Here's an example of the author's hands being bound when it shouldn't have been.

  3. The sort package is known to be sub-optimal. There are other sorting implementations that are faster. Despite the presence of a sort.Stable, many users have come to assume that sort.Sort is stable when it isn't. As a result, faster sorting implementations cannot be used since it breaks too many users who have come to depend on what the current implementation happens to do. There was a proposal to add deliberate instability to sort, but it was deemed too costly to do after the fact. Upon rejecting his own proposal, Russ Cox comments, "I wish we'd [added deliberate non-determinism] when we introduced sort.Sort".

@dsnet
Copy link
Member

dsnet commented May 14, 2020

Anyway, there's a lot of reasons other than golden tests that stable output is valuable. Caching, signatures, ...

Building on what Damien said. This is not a hypothetical, but a real problem. It's uncommon, but there was a minute number of users who depended on the old Go text implementation being stable for all time. In fact, they were using it as a key into a database. Changing the output doesn't just break their tests, it breaks their production service.

it's kind of sad to have value to users blocked by inability to get consensus.

Golden tests, hashing, and signatures are reasonable desires. However, they're not reasonable while using a library that does not guarantee stable output. So, I'm firmly on the side of blocking this until there is a proper specification. It's a somewhat low priority task for me, but I do plan on proposing a formal canonical text/JSON specification to the protobuf team. A canonical format is something I'd like to provide our users since it allows them to write golden tests in a safe and correct way.

If people really want a frozen output in Go, they're welcome to use the text and JSON implementation from the old module. They won't change anymore. It seems insensible to have the new implementations provide an illusion of stability, when the old implementations provide "true" stability (albeit begrudgingly because our hands are tied).

That's a great example. But isn't the use of detrand in fact extremely likely to cause such an outage, since the randomization only changes when the binary changes? It seems like it would be extremely hard to catch such a problem in testing.

At Google, the testing infrastructure conveniently provides an environment variable to use as a seed for exactly the purpose of introducing deliberate instability. We use that internally and it manifests the problem quickly in testing.

Outside Google, the Go toolchain unfortunately does not provide a way to tell if the binary is a test or not, nor does the "go test" framework provide any form of instability seed (to my knowledge).

Even if the problem slips past the deliberate instability during testing, it's far better to catch the fact that your production service relied on invalid stability properties early on when you have a small number of users, rather than later one when you many many users and the effects of an outage are widely seen and far harder to fix.

@dsnet
Copy link
Member

dsnet commented May 14, 2020

if users are writing golden tests, fixing them for a given pin is a one-time operation.

@bufdev, in CL/223278, I proposed changing the deterministic randomness to be seeded by the patch version. It has the disadvantage of occurring less often, but has the advantage of occurring only at the moment the user upgrades the google.golang.org/protobuf module. Would that be a better outcome (for you)?

Instead, users will just work around it using json.Compact, which is not optimal

Sub-optimal, but still better than the status quo.

This sounds like something that would be great in documentation

Agreed. Most of what I posted in #1121 (comment) was copied from a document I was working on. There's a lot to do with regard to documentation (e.g., #1075).

@bufdev
Copy link
Author

bufdev commented May 14, 2020

I empathize with what you're trying to do here, and I know it has sound basis and great intentions - users should not be relying on the stability of JSON serialization. However, they do and will, and any solution that results in whitespace being purposefully randomly added to the JSON serialization, and without users being able to make an informed decision to disable such functionality, will result in:

  • Frustration from the Golang Protobuf community.
  • Large-scale mitigation - everyone will just use json.Compact.
  • Further fragmentation of Golang Protobuf users.

I think the last point is really key - speaking personally, one of my hopes for v2 is that it reduces the fragmentation we've had for the last half decade between the gogo/protobuf and golang/protobuf world. Not allowing users to disable this (and other items such as #992) will not result in better tests, or better golang/protobuf usage, instead it will result in frustration and large-scale mitigation.

At Google, the testing infrastructure conveniently provides an environment variable to use as a seed for exactly the purpose of introducing deliberate instability. We use that internally and it manifests the problem quickly in testing.

This is another key point though - (I believe, please correct if I'm wrong) this library is intended to be used by everyone, including outside of google3, and there's a lot of varying usage out there. Even speaking for myself - I know there's better ways to do certain things, and I understand your golden test argument, but even as an experienced Protobuf user, I don't have time to go update my tests and convert them to the cmp package - I have other things to do. I know there are many out there that rely on golden tests much more heavily than we do, and they won't update either (not just me musing either - I've personally already heard about this issue).

Outside Google, the Go toolchain unfortunately does not provide a way to tell if the binary is a test or not, nor does the "go test" framework provide any form of instability seed (to my knowledge).

If it did, however, it would likely be opt-in. Here, we're just asking for opt-out.

Again, I really do empathize with the intent here, however I think the result is not going to be as intended. If the goal is to have the largest tent for golang/protobuf, enable users worldwide to effectively use this, and reduce fragmentation in the ecosystem, adding the informed ability to call detrand.Disable is what is needed in this case for the larger Protobuf community.

@dsnet
Copy link
Member

dsnet commented May 14, 2020

on golden tests much more heavily than we do, and they won't update either ... I don't have time to go update my tests and convert them to the cmp package - I have other things to do.

And that's fine. We're not planning on updating the golden tests inside Google that already depend on text stability. At Google, that means 90% of usages will migrate without a problem, 10% will stay on the old text and JSON implementations forever, which is fine. We're never removing the package.

It's about what you do with new tests when you're writing it for the first time. In my comment earlier, I showed a test using a syntactic comparison and a test using a semantic comparison. Here's a diff between the two:

 func TestGolden(t *testing.T) {
     gotMsg := LoadConfig(...)
-    gotBuf, err := prototext.Marshal(gotMsg)
+    wantBuf, err := ioutil.ReadFile(...)
     if err != nil {
         ...
     }
-    wantBuf, err := ioutil.ReadFile(...)
-    if err != nil {
+    wantMsg := new(configpb.Config)
+    if err := prototext.Unmarshal(wantBuf, wantMsg); err != nil {
         ...
     }
-    if diff := cmp.Diff(wantBuf, gotBuf); diff != "" {
+    if diff := cmp.Diff(wantMsg, gotMsg, protocmp.Transform()); diff != "" {
         t.Errorf("LoadConfig() mismatch (-want +got):\n%s", diff)
     }
 }

They're the same number of lines. It's only marginally more complex to write the more sustainable comparison compared to the brittle one.

I really do empathize with the intent here, however I think the result is not going to be as intended.

I also empathize with the people who want stability and I'd like to give it to users. Honestly, the best course of action here is to pursue a canonical text/JSON format, rather than debate what the behavior of a fundamentally unstable output should be, because all the answers are bad.

If the goal is to have the largest tent for golang/protobuf, enable users worldwide to effectively use this, and reduce fragmentation in the ecosystem, adding the informed ability to call detrand.Disable is what is needed in this case for the larger Protobuf community.

Between having the ability to opt-out of detrand, and not having it at all, I'd be better to not have it at all. An opt-out is not on table as an option.

@bufdev
Copy link
Author

bufdev commented May 14, 2020

We're never removing the package.

Yes, but users are being asked to upgrade to the new v2 implicitly regardless, as of 1.4.0 - you could use the old jsonpb.Marshal for sure, but saying this is the only option (outside of working around this with json.Compact) doesn't seem right.

Here's a diff between the two:

This is a test already using cmp - this is not the concern. The concern, for example, is people who are writing integration tests and comparing output, for example, for CLIs.

Users should be able to make informed decisions for themselves in this case.

@dsnet
Copy link
Member

dsnet commented May 14, 2020

you could use the old jsonpb.Marshal for sure, but saying this is the only option (outside of working around this with json.Compact) doesn't seem right.

Why? Users want a feature of stability that the protobuf ecosystem does not define. Using the old implementations seems fine to get at that property (even if it was never our intention). I believe we're debating about something where all the options have detriments. The right course of action is to focus energy on getting a canonical format defined and implemented.

The concern, for example, is people who are writing integration tests and comparing output, for example, for CLIs.

I don't see how that changes anything. You have to "compare output" sooner or later. So, instead of the integration test checking the contents of two files byte-for-byte syntactically, it parses them and compares them semantically.

@bufdev
Copy link
Author

bufdev commented May 14, 2020

The right course of action is to focus energy on getting a canonical format defined and implemented.

This would be a fantastic outcome - is there anything we can do to help get this started? Let me know.

@dsnet
Copy link
Member

dsnet commented May 14, 2020

This would be a fantastic outcome - is there anything we can do to help get this started? Let me know.

A canonical formatting for floats is probably the largest challenge. I've done some digging for prior art:

  • OLPC's canonical JSON: they just forbid floats, which is unsatisfactory.
  • www-xml-schema has some discussion about it, but I don't know if there was a concrete specification.
  • I should note that simply declaring C++'s printf for floats as the canonical format is not suitable since it is not specified what the output is.

If you can find a well-specified way to format floats, that'd be a great help.

@bufdev
Copy link
Author

bufdev commented May 14, 2020

Going back many years in my CS life, heh.

Some potential leads:

@bufdev
Copy link
Author

bufdev commented May 14, 2020

More up to date link for the first that standardizes on the second: https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-17#section-3.2.2.3 (Edit: they both actually do, this hasn't changed, this is just the latest version of this spec I can find)

@kentonv
Copy link

kentonv commented May 14, 2020

I should note that simply declaring C++'s printf for floats as the canonical format is not suitable since it is not specified what the output is.

If it were me, I'd specifically choose glibc as the golden implementation. I realize that may sound absurd, but in reality most successful standards start from copying a successful implementation.

(Fun fact: Cap'n Proto's stringification of floats goes out of its way to make other platforms match glibc. Amusingly, that code is based on code I originally wrote for protobuf, but I guess I made the glibc-conformance changes after copying it to Cap'n Proto.)

Another crazy idea: If floats are the only part that's hard to make stable, why not use detrand specifically to randomize floats (append zeros after the decimal). Most people don't have floats in their data, and they can then get stability. Meanwhile, it's already well-understood that one shouldn't expect floating-point values to be exact.

@bufdev
Copy link
Author

bufdev commented May 14, 2020

Another crazy idea: If floats are the only part that's hard to make stable, why not use detrand specifically to randomize floats (append zeros after the decimal). Most people don't have floats in their data, and they can then get stability. Meanwhile, it's already well-understood that one shouldn't expect floating-point values to be exact.

Side note: I've tried to get people to stop using floats/doubles in Protobuf for years (although I"m betting we all have)

@bufdev
Copy link
Author

bufdev commented May 14, 2020

* https://www.ecma-international.org/ecma-262/6.0/index.html#sec-tostring-applied-to-the-number-type (linked from 3.2.2.3 above)

This also looks like a college homework assignment from a theory class. Note that the IETF link alludes to this being implemented in V8, so maybe there's prior art? Also it links to https://github.com/ulfjack/ryu

@bufdev
Copy link
Author

bufdev commented May 14, 2020

Even better: https://github.com/cespare/ryu

@kentonv
Copy link

kentonv commented May 14, 2020

Ohhhhh, yeah, when talking about JSON, "do what V8 does" makes way more sense than "do what protobuf-C++ does".

@bufdev
Copy link
Author

bufdev commented May 14, 2020

golang/go#15672 has a discussion on adding this to strconv

Tldr this might be the way to go, just to agree on this algorithm Protobuf-wide - seems like it's a coming standard anyways :-)

@dsnet
Copy link
Member

dsnet commented May 14, 2020

Thanks @bufdev! These look like great leads. There does seem to be a lot of promise in using ECMA 7.1.12.1's definition (which happens to be implemented by V8). I'll take a look at the implementation some other day.

@dsnet
Copy link
Member

dsnet commented May 15, 2020

For prior of ES6 float formatting in Go (which is the same as ECMA-262, which is the same as V8), see golang/go@92b3e36 and golang/go#14135.

radhus added a commit to einride/grpc-service-config-go that referenced this issue Aug 1, 2022
protojson doesn't guarantee stable JSON output, and deliberately
introduce minor differences in output to prevent the illusion of this,
see [1].

As this tool generates code, we'd like to keep the output as stable as
possible, use the advice in [1] and [2] (at the end) and run the
generated JSON through a formatter before including it in the generated
code. This also changes from using the error ignoring Format() function
to Marshal().

[1]: https://developers.google.com/protocol-buffers/docs/reference/go/faq#unstable-json
[2]: golang/protobuf#1121 (comment)
radhus added a commit to einride/grpc-service-config-go that referenced this issue Aug 2, 2022
protojson doesn't guarantee stable JSON output, and deliberately
introduce minor differences in output to prevent the illusion of this,
see [1].

As this tool generates code, we'd like to keep the output as stable as
possible, use the advice in [1] and [2] (at the end) and run the
generated JSON through a formatter before including it in the generated
code. This also changes from using the error ignoring Format() function
to Marshal().

[1]: https://developers.google.com/protocol-buffers/docs/reference/go/faq#unstable-json
[2]: golang/protobuf#1121 (comment)
jchadwick-buf added a commit to connectrpc/connect-go that referenced this issue Feb 21, 2023
Unfortunately, protobuf offers no story whatsoever for canonicalization
of protobuf or protojson output. It seems the best we can get is to just
make it deterministic within each implementation.

Related issues:
* golang/protobuf#1121
* golang/protobuf#1373
SnowmanSeniorDev added a commit to SnowmanSeniorDev/cloud-foundation-toolkit that referenced this issue Apr 17, 2023
In github.com/GoogleCloudPlatform/cloud-foundation-toolkit/issues/902, I suggested eventually moving to protojson.

According to golang/protobuf#1121 (comment), that library does not support a stable serialization output. That means that the test in proto_test.go would become flaky.

The suggestion there is "Instead of syntactically comparing the serialized form of a message, the test should semantically compare the structured form of a message", so I've done that here by loading the got and want strings as JSON and comparing the structured JSON.
bwplotka pushed a commit to prometheus/client_golang that referenced this issue Aug 11, 2023
By upgrading prometheus/client_model, several test functions had to be re-written due to 2 breaking changes made in protobuf when parsing messages to text:
1. '<' and '>' characters were replaced with '{' and '}' respectively.
2. The text format is non-deterministic. More information in golang/protobuf#1121

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
@adg
Copy link
Contributor

adg commented Apr 17, 2024

Just burned a few hours on this. Questioned my sanity more than once.

The documentation on protojson.Marshal says:

Do not depend on the output being stable. It may change over time across different versions of the program.

Which, sure, I read and thought I understood. But I did not anticipate that different builds with totally unrelated changes might produce different JSON output. If I were to send a change to strengthen the language of this warning, would that be accepted? Were the docs more explicit about "different versions of the program" then I would have spent my time doing more productive things today.

@puellanivis
Copy link
Collaborator

Yes, if you open a Gerrit change request with strengthened language then it will be considered. I cannot say for sure that it would be accepted, but if it’s a positive change, then it should get merged. I’m wondering what you’re proposing, but I suppose introducing it with a Gerrit change is the best way to discuss the proposal, rather than just discussing it here.

@dsnet
Copy link
Member

dsnet commented Apr 17, 2024

When I was still at Google, I wrote a proposal specification defining canonical serialization for the binary, text, and JSON formats. At the time, the protobuf team was going through some leadership changes and the proposal unfortunately fell on deaf ears. IIRC, the document was entitled "Canonical Serialization for Protocol Buffers" or something and there was an unsubmitted CL that implemented the proposed canonicalization scheme in the Go implementation. We should explore having the protobuf team formally approve that specification (or something similar to it).

@adg
Copy link
Contributor

adg commented Apr 17, 2024

@puellanivis I sent this change: https://go-review.googlesource.com/c/protobuf/+/579895

@dsnet That sounds worth exploring (very good doc btw). However, I don't have the bandwidth to get involved in this issue beyond updating the documentation.

In the longer term my team will move away from depending on any kind of deterministic protojson encoding.

gopherbot pushed a commit to protocolbuffers/protobuf-go that referenced this issue Apr 18, 2024
Because these encoders use internal/detrand to make their output
nondeterministic across builds, use stronger wording to warn users of
their instability.

Updates golang/protobuf#1121

Change-Id: Ia809e5c26ce24d17f602e7fbae26d9df2b57d05b
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/579895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Lasse Folger <lassefolger@google.com>
Reviewed-by: Lasse Folger <lassefolger@google.com>
Reviewed-by: Michael Stapelberg <stapelberg@google.com>
@AnomalRoil
Copy link

AnomalRoil commented Jun 27, 2024

This has recently bitten us when implementing cryptographic signatures of some packets.

FYI, the generated String() methods of the generated struct doesn't include any wording about it not being consistent across builds, and seems to usually default to:

	return protoimpl.X.MessageStringOf(x)

which in turn uses

	return prototext.MarshalOptions{Multiline: false}.Format(m)

which uses Format with the appropriate warnings in the documentation.

@zbakkernomi
Copy link

also bitten - use case is deterministic serialization of rpc request messages for caching and singleflight

@omar391
Copy link

omar391 commented Oct 27, 2024

Apart from the float issue, I think we can

  • apply a JSON formatted / minifier lib on top of protojson output
  • pinn JSON formatter lib version

This should produce stable JSON output unless I miss something else.

@puellanivis
Copy link
Collaborator

Yes, using encoding/json.Compact can be used to have a more deterministic output. However, there is no actual agreed upon canonical JSON format from which this package can guarantee that the order will not change in the future.

So, even if you’re using json.Compact there is no guarantee that a future version won’t change the output. As you say, you could pin the JSON formatter version, but this will also pin your protobuf version, and you won’t get any updates or fixes that we later implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests