-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/text/encoding: decoder aborts on some invalid encodings instead of using U+FFFD #18898
Comments
I had not seen it before but it seems that this issue was already address in a code comment (https://github.com/golang/text/blob/c27e06ce2911df9af962d3325d66ce05b7502c32/encoding/encoding.go#L23) :
So here is a list of all the encodings returning errors :
And from what I see here https://encoding.spec.whatwg.org/#error the default error management for all decoders should be doing replacement instead of fatal error by default (even for UTF-16). |
You have marked the issue as "proposal", but IMHO that's a bit of a stretch if there's a TODO in the code, you're actually asking to fix a known problem. anyway cc @mpvl who owns |
Sorry my bad I saw the comment when I was having a second look around. |
@nigeltao Could you clarify why these decoders returned errors instead of substituting the replacement character? |
In https://groups.google.com/d/msg/golang-nuts/pENT3i4zJYk/o35l8WmyBwAJ on 2015 December, I said:
I still can't remember why we return errors instead of substituting U+FFFD. For the record, there's also an "Error Modes" CL proposal (since abandoned) at https://go-review.googlesource.com/#/c/8303/ But as I cannot seem to contact 2013-era-nigeltao, I'm OK with just always substituting U+FFFD and changing the implementations to match the documentation, as per the TODO. |
Then substituting U+FFFD without the option to do otherwise seems the best approach to me. |
CL https://golang.org/cl/37316 mentions this issue. |
CL https://golang.org/cl/37317 mentions this issue. |
CL https://golang.org/cl/37319 mentions this issue. |
CL https://golang.org/cl/37322 mentions this issue. |
CL https://golang.org/cl/37324 mentions this issue. |
CL https://golang.org/cl/37325 mentions this issue. |
Updates golang/go#18898 Change-Id: Ic5fb77af67656889a387fa75a3e6efc9b9975817 Reviewed-on: https://go-review.googlesource.com/37316 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>
Updates golang/go#18898 Change-Id: I9868004acb11abbfee8492c9de8ba374f6dcb2ac Reviewed-on: https://go-review.googlesource.com/37319 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>
Updates golang/go#18898 Change-Id: I0636d4ff317bbca015cf2a2a2ce9f452bdc601fd Reviewed-on: https://go-review.googlesource.com/37322 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>
Not sure if the right number of replacement characters is returned in all instances, but seems reasonable. Updates golang/go#18898 Change-Id: Ibf6efdb079191aa6db4eb05b41b7dae593947bd0 Reviewed-on: https://go-review.googlesource.com/37324 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>
Updates golang/go#18898 Change-Id: I049e5ba1fca9529eeacc3aa58f7e5c2d17f22ecd Reviewed-on: https://go-review.googlesource.com/37317 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>
As for UTF-16, it will return only an error if ExpectBOM is given. This mode is not used by either the ianaindex or htmlindex, and can easily be avoided by users if they do not wish an error in such circumstances. As this is just an opt-in error, and an error for which it is unclear what replacement would mean, I'm not altering UTF-16. The same holds for UTF-32. |
CL https://golang.org/cl/37595 mentions this issue. |
Updates golang/go#18898 Change-Id: If234aa5fdc35daf5ab02f49400462aa0c1ffa5ea Reviewed-on: https://go-review.googlesource.com/37325 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>
This CL also fixes a bug in the JIS 212 mode. Updates golang/go#18898 Change-Id: Idb10e375bbc5d278db4e8550f26986a5d2bb6caa Reviewed-on: https://go-review.googlesource.com/37595 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Nigel Tao <nigeltao@golang.org>
All decoders should no longer return errors except for the opt-in error for UTF-16/32. |
Hello,
I noticed that for some encodings (at least all the ones for Japanese, Korean, traditional and simplified Chinese) the
Transform
method of their respective decoder handle invalid character by failing the decoding. But for example the UTF8 decoder replace everything it does not know by the invalid one (\ufffd).Wouldn't it be better to normalize all the decoders to replace unknown characters by the invalid one?
The text was updated successfully, but these errors were encountered: