Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/xml: reject ill-formed XML #68299

Open
DemiMarie opened this issue Jul 4, 2024 · 1 comment
Open

proposal: encoding/xml: reject ill-formed XML #68299

DemiMarie opened this issue Jul 4, 2024 · 1 comment
Labels
Milestone

Comments

@DemiMarie
Copy link
Contributor

DemiMarie commented Jul 4, 2024

Proposal Details

encoding/xml has multiple problems:

  1. It does not check for XML well-formedness constraints: encoding/xml: allows a colon before or after an XML name #68294, encoding/xml: does not reject duplicate attribute names #68295
  2. It does not check for XML namespace constraints: encoding/xml: does not check namespace constraints that do not require keeping extra state #68296, encoding/xml: does not reject multiple attributes with the same namespace URL and local part #68297
  3. Its handling of XML namespaces is known to be buggy.

This proposal covers all issues linked by #68293. These can all be fixed internally to encoding/xml, without changes to the API. However, there will be new API on xml.Decoder:

const (
	AllowLeadingColons = 1 << iota
	AllowTrailingColons
	AllowDuplicateAttributes
	// possibly other flags
)
	
/*
	Sets whether the parser allows ill-formed XML.
	Prior to 1.22, the parser always allowed ill-formed XML.
	Starting in 1.23, ill-formed XML is not allowed by default,
	but it can be re-enabled by calling decoder.AllowIllFormed(-1).
	The Allow* flags can be used for more fine-grained control.
*/
func (d *xml.Decoder) AllowIllFormed(flags int64)

and a GODEBUG flag allow-ill-formed-xml=<bitmask> for course-grained global control.

In the future, xml.Decoder will reject ill-formed XML. If it is found to accept ill-formed XML, this will be considered a bug and fixed, with a new flag so that applications can opt-in to the old behavior.

Debug flags for encoding/xml may be removed with not less than two major versions notice.

My understanding is that this change is enough to guarantee round-trip stability for RawToken users (not for Token users). I believe that is possible to implement a namespace-aware, round-trip-stable parser correctly on top of the RawToken API, but the standard library does not currently implement such a parser.

This is essentially “merge #48641 + debug flags”. this will require additional changes beyond this PR.

@gopherbot gopherbot added this to the Proposal milestone Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

3 participants