Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop adding extra new line after XML declaration with pretty format #164

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Jul 9, 2024

If the XML file does not end with a newline, a space is added to the end of the first line.

Failure: test_indent(REXMLTests::TestDocument::WriteTest::ArgumentsTest)
/Users/naitoh/ghq/github.com/naitoh/rexml/test/test_document.rb:270:in `test_indent'
     267:           output = ""
     268:           indent = 2
     269:           @document.write(output, indent)
  => 270:           assert_equal(<<-EOX.chomp, output)
     271: <?xml version='1.0' encoding='UTF-8'?>
     272: <message>
     273:   Hello world!
<"<?xml version='1.0' encoding='UTF-8'?>\n" +
"<message>\n" +
"  Hello world!\n" +
"</message>"> expected but was
<"<?xml version='1.0' encoding='UTF-8'?> \n" +
"<message>\n" +
"  Hello world!\n" +
"</message>">

diff:
? <?xml version='1.0' encoding='UTF-8'?>
  <message>
    Hello world!
  </message>

This is happen because REXML::Formatters::Pretty#write_document has a logic that depends on the last text node.

We should ignore all top-level text nodes with pretty format.

@@ -111,7 +111,7 @@ def write_document( node, output )
# itself, then we don't need a carriage return... which makes this
# logic more complex.
node.children.each { |child|
next if child == node.children[-1] and child.instance_of?(Text)
next if child == "\n" and child.instance_of?(Text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Does this work with the following XML?

<?xml version="1.0" encoding="UTF-8"?>

<message>Hello world!</message>

It seems that we can ignore all top-level text nodes:

diff --git a/lib/rexml/formatters/pretty.rb b/lib/rexml/formatters/pretty.rb
index a1198b7..a838d83 100644
--- a/lib/rexml/formatters/pretty.rb
+++ b/lib/rexml/formatters/pretty.rb
@@ -111,7 +111,7 @@ module REXML
         # itself, then we don't need a carriage return... which makes this
         # logic more complex.
         node.children.each { |child|
-          next if child == node.children[-1] and child.instance_of?(Text)
+          next if child.instance_of?(Text)
           unless child == node.children[0] or child.instance_of?(Text) or
             (child == node.children[1] and !node.children[0].writethis)
             output << "\n"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right correction, too.

If next if child.instance_of?(Text), then the following XML works.

<?xml version="1.0" encoding="UTF-8"?>

<message>Hello world!</message>

Copy link

@Fryguy Fryguy Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naitoh Interestingly if you have top-level text nodes that are not an ?xml node, then you end up with the strange behavior below where it outputs a leading blank line.

XXX<message>Hello world!</message>
d = REXML::Document.new("XXX<message>Hello world!</message>")
d.to_s
# => "XXX<message>Hello world!</message>"
d.write(out = '', 2)
out
# => "\n<message>\n  Hello world!\n</message>"

Note that I think this leading "XXX" is invalid XML anyway, but is parsed successfully with REXML (Nokogiri fails in STRICT mode with this). I wonder if you also need to fail on text nodes before the root node similar to how you did it in #161 for those after the root node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you also need to fail on text nodes before the root node similar to how you did it in #161 for those after the root node?

XXX<message>Hello world!</message>

This is invalid XML and I think should be an error.

@kou
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's open a new issue and discuss this on the new issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created #184 which fixes the above.

## Why?
Fix REXML::Formatters::Pretty#write_document, which implicitly assumes that the XML file ends with a newline.

If the XML file does not end with a newline, a space is added to the end of the first line.

```
Failure: test_indent(REXMLTests::TestDocument::WriteTest::ArgumentsTest)
/Users/naitoh/ghq/github.com/naitoh/rexml/test/test_document.rb:270:in `test_indent'
     267:           output = ""
     268:           indent = 2
     269:           @document.write(output, indent)
  => 270:           assert_equal(<<-EOX.chomp, output)
     271: <?xml version='1.0' encoding='UTF-8'?>
     272: <message>
     273:   Hello world!
<"<?xml version='1.0' encoding='UTF-8'?>\n" +
"<message>\n" +
"  Hello world!\n" +
"</message>"> expected but was
<"<?xml version='1.0' encoding='UTF-8'?> \n" +
"<message>\n" +
"  Hello world!\n" +
"</message>">

diff:
? <?xml version='1.0' encoding='UTF-8'?>
  <message>
    Hello world!
  </message>

```
@naitoh naitoh requested a review from kou July 10, 2024 12:19
@kou kou changed the title fix REXML::Formatters::Pretty#write_document Stop adding extra new line after XML declaration by REXML::Formatters::Pretty#write_document Jul 11, 2024
@kou kou changed the title Stop adding extra new line after XML declaration by REXML::Formatters::Pretty#write_document Stop adding extra new line after XML declaration with pretty format Jul 11, 2024
@kou kou merged commit 5e140ed into ruby:master Jul 11, 2024
61 checks passed
@naitoh naitoh deleted the fix_formatters_pretty branch July 11, 2024 01:00
aduth added a commit to 18F/identity-idp that referenced this pull request Jul 17, 2024
aduth added a commit to 18F/identity-idp that referenced this pull request Jul 17, 2024
* Bump rexml to resolve security advisory

changelog: Internal, Dependencies, Update dependencies to resolve security advisories

* Add rexml as explicit dependency

Since we use it in our code, it should be an explicit dependency

See: https://github.com/18F/identity-idp/blob/ea8a6081961d6c373a870dd5fea31efce89fde7e/app/services/proofing/aamva/request/verification_request.rb#L60-L102

* Sync AAMVA fixture to expected output

Likely a result of ruby/rexml#164
kou pushed a commit that referenced this pull request Jul 22, 2024
…fore root element (#184)

## Why?

XML with content at the start of the document is invalid.

https://www.w3.org/TR/2006/REC-xml11-20060816/#document

```
[1]   document   ::=   ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog

```
[22]   	prolog	   ::=   	XMLDecl Misc* (doctypedecl Misc*)?
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl
```
[23]   	XMLDecl	   ::=   	'<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc

```
[27]   	Misc	   ::=   	Comment | PI | S
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI

```
[16]   	PI	   ::=   	'<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget

```
[17]   	PITarget	   ::=   	Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
```

https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-doctypedecl
```
[28]   	doctypedecl	   ::=   	'<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
```

See: #164 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants