Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improper handling of boolean attributes? #186

Closed
miketheman opened this issue Sep 1, 2023 · 5 comments
Closed

Improper handling of boolean attributes? #186

miketheman opened this issue Sep 1, 2023 · 5 comments

Comments

@miketheman
Copy link

Hello! Thanks for the nifty library.

Apologies if this is reported elsewhere, or if I'm using the incorrect language. I've not done much Rust, and came to this library from the Python wrapper nh3, but I think I have an example in Rust that shows the issue.

When passing a value to tag_attributes to allow it through the cleansing cycle, it looks like the code doesn't currently know how to use boolean-type attributes.

One such instance I'm hitting is disabled tag attribute.

Then used in HTML, this often looks like:

  <input name="name" type="text" disabled />

However when passing this to ammonia, the output becomes:

  <input name="name" type="text" disabled="" />

Note the trailing =""

While valid HTML, and executes correctly in the browser I tested, it becomes a more verbose syntax, and unnecessarily adds characters to the output HTML.

Here's an example demonstrating the issue:

use ammonia::Builder;
use maplit::{hashmap, hashset};

fn main() {
    let input_html = "<p><input type=\"checkbox\" disabled></p>";

    let added_tags = hashset!["input"];
    let tag_attributes = hashmap![
        "input" => hashset!["type", "disabled"]
    ];

    let cleaned_html = Builder::default()
        .add_tags(added_tags)
        .tag_attributes(tag_attributes)
        .clean(&input_html)
        .to_string();

    assert_eq!(cleaned_html, input_html);
}

I found this listing of boolean attributes, updated July 2023 - it might be useful.

Alternately, if the source tag attribute has no = value, maybe don't add it?

@notriddle
Copy link
Member

Fixing this would require patching html5ever::serialize to special-case boolean attributes. The implementation code is here.

Is this a blocking issue? I'd rather not take on patched copies of html5ever code if it's just there to trim a few bytes off a seldom-used feature.

@miketheman
Copy link
Author

Thanks - it's not blocking just yet.
We can decide if we want to adapt by adding trailing ="" to whatever boolean attributes we get.

Should this be re-opened against the underlying library instead?

@notriddle
Copy link
Member

That might be the best place to do it, though since the html5 serialization algorithm that it implements doesn't call for this, they probably don't want to merge it.

For each attribute that the element has, append a U+0020 SPACE character, the attribute's serialized name as described below, a U+003D EQUALS SIGN character (=), a U+0022 QUOTATION MARK character ("), the attribute's value, escaped as described below in attribute mode, and a second U+0022 QUOTATION MARK character (").

@miketheman
Copy link
Author

Hmm, I guess so 🤔 Since trailing empty ="" is indeed valid HTML5, we might simply accept that modification and move along.
I'll discuss it with my pals next week.

Thanks for your assistance!

@miketheman
Copy link
Author

Thanks for the pointers, I ended up adding the trailing ="" whenever necessary.

@miketheman miketheman closed this as not planned Won't fix, can't repro, duplicate, stale Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants