Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use whacko instead of cheerio #191

Closed
stevenvachon opened this issue Jan 4, 2016 · 14 comments
Closed

Use whacko instead of cheerio #191

stevenvachon opened this issue Jan 4, 2016 · 14 comments

Comments

@stevenvachon
Copy link

So that incorrect HTML can be corrected, thereby resulting in correct CSS selectors. Bad HTML templates are common from designers.

@jrit
Copy link
Collaborator

jrit commented Jan 4, 2016

There would be a few issues with this based on what is on the readme:

Difference with cheerio:

Use $.load(content) to load HTML documents (e.g. missing <html> tags will be automatically emitted in this case).
Parser options (e.g. xmlMode and normalizeWhitespace) are missing since whacko is intended for spec compliant HTML parsing.
  • load would be unusable because we need to preserve support for fragments
  • xmlMode is a feature that some people are using, such as for SVG

We want to modify the input as little as possible, rather than try to make the input into valid HTML5, which may not be what the user wants at all.

@stevenvachon
Copy link
Author

SVG for emails? If you're referring to inline SVG, that's an HTML5 feature which is fully supported by parse5.

Regarding fragments, this is from the whacko readme:

Use $(content) to create HTML-fragments which can be later added to the loaded document.

juice will need a major version bump, though, to accommodate the necessary API change (new method or option).

@jrit
Copy link
Collaborator

jrit commented Jan 5, 2016

I think it would be more practical for users to simply run their HTML through whacko before they give it to Juice to make sure the input to Juice is valid HTML, if that is something that is a requirement for them. Someone could easily wrap that pipeline into a single module if desired, and if published to npm it could help anyone with the same concern without breaking all the modules dependent on Juice's current behavior. In my opinion, one of the benefits of Juice is that it make as few changes as possible to what it is given.

@stevenvachon
Copy link
Author

<style>
  a { color:yellow }
  table a { color:red }
</style>

<table>
  <a href="#">link</a>
  <tr><td></td></tr>
</table>

Using htmlparser2/cheerio will result in a red link. Any web browser and parse5 will result in a yellow link.

@bago
Copy link
Contributor

bago commented Jun 23, 2016

I'm working to make juice work with jQuery instead of cheerio (#215), and I'll have to add a bit of indirection so to make sure that calls can be adapted to be made by jQuery. I think we could work so to be able to support https://github.com/inikulin/parse5 at the same time, as an alternative, so that juice will have a default "dom manipulation library" (Cheerio) but it will also support jQuery and Parse5.

Did you already check if parse5 have special requirements or if it works by just changing the require('cheerio') call?

@stevenvachon
Copy link
Author

Why do you need to support all three? parse5 can be compiled with browserify/webpack to work in the browser. jQuery will require jsdom server side which is slower than parse5.

@bago
Copy link
Contributor

bago commented Jun 28, 2016

cheerio works in the browser but it increase the library size of about 400KB.. the same is for parse5.

Some people (us = mosaico.io) already needs jquery for different tasks so it is good if we can let juice works with 400KB less dependencies. And we don't want to use jquery on the server side, we want to use it in the browser.

I didn't make benchmarks, but I expect jquery to be faster because it leverages the browser DOM handling than cheerio on the browser that is a lot of javascript lines of code to be compiled by the javascript VM... so it could be also a matter of speed. I hope I'll be able to produce some data wrt this aspect, later.

BTW, I already did what I needed to make it work with jQuery... given it doesn't make juice more complex i hope jrit will merge it. I encourage you to make your PR to support parse5, too if you want that. As jrit told you each parser has its own "quirks" so there is no "best choice" for every scenario.

@stevenvachon
Copy link
Author

What use cases do we have for a browser solution? I would think that it has a much lower priority than running on the server.

@bago
Copy link
Contributor

bago commented Jun 28, 2016

https://github.com/voidlabs/mosaico
This is a full drag&drop email editor running in the browser. The main "server-side" task is currently the "css inlining" and we want to move it to the client, so that the browser will be able to generate the final html to be sent without the need to use server-side logic.
Here is our task voidlabs/mosaico#54

At this time the css-inlining in the backend is one of our bigger issues because some users implement it in PHP, others in ruby, others in java, others node.js others in python or perl.. each one uses an existing css inliner and each existing CSS inliner have its own quirks and break most of our templates... and people think mosaico is broken while their css inliner is broken.

Unfortunately CSS inlining is not an hard-science.. each one interpret inlining in a different way:
http://mosaico.io/email-client-tricks/email-inliners/

@jrit
Copy link
Collaborator

jrit commented Jun 28, 2016

I also have concerns about the size of the library whether it is on the client or server, it is pretty bulky and challenging to trim very much. The client library is definitely smaller though because it excludes all the stuff to fetch remote resources.

Nobody has ever referenced speed of execution as a particular problem in the issues, so baring any really bad change, I don't think that should be a big deciding factor. 10% this way or that shouldn't matter.

I would like to see how the jQuery/cheerio(or similar) refactoring works out. I don't see a pull request open right now. The most important issue to me if someone wants to propose another library is that it does the minimum possible to modify the existing HTML or XML that is being inlined. There are a bunch of other libraries that try to "fix" code and end up breaking document fragments that users need to be left how they are. There should be pretty good test coverage for that currently so breakages should let you know if a library would work or not.

@bago
Copy link
Contributor

bago commented Jun 28, 2016

I'm working here:
https://github.com/bago/juice/tree/abstract-dom

There are really few changes for jquery compatibility (.attr() instead .attribs[]) ). Another change is to be able to specify a different attribute than "style" because IE10/11 will not let you change the style attribute "as you like" and they will try to "fix" your content (add -ms- to some props, remove others), so in order to inline in the client you have to use a different attribute and then fix it later.

Then I'm doing some refactoring that is related to dependencies isolation.

I'm not changing in any way the default behaviour that uses cheerio, just refactoring the code so it can be partially used by jquery with no "references" to cheerio (just to a $ object that share some common behaviour from jquery, cheerio and other dom libraries inspired on jquery).

@bago
Copy link
Contributor

bago commented Jun 28, 2016

BTW I think I'm going offtopic given this issue is about parse5 and I'm just working so to be able to let some part of juice to work with jQuery instead of cheerio. I'm sorry I took-over this issue.

@jrit
Copy link
Collaborator

jrit commented Sep 8, 2016

Given the changes @bago made for 3.0, I'm going to close this one.

@jrit jrit closed this as completed Sep 8, 2016
@stevenvachon
Copy link
Author

cheerio is switching to parse5 anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants