html and xml named entity tables should be shared #261

Closed
froydnj opened this issue Mar 30, 2017 · 2 comments · Fixed by #268

froydnj commented Mar 30, 2017

Right now, we have data/entities.json and xml5ever/data/entities.json, which are identical. If a project (such as Servo) requires both html5ever and xml5ever, it'll have duplicate copies of the named entity tables, which are not particularly small--about 230K on x86-64 Linux, roughly half that on 32-bit platforms.

Sticking the entities and the process to generate a phf table for them in a separate, shared crate would alleviate this problem. There might be issues down the road if sets of entities in HTML and XML are no longer identical...but I'm sure solutions can be worked out there if necessary.
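To make the idea concrete, here is a minimal sketch of a single table that both parsers resolve against. Everything in it is an assumption for illustration: the module names `shared_entities`, `html_tokenizer`, and `xml_tokenizer` are hypothetical stand-ins for separate crates, the four entries are a tiny sample rather than the contents of entities.json, and a sorted slice with binary search stands in for the generated phf table.

```rust
// Hypothetical stand-in for a shared crate that owns the entity table.
pub mod shared_entities {
    /// Sample entries only: (entity name without the leading '&', code points).
    /// Kept sorted by name so that binary search works.
    pub static NAMED_ENTITIES: &[(&str, &[u32])] = &[
        ("amp;", &[0x26]),
        ("gt;", &[0x3E]),
        ("lt;", &[0x3C]),
        ("quot;", &[0x22]),
    ];

    /// Look up an entity by name in the single shared table.
    pub fn lookup(name: &str) -> Option<&'static [u32]> {
        NAMED_ENTITIES
            .binary_search_by(|&(candidate, _)| candidate.cmp(name))
            .ok()
            .map(|index| NAMED_ENTITIES[index].1)
    }
}

// Hypothetical stand-in for the HTML side: no table of its own.
pub mod html_tokenizer {
    use super::shared_entities;

    pub fn resolve(name: &str) -> Option<&'static [u32]> {
        shared_entities::lookup(name)
    }
}

// Hypothetical stand-in for the XML side: reuses the same table.
pub mod xml_tokenizer {
    use super::shared_entities;

    pub fn resolve(name: &str) -> Option<&'static [u32]> {
        shared_entities::lookup(name)
    }
}
```

Because both `resolve` functions hand back slices borrowed from the same static, a binary that links both parsers carries one copy of the data rather than two.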

SimonSapin (Member) commented

Yes, after discussion in https://github.com/Ygg01/xml5ever/issues/30 we’ve moved the xml5ever code into this repository. The follow-up that remains is to share more code between them. Maybe the html5ever_atoms crate should be renamed to something like markup5ever and become the place for all the shared stuff.

That’s the plan, it just has seemed low-priority so far.

Ygg01 (Contributor) commented Mar 31, 2017

> There might be issues down the road if sets of entities in HTML and XML are no longer identical...but I'm sure solutions can be worked out there if necessary.

I don't think that should ever happen. XML5 is explicitly made to be as close to HTML5 as possible. I mean, I could be wrong (and writing the last few sentences is tempting fate), but unless they break XML in an irreparable way, entities.json should be the same.

What differs are the parsers and the tree builders.

bors-servo pushed a commit that referenced this issue May 2, 2017
Xhtml5ever

Ok, this is a large one.

Fixes #266, fixes #261, fixes #210.

It moves html5ever into a separate folder, renames html5ever macros to markup5ever, and stores common code there.

Here is a short summary of what I know is and isn't done.

- [x] Make every crate in the repo use a single workspace
- [x] Make sure Travis-CI is running every test
- [x] Rename the html5ever_atoms crate to markup5ever and update html5ever and xml5ever to use it.
- [x] Increment version numbers
- [x] Make it so that users of either html5ever or xml5ever don’t need to have an explicit dependency on markup5ever
- [x] Export QualName #210
- [x] let markup5ever generate entities.json #261
- [ ] **Move TokenSink to markup5ever**
- [x] Move TreeSink to markup5ever
- [x] Move BufferQueue to markup5ever
- [x] Move SmallCharSet to markup5ever
- [ ] **Deal with driver.rs**
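
One common way to meet the "no explicit dependency" item in the checklist above is for each parser crate to re-export the shared types. The sketch below illustrates that pattern only and is not the code from this pull request: modules stand in for the three crates, and this `QualName` is a simplified, hypothetical version with plain `String` fields.

```rust
// Stand-in for the shared crate (modelled as a module for this sketch).
pub mod markup5ever {
    /// Simplified, hypothetical qualified name; not the real definition.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct QualName {
        pub ns: String,
        pub local: String,
    }

    impl QualName {
        pub fn new(ns: &str, local: &str) -> Self {
            QualName {
                ns: ns.to_string(),
                local: local.to_string(),
            }
        }
    }
}

// Stand-in for the HTML parser crate: re-exports the shared type.
pub mod html5ever {
    pub use super::markup5ever::QualName;
}

// Stand-in for the XML parser crate: re-exports the same type.
pub mod xml5ever {
    pub use super::markup5ever::QualName;
}
```

With re-exports like these, `html5ever::QualName` and `xml5ever::QualName` name the same type, so a user of either parser can construct and pass values around without naming the shared crate directly.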
