readme updated, npmignore added, version updated
inikulin committed Feb 28, 2014
1 parent bd12530 commit fac7a30
Showing 3 changed files with 119 additions and 20 deletions.
4 changes: 4 additions & 0 deletions .npmignore
@@ -0,0 +1,4 @@
.idea
node_modules
benchmark
test
133 changes: 114 additions & 19 deletions README.md
@@ -1,17 +1,16 @@
parse5
======
Fast full-featured HTML parser for Node. Based on WHATWG HTML5 specification.
To build [TestCafé](http://testcafe.devexpress.com/) we needed fast and ready for production HTML parser for node.js, which will parse HTML as a modern browser's parser.
![logo](https://raw.github.com/inikulin/parse5/master/logo.png)

Fast, full-featured HTML parsing/serialization toolset for Node, based on the WHATWG HTML5 specification.
To build [TestCafé](http://testcafe.devexpress.com/) we needed a fast, production-ready HTML parser that parses HTML the way a modern browser's parser does.
Existing solutions were either too slow or their output was too inaccurate. So, this is how parse5 was born.

Install
-------
##Install
```
$ npm install parse5
```

Usage and API
-------------

##Simple usage
```js
var Parser = require('parse5').Parser;

@@ -26,8 +25,7 @@ var fragment = parser.parseFragment('<title>Parse5 is &#102;&#117;&#99;&#107;ing

```

Is it fast?
-----------
##Is it fast?
Check out [this benchmark](https://github.com/inikulin/node-html-parser-bench).

```
@@ -41,16 +39,114 @@ Fastest is htmlparser2 (https://github.com/fb55/htmlparser2),parse5 (https://git

So parse5 is as fast as simple specification-incompatible parsers and ~15 times(!) faster than the current specification-compatible parser available for Node.

Testing
-------

##API reference

---------------------------------------


###Enum: TreeAdapters
Provides built-in tree adapters that can be passed as an optional argument to the `Parser` and `TreeSerializer` constructors.


####&bull; TreeAdapters.default
Default tree format for parse5.


####&bull; TreeAdapters.htmlparser2
The popular [htmlparser2](https://github.com/fb55/htmlparser2) tree format (used, e.g., in [cheerio](https://github.com/MatthewMueller/cheerio) and [jsdom](https://github.com/tmpvar/jsdom)).


---------------------------------------


###Class: Parser


####&bull; Parser.ctor([treeAdapter])
Creates a new reusable instance of the `Parser`. Optional `treeAdapter` argument specifies the resulting tree format. If `treeAdapter` is not specified, the `default` tree adapter will be used.

*Example:*
```js
var parse5 = require('parse5');

//Instantiate new parser with default tree adapter
var parser1 = new parse5.Parser();

//Instantiate new parser with htmlparser2 tree adapter
var parser2 = new parse5.Parser(parse5.TreeAdapters.htmlparser2);
```


####&bull; Parser.parse(html)
Parses the specified `html` string and returns the `document` node.

*Example:*
```js
var document = parser.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
```
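Since `parse` returns only the root `document` node, the rest of the tree is reached through child nodes. A minimal sketch of a lookup helper, assuming nodes in the default tree format expose a `nodeName` and, for container nodes, a `childNodes` array; `findFirstByName` is a hypothetical helper, not part of parse5:

```js
// Hypothetical helper, not part of parse5. Assumes default-format nodes:
// every node has a `nodeName` ('#document', '#text', 'body', ...), and
// container nodes keep their children in a `childNodes` array.
function findFirstByName(node, name) {
    if (node.nodeName === name)
        return node;

    // Text, comment and doctype nodes may have no childNodes array.
    var children = node.childNodes || [];

    // Depth-first search, visiting nodes in document order.
    for (var i = 0; i < children.length; i++) {
        var found = findFirstByName(children[i], name);

        if (found)
            return found;
    }

    return null;
}
```

For example, `findFirstByName(document, 'body')` would locate the `<body>` element without depending on whether a doctype node precedes `<html>` in `document.childNodes`.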


####&bull; Parser.parseFragment(htmlFragment, [contextElement])
Parses the given `htmlFragment` and returns a `documentFragment` node. Optional `contextElement` argument specifies the element in whose context the fragment is parsed. If `contextElement` is not specified, a `<div>` element will be used as the context.

*Example:*
```js
var documentFragment = parser.parseFragment('<table></table>');

//Parse html fragment in context of the parsed <table> element
var trFragment = parser.parseFragment('<tr><td>Shake it, baby</td></tr>', documentFragment.childNodes[0]);
```


---------------------------------------


###Class: TreeSerializer


####&bull; TreeSerializer.ctor([treeAdapter])
Creates a new reusable instance of the `TreeSerializer`. Optional `treeAdapter` argument specifies the input tree format. If `treeAdapter` is not specified, the `default` tree adapter will be used.

*Example:*
```js
var parse5 = require('parse5');

//Instantiate new serializer with default tree adapter
var serializer1 = new parse5.TreeSerializer();

//Instantiate new serializer with htmlparser2 tree adapter
var serializer2 = new parse5.TreeSerializer(parse5.TreeAdapters.htmlparser2);
```


####&bull; TreeSerializer.serialize(node)
Serializes the content (child nodes) of the given `node` and returns the resulting HTML string.

*Example:*
```js
var document = parser.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');

//Serialize document
var html = serializer.serialize(document);

//Serialize <body> element content
//(document.childNodes[0] is the doctype, document.childNodes[1] is <html>)
var bodyInnerHtml = serializer.serialize(document.childNodes[1].childNodes[1]);
```
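To illustrate what serializing a node's content involves, here is a simplified sketch that walks a default-format tree and emits the HTML of a node's children. It is not parse5's implementation: `serializeChildren` and `VOID_ELEMENTS` are hypothetical names, and the node shape (`nodeName`, `attrs` as `{name, value}` pairs, `childNodes`, text `value`, comment `data`, doctype `name`) is an assumption about the default tree format.

```js
// Illustrative sketch only, NOT parse5's serializer. Assumed node shape:
// elements have `nodeName`, `attrs` ([{name, value}]) and `childNodes`;
// '#text' nodes have `value`, '#comment' nodes have `data`,
// '#documentType' nodes have `name`.

// Elements that never take an end tag in HTML.
var VOID_ELEMENTS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                     'link', 'meta', 'param', 'source', 'track', 'wbr'];

// Simplified escaping: the spec also escapes U+00A0 and leaves text inside
// raw-text elements such as <script> unescaped; this sketch does neither.
function escapeText(str) {
    return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function escapeAttr(str) {
    return str.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Returns the HTML of the children of `node` (its "inner HTML").
function serializeChildren(node) {
    var html = '';
    var children = node.childNodes || [];

    for (var i = 0; i < children.length; i++) {
        var child = children[i];

        if (child.nodeName === '#text')
            html += escapeText(child.value);
        else if (child.nodeName === '#comment')
            html += '<!--' + child.data + '-->';
        else if (child.nodeName === '#documentType')
            html += '<!DOCTYPE ' + child.name + '>';
        else {
            html += '<' + child.nodeName;

            (child.attrs || []).forEach(function (attr) {
                html += ' ' + attr.name + '="' + escapeAttr(attr.value) + '"';
            });

            html += '>';

            // Recurse into children and close the tag, except for void elements.
            if (VOID_ELEMENTS.indexOf(child.nodeName) === -1)
                html += serializeChildren(child) + '</' + child.nodeName + '>';
        }
    }

    return html;
}
```

Because only the children are emitted, calling it on a `<body>` node yields the body's inner HTML, while calling it on the document node yields the whole page.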


---------------------------------------


##Testing
Test data is taken from the [html5lib project](https://github.com/html5lib). The parser is covered by more than 8000 test cases.
To run tests:
```
$ node test/run_tests.js
```

Custom tree adapter
-------------------

##Custom tree adapter
You can create a custom tree adapter so parse5 can work with your own DOM-tree implementation.
Just pass your adapter implementation to the parser's constructor as an argument:

@@ -65,15 +161,14 @@ var myTreeAdapter = {
var parser = new Parser(myTreeAdapter);
```

Sample implementation can be found [here](https://github.com/inikulin/parse5/blob/master/lib/default_tree_adapter.js).
Sample implementation can be found [here](https://github.com/inikulin/parse5/blob/master/lib/tree_adapters/default.js).
The custom tree adapter should implement all methods exposed via `exports` in the sample implementation.
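A partial sketch of such an adapter, assuming the method names and signatures mirror the default adapter (`createDocument()`, `createElement(tagName, namespaceURI, attrs)`, `createCommentNode(data)`, `appendChild(parentNode, newNode)`, `insertText(parentNode, text)`). The node shape (`type`, `tag`, `children`, `parent`) is invented for illustration, and a working adapter must implement every method exported by the sample implementation, not just this subset.

```js
// Partial, hypothetical custom tree adapter. Method names and signatures are
// ASSUMED to mirror lib/tree_adapters/default.js; a real adapter must export
// every method the sample implementation exports.
var myTreeAdapter = {
    createDocument: function () {
        return {type: 'document', children: [], parent: null};
    },

    createElement: function (tagName, namespaceURI, attrs) {
        return {type: 'element', tag: tagName, ns: namespaceURI, attrs: attrs, children: [], parent: null};
    },

    createCommentNode: function (data) {
        return {type: 'comment', data: data, parent: null};
    },

    appendChild: function (parentNode, newNode) {
        parentNode.children.push(newNode);
        newNode.parent = parentNode;
    },

    insertText: function (parentNode, text) {
        var lastChild = parentNode.children[parentNode.children.length - 1];

        // Merge consecutive text into one text node instead of creating many.
        if (lastChild && lastChild.type === 'text')
            lastChild.value += text;
        else
            myTreeAdapter.appendChild(parentNode, {type: 'text', value: text, parent: null});
    }
};
```

Merging in `insertText` keeps adjacent character data in a single text node even if the text arrives over several calls, which is also how a browser DOM represents it.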

Questions or suggestions?
-------------------------
##Questions or suggestions?
If you have any questions, please feel free to create an issue [here on GitHub](https://github.com/inikulin/parse5/issues).

Author
------

##Author
[Ivan Nikulin](https://github.com/inikulin) (ifaaan@gmail.com)


2 changes: 1 addition & 1 deletion package.json
@@ -1,7 +1,7 @@
{
"name": "parse5",
"description": "Fast full-featured HTML parser for Node. Based on WHATWG HTML5 specification.",
"version": "0.6.1",
"version": "0.8.1",
"author": "Ivan Nikulin (ifaaan@gmail.com, https://github.com/inikulin)",
"keywords": [
"html",
