GitHub - spinn3r/boilerpipe-failed-fork: boilerpipe 2.0

This is a work-in-progress version of boilerpipe that supports using jsoup instead of Xerces, etc., and that also supports extracting the HTML, not just the text.

It also moves the build from Ant to Maven.

I haven't done much work here beyond getting it to work and getting Maven set up.

TODO:

- Build out a LARGE number of tests (say 500-1000) that verify the output is correct. Do this SOON, because that way I can easily tell whether there's a regression.
- Move to multiple modules so that there is one for NekoHTML/Xerces and another for jsoup.
- Tell the Chromium guys about boilerpipe 2.0.
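
The regression tests from the first TODO item could start out as a simple golden-output check: feed each stored HTML input through the extractor and compare the result against a stored expected output. The sketch below is only an illustration of that idea. `GoldenCheck`, `Case`, and `failures` are hypothetical names, not part of boilerpipe, and the extractor is a naive tag-stripping stand-in so that the example runs without any dependencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical golden-output regression check; not part of boilerpipe. */
class GoldenCheck {

    /** One test case: input HTML plus the extraction output expected for it. */
    static final class Case {
        final String html;
        final String expected;

        Case(String html, String expected) {
            this.html = html;
            this.expected = expected;
        }
    }

    /**
     * Runs the extractor over every case and returns the names of the cases
     * whose actual output differs from the stored expected output.
     */
    static List<String> failures(Map<String, Case> cases,
                                 Function<String, String> extractor) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Case> entry : cases.entrySet()) {
            String actual = extractor.apply(entry.getValue().html);
            if (!entry.getValue().expected.equals(actual)) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in extractor (an assumption for this sketch): strips tags
        // with a naive regex. A real run would call the extractor under test.
        Function<String, String> extractor =
                html -> html.replaceAll("<[^>]*>", "").trim();

        // In practice the cases would be loaded from files on disk.
        Map<String, Case> cases = new LinkedHashMap<>();
        cases.put("simple", new Case("<p>Hello</p>", "Hello"));
        cases.put("nested", new Case("<div><b>Hi</b> there</div>", "Hi there"));

        System.out.println("failed cases: " + failures(cases, extractor));
    }
}
```

Keeping the extractor behind a plain `Function<String, String>` means the same set of cases could be run against both the NekoHTML/Xerces and the jsoup code paths once they live in separate modules.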

.idea
boilerpipe-core/src/demo/de/l3s/boilerpipe/demo
lib
src
INSTALL.txt
LICENSE
LICENSE.txt
NOTICE.txt
README.md
pom.xml
