libxml2-java is a Java language binding for the well-known libxml2 library.
Javadoc is available online.
You need essential build tools such as Java Development Kit 6 or higher, Gradle and GNU Make. Most importantly, the libxml2 development package must be installed on your system.
If you need to point to a JDK directory other than the system default location, use the --with-jdk option:
./configure --with-jdk=/opt/local/java
Otherwise, the configure script tries to detect where the JDK is installed on your system.
MacPorts (OS X):
$ sudo port install libxml2
$ ./configure
$ gradle build
Debian/Ubuntu:
$ sudo apt-get install libxml2-dev
$ ./configure
$ gradle build
Fedora/RHEL/CentOS:
$ sudo yum install libxml2-devel
$ ./configure
$ gradle build
You are free to run make yourself as often as you like, but this step is not required: the Gradle build runs a task named processNativeResources, which executes make.
Because libxml2 frequently allocates small chunks of memory, libxml2-java supports Google's TCMalloc for a performance boost:
./configure --with-tcmalloc
This requires google/tcmalloc.h and -ltcmalloc to be available on the system.
libxml2-java frees the underlying native resources in Object.finalize() by default, so you do not have to deal with memory management yourself. However, if you need to release them explicitly, Document.dispose(), XPathContext.dispose() and every node that implements Disposable will do the job. Note that Document.dispose() frees all child nodes as well.
If you do not trust the timing of Object.finalize() and do not want to call dispose() manually either (I don't), libxml2-java lets you defer de-allocation by calling autoDispose(). It retains the Disposable item in a backend list until you call LibXml.disposeAutoRetainedItems(). If that is still a hassle, call LibXml.setAutoRetainEveryDisposable() instead. Every disposable object is then retained automatically until you call LibXml.disposeAutoRetainedItems().
The backend list holding disposable items is not thread-safe; it is kept in thread-local storage. You therefore need to call LibXml.disposeAutoRetainedItems() on the same thread that retained the items.
Document doc = LibXml.parseString(xml).autoDispose();
// do your job freely.
LibXml.disposeAutoRetainedItems();
or
LibXml.setAutoRetainEveryDisposable();
// use document, xpath without calling dispose() or autoDispose()
LibXml.disposeAutoRetainedItems();
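To illustrate why retained items have to be released on the thread that retained them, here is a minimal sketch of a thread-local retain list in plain Java. It is not the libxml2-java implementation: `RetainListSketch`, its `Disposable` interface, and the `retain`, `disposeRetained` and `disposeOnOtherThread` methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: a per-thread retain list, not the libxml2-java internals. */
final class RetainListSketch {

    /** Hypothetical stand-in for an object that owns a native resource. */
    interface Disposable {
        void dispose();
    }

    // Every thread gets its own list, so the list itself needs no locking.
    private static final ThreadLocal<List<Disposable>> RETAINED =
            ThreadLocal.withInitial(ArrayList::new);

    /** Adds the item to the calling thread's list and returns it for chaining. */
    static Disposable retain(Disposable item) {
        RETAINED.get().add(item);
        return item;
    }

    /** Disposes everything the calling thread retained; returns how many items that was. */
    static int disposeRetained() {
        List<Disposable> items = RETAINED.get();
        int count = items.size();
        for (Disposable item : items) {
            item.dispose();
        }
        items.clear();
        return count;
    }

    /** Runs disposeRetained() on a fresh thread, which only sees its own (empty) list. */
    static int disposeOnOtherThread() {
        int[] disposed = new int[1];
        Thread worker = new Thread(() -> disposed[0] = disposeRetained());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return disposed[0];
    }

    public static void main(String[] args) {
        retain(() -> System.out.println("native resource released"));
        // The worker thread finds nothing to dispose.
        System.out.println("other thread disposed: " + disposeOnOtherThread());
        // The retaining thread releases the item.
        System.out.println("same thread disposed: " + disposeRetained());
    }
}
```

In this sketch an item retained on one thread simply stays in that thread's list when another thread asks for disposal, which is the situation the same-thread rule above is meant to avoid.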
Calling LibXml.printTcmallocStat() lets you inspect the currently allocated native memory by printing statistics to standard output. If libxml2-java was configured without TCMalloc, LibXml.printTcmallocStat() prints nothing.
- Print all child elements under the root node.
String xml = "<?xml version=\"1.0\"?><root><item /><item /><item /></root>";
Document doc = LibXml.parseString(xml);
Node rootNode = doc.getRootElement();
for (Node node : rootNode) {
    out.printf("%s: type=%s%n", node.getName(), node.getType());
}
- Use libxml2-java as the default DocumentBuilder by setting the javax.xml.parsers.DocumentBuilderFactory system property to org.xmlsoft.jaxp.DocumentBuilderFactoryImpl. You can then code against the standard JAXP API.
DocumentBuilder builder = LibXml.createDocumentBuilderFactory().newDocumentBuilder();
// <?xml version="1.0"?><html><head /><body><p>Good morning</p><p>How are you?</p></body></html>
Document doc = builder.parse(new File("sample.xml"));
Assert.assertEquals("html", doc.getDocumentElement().getNodeName());
- XPath
String xml = "<?xml version=\"1.0\"?>";
xml += "<root>";
xml += "<item>Apple</item>";
xml += "<item tag=\"1\">Bear</item>";
xml += "<item>Cider</item>";
xml += "</root>";
Document doc = LibXml.parseString(xml);
XPathContext ctx = doc.createXPathContext();
XPathObject result = ctx.evaluate("//item[@tag=\"1\"]");
out.println(result.getFirstNode().getChildText()); // Bear
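For comparison, the same query can be written against the standard javax.xml.xpath API that ships with the JDK. This is only a sketch: it uses whatever DocumentBuilderFactory and XPathFactory the JVM resolves by default, it makes no claim about how libxml2-java integrates with XPathFactory, and `JaxpXPathSketch` and `firstTaggedItem` are names invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustration only: the XPath query above, expressed with the JDK's JAXP classes. */
final class JaxpXPathSketch {

    /** Returns the text of the first item element whose tag attribute equals "1". */
    static String firstTaggedItem(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // evaluate(String, Object) returns the string value of the first matching node.
            return xpath.evaluate("//item[@tag=\"1\"]", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>"
                + "<root>"
                + "<item>Apple</item>"
                + "<item tag=\"1\">Bear</item>"
                + "<item>Cider</item>"
                + "</root>";
        System.out.println(firstTaggedItem(xml)); // Bear
    }
}
```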
The SAXParserFactory implementation has been tested with the following, by setting the javax.xml.parsers.SAXParserFactory system property to org.xmlsoft.jaxp.SAXParserFactoryImpl and adding libxml2-java.jar to the classpath:
- Apache Ant 1.9
  - Build simple projects
  - Build with Ivy
  - Build Android projects
- Apache Tomcat 7
  - Launched with web.xml, server.xml and context.xml; my web apps work as usual
The DocumentBuilderFactory implementation has been tested with the following, by setting the javax.xml.parsers.DocumentBuilderFactory system property to org.xmlsoft.jaxp.DocumentBuilderFactoryImpl and adding libxml2-java.jar to the classpath:
- Spring Framework 3.2
  - Simple app using Spring Data JPA
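The system properties described above are usually passed on the command line with -D, but they can also be set programmatically before the first factory lookup. The following is a minimal sketch of that idea using only standard JAXP calls. The fallback to the JDK's default parser when the implementation class is not on the classpath is an assumption added for the example, and `JaxpFactorySelection`, `selectDom`, `selectSax` and `rootName` are hypothetical helper names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustration only: choosing JAXP implementations through system properties. */
final class JaxpFactorySelection {

    static final String DOM_KEY = "javax.xml.parsers.DocumentBuilderFactory";
    static final String SAX_KEY = "javax.xml.parsers.SAXParserFactory";

    /** Selects a DOM factory class; returns false and undoes the change if it cannot be loaded. */
    static boolean selectDom(String implClass) {
        String previous = System.getProperty(DOM_KEY);
        System.setProperty(DOM_KEY, implClass);
        try {
            DocumentBuilderFactory.newInstance(); // throws if implClass is unavailable
            return true;
        } catch (FactoryConfigurationError e) {
            restore(DOM_KEY, previous);
            return false;
        }
    }

    /** Selects a SAX factory class; returns false and undoes the change if it cannot be loaded. */
    static boolean selectSax(String implClass) {
        String previous = System.getProperty(SAX_KEY);
        System.setProperty(SAX_KEY, implClass);
        try {
            SAXParserFactory.newInstance(); // throws if implClass is unavailable
            return true;
        } catch (FactoryConfigurationError e) {
            restore(SAX_KEY, previous);
            return false;
        }
    }

    private static void restore(String key, String previous) {
        if (previous == null) {
            System.clearProperty(key);
        } else {
            System.setProperty(key, previous);
        }
    }

    /** Parses with whichever DOM factory is currently selected and returns the root element name. */
    static String rootName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        boolean dom = selectDom("org.xmlsoft.jaxp.DocumentBuilderFactoryImpl");
        boolean sax = selectSax("org.xmlsoft.jaxp.SAXParserFactoryImpl");
        System.out.println("libxml2-java DOM factory selected: " + dom);
        System.out.println("libxml2-java SAX factory selected: " + sax);
        System.out.println("root element: " + rootName("<html><head/><body/></html>"));
    }
}
```

Without libxml2-java.jar on the classpath both selections report false and the JDK's bundled parser handles the document, so the sketch runs either way.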
- BasicTest: Test cases for building a DOM from XML and navigating the DOM tree.
- JaxpTest: Test cases with DocumentBuilderFactory.
- SaxTest: Test cases with bare and JSR SAX.
- XPathTest: Test cases for XPath APIs.
- DomManipulationTest: Test cases for creating and updating the DOM.
libxml2-java is not as fast as I expected.
The following is a brief comparison with Apache Xerces, which is bundled with the JDK, using a 100KB XML document.
You can reproduce the comparison by running org.xmlsoft.test.RssTest.
libxml2-java is a simple wrapper around native libxml2. The Document object that LibXml.parseFile returns, and its children, are lazily initialised on demand; for example, Node.getName() directly calls NewStringUTF(env, xmlNodePtr->name). For that reason, returning a Document object is 2 times faster than Apache Xerces, but once you start calling Node.getName() or Node.getChildText(), it is as fast as or even slower than Apache Xerces's implementation.
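Because parsing and node access behave so differently, it helps to time them separately. The sketch below does that with standard JAXP DOM calls against whichever DocumentBuilderFactory is currently configured. It is not org.xmlsoft.test.RssTest: the synthetic document, the class name `ParseVsTraverseTiming` and its helper methods are assumptions made for the example, and a single un-warmed run like this gives only a rough indication.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustration only: times parsing and tree traversal as two separate phases. */
final class ParseVsTraverseTiming {

    /** Builds a flat document with the given number of item elements (hypothetical test data). */
    static String syntheticXml(int items) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?><root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item>value").append(i).append("</item>");
        }
        return sb.append("</root>").toString();
    }

    /** Parses with the currently configured DocumentBuilderFactory. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    /** Reads the name of every node in the subtree and returns how many nodes were visited. */
    static int touchAllNodes(Node node) {
        int visited = 1;
        node.getNodeName(); // the kind of accessor that is resolved lazily in a native wrapper
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            visited += touchAllNodes(child);
        }
        return visited;
    }

    public static void main(String[] args) {
        String xml = syntheticXml(5000);

        long start = System.nanoTime();
        Document doc = parse(xml);
        long parsed = System.nanoTime();
        int nodes = touchAllNodes(doc.getDocumentElement());
        long traversed = System.nanoTime();

        System.out.printf("parse:    %.2f ms%n", (parsed - start) / 1_000_000.0);
        System.out.printf("traverse: %.2f ms (%d nodes)%n", (traversed - parsed) / 1_000_000.0, nodes);
    }
}
```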
Calling Java methods from native code is obviously slow. Even though libxml2-java caches all core classes, jmethodIDs and jfieldIDs, and uses CallNonvirtualXXXMethod rather than CallXXXMethod, it is almost 2 times slower than Apache Xerces. Aside from this issue, there is a lot of byte/char conversion on every callback method. As a result, the SAX parsing performance of libxml2-java cannot beat a pure Java implementation. Although I tried some tricky code to overcome this weakness, it did not help much.
- Make sure the libxml2 library is configured with the --with-threads option.
libxml2-java is licensed under the MIT License.