arraybuffer-xml-parser

This code is based on a copy of fast-xml-parser.

The reason is that we wanted to parse large XML files (over 1 GB), and the current implementation of fast-xml-parser takes a string as input. In V8's current JavaScript implementation, this limits the input size to 512 MB.

This library parses a Uint8Array (or an ArrayBuffer) directly, which raises the limit to 4 GB.
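As a minimal sketch of preparing either input type, the snippet below normalises an ArrayBuffer or a Uint8Array into a Uint8Array view. The helper name `toUint8Array` is hypothetical and not part of this library; it only uses standard JavaScript globals.

```javascript
// Hypothetical helper (not part of arraybuffer-xml-parser):
// accept either supported input type and return a Uint8Array.
function toUint8Array(input) {
  if (input instanceof Uint8Array) return input;
  // Wrapping an ArrayBuffer creates a view on the same memory, without copying.
  if (input instanceof ArrayBuffer) return new Uint8Array(input);
  throw new TypeError('Expected a Uint8Array or an ArrayBuffer');
}

// Build a standalone ArrayBuffer for the demonstration,
// similar to what an HTTP response's arrayBuffer() call would provide.
const encoded = new TextEncoder().encode('<a>1</a>');
const buffer = encoded.buffer.slice(
  encoded.byteOffset,
  encoded.byteOffset + encoded.byteLength,
);

const bytes = toUint8Array(buffer);
```

Because the view shares memory with the buffer, no second copy of a large file is created before parsing.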

Installation

$ npm i arraybuffer-xml-parser

Usage

XML to JSON

import { parse } from 'arraybuffer-xml-parser';

// For this example, we encode a string to obtain a Uint8Array.

const encoder = new TextEncoder();
const xmlData = encoder.encode(
  `<rootNode><tag>value</tag><boolean>true</boolean><intTag>045</intTag><floatTag>65.34</floatTag></rootNode>`,
);

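// Assumption: dynamic typing of node values is disabled so that the values
// stay strings, as in the output shown below.
const options = { dynamicTypingNodeValue: false };

const object = parse(xmlData, options);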

/*
object = {
  rootNode: {
    tag: 'value',
    boolean: 'true',
    intTag: '045',
    floatTag: '65.34',
  },
}
*/

Options

| Option | Description | Default value |
| --- | --- | --- |
| trimValues | Trim string values of an attribute or node. | true |
| attributeNamePrefix | Prepend the given string to attribute names for identification. | '$' |
| attributesNodeName | (Valid name) Group all the attributes as properties of the given name. | false |
| ignoreAttributes | Do not parse attributes. | false |
| ignoreNameSpace | Remove the namespace string from tag and attribute names. | false |
| allowBooleanAttributes | Allow a tag to have attributes without any value. | false |
| textNodeName | Name of the property containing text nodes. | '#text' |
| dynamicTypingAttributeValue | Parse the value of an attribute to float, integer, or boolean. | true |
| dynamicTypingNodeValue | Parse the value of a text node to float, integer, or boolean. | true |
| cdataTagName | If specified, the parser parses CDATA as a nested tag instead of adding its value to the parent tag. | false |
| arrayMode | When false, a tag with a single occurrence is parsed as an object, and as an array in case of multiple occurrences. When true, a tag is always parsed as an array, excluding leaf nodes. When strict, all tags are parsed as arrays. When an instance of RegExp, only tags that match the regex are parsed as arrays. When a function, the tag name is passed to the callback, which can check it. | false |
| tagValueProcessor | Process the tag value during transformation, for example HTML decoding or word capitalization. Applicable in case of string only. | (value) => decoder.decode(value).replace(/\r/g, '') |
| attributeValueProcessor | Process the attribute value during transformation, for example HTML decoding or word capitalization. | (value) => value |
| stopNodes | An array of tag names whose content does not need to be parsed. They are kept as Uint8Array. | [] |
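
The default tagValueProcessor listed above receives raw bytes rather than a string. The sketch below reproduces that default outside the parser and wraps it in a custom processor. It assumes that `decoder` in the default is a standard TextDecoder; the names `defaultTagValueProcessor` and `upperCaseTagValueProcessor` are hypothetical and only serve the illustration.

```javascript
// Assumption: `decoder` in the documented default is a standard TextDecoder (UTF-8).
const decoder = new TextDecoder();

// Same expression as the default in the options table:
// decode the bytes to a string and drop carriage returns.
const defaultTagValueProcessor = (value) =>
  decoder.decode(value).replace(/\r/g, '');

// Hypothetical custom processor: reuse the default, then upper-case the text.
const upperCaseTagValueProcessor = (value) =>
  defaultTagValueProcessor(value).toUpperCase();

// Simulate the bytes of a text node that contains Windows line endings.
const raw = new TextEncoder().encode('line one\r\nline two');

const decoded = defaultTagValueProcessor(raw);
const shouted = upperCaseTagValueProcessor(raw);
```

A processor like this would presumably be supplied through the options object, for example `{ tagValueProcessor: upperCaseTagValueProcessor }`. The same decoding step should also apply to the content of stopNodes, which the table describes as being kept as Uint8Array.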

License

MIT