
Tokenize and parse an attributes string into meaningful tokens and key-value pairs.


bent10/attributes-parser


Attributes Parser

A utility for parsing and tokenizing an attributes string into meaningful tokens and key-value pairs.

Install

You can install this module using npm or yarn. It's only 2.68 kB (min: 1.10 kB):

npm i attributes-parser
# or
yarn add attributes-parser

Alternatively, you can include this module directly in your HTML file from a CDN:

  • ESM: https://cdn.jsdelivr.net/npm/attributes-parser/+esm
  • CJS: https://cdn.jsdelivr.net/npm/attributes-parser/dist/index.cjs
  • UMD: https://cdn.jsdelivr.net/npm/attributes-parser/dist/index.umd.js

Usage

import parseAttrs from 'attributes-parser'

const attr = `#my-id.foo.bar class="baz" num=3.14 numNeg=-3.14 data-num="3.14" data-value="123" data-value=1_000_000 options=\'{"key": "value", "array": [1, 2, 3]}\' data-list="[1, 2, 3]" punc="a=b,c,d,e" checked=false checked=false data-checked="false" disabled`
const parsedAttr = parseAttrs(attr)

console.log(parsedAttr)
// use `parsedAttr.toString()` to turn it back into a string
// use `parsedAttr.getTokens()` to get the tokens array

Yields:

{
  id: 'my-id', // from shorthand attr #my-id
  class: 'foo bar baz', // from shorthand attrs .foo.bar and class="baz"
  num: 3.14, // number
  numNeg: -3.14, // negative number
  'data-num': '3.14', // quoted, so preserved as a string
  'data-value': 1000000, // for duplicate keys other than `class`, the last value is kept
  options: { key: 'value', array: [ 1, 2, 3 ] },
  'data-list': [ 1, 2, 3 ],
  punc: 'a=b,c,d,e', // allowed: no ambiguous ampersand
  checked: false, // boolean
  'data-checked': 'false', // quoted, so preserved as a string
  disabled: 'disabled' // shorthand
}
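The duplicate-key behavior in the output above (`class` values accumulate, while any other repeated key keeps its last value) can be sketched as a small merge step. This is a minimal illustration under that reading of the output, not the module's actual code; `mergeAttribute` is a hypothetical helper name.

```javascript
// Hypothetical helper illustrating the documented merge rules; not part of attributes-parser.
function mergeAttribute(attrs, key, value) {
  if (key === 'class' && typeof attrs.class === 'string') {
    // `class` is the one key whose values accumulate, space-separated
    attrs.class = `${attrs.class} ${value}`
  } else {
    // Every other key simply keeps the last value it was given
    attrs[key] = value
  }
  return attrs
}

// Replay part of the usage example in source order
const merged = {}
mergeAttribute(merged, 'id', 'my-id') // from #my-id
mergeAttribute(merged, 'class', 'foo bar') // from .foo.bar
mergeAttribute(merged, 'class', 'baz') // from class="baz"
mergeAttribute(merged, 'data-value', '123')
mergeAttribute(merged, 'data-value', 1000000)

console.log(merged)
// { id: 'my-id', class: 'foo bar baz', 'data-value': 1000000 }
```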

Attribute Validation

This module expects attribute names and values to adhere to the following syntax rules:

  • It follows the HTML specification for valid attribute names and values, using regular expressions to validate and tokenize attributes according to the rules defined there.

  • Valid attribute names and values will be correctly tokenized and parsed, providing you with meaningful results.

  • Invalid attributes that do not adhere to HTML syntax rules may result in unexpected behavior. To get accurate results, make sure the input attributes string complies with HTML standards.

  • If an attributes string contains invalid characters or does not follow the HTML syntax for attributes, the parsing may not produce the desired output.

It is best to validate and sanitize your attributes string so that it conforms to HTML standards before parsing it with this module. For more information, refer to the attributes section of the HTML syntax specification.

Below are the test cases that demonstrate valid and invalid attribute patterns according to these rules:

AttributeName

Valid

  • validname
  • validName
  • valid_name
  • valid-name
  • valid42
  • valid-42
  • -valid
  • _valid
  • $valid
  • @valid
  • valid@name
  • :valid
  • valid:name

Invalid

  • \x07Fvalid (Contains prohibited character)
  • invalid name (Contains space character)
  • "invalidName" (Contains prohibited character)
  • name> (Contains prohibited character)
  • name= (Contains prohibited character)
  • name\x00 (Contains prohibited character)
  • name\n (Contains prohibited character)
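The name rules above line up with the HTML specification's definition of an attribute name: one or more characters other than controls, space, `"`, `'`, `>`, `/`, `=`, and noncharacters. A minimal regular-expression check might look like the following. `isValidAttrName` is a hypothetical helper, and for brevity it skips the Unicode noncharacter check, so treat it as a sketch rather than the module's own pattern.

```javascript
// Hypothetical validator sketch; Unicode noncharacters (e.g. U+FDD0) are NOT checked here.
// Excluded: controls (U+0000 to U+001F, U+007F to U+009F), space, ", ', >, / and =.
const ATTR_NAME = /^[^\x00-\x20\x7F-\x9F"'>\/=]+$/

function isValidAttrName(name) {
  return ATTR_NAME.test(name)
}

console.log(isValidAttrName('valid:name')) // true
console.log(isValidAttrName('invalid name')) // false (space)
console.log(isValidAttrName('name=')) // false (=)
```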

AttributeShorthand

  • #bar (Shorthand for attribute id="bar")
  • .foo (Shorthand for attribute class="foo")
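Expanding a combined shorthand such as `#my-id.foo.bar` (from the usage example) amounts to splitting on the `#` and `.` sigils. Here is a minimal sketch, assuming the shorthand arrives as a single whitespace-free token; `expandShorthand` is a hypothetical name, not the module's API.

```javascript
// Hypothetical sketch of shorthand expansion; not the module's implementation.
function expandShorthand(token) {
  const attrs = {}
  const classes = []
  // Each match is a sigil ('#' or '.') followed by a run of other non-space characters
  for (const [, sigil, name] of token.matchAll(/([#.])([^#.\s]+)/g)) {
    if (sigil === '#') {
      attrs.id = name // a later #id would replace an earlier one
    } else {
      classes.push(name)
    }
  }
  if (classes.length > 0) attrs.class = classes.join(' ')
  return attrs
}

console.log(expandShorthand('#my-id.foo.bar'))
// { id: 'my-id', class: 'foo bar' }
```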

BooleanLiteral

  • true
  • false
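As the usage example shows, `checked=false` becomes the boolean `false` while `data-checked="false"` stays a string. A minimal sketch of that distinction, assuming the tokenizer already knows whether a value was quoted (`coerceBoolean` and its `quoted` flag are hypothetical):

```javascript
// Hypothetical helper: only bare (unquoted) true/false become booleans.
function coerceBoolean(raw, quoted) {
  if (!quoted && raw === 'true') return true
  if (!quoted && raw === 'false') return false
  return raw // quoted values and any other words are left as strings
}

console.log(coerceBoolean('false', false)) // false
console.log(typeof coerceBoolean('false', true)) // string
```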

NumericLiteral

Valid

  • 0x1A3 (hexLiteral)
  • 0o755 (octalLiteral)
  • 0b1101 (binaryLiteral)
  • 123.456 (decimalLiteral)
  • 1.23e-45 (scientificLiteral)
  • 0 (zeroLiteral)
  • 1_000_000 (underscoredLiteral)
  • 42 (integerLiteral)
  • 1e3 (scientificNoFraction)

Invalid

  • 12.34e (scientificNoExponent)
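The valid and invalid numeric forms above can be recognized with one anchored regular expression and then converted with `Number` once the `_` separators are removed (`Number('1_000_000')` is `NaN`). This sketch assumes that only decimal forms take a leading minus sign, as in `numNeg=-3.14` from the usage example; `parseNumericLiteral` is a hypothetical name, not the module's own code.

```javascript
// Hypothetical recognizer for the numeric forms listed above. Alternatives, in order:
// hex (0x1A3), octal (0o755), binary (0b1101), then decimal: an integer with optional
// fraction, or a bare fraction like .5, followed by an optional exponent that must
// have digits (so 12.34e is rejected).
const NUMERIC_LITERAL =
  /^(?:0[xX][0-9a-fA-F]+(?:_[0-9a-fA-F]+)*|0[oO][0-7]+(?:_[0-7]+)*|0[bB][01]+(?:_[01]+)*|-?(?:\d+(?:_\d+)*(?:\.\d+)?|\.\d+)(?:[eE][+-]?\d+)?)$/

function parseNumericLiteral(raw) {
  if (!NUMERIC_LITERAL.test(raw)) return undefined // not a numeric literal
  // Number() understands 0x/0o/0b prefixes and exponents, but not '_' separators
  return Number(raw.replace(/_/g, ''))
}

console.log(parseNumericLiteral('0x1A3')) // 419
console.log(parseNumericLiteral('1_000_000')) // 1000000
console.log(parseNumericLiteral('12.34e')) // undefined
```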

StringLiteral (Single-quoted)

Valid

  • 'valid value'
  • 'valid@value'
  • 'valid & value'
  • 'valid &; value'
  • '42'
  • '-42'
  • '3.14'
  • '0.5'
  • '-0.5'
  • '.5'
  • 'escaped single quote: \\''
  • 'newline: \n'
  • 'escaped newline: \\n'
  • '[1, 2, 3]'
  • '{foo: "bar"}'

Invalid

  • 'invalid value" (Contains prohibited character)
  • 'invalid value'' (Contains prohibited character)
  • 'invalid &value;' (Contains an ambiguous ampersand: &value;)
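A quoted value is valid only if it ends at a matching, unescaped closing quote, which is why 'invalid value" and 'invalid value'' are rejected. Here is a minimal scanner sketch, assuming a backslash escapes only the surrounding quote character and other backslashes are kept as-is. `parseQuotedValue` is hypothetical and handles both single- and double-quoted values.

```javascript
// Hypothetical scanner sketch; the module's own escape handling may differ.
// Returns the unquoted contents, or null if `input` is not exactly one quoted value.
function parseQuotedValue(input) {
  const quote = input[0]
  if (quote !== "'" && quote !== '"') return null

  let value = ''
  for (let i = 1; i < input.length; i++) {
    const ch = input[i]
    if (ch === '\\' && input[i + 1] === quote) {
      value += quote // \' or \" yields a literal quote
      i++ // skip the escaped quote
    } else if (ch === quote) {
      // The closing quote must be the last character: no trailing stray quotes
      return i === input.length - 1 ? value : null
    } else {
      value += ch
    }
  }
  return null // reached the end without a closing quote
}

console.log(parseQuotedValue("'valid value'")) // valid value
console.log(parseQuotedValue("'invalid value\"")) // null
console.log(parseQuotedValue("'invalid value''")) // null
```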

StringLiteral (Double-quoted)

Valid

  • "valid value"
  • "valid@value"
  • "valid & value"
  • "valid &; value"
  • "42"
  • "-42"
  • "3.14"
  • "0.5"
  • "-0.5"
  • ".5"
  • "escaped double quote: \""
  • "newline: \n"
  • "escaped newline: \\n"
  • "[1, 2, 3]"
  • "{foo: 'bar'}"

Invalid

  • "invalid value' (Contains prohibited character)
  • "invalid value"" (Contains prohibited character)
  • "invalid &value;" (Contains an ambiguous ampersand: &value;)
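In HTML, an ambiguous ampersand is a & followed by one or more ASCII alphanumerics and a ;, where those characters do not form a known named character reference. That is why &value; is rejected while a lone & or &; is allowed. The sketch below, using the hypothetical `hasAmbiguousAmpersand` helper, checks against only a handful of reference names; the real HTML list is far longer, so treat `KNOWN_REFS` as a placeholder.

```javascript
// Hypothetical placeholder: a tiny subset of HTML named character references.
// The full list in the HTML specification has over 2,000 entries.
const KNOWN_REFS = new Set(['amp', 'lt', 'gt', 'quot', 'apos', 'nbsp'])

function hasAmbiguousAmpersand(value) {
  // & + one or more ASCII alphanumerics + ; (numeric refs like &#38; never match)
  for (const [, name] of value.matchAll(/&([A-Za-z0-9]+);/g)) {
    if (!KNOWN_REFS.has(name)) return true
  }
  return false
}

console.log(hasAmbiguousAmpersand('valid & value')) // false
console.log(hasAmbiguousAmpersand('invalid &value;')) // true
```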

StringLiteral (Unquoted)

Valid

  • validValue
  • valid@value
  • 42
  • -42
  • 3.14
  • 0.5
  • -0.5
  • .5
  • true
  • false

Invalid

  • invalid value (Contains space character)
  • value" (Contains prohibited character)
  • value' (Contains prohibited character)
  • value` (Contains prohibited character)
  • =value (Contains prohibited character)
  • value> (Contains prohibited character)
  • value< (Contains prohibited character)
  • value\x00 (Contains prohibited character)
  • value\n (Contains prohibited character)
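For unquoted values, the HTML specification forbids ASCII whitespace, ", ', =, <, > and the grave accent (`), and requires at least one character; the list above also rejects NUL. A one-pattern sketch of that rule (`isValidUnquotedValue` is a hypothetical name, not the module's API):

```javascript
// Hypothetical check: ASCII whitespace is tab, LF, FF, CR and space (narrower than \s).
// NUL is excluded as well, matching the invalid examples above.
const UNQUOTED_VALUE = /^[^\t\n\f\r "'=<>`\x00]+$/

function isValidUnquotedValue(value) {
  return UNQUOTED_VALUE.test(value)
}

console.log(isValidUnquotedValue('valid@value')) // true
console.log(isValidUnquotedValue('=value')) // false
```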

Related

  • json-loose – Transforms loosely structured plain object strings into valid JSON strings.

Contributing

We 💛  issues.

When committing, please conform to the semantic-release commit standards. Please install commitizen and the adapter globally, if you have not already.

npm i -g commitizen cz-conventional-changelog

Now you can use git cz or just cz instead of git commit when committing. You can also use git-cz, which is an alias for cz.

git add . && git cz

License


A project by Stilearning © 2023.