Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review: syntax #1

Open
wooorm opened this issue Oct 17, 2020 · 0 comments
Open

Review: syntax #1

wooorm opened this issue Oct 17, 2020 · 0 comments

Comments

@wooorm
Copy link

wooorm commented Oct 17, 2020

Hey! Awesome work, the first person to write a micromark extension! I’d love to hear more about your experience.

I went through the code and added some questions/tips/bugs:

import { html } from './html'

function wikiLink (opts = {}) {
  const aliasDivider = opts.aliasDivider || ':'

  const aliasMarker = aliasDivider
  // ^ Generally, I would suggest working internally on character codes instead
  // of strings though. In certain cases I had to work with string in micromark,
  // (such as w/ `<![CDATA[`), in which case I’ve used one index to keep track
  // of the position, and one string to track that “buffer”:
  // <https://github.com/micromark/micromark/blob/63cf51445077f18145d35dd3c506859d6f8195ce/lib/tokenize/html-text.js#L62-L64>
  // and
  // <https://github.com/micromark/micromark/blob/63cf51445077f18145d35dd3c506859d6f8195ce/lib/tokenize/html-text.js#L133>
  // It might be a micro optimization though: I found that using less variables
  // was the surest way to make the code small, and otherwise using string.
  // If you’re using codes, you could comment their names:
  // `(code === 91) { // left square bracket` or `(code === 91 /* '[' */) {`.
  const startMarker = '[['
  const endMarker = ']]'
  // ^ If these are always static two-character, maybe an extra state will help.
  // E.g., to parse a processing instruction (`<?x?>`) I didn’t use a buffer but
  // worked on the codes directly.

  function tokenize (effects, ok, nok) {
    var data
    var alias

    var aliasCursor = 0
    var startMarkerCursor = 0
    var endMarkerCursor = 0

    return start

    function start (code) {
      if (code !== startMarker.charCodeAt(startMarkerCursor)) return nok(code)
      // ^ this *should* not happen (as micromark knows you’re hooking into
      // `91`), but I do say to keep it in, just to be safe and signal here too
      // that you’re looking for a bracket.

      effects.enter('wikiLink')
      effects.enter('wikiLinkMarker')
      // ! You are missing a `effects.consume(code)`, or you should pass it to
      // the next with `return consumeStart(code)`.
      // While testing, you can use `micromark/lib`, which comes with
      // assertions to catch this!
      // <https://github.com/micromark/micromark#size--debug>
      return consumeStart
    }

    function consumeStart (code) {
      if (startMarkerCursor === startMarker.length) {
        effects.exit('wikiLinkMarker')
        // ! `return consumeData(code)`
        return consumeData
      }

      if (code !== startMarker.charCodeAt(startMarkerCursor)) {
        return nok(code)
      }

      effects.consume(code)
      startMarkerCursor++

      return consumeStart
    }

    function consumeData (code) {
      // ! You might also inline this into the above, because it’s called once.
      effects.enter('wikiLinkData')
      effects.enter('wikiLinkTarget')
      // ! `return consumeTarget(code)`
      return consumeTarget
    }

    function consumeTarget (code) {
      if (code === aliasMarker.charCodeAt(aliasCursor)) {
        if (!data) return nok(code)
        effects.exit('wikiLinkTarget')
        effects.enter('wikiLinkAliasMarker')
        // ! `return consumeAliasMarker(code)`
        return consumeAliasMarker
      }

      if (code === endMarker.charCodeAt(endMarkerCursor)) {
        if (!data) return nok(code)
        effects.exit('wikiLinkTarget')
        effects.exit('wikiLinkData')
        effects.enter('wikiLinkMarker')
        // ! `return consumeEnd(code)`
        return consumeEnd
      }

      // ! you might want to comment that this is CR, LF, CRLF, HT, VS, and
      // SP (whitespace, EOLs, EOF)
      if (!(code < 0 || code === 32)) {
        data = true
      }

      // ! One thing here that might be of interest to add, is support for
      // escapes: how should `[[my\]page]]` and `[a\:b]` work?
      // If that’s the case, you don’t need to care about all that is escapable
      // in Markdown (ASCII punctuation). In labels, which work somewhat
      // similar to this, I’m doing it like so:
      // <https://github.com/micromark/micromark/blob/63cf51445077f18145d35dd3c506859d6f8195ce/lib/tokenize/factory-label.js#L76>
      // you will probably also support escaping `aliasDivider.charCodeAt(0)`?
      // So in this tokenizer, you mark this as a [“string”](https://github.com/micromark/micromark/blob/63cf51445077f18145d35dd3c506859d6f8195ce/lib/tokenize/factory-label.js#L58)
      // but otherwise you “skip” over the escapes.
      // micromark will later go find all strings, which support character
      // references (`&amp;`) and character escapes (`\&`), and handle ’em.
      effects.consume()

      return consumeTarget
    }

    function consumeAliasMarker (code) {
      if (aliasCursor === aliasMarker.length) {
        effects.exit('wikiLinkAliasMarker')
        effects.enter('wikiLinkAlias')
        // ! `return consumeAlias(code)`
        return consumeAlias
      }

      if (code !== aliasMarker.charCodeAt(aliasCursor)) {
        return nok(code)
      }

      effects.consume(code)
      aliasCursor++

      return consumeAliasMarker
    }

    function consumeAlias (code) {
      if (code === endMarker.charCodeAt(endMarkerCursor)) {
        if (!alias) return nok(code)
        effects.exit('wikiLinkAlias')
        effects.exit('wikiLinkData')
        effects.enter('wikiLinkMarker')
        // ! `return consumeEnd(code)`
        return consumeEnd
      }

      if (!(code < 0 || code === 32)) {
        alias = true
      }

      // ! `code` must be consumed!!
      effects.consume()

      return consumeAlias
    }

    function consumeEnd (code) {
      if (endMarkerCursor === endMarker.length) {
        effects.exit('wikiLinkMarker')
        effects.exit('wikiLink')
        // ! `return ok(code)`
        return ok
      }

      if (code !== endMarker.charCodeAt(endMarkerCursor)) {
        return nok(code)
      }

      effects.consume(code)
      endMarkerCursor++

      // ! `return consumeEnd(code)`
      return consumeEnd
    }
  }

  // ! just fyi, no change needed: I call these things “constructs”, which could also have more fields
  var call = { tokenize: tokenize }

  return {
    text: { 91: call }
  }
}

export {
  wikiLink as syntax,
  html
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant