A gemtext (text/gemini) parser with support for streaming, ASTs, and CSTs.
Do you:
- 🤨 think that HTTP and HTML are bloated?
- 😔 feel markdown has superfluous features?
- 🤔 find gopher too light?
- 🥰 like BRUTALISM?
Then Gemini might be for you (see this post or this one on why it’s cool).
- What is this?
- When should I use this?
- Install
- Use
- API
- gast
- Types
- Compatibility
- Related
- Contribute
- Security
- License
What is this?
Dioscuri (named for the Gemini twins Castor and Pollux) is a tokenizer/lexer/parser/etc. for gemtext (the text/gemini markup format).
It gives you several things:
- buffering and streaming interfaces that compile to HTML
- interfaces to create unist-compliant abstract syntax trees (gast) and serialize those back to gemtext
- interfaces to transform to and from mdast (markdown ast)
- parts that could be used to generate CSTs
When should I use this?
These tools can be used if you have markdown but want to transform it to gemtext, or if you want to combine your posts into an RSS feed or show them on your “homepage”, and for many other things!
Use this for all your gemtext needs!
Install
This package is ESM only. In Node.js (version 14.14+, 16.0+), install with npm:
npm install dioscuri
In Deno with esm.sh:
import * as dioscuri from 'https://esm.sh/dioscuri@1'
In browsers with esm.sh:
<script type="module">
import * as dioscuri from 'https://esm.sh/dioscuri@1?bundle'
</script>
Use
See each interface below for examples.
API
This package exports the identifiers buffer, stream, fromGemtext, toGemtext, fromMdast, and toMdast. The raw compiler and parser are also exported. There is no default export.
buffer(doc, encoding?, options?)
Compile gemtext to HTML.
doc: gemtext to parse (string or Buffer).
encoding: character encoding to understand doc as when it’s a Buffer (string, default: 'utf8').
options: configuration (object). One of its fields sets the value to use for line endings not in doc (string, default: the first line ending in doc, or '\n').
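To make that default concrete, here’s a small sketch of picking a line ending. It is not dioscuri’s code: firstLineEnding is a hypothetical helper, it assumes doc has already been decoded to a string, and it only knows about '\n' and '\r\n'.

```javascript
// Hypothetical helper (not part of dioscuri): find the first line ending in a
// document, falling back to '\n' when the document has none.
function firstLineEnding(doc) {
  const index = doc.indexOf('\n')
  // No line feed at all: fall back to '\n'.
  if (index === -1) return '\n'
  // A carriage return right before the line feed means CRLF line endings.
  return index > 0 && doc.charAt(index - 1) === '\r' ? '\r\n' : '\n'
}

console.log(JSON.stringify(firstLineEnding('# Hi\r\nSome text'))) // "\r\n"
console.log(JSON.stringify(firstLineEnding('# Hi\nSome text'))) // "\n"
console.log(JSON.stringify(firstLineEnding('# Hi'))) // "\n"
```

Only the first line ending counts here, so a document that mixes both styles gets whichever appears first.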
Generally, dioscuri copies line endings ('\n' or '\r\n') in the document over to the compiled HTML. In some cases, such as the quote line > a, extra line endings are added: <blockquote>\n<p>a</p>\n</blockquote>.
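Here’s a minimal sketch of how a single quote line could be turned into that HTML. compileQuote and escapeHtml are hypothetical helpers; trimming the text after > and the exact escaping are assumptions, not dioscuri’s documented behavior.

```javascript
// Hypothetical helpers (not dioscuri's compiler).
// Escape characters that are special in HTML.
function escapeHtml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Compile one gemtext quote line, such as '> a', to HTML. The line ending is
// placed around the inner paragraph, so the output contains line endings that
// the single source line never had.
function compileQuote(line, lineEnding = '\n') {
  if (!line.startsWith('>')) throw new Error('Not a quote line: ' + line)
  // Trimming the text after '>' is a choice of this sketch.
  const text = line.slice(1).trim()
  return (
    '<blockquote>' + lineEnding +
    '<p>' + escapeHtml(text) + '</p>' + lineEnding +
    '</blockquote>'
  )
}

console.log(JSON.stringify(compileQuote('> a')))
// "<blockquote>\n<p>a</p>\n</blockquote>"
```

Passing '\r\n' as the second argument gives the same structure with CRLF line endings.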
options.allowDangerousProtocol: whether to allow potentially dangerous protocols in URLs (boolean, default: false). URLs relative to the current protocol (such as image.jpg) are always allowed. Otherwise, the allowed protocols are gemini, http, https, irc, ircs, mailto, and xmpp.
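The protocol rule above can be sketched as a small check. isSafeUrl and SAFE_PROTOCOLS are hypothetical, and detecting the protocol as the text before a : that comes before any /, ? or # is an assumption borrowed from common URL sanitizers, not necessarily what dioscuri does.

```javascript
// Hypothetical sketch of the protocol allowlist described above (not dioscuri's code).
const SAFE_PROTOCOLS = ['gemini', 'http', 'https', 'irc', 'ircs', 'mailto', 'xmpp']

function isSafeUrl(url, allowDangerousProtocol = false) {
  if (allowDangerousProtocol) return true

  const colon = url.indexOf(':')
  // Treat the URL as relative when there is no ':' or when a '/', '?', or '#'
  // appears before it (so the ':' is not part of a protocol).
  const relative =
    colon === -1 ||
    ['/', '?', '#'].some((char) => {
      const index = url.indexOf(char)
      return index !== -1 && index < colon
    })
  if (relative) return true

  return SAFE_PROTOCOLS.includes(url.slice(0, colon).toLowerCase())
}

console.log(isSafeUrl('image.jpg')) // true
console.log(isSafeUrl('gemini://example.com/')) // true
console.log(isSafeUrl('javascript:alert(1)')) // false
console.log(isSafeUrl('javascript:alert(1)', true)) // true
```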
Returns compiled HTML (string).
Say we have a gemtext document, example.gmi:
# Hello, world!
Some text
=> https://example.com An example
> A quote
* List
…and our module example.js looks as follows:
import fs from 'node:fs/promises'
import {buffer} from 'dioscuri'
const doc = await fs.readFile('example.gmi')
console.log(buffer(doc))
…now running node example.js yields:
<h1>Hello, world!</h1>
<br />
<p>Some text</p>
<br />
<div><a href="https://example.com">An example</a></div>
<br />
<blockquote>
<p>A quote</p>
</blockquote>
<br />
<ul>
<li>List</li>
</ul>
stream(options?)
Streaming interface to compile gemtext to HTML. options is the same as for the buffering interface above.
Assuming the same example.gmi as before and an example.js like this:
import fs from 'node:fs'
import {stream} from 'dioscuri'
fs.createReadStream('example.gmi')
.on('error', handleError)
.pipe(stream())
.pipe(process.stdout)
function handleError(error) {
throw error // Handle your error here!
}
…then running node example.js yields the same as before.
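Streaming means a chunk can end in the middle of a line. A common way to deal with that, sketched here as an assumption about the general technique rather than a description of dioscuri’s internals, is to hold back the unfinished tail of each chunk until more data (or the end) arrives. createLineSplitter is hypothetical.

```javascript
// Hypothetical sketch: turn arbitrary text chunks into complete lines.
// The unfinished tail of each chunk is kept until the next chunk (or the end).
function createLineSplitter(onLine) {
  let pending = ''
  return {
    write(chunk) {
      pending += chunk
      let index
      // Emit every complete line; strip a trailing '\r' from CRLF endings.
      while ((index = pending.indexOf('\n')) !== -1) {
        onLine(pending.slice(0, index).replace(/\r$/, ''))
        pending = pending.slice(index + 1)
      }
    },
    end() {
      // Whatever is left is the last line, which had no line ending.
      if (pending !== '') onLine(pending)
      pending = ''
    }
  }
}

const lines = []
const splitter = createLineSplitter((line) => lines.push(line))
splitter.write('# Hel')
splitter.write('lo, world!\nSome te')
splitter.write('xt\r\n=> https://example.com')
splitter.end()
console.log(JSON.stringify(lines))
// ["# Hello, world!","Some text","=> https://example.com"]
```

Because only '\n' triggers a split, a CRLF pair that is itself split across two chunks is still handled.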
fromGemtext(doc, encoding?)
Parse gemtext to an AST (gast). doc and encoding are the same as for the buffering interface above.
Returns a gast Root.
Assuming the same example.gmi as before and an example.js like this:
import fs from 'node:fs/promises'
import {fromGemtext} from 'dioscuri'
const doc = await fs.readFile('example.gmi')
console.dir(fromGemtext(doc), {depth: null})
…now running node example.js yields (positional info removed for brevity):
{
type: 'root',
children: [
{type: 'heading', rank: 1, value: 'Hello, world!'},
{type: 'break'},
{type: 'text', value: 'Some text'},
{type: 'break'},
{type: 'link', url: 'https://example.com', value: 'An example'},
{type: 'break'},
{type: 'quote', value: 'A quote'},
{type: 'break'},
{type: 'list', children: [{type: 'listItem', value: 'List'}]}
]
}
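To show roughly how gemtext lines map onto that tree, here’s a deliberately small parser sketch. It is not dioscuri’s parser: toGast is hypothetical, it tracks no positional info, and its whitespace and edge-case handling are guesses. It does treat a blank line as a break, group consecutive * lines into one list, and collect lines between backtick toggles into a pre node, matching the node types listed in the gast section below.

```javascript
// Hypothetical, simplified gemtext-to-gast parser (not dioscuri's parser).
// A preformatting toggle line starts with three backticks.
const FENCE = '`'.repeat(3)

function toGast(doc) {
  const lines = doc.split(/\r?\n/)
  // A final line ending does not start another (blank) line.
  if (lines[lines.length - 1] === '') lines.pop()

  const children = []
  let list = null // list currently being filled, if any
  let pre = null // preformatted node currently open, if any

  for (const line of lines) {
    if (pre) {
      // Inside preformatted text: keep lines verbatim until the closing toggle.
      if (line.startsWith(FENCE)) pre = null
      else pre.value = pre.value === undefined ? line : pre.value + '\n' + line
      continue
    }

    // Any line that is not a list item ends the current list.
    if (!line.startsWith('* ')) list = null

    let match
    if (line.startsWith(FENCE)) {
      pre = {type: 'pre'}
      const alt = line.slice(FENCE.length).trim()
      if (alt) pre.alt = alt
      children.push(pre)
    } else if (line.trim() === '') {
      children.push({type: 'break'})
    } else if ((match = /^(#{1,3})\s*(.*)$/.exec(line))) {
      children.push({type: 'heading', rank: match[1].length, value: match[2]})
    } else if ((match = /^=>\s*(\S+)(?:\s+(.*))?$/.exec(line))) {
      const link = {type: 'link', url: match[1]}
      if (match[2]) link.value = match[2]
      children.push(link)
    } else if (line.startsWith('>')) {
      children.push({type: 'quote', value: line.slice(1).trim()})
    } else if (line.startsWith('* ')) {
      if (!list) {
        list = {type: 'list', children: []}
        children.push(list)
      }
      list.children.push({type: 'listItem', value: line.slice(2)})
    } else {
      children.push({type: 'text', value: line})
    }
  }

  return {type: 'root', children}
}

console.dir(toGast('# Hi\n\n* a\n* b\n'), {depth: null})
```

Fed the example.gmi contents with a blank line between each block, this sketch produces the same node types and values as the tree above, minus positions.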
toGemtext(tree)
Serialize gast to gemtext.
Say our script example.js looks as follows:
import {toGemtext} from 'dioscuri'
const tree = {
type: 'root',
children: [
{type: 'heading', rank: 1, value: 'Hello, world!'},
{type: 'break'},
{type: 'text', value: 'Some text'}
]
}
console.log(toGemtext(tree))
…then running node example.js yields:
# Hello, world!
Some text
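Going the other way, a minimal serializer for the gast node types could look like this. serialize and line are hypothetical functions, not dioscuri’s toGemtext; joining blocks with '\n' and adding no trailing line ending are assumptions of this sketch.

```javascript
// Hypothetical, simplified gast-to-gemtext serializer (not dioscuri's toGemtext).
const FENCE = '`'.repeat(3)

// Join a line marker and an optional value, without a trailing space.
function line(marker, value) {
  return value ? marker + ' ' + value : marker
}

function serialize(node) {
  switch (node.type) {
    case 'root':
    case 'list':
      return node.children.map(serialize).join('\n')
    case 'break':
      return '' // becomes a blank line once joined
    case 'heading':
      return line('#'.repeat(node.rank), node.value)
    case 'link':
      return '=> ' + node.url + (node.value ? ' ' + node.value : '')
    case 'listItem':
      return line('*', node.value)
    case 'quote':
      return line('>', node.value)
    case 'pre': {
      const lines = [FENCE + (node.alt || '')]
      if (node.value) lines.push(node.value)
      lines.push(FENCE)
      return lines.join('\n')
    }
    case 'text':
      return node.value
    default:
      throw new Error('Unknown gast node: ' + node.type)
  }
}

const tree = {
  type: 'root',
  children: [
    {type: 'heading', rank: 1, value: 'Hello, world!'},
    {type: 'break'},
    {type: 'text', value: 'Some text'}
  ]
}

console.log(serialize(tree))
// prints "# Hello, world!", a blank line, then "Some text"
```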
fromMdast(tree, options?)
Transform mdast to gast.
One option places links at the end of the document (boolean, default: false). The default is to place links before the next heading.
Another option removes the blank lines between blocks (boolean, default: false). The default is to place breaks between each block (paragraph, heading, etc.).
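The difference between the two link placements can be sketched like this. Everything here is an assumption for illustration: placeLinks, its linksAtEnd flag, and the links array attached to blocks are hypothetical (not dioscuri’s API), and break nodes are left out to keep it short.

```javascript
// Hypothetical sketch of the two link placements (not dioscuri's fromMdast).
// Each input block is a gast-like node that may carry an extra `links` array:
// the links found inside it, already turned into gast link nodes.
function placeLinks(blocks, linksAtEnd = false) {
  const children = []
  let pending = []

  // Emit the links collected so far as their own link lines.
  function flush() {
    children.push(...pending)
    pending = []
  }

  for (const block of blocks) {
    // Default: links collected so far go right before the next heading.
    if (block.type === 'heading' && !linksAtEnd) flush()
    const {links = [], ...node} = block
    children.push(node)
    pending.push(...links)
  }

  // Whatever is still pending ends up at the end of the document.
  flush()
  return children
}

const blocks = [
  {type: 'heading', rank: 1, value: 'One'},
  {
    type: 'text',
    value: 'A link[1].',
    links: [{type: 'link', url: 'https://example.com', value: '[1] An example'}]
  },
  {type: 'heading', rank: 1, value: 'Two'},
  {type: 'text', value: 'Done.'}
]

console.log(placeLinks(blocks).map((node) => node.type).join(', '))
// heading, text, link, heading, text
console.log(placeLinks(blocks, true).map((node) => node.type).join(', '))
// heading, text, heading, text, link
```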
Returns gast, probably. Some mdast nodes have no gast representation, so they are dropped. If you pass one of those in as tree, you’ll get undefined out.
Say we have a markdown document, example.md:
# Hello, world!
Some text, *emphasis*, **strong**\
`code()`, and ~~scratch that~~strikethrough.
Here’s a [link](https://example.com 'Just an example'), [link reference][*],
and images: ![image reference][*], ![](example.png 'Another example').
***
> Some
> quotes
* a list
* with another item
1. “Ordered”
2. List
```
A
Poem
```
```js
console.log(1)
```
| Name | Value |
| ---- | ----- |
| Beep | 1.2 |
| Boop | 3.14 |
* [x] Checked
* [ ] Unchecked
Footnotes[^†], ^[even inline].
[*]: https://example.org "URL definition"
[^†]: Footnote definition
…and our module example.js looks as follows:
import fs from 'node:fs/promises'
import {gfm} from 'micromark-extension-gfm'
import {footnote} from 'micromark-extension-footnote'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {gfmFromMarkdown} from 'mdast-util-gfm'
import {footnoteFromMarkdown} from 'mdast-util-footnote'
import {fromMdast, toGemtext} from 'dioscuri'
const mdast = fromMarkdown(await fs.readFile('example.md'), {
extensions: [gfm(), footnote({inlineNotes: true})],
mdastExtensions: [gfmFromMarkdown, footnoteFromMarkdown]
})
console.log(toGemtext(fromMdast(mdast)))
…now running node example.js yields:
# Hello, world!
Some text, emphasis, strong code(), and strikethrough.
Here’s a link[1], link reference[2], and images: image reference[2], [3].
> Some quotes
* a list
* with another item
* “Ordered”
* List
```
A
Poem
```
```js
console.log(1)
```
```csv
Name,Value
Beep,1.2
Boop,3.14
```
* ✓ Checked
* ✗ Unchecked
Footnotes[a], [b].
=> https://example.com [1] Just an example
=> https://example.org [2] URL definition
=> example.png [3] Another example
[a] Footnote definition
[b] even inline
toMdast(tree)
Transform gast to mdast.
Returns mdast, probably. Some gast nodes have no mdast representation, so they are dropped. If you pass one of those in as tree, you’ll get undefined out.
Say we have a gemtext document, example.gmi:
# Hello, world!
Some text
=> https://example.com An example
> A quote
* List
…and our module example.js looks as follows:
import fs from 'node:fs/promises'
import {fromGemtext, toMdast} from 'dioscuri'
const doc = await fs.readFile('example.gmi')
console.dir(toMdast(fromGemtext(doc)), {depth: null})
…now running node example.js yields (positional info removed for brevity):
{
type: 'root',
children: [
{
type: 'heading',
depth: 1,
children: [{type: 'text', value: 'Hello, world!'}]
},
{
type: 'paragraph',
children: [{type: 'text', value: 'Some text'}]
},
{
type: 'paragraph',
children: [
{
type: 'link',
url: 'https://example.com',
title: null,
children: [{type: 'text', value: 'An example'}]
}
]
},
{
type: 'blockquote',
children: [
{type: 'paragraph', children: [{type: 'text', value: 'A quote'}]}
]
},
{
type: 'list',
ordered: false,
spread: false,
children: [
{
type: 'listItem',
spread: false,
children: [
{type: 'paragraph', children: [{type: 'text', value: 'List'}]}
]
}
]
}
]
}
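A sketch of a gast-to-mdast mapping that produces the shapes above might look like this. gastToMdast and paragraph are hypothetical, not dioscuri’s toMdast; mapping pre to an mdast code node and falling back to the URL as link text are assumptions.

```javascript
// Hypothetical gast-to-mdast mapping (not dioscuri's toMdast). Nodes without an
// mdast equivalent (here: break) map to undefined and are dropped from parents.
function paragraph(value) {
  return {type: 'paragraph', children: [{type: 'text', value}]}
}

function gastToMdast(node) {
  switch (node.type) {
    case 'root':
      return {type: 'root', children: node.children.map(gastToMdast).filter(Boolean)}
    case 'heading':
      return {type: 'heading', depth: node.rank, children: [{type: 'text', value: node.value || ''}]}
    case 'text':
      return paragraph(node.value)
    case 'link':
      // mdast links are inline, so wrap them in a paragraph. Using the URL as
      // text when there is no value is an assumption of this sketch.
      return {
        type: 'paragraph',
        children: [
          {type: 'link', url: node.url, title: null, children: [{type: 'text', value: node.value || node.url}]}
        ]
      }
    case 'quote':
      return {type: 'blockquote', children: [paragraph(node.value || '')]}
    case 'list':
      return {
        type: 'list',
        ordered: false,
        spread: false,
        children: node.children.map((item) => ({
          type: 'listItem',
          spread: false,
          children: [paragraph(item.value || '')]
        }))
      }
    case 'pre':
      // Treating alt as the code language is an assumption.
      return {type: 'code', lang: node.alt || null, meta: null, value: node.value || ''}
    default:
      // For example 'break': blank lines are implicit between mdast blocks.
      return undefined
  }
}

console.dir(
  gastToMdast({type: 'root', children: [{type: 'heading', rank: 1, value: 'Hi'}, {type: 'break'}]}),
  {depth: null}
)
```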
gast
gast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.
interface Root <: Parent {
type: 'root'
children: [Break | Heading | Link | List | Pre | Quote | Text]
}
Root (Parent) represents a document.
interface Break <: Node {
type: 'break'
}
Break (Node) represents a hard break.
interface Heading <: Literal {
type: 'heading'
rank: 1 | 2 | 3
value: string?
}
Heading (Literal) represents a heading of a section.
interface Link <: Literal {
type: 'link'
url: string
value: string?
}
Link (Literal) represents a resource.
A url field must be present. It represents a URL to the resource.
interface List <: Parent {
type: 'list'
children: [ListItem]
}
List (Parent) represents an enumeration.
interface ListItem <: Literal {
type: 'listItem'
value: string?
}
ListItem (Literal) represents an item in a list.
interface Pre <: Literal {
type: 'pre'
alt: string?
value: string?
}
Pre (Literal) represents preformatted text.
An alt field may be present. When present, the node represents computer code, and the field gives the language of computer code being marked up.
interface Quote <: Literal {
type: 'quote'
value: string?
}
Quote (Literal) represents a quote.
interface Text <: Literal {
type: 'text'
value: string
}
Text (Literal) represents a paragraph.
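To check a tree against the shapes above at runtime, a small validator could look like this. isGast is hypothetical and not part of dioscuri; it treats a missing or null optional field as absent.

```javascript
// Hypothetical runtime check for the gast shapes above (not part of dioscuri).
// A missing or null optional field is treated as absent.
const ROOT_CHILD_TYPES = new Set(['break', 'heading', 'link', 'list', 'pre', 'quote', 'text'])

function isOptionalString(value) {
  return value === undefined || value === null || typeof value === 'string'
}

function isGast(node) {
  if (!node || typeof node !== 'object') return false
  switch (node.type) {
    case 'root':
      return (
        Array.isArray(node.children) &&
        node.children.every((child) => isGast(child) && ROOT_CHILD_TYPES.has(child.type))
      )
    case 'break':
      return true
    case 'heading':
      return [1, 2, 3].includes(node.rank) && isOptionalString(node.value)
    case 'link':
      // A url field must be present.
      return typeof node.url === 'string' && isOptionalString(node.value)
    case 'list':
      return (
        Array.isArray(node.children) &&
        node.children.every((child) => isGast(child) && child.type === 'listItem')
      )
    case 'listItem':
    case 'quote':
      return isOptionalString(node.value)
    case 'pre':
      return isOptionalString(node.alt) && isOptionalString(node.value)
    case 'text':
      // Unlike the other literals, text requires a value.
      return typeof node.value === 'string'
    default:
      return false
  }
}

console.log(isGast({type: 'root', children: [{type: 'heading', rank: 1, value: 'Hi'}]})) // true
console.log(isGast({type: 'heading', rank: 4, value: 'Too deep'})) // false
```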
Types
This package is fully typed with TypeScript. It exports the additional types Value (for the input, string or Buffer), BufferEncoding ('utf8' etc.), CompileOptions (options to turn things into a string), and FromMdastOptions (options to turn things into gast).
Compatibility
This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. It also works in Deno and modern browsers.
Related
@derhuerst/gemini – gemini protocol server and client
gemini-fetch – load gemini protocol data the way you would fetch from HTTP in JavaScript
Contribute
Yes please! See How to Contribute to Open Source.
Security
Gemtext is safe. As for the generated HTML: that’s safe by default. Pass allowDangerousProtocol: true if you want to live dangerously.