This document describes the JSON schema used for entries from the nb+nn dictionary. The format is inspired by TEI Dictionary XML.
The dictionary is a collection of entries that have the following structure:
{
  "id": 99999,
  "lang": "nn",
  "lemmas": ["foo"],
  "pos": "v",
  "pos2": "v1",
  "forms": [["fooe", "fooer", "fooet", "fooet"]],
  "etym": {
    "langs": [
      {
        "lang": "latin",
        "intro": "...",
        "text": "..."
      }
    ],
    "cits": [
      {
        "quote": "..."
      }
    ]
  },
  "senses": [
    {
      "n": "1",
      "defs": [
        {
          "def": "..."
        }
      ],
      "cits": [
        {
          "quote": "...",
          "usg": "..."
        }
      ],
      "sub-defs": [
        {
          "def": "..."
        }
      ],
      "sub-cits": [
        {
          "quote": "...",
          "usg": "..."
        }
      ]
    }
  ],
  "links": [
    {
      "rel": "compare",
      "target": 99901,
      "text": "..."
    },
    {
      "rel": "related",
      "target": 99904,
      "text": "...",
      "intro": "..."
    }
  ]
}
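As a rough illustration of consuming this structure, the sketch below parses a trimmed-down entry with Python's standard json module and collects the definition texts of each sense. The function name sense_definitions and the sample values are made up for this example, and treating missing arrays as empty is an assumption, since the schema does not say which fields are optional.

```python
import json

# Trimmed-down sample entry; the values are placeholders, not real dictionary data.
SAMPLE = """
{
  "id": 99999,
  "lang": "nn",
  "lemmas": ["foo"],
  "senses": [
    {
      "n": "1",
      "defs": [{"def": "first definition"}],
      "sub-defs": [{"def": "subordinate definition"}]
    }
  ]
}
"""


def sense_definitions(entry):
    """Return a list of (label, definitions) pairs, one per sense.

    Subordinate definitions are appended after the main ones.
    Missing arrays are treated as empty (an assumption, see above).
    """
    result = []
    for sense in entry.get("senses", []):
        texts = [d["def"] for d in sense.get("defs", [])]
        texts += [d["def"] for d in sense.get("sub-defs", [])]
        result.append((sense.get("n"), texts))
    return result


entry = json.loads(SAMPLE)
print(entry["lemmas"], sense_definitions(entry))
```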
Description of the elements of the entry object:
- id is the numeric identifier of this entry.
- lang is the ISO 639-1 code for the language the described word belongs to. The descriptions are written in the same language. The code is "nn" for Nynorsk and "nb" for Bokmål.
- lemmas lists the base forms of the word. Note that the same lemma can be used for different entries, and that an entry can be indexed by multiple lemmas (when there are multiple ways to write the same word). Ref. Wikipedia. The field is an array of at least one element. The common case is one lemma per entry.
- pos is the word class the word belongs to. It is a short code such as "v" (verb), "n" (noun), "a" (adjective), "av" (adverb), ...
- pos2 is the expanded word class code. Given this code, the forms can be derived algorithmically from the lemma. It is a string like "n1", "n2", "v1", "v2", "m1".
- forms lists how this word is written in its different forms. The interpretation of these lists depends on pos. For instance, for nouns each inner array consists of 4 inflections (singular/plural × indefinite/definite form).
- etym describes where the word came from. Ref. Wikipedia.
- etym.langs describes the origins of the word in other languages.
- etym.langs[].lang is the language name (not an ISO 639-1 code this time).
- etym.cits describes the origins of the word in literature.
- senses describes the meanings of the word and gives examples of use. Definitions and examples that belong together are grouped into one sense.
- senses[].n is the label for this group of sense statements. Display can be suppressed when there is only one sense.
- senses[].defs is an array of definitions. Each definition is an object with the attribute "def" containing the text of the definition.
- senses[].cits is an array of examples of use. Each example is an object with the attributes "quote" and "usg".
- senses[].sub-defs is an array of subordinate definitions in the same form as for "defs".
- senses[].sub-cits is an array of subordinate examples of use in the same form as for "cits".
- links is a list of references to other relevant entries. The "rel" attribute encodes how this entry relates to the target entry.
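The constraints stated in the list above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name check_entry is hypothetical, "numeric" is read as a JSON integer, and the noun check assumes that the pos code for nouns is "n".

```python
def check_entry(entry):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []

    # "id" is described as numeric. bool is a subclass of int in Python,
    # so it is excluded explicitly.
    ident = entry.get("id")
    if not isinstance(ident, int) or isinstance(ident, bool):
        problems.append("id must be a number")

    # "lang" is either "nn" (Nynorsk) or "nb" (Bokmål).
    if entry.get("lang") not in ("nn", "nb"):
        problems.append("lang must be 'nn' or 'nb'")

    # "lemmas" is an array of at least one element.
    lemmas = entry.get("lemmas")
    if not isinstance(lemmas, list) or len(lemmas) < 1:
        problems.append("lemmas must be an array with at least one element")

    # For nouns, each inner array of "forms" holds 4 inflections.
    # Assumption: nouns carry the pos code "n".
    if entry.get("pos") == "n":
        for i, forms in enumerate(entry.get("forms", [])):
            if len(forms) != 4:
                problems.append(f"forms[{i}] should hold 4 inflections for a noun")

    return problems


print(check_entry({"id": 99999, "lang": "nn", "lemmas": ["foo"], "pos": "v"}))
print(check_entry({"id": "x", "lang": "sv", "lemmas": []}))
```

Returning a list of problems instead of raising on the first one makes it easier to report everything that is wrong with an entry in a single pass.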
The following attributes repeat at various locations in the structure:
- intro is a short unescaped text string that makes sense to inline in front of a link or the main text.
- text is a short HTML fragment (escaped). It is text with <span>...</span> elements. The span elements can have the class attribute set to one of the following values: "wordform", ...
- def is a text string (unescaped).
- quote is a text string (unescaped).
- usg is a text string (unescaped).
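The split between escaped and unescaped attributes matters when an entry is rendered as HTML. The sketch below shows one possible reading: "text" is taken to be ready for direct insertion into a page, while "intro", "def", "quote" and "usg" are escaped first with Python's html.escape. The helper names render_plain and render_link are hypothetical, and joining intro and text with a single space is only a presentation choice.

```python
import html


def render_plain(value):
    """Escape an unescaped string (intro, def, quote, usg) for use in HTML."""
    return html.escape(value)


def render_link(link):
    """Render one element of "links" as an HTML fragment.

    The optional "intro" is escaped and placed in front of "text",
    which is assumed to be an already escaped HTML fragment.
    """
    parts = []
    if "intro" in link:
        parts.append(render_plain(link["intro"]))
    parts.append(link["text"])
    return " ".join(parts)


link = {
    "rel": "related",
    "target": 99904,
    "intro": "see & compare",
    "text": '<span class="wordform">foo</span>',
}
print(render_link(link))
```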