How to store the data #1

goodmami · 2019-02-22T03:26:45Z

We need to decide on a good way to store the gwadoc data, but it's not yet clear what the intended uses are, or who the intended users are, beyond generating the HTML documentation.
The current (not checked-in) data is a Python file that fills dictionaries with data. If generating documentation is the only use, we may as well put it directly into reStructuredText. If we want a Python API, e.g., to request the localized name, definition, reverse, etc. from OMW, then it might make sense to make Python classes (Sphinx's autodoc could then possibly be used to generate the docs).

In either case we could store the data in a data file and transform it (perhaps with validation) into the target representation. I propose using TOML. Even though it is relatively new and not in the standard library, it was chosen for Rust's package manager (Cargo) and for the future of Python packaging (see PEP 518), so it is supported by major projects.

Here's what (part of) hypernym would look like:

[hypernym]

  [hypernym.name]
    en = "Hypernym"
    symbol = "⊃"
    ja = "上位語"

  [hypernym.def]
    en = "a word that is more general than a given word"
    pl = "Relacja łącząca znaczenie z drugim, ogólniejszym, niż to pierwsze, ale należącym do tej samej części mowy, co ono"
    ja = "当該synsetが相手synsetに包含される"

TOML has some flexibility (though less than YAML, which is a good thing). Something like this would be equivalent, e.g., if you want to keep all of a relation's attributes together under one table:

[hypernym]
name.en = "Hypernym"
def.en = "a word that is more general than a given word"
# etc...

And while I would like to place this file (gwadoc.toml or whatever) at the top level so it's more prominent for non-Python users/contributors, that would make it much more difficult to distribute with the project and for the Python code to find at runtime. So it might go under gwadoc/gwadoc.toml instead.

As an alternative, if we don't care much about non-Python users, we could make a Python class like Relation and do things like this:

rels['hypernym'] = Relation(
    name={
        "en": "Hypernym",
        "ja": "上位語",
    },
    # "def" is a reserved word in Python, so the keyword is spelled out
    definition={
        "en": "a word that is more general than a given word",
    },
)

Then query it like this:

>>> hypernym = rels['hypernym']
>>> hypernym.name['en']
'Hypernym'
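
For concreteness, a class along these lines could be as small as a dataclass. This is a sketch rather than a settled design: the field is called `definition` here because `def` is a reserved word in Python and cannot be used as a keyword argument, and both field names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    # Maps a language code to the localized name.
    name: dict[str, str] = field(default_factory=dict)
    # Maps a language code to the localized definition.
    definition: dict[str, str] = field(default_factory=dict)


rels: dict[str, Relation] = {}
rels["hypernym"] = Relation(
    name={"en": "Hypernym", "ja": "上位語"},
    definition={"en": "a word that is more general than a given word"},
)

hypernym = rels["hypernym"]
print(hypernym.name["en"])  # prints: Hypernym
```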

fcbond · 2019-02-22T07:13:16Z

I think we might leave it as a Python dictionary for the moment, and concentrate on using and extending it.

Converting to TOML looks like it may make it easier to edit down the road.

goodmami · 2019-02-27T07:20:55Z

For now I've settled on having data structures that behave like dictionaries or classes in that they allow for both key-lookup (e.g. rels['hypernym']['name']['en']) and dot-access (rels.hypernym.name.en). The former is useful when you have the relation or property name in a variable and prefer rels[relation] over getattr(rels, relation) while the latter is much simpler and makes editing the file easier. I also made the data structures raise errors on invalid keys/attributes and defined inventories of valid relations, forms, projects, languages, etc., in order to reduce errors caused by simple typos.
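
As a rough illustration of that idea (not the actual gwadoc code; the `Record` class and the inventories below are hypothetical), a structure supporting both access styles with key validation might look like this:

```python
class Record:
    """Mapping-like object that only accepts keys from a fixed inventory."""

    def __init__(self, valid_keys, **values):
        # Use object.__setattr__ so internal state bypasses our validation.
        object.__setattr__(self, "_valid", frozenset(valid_keys))
        object.__setattr__(self, "_data", {})
        for key, value in values.items():
            self[key] = value

    def _check(self, key):
        if key not in self._valid:
            raise KeyError(f"invalid key: {key!r}")

    def __getitem__(self, key):
        self._check(key)
        return self._data[key]

    def __setitem__(self, key, value):
        self._check(key)
        self._data[key] = value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc

    def __setattr__(self, name, value):
        try:
            self[name] = value
        except KeyError as exc:
            raise AttributeError(name) from exc


# Hypothetical inventories of valid keys.
RELATIONS = {"hypernym", "hyponym"}
FIELDS = {"name", "definition"}
LANGUAGES = {"en", "ja", "pl"}

rels = Record(RELATIONS)
rels.hypernym = Record(FIELDS, name=Record(LANGUAGES), definition=Record(LANGUAGES))
rels.hypernym.name.en = "Hypernym"          # dot-access
rels["hypernym"]["name"]["ja"] = "上位語"    # key-lookup
```

With this, a typo such as `rels.hypernim` raises `AttributeError` and `rels["hypernym"]["name"]["xx"] = "..."` raises `KeyError`, instead of silently creating a new entry.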

I'll leave this issue open as a feature request for future versions.

fcbond added the enhancement New feature or request label Feb 22, 2019
