No part of speech tags for German non-lemma entries #205
Comments
If I'm understanding this right, it's similar to yomidevs/yomitan#1509 and yomidevs/yomitan#1507, i.e. it's about making the dictionary deinflection format more precise. I think this is a worthwhile goal, but it would first require changing this part of the schema:
    {
        "type": "array",
        "description": "Deinflection of the term to an uninflected term.",
        "minItems": 2,
        "maxItems": 2,
        "items": [
            {
                "type": "string",
                "description": "The uninflected term."
            },
            {
                "type": "array",
                "description": "A chain of inflection rules that produced the inflected term",
                "items": {
                    "type": "string",
                    "description": "A single inflection rule."
                }
            }
        ]
    }

Maybe the way to do it would be by adding more items to this array?
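To make the "more items" idea concrete, here is a minimal sketch of a check that mirrors the two-item shape quoted above, plus a hypothetical three-item variant carrying a part of speech. The function name, the rule strings, and the third item are assumptions for illustration only; nothing here is part of the actual schema or of yomitan's code.

```python
from typing import Any


def is_deinflection(value: Any) -> bool:
    """Return True if value matches the two-item shape in the quoted schema:
    [uninflected term, [inflection rule, ...]]."""
    if not isinstance(value, list) or len(value) != 2:
        return False
    term, rules = value
    return (
        isinstance(term, str)
        and isinstance(rules, list)
        and all(isinstance(rule, str) for rule in rules)
    )


# Shape accepted today (rule names are made up for the example).
current = ["wollen", ["second-person", "singular", "present"]]

# Hypothetical extension: a third item naming the part of speech.
# With minItems/maxItems fixed at 2, this is rejected until the schema changes.
extended = ["wollen", ["second-person", "singular", "present"], "verb"]

print(is_deinflection(current))   # accepted
print(is_deinflection(extended))  # rejected by the current shape
```

The point of the sketch is only that a third item cannot be added on the dictionary side alone: the length constraint in the schema has to be relaxed first.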
Thanks for the links + reply. I suppose changing the schema would indeed be necessary if the idea is to allow one term entry to have definitions corresponding to terms of varying parts of speech. From my uninformed perspective, though, it would seem more sensible to treat homonyms/homographs with different parts of speech as different lemmas altogether, and to list their definitions in separate term entries. If that is indeed how it works with yomitan dictionaries, I don't see why we can't use the "rules" field, as described in the term banks schema, to mark the part of speech of these inflected term entries.

In any event, yes, the lookup code would still need some changes. Depending on how these changes are implemented, putting the part of speech data in the rules field could be a good idea because, as a simple string field, it could work well as an indexed field. Assuming again that 1 entry = 1 part of speech, search results could be narrowed down by checking whether the rules field matches a string query, rather than requiring that every result have its entire definitions field loaded and analyzed.
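As a rough illustration of the indexing argument, the sketch below builds an in-memory index keyed on (term, rule) so that a lookup can be narrowed by part of speech without touching the definitions. It assumes that one entry carries one part of speech and that the rules field holds space-separated identifiers such as "v" or "adj"; the entry layout is simplified and the helper names are hypothetical, not yomitan's actual lookup code.

```python
from collections import defaultdict

# Simplified term entries: (term, rules, glossary).
# Real term bank entries have more fields; these three are enough here.
entries = [
    ("wollen", "v", ["to want"]),
    ("wollen", "adj", ["woolen"]),
]

# Index every entry under each of its rule identifiers.
index = defaultdict(list)
for term, rules, glossary in entries:
    for rule in rules.split():
        index[(term, rule)].append(glossary)


def lookup(term: str, rule: str) -> list:
    """Return the glossaries of entries matching both the term and the rule."""
    return index.get((term, rule), [])


print(lookup("wollen", "v"))
print(lookup("wollen", "adj"))
```

Because the filter is a plain key match, the glossaries of non-matching entries never need to be inspected.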
An example: The entry for 2nd-person singular "willst" points to the lemma (infinitive) "wollen" in Wiktionary: https://en.wiktionary.org/wiki/willst#German
However, "wollen" has a homonym "wollen", an adjective meaning "woolen".
In Wiktionary it is clear that this "wollen" is unrelated, as "willst" is marked as a verb there. But in the kty-de-en dictionary, the part of speech of "willst" is only marked implicitly in the "deinflection" rules. So when performing a lookup on the word "wollen" as found in the "willst" entry, there is no easy way to narrow down the search to include only verbs.
If explicit part of speech markers were present in these non-lemma entries (like the definition tags in lemma entries), it would be much easier to recover their parts of speech.
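The ambiguity can be sketched as follows. The data layout is a deliberately simplified stand-in for the dictionary, and the optional part-of-speech field on the non-lemma entry is the hypothetical marker being requested, not something kty-de-en currently emits.

```python
# Simplified lemma entries: term -> list of (part of speech, gloss).
LEMMAS = {"wollen": [("verb", "to want"), ("adjective", "woolen")]}


def resolve(form: str, non_lemmas: dict) -> list:
    """Follow a non-lemma entry to its lemma and collect the matching glosses.

    non_lemmas maps an inflected form to (lemma, part of speech or None).
    Without a part of speech, every homonym of the lemma is returned.
    """
    lemma, pos = non_lemmas[form]
    return [gloss for p, gloss in LEMMAS.get(lemma, []) if pos is None or p == pos]


# Non-lemma entry without an explicit marker: both homonyms come back.
unmarked = {"willst": ("wollen", None)}

# Same entry with the hypothetical explicit marker: only the verb remains.
marked = {"willst": ("wollen", "verb")}

print(resolve("willst", unmarked))
print(resolve("willst", marked))
```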