Add support for converting a part of speech into another #5

Open
LifeIsStrange opened this issue Feb 4, 2022 · 5 comments

Comments

@LifeIsStrange

LifeIsStrange commented Feb 4, 2022

@tomaarsen Friendly ping!

For example:
Verb to noun:
die -> death

Adjective to adverb:
beautiful -> beautifully

Make it happen, please!!
My project would benefit a lot from those abilities :)

Your library has the potential to become a cornerstone of NLU and NLG projects, pushing back the frontier of what can be achieved.

Side note: Did it ever occur to you that your project is the inverse function of a lemmatizer? Kind of obvious in retrospect, but maybe there are insights to derive from this and from the methods of state-of-the-art lemmatizers.
In other words, I expect a SOTA lemmatizer to have some "understanding" of inflections in order to reverse them accurately (unlike a word stemmer), and this might slightly overlap with the algorithms relevant to your Inflector.

@tomaarsen
Owner

Hello! The conversion between POS categories is a feature I have seen requested elsewhere, but it seems like a tricky one to tackle. I don't have the kind of time needed to have a look at it, sadly.
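
One way to see the difficulty, as a rough sketch rather than anything Inflex provides: regular pairs such as beautiful -> beautifully follow a handful of spelling rules, while pairs such as die -> death share no rule and need a lookup table. All names below (`adjective_to_adverb`, `verb_to_noun`, and the two dictionaries) are hypothetical, and the rules have known exceptions (e.g. "public" -> "publicly").

```python
# Hypothetical helpers for illustration only; none of these names exist in Inflex.
VOWELS = "aeiou"

# Pairs with no shared spelling rule have to be listed explicitly.
IRREGULAR_VERB_TO_NOUN = {"die": "death"}
IRREGULAR_ADJ_TO_ADV = {"good": "well"}


def adjective_to_adverb(adjective):
    """Apply a few regular English spelling rules; exceptions are not handled."""
    word = adjective.lower()
    if word in IRREGULAR_ADJ_TO_ADV:
        return IRREGULAR_ADJ_TO_ADV[word]
    if word.endswith("ic"):
        return word + "ally"  # basic -> basically
    if len(word) > 2 and word.endswith("le") and word[-3] not in VOWELS:
        return word[:-1] + "y"  # gentle -> gently
    if len(word) > 1 and word.endswith("y") and word[-2] not in VOWELS:
        return word[:-1] + "ily"  # happy -> happily
    return word + "ly"  # beautiful -> beautifully


def verb_to_noun(verb):
    """Lookup only: returns None for any verb missing from the table."""
    return IRREGULAR_VERB_TO_NOUN.get(verb.lower())


print(adjective_to_adverb("Beautiful"))  # beautifully
print(verb_to_noun("die"))               # death
print(verb_to_noun("walk"))              # None
```

The adverb direction is mostly rule-driven, but the verb-to-noun direction falls back to a dictionary almost immediately, which is where coverage becomes the hard part.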

As for your side note: I consider this project to be a combination of a morphological analyzer + morphological generator. The former concerns itself with splitting a word into a root and "parts of words" (e.g. "huggable" -> "hug" + "-able"), while the latter does the exact opposite (e.g. "hug" + "-able" -> "huggable"). See Chapter 2: Preliminaries from my thesis for more information. (Note: In my thesis I refer to "Inflexion", but this was later renamed to "Inflex", i.e. this work.)

Inflex, in theory, performs morphological analysis, extracts the root, replaces the additional "parts of words" with something else, and then applies the morphological generation step to reconstruct a word.

A lemmatizer is essentially the first step, where we don't care about saving the additional "parts of words". So, I would say that Inflex is kind of like a lemmatizer followed by a morphological generator, but the implementation kind of merges the two steps usually.
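
The pipeline described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Inflex is implemented: `analyze`, `generate`, `lemmatize`, `reinflect`, the three-word lexicon `KNOWN_ROOTS`, and the three suffixes are all made up for the example, and suffixes are written without the leading hyphen.

```python
# Toy sketch; every name here is hypothetical and unrelated to Inflex's real code.
VOWELS = set("aeiou")
KNOWN_ROOTS = {"hug", "use", "walk"}  # assumed tiny lexicon
SUFFIXES = ("able", "ing", "ed")


def generate(root, suffix):
    """Morphological generation: root + suffix -> surface word."""
    if suffix[0] in VOWELS:
        if root.endswith("e"):
            return root[:-1] + suffix  # use + ing -> using
        ends_in_cvc = (
            len(root) >= 3
            and root[-1] not in VOWELS and root[-1] not in "wxy"
            and root[-2] in VOWELS
            and root[-3] not in VOWELS
        )
        if ends_in_cvc:
            return root + root[-1] + suffix  # hug + able -> huggable
    return root + suffix


def analyze(word):
    """Morphological analysis: surface word -> (root, suffix)."""
    for suffix in SUFFIXES:
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        # Candidate roots: as-is, undo consonant doubling, restore a dropped "e".
        for root in (stem, stem[:-1], stem + "e"):
            if root in KNOWN_ROOTS and generate(root, suffix) == word:
                return root, suffix
    return word, ""  # no known suffix: treat the word as its own root


def lemmatize(word):
    """A lemmatizer keeps the root and throws the suffix away."""
    return analyze(word)[0]


def reinflect(word, new_suffix):
    """Analysis, then suffix replacement, then generation."""
    root, _ = analyze(word)
    return generate(root, new_suffix)


print(analyze("huggable"))         # ('hug', 'able')
print(lemmatize("using"))          # use
print(reinflect("huggable", "ing"))  # hugging
```

The lexicon check in `analyze` is what resolves ambiguity here ("using" could otherwise be read as "us" + "ing"), which also shows why out-of-dictionary words are hard for this kind of approach.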

@LifeIsStrange
Author

LifeIsStrange commented Feb 7, 2022

Thanks for the answer, I should look at your thesis when I get the time!
That's unfortunate to hear (that you don't currently have the time/energy/will) :/
Unfortunately, my time is already taken up by a semantic parser :m
The only tool I was able to find is a word-to-noun converter ->
https://github.com/pranav-ust/nounification
But 1) the project has not received commits since 2019
2) no idea about its accuracy, it doesn't seem to have been benchmarked publicly
3) it uses WordNet, so I guess it fails for out-of-dictionary words/inflections (does your project suffer from this limitation as well? WordNet should be fairly complete, I guess, and btw there is a much more actively maintained fork called English WordNet)
4) it only supports nounification, not verbification, adjectivisation, or adverbisation :/ so it is incomplete.

I wonder if there are other tools in the wild for POS conversions; they are hard to search for despite this being a foundational task. I can't believe nobody attempted to solve these problems in the 90s; that code is probably long forgotten by now.

@tomaarsen
Owner

I don't know of any tool that does this, sadly. I'm well aware of WordNet and NLTK, but that corpus does not easily allow these POS changes. I agree that it's a shame, but it's a difficult problem.

@LifeIsStrange
Author

I partially read your thesis. It is very well written and the amount of detail is excellent! Quite impressive that you cited a paper from 1943!
If you get the time, I'm curious to know how the situation has evolved regarding:

This section enumerates possible improvements to the Inflexion Python
module to improve its effectiveness and further broaden its applicability.

  1. Improve morphological analysis of past participle, present participle
    and past tense verbs. Currently these forms are not correctly
    identified, causing the morphological generation to be applied on
    non-root words, e.g.:
    • ‘using’ converted to past tense becomes ‘usinged’.
    Instead, the morphological analysis section should recognise that
    the input ‘using’ is of the present participle form, and let the
    morphological generation apply on the root ‘use’ instead.
  2. Improve conversions surrounding comparative and superlative
    adjectives.
    • Add is comparative and is superlative functionality to
    detect whether a given adjective is of these forms.
    • Conversions from singular to comparative or superlative work
    well, but the conversion the other way around is not well
    supported yet.
  3. Implement support for grammatical gender, or otherwise allow for
    non-neuter pronouns. For example, currently the singular of the
    adjective ‘our’ is ‘its’. However, users of Inflexion may desire to
    specify the desired grammatical gender (masculine, feminine or neuter)
    so that ‘her’ or ‘his’ is returned instead.
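
For item 2 in the list above, a minimal sketch of what comparative/superlative detection and the reverse conversion could look like. This is an assumption-laden illustration, not the thesis's or Inflex's approach: `analyze_degree`, `is_comparative`, `is_superlative`, the irregular table, and the small `KNOWN_ADJECTIVES` lexicon are all hypothetical.

```python
# Hypothetical sketch; these names and word lists are not taken from Inflex.
IRREGULAR = {
    "better": ("good", "comparative"),
    "best": ("good", "superlative"),
    "worse": ("bad", "comparative"),
    "worst": ("bad", "superlative"),
}
KNOWN_ADJECTIVES = {"big", "happy", "large", "small"}  # assumed tiny lexicon


def _base_candidates(stem):
    """Yield possible base forms once "-er" / "-est" has been removed."""
    yield stem                       # smaller -> small
    yield stem + "e"                 # larger -> large
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        yield stem[:-1]              # bigger -> big
    if stem.endswith("i"):
        yield stem[:-1] + "y"        # happier -> happy


def analyze_degree(word):
    """Return (base form, degree); unknown words are reported as 'positive'."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, degree in (("est", "superlative"), ("er", "comparative")):
        if word.endswith(suffix):
            for base in _base_candidates(word[: -len(suffix)]):
                if base in KNOWN_ADJECTIVES:
                    return base, degree
    return word, "positive"


def is_comparative(word):
    return analyze_degree(word)[1] == "comparative"


def is_superlative(word):
    return analyze_degree(word)[1] == "superlative"


print(analyze_degree("bigger"))    # ('big', 'comparative')
print(analyze_degree("happiest"))  # ('happy', 'superlative')
print(is_comparative("clever"))    # False
```

Checking candidates against a lexicon keeps words like "clever" from being misread as comparatives, at the cost of failing on adjectives the lexicon does not contain.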

@tomaarsen
Owner

I'm afraid that no improvements on those sections have been made.
