Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Japanese ます, んかった form #1385

Merged
merged 7 commits into from
Sep 2, 2024

Conversation

khaitruong922
Copy link

@khaitruong922 khaitruong922 commented Sep 1, 2024

1. Refactor masu form

Problems

  • Currently there are many wrong conjugations with masu form, for example:

    • 行きまさない (masu -> negative)
    • 行きますな (masu -> imperative negative)
    • 行きましたい (masu -> tai)
  • Verbs end with ます also is also treated as polite ます form

    • 済ません returns (negative) and (potential -> n). (negative) is incorrect, it should be 済まさない.
    • 済ましょう returns (volitional), which is incorrect, should be 済まそう.

Some may not appear in real text, but it can create wrong impressions on Japanese grammar for learners, especially when they start to output on their own.

Solution

  • Remove v5m, add masu and masen conditions.
    masu form is a special form in Japanese with different rules of conjugation and we should not treat it as a godan verb.

  • Follow the conjugation rules in https://en.wiktionary.org/wiki/Appendix:Japanese_verbs
    Only add universal rules: ません、ませんでした、ました、ましたら、まして、ます、ますれば、ませば、ましょう

Screenshot 2024-09-01 124617

  1. Make んかった return past as out condition (make sure it's consistent with other かった ending), update name to '-n past'.

Will update some tests to make sure there are no false positives.
Maybe we should have a refactor PR to update the language transform test logic too. Currently it only checks if a transformation is correct, it does not check if a term has any extra wrong transformation. For testing false positive, we have to know it in advance and explicitly add it to the test. This makes it hard to track the effect when we're adding new forms to the language.

@khaitruong922 khaitruong922 marked this pull request as ready for review September 1, 2024 07:04
@khaitruong922 khaitruong922 requested a review from a team as a code owner September 1, 2024 07:04
Copy link

codspeed-hq bot commented Sep 1, 2024

CodSpeed Performance Report

Merging #1385 will not alter performance

Comparing khaitruong922:refactor-masu (c811ea8) with master (cab7793)

Summary

✅ 3 untouched benchmarks

🆕 2 new benchmarks
⁉️ 2 (👁 2) dropped benchmarks

Benchmarks breakdown

Benchmark master khaitruong922:refactor-masu Change
👁 japanese transformations (n=66) 18.5 ms N/A N/A
🆕 japanese transformations (n=70) N/A 20.8 ms N/A
👁 japanese transformations-full (n=181) 40.5 ms N/A N/A
🆕 japanese transformations-full (n=193) N/A 44.3 ms N/A

@jamesmaa
Copy link
Collaborator

jamesmaa commented Sep 1, 2024

cc @Lyroxide for review. If no review in 3 days I'll glance over it

@Kuuuube Kuuuube added kind/bug The issue or PR is regarding a bug area/linguistics The issue or PR is related to linguistics labels Sep 1, 2024
@Lyroxide
Copy link

Lyroxide commented Sep 2, 2024

Any reason for including ませば? It's only used in kobun, where ませば is not "-ba of ます", but the "-ba of 未然形 of まし". I'd wager that ませば doesn't even exist in modern texts. ますれば seems fine though.

ませんかった threw me off at first but it seems legitimate according to 大辞林・日国

Is it possible to remove "-n past" but chain "-ta" with "-n"? Didn't had a chance to review when this was added initially.

@khaitruong922
Copy link
Author

Thanks for review. Let's remove ませば for consistency since we are not going to include the まする form.

@khaitruong922
Copy link
Author

khaitruong922 commented Sep 2, 2024

I also agree with chaining -n -> -ta for `n-past.

ませんかった is the only exception -n past. It's not actually the -n form of verb, but just appending かった after ません, similar to ませんでした.
I think the correct transformation would be masu -> negative -> -ta, instead of -masu -> negative -> -n past (having double negative transformation)

@khaitruong922
Copy link
Author

Remove n past, added -n condition to chain with -ta.

@jamesmaa jamesmaa added this pull request to the merge queue Sep 2, 2024
Merged via the queue into themoeway:master with commit 31e81fe Sep 2, 2024
9 checks passed
@khaitruong922 khaitruong922 deleted the refactor-masu branch September 2, 2024 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/linguistics The issue or PR is related to linguistics kind/bug The issue or PR is regarding a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants