stem_model m_a #4

Open
funderburkjim opened this issue Oct 18, 2018 · 12 comments

@funderburkjim
Contributor

masculine nouns ending in 'a'

We derive this list from the lexnorm-all2 list by the simple filter
a) key1 ends in short vowel 'a'
b) lexnorm is precisely 'm'.

This excludes many adjectives and other nominals ending in 'a', since these have more complex normalized lexnorm values, such as 'm:f:n' or 'm:f#ikA:n'.

There are 49344 of these simple masculine nouns in 'a'. Their information is written to the file
inputs/nominals/m_a.txt. Records for the same stem are merged into one line. For example, these
two records from lexnorm-all2 become a single record in m_a.txt:

579	akzara	a-kzara	m
592.1	akzara	a-kzara	m

becomes:
m_a	a-kzara	579,akzara:592.1,akzara
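
The filter and merge described above could be sketched as follows. This is only an illustration, not the code used in the repository: the function name build_m_a, the field order (L-number, key1, key2, lexnorm, tab-separated), and grouping by key2 are assumptions read off the example records, and the two rejected records are made up.

```python
def build_m_a(lexnorm_lines):
    """Keep simple masculine a-stems and merge records that share a key2."""
    merged = {}  # key2 -> list of "L,key1" references, in input order
    for line in lexnorm_lines:
        # Assumed field order: L-number, key1, key2, lexnorm (tab-separated).
        lnum, key1, key2, lexnorm = line.rstrip("\n").split("\t")
        # a) key1 ends in short 'a' (SLP1 'a', not long 'A'); b) lexnorm is exactly 'm'.
        if key1.endswith("a") and lexnorm == "m":
            merged.setdefault(key2, []).append(f"{lnum},{key1}")
    return ["\t".join(["m_a", key2, ":".join(refs)])
            for key2, refs in merged.items()]


sample = [
    "579\takzara\ta-kzara\tm",
    "592.1\takzara\ta-kzara\tm",
    "0\txa\txa\tm:f:n",  # hypothetical record: rejected, lexnorm is not exactly 'm'
    "0\txi\txi\tm",      # hypothetical record: rejected, key1 does not end in 'a'
]
for record in build_m_a(sample):
    print(record)
```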

@funderburkjim
Contributor Author

decline_file program

The decline_file program generates declensions based on the model and stem (the first two fields) of
the records in m_a.txt (or in one of the other files in the inputs/nominals/ directory).
The output is written to a file of the same name in the outputs/nominals/ directory; in this
case, to outputs/nominals/m_a.txt.

The format of the output files generated by decline_file is a sequence of lines, each with 3 tab-delimited
fields:

  • model (copied from input file)
  • stem (copied from input file - same as key2 in mw.txt)
  • inflection: the declension table for this model and stem

format of declension table

The declension table is represented as a string with 24
parts (separated by colon), representing the singular, dual, plural of 8 cases. Symbolically
1s:1d:1p:2s:2d:2p:3s:3d:3p:4s:4d:4p:5s:5d:5p:6s:6d:6p:7s:7d:7p:8s:8d:8p. The common English
names for the 8 cases are 1 = Nominative, 2 = Accusative, 3 = Instrumental, 4 = Dative,
5 = Ablative, 6 = Genitive, 7 = Locative, 8 = Vocative.

  • missing values are represented by empty strings (such as vocative for personal pronouns)
  • Sometimes one or more of the 24 declension cells have alternate values; these are
    listed with a forward slash ('/') as the separator.
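
A reader of the output files might unpack a declension table along these lines. This is a minimal sketch of the format just described; the name parse_declension and the choice to return an empty list for a missing cell are assumptions, and the demo table is artificial.

```python
CASES = ["Nominative", "Accusative", "Instrumental", "Dative",
         "Ablative", "Genitive", "Locative", "Vocative"]


def parse_declension(table):
    """Split a 24-cell declension string into {case: [singular, dual, plural]}.

    Each of the three entries is a list of alternates; a missing cell
    gives an empty list.
    """
    cells = table.split(":")
    if len(cells) != 24:
        raise ValueError(f"expected 24 cells, found {len(cells)}")
    parsed = {}
    for i, case in enumerate(CASES):
        row = cells[3 * i:3 * i + 3]  # the s, d, p cells for this case
        parsed[case] = [cell.split("/") if cell else [] for cell in row]
    return parsed


# Artificial table: one cell with two alternates, all other cells missing.
demo = ":".join(["x/y"] + [""] * 23)
print(parse_declension(demo)["Nominative"])
```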

@funderburkjim
Contributor Author

example of declension table

For the line m_a kUpa 53937,kUpa, the output line is
m_a kUpa kUpaH:kUpO:kUpAH:kUpam:kUpO:kUpAn:kUpena:kUpAByAm:kUpEH:kUpAya:kUpAByAm:kUpeByaH:kUpAt:kUpAByAm:kUpeByaH:kUpasya:kUpayoH:kUpAnAm:kUpe:kUpayoH:kUpezu:kUpa:kUpO:kUpAH

It is easier to compare the declension table when it is formatted as a table:

Case          S        D         P
Nominative    kUpaH    kUpO      kUpAH
Accusative    kUpam    kUpO      kUpAn
Instrumental  kUpena   kUpAByAm  kUpEH
Dative        kUpAya   kUpAByAm  kUpeByaH
Ablative      kUpAt    kUpAByAm  kUpeByaH
Genitive      kUpasya  kUpayoH   kUpAnAm
Locative      kUpe     kUpayoH   kUpezu
Vocative      kUpa     kUpO      kUpAH

This agrees with Deshpande, p. 35.
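
For illustration, the kUpa table can be reproduced by dropping the final 'a' of the stem and attaching a fixed list of 24 endings. This is a sketch inferred from the table itself and is not taken from the decline_file source; the names M_A_ENDINGS and decline_m_a are hypothetical, and no sandhi adjustments are applied.

```python
# Endings in SLP1, read off the kUpa paradigm, in the order
# 1s,1d,1p, 2s,2d,2p, ... 8s,8d,8p.
M_A_ENDINGS = [
    "aH", "O", "AH",          # 1 Nominative
    "am", "O", "An",          # 2 Accusative
    "ena", "AByAm", "EH",     # 3 Instrumental
    "Aya", "AByAm", "eByaH",  # 4 Dative
    "At", "AByAm", "eByaH",   # 5 Ablative
    "asya", "ayoH", "AnAm",   # 6 Genitive
    "e", "ayoH", "ezu",       # 7 Locative
    "a", "O", "AH",           # 8 Vocative
]


def decline_m_a(stem):
    """Return the colon-separated 24-cell table for a masculine a-stem."""
    if not stem.endswith("a"):
        raise ValueError("m_a stems must end in short 'a'")
    base = stem[:-1]  # drop the final 'a'; each ending supplies its own vowel
    return ":".join(base + ending for ending in M_A_ENDINGS)


print(decline_m_a("kUpa"))
```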

@funderburkjim
Contributor Author

Declension of rAma

The declension of rAma with model m_a is:

Case          S        D         P
Nominative    rAmaH    rAmO      rAmAH
Accusative    rAmam    rAmO      rAmAn
Instrumental  rAmeRa   rAmAByAm  rAmEH
Dative        rAmAya   rAmAByAm  rAmeByaH
Ablative      rAmAt    rAmAByAm  rAmeByaH
Genitive      rAmasya  rAmayoH   rAmARAm
Locative      rAme     rAmayoH   rAmezu
Vocative      rAma     rAmO      rAmAH

This agrees with Kale, Section 61, p. 35
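
The rAma table differs from a plain stem-plus-ending paradigm in two cells, rAmeRa and rAmARAm, where 'n' appears as retroflex 'R' after the 'r' of the stem. A simplified version of that sandhi rule might look like the sketch below. It is an assumption about how such a step could be written, not the algorithm used by decline_file, and it covers only the common textbook conditions.

```python
VOWELS = set("aAiIuUfFxXeEoO")
TRIGGERS = set("rfFz")  # sounds that can retroflex a later 'n'
# Sounds that may stand between the trigger and 'n' without blocking the change:
# vowels, gutturals, labials, y, v, h and anusvara.
TRANSPARENT = VOWELS | set("kKgGN") | set("pPbBm") | set("yvhM")
FOLLOWERS = VOWELS | set("nmyv")  # the 'n' must be followed by one of these


def apply_nati(word):
    """Change 'n' to 'R' in an SLP1 word under the simplified conditions above."""
    out = []
    active = False  # True while an unblocked trigger precedes the current position
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in TRIGGERS:
            active = True
        elif ch == "n":
            if active and nxt in FOLLOWERS:
                ch = "R"
            active = False  # simplification: one change at most per trigger
        elif ch not in TRANSPARENT:
            active = False  # any other consonant blocks the change
        out.append(ch)
    return "".join(out)


for form in ["rAmena", "rAmAnAm", "rAmAn", "kUpena"]:
    print(form, "->", apply_nati(form))
```

Word-final 'n', as in rAmAn, is left alone because nothing follows it, and kUpena is unchanged because it contains no trigger.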

@funderburkjim
Contributor Author

decline_checks.txt

This file shows declension tables that have been checked against various sources, such as the two shown above.
As it grows, it can serve as a reference for detecting differences when the algorithms change.

If others find the need, I could develop a web application that shows these declensions for a model and
key2 chosen by the user. I'll probably do this eventually, once the algorithms are stable. One
feature would be to let the user choose how Sanskrit is represented. Since the internals of
the algorithms use the SLP1 spelling of Sanskrit words, it is easiest to show outputs, such as the tables
above, also in SLP1.
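
To give an idea of what a display-time conversion involves, here is a minimal standard-library sketch that maps unaccented SLP1 to IAST one character at a time. The names SLP1_TO_IAST and slp1_to_iast are hypothetical, and accent marks and less common SLP1 symbols are not handled; a real application would more likely rely on an existing transliteration library.

```python
# Only the SLP1 letters that differ from IAST are listed; every other
# character passes through unchanged.
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au", "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ",
    "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh", "P": "ph", "B": "bh",
    "S": "ś", "z": "ṣ",
}


def slp1_to_iast(text):
    """Convert unaccented SLP1 text to IAST, one character at a time."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


for form in ["kUpAByAm", "rAmeRa", "kUpezu", "kUpO"]:
    print(form, "->", slp1_to_iast(form))
```

Because SLP1 uses exactly one character per sound, the conversion needs no lookahead, which is part of why it is convenient as an internal encoding.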

@gasyoun
Member

gasyoun commented Oct 19, 2018

easiest to show outputs, such as the tables
above, also in SLP1.

If our readers are bots - it will work best.

@drdhaval2785

The https://github.com/sanskrit-coders/indic_transliteration Python package seems to support transliteration to and from SLP1 very well. So let us use it, if possible, so that custom transliteration code can be kept to a bare minimum.

@gasyoun
Member

gasyoun commented Oct 21, 2018

So that custom transliteration code can be kept to bare minimum.

Exactly. Not even @SergeA can read it well, let alone other humans...

@funderburkjim
Contributor Author

Are you suggesting that all the inputs/outputs that I'm creating should be duplicated, so that there are
not only slp1 spellings but also IAST spellings?

@funderburkjim
Contributor Author

re the indic_transliteration package.

Does this package support accents?

@drdhaval2785

Are you suggesting that all the inputs/outputs that I'm creating should be duplicated, so that there are
not only slp1 spellings but also IAST spellings?

No. Internals can remain SLP1. I am just suggesting the package so that you can display output to the user in the encoding of their choice, or accept input in different encodings.

@drdhaval2785

Does this package support accents?

I guess not. Are there any specific requirements for accents, @funderburkjim?

@gasyoun
Member

gasyoun commented Nov 13, 2018

Internals can remain SLP1

Right.
