BHS verbforms #260

Open
funderburkjim opened this issue Mar 21, 2016 · 7 comments

@funderburkjim
Contributor

A study was made of the headwords of the BHS dictionary to identify verbs and verbforms.

This was motivated by the whitelisting work being done as mentioned in #254.

It was noticed that many of the otherwise unidentified headwords occurring only in the BHS dictionary (and in no other dictionary as a headword) were verb forms (such as third person singular of some conjugation of the verb). In connection with the whitelisting, it is felt that the spelling correctness of these verb forms should take into account the fact that they are inflected forms.

Of course, there is independent interest in lists of verbs, unrelated to the whitelisting objective.

The program and results are in the dictionaries/BHS/verbs directory of this repository.

A brief description of the files in this directory:

  • readme.md describes the methodology in slightly more detail.
  • verbs1.txt and verbforms1.txt list the headwords so classified.
    • Note: For BHS, verbs1.txt is empty, meaning that no root forms are found. All the 1200+ verb forms appear as 3rd singular forms of verbs, and are in verbforms1.txt.
  • verbs1.md and verbforms1.md provide supporting data from the digitization, in markdown form
  • verbs1.org and verbforms1.org provide the same data in Emacs org mode format.
@gasyoun
Member

gasyoun commented Mar 21, 2016

I did my own research on this two years ago. BHS, like EWA and KEWA, cites roots as -ti and -te forms. I'm no fan of that, but still.

  • The identification pattern is that the headword ends in 'ate' or 'ati'; this pattern is too narrow
  • 1266 headwords vs 1741 in Gasuns' list; patterns are listed below

06.01.2014.
I need to cut off the endings of verbs in BHS. Verbs are cited in 3rd person forms in BHS, so they are easy to locate.
To find them, I used http://www.sanskrit-lexicon.uni-koeln.de/scans/BHSScan/2013/web/webtc2/index.php
with suffix "ti", Maximum "all" (1564); after that I repeated the search with suffix "te", Maximum "all", and copy-pasted the results into
a .txt file.
I looked in the book, http://yadi.sk/d/xmPTu3LLFoDZ6, and found the verb E kilikīl-; I find the same verb at http://www.sanskrit-lexicon.uni-koeln.de/scans/BHSScan/2013/web/webtc2/index.php [L=4889] [p= 184,1] kilikīlate, makes a loud noise (of Māra's army). So Schwarz, the author of the 1978 printed edition of a Sanskrit reverse dictionary, cut "kilikīlate" down to "kilikīl-". To do the same, I need patterns, followed by manual approval.

http://yadi.sk/d/S2LkEfxDFZtqo "-te"
http://yadi.sk/d/8mvZJEe3FZucb "-ti"

Most of the verbs in Schwarz's list marked as E (=BHS) are sopasarga roots (roots with prefixes). We do not change that: we do not cut the upasargas off, we leave them as they are.

False positives (have to be cleaned out manually before rules apply):
Dharmadhātvarcivairocanasaṃbhavamati, n. of a Bodhisattva
Dharmadhātunayajñānagati, n. of a Buddha
Akṣayamati, n. of a Bodhisattva
Acalamati, n. of a son of Māra (favorable to the Bodhisattva)
Ati, read Atri, n. of a Prajāpati

Cutting Rules:
atiprathate -> ate
anuparigṛhṇīte -> te
svādīyati -> ati
sameti -> ti
vasubhūti -> ti
paryavāpnoti -> noti
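
The cutting rules above could be applied mechanically along these lines. This is only a sketch: the rule list holds just the endings shown in the examples, and trying the longest ending first is an assumption, not something stated in the notes. The names `CUTTING_RULES` and `cut_ending` are hypothetical.

```python
# Hypothetical rule table: only the endings that appear in the examples above.
CUTTING_RULES = ["noti", "ate", "ati", "te", "ti"]


def cut_ending(headword, rules=CUTTING_RULES):
    """Return (stem, ending) for the longest matching ending, or (headword, '') if none matches.

    Assumption: longer endings take priority, so 'noti' wins over 'ti'.
    """
    for ending in sorted(rules, key=len, reverse=True):
        # Require a non-empty stem so a bare ending is never reduced to nothing.
        if headword.endswith(ending) and len(headword) > len(ending):
            return headword[: -len(ending)], ending
    return headword, ""


if __name__ == "__main__":
    for word in ["atiprathate", "anuparigṛhṇīte", "svādīyati",
                 "sameti", "paryavāpnoti", "kilikīlate"]:
        stem, ending = cut_ending(word)
        print(f"{word} -> {stem}- (cut: {ending or 'nothing'})")
```

Longest-first matching will over-cut any form that merely happens to end in a longer pattern, and it cannot tell a name such as the false positives listed above from a verb, so the manual approval step is still needed.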

@drdhaval2785
Contributor

Gana and the terminals
BvAdi - ati / ate
adAdi - ti / te
juhotyAdi - ti/te (with reduplication of the root)
divAdi - yati / yate
svAdi - noti / nute
tudAdi - ati/ate
ruDAdi - ti/te/Di/De (with 'na' added in between)
tanAdi - oti/ute
kryAdi - nAti/nIte/RAti/RIte
curAdi - ayati/Ayati (with sometimes a->A,[iI]->e,[uU]->o conversion in verb)

These are the major chopping rules.
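
As a rough illustration, the table above can be turned into a lookup that lists which ganas a given 3rd singular form could belong to, judged by its terminal alone. The terminals are copied as written in the list (SLP1 transliteration); the function name `candidate_ganas` and the longest-terminal-first ordering are assumptions made for this sketch.

```python
# Terminals copied from the list above (SLP1). Nothing here models the stem changes
# noted in parentheses (reduplication, inserted 'na', vowel strengthening).
GANA_TERMINALS = {
    "BvAdi": ["ati", "ate"],
    "adAdi": ["ti", "te"],
    "juhotyAdi": ["ti", "te"],
    "divAdi": ["yati", "yate"],
    "svAdi": ["noti", "nute"],
    "tudAdi": ["ati", "ate"],
    "ruDAdi": ["ti", "te", "Di", "De"],
    "tanAdi": ["oti", "ute"],
    "kryAdi": ["nAti", "nIte", "RAti", "RIte"],
    "curAdi": ["ayati", "Ayati"],
}


def candidate_ganas(form):
    """Return (gana, terminal) pairs whose terminal ends `form`, longest terminal first."""
    hits = []
    for gana, terminals in GANA_TERMINALS.items():
        matching = [t for t in terminals if form.endswith(t)]
        if matching:
            # Keep only the longest terminal of this gana that matches.
            hits.append((gana, max(matching, key=len)))
    return sorted(hits, key=lambda pair: len(pair[1]), reverse=True)


if __name__ == "__main__":
    for form in ["corayati", "Apnoti", "calati"]:
        print(form, candidate_ganas(form))
```

Because every form in -ati also ends in -ti, the result is a ranked list of candidates rather than a single answer; deciding between them needs a root list or a conjugation table.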

@gasyoun
Member

gasyoun commented Mar 22, 2016

Thanks, @drdhaval2785. I guess this will eliminate some of the false positives, if we check these endings before applying the ti / te rule. No?!

@funderburkjim
Contributor Author

@gasyoun
> False positives (have to be cleaned out manually before rules apply):

In the list I made, these false positives have been pretty thoroughly weeded out.

Is the objective of your 'chopping' to know, for example, that 'anucalati' in BHS would correspond to
'anucal' in MW (if MW had this prefixed form of root cal) ?

If so, this should be doable by a program that (a) removes the prefixes (e.g. removes 'anu' from
anucalati) and then (b) looks up 'calati' in a table of conjugations (which we have from various sources:
me, Huet, probably Dhaval) to discover that 'calati' is the 3s of 'cal'.

Is this the kind of analysis you are interested in?
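
A minimal sketch of steps (a) and (b), assuming a conjugation table keyed by the 3s form. The prefix list and the one-entry table below are hypothetical stand-ins for the real data, and `analyze` is an illustrative name, not an existing program.

```python
# Hypothetical data for illustration only; the real prefix list and conjugation
# tables would come from the sources mentioned above.
PREFIXES = ["anu", "pari", "sam", "upa", "pra", "vi"]
CONJUGATION_TABLE = {"calati": "cal"}  # 3s form -> root (assumed layout)


def analyze(headword, prefixes=PREFIXES, table=CONJUGATION_TABLE):
    """Return a list of (prefix_string, root) analyses for `headword`.

    Zero or more prefixes are stripped by plain string matching, then the
    remainder is looked up in the conjugation table.
    """
    results = []

    def walk(rest, stripped):
        if rest in table:
            results.append(("".join(stripped), table[rest]))
        for prefix in prefixes:
            if rest.startswith(prefix) and len(rest) > len(prefix):
                walk(rest[len(prefix):], stripped + [prefix])

    walk(headword, [])
    return results


if __name__ == "__main__":
    for prefix, root in analyze("anucalati"):
        print(f"anucalati = {prefix} + 3s of {root} -> {prefix}{root}")
```

Plain string matching ignores sandhi at the prefix boundary (for example pari appearing as pary before a vowel), so a real version would need sandhi-aware splitting.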

@gasyoun
Member

gasyoun commented Mar 23, 2016

> Is the objective of your 'chopping' to know, for example, that 'anucalati' in BHS would correspond to
> 'anucal' in MW (if MW had this prefixed form of root cal) ?

As well, right.

Yes, I'm interested in such analysis. For it, maybe even cutting off the upasargas and upasarga combinations would not be needed, because MW has all the upasargas in it as part of the word. What would be interesting is to generate the list of PWG verbs with upasargas, because PWG's nest style makes it impossible to know how many forms related to verbs there actually are.

Is there a list of your false positives?

@funderburkjim
Contributor Author

> Is there a list of your false positives?

The program (verbs1.py) generating the list is fairly simple. The false positives are identified in two ways:

  • By searching for the string 'n. of' (name of X) in the first line of the definition in bhs.txt.
    As currently written, these are not listed. Someone could modify the program to print these.
  • By a hard-coded list (see nonverbs=['ajitAvati', ..... in the program). These cases were excluded
    by hand, after examining the definitions. There are about 60 of these.

The program also excludes any headword that does NOT end in 'ati' or 'ate'. If there are verbs in BHS that end in some other way (which I doubt), then these would be silently excluded.
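
The three checks described above might look roughly like this. It is a sketch, not the code of verbs1.py: the function name, the regular expression, and the way the first definition line is passed in are all assumptions, and the exclusion set holds only the one entry quoted in this comment.

```python
import re

# Only 'ajitAvati' is quoted in this thread; the real hard-coded list has about 60 entries.
NONVERBS = {"ajitAvati"}

# Assumed pattern for the 'n. of' (name of X) marker in the first definition line.
NAME_PATTERN = re.compile(r"\bn\. of\b")


def is_verbform(headword, first_line, nonverbs=NONVERBS):
    """Return True if `headword` passes the three checks described above."""
    if not headword.endswith(("ati", "ate")):
        return False  # silently excluded, whatever the word actually is
    if headword in nonverbs:
        return False  # excluded by hand
    if NAME_PATTERN.search(first_line):
        return False  # proper name, e.g. 'n. of a Bodhisattva'
    return True


if __name__ == "__main__":
    print(is_verbform("kilikIlate", "makes a loud noise (of Māra's army)"))
    print(is_verbform("akzayamati", "n. of a Bodhisattva"))
    print(is_verbform("sameti", "(hypothetical definition line)"))
```

Printing the headwords rejected by the name check, as suggested above, would only need a log line in that branch. Note that a form such as sameti fails the first check simply because it ends in -eti.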

@gasyoun
Member

gasyoun commented Mar 24, 2016

> If there are verbs in BHS that end in some other way (which I doubt), then these would be silently excluded.

None, I guess. Thanks for the detailed comment, as usual.
