verbs01 #1

Open
funderburkjim opened this issue Apr 12, 2020 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@funderburkjim
Contributor

The verbs01 directory aims

  • to identify the entries in the Cappeller Sanskrit-English dictionary which are verbs,
  • to provide a correspondence between the headwords of these entries and verb entries of the Monier-Williams dictionary, and
  • to identify the verb entries which further have upasargas, and to
    provide a correspondence between these and the prefixed verb entries of MW.

The comments here will focus on the cae_preverb1 report.
cae_preverb1_deva is a Devanagari version of the report.

Currently, 1078 of the 40067 entries of Cappeller are identified as verbs.
555 of these verbs have upasargas, and a total of 3354 upasargas are identified.

@funderburkjim funderburkjim added the documentation Improvements or additions to documentation label Apr 12, 2020
@funderburkjim
Contributor Author

The report is organized according to the CAE entries identified as verbs; each such entry is considered a 'case':

;; Case 0001: L=21, k1=aMh, k2=aMh, code=V, #upasargas=0, mw=aMh (same)

This record provides

  • L = the Cologne ID
  • k1 = the primary headword,
  • k2 = the full headword (usually same as k1)
  • a code, here always V
  • the number of upasargas identified within the cae entry
  • the MW headword believed to correspond to this entry
    • There are 6 cases (mw=?) where no correspondence is currently identified.
  • a 'flag' comparing k1 to mw:
    • (same) means the cae headword spelling is the same as the spelling of the MW entry believed
      to correspond to the cae entry (705 cases)
    • (diff) means the k1 and mw spellings differ (367 cases).
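
A minimal sketch of how a case header line could be read back into its fields. The regular expression is inferred only from the sample header lines quoted in this issue, so the exact layout (in particular for the mw=? cases) is an assumption, and `parse_case` is a hypothetical helper name, not part of the verbs01 code.

```python
import re

# Layout assumed from the sample header lines quoted in this issue.
# The "(yes/no)" split after #upasargas and the "(same|diff)" flag are optional.
CASE_RE = re.compile(
    r"^;; Case (?P<case>\d+): L=(?P<L>[^,]+), k1=(?P<k1>[^,]+), k2=(?P<k2>[^,]+), "
    r"code=(?P<code>[^,]+), #upasargas=(?P<n>\d+)(?: \((?P<yes>\d+)/(?P<no>\d+)\))?, "
    r"mw=(?P<mw>\S+)(?: \((?P<flag>same|diff)\))?$"
)

def parse_case(line):
    """Return the header fields as a dict, or None if the line is not a case header."""
    m = CASE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["n"] = int(rec["n"])  # number of upasargas found in the cae entry
    return rec

header = ";; Case 0001: L=21, k1=aMh, k2=aMh, code=V, #upasargas=0, mw=aMh (same)"
print(parse_case(header))
```

The Cologne ID is kept as a string rather than converted to a number, since nothing here guarantees that every L value is a plain integer.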

@funderburkjim
Contributor Author

preverb

When there are upasargas for a CAE entry, these are grouped below the case.
Consider the verb 'an' (to breathe):

;; Case 0015: L=989, k1=an, k2=an, code=V, #upasargas=6 (5/1), mw=an (same)
01        apa         an                 apAn                 apAn yes apa+an
02         ud         an                 udan                 udan yes ud+an
03        pra         an                 prAn                 prAn yes pra+an
04         vi         an                 vyan                 vyan yes vi+an
05        sam         an                saman                saman yes sam+an
06     anusam         an             anusaman             anusaman no 

There are six upasargas found; 5 have been matched to MW prefixed verbs and one (anusaman)
has not been matched (that is, CAE has 'anusam' as upasarga for 'an', but MW does not.)

The listing for upasargas shows:

  • xx a sequence number for the upasargas for the verb
  • the upasarga
  • the verb
  • a likely spelling of the prefixed verb obtained by joining the upasarga with k1
  • a likely spelling of the prefixed verb obtained by joining the upasarga with the mw root spelling
    (here the mw spelling is same as k1).
  • yes/no indicating whether the prefixed verb is found as an entry in MW dictionary
  • When the prefixed verb is in MW, then a parsing is given of the mw prefixed verb spelling.

Currently, 3099 of the upasargas are identified with MW prefixed verb entries (search ' yes')
and 255 are not identified with MW prefixed verb entries (search ' no').
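
The yes/no counts can be recomputed from the report with a short script. The sketch below assumes the upasarga rows are whitespace-separated with the yes/no flag in the sixth column, as in the 'an' example above; `parse_upasarga_row` and `tally_matches` are hypothetical names.

```python
from collections import Counter

def parse_upasarga_row(line):
    """Split one whitespace-separated upasarga row into named fields."""
    parts = line.split()
    # A row needs at least six columns, the sixth being the yes/no flag.
    if len(parts) < 6 or parts[5] not in ("yes", "no"):
        return None
    return {
        "seq": parts[0],
        "upasarga": parts[1],
        "verb": parts[2],
        "joined_k1": parts[3],   # upasarga joined with k1
        "joined_mw": parts[4],   # upasarga joined with the mw root spelling
        "in_mw": parts[5] == "yes",
        "parse": parts[6] if len(parts) > 6 else None,  # present only for 'yes' rows
    }

def tally_matches(lines):
    """Count upasarga rows matched ('yes') and unmatched ('no') in MW."""
    counts = Counter()
    for line in lines:
        if line.startswith(";;"):
            continue  # case header, not an upasarga row
        row = parse_upasarga_row(line)
        if row is not None:
            counts["yes" if row["in_mw"] else "no"] += 1
    return counts

sample = [
    ";; Case 0015: L=989, k1=an, k2=an, code=V, #upasargas=6 (5/1), mw=an (same)",
    "01        apa         an                 apAn                 apAn yes apa+an",
    "02         ud         an                 udan                 udan yes ud+an",
    "03        pra         an                 prAn                 prAn yes pra+an",
    "04         vi         an                 vyan                 vyan yes vi+an",
    "05        sam         an                saman                saman yes sam+an",
    "06     anusam         an             anusaman             anusaman no ",
]
print(tally_matches(sample))
```

Run over the whole report, the two counts should add up to the total number of upasargas (3099 + 255 = 3354).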

@funderburkjim
Contributor Author

identification of verbs and upasargas

This work used existing cae markup to identify verbs and upasargas:

  • verbs: <vlex type="root"/>
  • upasargas: one of two similar patterns:
    • <div n="p">— *{#UPASARGA#}
    • <div n="p"> *{#UPASARGA#}

Sometimes the UPASARGA field so defined contains several upasargas, so specialized logic was
used to handle these cases.
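
A rough sketch of how the two kinds of markup could be detected in an entry. It reads ' *' in the patterns above as "zero or more spaces", which is an assumption, and it returns the raw UPASARGA field without trying to reproduce the specialized splitting logic. The function names and the sample entry text are made up for illustration.

```python
import re

VERB_MARK = '<vlex type="root"/>'

# One regex covering both patterns: the dash after <div n="p"> is optional,
# and ' *' is read as "zero or more spaces" (assumption).
UPASARGA_RE = re.compile(r'<div n="p">—? *\{#([^#]*)#\}')

def is_verb_entry(entry_text):
    """True if the entry carries the root-verb markup."""
    return VERB_MARK in entry_text

def upasarga_fields(entry_text):
    """Return the raw UPASARGA fields; a field may still hold several upasargas."""
    return UPASARGA_RE.findall(entry_text)

# Made-up entry text, only to exercise the two functions.
entry = (
    '{#an#}¦ ... <vlex type="root"/> ...\n'
    '<div n="p">— {#apa#} ...\n'
    '<div n="p">{#ud#} ...'
)
print(is_verb_entry(entry), upasarga_fields(entry))
```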

@funderburkjim
Contributor Author

Source of verb/upasarga markup

The two markups shown above were introduced during the 'meta-line conversion' of cae done
in 2017. See

  • a comment
  • The code on Cologne server is at CAEScan/2014/pywork/correctionwork/cologne-issue-191/

The program that added the <vlex type="root"/> markup is:
python extra1.py temp_caewithmeta0.txt temp_caewithmeta1.txt

For example, this program changed:

<L>21<pc>001<k1>aMh<k2>aMh
{#aMh#}¦ {#aMhate#} ·v walk. C. {#aMhayati#} send.
<LEND>

to

<L>21<pc>001<k1>aMh<k2>aMh
{#aMh#}¦ {#aMhate#} <vlex type="root"/> walk. <ab>C.</ab> {#aMhayati#} send.
<LEND>

The original ·v was markup added by Thomas Malten. Thanks, Thomas!
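
The ·v substitution in the example above can be sketched as follows. This is not the actual extra1.py, only an illustration of the one replacement; it assumes ·v is always followed by whitespace or the end of the line, and it does not cover the <ab>C.</ab> change that the example also shows. `mark_root` is a hypothetical name.

```python
import re

# '·v' followed by whitespace or end of line (assumption), so that a longer
# code starting with '·v' would be left alone.
ROOT_CODE_RE = re.compile(r"·v(?=\s|$)")

def mark_root(body_line):
    """Replace the ·v code with the <vlex type="root"/> markup."""
    return ROOT_CODE_RE.sub('<vlex type="root"/>', body_line)

before = "{#aMh#}¦ {#aMhate#} ·v walk. C. {#aMhayati#} send."
print(mark_root(before))
```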

As for the upasarga markup, this occurred first in file 'temp_caewithmeta5.txt'. The '—' character
identified the first upasarga, and the rest were done by a combination of programmatic
markup (make_div_change.py) and manual adjustment.

@gasyoun
Member

gasyoun commented Apr 13, 2020

I cannot stress enough that I was waiting for this dhatu review for 7 years. So even if I stay silent, it's because I'm speechless.

1078 of the 40067 entries of Cappeller are identified as verbs.

You meant dhatus, I guess. It would be interesting to see the percentage of dhatus and sopasarga dhatus in each dictionary.

555 of these verbs have upasargas, and a total of 3354 upasargas are identified.

Sopasarga dhatus, so it's upasargas and upasarga combinations, I guess.

There are 6 cases (mw=?) where no correspondence is currently identified.

As Cappeller is based on PWK, and not MW, I would need to look up in the German books first.

255 are not identified with MW prefixed verb entries (search ' no').

As MW contains all the possible combinations, it must be a question of different orthography.
