KRM meta-line conversion #200

Closed
funderburkjim opened this issue Jan 25, 2018 · 19 comments

Comments

@funderburkjim
Contributor

This issue is for the meta-line conversion of KRM (Kṛdantarūpamālā).

@gasyoun
Member

gasyoun commented Jan 26, 2018

@drdhaval2785 is there an automated way of checking the correctness of Kṛdantarūpamālā's forms based on your verb analysis?

@drdhaval2785
Contributor

No. There is not. Currently only tiNanta forms are generated. Not kfdanta.

@gasyoun
Member

gasyoun commented Jan 26, 2018

Not kfdanta

Bad luck.

@funderburkjim
Contributor Author

Footnotes

The conversion is almost complete. The most important change concerns footnotes.

The text is organized as a sequence of entries, numbered 1 to 2039 (with a few extras labeled e.g. (41-A));
headwords are roots in dhātu-pāṭha form (i.e., with anubandhas). Each entry consists primarily of a list or table of krdanta forms derived from the root. There are copious footnotes.

To understand the original digitization conventions for coding footnotes, and the changes introduced in the new meta-line coding, look at pages 3 and 4 of the printed text.
Page 3 has the first part of the entry for 'aka', then the footnotes for that page. Page 4
has the remaining part of the aka entry (with two more footnotes) and the beginning of the second entry, 'aki', which also has some footnote indicators; the bottom of the page has the footnotes for that page.
The next two comments show scans of these two pages.

@funderburkjim
Contributor Author

Page 3 : aka begins

image

@funderburkjim
Contributor Author

Page 4: aka finishes, and aki begins

image

@funderburkjim
Contributor Author

Original strategy for coding the footnotes

It is difficult to code the footnotes so that those belonging to a particular entry fall within the scope of that entry's coding.
A naive coding would just code the data line-by-line. But then it would be hard to associate the first two footnotes of page 4 with the aka entry.

So instead, Thomas decided to shoe-horn each entire footnote into the location of its mention.
Here is how the beginning of 'aka' looks, up through the second line of the table (IAST coding).
This is excerpted from the Basic Display before the current conversion:

(1) “aka kuṭilāyāṃ gatau” (ī-bhvādiḥ-792 sakarmakaḥ-seṭ-parasmaipadī) ghaṭādiḥ mit .
‘iditastvaṅkate tatra kuṭilāyāṃ gatāvaket .’ (ślo 41) iti devaḥ .
ṇic- san-
ṇvul ākakaḥ— kikā,
 [Footnote: 1. ‘mitāṃ hrasvaḥ’ (6-4-92) 
iti ṇau upadhāyā hrasvaḥ .] 
akakaḥ— kikā, acikiṣa
 [Footnote: 1A ‘ajāderdvitīyasya’ (6-1-2) iti dvitīya- 
syaikācaḥ dvitvam . ‘kuhoścuḥ’ 
(7-4-62) ityabhyāsasya cutvam .] 
kaḥ— ṣikā;
tṛc (tṛn) akitā-trī, akayitā-trī, acikiṣitā-trī;

While the problem of footnote attachment is clearly solved by this coding, the resulting display
grossly distorts the reading of the table of krdantas.

@funderburkjim
Contributor Author

funderburkjim commented Jan 29, 2018

Current strategy

The main idea of the current coding strategy is to place a footnote marker within the table, and then to collect the corresponding footnotes at the bottom of the entry.
The next comment shows how the whole entry for aka looks (snapshot from the mobile1 display).
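
A minimal sketch of this regrouping, for illustration only; it is not the script actually used for the conversion. The regular expression for the old inline `[Footnote: ...]` coding is inferred from the excerpt above, the function name `collect_footnotes` is hypothetical, and writing each collected footnote after an unclosed `<div n="F">` marker is an assumption.

```python
import re

# Assumed shape of the old inline coding: "[Footnote: <label>[.] <text>]",
# possibly spread over several lines.
FOOTNOTE_RE = re.compile(r"\s*\[Footnote:\s*(\w+)\.?\s+(.*?)\]\s*", re.DOTALL)


def collect_footnotes(entry_text):
    """Swap inline footnotes for <sup> markers and append the notes at the end."""
    notes = []

    def to_marker(match):
        label, body = match.group(1), match.group(2)
        # Rejoin footnote text that was broken across lines.
        notes.append((label, " ".join(body.split())))
        # Whitespace around the footnote (line breaks included) collapses
        # to the single space that follows the marker.
        return f"<sup>{label}</sup> "

    table = FOOTNOTE_RE.sub(to_marker, entry_text).rstrip()
    footer = [f'<div n="F">{label}. {body}' for label, body in notes]
    return "\n".join([table] + footer)
```

Because the sketch collapses the whitespace around each footnote, it does not preserve a line break that preceded a footnote in the printed table; that would need separate handling.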

@funderburkjim
Contributor Author

Current display of aka

image

@funderburkjim
Contributor Author

There are a few more comments that need to be made. I'll get to them tomorrow.

@gasyoun
Member

gasyoun commented Jan 29, 2018

problem of footnote attachment is clearly solved by this coding, the resulting display
grossly distorts the reading of the table of krdantas

Exactly. I'll be off till 24th February, do not lose me, heading to Poona.

@funderburkjim
Contributor Author

funderburkjim commented Jan 29, 2018

Although the changes in footnote coding definitely improve the display of the tabular data within this work, several weaknesses remain; here are a few that catch my eye.

multiline tabular entries.

The last entry in the aka table illustrates this phenomenon:
image
image
The underlying digitization uses a <note n=""/> tag to identify this as a problem area. This is quite common, occurring 700+ times.

table headings vs. data

In the case of aka, the table has both column labels (ṇic- san- ) and row labels (ṇvul , tṛc , etc.).
Additional markup is required to distinguish these grammatical labels from the krdanta entries.
Such markup would make it possible to develop a search facility whereby a user could determine
that, for instance, AkaH is a krdanta of 'aka'.

The aki entry does not show such labels; perhaps the labels are implicit, or perhaps there is
some other organizing principle -- the situation is unclear to me.

Line breaks

Line breaks are significant in many parts of the text (for example, to indicate table rows in aka and aki).
In cases where a footnote is the first element in a line, the original footnote coding obscures the
fact that a line break precedes the footnote marker. This happens, for instance, at footnote marker '9'
in the third entry, akṣū. This error can be corrected by inserting a <div n="lb"> tag before the
footnote marker <sup>9</sup>.

position of footnote markers within words

The footnote marker occasionally occurs within a krdanta. For instance, under 'aka':
image
This positioning, although consistent with the printed text, obscures the full spelling of the krdanta.
My inclination would be to move such footnote markers to the end or beginning of words.
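
A rough sketch of how such a move could be automated, offered only as an illustration. It assumes SLP1 (plain ASCII) text with `<sup>` markers; the function name `move_marker_to_word_end` and the sample word are hypothetical, and real data would need review before any bulk change.

```python
import re

# A marker counts as "inside a word" when word characters touch it on both
# sides. \w is only reliable here because SLP1 text is plain ASCII.
INNER_MARKER_RE = re.compile(r"(\w+)(<sup>\w+</sup>)(\w+)")


def move_marker_to_word_end(line):
    """Move a footnote marker from the middle of a word to the end of that word."""
    return INNER_MARKER_RE.sub(r"\1\3\2", line)


# Hypothetical sample: the marker splits one word.
print(move_marker_to_word_end("acikiza<sup>1A</sup>kaH- zikA;"))
```

A marker that already follows punctuation or a space is left where it is, so the change is limited to the cases that obscure a spelling.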

Additional markup

There is a wealth of information in this text; to expose this information to programmatic manipulation will require the efforts of some team with
(a) sufficient technical knowledge of Sanskrit grammar to know how to interpret the details of the text
(b) sufficient technical knowledge of markup principles to be able to devise a markup scheme that captures the grammatical information.

These brief observations may provide some hints when further work on this 'dictionary' is undertaken.

@funderburkjim
Contributor Author

funderburkjim commented Jan 29, 2018

Other aspects of the conversion

Headwords uncovered

There are 2061 entries in krm after this work. About 20 of these were previously missed as separate entries due to a variation in the coding.

Correction sections

There are two correction sections in the full krm.txt digitization; these are separate from the entries exposed by the Cologne displays. They are identified by the text '; BEGIN CORRECTIONS 1' and
'; BEGIN CORRECTIONS 2'. These sections occur at pages 1143 and 1427.
Here is the beginning of the second correction section:

<H><s>SoDanikA</s>
<NI><s>puwam paNktiH aSudDam SudDam</s>
<>501 17 <s>cAyakA cAyakaH</s>   <<< first example
  • 501 = page number (how 'page' relates to 'puwam' I don't know).
  • 17 = approximate line number on page
  • cAyakA = the error
  • cAyakaH = the correction

image

It would be a fairly straightforward task to implement these corrections. There are approximately 80 corrections in each of the two sections, or 160 corrections in all. Maybe someone can volunteer to do this.
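
As a starting point for a volunteer, here is a hedged sketch that parses one correction line into its four fields. The line shape is inferred from the single example above; the names `Correction` and `parse_correction` are hypothetical, and lines that do not fit the pattern (headers, or multi-word corrections) return None so they can be handled by hand.

```python
import re
from typing import NamedTuple, Optional


class Correction(NamedTuple):
    page: int   # page number in the printed text
    line: int   # approximate line number on that page
    wrong: str  # the error (SLP1)
    right: str  # the correction (SLP1)


# Assumed line shape: "<>PAGE LINE <s>WRONG RIGHT</s>"
CORRECTION_RE = re.compile(r"^<>(\d+)\s+(\d+)\s+<s>(\S+)\s+(\S+)</s>")


def parse_correction(raw: str) -> Optional[Correction]:
    """Return the parsed correction, or None if the line has another shape."""
    m = CORRECTION_RE.match(raw)
    if m is None:
        return None
    page, line, wrong, right = m.groups()
    return Correction(int(page), int(line), wrong, right)


print(parse_correction("<>501 17 <s>cAyakA cAyakaH</s>"))
```

Applying a parsed correction would still require locating the right line in the body text, since the line number is only approximate.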

@funderburkjim
Contributor Author

off till 24th February, do not lose me, heading to Poona.

Will miss your comments.

If you talk to the PD team at Poona, maybe you can ask if they'll give permission for Cologne to
display our digitization of their dictionary. This would bring their monumental work to a much
wider audience.

@funderburkjim
Contributor Author

Specialization of display

The main disp.php program used in the Cologne displays for krm was adapted from the pwg version.
A few alterations were required for:

  • note tag --- this tag is peculiar to krm. It is not displayed; its significance is described above.
  • sup tag --- this has been used in some other dictionary; specialized here to show the footnote marker
    in bold.
  • <div n="F"> --- identifies the beginning of a footnote text. The display inserts the word 'Footnote' for readability.
  • <Poem> tag --- occurs twice. Functionally, it is just treated as a line break for the first line of the poem. Occurs
    under headwords 'kadi' and 'ziY' (slp1 spelling).
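
The four rules above can be sketched as follows. This is a Python illustration of the described behaviour, not the code in disp.php; the function name `render_krm_line` and the exact HTML emitted are assumptions.

```python
import re


def render_krm_line(text):
    """Apply the four krm-specific display rules to one line of markup."""
    # note tags only flag problem areas; they are not displayed.
    text = re.sub(r"<note\b[^>]*>", "", text)
    # Footnote markers are shown in bold.
    text = re.sub(r"<sup>(.*?)</sup>", r"<sup><b>\1</b></sup>", text)
    # Start of a footnote: new line plus a visible label (assumed wording).
    text = text.replace('<div n="F">', "<br/><b>Footnote</b> ")
    # The Poem tag is treated as a plain line break.
    text = text.replace("<Poem>", "<br/>")
    return text


print(render_krm_line('akitA<sup>2</sup><note n=""/>'))
```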

@funderburkjim
Contributor Author

IAST conversion

This was quite simple for krm. In fact, the only IAST text appears in the appendices. The body of the
text is all Devanagari and English.

@funderburkjim
Contributor Author

The krm conversion task is now completed and the results installed.

@gasyoun
Member

gasyoun commented Jan 31, 2018

Such markup would make it possible to develop a search facility whereby a user could determine
that, for instance, AkaH is a krdanta of 'aka'.

Yeah, without it the scan is rather useless.

The aki entry does not similarly show such labels; perhaps the labels are implicit, or perhaps there is
some other organizing principle -- situation is unclear to me.

Let's call for @Shalu411 .

My inclination would be to move such footnote markers to the end or beginning of words.

Makes sense. But isn't it too big a task for the result it would give?

Maybe someone can volunteer to do this.

If only @SergeA is around.

maybe you can ask if they'll give permission for Cologne to
display our digitization of their dictionary.

Let me try.

@gasyoun
Member

gasyoun commented Feb 15, 2019

Currently only tiNanta forms are generated. Not kfdanta.

https://gitlab.inria.fr/huet/Heritage_Resources/ subdirectory XML as well?
