KRM meta-line conversion #200

funderburkjim · 2018-01-25T01:18:31Z

This issue is for the meta-line conversion of KRM (Kṛdantarūpamālā).

gasyoun · 2018-01-26T11:56:15Z

@drdhaval2785 is there an automated way of checking the correctness of Kṛdantarūpamālā's forms based on your verb analysis?

drdhaval2785 · 2018-01-26T15:52:07Z

No. There is not. Currently only tiNanta forms are generated. Not kfdanta.

gasyoun · 2018-01-26T19:17:25Z

> Not kfdanta

Bad luck.

funderburkjim · 2018-01-29T03:12:27Z

Footnotes

The conversion is almost complete. The most important change is regarding Footnotes.

The text is organized as a sequence of entries, numbered 1 to 2039 (with a few extras labeled, e.g., (41-A));
headwords are roots in dhātu-pāṭha form (i.e., with anubandhas). Each entry consists primarily of a list or table of kṛdanta forms derived from the root. There are copious footnotes.

To understand the original digitization conventions for coding footnotes, and the changes introduced in the new meta-line coding, look at pages 3 and 4 of the printed text.
Page 3 has the first part of the entry for 'aka', then a section of footnotes for the page. Page 4
has the remaining part of the aka entry (with two more footnotes) and the beginning of the second entry, 'aki', which also has some footnote indicators; the bottom of the page has the footnotes for the page.
The next two comments show scans of these two pages.

funderburkjim · 2018-01-29T03:13:46Z

Page 3: aka begins

funderburkjim · 2018-01-29T03:15:28Z

Page 4: aka finishes, and aki begins

funderburkjim · 2018-01-29T03:29:50Z

Original strategy for coding the footnotes

It's difficult to know how to code the footnotes in such a way that the footnotes associated with a particular entry are within the scope of coding of the entry itself.
A naive coding would just code the data line-by-line. But then there would be the problem of associating the first two footnotes of page 4 with aka entry.

So instead, Thomas decided to shoehorn each entire footnote into the text at the location of its mention.
Here is how the beginning of 'aka' looks, up through the second line of the table (IAST coding).
This is excerpted from the Basic Display before the current conversion:

(1) “aka kuṭilāyāṃ gatau” (ī-bhvādiḥ-792 sakarmakaḥ-seṭ-parasmaipadī) ghaṭādiḥ mit .
‘iditastvaṅkate tatra kuṭilāyāṃ gatāvaket .’ (ślo 41) iti devaḥ .
ṇic- san-
ṇvul ākakaḥ— kikā,
 [Footnote: 1. ‘mitāṃ hrasvaḥ’ (6-4-92) 
iti ṇau upadhāyā hrasvaḥ .] 
akakaḥ— kikā, acikiṣa
 [Footnote: 1A ‘ajāderdvitīyasya’ (6-1-2) iti dvitīya- 
syaikācaḥ dvitvam . ‘kuhoścuḥ’ 
(7-4-62) ityabhyāsasya cutvam .] 
kaḥ— ṣikā;
tṛc (tṛn) akitā-trī, akayitā-trī, acikiṣitā-trī;

While the problem of footnote attachment is clearly solved by this coding, the resulting display
grossly distorts the reading of the table of kṛdantas.

funderburkjim · 2018-01-29T03:34:38Z

Current strategy

The main idea of the current strategy of coding is to place a footnote marker within the table, and then to collect the corresponding footnotes for the entry at the bottom of the entry.
The next comment shows how the total entry for aka looks (snapshot from mobile1 display).
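
As a rough illustration of this strategy, the following Python sketch converts the old inline coding into markers plus a collected footnote list. The `[Footnote: ...]` pattern is taken from the Basic Display excerpt above; the function names and the exact output layout (a `<sup>` marker in the table, one `<div n="F">` line per footnote) are assumptions for illustration, not the actual conversion program.

```python
import re

# Hypothetical pattern for the old inline coding: "[Footnote: <label> <text>]".
# The label may be followed by a period ("1.") or not ("1A").
FOOTNOTE_RE = re.compile(r"\s*\[Footnote:\s*(\w+)\.?\s*(.*?)\]\s*", re.DOTALL)


def collect_footnotes(entry_text):
    """Replace inline footnotes with <sup> markers; return (body, footnotes)."""
    footnotes = []

    def to_marker(match):
        label, text = match.group(1), match.group(2)
        # Footnote text may span several lines; normalize its whitespace.
        footnotes.append((label, " ".join(text.split())))
        return f"<sup>{label}</sup>"

    body = FOOTNOTE_RE.sub(to_marker, entry_text)
    return body, footnotes


def render(entry_text):
    """Table text first, then the collected footnotes at the bottom of the entry."""
    body, footnotes = collect_footnotes(entry_text)
    lines = [body.rstrip()]
    for label, text in footnotes:
        # Assumed layout: one <div n="F"> line per footnote.
        lines.append(f'<div n="F"><sup>{label}</sup> {text}')
    return "\n".join(lines)
```

Note that this sketch drops all whitespace around a replaced footnote, so it cannot tell a marker that sits inside a word from one that sits between words; a real conversion would need to consult the printed text for that.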

funderburkjim · 2018-01-29T03:35:59Z

Current display of aka

funderburkjim · 2018-01-29T03:41:41Z

There are a few more comments that need to be made. I'll get to them tomorrow.

gasyoun · 2018-01-29T19:09:29Z

> problem of footnote attachment is clearly solved by this coding, the resulting display
> grossly distorts the reading of the table of krdantas

Exactly. I'll be off till 24th February; do not lose me, I'm heading to Poona.

funderburkjim · 2018-01-29T21:02:33Z

Although the changes in footnote coding definitely improve the display of the tabular data within this work, there remain several weaknesses; here are a couple that catch my eye.

multiline tabular entries.

The last entry in the aka table illustrates this phenomenon:

The underlying digitization uses a tag <note n=""/> to identify this as a problem area. This is quite common, occurring 700+ times.

table headings vs. data

In the case of aka, the table has both column labels (ṇic-, san-) and row labels (ṇvul, tṛc, etc.).
Additional markup is required to distinguish these grammatical labels from the kṛdanta entries.
Such markup would make it possible to develop a search facility whereby a user could determine
that, for instance, AkaH is a kṛdanta of 'aka'.

The aki entry does not show such labels; perhaps the labels are implicit, or perhaps there is
some other organizing principle -- the situation is unclear to me.

Line breaks

Line breaks are significant in many parts of the text (for example, they indicate table rows in aka and aki).
In cases where a footnote is the first element in a line, the original footnote coding obscures the
fact that a line break precedes the footnote marker. This happens, for instance, at footnote marker '9'
in the third entry, akṣū. This error can be corrected by inserting a <div n="lb"> tag prior to the
footnote marker <sup>9</sup>.

position of footnote markers within words

The footnote marker occasionally occurs within a kṛdanta. This happens, for instance, under 'aka'.

This positioning, although consistent with the printed text, obscures the full spelling of the kṛdanta.
My inclination would be to move such footnote markers to the end or beginning of words.
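
A minimal sketch of such a move, in Python, might look like the following. It assumes markers are coded as `<sup>label</sup>` and that a "word" is any run of characters other than whitespace, tag brackets, commas, semicolons, and the em dash used in the tables; the function name and these word boundaries are illustrative assumptions, not part of the existing conversion.

```python
import re

# Characters that can form part of a word: anything except whitespace,
# tag brackets, list punctuation, and the em dash (U+2014) used in the tables.
PART = r"[^\s<>,;\u2014]+"
MIDWORD_RE = re.compile(rf"({PART})<sup>(\w+)</sup>({PART})")


def move_markers_to_word_end(text):
    """Rewrite 'ab<sup>n</sup>cd' as 'abcd<sup>n</sup>'.

    Markers that already sit at a word boundary are left untouched.
    A word containing more than one marker is not fully handled by this
    single pass.
    """
    return MIDWORD_RE.sub(r"\1\3<sup>\2</sup>", text)
```

Moving to the end of the word (rather than the beginning) is an arbitrary choice here; either would restore the full spelling for searching.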

Additional markup

There is a wealth of information in this text; exposing it to programmatic manipulation will require the efforts of a team with
(a) sufficient technical knowledge of Sanskrit grammar to interpret the details of the text, and
(b) sufficient technical knowledge of markup principles to devise a markup scheme that captures the grammatical information.

These brief observations may provide some hints when further work on this 'dictionary' is undertaken.

funderburkjim · 2018-01-29T21:16:44Z

Other aspects of the conversion

Headwords uncovered

There are 2061 entries in krm after this work. About 20 of these were previously missed as separate entries due to a variation in the coding.

Correction sections

There are two correction sections in the full krm.txt digitization; these are separate from the entries exposed by the Cologne displays. They are identified by the text '; BEGIN CORRECTIONS 1' and
'; BEGIN CORRECTIONS 2'. These sections occur at pages 1143 and 1427.
Here is the beginning of the second correction section:

<H><s>SoDanikA</s>
<NI><s>puwam paNktiH aSudDam SudDam</s>
<>501 17 <s>cAyakA cAyakaH</s>   <<< first example

501 = page number (how 'page' relates to 'puwam' I don't know).
17 = approximate line number on page
cAyakA = the error
cAyakaH = the correction

It would be a fairly straightforward task to implement these corrections. There are approximately 80 corrections in each of the two sections, or 160 corrections in all. Maybe someone can volunteer to do this.
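
As a possible starting point for a volunteer, here is a small Python sketch that parses one correction line. It assumes every correction line has the same shape as the first example above (`<>`, page, line, then an `<s>` element holding exactly two whitespace-free words); the `Correction` type and `parse_correction` function are hypothetical names, and lines that do not fit this shape are simply skipped.

```python
import re
from typing import NamedTuple, Optional


class Correction(NamedTuple):
    page: int   # page number in the printed text
    line: int   # approximate line number on that page
    wrong: str  # the erroneous form (SLP1)
    right: str  # the corrected form (SLP1)


# Assumed shape: "<>PAGE LINE <s>WRONG RIGHT</s>", possibly followed by other text.
CORRECTION_RE = re.compile(r"^<>(\d+)\s+(\d+)\s+<s>(\S+)\s+(\S+)</s>")


def parse_correction(raw: str) -> Optional[Correction]:
    """Return a Correction for a matching line, or None for headers and other lines."""
    m = CORRECTION_RE.match(raw)
    if m is None:
        return None
    page, line, wrong, right = m.groups()
    return Correction(int(page), int(line), wrong, right)
```

Applying the parsed corrections would still require locating each page and line in the main text, which this sketch does not attempt.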

funderburkjim · 2018-01-29T21:28:54Z

> off till 24th February, do not lose me, heading to Poona.

Will miss your comments.

If you talk to the PD team at Poona, maybe you can ask if they'll give permission for Cologne to
display our digitization of their dictionary. This would be a way for there to be a much wider audience
for their monumental work.

funderburkjim · 2018-01-29T21:39:09Z

Specialization of display

The main disp.php program used in the Cologne displays for krm was adapted from the pwg version.
A few alterations were required for:

note tag --- this tag is peculiar to krm. It is not displayed; its significance is described above.
sup tag --- this has been used in some other dictionary; it is specialized here to show the footnote marker in bold.
<div n="F"> --- identifies the beginning of a footnote text. The display inserts 'Footnote' for readability.
<Poem> tag --- occurs twice. Functionally, it is just treated as a line break for the first line of the poem. It occurs under headwords 'kadi' and 'ziY' (slp1 spelling).
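
For readers who want a concrete picture of the first three rules, here is a rough Python sketch. It is not the code of disp.php; the function name and the HTML it emits (`<b>` for bold, `<br/>` plus the word 'Footnote' at the start of a footnote) are assumptions chosen only to mirror the description above. The `<Poem>` tag is left out because its exact form is not shown in this thread.

```python
import re


def render_krm_markup(text):
    """Apply three krm-specific display rules to one entry (illustrative only)."""
    # note tag: never displayed, so remove it entirely.
    text = re.sub(r"<note\b[^>]*>", "", text)
    # sup tag: show the footnote marker in bold.
    text = re.sub(r"<sup>(.*?)</sup>", r"<sup><b>\1</b></sup>", text)
    # <div n="F">: start the footnote on its own line with a readable label.
    text = text.replace('<div n="F">', "<br/>Footnote: ")
    return text
```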

funderburkjim · 2018-01-29T21:50:07Z

IAST conversion

This was quite simple for krm. In fact, the only IAST text appears in the appendices. The body of the
text is all Devanagari and English.

funderburkjim · 2018-01-29T21:50:45Z

The krm conversion task is now completed and the results installed.

gasyoun · 2018-01-31T04:51:48Z

> Such markup would make it possible to develop a search facility whereby a user could determine
> that, for instance, AkaH is a kṛdanta of 'aka'.

Yeah, without it the scan is rather useless.

> The aki entry does not similarly show such labels; perhaps the labels are implicit, or perhaps there is
> some other organizing principle -- situation is unclear to me.

Let's call for @Shalu411 .

> My inclination would be to move such footnote markers to the end or beginning of words.

Makes sense. But isn't it too big a task to be worth the result?

> Maybe someone can volunteer to do this.

If only @SergeA is around.

> maybe you can ask if they'll give permission for Cologne to
> display our digitization of their dictionary.

Let me try.

gasyoun · 2019-02-15T07:28:41Z

> Currently only tiNanta forms are generated. Not kfdanta.

https://gitlab.inria.fr/huet/Heritage_Resources/ subdirectory XML as well?

funderburkjim closed this as completed Jan 29, 2018

drdhaval2785 mentioned this issue Feb 2, 2018

meta-line, IAST conversion tracker #177

Closed

funderburkjim mentioned this issue Feb 2, 2018

BOP meta-line/iast conversion #202

Closed

funderburkjim mentioned this issue Feb 11, 2018

INM meta/iast conversion #206

Closed

funderburkjim mentioned this issue Feb 15, 2019

KRM footnote markup #253

Open
