Printed Book Categories Lost in OCR? #1

gasyoun · 2014-09-26T07:51:20Z

To solve sanskrit-lexicon/MWS#12 I need help from abroad.
Are the MCI categories from the book kept in the OCR metadata as well?
akarkara 001-a should belong to "1.1. Names of Serpents"
agastyasya 507-a+ 36 should belong to "1.5A Names of Villages"
2387 does not tell us anything, but does 788-a+ 39 tell us enough? Does A stand for vol. 1? What does +39 stand for?
If the data is lost, I can write out the L numbers that belong to each of the MCI printed book categories, if we can fit the data in the metadata. Otherwise, how can I get a list of all the snakes or of all the sages?

Andhrabharati · 2021-09-26T09:04:43Z

> Are the MCI categories from the book kept in the OCR metadata as well?

They are, @gasyoun!

Here are the lines from mci.txt

Line 719: `<S>`1.1 Names of Serpents, Birds, Animals etc.
Line 9164: `<S>`1.2 Names of Missiles, Weapons, Bows etc.
Line 15770: `<S>`1.3 Names of Literary Works, Parts of Works etc.
Line 21514: `<S>`1.4 Names of Divisions of Time, Planets, Nakṣatras etc.
Line 26411: `<S>`1.5 Names of Tīrthas, Rivers, Mountains, Forests etc.
Line 46750: `<S>`1.5A Names of Āśramas, Villages, Cities etc.
Line 55376: `<S>`1.6 Names of Countries, Peoples, Islands etc.
Line 83793: `<S>`1.7 Miscellaneous Names

But they remain only as section (?) names (the `<S>` lines) in the file; they could probably be added as a tag on the entries under those sections.
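As a rough illustration of how these section lines could be located programmatically, here is a minimal Python sketch. It assumes each marker is a line of mci.txt that begins with `<S>` followed by the section number and title, as in the listing above. The function name `find_sections` and the sample lines are hypothetical and not part of any existing Cologne tooling.

```python
import re

# Assumed marker shape: "<S>" + section number (e.g. 1.1 or 1.5A) + title.
SECTION_RE = re.compile(r"^<S>\s*(\d+\.\d+[A-Z]?)\s+(.*\S)\s*$")


def find_sections(lines):
    """Return (line_number, section_id, title) for every <S> line."""
    found = []
    for lineno, line in enumerate(lines, start=1):
        m = SECTION_RE.match(line)
        if m:
            found.append((lineno, m.group(1), m.group(2)))
    return found


# Hypothetical miniature of mci.txt, for illustration only.
sample_lines = [
    "<S>1.1 Names of Serpents, Birds, Animals etc.",
    "<L>1<pc>001-a<k1>akarkara",
    "<LEND>",
    "<S>1.5A Names of Āśramas, Villages, Cities etc.",
]
sections = find_sections(sample_lines)
for lineno, sec_id, title in sections:
    print(f"Line {lineno}: {sec_id} {title}")
```

On the real file one would pass the lines read from mci.txt (opened with `encoding="utf-8"`) instead of `sample_lines`.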

gasyoun · 2021-09-26T20:57:49Z

> But they are just remaining as section (?) names

So they are and are not at the same time @Andhrabharati
@funderburkjim how do you like the idea of adding some tag to the entries under those sections?

funderburkjim · 2021-09-27T01:51:56Z

The entries in section 1.1 go from
`<L>1<pc>001-a<k1>akarkara` to `<L>439<pc>085-b<k1>hrAda`

Then there is a section `<AC>ADDENDA AND CORRIGENDA TO SECTION 1.1 (pp. 1-85)`
(these lines do not contribute to the display, as they are not within `<L>...<LEND>`).

Then there is Section 1.2, with entries
`<L>440<pc>090-a<k1>akzisaMtarjana<k2>` to `<L>628<pc>158-b<k1>hala`
Then come the corrections for section 1.2.

The remaining sections probably follow the same pattern.
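The per-section entry ranges described above could be derived mechanically, which would also give a way to list every headword of one category. The following is a minimal sketch, assuming metalines of the form `<L>n<pc>page<k1>headword` and section lines beginning with `<S>`; the function name `headwords_by_section` and the sample data are hypothetical.

```python
import re

# Assumed shapes of the two kinds of lines this sketch cares about.
SECTION_RE = re.compile(r"^<S>\s*(\d+\.\d+[A-Z]?)\b")
META_RE = re.compile(r"^<L>([^<]+)<pc>([^<]+)<k1>([^<]+)")


def headwords_by_section(lines):
    """Map each section id to a list of (L number, k1 headword) pairs."""
    groups = {}
    current = None
    for line in lines:
        s = SECTION_RE.match(line)
        if s:
            current = s.group(1)
            groups.setdefault(current, [])
            continue
        m = META_RE.match(line)
        if m and current is not None:
            groups[current].append((m.group(1), m.group(3)))
    return groups


# Hypothetical miniature of mci.txt using the metalines quoted in this thread.
sample_lines = [
    "<S>1.1 Names of Serpents, Birds, Animals etc.",
    "<L>1<pc>001-a<k1>akarkara",
    "<LEND>",
    "<L>439<pc>085-b<k1>hrAda",
    "<LEND>",
    "<AC>ADDENDA AND CORRIGENDA TO SECTION 1.1 (pp. 1-85)",
    "<S>1.2 Names of Missiles, Weapons, Bows etc.",
    "<L>440<pc>090-a<k1>akzisaMtarjana<k2>",
    "<LEND>",
    "<L>628<pc>158-b<k1>hala",
    "<LEND>",
]
groups = headwords_by_section(sample_lines)
for sec_id, entries in groups.items():
    print(sec_id, "from L", entries[0][0], "to L", entries[-1][0])
```

The first and last pair in each list give the range of L numbers for that section, and `groups["1.1"]` is the full list for one category.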

I guess the suggestion is that for each entry within a section, we add some markup indicating
the section in which the entry appears.
For instance, we could add an info tag such as `<info section="1.1"/>` to each of the
headwords in section 1.1.
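One way such tagging might be automated is sketched below in Python. It is only an illustration under stated assumptions: the tag is appended to the end of each `<L>` metaline (the placement is a guess, not something decided in this thread), sections are detected from `<S>` lines, and the function name `tag_entries` is hypothetical.

```python
import re

# Assumed section marker: "<S>" followed by a number such as 1.1 or 1.5A.
SECTION_RE = re.compile(r"^<S>\s*(\d+\.\d+[A-Z]?)\b")


def tag_entries(lines):
    """Append <info section="..."/> to each <L> metaline, by current section.

    Lines are expected without trailing newlines. Lines outside
    <L>...<LEND> (such as <AC> addenda) are left unchanged.
    """
    current = None
    out = []
    for line in lines:
        m = SECTION_RE.match(line)
        if m:
            current = m.group(1)
        elif (
            line.startswith("<L>")
            and current is not None
            and "<info section=" not in line  # skip already tagged lines
        ):
            line = f'{line.rstrip()}<info section="{current}"/>'
        out.append(line)
    return out


# Hypothetical miniature of mci.txt, for illustration only.
sample_lines = [
    "<S>1.1 Names of Serpents, Birds, Animals etc.",
    "<L>1<pc>001-a<k1>akarkara",
    "entry text",
    "<LEND>",
    "<AC>ADDENDA AND CORRIGENDA TO SECTION 1.1 (pp. 1-85)",
    "<S>1.2 Names of Missiles, Weapons, Bows etc.",
    "<L>440<pc>090-a<k1>akzisaMtarjana",
]
tagged = tag_entries(sample_lines)
print("\n".join(tagged))
```

Running the function a second time on its own output changes nothing, so it could be re-applied safely after later edits to the file.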

One use of such a tag could be to add some text to the display of each entry.
We would need to decide the exact format in displays of such 'section' text.
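To make the display question concrete, here is a hedged Python sketch of how a display routine might turn such a tag into a label. The bold, parenthesized HTML format is only one possible choice, the function name `section_label` is hypothetical, and the titles are copied from the section lines quoted earlier in this thread.

```python
import html
import re

# Section titles as quoted from mci.txt earlier in this thread.
SECTION_TITLES = {
    "1.1": "Names of Serpents, Birds, Animals etc.",
    "1.2": "Names of Missiles, Weapons, Bows etc.",
    "1.3": "Names of Literary Works, Parts of Works etc.",
    "1.4": "Names of Divisions of Time, Planets, Nakṣatras etc.",
    "1.5": "Names of Tīrthas, Rivers, Mountains, Forests etc.",
    "1.5A": "Names of Āśramas, Villages, Cities etc.",
    "1.6": "Names of Countries, Peoples, Islands etc.",
    "1.7": "Miscellaneous Names",
}

# Assumed tag shape, following the <info section="1.1"/> proposal.
INFO_RE = re.compile(r'<info section="([^"]+)"/>')


def section_label(metaline, titles=SECTION_TITLES):
    """Return an HTML label for the entry's section, or '' if untagged."""
    m = INFO_RE.search(metaline)
    if not m:
        return ""
    sec_id = m.group(1)
    title = titles.get(sec_id)
    text = sec_id if title is None else f"{sec_id} {title}"
    return f"<b>({html.escape(text)})</b>"


label = section_label('<L>1<pc>001-a<k1>akarkara<info section="1.1"/>')
print(label)
```

The label could then be placed above or beside the entry text, whichever format is agreed on.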

gasyoun · 2021-09-27T09:45:19Z

> I guess the suggestion is that for each entry within a section, we add some markup indicating
> the section in which the entry appears.

Exactly.

> We would need to decide the exact format in displays of such 'section' text.

As in the original book: in bold and above everything else?

funderburkjim · 2021-09-27T16:12:09Z

Here is one format, in the basic display at a local installation.

Here is a second format:

I slightly favor the second format.

What do others think?
Is one of these what we should use?

Should we try another format?

Andhrabharati · 2021-09-27T16:23:02Z

The second one looks good enough; but adding the section number 1. probably makes it "complete".
(1. Names of Serpents, Birds, Animals etc.)

And a similar approach could be applied to "the numbered appendices" in a few works (like IEG, AP90, AP etc.) as well.

funderburkjim · 2021-09-27T16:31:21Z

Here's another option, per the suggestion.

funderburkjim · 2021-09-27T16:32:22Z

We could also use a similar approach for 'VN' (Corrections) in various dictionaries.

Andhrabharati · 2021-09-27T16:46:19Z

I would just like to point to what we did in a work at our site:

One need not know the Telugu script to see what is done, as highlighted in the image. Section numbers and sub-section numbers (where existing) are all indicated properly.

gasyoun added the question label Sep 26, 2014

gasyoun assigned funderburkjim Sep 26, 2014

drdhaval2785 mentioned this issue Dec 20, 2020

todo list in 2021 (in descending order of importance) sanskrit-lexicon/COLOGNE#325

Open
