Printed Book Categories Lost in OCR? #1

Open
gasyoun opened this issue Sep 26, 2014 · 9 comments

@gasyoun
Member

gasyoun commented Sep 26, 2014

To solve sanskrit-lexicon/MWS#12 I need help from abroad.
Are the MCI categories from the book kept in the OCR metadata as well?
akarkara 001-a should belong to "1.1. Names of Serpents"
agastyasya 507-a+ 36 should belong to "1.5A Names of Villages"
2387 does not tell us anything, but does 788-a+ 39 tell us enough? Does "a" stand for vol. 1? What does "+ 39" stand for?
If the data is lost, I can write out the L numbers that belong to each of the MCI printed book categories, if we can fit the data in the metadata. Otherwise, how can I get a list of all the snakes or of all the sages?

agastya

@Andhrabharati

> Are the MCI categories from the book kept in the OCR metadata as well?

They are, @gasyoun!

Here are the lines from mci.txt

Line 719: `<S>`1.1 Names of Serpents, Birds, Animals etc.
Line 9164: `<S>`1.2 Names of Missiles, Weapons, Bows etc.
Line 15770: `<S>`1.3 Names of Literary Works, Parts of Works etc.
Line 21514: `<S>`1.4 Names of Divisions of Time, Planets, Nakṣatras etc.
Line 26411: `<S>`1.5 Names of Tīrthas, Rivers, Mountains, Forests etc.
Line 46750: `<S>`1.5A Names of Āśramas, Villages, Cities etc.
Line 55376: `<S>`1.6 Names of Countries, Peoples, Islands etc.
Line 83793: `<S>`1.7 Miscellaneous Names

But they are just remaining as section (?) names `<S>` in the file; they could probably be added as some tag to the entries under those sections.
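To make the listing above reproducible, here is a minimal sketch of how the section headings could be located. It assumes that each heading in mci.txt is a line beginning with `<S>` followed by a section number such as 1.1 or 1.5A; the function name `find_sections` and the sample lines are illustrative only, not part of any existing tool.

```python
import re

# Assumption: a section heading is a line that starts with "<S>", followed by
# a section number such as "1.1" or "1.5A", then the section title.
SECTION_RE = re.compile(r"^<S>\s*(\d+\.\d+[A-Z]?)\s+(.*)$")

def find_sections(lines):
    """Return (line_number, section_id, title) for every <S> heading."""
    found = []
    for number, line in enumerate(lines, start=1):
        match = SECTION_RE.match(line.rstrip("\n"))
        if match:
            found.append((number, match.group(1), match.group(2)))
    return found

# Hypothetical sample; with the real file one would pass the opened file
# object (e.g. open("mci.txt", encoding="utf-8")) instead.
sample = [
    "<S>1.1 Names of Serpents, Birds, Animals etc.",
    "<L>1<pc>001-a<k1>akarkara",
    "<LEND>",
    "<S>1.5A Names of Āśramas, Villages, Cities etc.",
]
for number, section_id, title in find_sections(sample):
    print(number, section_id, title)
```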

@gasyoun
Member Author

gasyoun commented Sep 26, 2021

> But they are just remaining as section (?) names

So they are and are not at the same time, @Andhrabharati.
@funderburkjim, how do you like the idea of adding "some tag to the entries under those sections"?

@funderburkjim

The entries in section 1.1 go from
<L>1<pc>001-a<k1>akarkara to <L>439<pc>085-b<k1>hrAda

Then there is a section <AC>ADDENDA AND CORRIGENDA TO SECTION 1.1 (pp. 1-85)
(these lines do not contribute to the display, as they are not within <L>...<LEND>).

Then there is Section 1.2, with entries
<L>440<pc>090-a<k1>akzisaMtarjana<k2> to <L>628<pc>158-b<k1>hala
Then come the corrections for section 1.2, and presumably so on for the remaining sections.

I guess the suggestion is that for each entry within a section, we add some markup indicating
the section in which the entry appears.
For instance, we could add an info tag such as <info section="1.1"/> to each of the
headwords in section 1.1.
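A minimal sketch of how such tagging might be automated is given below. It assumes that the tag is appended to the end of each line beginning with `<L>`, and that the governing section is the most recent `<S>` heading seen; the exact placement of the tag within an entry is not settled in this thread, and the name `tag_entries` is hypothetical.

```python
import re

# Assumption: headings look like "<S>1.1 Names of ..." (see mci.txt lines above).
SECTION_RE = re.compile(r"^<S>\s*(\d+\.\d+[A-Z]?)\s")

def tag_entries(lines):
    """Append <info section="..."/> to each <L> line, using the latest <S> heading.

    Lines are expected without trailing newlines. All other lines, including
    <AC> corrigenda and <LEND>, are passed through unchanged.
    """
    current = None
    out = []
    for line in lines:
        heading = SECTION_RE.match(line)
        if heading:
            current = heading.group(1)
        elif line.startswith("<L>") and current is not None:
            # Placement at the end of the <L> line is an assumption.
            line = f'{line}<info section="{current}"/>'
        out.append(line)
    return out

# Hypothetical sample built from the entries quoted in this comment.
sample = [
    "<S>1.1 Names of Serpents, Birds, Animals etc.",
    "<L>1<pc>001-a<k1>akarkara",
    "(entry text)",
    "<LEND>",
    "<AC>ADDENDA AND CORRIGENDA TO SECTION 1.1 (pp. 1-85)",
    "<S>1.2 Names of Missiles, Weapons, Bows etc.",
    "<L>440<pc>090-a<k1>akzisaMtarjana",
    "<LEND>",
]
tagged = tag_entries(sample)
print("\n".join(tagged))
```

Because the section is carried forward from the last heading, no hand-written list of L numbers per section would be needed under these assumptions.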

One use of such a tag could be to add some text to the display of each entry.
We would need to decide the exact format in displays of such 'section' text.
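As a rough illustration of the display idea, the sketch below turns a tag of the form `<info section="1.1"/>` into a line of label text. The titles are the ones quoted from mci.txt earlier in this thread; the function name `section_label` and the label layout are assumptions, not the format actually used by the displays.

```python
import re

# Section titles as quoted from mci.txt earlier in this thread.
SECTION_TITLES = {
    "1.1": "Names of Serpents, Birds, Animals etc.",
    "1.2": "Names of Missiles, Weapons, Bows etc.",
    "1.3": "Names of Literary Works, Parts of Works etc.",
    "1.4": "Names of Divisions of Time, Planets, Nakṣatras etc.",
    "1.5": "Names of Tīrthas, Rivers, Mountains, Forests etc.",
    "1.5A": "Names of Āśramas, Villages, Cities etc.",
    "1.6": "Names of Countries, Peoples, Islands etc.",
    "1.7": "Miscellaneous Names",
}

INFO_RE = re.compile(r'<info section="([^"]+)"/>')

def section_label(entry_line):
    """Return label text such as '1.1 Names of ...', or None if the line is untagged."""
    match = INFO_RE.search(entry_line)
    if match is None:
        return None
    section_id = match.group(1)
    title = SECTION_TITLES.get(section_id)
    # Fall back to the bare section number if the title is unknown.
    return section_id if title is None else f"{section_id} {title}"

print(section_label('<L>1<pc>001-a<k1>akarkara<info section="1.1"/>'))
```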

@gasyoun
Member Author

gasyoun commented Sep 27, 2021

> I guess the suggestion is that for each entry within a section, we add some markup indicating
> the section in which the entry appears.

Exactly.

> We would need to decide the exact format in displays of such 'section' text.

As in the original book - in bold and above all?

@funderburkjim

Here is one format, in the basic display at a local installation.

image

Here is a second format:

image

I slightly favor the second format.

What do others think?
Is one of these what we should use?

Should we try another format?

@Andhrabharati

The second one looks good enough, but adding the section number 1. would probably make it "complete".
(1. Names of Serpents, Birds, Animals etc.)

And a similar approach could be applied to "the numbered appendices" in a few works (like IEG, AP90, AP etc.) as well.

@funderburkjim

Here's another option, per the suggestion:

image

@funderburkjim

We could also use a similar approach for 'VN' (Corrections) in various dictionaries.

@Andhrabharati

Andhrabharati commented Sep 27, 2021

I would just like to give a reference to what we did in a work at our site:

image

One need not know the Telugu script to see what is done, as highlighted in the image. Section numbers and sub-section numbers (where they exist) are all indicated properly.

3 participants