ACC alternate headword patterns #21

Open
drdhaval2785 opened this issue May 18, 2017 · 11 comments

@drdhaval2785
Contributor

This issue is devoted to identifying alternate headword / embedded headword patterns in the revised acc.txt (with meta lines).

@drdhaval2785
Contributor Author

Pattern 1

if len(line.split('#}')) > 2 and u'¦' in line:

This is a reasonable estimate of two headwords on the first line of any entry.

This may miss longish alternate headwords ending on next line; a total of 1229 cases match the pattern as written.
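
For illustration, here is a minimal sketch of how the Pattern 1 test could be wrapped in a function and tried on two first lines taken from entries quoted in this thread. The function name `has_alternate_headword` is hypothetical; this is not the actual pattern_finder.py code, only the condition above made runnable (Python 3 assumed).

```python
def has_alternate_headword(line):
    """Pattern 1 test: the line has the headword separator and at least two '#}' closers."""
    # Splitting on '#}' yields more than 2 pieces only when '#}' occurs at least twice,
    # i.e. when at least two {#...#} headword spans end on this line.
    return len(line.split('#}')) > 2 and '¦' in line


sample_lines = [
    '{#aKaRqAnanda#}¦',
    '{#ajYAnaboDinI#}¦ or {#aDyAtmavidyopadeSaviDi#} or {#saMkziptavedA-#}',
]

# One boolean per line: True when the line looks like it carries an alternate headword.
flags = [has_alternate_headword(line) for line in sample_lines]
print(flags)
```

The first line has a single headword and is rejected; the second has three `#}` closers and is accepted.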

@gasyoun
Member

gasyoun commented May 18, 2017

miss longish alternate headwords ending on next line

You've got a sample?

@drdhaval2785
Contributor Author

drdhaval2785 commented May 18, 2017 via email

@drdhaval2785
Contributor Author

Pattern 2 - Author - work relationship

This is more or less peculiar to ACC.
Most likely the works should be treated as sub headwords if they don't already exist as headwords (or maybe even if they do).
Debate welcome.

<L>28<pc>1-001,2<k1>aKaRqAnanda<k2>aKaRqAnanda
{#aKaRqAnanda#}¦
<HI1>Advaitaratnakośa, vedānta. Rice 130.
<HI1>Ratnakośaṭīkā, vedānta. Rice 166.
<HI1>Mantroddhāraprakaraṇa. NW. 186.
<HI1>Mahāviṣṇupūjāpaddhati. NW. 186.
<HI1>Muktisopāna. Ben. 41.
<LEND>
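
A minimal sketch of how the works under an author entry might be pulled out, assuming that each work sits on its own `<HI1>` line and that its title runs up to the first comma or full stop. Both assumptions are heuristics read off the sample above, and the names `WORK_RE` and `works_of_author` are hypothetical. The titles come out in the transliteration used in the entry body, not in the SLP1 of the `<k1>` field; converting them is not shown.

```python
import re

# Assumed shape of a work line: '<HI1>' + title + ',' or '.' + remaining references.
# '<' is excluded so that '<HI1><symbol ...>' commentary lines are not matched.
WORK_RE = re.compile(r'^<HI1>([^,.<]+)[,.]')


def works_of_author(entry_lines):
    """Return the candidate work titles found on the <HI1> lines of one entry."""
    works = []
    for line in entry_lines:
        match = WORK_RE.match(line)
        if match:
            works.append(match.group(1).strip())
    return works


entry = [
    '<L>28<pc>1-001,2<k1>aKaRqAnanda<k2>aKaRqAnanda',
    '{#aKaRqAnanda#}¦',
    '<HI1>Advaitaratnakośa, vedānta. Rice 130.',
    '<HI1>Ratnakośaṭīkā, vedānta. Rice 166.',
    '<HI1>Mantroddhāraprakaraṇa. NW. 186.',
    '<HI1>Mahāviṣṇupūjāpaddhati. NW. 186.',
    '<HI1>Muktisopāna. Ben. 41.',
    '<LEND>',
]

works = works_of_author(entry)
for title in works:
    print(title)
```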

@drdhaval2785
Contributor Author

Pattern 3 - Book - Commentary relationship

Some commentaries are marked only by C., with no title.
But some of them are individual entries that also give the name of the commentary.

See Sauravāsanā, Kiraṇāvalī, Sūryasiddhāntodāharaṇa, Vāsanābhāṣya, Saurabhāṣya, Gūḍhārthaprakāśaka.
Many of these are not enumerated as separate headwords.
They should also form alternate headwords. This will improve the chances of identifying commentaries by their own names.

<L>38834<pc>2-175,2<k1>sUryasidDAnta<k2>sUryasidDAnta
{#sUryasidDAnta#}¦ jy. IO. 312. 454. 580. 1510 (fr.). 1844.
<>2260. 2263. Rgb. 884. 885. 887. Stein 177. Sūrya- 
[Page2-176-a+ 48]
<>siddhānte Bhūgolādhyāya and Gaṇitādhyāya (?). Peters.
<>4, 38.
<HI1><symbol n="C.">C.</symbol> Oudh XXII, 76.
<HI1><symbol n="C.">C.</symbol> Sauravāsanā by Kamalākara. Bhau Dāji 30
<>(inc). Rgb. 885.
<HI1><symbol n="C.">C.</symbol> by Caṇḍeśvarācārya. Bhau Dāji 125. Rgb. 886.
<HI1><symbol n="C.">C.</symbol> Kiraṇāvalī by Dādābhāī. IO. 1122. 2261.
<HI1><symbol n="C.">C.</symbol> Sūryasiddhāntodāharaṇa by Divākara. Oudh
<>XXII, 76.
<HI1><symbol n="C.">C.</symbol> Siddhāntadīpikā by Nārāyaṇa. CU. add. 1602.
<HI1><symbol n="C.">C.</symbol> Vāsanābhāṣya or Saurabhāṣya by Nṛsiṃha,
<>son of Kṛṣṇa Gaṇaka. IO. 1755. 2264.
<HI1><symbol n="C.">C.</symbol> by Bhaṭṭotpala. Quoted by Divākara in Prau-
<>ḍhamanoramā.
<HI1><symbol n="C.">C.</symbol> by Bhūdhara, son of Devadatta. IO. 580.
<>2262. Oudh XX, 112. 124. Stein 177.
<HI1><symbol n="C.">C.</symbol> by Yallaya. Gov. Or. Libr. Madras 109.
<HI1><symbol n="C.">C.</symbol> Gūḍhārthaprakāśaka by Raṅganātha, son of
<>Ballāla. GB. 120. IO. 454. 1844. 2263.
<>Stein 177.
<HI1><symbol n="C.">C.</symbol> Gahanārthaprakāśikā, a <symbol n="C.">C.</symbol> and udāharaṇa, by
<>Viśvanātha, son of Divākara. Devīpr. 79, 16.
<>Rgb. 807. 887. Stein 177.
<LEND>
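
One possible extraction heuristic for the named commentaries, sketched under assumptions read off this single entry: a commentary line starts with `<HI1><symbol n="C.">C.</symbol> `, a named commentary puts its title before the word "by", alternative titles are joined by " or ", and anything after a comma is not part of the title. The names `C_PREFIX` and `named_commentaries` are hypothetical, and the markup elsewhere in ACC may well break these rules.

```python
import re

C_PREFIX = '<HI1><symbol n="C.">C.</symbol> '


def named_commentaries(entry_lines):
    """Collect commentary titles from 'C. <title> by <author>' lines (heuristic)."""
    titles = []
    for line in entry_lines:
        if not line.startswith(C_PREFIX):
            continue
        rest = line[len(C_PREFIX):]
        if rest.startswith('by '):
            continue  # unnamed commentary: 'C. by <author>'
        match = re.match(r'(.+?)\s+by\b', rest)
        if not match:
            continue  # no 'by' on the line, e.g. a bare catalogue reference
        head = match.group(1).split(',')[0]  # drop trailing descriptions after a comma
        titles.extend(part.strip() for part in head.split(' or '))
    return titles


lines = [
    '<HI1><symbol n="C.">C.</symbol> Oudh XXII, 76.',
    '<HI1><symbol n="C.">C.</symbol> Sauravāsanā by Kamalākara. Bhau Dāji 30',
    '<HI1><symbol n="C.">C.</symbol> by Caṇḍeśvarācārya. Bhau Dāji 125. Rgb. 886.',
    '<HI1><symbol n="C.">C.</symbol> Vāsanābhāṣya or Saurabhāṣya by Nṛsiṃha,',
    '<HI1><symbol n="C.">C.</symbol> by Bhaṭṭotpala. Quoted by Divākara in Prau-',
    '<HI1><symbol n="C.">C.</symbol> Gahanārthaprakāśikā, a <symbol n="C.">C.</symbol> and udāharaṇa, by',
]

titles = named_commentaries(lines)
for title in titles:
    print(title)
```

On these six lines the sketch keeps four titles and skips the unnamed commentaries. The explicit check for a leading "by " matters: without it, the Bhaṭṭotpala line would yield a false title because of the later "Quoted by".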

@gasyoun
Member

gasyoun commented May 18, 2017

Most likely the works should be treated as sub headwords

Makes sense.

They should also form alternate headwords.

What's the extraction algo?

@drdhaval2785
Contributor Author

@funderburkjim and @gasyoun, Pattern 1 is implemented.
A total of 1591 alternate headwords were added by this method.
They have been put into the format described for acc_hwextra.txt in this comment: sanskrit-lexicon/COLOGNE#133 (comment).

The code is on the dev server at pywork/correctionwork/issue-alt-21/pattern_finder.py

The output is altlines.txt.

On my local machine the alternates are showing properly. So Jim's modifications to the redo_hw.sh and redo_xml.sh scripts are working fine as usual.

@drdhaval2785
Contributor Author

drdhaval2785 commented May 21, 2017

longish alternate headwords ending on next line

There are many such cases, @gasyoun. The code can now handle them properly, and the total count is now 1591. Therefore there are 362 (1591-1229) such cases where this line-crossing phenomenon happened.

See e.g.

<L>236<pc>1-006,1<k1>ajYAnaboDinI<k2>ajYAnaboDinI
{#ajYAnaboDinI#}¦ or {#aDyAtmavidyopadeSaviDi#} or {#saMkziptavedA-#}
<>{#ntaSAstraprakriyA,#} a <symbol n="C.">C.</symbol> on the Ātmabodha, by Śaṅka-
<>rācārya. IO. 100. Paris (B 159 c. D 57 b). Hall
<>p. 105. L. 678. Bik. 554. K. 112. B. 4, 36. 38.
<>Report XXVII. Ben. 69. 81. Rādh 5. Oudh V, 22.
<>NP. V, 170. Poona 43. Peters. 3, 391.
<HI1><symbol n="C.">C.</symbol> by Amṛtānanda. K. 112.
<LEND>

Here the headword {#saMkziptavedA-#}\n<>{#ntaSAstraprakriyA,#} continues onto the second line.
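
A minimal sketch of one way such line-crossing headwords could be rejoined, assuming the break always has the shape `-#}` at the end of one line and `<>{#` at the start of the next, and that the hyphen is only a line-break hyphen. The function name `headwords_with_continuation` is hypothetical; this is not the code in pattern_finder.py.

```python
import re


def headwords_with_continuation(first_line, next_line):
    """Return all {#...#} headwords on two lines, rejoining one split across the break."""
    text = first_line.rstrip('\n') + '\n' + next_line
    # Merge '{#abc-#}' + newline + '<>{#def#}' into '{#abcdef#}'.
    text = re.sub(r'-#\}\n<>\{#', '', text)
    found = re.findall(r'\{#(.*?)#\}', text)
    # Trailing punctuation inside the braces is not part of the headword.
    return [hw.rstrip(',. ') for hw in found]


line1 = '{#ajYAnaboDinI#}¦ or {#aDyAtmavidyopadeSaviDi#} or {#saMkziptavedA-#}'
line2 = '<>{#ntaSAstraprakriyA,#} a <symbol n="C.">C.</symbol> on the Ātmabodha, by Śaṅka-'

hws = headwords_with_continuation(line1, line2)
print(hws)
# ['ajYAnaboDinI', 'aDyAtmavidyopadeSaviDi', 'saMkziptavedAntaSAstraprakriyA']
```

The first item is the main headword; the remaining items are the alternates.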

@drdhaval2785
Contributor Author

Patterns 2 and 3 can't be fully segregated. If members are interested in this direction, I may try.
There is too much markup mess there.

@gasyoun
Member

gasyoun commented May 21, 2017

362 (1591-1229) such cases

Indeed many. Patterns 2 and 3 may be of interest if a general approach can be developed. If it's unique to ACC, it is not urgent. Otherwise, if it can help change the way we work with all the other dictionaries, then it's the real difference between a book and a digital copy: hyperlinks.

@drdhaval2785
Contributor Author

sanskrit-lexicon/COLOGNE#316 is the placeholder.
