ACC alternate headword patterns #21
Comments
Pattern 1
This is a reasonable estimate of the entries that have two headwords on the first line. It may miss longer alternate headwords that end on the next line, but a total of 1229 first-line cases exist. |
You've got a sample? |
You've got a sample?
Just a possibility. Not sure whether any instance exists or not.
|
Pattern 2 - Author - work relationship
This is more or less peculiar to ACC.
|
Pattern 3 - Book - Commentary relationship
There are some items which are just mentioned by C. See Sauravāsanā, Kiraṇāvalī, Sūryasiddhāntodāharaṇa, Vāsanābhāṣya, Saurabhāṣya, Gūḍhārthaprakāśaka.
|
Makes sense.
What's the extraction algo? |
@funderburkjim and @gasyoun Pattern 1 is implemented. The code is on the dev server at pywork/correctionwork/issue-alt-21/pattern_finder.py. The output is altlines.txt. On my local machine the alternates are showing properly, so Jim's modifications to the redo_hw.sh and redo_xml.sh scripts are working fine as usual. |
There are many such cases, @gasyoun. The code can now handle them properly, and the total count is now 1591. Therefore there are 362 (1591 - 1229) cases where the alternate headword crosses onto the next line. See e.g.
|
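For readers without access to the dev server, the first-line and line-crossing cases discussed above can be sketched roughly as follows. This is not the code in pattern_finder.py, only an illustration under stated assumptions: entries are assumed to start with a meta line beginning with `<L>` (carrying a `<k1>` key) and to end with `<LEND>`, and the `{#...#}` headword markup followed by `or` is a hypothetical stand-in for whatever Pattern 1 actually matches. The function names and the sample entries are made up for the example.

```python
import re

# Hypothetical markup (an assumption, not confirmed for acc.txt): the headword
# and its alternate are each wrapped in {#...#} and joined by the word "or".
ALT_RE = re.compile(r"^\{#([^#]+)#\}¦?\s+or\s+\{#([^#]+)#\}")
K1_RE = re.compile(r"<k1>([^<]*)")


def iter_entries(lines):
    """Yield (meta_line, body_lines) for each <L> ... <LEND> block."""
    meta, body = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("<L>"):
            meta, body = line, []
        elif line.startswith("<LEND>"):
            if meta is not None:
                yield meta, body
            meta, body = None, []
        elif meta is not None:
            body.append(line)


def find_alternates(lines):
    """Return (k1, headword, alternate, crossed_line) for each matching entry."""
    results = []
    for meta, body in iter_entries(lines):
        if not body:
            continue
        match = ALT_RE.match(body[0])
        crossed = False
        if match is None and len(body) > 1:
            # Line-crossing case: retry on the first two body lines joined.
            match = ALT_RE.match(body[0] + " " + body[1])
            crossed = match is not None
        if match:
            k1 = K1_RE.search(meta)
            results.append(
                (k1.group(1) if k1 else "", match.group(1), match.group(2), crossed)
            )
    return results


# Made-up sample entries, for illustration only.
sample = [
    "<L>1<pc>1<k1>hwa<k2>hwa",
    "{#hwa#}¦ or {#hwb#} some text.",
    "<LEND>",
    "<L>2<pc>1<k1>hwc<k2>hwc",
    "{#hwc#}¦ or",
    "{#hwd#} some text.",
    "<LEND>",
    "<L>3<pc>1<k1>hwe<k2>hwe",
    "{#hwe#}¦ some text.",
    "<LEND>",
]

hits = find_alternates(sample)
first_line = sum(1 for h in hits if not h[3])
line_crossing = sum(1 for h in hits if h[3])
print(len(hits), first_line, line_crossing)
```

Counting the two cases separately mirrors the arithmetic in the comment above: the total (1591) is the first-line count (1229) plus the line-crossing count (362).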
Patterns 2 and 3 can't be fully segregated. If members are interested in this direction, I may try. |
Indeed many. Patterns 2 and 3 may be of interest if a general approach can be developed. If it is unique to ACC, it is not urgent. Otherwise, if it can help change the way we work with all the other dictionaries, then it marks the real difference between a book and a digital copy: hyperlinks. |
sanskrit-lexicon/COLOGNE#316 is the placeholder. |
This issue is devoted to bringing out alternate headword / embedded headword patterns from the revised acc.txt (with meta lines).