Embedded additional headwords #5

Open
funderburkjim opened this issue Oct 5, 2016 · 2 comments

@funderburkjim
Contributor

funderburkjim commented Oct 5, 2016

I'm not sure whether you want to consider this topic within this repository.

Many dictionaries have additional headwords 'embedded' within the given entries of the dictionary.

For example, many dictionaries have prefixed verbs indicated by various prefix sections within the body
of the entry for the verb.

Other dictionaries, like STC, I think, have many compounds indicated only as sub-entries under a
major headword.

So the idea for consideration is to identify these embedded headwords.

@gasyoun
Member

gasyoun commented Oct 5, 2016

> For example, many dictionaries have prefixed verbs indicated by various prefix sections within the body
> of the entry for the verb.

Yes, sure. Dhaval has asked you to look at PWG and PWK to see which dhatus (nI is a good example) take which upasargas. Sandhi rules apply in many cases, so it's not purely mechanical, and @drdhaval2785's code might help. And yes, I think the issue is at home here.
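
To illustrate why joining an upasarga to a dhatu is not plain concatenation, here is a minimal sketch, assuming SLP1 transliteration. The function `join_upasarga` and its rule table are hypothetical and cover only a handful of sandhi cases; anything outside them falls back to simple concatenation, which will be wrong in many real cases (for example, the retroflexion that gives praRI from pra + nI is not handled).

```python
# Hypothetical helper, not taken from any existing script in the repository.
# Joins an upasarga to a dhatu (both in SLP1) with a deliberately tiny
# subset of sandhi rules. Both arguments are assumed to be non-empty.

VOWELS = set("aAiIuUfFxXeEoO")

# Result of a final a/A meeting an initial simple vowel (a/A, i/I, u/U only).
A_PLUS = {"a": "A", "A": "A", "i": "e", "I": "e", "u": "o", "U": "o"}


def join_upasarga(upasarga: str, dhatu: str) -> str:
    last, first = upasarga[-1], dhatu[0]
    stem, rest = upasarga[:-1], dhatu[1:]

    if first in VOWELS:
        # a/A + a/A -> A, a/A + i/I -> e, a/A + u/U -> o
        if last in "aA" and first in A_PLUS:
            return stem + A_PLUS[first] + rest
        # i/I + i/I -> I, otherwise i/I becomes y before the vowel
        if last in "iI":
            return stem + ("I" + rest if first in "iI" else "y" + dhatu)
        # u/U + u/U -> U, otherwise u/U becomes v before the vowel
        if last in "uU":
            return stem + ("U" + rest if first in "uU" else "v" + dhatu)
    elif last == "d" and first == "n":
        # final d assimilates to a following n (ud + nI -> unnI)
        return stem + "n" + dhatu

    # Everything else: plain concatenation (often NOT the correct form).
    return upasarga + dhatu


examples = [("ud", "nI"), ("vi", "nI"), ("aDi", "i"), ("anu", "i"), ("ava", "i")]
joined = [join_upasarga(u, d) for u, d in examples]
```

With these rules, `joined` holds the forms unnI, vinI, aDI, anvi and ave. A real implementation would need the full rule set, or a lookup against the headwords actually attested in the dictionaries.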

@drdhaval2785
Contributor

https://github.com/sanskrit-lexicon/alternateheadwords/blob/master/scripts/embedded.py is the script used to scrape embedded items out of the .txt files.
Right now it works only for STC and PWG, but it is generic enough to apply to other dictionaries too.
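
The linked script is not reproduced here. As a rough illustration of the general idea only, the sketch below scans a toy plain-text layout for prefix sections. Everything about the layout is an assumption made for the example: entries start with a hypothetical `<H>headword` line, and prefix sections inside the body are introduced by `-- {prefix}`. The real .txt markup differs per dictionary, and embedded.py may work quite differently.

```python
import re
from collections import defaultdict

# Hypothetical markers for a toy layout; NOT the real dictionary markup.
HEADWORD_RE = re.compile(r"^<H>(\S+)")
PREFIX_RE = re.compile(r"--\s*\{(\w+)\}")


def find_embedded_prefixes(lines):
    """Map each headword to the prefixes found in its body, in order of appearance."""
    found = defaultdict(list)
    current = None
    for line in lines:
        m = HEADWORD_RE.match(line)
        if m:
            # A new entry starts; later lines belong to this headword.
            current = m.group(1)
            continue
        if current is None:
            # Skip any text that appears before the first headword.
            continue
        for prefix in PREFIX_RE.findall(line):
            if prefix not in found[current]:
                found[current].append(prefix)
    return dict(found)


# Made-up sample data in the toy layout.
sample = [
    "<H>nI",
    "to lead, guide",
    "-- {A} to bring",
    "-- {ud} to lead up -- {vi} to lead away",
    "<H>nIla",
    "dark blue",
]
embedded = find_embedded_prefixes(sample)
```

Here `embedded` maps nI to the prefixes A, ud and vi, while nIla is absent because its body contains no prefix section. Turning each (prefix, headword) pair into an actual embedded headword would still need sandhi handling.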
