AS to IAST / SLP1 for all dicts #110
The letter-number system (AS) is used only as a way to represent printed text that is composed of the Latin alphabet with diacritics. It is language-agnostic: it can represent Sanskrit words, French words, German words, etc. When representing Sanskrit words, texts differ in how they render them using Latin letters with diacritics. It is reasonable to have the primary form of the dictionary use modern IAST conventions as the way to represent Sanskrit words in Latin-alphabet-with-diacritics. That's the principle I'm using with these dictionaries: IAST is better than AS, and modern IAST is better than historical IAST. In the Monier Williams dictionary, I took an approach closer to what you are suggesting, I think; namely,
Thus it would be possible to produce a version of other dictionaries (say MD, ACC), where the
I don't plan to undertake this IAST->SLP1 conversion of Sanskrit words any time soon. Conversion to IAST is a big enough task for me now. But if you want to tackle this for a particular dictionary, I'll work with you if you like. |
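For anyone who does want to tackle it, the IAST->SLP1 step for a single word can be sketched roughly as below. This is only a minimal illustration, not the code used for any of the dictionaries: the function name `iast_to_slp1` and the table name are made up here, and the sketch assumes the input is already clean modern IAST. It lowercases the input (SLP1 uses letter case to distinguish sounds) and treats every `ai`/`au` as a diphthong, so a genuine hiatus would need manual handling.

```python
import unicodedata

# IAST -> SLP1 for every unit that differs between the two schemes.
# Letters not listed (a, i, u, e, o, k, g, c, j, t, d, n, p, b, m, y, r, l, v, s, h)
# are the same in both and are copied through unchanged.
_RAW_TABLE = {
    "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "ṭh": "W", "ḍh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "f", "ṝ": "F", "ḷ": "x", "ḹ": "X",
    "ṃ": "M", "ḥ": "H", "ṅ": "N", "ñ": "Y", "ṭ": "w", "ḍ": "q", "ṇ": "R",
    "ś": "S", "ṣ": "z",
}
# Normalize keys so they match NFC-normalized input regardless of how this file was typed.
IAST_TO_SLP1 = {unicodedata.normalize("NFC", k): v for k, v in _RAW_TABLE.items()}


def iast_to_slp1(text):
    """Transliterate one IAST string to SLP1 by greedy longest match (hypothetical helper)."""
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in IAST_TO_SLP1:          # two-character units first: aspirates, ai, au
            out.append(IAST_TO_SLP1[pair])
            i += 2
        elif text[i] in IAST_TO_SLP1:     # single letters with diacritics
            out.append(IAST_TO_SLP1[text[i]])
            i += 1
        else:                             # identical in both schemes, or not Sanskrit at all
            out.append(text[i])
            i += 1
    return "".join(out)


print(iast_to_slp1("saṃskṛta"))
```

The hard part is not this mapping but knowing which words in the dictionary text to apply it to, since running it over English or German text would corrupt it.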
Too big and no actual need. Time will come. |
09/02/2017: Retire this table in favor of the one at #177. Here is the status of AS/IAST conversion for the various dictionaries:
|
A daunting amount of work to do, but it seems worthwhile to convert all the AS coding:
I'll fill in the table above as progress is made. |
Yeah, it's a hunt. Most wanted:
|
It is a dubious honor to be the only assignee. |
Yeah, let's give @drdhaval2785 or @vvasuki a try 👍 |
A separate question: are all Sanskrit words clearly identified in the XMLs?
(I must decline, @gasyoun - far too occupied by separate sanskrit projects.) |
No, and they will not be unless you invent a regex and do manual cleanup afterwards. |
@funderburkjim At the very least, the words you do convert to IAST or SLP1 should be marked up (identifiable as sanskrit words). Reason: downstream users would want to see those words in the script of their convenience for easy reading/ lookup. |
Easier said than done. It would be good to have all the IAST sanskrit words identified as such, which is your suggestion. I'll put this in my todo list. Words which appear in Devanagari in a printed text are generally identifiable (with an Sanskrit words which appear in the text as IAST are the problem. The problem is distinguishing such This IAST word classification has been done only for MW. |
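To make the difficulty concrete, here is a rough first-pass filter of the kind a regex approach would start from. It is only a sketch under stated assumptions: the name `iast_candidates` is hypothetical, and the heuristic (flag any word containing a letter with an IAST diacritic) is an illustration, not the method used for MW.

```python
import re
import unicodedata

# Letters with diacritics used by modern IAST (lowercase forms).
IAST_MARKED = set(unicodedata.normalize("NFC", "āīūṛṝḷḹṃḥṅñṭḍṇśṣ"))

# Runs of Unicode letters: word characters minus digits and underscore.
WORD = re.compile(r"[^\W\d_]+")


def iast_candidates(text):
    """Return words containing at least one IAST-marked letter (hypothetical helper)."""
    text = unicodedata.normalize("NFC", text)
    return [
        word
        for word in WORD.findall(text)
        if any(ch in IAST_MARKED for ch in word.lower())
    ]


sample = "the word dharma and kṛṣṇa, also Śiva; cf. French été"
print(iast_candidates(sample))
```

On this sample the filter picks up `kṛṣṇa` and `Śiva` but misses `dharma`, which has no diacritics, and it would wrongly flag a Spanish word containing `ñ`. Both kinds of error are why manual cleanup would still be needed after any such pass.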
Oh, right, that's feasible. |
I will try it. Will post here if I do something substantial in this direction. |
And there we have the best Sanskrit dictionaries to make the distinction :-) . Even if they're not covered (now / in the near future), it's still OK to mark off just the ones we do end up converting to IAST / SLP as Indic. It will help the final user downstream to that extent.
I'd say check the prefix minus the terminal s and mark it as an indic word. But that's not so important as these are few. |
Hundreds, and just regexing will not help, since more variants occur. And as it's of no priority for Jim now, let's leave it. There should be an Indian interested in weeding out the words. All we have tried to do for the last 3 years, @vvasuki, was at least to clean the headwords. In 2-3 years we will be there. The amount of work done is about 1/10 of what has to be done inside the dictionaries. But as I admire what Jim does a lot, I would not want him to do what others can, only what he is best at. That's my take and I'll stand by it. Headwords first. Additional markup: let India wake up and tell us when she is ready for some Sanskrit NLP. It's just about time. Otherwise the best research on Sanskrit for the last 200 years has been done outside India. |
You certainly know how to push the right buttons! But I'll let that pass considering the source, time and place :-)
Of course, it's up for Jim to decide (and I'm not insisting) - you've made your opinion clear. |
In 2006 there was nothing. Not even PWG was online. In 10 years a lot has changed. But it's sad to see that the role of people from India (other than Dhaval) is so small. What I see is that a single person can do as much as an academic institute. It's a pity to see such conditions. |
There are proper times, places and forms to express such sadness and examine the causes. This certainly isn't one of them. What's the relevance of these notes to the task at hand? You are not going to "guilt" or irritate Indians into changing their priorities and jumping in by noting such things here (of all places). In any case, do note that all these dictionaries were manually typed in the first place by Indians paid by Europeans.
Again, this is neither relevant to the current issue nor helpful. Which academic institute will change because of your comment here? Or can we look for some engineering insight hidden in this unlikely source? I am all for praising Dhaval, but I think he would take Manu quite seriously: "सम्मानाद्ब्राह्मणो नित्य- मुद्विजेत विषादिव । अमृतस्येव चाकाङ्क्षे- दवमानस्य सर्वदा ।।" |
Seriously, if you want to increase Indian participation, write an email to sanskrit-programmers linking to various issues in these projects where they can contribute to and invite contribution by python programmers (without absurdly insulting Indian scholarship along the way). That's far more likely to be productive. |
I've not seen anything big enough, some small projects and that's it. People code as a hobby and only what they like. These tasks are bigger than just hobby. Can you document what you see, can I ask you for a favor? You know the coders, I do not. |
And people such as those here do not? And people there would not like contributing here? People should indeed do what they truly think is important and enjoyable for themselves. Culture need not be advanced by the miserable. It is quite arrogant to think that others ought to share your priorities (smacks of an extension of the classic "white man's burden" ).
There are tasks that are bigger than just hobby, but it is false that hobbyists cannot make significant contributions here.
No - I don't know the active coders - same as you. Just shrIvatsa, who is likely to pass. If you're too busy to write the email, I can, of course.
If it were not for such Indians, there would be nobody who typed as well.
I know that you know, and you know that I know that you know. Just bringing the picture into "the picture" and clearing selective amnesia. |
Will use table at #177 and retire above table. |
I am for SLP1.
But if conversion or identification is difficult, IAST will also do.
For all dict.xml please. This will bring uniformity to all the XMLs. The need for separate display tools for different dictionaries can be reduced.