PWK Devanagari Commas #102
Comments
@funderburkjim are we on the same page?
Devanagari commas are a special case of Devanagari punctuation. In the original digitizations from Thomas's group, this detail was not handled. Here is an excerpt from the entry for 'ca' in PW, taken from pw.txt:
The particular part you were looking at is
#{cEva , cEva - cEva , cEva - ca , ca - cEva , 'cEva hi}
It would be possible for a program to separate this out as:
#{cEva} , #{cEva}- #{cEva} , #{cEva} - #{ca}, #{ca} - #{cEva} , #{'cEva} #{hi}
But is it worth the effort to do this, and to do it so as not to introduce some unexpected errors?
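For illustration only, the separation described above could be sketched roughly as follows. This is a minimal sketch, not a tested fix for pw.txt: the function name `split_devanagari_spans` is hypothetical, and it assumes that `#{...}` spans never nest, that every comma inside a span is European punctuation, and that a hyphen counts as punctuation only when it has whitespace on both sides. Unlike the hand-written result above, it keeps the original spacing around each separator.

```python
import re

# A #{...} span marks text to be rendered in Devanagari (assumed non-nested).
DEVA_SPAN = re.compile(r"#\{([^{}]*)\}")

# Separators to move outside the span: a comma (with any surrounding
# whitespace), a hyphen standing alone between whitespace, or plain whitespace.
SEPARATOR = re.compile(r"(\s*,\s*|\s+-\s+|\s+)")


def _split_one_span(match):
    """Rewrite one #{...} span so that each word gets its own span."""
    parts = SEPARATOR.split(match.group(1))
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions are the captured separators: keep them as-is,
            # outside any #{...} markup.
            out.append(part)
        elif part:
            # Even positions are words: wrap each one individually.
            out.append("#{" + part + "}")
    return "".join(out)


def split_devanagari_spans(line):
    """Apply the split to every #{...} span in a line of text."""
    return DEVA_SPAN.sub(_split_one_span, line)


if __name__ == "__main__":
    sample = "#{cEva , cEva - cEva , cEva - ca , ca - cEva , 'cEva hi}"
    print(split_devanagari_spans(sample))
```

A hyphen with no surrounding whitespace, as in a hypothetical `#{a-b}`, is left untouched, which is one way to limit the risk of unexpected errors; whether that rule matches every case in the actual data would need to be checked.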
Sure, I'm afraid of the unexpected errors as well. I think it's worth it only if we want to make a one-to-one replica. Do we want to, Jim?
I don't see Devanagari commas as a problem. Until we identify Devanagari commas (or other European punctuation within the scope of Devanagari text) as a substantive problem, I don't think we should try to fix it.
I don't see them as a real problem either.
@jmigliori has made me aware of http://www.sanskrit-lexicon.uni-koeln.de/scans/PWScan/2014/web/webtc/indexcaller.php?input=slp1&output=deva&key=ca with its
चैव - च , च - चैव ,
The commas in the digital text are rendered as "Devanagari commas", but they should not be; they should be Roman text commas everywhere. @funderburkjim Any idea how to fix it without weeks of regex blundering?