PWK Devanagari Commas #102

Open
gasyoun opened this issue Apr 20, 2015 · 5 comments

@gasyoun
Member

gasyoun commented Apr 20, 2015

@jmigliori has made me aware of http://www.sanskrit-lexicon.uni-koeln.de/scans/PWScan/2014/web/webtc/indexcaller.php?input=slp1&output=deva&key=ca and its sequences such as "चैव - च , च - चैव ,". The commas in the digital text are "Devanagari commas", but they should not be: they should be Roman-text commas everywhere. @funderburkjim, any idea how to fix this without weeks of regex blundering?

@gasyoun
Member Author

gasyoun commented Oct 30, 2015

@funderburkjim are we on the same page?

@funderburkjim
Contributor

Devanagari commas are a special case of Devanagari punctuation. In the original digitizations from Thomas's group, this detail was not handled. Here is an excerpt from the entry for 'ca' in PW, taken from pw.txt:

<H1>001{ca}1{ca}^1¦ ‹Conj.› ²1) {%und , auch , |<g>te</g> , , que.%} ‹Steht hinter beiden zu 
verbindenden Theilen (oft durch› {%sowohl - als auch%} ‹wiederzugeben) , nur nach dem letzten 
oder nur nach dem ersten. Bei drei und mehr zu verbindenden Theilen überall , nur nach dem 
letzten oder hier und da. In gebundener Rede steht› #{ca} {%bisweilen an unrichtiger Stelle%} ‹(› #
{vEcitryaM nItividyAM dadAti ca} ‹st.› #{ca da°}) ‹und auch müssig. Werden zwei Sätze durch 
wiederholtes› #{ca} ‹verbunden , so hat das erste Verbum finitum den Ton.› #{ca} ‹--› #{ca} ‹in 
einem negativen Satze› {%weder -noch%} ; #{na Kalu ca - nEva} ‹dass. Grammatiker , 
''Lexicographen und Erklärer gebrauchen› #{ca} ‹oft elliptisch› {%(auch ‹so v.a.› dieses und noch 
'Anderes)%}. ‹In Verbindung mit andern Partikeln› ; #{cEva , cEva - cEva , cEva - ca , ca - cEva , 
'cEva hi} ‹(am Ende eines Halbverses)› , #{cApi , ca - cApi , cApi - ca , api ca , na - na - api ca} ‹(mit 
'fehlender Negation)› , #{na - na cApi , api cEva , cEvApi , ca taTA , taTA ca , taTEva ca}. ²2) #{ca - 
'tu} = #{ca - ca} {%sowohl - als auch%} ¯110,13. ²3) {%oder%} , ‹mit› #{vA} ‹wechselnd oder dessen 
'Stelle vertretend (nach› #{utAho})

The particular part you were looking at is

#{cEva , cEva - cEva , cEva - ca , ca - cEva , 'cEva hi}

It would be possible for a program to separate this out as:

#{cEva} , #{cEva} - #{cEva} , #{cEva} - #{ca} , #{ca} - #{cEva} , #{'cEva} #{hi}

But is it worth the effort to do this, and to do it so as not to introduce some unexpected errors?
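For illustration, the separation described above might be sketched as a single regex substitution. This is a minimal sketch, not the project's actual tooling, and it assumes: (1) Devanagari text in pw.txt is enclosed in #{...}, as in the 'ca' excerpt; (2) a whitespace-separated token consisting only of , ; : . or - is European punctuation that belongs outside the markup; and (3) spans with no such token (e.g. #{vEcitryaM nItividyAM dadAti ca}) should be left untouched. The function names are hypothetical.

```python
import re

# Assumption: Devanagari text in pw.txt is marked as #{...}.
DEVA_SPAN = re.compile(r"#\{([^}]*)\}")
# Hypothetical rule: a token made only of these characters is European punctuation.
PUNCT_TOKEN = re.compile(r"[,;:.\-]+")


def split_deva_span(match: re.Match) -> str:
    """Rewrite one #{...} span so punctuation tokens fall outside the markup."""
    tokens = match.group(1).split()
    if not any(PUNCT_TOKEN.fullmatch(t) for t in tokens):
        # No standalone punctuation: return the span exactly as it was.
        return match.group(0)
    # Wrap each word in its own #{...}; keep punctuation tokens bare.
    pieces = [t if PUNCT_TOKEN.fullmatch(t) else "#{" + t + "}" for t in tokens]
    # Note: a span that crossed a line break is rejoined onto one line here.
    return " ".join(pieces)


def move_punctuation_out(text: str) -> str:
    """Apply split_deva_span to every #{...} span in a line or entry."""
    return DEVA_SPAN.sub(split_deva_span, text)


if __name__ == "__main__":
    sample = "#{cEva , cEva - cEva , cEva - ca , ca - cEva , 'cEva hi}"
    print(move_punctuation_out(sample))
    # prints: #{cEva} , #{cEva} - #{cEva} , #{cEva} - #{ca} , #{ca} - #{cEva} , #{'cEva} #{hi}
```

Wrapping every word separately mirrors the proposed output above. The "unexpected errors" risk would still need checking against the whole file, for instance tokens like da° or spans that break across lines.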

@gasyoun
Member Author

gasyoun commented Nov 3, 2015

Sure, I'm afraid of unexpected errors as well. I think it's worth doing only if we want to make a one-to-one replica. Do we want to, Jim?

@funderburkjim
Contributor

I don't see Devanagari commas as a problem. Until we identify Devanagari commas (or other European punctuation within the scope of Devanagari text) as a substantive problem, I don't think we should try to fix it.
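One way to judge whether the problem is substantive might be to count how many #{...} spans actually contain European punctuation. This is a minimal sketch under the same assumptions as the excerpt suggests: Devanagari is marked as #{...} in pw.txt, and a punctuation token is a whitespace-separated run of , ; : . or -. The function name and the file path in the comment are hypothetical.

```python
import re
from collections import Counter
from typing import Tuple

# Assumption: Devanagari text in pw.txt is marked as #{...}.
DEVA_SPAN = re.compile(r"#\{([^}]*)\}")
# Hypothetical rule: a token made only of these characters is European punctuation.
PUNCT_TOKEN = re.compile(r"[,;:.\-]+")


def survey_punctuation(text: str) -> Tuple[int, Counter]:
    """Return (number of #{...} spans containing punctuation, counts per punctuation token)."""
    spans_affected = 0
    counts: Counter = Counter()
    for m in DEVA_SPAN.finditer(text):
        puncts = [t for t in m.group(1).split() if PUNCT_TOKEN.fullmatch(t)]
        if puncts:
            spans_affected += 1
            counts.update(puncts)
    return spans_affected, counts


if __name__ == "__main__":
    # For a real survey one might read the whole file, e.g.
    # open("pw.txt", encoding="utf-8").read()  (hypothetical path)
    sample = "#{cEva , cEva - cEva , cEva - ca , ca - cEva , 'cEva hi} #{ca da°}"
    print(survey_punctuation(sample))
```

On the sample, only the first span is counted (4 commas, 3 hyphens); #{ca da°} contains no standalone punctuation token.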

@drdhaval2785
Contributor

I don't see them as a real problem either.
Deep fridge.
