meta-line, IAST conversion tracker #177

drdhaval2785 · 2017-09-02T07:54:56Z

This issue tracks the progress of the meta-line and IAST conversions for the different dictionaries.
In #110 (comment), @funderburkjim gave some status. That was in a rather nondescript issue, so I am tagging it here lest we lose track of it.

Dict	IAST	meta-line
ACC	20 May 2017 approx.	20 May 2017 approx.
AE	03/13/2018	#217 03/13/2018
AP90	#159	#158 06/30/2017
AP	April 2017	#162 07/11/2017
BEN	#112	04/10/2017
BHS	01/30/2018	#201 01/30/2018
BOP	02/02/2018	#202 02/02/2018
BOR	02/23/2018	#213 02/23/2018
BUR	#105 April 2017	#166 07/28/2017
CAE	#191 Oct 2017	#191 10/30/2017
CCS	#198 12/28/2017	#198 12/28/2017
GRA	#199 01/24/2018	01/24/2018
GST	02/06/2018	#204 02/06/2018
IEG	02/08/2018	#205 02/08/2018
INM	02/11/2018	#206 02/11/2018
KRM	01/28/2018	#200 01/28/2018
MCI	02/11/2018	#207 02/11/2018
MD	#103 03/27/2017	#161 07/07/2017
MW72	03/09/2018	#215 03/09/2018
MW	06/19/2018	#216
MWE	02/26/2018	#214 02/26/2018
PD	02/04/2018	#203 02/04/2018
PE	02/13/2018	#208 02/13/2018
PGN	02/15/2018	#209 02/15/2018
PUI	02/17/2018	#210 02/17/2018
PWG	#190 12/16/2017	12/16/2017
PW	#183 10/17/2017	10/17/2017
SCH	27 Apr 2017	22 June 2017
SHS	#170 08/06/2017	#172 08/08/2017
SKD	No conversion needed.	#176 08/24/2017
SNP	02/19/2018	#212 02/19/2018
STC	09/16/2017	#182 09/16/2017
VCP	No conversion needed.	#173 08/16/2017
VEI	02/18/2018	#211 02/18/2018
WIL	#154 20 June 2017	#154 20 June 2017
YAT	#154 31 May 2017	#154 31 May 2017

funderburkjim · 2017-09-02T19:51:40Z

Added some missing dates to table above based on #110.

Will use table here for further progress.

gasyoun · 2017-09-02T20:04:20Z

pwg, pw, gra, cae, ccs should come next in order of importance, IMHO.

funderburkjim · 2017-09-02T20:45:41Z

I want to do stc next, so we'll have both the Sanskrit-French dictionaries done.

Although pwg and pw are more important dictionaries than cae and ccs, pwg and pw are also going to be much harder to do (I think), so my tendency is to do cae and ccs next, and then pwg and pw.

I anticipate that gra is going to be very hard from the IAST conversion point of view (lots of accents), so I want to put it off.

Meta-line conversion for AE will be done AFTER Sampada is finished. Reason: once the meta-line conversion is done, generating the files for the UI she is using would require reprogramming.

For MW72, kind of want Jonathan's Greek entry to be done first -- but may need to do conversion before he has time to complete his work.

funderburkjim · 2017-12-17T01:52:40Z

With the PW/PWG dictionaries now done, my intention is to do the meta-line/iast conversion on the other dictionaries, but in a more cursory manner. I think it is important to have all the dictionaries in this form, so that the display and maintenance (correction) code can then be more uniform.

I'm still not sure how to fit MW into this mold, but it should not be forever different, even though it
will retain its place as the original instance of this Sanskrit Dictionary digitization project.

gasyoun · 2017-12-17T17:28:33Z

I'm still not sure how to fit MW into this mold, but it should not be forever different

Exactly and the fact remains - it's the most popular as well.

Another question. Right now there is no single dictionary file where all the corrections are incorporated, right? It's always generated on the fly for web, based on several files, right? Can we have a cleaned .xml as well, Jim?

funderburkjim · 2017-12-17T20:37:21Z

there is no single dictionary file where all the corrections are incorporated

If I understand the question, then this statement is not right.

The corrections ARE incorporated into X.txt and X.xml.

For example, consider PWG dictionary. There are numerous versions of the digitization,
starting with pwg. These versions are kept in the 'orig' (for 'original') directory.

pwg.txt      Always the latest version
pwg_orig.txt   The version as presented by Thomas (the first version)
pwg_orig_utf8.txt     converted to utf8 encoding
pwg_orig_utf8_slp1.txt   Devanagari encoding changed to slp1
Other intermediate versions
pwg0.txt
pwg1.txt
pwg2.txt
pwg3.txt
pwg4.txt
pwg5.txt
pwg6.txt
pwg7.txt
pwg8.txt
pwg9.txt
pwgdel.txt
pwgheader.txt

The script 'update.sh' aims to describe the exact steps by which each of the intermediate versions
is constructed. Here is that file for pwg:

echo "BEGIN update.sh"
#  construction of pwg_orig_utf8_slp1.txt
# cd convertwork
# python transcode.py 1 ../../orig/pwg_orig_utf8.txt ../../orig/pwg_orig_utf8_slp1.txt
#  construction of pwg0.txt
# # back to pywork
# cd ../
# python pwgall.py ../orig/pwg_orig_utf8_slp1.txt ../orig/pwgheader.txt ../orig/pwg0.txt ../orig/pwgdel.txt
echo "Apply changes in manualByLine01_slp1 to pwg0, getting pwg1"
python updateByLine.py ../orig/pwg0.txt manualByLine01_slp1.txt ../orig/pwg1.txt 
echo "Apply changes in manualByLine02_slp1 to pwg1, getting pwg2"
python updateByLine.py ../orig/pwg1.txt manualByLine02_slp1.txt ../orig/pwg2.txt 
echo "apply manualByLine03_slp1 changes to pwg2 to get pwg3..."
python updateByLine.py ../orig/pwg2.txt manualByLine03_slp1.txt ../orig/pwg3.txt 
#skip old manualByLine04, since accent changes already made in pwg0
#echo "apply manualByLine04_slp1 changes to pwg3 to get pwg4..."
#python updateByLine.py ../orig/pwg3.txt manualByLine04_slp1.txt ../orig/pwg4.txt 
echo "apply manualByLine04_slp1 changes to pwg3 to get pwg4..."
python updateByLine.py ../orig/pwg3.txt manualByLine04_slp1.txt ../orig/pwg4.txt 
#echo "construct missingByLine"
#cat missing/updprephk/missing*.txt > missingByLine.txt
#cat missing/updprep/missing*.txt > missingByLine_slp1.txt
echo "apply missingByLine_slp1 changes to pwg4 to get pwg5..."
python updateByLine.py ../orig/pwg4.txt missingByLine_slp1.txt ../orig/pwg5.txt 
# manualByLine05_slp1 is a copy of 
#  correctionwork/arabic/arabic_prep4_upd_completed.txt
echo "apply manualByLine05_slp1 changes to pwg5 to get pwg6... (Arabic)"
python updateByLine.py ../orig/pwg5.txt manualByLine05_slp1.txt ../orig/pwg6.txt 
echo "apply manualByLine06_slp1 changes to pwg6 to get pwg7..."
python updateByLine.py ../orig/pwg6.txt manualByLine06_slp1.txt ../orig/pwg7.txt 

echo "manualByLine07_slp1 is from correctionwork/greek/greek_prep5_upd_edit.txt"
echo "apply manualByLine07_slp1 changes to pwg7 to get pwg8... "
python updateByLine.py ../orig/pwg7.txt manualByLine07_slp1.txt ../orig/pwg8.txt 

#echo "apply manualByLine08_slp1 changes to pwg8 to get pwg... "
#python updateByLine.py ../orig/pwg8.txt manualByLine08_slp1.txt ../orig/pwg.txt
# 12-14-2017 Meta-line conversion, with IAST
cd correctionwork/cologne-issue-190
sh redo.sh  # generates temp_pwgwithmeta2.txt
cd ../../
cp correctionwork/cologne-issue-190/temp_pwgwithmeta2.txt ../orig/pwg9.txt
python updateByLine.py ../orig/pwg9.txt manualByLine09_slp1.txt ../orig/pwg.txt

echo "END update.sh"
echo "NEXT redo_hw.sh"

The headword list and the xml version are constructed from pwg.txt. So, the xml version also
incorporates all corrections to date.
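
The chaining pattern in update.sh (each step reads the previous numbered version, applies one change file, and writes the next version, with the last step writing pwg.txt) can be sketched roughly as below. This is only an illustration, not code from the repository: `Step`, `plan_chain`, and `as_commands` are hypothetical names, and the sketch assumes every step is a plain updateByLine.py call. That is a simplification, since in the real script the pwg8 to pwg9 step runs redo.sh for the meta-line conversion instead.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    """One link in the chain (hypothetical helper type)."""
    source: str   # version read by this step, e.g. ../orig/pwg0.txt
    changes: str  # change file applied to it
    target: str   # version written by this step, e.g. ../orig/pwg1.txt


def plan_chain(code: str, change_files: List[str],
               orig_dir: str = "../orig") -> List[Step]:
    """Return one Step per change file, numbered from <code>0.txt upward.

    The final step writes <code>.txt, the 'always the latest version' file.
    """
    steps = []
    last = len(change_files) - 1
    for n, changes in enumerate(change_files):
        source = f"{orig_dir}/{code}{n}.txt"
        if n == last:
            target = f"{orig_dir}/{code}.txt"
        else:
            target = f"{orig_dir}/{code}{n + 1}.txt"
        steps.append(Step(source, changes, target))
    return steps


def as_commands(steps: List[Step]) -> List[str]:
    """Render each step in the command form used in update.sh."""
    return [f"python updateByLine.py {s.source} {s.changes} {s.target}"
            for s in steps]


if __name__ == "__main__":
    demo = plan_chain("pwg", ["manualByLine01_slp1.txt",
                              "manualByLine02_slp1.txt"])
    for line in as_commands(demo):
        print(line)
```

Because every step's input is the previous step's output, the latest pwg.txt necessarily carries all change files applied so far, which is why the headword list and xml built from it do too.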

gasyoun · 2017-12-20T18:41:26Z

xml version also
incorporates all corrections to date

Thanks for the clarification, was not sure.

drdhaval2785 · 2018-02-12T01:04:15Z

Have been a mute spectator for quite some time. Remarkable achievement and speed by @funderburkjim. Once the conversion is done for all (or nearly all) dictionaries, many of the scripts can be made generic.

Some work may be needed to make content markup uniform then.

gasyoun · 2018-02-16T07:13:42Z

Only 7 left, but there is still MW. Even so, we are actually getting close to Unicode and to understandable, standardised code, thanks to Jim.

funderburkjim · 2018-02-20T00:52:07Z

make content markup uniform

Right, there will be. Will adapt some of the code conversion programs to do a survey of the markup of
the various dictionaries, with the aim of converging markup where possible.

MW will be a bear; saving it until last. I'm sure it will put up a valiant struggle when I try to make its
form more similar to that of other dictionaries.

funderburkjim · 2018-06-19T22:40:55Z

All the little boxes seem to be filled in now. Hurray!

Next steps will probably be to review the work, and
a) work toward uniformity of tags across dictionaries (#87, #116)
b) document the iast conversions (#216, #227):
   - differences between text iast and modern iast
   - differences between coded iast and modern iast (e.g., I think mw72 has a couple of variances here)

I think we can close this issue now.

gasyoun · 2018-06-20T05:33:11Z

All the little boxes seem to be filled in now.

In less than a year. In a Jimless condition we would not make it in 20.

I think we can close this issue now.

You sure deserve it and wanted to do it long ago, so yes, yes, yes.

funderburkjim · 2018-06-20T19:58:15Z

@gasyoun Thanks for the encouragement! It helps.

drdhaval2785 mentioned this issue Sep 2, 2017

Yates - IAST and meta-line conversions done #154

Closed

drdhaval2785 added the Documentation How TXT , XML work label Sep 2, 2017

gasyoun assigned drdhaval2785 Sep 2, 2017

funderburkjim mentioned this issue Sep 2, 2017

AS to IAST / SLP1 for all dicts #110

Closed

funderburkjim mentioned this issue Dec 30, 2017

CCS meta-line/iast conversion #198

Closed

funderburkjim mentioned this issue Jan 8, 2018

Russian in PWG, a few more sanskrit-lexicon/CORRECTIONS#414

Open

funderburkjim closed this as completed Jun 19, 2018
