-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathgra-meta2_1.txt
264 lines (249 loc) · 13.3 KB
/
gra-meta2_1.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
gra-meta2.txt
July 2023
This supercedes gra-meta2.txt of Jan 24, 2018 as a description of
the current digitization gra.txt.
The digitization is in the utf-8 encoding.
The {X...X} style of coding serves several purposes:
{@X@} : bold text
{%X%} : italic text
{nnn,n} Identify RV links. (04-06-2021)
The pseudo-xml <> style of coding is used as follows:
<L>,<pc>,<k1>,<k2>,<LEND> are used in the 'meta lines' (see below)
<ab>X</ab> global abbreviation. The 'expansion' is provided in file
https://github.com/sanskrit-lexicon/csl-pywork/blob/master/v02/distinctfiles/gra/pywork/graab/graab_input.txt
<ab n="T">X</ab> local abbreviation; "T" is the expansion.
<lang>X</lang> global abbreviation for a language name
The expansion is also in the graab_input.txt file.
e.g. go. <id>goth.</id> <disp>gothisch - gothic</disp>
<lang n="T">X</lang> a local language abbreviation. Only 1 instance.
<ls>X</ls> A literary source abbreviation. The 'expansion' is in file
https://github.com/sanskrit-lexicon/csl-pywork/blob/master/v02/distinctfiles/gra/pywork/graauth/tooltip.txt
e.g. AV. [Atharvaveda— Roth & Whitneyʼs edition]
<ls>X N</ls> Literary source abbreviation with specification.
e.g. <ls>AV. 1,2,3</ls>
<ls n="Y">Z</ls> Implied literary source.
e.g. <ls n="AV.">14,1,57</ls>.
<div n="H"> A major subdivision
<div n="P"> A minor subdivision
<div n="Pf"> A prefix subdivision (such as for an upasarga)
<div n="TS"> A termination/suffix subdivision
<div n="W"> A Whole-word subdivision; this occurs mostly in pronoun category.
<F>X</F> footnote (only 1 instance)
<gk>X</gk> Greek text
<heb>X</heb> Hebrew text (only 1 instance)
<hom>N</hom> Homonym number
Meta lines:
Each entry of the digitization is contained within a beginning and ending
markup. As example,
<L>2<pc>0001<k1>a<k2>2. a
<hom>2.</hom> {@(a).@}¦ Deutestamm der <pe n="1sten/ersten">1.</pe> Person, siehe unter ahám.
<LEND>
The ending markup is <LEND>.
The beginning markup contains various identifying fields, expressed in
a <fieldname>fieldvalue format. The fields are:
required fields
L Cologne record identifier
pc page-col reference to scanned image
k1 key1. The headword spelling, in slp1 coding for Sanskrit headwords
k2 key2. The original headword spelling, in slp1 coding.
Contains accents, homonym numbers.
May have more than one part with comma-delimiter. e.g.
<L>25<pc>0004<k1>akutra<k2>a-ku/tra, a-ku/trA
Markup conventions other than the 'pseud-xml' type.
[PageX] Page breaks are coded as [Page...].
Page breaks are more specifically coded as
[PageP] where P goes from 1 to 1776.
Some pages have two parts, and the 2nd parts are indicated with an 'a';
e.g. 505a.
¦ Separates the 'headword part' from the gloss. In first line of entry
〉 sequence parenthesis. e.g.
daher 1〉 {%Antheil;%} 2〉 {%Erbtheil;%} 3〉 {%Partei;%}
4〉 {%der viele Antheile besitzt%} oder {%zu vergeben hat%} und daher
5〉 Name eines der Aditisöhne.
〰 to supply the word(s) under discussion
〔X〕 literary source details. e.g. <ls>VS. 〔14,4〕</ls>.
Used when there is no CDSL link target.
„X“ quoted text, e.g. „in Bewegung setzen“
… omitted text
√ following word is Sanskrit root
e.g. {@√aghāy,@}¦ {%Schaden zufügen wollen%} (von aghá)
ʼ Apostrophe
‿ Vowel sandhi to be applied. e.g. śyenásya‿iva
_ and ⏑ long and short (meter)
-------------------------------------
Words using diacritics with the Latin alphabet are represented in Unicode
characters. The vast majority of such words with diacritics are Sanskrit,
and of course some words are German. But there are are also a smaller number
of words in other languages.
The spelling aims to be consistent with modern IAST unicode,
generally as described in https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration; the usage of diacritics in the printed text has many differences from
the modern IAST appearing in this version of the digitization.
Here are the extended ASCII characters that occur in the text.
with their frequency as of this writing (07-11-2023):
--------------------------------------------------------------
Extended ascii unicode characters, with counts.
These may be separated into 4 categories:
54 Latin characters (with diacritics); usually Sanskrit text
in IAST with accents.
78 Greek characters
3 Hebrew characters
20 Other extended ascii characters. Generally some kind of markup.
LATIN
c̣ (\u0063\u0323) 2 := LATIN SMALL LETTER C + COMBINING DOT BELOW
n̐ (\u006e\u0310) 7 := LATIN SMALL LETTER N + COMBINING CANDRABINDU
à (\u00e0) 207 := LATIN SMALL LETTER A WITH GRAVE
á (\u00e1) 64162 := LATIN SMALL LETTER A WITH ACUTE
â (\u00e2) 2 := LATIN SMALL LETTER A WITH CIRCUMFLEX
ä (\u00e4) 4714 := LATIN SMALL LETTER A WITH DIAERESIS
è (\u00e8) 7 := LATIN SMALL LETTER E WITH GRAVE
é (\u00e9) 6264 := LATIN SMALL LETTER E WITH ACUTE
ë́ (\u00eb\u0301) 1 := LATIN SMALL LETTER E WITH DIAERESIS + COMBINING ACUTE ACCENT
ì (\u00ec) 2 := LATIN SMALL LETTER I WITH GRAVE
í (\u00ed) 20084 := LATIN SMALL LETTER I WITH ACUTE
î (\u00ee) 2 := LATIN SMALL LETTER I WITH CIRCUMFLEX
ï (\u00ef) 6 := LATIN SMALL LETTER I WITH DIAERESIS
ñ (\u00f1) 1257 := LATIN SMALL LETTER N WITH TILDE
ò (\u00f2) 6 := LATIN SMALL LETTER O WITH GRAVE
ó (\u00f3) 5031 := LATIN SMALL LETTER O WITH ACUTE
ö (\u00f6) 3462 := LATIN SMALL LETTER O WITH DIAERESIS
ù (\u00f9) 11 := LATIN SMALL LETTER U WITH GRAVE
ú (\u00fa) 10024 := LATIN SMALL LETTER U WITH ACUTE
û (\u00fb) 4 := LATIN SMALL LETTER U WITH CIRCUMFLEX
ü (\u00fc) 5712 := LATIN SMALL LETTER U WITH DIAERESIS
ý (\u00fd) 2 := LATIN SMALL LETTER Y WITH ACUTE
þ (\u00fe) 1 := LATIN SMALL LETTER THORN
ā (\u0101) 35572 := LATIN SMALL LETTER A WITH MACRON
ā̀ (\u0101\u0300) 33 := LATIN SMALL LETTER A WITH MACRON + COMBINING GRAVE ACCENT
ā́ (\u0101\u0301) 24497 := LATIN SMALL LETTER A WITH MACRON + COMBINING ACUTE ACCENT
ă (\u0103) 1 := LATIN SMALL LETTER A WITH BREVE
ē (\u0113) 2 := LATIN SMALL LETTER E WITH MACRON
ě (\u011b) 22 := LATIN SMALL LETTER E WITH CARON
ī (\u012b) 7712 := LATIN SMALL LETTER I WITH MACRON
ī́ (\u012b\u0301) 3034 := LATIN SMALL LETTER I WITH MACRON + COMBINING ACUTE ACCENT
ĭ (\u012d) 5 := LATIN SMALL LETTER I WITH BREVE
ł (\u0142) 2 := LATIN SMALL LETTER L WITH STROKE
Ś (\u015a) 21 := LATIN CAPITAL LETTER S WITH ACUTE
ś (\u015b) 12119 := LATIN SMALL LETTER S WITH ACUTE
ū (\u016b) 4243 := LATIN SMALL LETTER U WITH MACRON
ū̀ (\u016b\u0300) 1 := LATIN SMALL LETTER U WITH MACRON + COMBINING GRAVE ACCENT
ū́ (\u016b\u0301) 2248 := LATIN SMALL LETTER U WITH MACRON + COMBINING ACUTE ACCENT
ŭ (\u016d) 15 := LATIN SMALL LETTER U WITH BREVE
ů (\u016f) 1 := LATIN SMALL LETTER U WITH RING ABOVE
ŷ (\u0177) 1 := LATIN SMALL LETTER Y WITH CIRCUMFLEX
ḍ (\u1e0d) 773 := LATIN SMALL LETTER D WITH DOT BELOW
ḥ (\u1e25) 128 := LATIN SMALL LETTER H WITH DOT BELOW
ḷ (\u1e37) 4 := LATIN SMALL LETTER L WITH DOT BELOW
ṃ (\u1e43) 2140 := LATIN SMALL LETTER M WITH DOT BELOW
ṅ (\u1e45) 497 := LATIN SMALL LETTER N WITH DOT ABOVE
ṇ (\u1e47) 7021 := LATIN SMALL LETTER N WITH DOT BELOW
ṛ (\u1e5b) 7900 := LATIN SMALL LETTER R WITH DOT BELOW
ṛ́ (\u1e5b\u0301) 2895 := LATIN SMALL LETTER R WITH DOT BELOW + COMBINING ACUTE ACCENT
ṝ (\u1e5d) 56 := LATIN SMALL LETTER R WITH DOT BELOW AND MACRON
ṝ́ (\u1e5d\u0301) 109 := LATIN SMALL LETTER R WITH DOT BELOW AND MACRON + COMBINING ACUTE ACCENT
ṣ (\u1e63) 14813 := LATIN SMALL LETTER S WITH DOT BELOW
ṭ (\u1e6d) 2388 := LATIN SMALL LETTER T WITH DOT BELOW
ẓ (\u1e93) 1 := LATIN SMALL LETTER Z WITH DOT BELOW
GREEK
Π (\u03a0) 1 := GREEK CAPITAL LETTER PI
ά (\u03ac) 30 := GREEK SMALL LETTER ALPHA WITH TONOS
έ (\u03ad) 34 := GREEK SMALL LETTER EPSILON WITH TONOS
ή (\u03ae) 16 := GREEK SMALL LETTER ETA WITH TONOS
ί (\u03af) 43 := GREEK SMALL LETTER IOTA WITH TONOS
α (\u03b1) 98 := GREEK SMALL LETTER ALPHA
ᾱ́ (\u03b1\u0304\u0301) 1 := GREEK SMALL LETTER ALPHA + COMBINING MACRON + COMBINING ACUTE ACCENT
β (\u03b2) 4 := GREEK SMALL LETTER BETA
γ (\u03b3) 37 := GREEK SMALL LETTER GAMMA
δ (\u03b4) 34 := GREEK SMALL LETTER DELTA
ε (\u03b5) 55 := GREEK SMALL LETTER EPSILON
ζ (\u03b6) 3 := GREEK SMALL LETTER ZETA
η (\u03b7) 21 := GREEK SMALL LETTER ETA
ι (\u03b9) 44 := GREEK SMALL LETTER IOTA
ῑ́ (\u03b9\u0304\u0301) 1 := GREEK SMALL LETTER IOTA + COMBINING MACRON + COMBINING ACUTE ACCENT
λ (\u03bb) 28 := GREEK SMALL LETTER LAMDA
μ (\u03bc) 80 := GREEK SMALL LETTER MU
ν (\u03bd) 98 := GREEK SMALL LETTER NU
ξ (\u03be) 7 := GREEK SMALL LETTER XI
ο (\u03bf) 69 := GREEK SMALL LETTER OMICRON
π (\u03c0) 49 := GREEK SMALL LETTER PI
ρ (\u03c1) 97 := GREEK SMALL LETTER RHO
ς (\u03c2) 89 := GREEK SMALL LETTER FINAL SIGMA
σ (\u03c3) 45 := GREEK SMALL LETTER SIGMA
τ (\u03c4) 61 := GREEK SMALL LETTER TAU
υ (\u03c5) 26 := GREEK SMALL LETTER UPSILON
φ (\u03c6) 17 := GREEK SMALL LETTER PHI
χ (\u03c7) 42 := GREEK SMALL LETTER CHI
ω (\u03c9) 86 := GREEK SMALL LETTER OMEGA
ϋ (\u03cb) 1 := GREEK SMALL LETTER UPSILON WITH DIALYTIKA
ό (\u03cc) 39 := GREEK SMALL LETTER OMICRON WITH TONOS
ύ (\u03cd) 33 := GREEK SMALL LETTER UPSILON WITH TONOS
ώ (\u03ce) 3 := GREEK SMALL LETTER OMEGA WITH TONOS
ϑ (\u03d1) 30 := GREEK THETA SYMBOL
ϝ (\u03dd) 1 := GREEK SMALL LETTER DIGAMMA
ϰ (\u03f0) 63 := GREEK KAPPA SYMBOL
ϱ (\u03f1) 2 := GREEK RHO SYMBOL
ϳ (\u03f3) 3 := GREEK LETTER YOT
ἀ (\u1f00) 22 := GREEK SMALL LETTER ALPHA WITH PSILI
ἄ (\u1f04) 17 := GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
ἐ (\u1f10) 9 := GREEK SMALL LETTER EPSILON WITH PSILI
ἑ (\u1f11) 1 := GREEK SMALL LETTER EPSILON WITH DASIA
ἔ (\u1f14) 7 := GREEK SMALL LETTER EPSILON WITH PSILI AND OXIA
ἕ (\u1f15) 1 := GREEK SMALL LETTER EPSILON WITH DASIA AND OXIA
Ἐ (\u1f18) 1 := GREEK CAPITAL LETTER EPSILON WITH PSILI
ἠ (\u1f20) 1 := GREEK SMALL LETTER ETA WITH PSILI
ἡ (\u1f21) 2 := GREEK SMALL LETTER ETA WITH DASIA
ἤ (\u1f24) 1 := GREEK SMALL LETTER ETA WITH PSILI AND OXIA
ἥ (\u1f25) 3 := GREEK SMALL LETTER ETA WITH DASIA AND OXIA
ἰ (\u1f30) 3 := GREEK SMALL LETTER IOTA WITH PSILI
ἱ (\u1f31) 2 := GREEK SMALL LETTER IOTA WITH DASIA
ἴ (\u1f34) 2 := GREEK SMALL LETTER IOTA WITH PSILI AND OXIA
ἵ (\u1f35) 2 := GREEK SMALL LETTER IOTA WITH DASIA AND OXIA
ἶ (\u1f36) 1 := GREEK SMALL LETTER IOTA WITH PSILI AND PERISPOMENI
ὀ (\u1f40) 8 := GREEK SMALL LETTER OMICRON WITH PSILI
ὄ (\u1f44) 5 := GREEK SMALL LETTER OMICRON WITH PSILI AND OXIA
ὅ (\u1f45) 1 := GREEK SMALL LETTER OMICRON WITH DASIA AND OXIA
ὐ (\u1f50) 2 := GREEK SMALL LETTER UPSILON WITH PSILI
ὑ (\u1f51) 2 := GREEK SMALL LETTER UPSILON WITH DASIA
ὔ (\u1f54) 1 := GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA
ὕ (\u1f55) 1 := GREEK SMALL LETTER UPSILON WITH DASIA AND OXIA
ὖ (\u1f56) 1 := GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI
ὠ (\u1f60) 2 := GREEK SMALL LETTER OMEGA WITH PSILI
ὤ (\u1f64) 1 := GREEK SMALL LETTER OMEGA WITH PSILI AND OXIA
ὦ (\u1f66) 1 := GREEK SMALL LETTER OMEGA WITH PSILI AND PERISPOMENI
ὰ (\u1f70) 1 := GREEK SMALL LETTER ALPHA WITH VARIA
έ (\u1f73) 1 := GREEK SMALL LETTER EPSILON WITH OXIA
ὴ (\u1f74) 1 := GREEK SMALL LETTER ETA WITH VARIA
ὸ (\u1f78) 1 := GREEK SMALL LETTER OMICRON WITH VARIA
ᾱ (\u1fb1) 1 := GREEK SMALL LETTER ALPHA WITH MACRON
ᾶ (\u1fb6) 1 := GREEK SMALL LETTER ALPHA WITH PERISPOMENI
᾽ (\u1fbd) 1 := GREEK KORONIS
ῆ (\u1fc6) 8 := GREEK SMALL LETTER ETA WITH PERISPOMENI
ῑ (\u1fd1) 1 := GREEK SMALL LETTER IOTA WITH MACRON
ΐ (\u1fd3) 1 := GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
ῖ (\u1fd6) 4 := GREEK SMALL LETTER IOTA WITH PERISPOMENI
ῦ (\u1fe6) 2 := GREEK SMALL LETTER UPSILON WITH PERISPOMENI
ῶ (\u1ff6) 2 := GREEK SMALL LETTER OMEGA WITH PERISPOMENI
HEBREW
דְּ (\u05d3\u05bc\u05b0) 1 := HEBREW LETTER DALET + HEBREW POINT DAGESH OR MAPIQ + HEBREW POINT SHEVA
נֵ (\u05e0\u05b5) 1 := HEBREW LETTER NUN + HEBREW POINT TSERE
רְ (\u05e8\u05b0) 1 := HEBREW LETTER RESH + HEBREW POINT SHEVA
OTHER EXTENDED ASCII CHARACTERS
¦ (\u00a6) 11506 := BROKEN BAR
× (\u00d7) 1 := MULTIPLICATION SIGN
ʼ (\u02bc) 267 := MODIFIER LETTER APOSTROPHE
‒ (\u2012) 44 := FIGURE DASH
– (\u2013) 5 := EN DASH
— (\u2014) 14996 := EM DASH
“ (\u201c) 992 := LEFT DOUBLE QUOTATION MARK
„ (\u201e) 992 := DOUBLE LOW-9 QUOTATION MARK
‥ (\u2025) 308 := TWO DOT LEADER
… (\u2026) 228 := HORIZONTAL ELLIPSIS
‿ (\u203f) 1048 := UNDERTIE
⁓ (\u2053) 2 := SWUNG DASH
√ (\u221a) 1010 := SQUARE ROOT
⏑ (\u23d1) 44 := METRICAL BREVE
⏓ (\u23d3) 5 := METRICAL SHORT OVER LONG
⸗ (\u2e17) 4 := DOUBLE OBLIQUE HYPHEN
〉 (\u3009) 43376 := RIGHT ANGLE BRACKET
〔 (\u3014) 896 := LEFT TORTOISE SHELL BRACKET
〕 (\u3015) 896 := RIGHT TORTOISE SHELL BRACKET
〰 (\u3030) 5958 := WAVY DASH