IEG meta/iast conversion #205

Closed
funderburkjim opened this issue Feb 7, 2018 · 5 comments
Labels
Documentation How TXT , XML work

Comments

@funderburkjim
Contributor

This issue is devoted to comments regarding the conversion of the Cologne digitization ieg.txt of the Indian Epigraphical Glossary.

@funderburkjim
Contributor Author

IAST issues

The text contains words in Dravidian language(s), as well as Sanskrit words. In the Dravidian
words there are several consonants which in print have a diaeresis beneath the Latin letter. Since
Unicode has no precomposed characters of this form, I've used the 'with line below' Unicode characters.
For instance:
image
displays as
image

There is one such Dravidian character, described as 'Dravidian cerebral voiced fricative' which occurs
100+ times, and for which no Unicode representation was coded:
image
This was left in its AS form as l13:
image

Another minor point is that I used 'ṃ' for anusvara, instead of the author's ṁ.
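
The two substitutions described above could be sketched as follows. This is only an illustration, not the conversion program actually used: the function names are hypothetical, and it assumes that the 'with line below' characters are the ones Unicode NFC normalization composes from a base letter plus COMBINING MACRON BELOW (U+0331).

```python
import unicodedata

# U+0331 COMBINING MACRON BELOW renders as a line beneath the base letter.
COMBINING_MACRON_BELOW = "\u0331"


def with_line_below(letter: str) -> str:
    """Return `letter` with a line beneath it.

    NFC normalization yields the precomposed character where Unicode
    defines one (for example l, n, r); otherwise the base letter and
    the combining mark stay as two code points.
    """
    return unicodedata.normalize("NFC", letter + COMBINING_MACRON_BELOW)


def normalize_anusvara(text: str) -> str:
    """Replace m with dot above (U+1E41, U+1E40) by m with dot below (U+1E43, U+1E42)."""
    return text.replace("\u1e41", "\u1e43").replace("\u1e40", "\u1e42")
```

Which printed consonants carry the diaeresis, and how they are coded in the AS source, is not stated here, so any mapping from AS codes to these characters would have to be filled in from the digitization itself.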

@funderburkjim
Contributor Author

Sections of the text

The digitization includes not only the main section of entries, but apparently all of the text. There are
the following sections:

; TITLE
; DEDICATION
; CONTENTS
; PREFACE
; SPECIAL ABBREVIATION
; GENERAL ABBREVIATIONS
; SYSTEM OF TRANSLITERATION
; ENTRIES
; APPENDIX I
; APPENDIX II
; APPENDIX III
; INDEX
; ADDITIONS AND CORRECTIONS
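
As a rough sketch of how the digitization could be split along these sections: the code below assumes that each section starts with a line consisting of '; ' followed by one of the names listed above. That layout is an assumption about ieg.txt, and `split_sections` is a hypothetical helper, not part of the existing conversion scripts.

```python
SECTION_NAMES = [
    "TITLE", "DEDICATION", "CONTENTS", "PREFACE",
    "SPECIAL ABBREVIATION", "GENERAL ABBREVIATIONS",
    "SYSTEM OF TRANSLITERATION", "ENTRIES",
    "APPENDIX I", "APPENDIX II", "APPENDIX III",
    "INDEX", "ADDITIONS AND CORRECTIONS",
]


def split_sections(lines):
    """Group lines under the most recent '; SECTION NAME' marker line.

    Lines that appear before the first marker are ignored.
    Returns a dict mapping section name to the list of its lines.
    """
    sections = {}
    current = None
    for line in lines:
        stripped = line.rstrip("\n")
        name = stripped[2:].strip() if stripped.startswith("; ") else None
        if name in SECTION_NAMES:
            current = name
            sections[current] = []
        elif current is not None:
            sections[current].append(stripped)
    return sections
```

With the file read via `open("ieg.txt", encoding="utf-8")`, `split_sections(f)["ENTRIES"]` would then hold only the lines of the main section.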

Problematic headwords

The approximately 8000 headwords identified include about 6800 in the main section, and 1150 in the
three Appendices.

The 700+ headwords from the second appendix, TAX NAMES IN DRAVIDIAN LANGUAGES, are problematic.
They are almost exclusively non-Sanskrit words. Currently these Dravidian words make their presence
felt in our headword compendiums such as sanhw1 and hwnorm1c. There are also several words in
the main section that seem problematic, such as words whose SLP1 spelling ends in 'E'. For instance,
image
In such cases, where the author specifically names a Sanskrit word, perhaps we should use the
Sanskrit word as the 'key1'.

There are also some English words in the headword list, for instance 'diestruck', from the 3rd appendix
NAMES OF COINS, METAL WEIGHTS:
image

As with the Dravidian words, the inclusion of these words in our lists of Sanskrit words is misleading.
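
One possible way to keep such words out of the Sanskrit lists is to flag them for review before they reach sanhw1 or hwnorm1c. The sketch below is hypothetical: it assumes each headword is available as a (key1, section) pair, with the section names from the list of sections, and the function names are made up for illustration.

```python
DRAVIDIAN_SECTION = "APPENDIX II"  # tax names in Dravidian languages


def review_reasons(key1, section):
    """Return the reasons, if any, why a headword may not be Sanskrit."""
    reasons = []
    if section == DRAVIDIAN_SECTION:
        reasons.append("from the appendix of Dravidian tax names")
    if key1.endswith("E"):
        reasons.append("SLP1 spelling ends in 'E'")
    return reasons


def sanskrit_candidates(headwords):
    """Keep only the key1 values that raise no review flag."""
    return [key1 for key1, section in headwords
            if not review_reasons(key1, section)]
```

English words such as 'diestruck' are not caught by either test; they would need a separate exclusion list.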

@funderburkjim
Contributor Author

Suggested Enhancement: abbreviations

There are digitized sections on abbreviations in the preface. These could provide the basis for <ab> markup that would facilitate tooltips for users.
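
A minimal sketch of how such markup might be added, assuming the abbreviations have first been extracted into a table of abbreviation and expansion. The function name and the sample table are illustrative placeholders, not transcribed from the glossary, and the expansions would feed the tooltips through a separate lookup.

```python
import re


def add_ab_markup(text, abbreviations):
    """Wrap each known abbreviation in <ab>...</ab>.

    `abbreviations` maps abbreviation -> expansion; only the keys are
    used here. Longer abbreviations are tried first, and a match must
    not be adjacent to another word character.
    """
    keys = sorted(abbreviations, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in keys)
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: "<ab>" + m.group(1) + "</ab>", text)


# Placeholder table for illustration only.
sample_abbreviations = {
    "EI": "Epigraphia Indica",
    "CII": "Corpus Inscriptionum Indicarum",
}
```

Running the function twice on the same text would nest the tags, so it should be applied once to unmarked text.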

@funderburkjim
Contributor Author

Markup

Aside from the meta-line markup, the only markup added is for line breaks:
<div n="lb"> for a simple line break, and <div n="P"> for an indented line break.
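
For display as plain text, the two markers could be expanded along these lines. This is a sketch that assumes both tags occur exactly as written above and act as standalone markers with no closing tag; `render_line_breaks` is a hypothetical name, and the indent width is arbitrary.

```python
def render_line_breaks(text, indent="    "):
    """Expand the two line-break markers into plain-text line breaks.

    <div n="lb"> becomes a newline; <div n="P"> becomes a newline
    followed by `indent`.
    """
    text = text.replace('<div n="lb">', "\n")
    text = text.replace('<div n="P">', "\n" + indent)
    return text
```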

@funderburkjim
Contributor Author

The converted form has now been installed at Cologne.

@funderburkjim funderburkjim added the Documentation How TXT , XML work label Feb 9, 2018