AE meta/iast conversion #217

Closed
funderburkjim opened this issue Mar 12, 2018 · 6 comments
Labels
Documentation How TXT , XML work

Comments

@funderburkjim
Contributor

This issue is for comments related to the meta-line/IAST conversion of ae.txt, the Cologne digitization of Apte Student's English-Sanskrit Dictionary.

@funderburkjim
Contributor Author

Very little IAST

In a prior update, AS codings were changed to IAST. There are only a few instances (< 100).

@funderburkjim
Contributor Author

Markup minimal

<div n="lb"/>

This line-break markup is the only markup added during this conversion. It is appropriate since the
lines of the digitization of AE correspond to the lines of the printed text.
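
Because each line break is marked, a display can recover the printed lines of an entry from the markup alone. The following is a minimal sketch, assuming the marker separates consecutive printed lines within an entry body; the function name `printed_lines` is hypothetical and not part of the Cologne tooling.

```python
# Line-break marker added by the conversion.
LB = '<div n="lb"/>'

def printed_lines(entry_text):
    """Split an entry body into its printed lines at each line-break marker.

    Empty pieces (for example, from a marker at the very start) are dropped.
    """
    pieces = (piece.strip() for piece in entry_text.split(LB))
    return [piece for piece in pieces if piece]

if __name__ == "__main__":
    sample = 'first printed line <div n="lb"/>second printed line'
    for line in printed_lines(sample):
        print(line)
```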

@funderburkjim
Contributor Author

Previously mentioned enhancements

As a reminder, here is a collection of references to previously suggested improvements to AE.

@funderburkjim
Contributor Author

Complete hyphenations enhancement

There are about 9000 instances of Sanskrit words hyphenated at a line break, and about 500 other hyphenated words.

These need to be resolved (and marked with the <lbinfo> idea).
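
One way the resolution might be approached is sketched below. It is only an illustration under stated assumptions: it works on plain text and ignores any Sanskrit markup delimiters, it treats every end-of-line hyphen as a broken word (so genuine compound hyphens would need manual review), and the `n` attribute on `<lbinfo>` is a hypothetical way of recording how many characters were pulled up from the next line.

```python
import re

# A line ending in letters followed by a hyphen is treated as a broken word.
_BROKEN = re.compile(r"[^\W\d_]+-$")
# The continuation is the leading run of letters on the next line.
_CONT = re.compile(r"^[^\W\d_]+")

def join_hyphenated(lines):
    """Rejoin words that were split by a hyphen at a line break.

    The fragment moved up from the next line is recorded in a hypothetical
    <lbinfo n="K"/> tag, where K is the length of that fragment.
    """
    out = list(lines)
    for i in range(len(out) - 1):
        current = out[i].rstrip()
        head = _BROKEN.search(current)
        tail = _CONT.match(out[i + 1])
        if not (head and tail):
            continue
        frag = tail.group(0)
        # Drop the hyphen, append the fragment, and note how much was moved.
        out[i] = current[:-1] + frag + '<lbinfo n="%d"/>' % len(frag)
        out[i + 1] = out[i + 1][len(frag):].lstrip()
    return out

if __name__ == "__main__":
    for line in join_hyphenated(["he was bris-", "tling with rage"]):
        print(line)
```

Recording the moved length would let a display restore the original line layout if needed.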

Complete abbreviated instances

Consider the example under headword 'bristle':
image

The form 'b. ing' would be much clearer without the abbreviation: 'bristling'.

Note that the expansion is not simple string concatenation: 'bristle' + 'ing' does not equal 'bristling', because the
'e' is dropped. So this is an interesting English NLP problem to solve.
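
A rough sketch of the expansion follows. It handles only the silent-'e' rule mentioned above; other English spelling changes (consonant doubling, 'y' to 'i', and so on) are not covered. The function names are hypothetical, and the assumed input shape, an initial letter plus period followed by a suffix, is taken from the 'b. ing' example.

```python
VOWELS = tuple("aeiou")

def add_suffix(stem, suffix):
    """Attach an English suffix, dropping a silent final 'e' before a vowel."""
    drops_e = (
        stem.endswith("e")
        and not stem.endswith(("ee", "oe", "ye"))
        and suffix.startswith(VOWELS)
    )
    return (stem[:-1] if drops_e else stem) + suffix

def expand_abbreviation(headword, abbreviated):
    """Expand a form such as 'b. ing' under the headword 'bristle'."""
    initial, dot, rest = abbreviated.partition(".")
    suffix = rest.strip()
    # Leave the text alone unless it looks like '<first letter>. <suffix>'.
    if not dot or not suffix or initial.strip().lower() != headword[:1].lower():
        return abbreviated
    return add_suffix(headword, suffix)

if __name__ == "__main__":
    print(expand_abbreviation("bristle", "b. ing"))
```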

@funderburkjim
Contributor Author

funderburkjim commented Mar 13, 2018

Sections of Digitization

The digitization of AE also includes the front matter of the text and a short appendix of abbreviations:

      3:; TITLE
     37:; PREFACE to 3rd edition
     72:; PREFACE to 1st edition
    440:; PREFACE to 2nd edition
    456:; DIRECTIONS TO THE STUDENT
    548:; ENTRIES
  89379:; ABBREVIATIONS
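
A listing like the one above could be produced with a few lines of Python. This sketch assumes that section markers are the lines beginning with ';', which is inferred from the listing rather than stated in the thread; the function name `list_sections` is hypothetical.

```python
def list_sections(lines):
    """Return (line_number, title) pairs for lines that start with ';'."""
    sections = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(";"):
            sections.append((number, line[1:].strip()))
    return sections

if __name__ == "__main__":
    sample = ["", "", "; TITLE", "some text", "; PREFACE to 3rd edition"]
    for number, title in list_sections(sample):
        print("%7d:; %s" % (number, title))
```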

Tooltips for general abbreviations

The abbreviations section has subsections:

  • Grammatical Terms &c.
  • Names of Works.

As with other dictionaries, the utility of the digitization would be increased by marking the abbreviations and
constructing a table of expansions that displays could use to construct tooltips.
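
The marking step might look something like the sketch below. The `<ab n="...">` tag form, the function name, and the sample table entry are all assumptions made for illustration; the real table would be built from the ABBREVIATIONS section of the digitization.

```python
import re
from html import escape

def mark_abbreviations(text, expansions):
    """Wrap each known abbreviation in a hypothetical <ab> tag with its expansion."""
    if not expansions:
        return text
    # Try longer abbreviations first so a shorter key cannot shadow them.
    keys = sorted(expansions, key=len, reverse=True)
    pattern = re.compile(r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")")

    def wrap(match):
        abbr = match.group(1)
        tip = escape(expansions[abbr], quote=True)
        return '<ab n="%s">%s</ab>' % (tip, abbr)

    return pattern.sub(wrap, text)

if __name__ == "__main__":
    # Illustrative placeholder entry, not taken from the AE abbreviation list.
    table = {"adj.": "adjective"}
    print(mark_abbreviations("Bristly, adj. rough", table))
```

A display could then read the `n` attribute of each tag to populate its tooltip.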

@funderburkjim funderburkjim added the Documentation How TXT , XML work label Mar 13, 2018
@funderburkjim
Contributor Author

The meta-line conversion is now installed at Cologne.
