PWG meta-line/IAST conversion #190

funderburkjim · 2017-10-23T23:43:12Z

This is a placeholder for questions which arise in the course of this conversion, which will begin in a few weeks.

I'm starting this issue now to have a place for this link to a related question.

gasyoun · 2017-10-24T08:51:28Z

begin in a few weeks

PWG and PW were my most wanted.

funderburkjim · 2017-11-14T04:38:56Z

Adjust page breaks within `<ls>`.

In the digitization, page breaks are indicated accurately, and sometimes such page breaks occur
in the middle of a literary source, for instance under hw agnIzomIya (slp1). Proper recognition of such a literary source is much more convenient if the page break is moved out, and this is what I've done as
part of the meta-line conversion. Using the original markup, here's the agnIzomIya example.

OLD
{¤AV.[Page01.0038]9, 6, 6.¤}
NEW
{¤AV.9, 6, 6.¤} [Page01.0038]   
FULL LS ITEM then Page break.
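A minimal Python sketch of this adjustment, assuming the old markup shown above ({¤...¤} for a literary source, [PageNN.NNNN] for a page break) and at most one page break per source. It is an illustration, not the script actually used for the conversion.

```python
import re

# Page-break marker in the old markup, e.g. [Page01.0038]
PAGE = r"\[Page\d+\.\d+\]"
# Old literary-source markup {¤...¤} containing one page break (captured as group 2)
LS_WITH_PAGE = re.compile(r"\{¤([^¤]*?)(" + PAGE + r")([^¤]*?)¤\}")


def move_page_breaks(line: str) -> str:
    """Move a page break found inside {¤...¤} to just after the closing ¤}."""
    return LS_WITH_PAGE.sub(
        lambda m: "{¤" + m.group(1) + m.group(3) + "¤} " + m.group(2), line
    )


print(move_page_breaks("{¤AV.[Page01.0038]9, 6, 6.¤}"))  # {¤AV.9, 6, 6.¤} [Page01.0038]
```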

The set of all instances (1751 of them) is in this file:

pwg_ls_page_adj.txt

The first two numbers in the file are:

- the case number
- the line number within the pre-meta-line version of pwg.txt.

gasyoun · 2017-11-14T08:13:56Z

So in 1751 cases the source was broken, i.e. unrecognized. I guess we can recognize these, add the markup and move the breaks back? Or will the line breaks be left everywhere except in literary sources?

funderburkjim · 2017-11-16T05:06:14Z

All the line breaks are present; they are just offset slightly so they don't occur in the middle of a literary source.

gasyoun · 2017-11-16T06:10:34Z

offset slightly

Oh, understood.

funderburkjim · 2017-11-17T01:15:48Z

LS separation

Here is an example where there are several distinct literary sources that are 'run together'. First the scan:

Current coding

FIRST 
<ls>R2V. 1, 46, 10. 91, 17. 125, 3. 7, 98, 1. 8, 61, 2. 9, 62, 4. 67, 28. 68, 4. 74, 5. VS. 5, 7. 20, 27. AV. 6, 49, 2. 11, 1, 9.</ls>
SECOND
 <ls>AK. 1, 1, 2, 34. H. 99. MED. c2. 1. R. 1, 7, 17.</ls>

suggested revised coding

FIRST
<ls>R2V. 1, 46, 10. 91, 17. 125, 3. 7, 98, 1. 8, 61, 2. 9, 62, 4. 67, 28. 68, 4. 74, 5.</ls> <ls>VS. 5, 7. 20, 27.</ls> <ls>AV. 6, 49, 2. 11, 1, 9.</ls>
SECOND
 <ls>AK. 1, 1, 2, 34.</ls> <ls>H. 99.</ls> <ls>MED. c2. 1.</ls> <ls>R. 1, 7, 17.</ls>

Questions:

- Is the revised coding better, because it separates distinct sources?
- Is it worthwhile to spend some time now to attempt to devise some programmatic way to generate the revised coding?
  - This is probably non-trivial.
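One possible programmatic approach, sketched in Python: start a new <ls> wherever a known source abbreviation begins a token. The abbreviation list is an assumption (it might come from pwgbib.txt). Abbreviations that also occur as ordinary tokens inside a reference would fool it, which is part of why the general problem is non-trivial.

```python
import re


def split_ls(content: str, abbrevs: list) -> str:
    """Split the text of one <ls> into separate <ls> elements at known abbreviations.

    `abbrevs` is a hypothetical list of source abbreviations (e.g. drawn from pwgbib.txt).
    An abbreviation only starts a new source at the beginning of the text or after whitespace.
    """
    # Longest first, so a longer abbreviation wins over a shorter one sharing its start.
    alts = "|".join(re.escape(a) for a in sorted(abbrevs, key=len, reverse=True))
    starts = [m.start() for m in re.finditer(r"(?:^|(?<=\s))(?:" + alts + ")", content)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading text as its own piece
    ends = starts[1:] + [len(content)]
    pieces = [content[a:b].strip() for a, b in zip(starts, ends)]
    return " ".join(f"<ls>{p}</ls>" for p in pieces if p)
```

With ["AK.", "H.", "MED.", "R."] as the list, the SECOND example above comes out as the suggested revised coding.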

drdhaval2785 · 2017-11-17T02:11:08Z

Seems worthwhile.

gasyoun · 2017-11-17T05:34:23Z

Is the revised coding better, because it separates distinct sources?

You know it is. Based on it one day we can add hyperlinks.

Is it worthwhile to spend some time now to attempt to devise some programmatic way to
generate the revised coding?

If it's weeks - yes. If months - no. I would handle the trivial cases and leave the non-trivial ones if a solution can't be found in a week.

funderburkjim · 2017-11-17T22:28:51Z

Wide text in pwg

The digitization uses a special coding for text which is printed in a typographically distinct form which might be described as 'wide' format (extra space between letters).
There are 51000+ such instances, with 9000+ distinct values.
Here are a few samples from print:
hw = a

hw = a, col. 2

hw = a, col 2, near bottom

Significance?

One question is, what is the author's intent in using this special typography? Maybe there are multiple purposes. Maybe it's just a print-setting phenomenon with no semantic content.

Such a typographic feature is noted in the digitization of some other dictionaries, notably PW(K).

In the meta-line conversion of PW, such text was tagged as <is>X</is>. I guess I'll use the same tag here in PWG.

gasyoun · 2017-11-17T22:45:10Z

Maybe it's just a print-setting phenomenon with no semantic content.

Does not seem so. @SergeA any clue?

SergeA · 2017-11-18T00:36:37Z

One question is, what is the author's intent in using this special typography?
Maybe it's just a print-setting phenomenon with no semantic content.

All these examples show transliterated words, as terms (gaṇa, avj.) or names (Viṣṇu, Śaṁkar.) given in their full form, or abbreviated, and which are neither German words, nor Sanskrit headwords, nor Sanskrit quotations. So they are printed in a peculiar way for easy separating from other text.

In this connection I want to mention the output I've seen in MW. There, when Devanagari output is selected, transliterated terms such as "Vedas" are represented as "वेदs". That's not good at all. Transliterated terms and names must be treated separately from real Sanskrit words (stems and quotations). The term "Vedas" should be rendered in Latin letters, no matter which output is selected. A separate markup for these words in the digitization also makes it possible to change the outdated transliteration scheme, etc.

funderburkjim · 2017-11-18T20:56:37Z

the output I've seen in MW ...

You make an interesting observation. I've opened an MWS issue as a placeholder for responding ... don't want to divert right now from thinking about PWG.

avj.

Many of the words seem to be Sanskrit words. But what kind of abbreviation is avj. ? If not Sanskrit, maybe this is miscoded.

Reference of all instances

For reference, this file has a listing of the current instances, with frequency.
temp_filter_wide.txt

In this file, there are many letter-number codings. As a later step in this conversion, I'll transcode these
to modern IAST.

I started using the <is> tag back in the IAST conversion of Burnouf, when such words were shown in
italic script, so the acronym is 'italicized Sanskrit'.

SergeA · 2017-11-18T22:07:23Z

But what kind of abbreviation is avj. ?

I suppose avj. can be for avjaja (avyaya) = indeclinable.

funderburkjim · 2017-11-19T01:27:48Z

Nax. question

Nax. is coded as 'wide' text 250 or so times. A random sample of these indicates that they are all part of some literary source reference related to WEBER.
E.g.

It will have a chance to get transcoded to IAST by virtue of being part of a literary source.

I think this should not be coded as 'wide'.

Any objections, @SergeA ?

funderburkjim · 2017-11-19T01:51:42Z

More wide text with literary sources

Nax. and avj., mentioned above, have the common feature that they are coded as 'wide' text that occurs within the scope of a literary reference. So they are recognizable as having the form <ls>...<is>X</is>...</ls>. A search for all such X identifies approx. 140 distinct values of X, occurring 2900+ times.
These X values are in temp_filter_isls.txt.
Maybe, given all these instances, which I find hard to understand, I should just leave the coding
alone for now. (This is a change of opinion from the comment "I think this should not be coded as 'wide'." under Nax. above.)
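The search described above can be sketched in Python as follows. It assumes each <ls>...</ls> lies on a single line and is not nested; the sample lines in any usage are made up for illustration.

```python
import re
from collections import Counter

LS = re.compile(r"<ls>(.*?)</ls>")
IS = re.compile(r"<is>(.*?)</is>")


def wide_in_ls(lines) -> Counter:
    """Count each X that occurs as <is>X</is> inside the scope of an <ls>...</ls> element."""
    counts = Counter()
    for line in lines:
        for ls in LS.finditer(line):
            # Only <is> elements inside this literary source are counted.
            counts.update(IS.findall(ls.group(1)))
    return counts
```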

How to read Agni ?

Here's an instance with Agni. How should this literary source be read?

SergeA · 2017-11-19T02:56:24Z

By comparison with the corresponding RV text, the reading is:
Ṛv. 1, 13, 5. Agni 5, 4, 3. 37, 1. 7, 2, 4.
Ṛv. 1, 13, 5. = see ṚgVeda, mandala 1, hymn 13, verse 5
Agni 5, 4, 3. = see the same RV, verse 5.4.3, where the headword ghṛtapṛṣṭha is in relation with agni
37, 1. = see the same RV, the same 5th mandala, hymn 37, verse 1
7, 2, 4. = see the same RV, verse 7.2.4
(The word agni is related only to the verse RV 5.4.3, and not to the next referenced verses 5.37.1, 7.2.4, etc.)
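This reading rule can be sketched in Python, assuming (as in this Ṛgveda example) that a full reference has three levels: a number group with fewer than three numbers inherits its leading numbers from the previous reference, and a word such as Agni annotates the reference after it. The source abbreviation is assumed to have been stripped off first.

```python
import re


def expand_refs(text: str, depth: int = 3) -> list:
    """Expand a run of references such as '1, 13, 5. Agni 5, 4, 3. 37, 1.' into full tuples.

    A group with fewer than `depth` numbers inherits its leading (higher-level) numbers
    from the previous reference; words between groups become a note on the next reference.
    """
    refs, prev, note = [], None, None
    # Either a comma-separated number group ending in '.', or any other word.
    for m in re.finditer(r"(\d+(?:,\s*\d+)*)\.|(\S+)", text):
        if m.group(1):
            nums = tuple(int(n) for n in m.group(1).split(","))
            if prev is not None and len(nums) < depth:
                nums = prev[: depth - len(nums)] + nums
            refs.append((nums, note))
            prev, note = nums, None
        else:
            note = m.group(2) if note is None else note + " " + m.group(2)
    return refs
```

For the Agni example, `expand_refs("1, 13, 5. Agni 5, 4, 3. 37, 1. 7, 2, 4.")` yields (1, 13, 5), (5, 4, 3) noted with Agni, (5, 37, 1) and (7, 2, 4), matching the reading above.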

And also:
dessen Rosse Ṛv. 1, 14, 6. ऊर्मि 10, 30, 8.
dessen Rosse Ṛv. 1, 14, 6 = see RV 1.14.6 ... (here I'm not sure about "dessen Rosse")
ऊर्मि 10, 30, 8. = see the same RV., verse 10.30.8, where the headword ghṛtapṛṣṭha is in relation with ūrmi

As I noticed, the numbers visually differ, according to the level of text divisions. The number for the big section is the highest and also black, the number for the verse is smallest. This representation makes the references more readable.

SergeA · 2017-11-19T03:27:33Z

Nax. is explained in the sources
4.018  WEBER, Nax. = WEBER, Die vedischen Nachrichten von den Naxatra (Mondstationen). Berlin, 1860. 1862.  WEBER, Nax.
So Nax. is the term Naxatra (nakṣatra), used within a source name.
Perhaps there is some system in combining small caps with wide font in PWG sources. I do not know.

gasyoun · 2017-11-20T06:11:47Z

The term "Vedas" should be rendered in Latin letters, no matter which output is selected.

Indian users would disagree.

As I noticed, the numbers visually differ, according to the level of text divisions. The number for the big section is the highest and also black, the number for the verse is smallest. This representation makes the references more readable.

Yeah, in the past Jim was able to represent the levels with font sizes at a REGEX level.

Perhaps there is some system in combining small caps with wide font in PWG sources.

Small caps is reserved for sources only, right?

funderburkjim · 2017-12-01T20:34:36Z

Just a note to let others know that progress is being made in the <ls> refactoring for pwg. I hope a graceful stop point will be reached next week. Please be patient.

gasyoun · 2017-12-02T01:46:22Z

Just a note

Good to know.

funderburkjim · 2017-12-16T00:46:08Z

This round of work on pwg is primarily over. The current status is that the Basic, List, Adv. Search, and mobile1 displays are all based on the new form of the data. The data used in the list-0.2, list-0.2s displays is based on the prior form of the data.
Others should examine the current displays.

My next task will be to document what has been done, and some things that remain to be done some other time.

I'll also work to get caught up with the comments others have made while I've been on this pwg excursion.

I'll pull list-0.2(s) to use the current data when a bit of time has passed and we don't need to look at the old form for comparison.

gasyoun · 2017-12-16T03:58:15Z

I'll pull list-0.2(s) to use the current data when a bit of time has passed and we don't need to look at the old form for comparison.

So be it. Was missing you on this trip around (or rather inside) the world.

funderburkjim · 2017-12-16T23:09:00Z

Summary of changes to pwg

The main changes were to the base digitization, pwg.txt. These changes flowed through to similar changes in pwg.xml. In addition, a few differences in the base display were introduced.

pwg-meta2

Most of the changes to pwg.txt are quite technical in nature. One way to understand these changes
is to compare the meta files before and after. The pwg-meta.txt file describes salient features of the digitization before recent changes; the pwg-meta2.txt file pertains to the current form of the digitization. Copies of both these files are in this gist.

Sample comparison

An intuitive understanding of the changes is given by a close reading of comparable entries in the previous and current versions of the digitization. Since it is short, the first entry is a good place to start.

PREVIOUS (pwg8.txt)

[This is just one line in pwg8.txt; I've introduced line breaks so this comment will be easier to read.]

<H1>000{a}1{a}^1¦ Interj. {#a apehi#} (die beiden Vocale fliessen nicht in einander) 
¯{¤P. 1, 1, 14, Sch.¤}; vgl. †{gan2a} {#cAdi#} und ¯{¤VOP. 2, 19.¤} 
Drückt Mitleid aus ({#anukampAyAm#}) ¯{¤MED. †{avj.} 2.¤}

CURRENT (pwg.txt)
[There are several shorter lines in pwg.txt]

<L>1<pc>1-0001<k1>a<k2>a<h>1
1. {#a#}¦ Interj. {#a apehi#} (die beiden Vocale fliessen nicht in einander) 
<ls>P. 1, 1, 14,</ls> 
<ls>Sch.</ls>; vgl. <is>gaṇa</is> {#cAdi#} und 
<ls>VOP. 2, 19.</ls> Drückt Mitleid aus ({#anukampAyAm#}) 
<ls>MED. <is>avy</is>. 2.</ls>
<LEND>

PRINTED TEXT

funderburkjim · 2017-12-16T23:51:42Z

Guided tour of the comparison

Header

The old header is <H1>000{a}1{a}^1¦. In the new form, this contributes to:

The meta-line: <L>1<pc>1-0001<k1>a<k2>a<h>1. But note that this meta line also has
- L: the Cologne id of this entry.
- pc: the identification of the printed page of the entry.
The first part of the 'body' of the entry: 1. {#a#}¦. Note that this format now corresponds closely
to the beginning of the printed text.

Sanskrit text and italic text

These are identified in the same way in both forms: {#X#} and {%X%}. [No italic text in this example].

Literary source

OLD : ¯{¤P. 1, 1, 14, Sch.¤}
NEW : <ls>P. 1, 1, 14,</ls> <ls>Sch.</ls>

The first difference is simply a change of notation: from ¯{¤X¤} to <ls>X</ls>.
The second difference is that the Scholiast abbreviation has been put into a separate tag in the
new form. In this case, this should be considered a minor flaw of the new form, since the
preceding <ls> ends with a comma; this ls-scope problem is quite thorny, and I'll discuss it more
fully in a separate issue.
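Apart from the Scholiast split, the notation change is a single regular-expression substitution. A Python sketch, treating the ¯ prefix as optional since the page-break example earlier in the thread lacks it:

```python
import re

# Old literary-source markup, optionally preceded by ¯; stops at the first ¤}
OLD_LS = re.compile(r"¯?\{¤(.*?)¤\}")


def convert_ls(line: str) -> str:
    """Change old-style ¯{¤X¤} markup to <ls>X</ls> (no splitting of X is attempted)."""
    return OLD_LS.sub(r"<ls>\1</ls>", line)
```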

iast text

Words appearing in the original digitization with coding †{X} are transformed to <is>X</is> in the new form. The feature of the printed form is wide letter spacing. There are two instances in this
example.

In the old form, X is coded in the AS (letter-number) system (e.g. gan2a).

Examination of the instances throughout the text led me to believe that X is always a Sanskrit word appearing in Roman alphabet with diacritics. The text author uses his own system of diacritics. With this assumption, X in the new coding is transformed to modern IAST. So, gan2a becomes gaṇa, and
avj becomes avy.
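A Python sketch of this step. The AS-to-IAST table contains only codes whose IAST values can be read off examples in this thread (gan2a / gaṇa, C2KDR. / ŚKDR., Dha10tup. / Dhātup., R2V. / Ṛv.); a real conversion needs the complete table. Treating the author's j as modern y is an assumption based on avj → avy. Any trailing period in X stays inside the tag.

```python
import re

# AS (letter-number) codes that appear in examples in this thread, with IAST values.
# Illustrative only: a real conversion needs the complete table for PWG's scheme.
AS_TO_IAST = {"a10": "ā", "n2": "ṇ", "C2": "Ś", "R2": "Ṛ"}
AS_CODE = re.compile("|".join(re.escape(k) for k in sorted(AS_TO_IAST, key=len, reverse=True)))


def as_to_iast(word: str) -> str:
    """Transcode AS codes; assumes the author's j always stands for modern y (avj -> avy)."""
    word = AS_CODE.sub(lambda m: AS_TO_IAST[m.group(0)], word)
    return word.replace("j", "y")


def convert_wide(line: str) -> str:
    """Change old wide-text markup †{X} to <is>X</is>, with X transcoded."""
    return re.sub(r"†\{(.*?)\}", lambda m: "<is>" + as_to_iast(m.group(1)) + "</is>", line)
```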

The distinct occurrences of these are relatively rare. I'll discuss this more fully in a separate issue.

Incidentally, note closely the position of the period in the second example: †{avj.} and <is>avy</is>.
Since (as @SergeA pointed out) this is probably an abbreviation for avyaya, it might be better for
the new form to have the period within the scope of the tag here : <is>avy.</is>.
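If this change is made, it could be done with a substitution restricted to known abbreviations, so that a sentence-final period after a full word stays outside the tag. A Python sketch; the abbreviation set is hypothetical:

```python
import re


def period_inside_is(line: str, abbrevs: set) -> str:
    """Move the period after <is>X</is> inside the tag, but only when X is a known abbreviation.

    `abbrevs` is a hypothetical set such as {"avy"}; a full word like gaṇa followed by a
    sentence-final period is left alone.
    """
    def fix(m):
        word = m.group(1)
        return f"<is>{word}.</is>" if word in abbrevs else m.group(0)

    return re.sub(r"<is>([^<]*)</is>\.", fix, line)
```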

funderburkjim · 2017-12-17T00:36:30Z

Divisions

The other major difference in the two forms regards coding of subdivisions within an entry.
We need to look at other entries to see this.

letter divisions

The second entry shows letter divisions:
OLD

<H1>000{a}1{a}^2¦ Pronominalstamm: ²a) der 1sten Person, enthalten in 
{#aha/m, AvA/m, AvA/ByAm, Ava/yos, asmA/n, asmA/Bis, asma/Byam, asma/t, asmA/kam, asmA/su#}
 und im ved. {#asme/#} . -- ²b) der 3ten Person; •f. {#A#} .   ............... etc.

NEW

<L>2<pc>1-0001<k1>a<k2>a<h>2
2. {#a#}¦ Pronominalstamm: 
<div n="2"> a) der 1sten Person, enthalten in {#aha/m, AvA/m, AvA/ByAm, Ava/yos, asmA/n, asmA/Bis,
 asma/Byam, asma/t, asmA/kam, asmA/su#} und im ved. {#asme/#} . 
<div n="2">— b) der 3ten Person; <lex>f.</lex> {#A#} .

Compare ²a) in the old form to <div n="2"> a) in the new form.
Also, note the line break at the division in the new form. This helps to break up the very long lines of the
original digitization into much more manageable lines (easier to handle in corrections) in the new form.

In comparing -- ²b) to <div n="2">— b), note that

the double-hyphen -- is changed to an em-dash — and
that em-dash is within the scope of the <div> tag.

It seems to be a feature of the print that the first division of a sequence has no em-dash.

number divisions

Number divisions are similar. Compare ³4) to <div n="1"> 4)

Greek alphabet divisions

Compare ¹a) to <div n="3"> α).
Note that the old form uses a system of Latin letters to represent Greek letters, while the new form uses Unicode Greek letters directly.
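The number, letter and Greek conversions follow one pattern, so they can be sketched together in Python. The marker-to-level table (³ to 1, ² to 2, ¹ to 3) is read off the examples above; the Greek mapping beyond a → α is an assumption (alphabetical order), and -<P>- divisions are not handled.

```python
import re

# Old superscript marker -> level of the <div> in the new form
LEVEL = {"³": "1", "²": "2", "¹": "3"}
# Assumption: Latin letters stand for Greek letters in alphabetical order (a -> α, b -> β, ...)
GREEK = dict(zip("abcdefgh", "αβγδεζηθ"))
# Optional preceding '--', a superscript level marker, then a label such as 4, a or b
OLD_DIV = re.compile(r"\s*(--\s*)?([³²¹])([0-9a-z]+)\)")


def convert_divisions(text: str) -> str:
    """Turn old markers such as ²a), -- ²b), ³4), ¹a) into <div n="..."> lines."""
    def repl(m):
        dash, level, label = m.group(1), LEVEL[m.group(2)], m.group(3)
        if level == "3":
            label = "".join(GREEK.get(c, c) for c in label)
        # Each division starts on a new line; a preceding '--' becomes an em-dash in the tag.
        mark = "—" if dash else ""
        return f'\n<div n="{level}">{mark} {label})'

    return OLD_DIV.sub(repl, text)
```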

Number, Letter, Greek hierarchy

I think the prevailing hierarchy principle is : Numbers > Letters > Greek.
However, there are certainly exceptions. For instance, look at the entry a above, where
there are two letter divisions, but no number division. Further study might provide more insight into
this aspect of the author's organization.

Prefix verb forms

For verb entries, the author uses a generally consistent system of presenting prefix forms.
Consider the first verb aMsay:
OLD

<H1>000{aMsay}1{aMsay},¦ {#aMsa/yati#} ³1) {%theilen%}, ¯{¤KAVIKALPADR. im C2KDR.¤}; vgl. 
{#aMSay#} . -- ³2) {%schlagen, kämpfen%} ({#samAGAte#}) ¯{¤WEST. Dha10tup. §35, 64.¤}

-<P>- {#vi#} {%theilen, brechen, unschädlich machen, abwehren%}: {#SaktiM vyaMsitAM mADavena#} 
¯{¤MBH. 1, 197.¤} {#vyaMsayAmAsa taM tasya prahAram#} ¯{¤3, 11728.¤}

NEW

<L>41<pc>1-0006<k1>aMsay<k2>aMsay
{#aMsay#}¦, {#aMsa/yati#} 
<div n="1"> 1) {%theilen%}, 
<ls>KAVIKALPADR.</ls> im <ls>ŚKDR.</ls>; vgl. {#aMSay#} . 
<div n="1">— 2) {%schlagen, kämpfen%} ({#samAGAte#}) 
<ls>WEST. Dhātup. § 35, 64.</ls>

<div n="p">— {#vi#} {%theilen, brechen, unschädlich machen, abwehren%}: {#SaktiM vyaMsitAM mADavena#} 
<ls>MBH. 1, 197.</ls> {#vyaMsayAmAsa taM tasya prahAram#} 
<ls>3, 11728.</ls>
<LEND>

So the general conversion is -<P>- to <div n="p">—.

Note that the prefix in question generally appears just after the division markup (space + {#X#}), so
that it should be easy to pull out the prefixes for a given verb as a first step in generating extra
prefixed verb headwords (such as vi + aMsay -->[Sandhi] vyaMsay).
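A Python sketch of that first step, plus a deliberately minimal sandhi join. The join only handles a prefix ending in i/I or u/U before a vowel-initial root (SLP1 vowels), which is enough for vi + aMsay; other combinations (e.g. pra + a-) are not handled correctly and would need real sandhi rules.

```python
import re

# A prefix division opens with <div n="p">, an em-dash, then the prefix as {#X#}
PREFIX_DIV = re.compile(r'<div n="p">—\s*\{#([^#]+)#\}')
SLP1_VOWELS = "aAiIuUfFxXeEoO"


def prefixes(entry: str) -> list:
    """Return the SLP1 prefix at the start of each prefix division of a verb entry."""
    return PREFIX_DIV.findall(entry)


def join_prefix(prefix: str, root: str) -> str:
    """Minimal vowel sandhi: final i/I -> y and u/U -> v before a vowel; otherwise concatenate."""
    if prefix and root and root[0] in SLP1_VOWELS and prefix[-1] in "iIuU":
        return prefix[:-1] + ("y" if prefix[-1] in "iI" else "v") + root
    return prefix + root
```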

Vgl. divisions

It seemed to me that there is a common pattern which should be marked as a division, although the
original coding did not provide this.

<L>31636<pc>3-0487<k1>dakziRasTa<k2>dakziRasTa
{#dakziRasTa#}¦ ({#da° + sTa#}) <lex>adj.</lex> {%zur Rechten stehend%}; <lex>m.</lex> {%Wagenlenker%} 
<ls>AK. 2, 8, 2, 28.</ls> 
<ls>H. 760.</ls> 
<div n="v">— Vgl. {#savyezWa#} .
<LEND>

Vgl. is an abbreviation:

Yes, vgl. (with an L) is a common abbreviation for vergleiche (compare). I believe the English equivalent 
is cf. (abbreviation of Latin confer, sometimes also used in German).J
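Marking this pattern can be sketched in Python as a substitution on an em-dash (or the old '--') followed by 'Vgl.'. Requiring the dash and the capital V keeps an inline 'vgl.', such as the one in the first entry, from becoming a division. This is an illustration, not the rule actually applied.

```python
import re


def mark_vgl(line: str) -> str:
    """Start a <div n="v"> line at each em-dash + 'Vgl.' (or old-style '-- Vgl.')."""
    return re.sub(r"\s*(?:—|--)\s*(Vgl\.)", r'\n<div n="v">— \1', line)
```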

funderburkjim · 2017-12-17T00:48:01Z

Other conversion details

lex tag

•f. becomes <lex>f.</lex>; similarly for m., n. and adj.

lang tag

Various language tags are all changed to the <lang> tag used in recent conversions of other dictionaries.
Here's a summary

<g>X</g> -> <lang n="greek">X</lang>
<R>X</R> -> <lang n="russian">X</lang>
<A>X</A> -> <lang n="arabic">X</lang>
<OH>X</OH> -> <lang n="oldhebrew">X</lang>

More special cases

  Replace ellipsis character … with space
  Replace -- with em-dash
  Replace <sic> with blank (1 time: Klätscherei<sic>, L = 45429, hw = piSunatA)
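These conversions are simple table-driven substitutions; a Python sketch. It assumes m., n. and adj. carry the same • prefix as f. in the old form.

```python
import re

LANG = {"g": "greek", "R": "russian", "A": "arabic", "OH": "oldhebrew"}
LANG_TAG = re.compile(r"<(g|R|A|OH)>(.*?)</\1>")
# Assumption: all four lexical markers use the • prefix seen for f.
LEX = re.compile(r"•(f\.|m\.|n\.|adj\.)")


def convert_details(line: str) -> str:
    """Apply the lang, lex and special-case replacements listed above."""
    line = LANG_TAG.sub(lambda m: f'<lang n="{LANG[m.group(1)]}">{m.group(2)}</lang>', line)
    line = LEX.sub(r"<lex>\1</lex>", line)
    # Ellipsis -> space, '--' -> em-dash, drop the single <sic> marker.
    return line.replace("…", " ").replace("--", "—").replace("<sic>", "")
```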

`<ab>` tag

There are many, many abbreviations used in pwg. Although the new version of the digitization
marks almost NONE of these, provision has been made for this. I'll make a separate 'enhancement'
issue to discuss this further.

funderburkjim · 2017-12-17T01:03:46Z

xml markup

With the new form of the pwg.txt digitization, the xml form pwg.xml is only modestly different from pwg.txt. Here are the main differences:

Meta-line

The meta line elements get converted to xml elements as follows:

<L>X -> <L>X</L>
<pc>X -> <pc>X</pc>
<k1>X -> <key1>X</key1>
<k2>X -> <key2>X</key2>
<h>X -> <hom>X</hom> (if homonym present)

body

The non-meta lines are put into the <body> element of the xml.
The only conversions are:

{#X#} -> <s>X</s> for Devanagari text, in SLP1 transliteration
{%X%} -> <i>X</i> for italic text.
<div> tags are open in pwg.txt; appropriately placed closing tags </div> are generated for
pwg.xml.
& -> &amp; (this is a requirement of the XML coding protocol).
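A Python sketch of both conversions. It treats divisions as flat siblings, closing the previous <div> whenever a new one starts; the generator actually used for pwg.xml may nest them by level instead, and the XML elements that enclose the meta-line fields are not shown here.

```python
import re

META_TAGS = {"L": "L", "pc": "pc", "k1": "key1", "k2": "key2", "h": "hom"}


def meta_to_xml(meta: str) -> str:
    """<L>1<pc>1-0001<k1>a<k2>a<h>1 -> <L>1</L><pc>1-0001</pc><key1>a</key1>..."""
    parts = re.findall(r"<(L|pc|k1|k2|h)>([^<]*)", meta)
    return "".join(f"<{META_TAGS[t]}>{v}</{META_TAGS[t]}>" for t, v in parts)


def body_to_xml(lines: list) -> str:
    """Escape &, convert {#X#} and {%X%}, and close each open <div> (flat, not nested)."""
    out, open_div = [], False
    for line in lines:
        line = line.replace("&", "&amp;")  # done first, before any new tags are added
        line = re.sub(r"\{#(.*?)#\}", r"<s>\1</s>", line)
        line = re.sub(r"\{%(.*?)%\}", r"<i>\1</i>", line)
        if line.startswith("<div"):
            if open_div:
                out.append("</div>")
            open_div = True
        out.append(line)
    if open_div:
        out.append("</div>")
    return "<body>" + "\n".join(out) + "</body>"
```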

n attribute of `<ls>`

This is the biggest difference between pwg.txt and pwg.xml. Where possible, we convert
<ls>X</ls> to <ls n="Y">X</ls> ; Y is the Cologne id for the literary source, as currently
determined by a particular master file pwgbib.txt. This assignment simplifies the generation of
tool tips for literary source elements in the displays of pwg. We may at some time choose to
have these Cologne id's (Y) as part of the pwg.txt markup, but I think it is premature to do so now.
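The lookup could be sketched in Python as a longest-prefix match of the <ls> text against known abbreviations. The dictionary passed in stands for data read from pwgbib.txt, whose format is not shown here; the ids in any usage are made up. An <ls> that starts with bare numbers (such as 3, 11728., which continues the previous source) gets no n attribute in this sketch.

```python
import re


def add_ls_ids(line: str, bib: dict) -> str:
    """Add n="Y" to each <ls>X</ls> whose text X starts with an abbreviation in `bib`.

    `bib` maps abbreviation -> Cologne id (hypothetical stand-in for pwgbib.txt data).
    The longest matching abbreviation wins; unmatched <ls> elements are left as they are.
    """
    abbrevs = sorted(bib, key=len, reverse=True)

    def repl(m):
        text = m.group(1)
        for ab in abbrevs:
            if text.startswith(ab):
                return f'<ls n="{bib[ab]}">{text}</ls>'
        return m.group(0)

    return re.sub(r"<ls>(.*?)</ls>", repl, line)
```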

funderburkjim · 2017-12-17T01:41:54Z

Display features

Many of the changes to pwg.txt do not show up as differences in the html displays for pwg, since
the same effects were previously produced by (a) the conversion of the former digitization to xml and (b) the logic that constructed html from the former xml.

However, there are a couple of display-visible differences:

`<ls>` tooltip

It was recently suggested by Marcis and others that there were some browser problems in the
use of links into popup windows as a technique for showing the user the expansion of the literary source abbreviations (this was relevant to the MW, PW and former PWG displays, which are the dictionaries where literary source markup is present).

Thus, as an experiment, I changed the display system for PWG to show the literary source expansions as tooltips. The system seems to work fairly well, although there are some funky details of the tooltip
display that may need attention before we apply this technique to MW and PW. In addition, there
are some details regarding the content of the tooltips that need attention. I'll discuss this more
fully in the separate issue regarding enhancements to PWG ls system.

It is quite tricky to convert the capitalized text of the literary source into a form that shows
larger and smaller capital letters in the display. Generally, the current display does this adequately,
but there are a few variations from the printed text; for instance in the ls abbreviation H. an., the
display shows an. in small caps., rather than in lower case. It is probably more trouble than it is
worth to alter this detail of the display.
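One common way to get this effect is the CSS font-variant: small-caps property applied to lowercased text, keeping the initial capital full size. A Python sketch of that technique (not necessarily what the current display code does):

```python
import re

SMALL = '<span style="font-variant:small-caps">{}</span>'


def small_caps(ls_text: str) -> str:
    """Keep the first capital of each all-caps abbreviation, show the rest as small capitals."""
    def repl(m):
        word = m.group(1)
        if not word.isupper():  # numbers and mixed-case words such as Dhātup. are untouched
            return m.group(0)
        return word[0] + SMALL.format(word[1:].lower() + ".")

    return re.sub(r"\b(\w{2,})\.", repl, ls_text)
```

Because a rule like this sees only capitalization, it cannot know which capitalized parts were printed in lower case.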

Also, in the previous version of PWG display, an attempt was made to mimic the size differences
in the number sequences of an ls entry. I judged this attempt to have too many flaws, and to
be too difficult to do properly; and thus omit this flourish in the current display.

`<lex>` tag tooltip

Tooltips are displayed for the elements (m,f,n,adj) marked with the <lex> tag.

`<ab>` tag tooltip

There is provision in the displays for tooltips for <ab> markup; but, as mentioned, there is almost
none of this markup currently present.

IAST 'wide' text

Elements of form <is>X</is> are displayed in a way similar to the printed text,
by using the letter-spacing CSS feature. Specifically:
<is>X</is> -> <span style='letter-spacing:2px;'>X</span>. To my eye, this 2px spacing
is quite close to the printed text.
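The substitution described above amounts to the following, sketched in Python:

```python
import re


def render_is(text: str) -> str:
    """Display <is>X</is> as letter-spaced text, mimicking the printed wide type."""
    return re.sub(r"<is>(.*?)</is>", r"<span style='letter-spacing:2px;'>\1</span>", text)
```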

divisions

Divisions (<div n="1">X</div>) are indented, much as before.
The indentation increases as n goes from 1 to 3; when n=v or p, there is no indentation.
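A Python sketch of this rule, emitting an inline margin; the 1.5em step per level is a hypothetical value, not the one used in the displays.

```python
import re

STEP_EM = 1.5  # hypothetical indentation per level


def indent_divs(html: str) -> str:
    """Replace <div n="..."> by an indented div: levels 1-3 indent progressively, v and p not at all."""
    def repl(m):
        n = m.group(1)
        level = int(n) if n in ("1", "2", "3") else 0
        return f'<div style="margin-left:{level * STEP_EM}em">'

    return re.sub(r'<div n="([^"]+)">', repl, html)
```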

funderburkjim · 2017-12-17T01:44:26Z

This ends the comments that come to mind on the general features of the conversion.
Comments definitely welcome, as usual.

Additional issues will provide more details regarding <ls>, <is> and <ab> markup, and
areas where further work can improve this markup.

gasyoun · 2017-12-17T17:37:08Z

it might be better for
the new form to have the period within the scope of the tag here

Makes sense. What tremendous work!

I think the prevailing hierarchy principle is : Numbers > Letters > Greek

Agree.

should be easy to pull out the prefixes for a given verb as a first step in generating extra
prefixed verb headwords (such as vi + aMsay -->[Sandhi] vyaMsay).

That's the only big feature left that I badly miss myself.

It seemed to me that there is a common pattern

Agree on Vgl. = compare.

We may at some time choose to
have these Cologne id's (Y) as part of the pwg.txt markup, but I think it is premature to do so now.

Agree.

an. in small caps., rather than in lower case. It is probably more trouble than it is
worth to alter this detail of the display.

Agree, not worth the trouble in 2018.

previous version of PWG display, an attempt was made to mimic the size differences
in the number sequences of an ls entry. I judged this attempt to have too many flaws, and to
be too difficult to do properly; and thus omit this flourish in the current display.

I disagree. It helped a lot in not getting lost. I would like to see it back if it is still possible, Jim. It's brilliant even as it was.

To my eye, this 2px spacing
is quite close to the printed text.

Indeed, but I would go for a CSS class and not just hard coding. But let it be, it's just the purist in me. Hard coding was old school even in 1999, the year I launched my first website. And by the way, I'm in St. Petersburg right now, just a few hundred metres away from where the Dictionary was printed.

funderburkjim · 2017-12-21T01:26:55Z

@gasyoun Thanks for feedback, Marcis.

I'll take a look at mimicking the size differences in the number sequences of an ls entry again sometime. Bug me about it in a few months if it still comes up.

Is there some memorial in St. Pet. that identifies the spot where PWG was printed?

Right now, it's more convenient to embed styles in disp.php, since there are different CSS files for the different displays (Basic, List, etc.). So by putting them in disp.php, all displays get the benefit. Otherwise I'd have to change multiple css files. Not bragging about this arrangement for sure, but that's the way it is now.

I'm curious about your opinion on the use of tooltips for LS references, rather than links to a popup.

funderburkjim · 2017-12-21T01:38:50Z

The list-0.2s display now also is based on the new form of data for pwg.

gasyoun · 2017-12-21T02:12:34Z

list-0.2s display

Time to make it public?

funderburkjim · 2017-12-21T21:28:33Z

Added a link to list-0.2s on home page.

Needs documentation. Hint Hint!

funderburkjim · 2017-12-22T20:53:11Z

User comment re display details:

User Odile Caujolle made a comment regarding the popup LS references in MW, and I asked her to
review the tooltip version in PWG. While she apparently liked the tooltip aspect, she made these suggestions regarding other details of the display:

yes, but i must say that i find the display very illegible ...
the gray in between the black is hardly legible, and
 there is no underlining to inform that we can get some information for it. 
I would suggest to keep the flashing blue, [the bright blue underlining of LS sources in MW]
restore the underlining, 
but keep the little capitals and font

MW coloration:

PWG coloration:

What do others think?

gasyoun · 2017-12-23T02:12:39Z

I would suggest to keep the flashing blue, [the bright blue underlining of LS sources in MW] restore the underlining, but keep the little capitals and font

Can only agree.

funderburkjim · 2017-12-26T21:37:29Z

Is this ok? (example from basic display for pwg)

gasyoun · 2017-12-27T05:58:37Z

Is this ok?

Blue is blue, but the reference sizes are gone and we sure want to see them back, as bad as they are - they give a visual hint.

SergeA · 2017-12-28T05:44:35Z

Tool-tips are good for abbreviations, and are not so good for the sources. For me the great benefit of the pop-up window for sources is the possibility of easy copying of the text. With the tool-tips this copying becomes impossible. Is it possible to make them tool-tipped but also keep them clickable for the pop-up window? Then the copying functionality will not be lost.

The pop-up works fine in my Firefox, but in Chrome the line position is sometimes slightly misplaced.

funderburkjim · 2018-01-01T21:59:07Z

copying tooltip text

The pwg display example uses the default tooltip, so behavior is governed by the browser's internal (and not modifiable) behavior.

The jQueryUI Tooltip widget provides for customization. After half an hour of research, I found no immediate customization that permits copy-pasting from the tooltip text, but I suspect that this could be done.

Also, Bootstrap has tooltip functionality that might be customizable in this way.

An example from wikipedia

Look at this example (from Vedic Sanskrit article).

If you hover over one of the superscript numbers, you get a nicely formatted little 'tooltip' and you can move the mouse into it and copy/paste from it.

This looks like a very nice solution to me. What do you think?

SergeA · 2018-01-02T15:29:17Z

This looks like a very nice solution to me. What do you think?

Yes, this works.

gasyoun · 2018-01-02T16:15:46Z

This looks like a very nice solution to me.

Indeed. A copy-pastable tooltip would be a solution.

funderburkjim · 2018-01-02T21:29:52Z

OK. I'll put this on todo list.

It remains to know how to extract this particular piece of web technology
so that it can be applied in the context of Cologne display functions.

@gasyoun
Do you have any contacts who could learn how the wikipedia tooltip technique works? Is it all done in some Javascript library? or is it done via a PHP extension to the mediawiki software that runs wikipedia?
What we need is a small self-contained example.

gasyoun · 2018-01-02T23:06:52Z

Jim, let me explore.

gasyoun · 2019-09-05T12:20:30Z

@artforlife any idea?

drdhaval2785 · 2020-12-17T08:20:59Z

After sending the only remaining item to #321, this issue is safe to close.

funderburkjim mentioned this issue Nov 18, 2017

Coding of Sanskrit words in MW - sanskrit-lexicon/MWS#55 (Closed)

This was referenced Dec 19, 2017

PWG IAST conversion #195 (Closed)

Abbreviations in PWG #197 (Open)

drdhaval2785 mentioned this issue Dec 29, 2017

meta-line, IAST conversion tracker #177 (Closed)

drdhaval2785 mentioned this issue Dec 17, 2020

Copying tooltip text #321 (Open)

drdhaval2785 closed this as completed Dec 17, 2020
