
AE corrections by page #340

Closed
funderburkjim opened this issue Mar 11, 2017 · 36 comments

Comments

@funderburkjim
Contributor

From the experience of AE corrections in #318, this dictionary is very dirty. There are many errors still remaining, despite the large number of corrections from #318.

Thus, I think that to get a reasonably clean dictionary, we need to systematically review the words on each page.

To that end, a UI has been developed to make correcting the words on a page as efficient as possible.

The main difference from the other correction UIs is that all the lines of a word entry are editable in a WYSIWYG way (using the TinyMCE JavaScript library in the background).

I'd like others to try out the current sample of this UI and comment with suggestions.
This sample deals only with the words on page 1 of AE.

@funderburkjim
Contributor Author

Demo link (Devanagari)

Demo link (SLP1)

Note: this was developed on my local computer and then uploaded to Cologne. I think it works on Cologne just as it does locally, but I have not checked.

Thanks in advance for any feedback.

@drdhaval2785
Contributor

drdhaval2785 commented Mar 12, 2017 via email

@gasyoun
Member

gasyoun commented Mar 12, 2017

There is a lot of space around.

Wasted space, agree.


This will empower reading of scanned and digitized things side by side.

Agree, that's similar to https://en.wikisource.org/wiki/Page:Sanskrit_Grammar_by_Whitney_p1.djvu/103 but even more advanced because of the editing in WYSIWYG mode.

Before hand-cleaning, I would propose doing some regex cleanup.

Abate, v. t. ह्रस् c, लघयति (D.), शम् c.

The "c," has no period, but it always should, as in शम् c. Can we check for these and add the period where it is missing?

What about extracting all Devanagari words and comparing them to MW? That has never been done before, right? Shouldn't we work with this good-looking UI only after the big dirt is out? I think it will bear more fruit if we first use the old batch methods to weed out the hundreds of spelling mistakes out there.

@funderburkjim
Contributor Author

about extracting all devanagari words and comparing to MW.

This WAS done (and in fact comparisons made to all dictionary headwords) in the prior step of correction (see #318). That step got around 1500 corrections, as I recall.
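The batch check described above (extract every Devanagari word, then look it up in a list of dictionary headwords) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the script actually used in #318: the function names and the `headwords` set are hypothetical, the text is assumed to be plain Unicode Devanagari rather than the SLP1 markup of the Cologne files, and inflected forms such as लघयति are not headwords, so every flagged word still needs human review.

```python
import re

# Runs of characters from the Unicode Devanagari block (U+0900-U+097F).
# Hyphens, spaces and Latin punctuation end a run, so "वप्रः-प्रं" yields two words.
DEVANAGARI_WORD = re.compile(r"[\u0900-\u097F]+")


def devanagari_words(text: str) -> list[str]:
    """Return the Devanagari words of an entry, in order of appearance."""
    return DEVANAGARI_WORD.findall(text)


def unknown_words(text: str, headwords: set[str]) -> list[str]:
    """Return the Devanagari words not found in `headwords` (hypothetical set),
    i.e. the candidates a corrector should look at."""
    return [w for w in devanagari_words(text) if w not in headwords]


if __name__ == "__main__":
    entry = "Abate, v. t. ह्रस् c, लघयति (D.), शम् c."
    print(unknown_words(entry, {"ह्रस्", "शम्"}))
```

Because the comparison is an exact string match, it catches misspellings of stems cheaply but over-reports legitimate verb and case forms; that trade-off is why such a pass only narrows the list for hand checking.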

@funderburkjim
Contributor Author

c, -> c.

Good observation. This is always 'causal'. Will put it on todo list.
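Since the "c," after a root is always the causal abbreviation, the fix is a good candidate for a regex pass. A minimal Python sketch, assuming the entry is a plain Unicode string as displayed in the UI (the raw Cologne files use SLP1 markup and would need a different pattern); `fix_causal_comma` is a hypothetical helper name:

```python
import re

# Assumption: a " c," that directly follows a Devanagari character
# (a root such as ह्रस्) is the causal abbreviation missing its period.
CAUSAL_COMMA = re.compile(r"(?<=[\u0900-\u097F]) c,")


def fix_causal_comma(text: str) -> tuple[str, int]:
    """Replace 'c,' with 'c.' after a Devanagari word.

    Returns the corrected text and the number of replacements, so a
    batch run can report how many entries it touched.
    """
    return CAUSAL_COMMA.subn(" c.", text)


if __name__ == "__main__":
    print(fix_causal_comma("Abate, v. t. ह्रस् c, लघयति (D.), शम् c."))
```

The lookbehind restricts the change to "c," that follows a Devanagari word, so an English "c," elsewhere in the entry is left alone.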

@funderburkjim
Contributor Author

The wasted space is not a big deal here.

Since we are dealing with just one scan page at a time, the user can click to open that page in a separate window, enlarge the scan, and be set for all the cases in the batch.

@funderburkjim
Contributor Author

funderburkjim commented Mar 27, 2017

Sampada has begun working on the pages, and I've done a few.

It seems to take about 30-45 min per page. On the first 9 pages, there were 67 corrections, or about 7 per page.

If @juhnowski, @SergeA or others who prefer Devanagari want to do any pages, please let me know, so the work can be coordinated with what Sampada is doing. I'm planning to install a new batch about every 10 pages, so not all pages are prepared at once.

The UI should already support Devanagari, but only limited testing has been done with it. Although the correction UI is WYSIWYG, there are a couple of data-entry quirks that a corrector needs to understand so that corrections are interpreted properly.

@gasyoun
Member

gasyoun commented Mar 28, 2017

It seems to take about 30-45 min per page. On the first 9 pages, there were 67 corrections -- that would be 7 per page.

That's a lot both ways. Thanks to Sampada.

I'm planning to do installation about every 10 page

Understood.

@funderburkjim
Contributor Author

Meant to say @SergeA in comment above.

@funderburkjim
Contributor Author

Can anyone recognize this word, under headword 'arch':

[image: scan of the word under headword 'arch']

@gasyoun
Member

gasyoun commented Mar 31, 2017

चनुर

@drdhaval2785
Contributor

drdhaval2785 commented Mar 31, 2017 via email

@funderburkjim
Contributor Author

@drdhaval2785 Thanks. That fits with vidagDa.

@sanskritisampada

sanskritisampada commented Apr 1, 2017 via email

@gasyoun
Member

gasyoun commented Apr 1, 2017

Did not expect to find so many errors

I did. I only hope this is the best UI possible, so that you spend as little time as possible doing so. Thanks, Sampada!

@funderburkjim
Contributor Author

In AE, the author typically writes a root as 'masj' (to sink, to bathe) and gives forms as 'majjati', etc.
Also 'masja' in SKD, VCP.

In MW, PW, and WIL the root appears as 'majj'.

So far, I haven't found an author who mentions both forms.

Two questions:

  • Are we right in thinking 'masj' and 'majj' are the same root?
  • Is there some explanation for this difference in different dictionaries?

@gasyoun
Member

gasyoun commented Apr 11, 2017

Is there some explanation for this difference in different dictionaries?

In Zaliznyak, the foremost Russian authority on dhatus, it's called majj. Just yesterday I was exploring it. I've never seen 'masj' and 'majj' in one source; they are allomorphs of the same root. I've never seen an explanation. MW and PW are on the right track.

@drdhaval2785
Contributor

drdhaval2785 commented Apr 11, 2017 via email

@gasyoun
Member

gasyoun commented Apr 11, 2017

masj is what remains.

So could we say it's not a final form but an intermediate byproduct, Dhaval?

@funderburkjim
Contributor Author

Really good explanation as to why both forms are acceptable as 'the' root.

@funderburkjim
Contributor Author

question on rakzaRasAna

Under headword bulwark in AE, we see

bulwark [p= 044] : Bulwark, s. वप्रः-प्रं, प्राकारः. 2 आश्रयः, श- 
-रणं, संश्रयस्थानं, आलंबः; रक्षणसानं. [L=1285]

The print shows rakzaRasAna. Is that possible, and if so, how should 'sAna' be understood?

Or, should it be rakzaRasTAna which makes sense (a bulwark is a 'place of protection' ) ?

@funderburkjim
Contributor Author

question on jrim

Under headword COOL, with the sense 'to cool down', AE appears to have 'jrim' (perhaps). No such Sanskrit word is found anywhere.

Could Apte have meant 'jfmB' (MW spelling), one of whose meanings (with causal) is 'cause to feel at ease'?

[image: scan of the COOL entry]

@funderburkjim
Contributor Author

funderburkjim commented Aug 9, 2017

missing headword 'Invincible'

Sampada discovered a missing headword in the digitization for page 239 of AE.

2388 old
32388 new <P>{@Invincible,@} {%a.%} {#ajawya, durjaya, durAsada, a-#}
32388x ins {#-Dfzya, adamya.#}

Adding this without changing the subsequent L-numbers is a problem at the moment. It should be done once the meta-line form of AE has been completed.

This would best be done AFTER Sampada has finished the rest of the page-by-page corrections of AE, probably sometime toward the end of this year.

@gasyoun
Member

gasyoun commented Aug 9, 2017

To add this without changing subsequent L-numbers is a problem at the moment

Let's change them. Unlike with MW, we don't yet need to worry about others depending on these numbers.

@funderburkjim
Contributor Author

Question re kOwwinya

In AE, under headword 'pander', we find the spelling kOwwinyaM:
[image: scan of the 'pander' entry]

This spelling is also shown in another edition of AE.

However, every other dictionary (MD, MW, PW, AP) shows only kOwwanya.

Should the 'inyam' spelling be considered a print error in AE?

@SergeA

SergeA commented Oct 20, 2017

कुट्टनी=कुट्टिनी >> कौट्टन्यम्=कौट्टिन्यम्
The spelling कौट्टिन्यं is possible, so it is not an error.
If we compare the dictionaries, we find only one source reference, to the Rajatarangini, given by PW, copied by MW and recopied by AP. So we have two possible forms: one attested by a single source and the other by none. But that does not mean the second is wrong.

@SergeA

SergeA commented Oct 20, 2017

Concerning the earlier question about the supposed ज्रिम्:

with sense 'to cool down', AE appears to have 'jrim'

Another edition, at the last link, gives the clean reading विरम्.

@SergeA

SergeA commented Oct 20, 2017

question on rakzaRasAna

The same edition (at that link) reads रक्षणसाधनं.

@funderburkjim
Contributor Author

@SergeA Thanks for explanation and research on these three.

kOwwinyam

My understanding now is that kOwwinyam is derived from kuwwinI by a taddhita rule that adds the suffix 'ya' and changes u to its vriddhi O, and that kOwwanyam is similarly derived from kuwwanI.
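The derivation just described can be illustrated with a toy Python function on SLP1 strings. It is a deliberately simplified sketch, not a Paninian derivation: it assumes vriddhi applies to the first vowel of the stem, that a stem-final i/I or a/A is dropped before the suffix, and it ignores all other conditions and exceptions. The name `ya_derivative` and the vowel table are illustrative only.

```python
# Simplified vriddhi grades for SLP1 vowels (toy table, not exhaustive).
VRDDHI = {
    "a": "A", "A": "A", "i": "E", "I": "E", "u": "O", "U": "O",
    "f": "Ar", "F": "Ar", "e": "E", "E": "E", "o": "O", "O": "O",
}


def ya_derivative(stem: str) -> str:
    """Toy 'ya' taddhita: vriddhi of the first vowel, drop a final
    a/A/i/I, then append 'ya' (neuter nominative would add 'm')."""
    for pos, ch in enumerate(stem):
        if ch in VRDDHI:
            stem = stem[:pos] + VRDDHI[ch] + stem[pos + 1:]
            break
    if stem and stem[-1] in "aAiI":
        stem = stem[:-1]
    return stem + "ya"


if __name__ == "__main__":
    for base in ("kuwwinI", "kuwwanI"):
        print(base, "->", ya_derivative(base) + "m")
```

Under these assumptions both attested shapes fall out of the same rule, which matches SergeA's point that kOwwinyam and kOwwanyam differ only in the base (kuwwinI vs. kuwwanI).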

jrim and rakzaRasAna

I have changed these to viram and rakzaRasADanaM, per the newer print edition, classifying them as print errors in the edition used for the Cologne digitization.

@gasyoun
Member

gasyoun commented Oct 24, 2017

changed to viram and rakzaRasADanaM, per the newer print edition, classifying as print error in print edition used for Cologne digitization.

Long live the Jim.

@funderburkjim
Contributor Author

DONE!

Sampada informed me today that all 501 pages of AE have now been examined and corrections provided.

All in all, this year-long endeavor has yielded about 7400 corrections, approximately 15 per page.

Let's all doff our hats to Sampada for her persistence in seeing this project through. The Cologne digitization of Apte's English-Sanskrit Dictionary is now immensely better and more useful.

Thanks, Sampada!

@drdhaval2785
Contributor

I extend my gratitude to Sampada.

@gasyoun
Member

gasyoun commented Mar 13, 2018

7400 corrections: what patience that took! Nothing of this spirit is seen in Pune. It would take 10 Sampadas to finish that dictionary, but we have only one, so I bow to her lotus feet.

@funderburkjim
Contributor Author

invincible added

The correction adding 'invincible' as a headword (see the comment above) has now been made to the meta-line version of AE. It is L=5754.1. Love those decimal L-numbers :)
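A decimal L-number lets a missing entry slot in between two existing ones without renumbering anything after it. A minimal Python sketch of how such numbers might be ordered, assuming the part after the dot counts insertions (.1, .2, ...) rather than being a true decimal fraction; that interpretation, and the helper names, are assumptions rather than the Cologne tooling:

```python
def lnum_key(lnum: str) -> tuple[int, ...]:
    """Split an L-number on '.' into integers: '5754.1' -> (5754, 1).

    Tuples compare element by element, so (5754,) < (5754, 1) < (5755,)
    and '5754.10' sorts after '5754.9' (as a float it would not).
    """
    return tuple(int(part) for part in lnum.split("."))


def sort_lnums(lnums: list[str]) -> list[str]:
    """Order L-numbers so inserted entries fall between their neighbours."""
    return sorted(lnums, key=lnum_key)


if __name__ == "__main__":
    print(sort_lnums(["5755", "5754.1", "5754"]))
```

Comparing integer tuples instead of floats avoids the ambiguity of "5754.1" vs "5754.10" if more than nine entries ever have to be inserted after the same L-number.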

@sanskritisampada

sanskritisampada commented Mar 14, 2018 via email

@gasyoun
Member

gasyoun commented Mar 15, 2018

@sanskritisampada, don't try to make this elephant look like a puppy; it's huge! What's next?


5 participants