Todo list as of December 2015 #181

drdhaval2785 · 2015-12-02T12:03:03Z

~~Extend the methods which we have used for cleanup of dictionaries to descriptions as well (see Extending faultfinder to sanskrit words in description #34 for methods).~~ DONE in Extend correction methods to description in dictionaries #309, 09 Oct 2016.
~~Abbreviation error corrections~~
Alternate readings should get headword status for all dictionaries (Only MW has it now). See Alternative readings should get headword status #35, Alternate headwords in PWG #133. https://github.com/sanskrit-lexicon/alternateheadwords is the dedicated repository to handle this problem.
~~hwnorm1 further development based on Different conventions of Sanskrit dictionaries #43 conventions. - Assigned to @drdhaval2785~~
~~Find and correct convention errors found as a by-product of point 4. - Assigned to @gasyoun~~
Prepare a JavaScript tool that lets us click on an L-id and copies the standard format to the clipboard. See point 2 in the link. - Assigned to @juhnowski
~~Design crowdsourcing platform for correction submission. - Assigned to @funderburkjim~~
Prepare a list of abbreviation / literary resources for all dictionaries. See Resource links #142 and Abbrv lists for all dictionaries #143. - Assigned to @gasyoun and @drdhaval2785
~~Prepare a wikisource-like platform for keeping track of correction history. - Assigned to @funderburkjim (EDIT - Shifted to csl-orig github repository for tracking history)~~
Get upasarga+dhatu words to headword status from PW, PWG or rather all dictionaries.
~~Prepare a mechanism by which webpage and PDFs can be accessed via L-number. - Assigned to @funderburkjim.~~ Not important, because L-numbers change substantially nowadays.
Analyse the suspect entries which end with abnormal endings. - Assigned to @gasyoun
~~Do some verb comparison 'research'. See Corrections to Wilson, MW re verbs #87. - Assigned to @drdhaval2785, @gasyoun~~
Do some research on 'b'/'v' confusion of dictionaries and find some conventions and convention errors. Assigned to @drdhaval2785
~~Pattern mismatch finding based on n-grams.~~ Listing methods to identify errors #46 (comment) refers to works 15 to 20.
Apply subanta and tiGanta generators to these methods - so that our tools are ready for application to description also. Use Dhaval's subanta and tiGanta tools. - Assigned to @drdhaval2785.
~~listing out impossible letter combinations by Sanskrit grammar rules.~~ - Assigned to @drdhaval2785. Listed all possible ngrams of sanhw2.txt. Whatever is not listed is impossible. 2-grams vs MW72, part 1 #241 (comment) status update.
Taking English-Sanskrit dictionaries as base and clustering the Sanskrit words having same meaning. The word which is not repeated across dictionaries is suspect. - Assigned to @drdhaval2785
~~Search for a list of feminine words ending in 'a'~~ - Assigned to @drdhaval2785
~~Listing out words which appear only in one dictionary after filtering out common differences like M, H at the end, corresponding nasal letters etc. - Assigned to @drdhaval2785~~
Analyse accents (key2), batch comparison. There should be differences in PWG vs. Indian sources. See Todo list as of December 2015 #181 (comment) and Dhaval's accent tools. - Assigned to @drdhaval2785
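
The n-gram item above (every n-gram of a trusted headword list is collected, and anything not attested is treated as impossible) can be illustrated with a minimal sketch. This is not the project's actual script: the function names and the toy word lists are assumptions made for illustration, and a real run would presumably read the reference headwords from a file such as sanhw2.txt.

```python
from typing import Iterable, List, Set


def ngrams(word: str, n: int) -> List[str]:
    """Return every contiguous substring of length n in word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def attested_ngrams(reference_words: Iterable[str], n: int) -> Set[str]:
    """Collect all n-grams seen in the trusted reference headwords."""
    seen: Set[str] = set()
    for word in reference_words:
        seen.update(ngrams(word, n))
    return seen


def suspect_words(candidates: Iterable[str], attested: Set[str], n: int) -> List[str]:
    """Return candidates containing at least one n-gram absent from the reference."""
    return [w for w in candidates if any(g not in attested for g in ngrams(w, n))]


# Toy SLP1 data (hypothetical); real input would be the full headword list.
reference = ["aMSa", "aMSaka", "akza", "agni"]
attested = attested_ngrams(reference, 2)
flagged = suspect_words(["aMSa", "agnk"], attested, 2)
print(flagged)
```

A flagged word is only a suspect, not a confirmed error: with a small reference list many valid words would be flagged, so the reference needs to be as large and as clean as possible.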

gasyoun · 2015-12-02T19:51:49Z

7 and 9 sound equal to me.

gasyoun · 2015-12-02T20:49:19Z

21. Analyse accents (key2), batch comparison. There should be differences in PWG vs. Indian sources. This was claimed in 1974 by Mayrhofer's pupil, but never confirmed.
@funderburkjim can we extract all key2 fields as we have done with key1? I want to see the differences not only in headwords, but correct or document deviations in accents as well. In most cases I guess there will be an issue of lost accents or deviations, that should be left as such.

funderburkjim · 2017-05-04T23:00:28Z

What I see as priorities - May 2017

[This is in response to @gasyoun request ]

I'm generally in foot-soldier mode: slogging through the details of implementing some improvement in a tiny corner of the Cologne sanskrit-lexicon project. Let me pretend for a moment that I'm a general sitting on a hillock overlooking the battlefield, like Kutuzov in War and Peace.

My priorities at the moment are:

Finish AS to IAST for all dictionaries - simple to state, lots of work to accomplish
Backup (dev) server smoothly functioning with Dhaval; we dipped our toes into this earlier this week. This has the long-term benefit of decreasing the dependence of the Sanskrit-Lexicon project on me and Cologne.
Infrastructure normalization. This is not a glamorous objective (road-building rarely is), but improving roads and bridges makes every potential contributor more productive.
- The AS-IAST task is a part of this.
- As is the One DTD to rule them all ref:.
- Other aspects include:
  - simplifying the transition from xxx.txt to xxx.xml, by embedding some meta-data into xxx.txt. This has side benefit of stabilizing L-numbers.
  - Providing a programmatic base for the displays, so that all displays derive from the same PHP class. This will permit simpler flow-through of improvements to all dictionaries. Currently, each dictionary is its own little kingdom (separate code base), so implementing a change to all dictionaries requires separate haggling.
Alternate headwords for various dictionaries
- The 'subheadwords' issue, although similar in some ways to the alternate headwords issue, is
  actually more complex because of the requirement to dive into parsing the entries, adding markup,
  not to mention the complexity of combining abbreviated affixes with parent headwords.
Corrections and data improvements as they arise always have high priority, e.g.
- AE with Sampada
- Greek
- Improvements relevant to stardict project Dhaval is working on
- corrections originating with users
- corrections arising in the course of implementing other tasks, such as AS to IAST.
Simple spelling UI ref:
UI for multiple dictionary displays, using hwnorm1
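
To illustrate the idea in the list above of embedding meta-data into xxx.txt so that the step to xxx.xml becomes simpler, here is a rough sketch. The meta-line layout (`<L>…<pc>…<k1>…<k2>…`), the XML element names, and both function names are assumptions chosen for illustration; the thread does not specify the actual format.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

# Matches "<name>value" pairs on a single meta-line (hypothetical layout).
META_FIELD = re.compile(r"<(\w+)>([^<]*)")


def parse_meta_line(line: str) -> Dict[str, str]:
    """Split a meta-line such as '<L>1<pc>1,1<k1>a<k2>a' into a dict."""
    return dict(META_FIELD.findall(line))


def entry_to_xml(meta_line: str, body: str) -> str:
    """Build one XML record from a meta-line and the entry text."""
    meta = parse_meta_line(meta_line)
    entry = ET.Element("entry", {"L": meta["L"]})
    ET.SubElement(entry, "key1").text = meta["k1"]
    ET.SubElement(entry, "key2").text = meta["k2"]
    ET.SubElement(entry, "pc").text = meta["pc"]
    ET.SubElement(entry, "body").text = body
    return ET.tostring(entry, encoding="unicode")


xml_record = entry_to_xml("<L>1<pc>1,1<k1>a<k2>a", "entry text goes here")
print(xml_record)
```

Because the L-number is stored in the text file itself rather than assigned during conversion, inserting or deleting an entry would not renumber the others, which is one way to read the remark about stabilizing L-numbers.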

I would also like to finish the inflected form python rewrite that was begun last summer, but this always
seems to get pre-empted by some more pressing request.

I probably could go on and on if I thought a bit more about what I'd like to get done.

This is my actual current TODO list.

Now let me get down from that hillock before nose-bleed ensues :)

juhnowski · 2017-06-01T00:41:30Z

- https://github.com/juhnowski/sanskrit-correction-js/blob/master/WIL_Basic.html

gasyoun · 2017-06-01T06:26:11Z

https://github.com/juhnowski/sanskrit-correction-js/blob/master/WIL_Basic.html

@juhnowski wow!

Please upload on your github.io so it can be tested
Open a new issue at https://github.com/sanskrit-lexicon/Cologne/issues (Cologne - because it's web development related), because this is a meta issue, no real discussions occur here, thanks!

funderburkjim · 2017-06-01T20:11:09Z

The WIL_Basic.html link is broken.

juhnowski · 2017-06-01T21:34:08Z

@funderburkjim please try https://juhnowski.github.io/ but I have not yet implemented saving to a file

gasyoun · 2017-07-28T07:13:35Z

So, for example, UI for multiple dictionary displays, using hwnorm1 is a subtask of Simple spelling UI ref. Yes, there are millions of ways to improve it, but it's ready to be launched publicly. Corrections and data improvements are always there; it's where we started our journey. Infrastructure normalization is huge, and indeed who would need new roads if the old trail is still there? Backup (dev) server: does Dhaval have access to all the backend scripts, all the dev scripts ever developed by Jim? The ones that we see on his own github page, for example. AS to IAST for all dictionaries, similar to Corrections and data improvements, is a background task and there is no need to speed it up, from my perspective. And we always have to keep in mind that there are high and low priority dictionaries. The only thrilling task left is subheadwords sanskrit-lexicon/alternateheadwords#20, and I would like to understand how my coders could help, because frankly I do not know. You do have some code already related to it, and I would love to see it first.

gasyoun · 2017-09-11T15:31:14Z

@funderburkjim let me introduce you to @vschary; he wants to help, and @Shalu411 said he is able to do so. Any ideas?

funderburkjim · 2017-09-13T22:49:42Z

Re @vschary wants to help.

I'm assuming that the interest is in the Sanskrit checking -- as opposed to programming.

One thing in the line of 'checking' relates to alternate headwords for vcp. We have a list of about 1000 cases where the accuracy of the derivation of the alternate headwords has only been auto-checked. Probably most of these auto-generated alternates are correct, but it would be good to have a knowledgeable human examine each of them.

I am thinking specifically of the 'ok1' list mentioned here. Here is a link to the current form of that
ok1 list.
For instance the first case is

Case 0001: OK,OK : 1:aMsa(se)BAra:aMseBAra:aMsasera:169:170

The important parts are aMsa(se)BAra and aMseBAra. The interpretation is that 'aMseBAra' is an alternate spelling of 'aMsaBAra'. The thing to check is whether this interpretation is correct.
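
For anyone wanting to process the ok1 list programmatically, a case line like the one above could be split as in the following sketch. Only the second and third colon-separated fields (the parenthesized notation and the alternate) are described in this comment; the meaning of the remaining fields is not stated here, so they are kept as raw strings. The names `Ok1Case` and `parse_ok1_line` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Ok1Case(NamedTuple):
    case_id: str
    status: str
    notation: str      # headword with the variant in parentheses
    alternate: str     # auto-generated alternate headword
    headword: str      # notation with the parenthesized part removed
    other: List[str]   # remaining fields, meaning not documented here


CASE_LINE = re.compile(r"Case (\d+): (\S+) : (.+)")


def parse_ok1_line(line: str) -> Ok1Case:
    """Parse one 'Case NNNN: status : f0:f1:f2:...' line."""
    match = CASE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an ok1 case line: {line!r}")
    case_id, status, rest = match.groups()
    fields = rest.split(":")
    notation, alternate = fields[1], fields[2]
    # Dropping '(se)' from 'aMsa(se)BAra' leaves the parent headword.
    headword = re.sub(r"\([^)]*\)", "", notation)
    return Ok1Case(case_id, status, notation, alternate, headword,
                   [fields[0]] + fields[3:])


case = parse_ok1_line("Case 0001: OK,OK : 1:aMsa(se)BAra:aMseBAra:aMsasera:169:170")
print(case.headword, "->", case.alternate)
```

A reviewer's verdict could then be recorded per `case_id`, which keeps the human check separate from the auto-generated data.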

I could readily alter this to use Devanagari, IAST, or HK, whichever way @vschary prefers to read his Sanskrit.

I think this first pass could be done in a few hours, and would require nothing but the ok1 list; the idea would be to mark those that need further investigation. If there are any questionable ones, then he
could investigate those further using the UI that SergeA used recently.

If this sounds like an appropriate task, we can discuss it further.
If it doesn't sound appropriate, maybe @vschary can let us know what he might be interested in, and we'll work from that interest.

gasyoun · 2017-09-14T05:10:34Z

> I'm assuming that the interest is in the Sanskrit checking -- as opposed to programming.

Exactly!

> I could readily alter this to use Devanagari, IAST, or HK -- however @vschary prefers to read his Sanskrit.

Devanagari, he is from India. Anything other than SLP1 will do, but Devanagari is best if you are from India.

drdhaval2785 · 2020-12-19T19:17:20Z

Status Update on 20 December 2020.

Out of Jim's wishlist at #181 (comment), all were completed except the following.

- Alternate headwords for various dictionaries
  - The 'subheadwords' issue, although similar in some ways to the alternate headwords issue, is actually more complex because of the requirement to dive into parsing the entries, adding markup, not to mention the complexity of combining abbreviated affixes with parent headwords.
- Greek

gasyoun · 2020-12-19T20:19:01Z

> Greek

What about Greek?

gasyoun added the enhancement label Dec 2, 2015

gasyoun assigned gasyoun and funderburkjim and unassigned gasyoun Dec 2, 2015

drdhaval2785 mentioned this issue Dec 8, 2015

Convention 2.1 errors #167

Closed

drdhaval2785 mentioned this issue Oct 6, 2016

normalizing xml structure of various Cologne dictionaries - preliminary review sanskrit-lexicon/COLOGNE#87

Closed

gasyoun mentioned this issue May 4, 2017

AP alternate headword issues sanskrit-lexicon/alternateheadwords#20

Open

gasyoun mentioned this issue Jul 28, 2017

BUR meta-line conversion sanskrit-lexicon/COLOGNE#166

Closed

drdhaval2785 mentioned this issue Dec 20, 2020

todo list in 2021 (in descending order of importance) sanskrit-lexicon/COLOGNE#325

Open
