Todo list as of December 2015 #181

Open
drdhaval2785 opened this issue Dec 2, 2015 · 13 comments

@drdhaval2785
Contributor

drdhaval2785 commented Dec 2, 2015

  1. Extend the methods which we have used for cleanup of dictionaries to descriptions also (see Extending faultfinder to sanskrit words in description #34 for methods). DONE in Extend correction methods to description in dictionaries #309, 09 Oct 2016.
  2. Abbreviation error corrections
  3. Alternate readings should get headword status for all dictionaries (Only MW has it now). See Alternative readings should get headword status #35, Alternate headwords in PWG #133. https://github.com/sanskrit-lexicon/alternateheadwords is the dedicated repository to handle this problem.
  4. hwnorm1 further development based on Different conventions of Sanskrit dictionaries #43 conventions. - Assigned to @drdhaval2785
  5. Find and correct convention errors found out as a by product of point 4. - Assigned to @gasyoun
  6. Prepare a javascript which would enable us to click on an L-id and we would have the standard format in clipboard. See point 2 in the link. - Assigned to @juhnowski
  7. Design crowdsourcing platform for correction submission. - Assigned to @funderburkjim
  8. Prepare a list of abbreviation / literary resources for all dictionaries. See Resource links #142 and Abbrv lists for all dictionaries #143. - Assigned to @gasyoun and @drdhaval2785
  9. Prepare a wikisource-like platform for keeping track of correction history. - Assigned to @funderburkjim (EDIT - Shifted to csl-orig github repository for tracking history)
  10. Get upasarga+dhatu words to headword status from PW, PWG or rather all dictionaries.
  11. Prepare a mechanism by which webpage and PDFs can be accessed via L-number. - Assigned to @funderburkjim. Not important, because L-numbers change substantially nowadays.
  12. Analyse the suspect entries which end with abnormal endings. - Assigned to @gasyoun
  13. Do some verb comparison 'research'. See Corrections to Wilson, MW re verbs #87. - Assigned to @drdhaval2785, @gasyoun
  14. Do some research on 'b'/'v' confusion of dictionaries and find some conventions and convention errors. Assigned to @drdhaval2785
  15. Pattern mismatch finding based on n-grams. Listing methods to identify errors #46 (comment) refers to works 15 to 20.
  16. Apply subanta and tiGanta generators to these methods - so that our tools are ready for application to description also. Use Dhaval's subanta and tiGanta tools. - Assigned to @drdhaval2785.
  17. Listing out impossible letter combinations by Sanskrit grammar rules. - Assigned to @drdhaval2785. Listed all possible ngrams of sanhw2.txt. Whatever is not listed is impossible. See 2-grams vs MW72, part 1 #241 (comment) for a status update.
  18. Taking English-Sanskrit dictionaries as base and clustering the Sanskrit words having same meaning. The word which is not repeated across dictionaries is suspect. - Assigned to @drdhaval2785
  19. Search for a list of feminine words ending in 'a' - Assigned to @drdhaval2785
  20. Listing out words which appear only in one dictionary after filtering out common differences like M, H at the end, corresponding nasal letters etc. - Assigned to @drdhaval2785
  21. Analyse accents (key2), batch comparison. There should be differences in PWG vs. Indian sources. See Todo list as of December 2015 #181 (comment) and Dhaval's accent tools. - Assigned to @drdhaval2785
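
A minimal sketch of the n-gram idea in items 15 and 17, assuming headwords are plain SLP1 strings held in a list (a real run would read them from a file such as sanhw2.txt, whose exact layout is not shown here). The function names and the sample words are hypothetical and are not the project's actual code or data. The sketch collects every 2-gram attested in a trusted headword list and flags candidate words that contain a 2-gram never seen there.

```python
def ngrams(word, n=2):
    """Return the set of contiguous n-character substrings of word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def attested_ngrams(headwords, n=2):
    """Union of the n-grams found in a trusted headword list."""
    seen = set()
    for hw in headwords:
        seen |= ngrams(hw, n)
    return seen


def suspect_words(candidates, attested, n=2):
    """Map each candidate to its unattested n-grams; clean words are omitted."""
    report = {}
    for word in candidates:
        missing = ngrams(word, n) - attested
        if missing:
            report[word] = sorted(missing)
    return report


# Hypothetical sample data in SLP1, invented for illustration only.
trusted = ["aMSa", "aMsa", "agni", "rAma"]
candidates = ["agni", "aqni"]  # "aqni" is a made-up misspelling

attested = attested_ngrams(trusted)
print(suspect_words(candidates, attested))
```

A word flagged this way is only a suspect: a 2-gram missing from the trusted list may simply be rare, so the report is a starting point for manual checking.
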
@gasyoun gasyoun assigned gasyoun and funderburkjim and unassigned gasyoun Dec 2, 2015
@gasyoun
Member

gasyoun commented Dec 2, 2015

7 and 9 sound equal to me.

@gasyoun
Member

gasyoun commented Dec 2, 2015

21. Analyse accents (key2), batch comparison. There should be differences in PWG vs. Indian sources. This was claimed in 1974 by a pupil of Mayrhofer, but never confirmed.
@funderburkjim can we extract all key2 fields as we have done with key1? I want not only to see the differences in headwords, but also to correct or document deviations in accents. In most cases I guess it will be a matter of lost accents or deviations that should be left as they are.
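
A rough sketch of what such a key2 extraction could look like. It assumes each entry of a dictionary's XML file sits on one line and carries `<key1>` and `<key2>` elements, and that accents in key2 are written with the SLP1 marks `/`, `\` and `^`. The function names and the sample lines are invented for illustration; this is not the project's actual tooling.

```python
import re

# Assumed entry layout: one entry per line with <key1> and <key2> elements.
KEY_RE = re.compile(r"<key1>(.*?)</key1>.*?<key2>(.*?)</key2>")
# Assumed SLP1 accent marks: / (udatta), \ (anudatta), ^ (svarita).
ACCENT_RE = re.compile(r"[/\\^]")


def extract_keys(lines):
    """Yield (key1, key2) for every line that carries both elements."""
    for line in lines:
        match = KEY_RE.search(line)
        if match:
            yield match.group(1), match.group(2)


def accent_report(lines):
    """Return (key1, key2, has_accent) triples for later batch comparison."""
    return [(k1, k2, bool(ACCENT_RE.search(k2))) for k1, k2 in extract_keys(lines)]


# Invented sample lines, not taken from any dictionary file.
sample = [
    "<H1><h><key1>agni</key1><key2>agni/</key2></h><body>...</body></H1>",
    "<H1><h><key1>rAma</key1><key2>rAma</key2></h><body>...</body></H1>",
]
for row in accent_report(sample):
    print(row)
```

Running the same report over two dictionaries and joining the rows on key1 would give the list of headwords whose accents differ or are missing in one source.
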

@funderburkjim
Contributor

What I see as priorities - May 2017

[This is in response to @gasyoun request ]

I'm generally in foot-soldier mode: slogging through the details of implementing some improvement in a tiny corner of the Cologne sanskrit-lexicon project. Let me pretend for a moment that I'm a general sitting on a hillock overlooking the battlefield, like Kutuzov in War and Peace.

My priorities at the moment are:

  • Finish AS to IAST for all dictionaries - simple to state, lots of work to accomplish
  • Backup (dev) server smoothly functioning with Dhaval; we dipped our toes into this earlier this week. This has the long-term benefit of decreasing the dependence of the Sanskrit-Lexicon project on me and Cologne.
  • Infrastructure normalization. This is not a glamorous objective (road-building rarely is), but improving roads and bridges makes every potential contributor more productive.
    • The AS-IAST task is a part of this.
    • As is the One DTD to rule them all ref:.
    • Other aspects include:
      • Simplifying the transition from xxx.txt to xxx.xml, by embedding some meta-data into xxx.txt. This has the side benefit of stabilizing L-numbers.
      • Providing a programmatic base for the displays, so that all displays derive from the same php class. This will permit simpler flow-through of improvements to all dictionaries. Currently, each dictionary is its own little kingdom (separate code base), so implementing a change to all dictionaries requires separate haggling.
  • Alternate headwords for various dictionaries
    • The 'subheadwords' issue, although similar in some ways to the alternate headwords issue, is
      actually more complex because of the requirement to dive into parsing the entries, adding markup,
      not to mention the complexity of combining abbreviated affixes with parent headwords.
  • Corrections and data improvements as they arise always have high priority, e.g.
    • AE with Sampada
    • Greek
    • Improvements relevant to stardict project Dhaval is working on
    • corrections originating with users
    • corrections arising in the course of implementing other tasks, such as AS to IAST.
  • Simple spelling UI ref:
  • UI for multiple dictionary displays, using hwnorm1

I would also like to finish the inflected form python rewrite that was begun last summer, but this always
seems to get pre-empted by some more pressing request.

I probably could go on and on if I thought a bit more about what I'd like to get done.

This is my actual current TODO list.

Now let me get down from that hillock before nose-bleed ensues :)

@juhnowski

@gasyoun
Member

gasyoun commented Jun 1, 2017

https://github.com/juhnowski/sanskrit-correction-js/blob/master/WIL_Basic.html

@juhnowski wow!

  1. Please upload on your github.io so it can be tested
  2. Please open a new issue at https://github.com/sanskrit-lexicon/Cologne/issues (Cologne, because it's web-development related). This is a meta issue, so no real discussions occur here. Thanks!

@funderburkjim
Contributor

WIL_basic.html link broken.

@juhnowski

@funderburkjim please try https://juhnowski.github.io/ but I have not yet implemented saving to a file.

@gasyoun
Member

gasyoun commented Jul 28, 2017

So, for example, 'UI for multiple dictionary displays, using hwnorm1' is a subtask of 'Simple spelling UI ref'. Yes, there are millions of ways to improve it, but it's ready to be launched publicly. Corrections and data improvements are always there; that is where we started our journey. Infrastructure normalization is huge, and indeed, who would need new roads if the old trail is still there? Backup (dev) server: does Dhaval have access to all the backend scripts, all the dev scripts ever developed by Jim? The ones that we see on his own GitHub page, for example. AS to IAST for all dictionaries, like corrections and data improvements, is a background task, and from my perspective there is no need to speed it up. And we always have to keep in mind that there are high- and low-priority dictionaries. The only thrilling task left is subheadwords, sanskrit-lexicon/alternateheadwords#20, and I would like to understand how my coders could help, because frankly I do not know. You already have some code related to it, and I would love to see it first.

@gasyoun
Member

gasyoun commented Sep 11, 2017

@funderburkjim let me introduce you to @vschary. He wants to help, and @Shalu411 said he is able to do so. Any ideas?

@funderburkjim
Contributor

Re @vschary wants to help.

I'm assuming that the interest is in the Sanskrit checking -- as opposed to programming.

One thing in the line of 'checking' relates to alternate headwords for vcp. We had a list of about 1000 cases where the accuracy of the derivation of the alternate headwords has been auto-checked only. Probably most of these auto-generated alternates are correct, but it would be good to have a knowledgeable human examine each of them.

I am thinking specifically of the 'ok1' list mentioned here. Here is a link to the current form of that
ok1 list.
For instance the first case is

Case 0001: OK,OK : 1:aMsa(se)BAra:aMseBAra:aMsasera:169:170

The important parts are aMsa(se)BAra and aMseBAra. The interpretation is that
'aMseBAra' is an alternate spelling of 'aMsaBAra'. The thing to check is whether this interpretation
is correct.
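
As an illustration of how a reviewer's worksheet could be produced from such lines, here is a small sketch. The field layout is inferred from the single sample line above, and only the two fields described as important are interpreted; the function name `parse_ok1_line` is hypothetical and the remaining fields are deliberately left alone.

```python
import re

# Layout inferred from the one sample line; other fields are left uninterpreted.
CASE_RE = re.compile(r"^Case (\d+): (\S+) : (.+)$")


def parse_ok1_line(line):
    """Return (case_number, base_headword, alternate_headword), or None."""
    match = CASE_RE.match(line.strip())
    if not match:
        return None
    case_number, _status, rest = match.groups()
    fields = rest.split(":")
    if len(fields) < 3:
        return None
    marked, alternate = fields[1], fields[2]
    # Dropping the parenthesised segment gives the base headword.
    base = re.sub(r"\([^)]*\)", "", marked)
    return case_number, base, alternate


line = "Case 0001: OK,OK : 1:aMsa(se)BAra:aMseBAra:aMsasera:169:170"
print(parse_ok1_line(line))
```

The resulting (base, alternate) pairs could then be transliterated into whichever script the reviewer prefers before the list is handed over.
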

I could readily alter this to use Devanagari, IAST, or HK -- however @vschary prefers to read his Sanskrit.

I think this first pass could be done in a few hours, and would require nothing but the ok1 list; the idea would be to mark those that need further investigation. If there are any questionable ones, then he
could investigate those further using the UI that SergeA used recently.

If this sounds like an appropriate task, we can discuss it further.
If it doesn't sound appropriate, maybe @vschary can let us know what he might be interested in, and we'll work from that interest.

@gasyoun
Member

gasyoun commented Sep 14, 2017

I'm assuming that the interest is in the Sanskrit checking -- as opposed to programming.

Exactly!

I could readily alter this to use Devanagari, IAST, or HK -- however @vschary prefers to read his Sanskrit.

Devanagari, he is from India. Anything other than SLP1 will do, but Devanagari is best if you are from India.

@drdhaval2785
Contributor Author

Status Update on 20 December 2020.

Out of Jim's wishlist at #181 (comment), all were completed except the following.

  • Alternate headwords for various dictionaries
    • The 'subheadwords' issue, although similar in some ways to the alternate headwords issue, is
      actually more complex because of the requirement to dive into parsing the entries, adding markup,
      not to mention the complexity of combining abbreviated affixes with parent headwords.
  • Greek

@gasyoun
Member

gasyoun commented Dec 19, 2020

Greek

What about Greek?
