'keydoc' (dev) #1

Open
funderburkjim opened this issue Feb 7, 2020 · 7 comments
Labels
documentation Improvements or additions to documentation

Comments

@funderburkjim
Contributor

funderburkjim commented Feb 7, 2020

This repository is an offshoot of the hwnorm1 repository.

The idea is to define a dictionary document by a collection of intrinsic dictionary headwords,
then to allow access to such documents by the intrinsic headword spellings as well as alternate and normalized spellings. The term 'keydoc' (a document defined by headword keys) is one way to
refer to this notion; and it is currently represented by a database with the beautiful name
keydoc_glob1 (global keydoc database).

dalglob1 is a display that uses the new database. The database does not currently affect other displays.

There are two YouTube videos:

@gasyoun
Member

gasyoun commented Feb 7, 2020

Jim, thanks for documenting in detail the issue with 443k spellings. Generally, I do not understand where this UI will fit, as we now have so many different paths to go. But the issues at the end of the 2nd video, like pitA and pitf: haven't we already solved them in the past in a different place?

nṛsiṃhaācārya does look in your video anti-sandhi.
(nṛsiṃhaācārya is an alternate of narasiṃha.) narasiṃha or nṛsiṃha ācārya

In the 1st video you give an alternate headword for MW based on the fact that ACC gives narasiṃha and nṛsiṃha ācārya as synonyms.
In the 2nd video MW gives guru and gurvi, but you do not use this connection for other dictionaries. Or do they just not have a gurvi entry or subentry that can be used?

@funderburkjim
Contributor Author

where this UI will fit

This UI is currently just for research purposes. The research questions:

  • What are the documents implied by the Sanskrit dictionaries? Currently, a document is specified by
    one headword per dictionary.
    The research proposes that a document for a dictionary should be defined by a collection of
    one or more headwords (e.g. guru and gurvI if both these are headwords of a given dictionary).
  • For a given document in a given dictionary, what search terms should lead to the document?
    This research focuses on the specific SLP1 spellings given in the dictionaries.

local document search terms

Assume that a document D in dictionary X is determined by headwords with spellings H1,H2,.. in X.
Then the local search terms L1,L2,... for D currently include:

  • The given spellings H1, H2, ...
  • Any alternate spellings (A1,A2,...) given in X_hwextra.txt for H1, H2, etc.
  • Any normalized spellings N1,N2,... for any of H1,H2,..., A1,A2,...
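
The three bullets above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `local_search_terms` and `toy_normalize` are hypothetical names, the `alternates` mapping stands in for whatever is read from X_hwextra.txt, and the single normalization rule shown is a toy placeholder, not the actual normalization used by this repository.

```python
def toy_normalize(spelling):
    # Hypothetical stand-in for the real normalization rules:
    # it only collapses a doubled 'v' after 'r' (e.g. gurvvI -> gurvI).
    return spelling.replace("rvv", "rv")


def local_search_terms(headwords, alternates, normalize=toy_normalize):
    """Collect local search terms for one document of one dictionary.

    headwords:  list of headword spellings H1, H2, ... defining the document
    alternates: dict mapping a headword to its alternate spellings
                (assumed to have been loaded from X_hwextra.txt)
    """
    terms = list(headwords)
    # Add the alternate spellings A1, A2, ... of every headword.
    for headword in headwords:
        terms.extend(alternates.get(headword, []))
    # Add the normalized spelling of every headword and alternate.
    terms.extend(normalize(term) for term in list(terms))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(terms))


print(local_search_terms(["gurvvI"], {}))
print(local_search_terms(["narasiMha"], {"narasiMha": ["nfsiMhAcArya"]}))
```

Keeping the result ordered (headwords first, then alternates, then normalized forms) is a design choice of this sketch, made so the output is easy to compare by eye.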

global document search terms

The global search terms G1,G2,... for a document D in dictionary X take into account other
dictionaries.

  • initialize G1,G2,... by the local search terms L1,L2,... for document D in dictionary X
  • For any other document D' in any dictionary X', let L1', L2',... be the local search terms for D' in X'.
    If any Gi is the same as some Lj', then add all of L1', L2', ... to the G1, G2, ... list.
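
A minimal Python sketch of this global step, under assumptions: the function name and the data layout (a dict keyed by hypothetical `(dictionary, headword)` pairs) are invented for illustration, and the sketch compares other documents only against the initial local terms of D in a single pass, with no transitive closure. The toy data is loosely modeled on the guru/gurvI case discussed in this thread and is much smaller than the real data.

```python
def global_search_terms(doc_key, local_terms):
    """Return the global search terms for the document doc_key.

    local_terms: dict mapping a document key to the set of its
                 local search terms (all dictionaries together).
    """
    base = local_terms[doc_key]
    result = set(base)  # initialize G with the local terms of D
    for other_key, other_terms in local_terms.items():
        # Any other document sharing a term with D contributes all its terms.
        if other_key != doc_key and base & other_terms:
            result |= other_terms
    return result


# Toy local search terms (illustrative only).
local_terms = {
    ("bur", "guru"): {"guru"},
    ("bur", "gurvI"): {"gurvI"},
    ("mw", "guru"): {"guru", "gurvI"},
    ("skd", "guruH"): {"guruH", "guru"},
    ("shs", "gurvvI"): {"gurvvI", "gurvI"},
}

print(sorted(global_search_terms(("bur", "guru"), local_terms)))
print(sorted(global_search_terms(("bur", "gurvI"), local_terms)))
```

Whether the real step iterates until nothing changes is not stated here; the single-pass reading is an assumption of the sketch.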

@funderburkjim
Contributor Author

nṛsiṃhaācārya does look in your video anti-sandhi.

I agree. This looks like a bug in acc. In fact all the following instances look to be similar errors:

13 matches for "aa" in buffer: acc_hwextra.txt
     68:<L>1783.1<k1>kOSikAditya<k2>kOSikAditya<type>alt<LP>1783<k1P>AdityaAcArya
    104:<L>2568.1<k1>udayakaraAcArya<k2>udayakara AcArya<type>alt<LP>2568<k1P>udayana
    179:<L>4657.1<k1>kfzRamBawwa<k2>kfzRamBawwa,<type>alt<LP>4657<k1P>kfzRaBawwaArqe
    210:<L>5684.1<k1>gaReSvaraAcArya<k2>gaReSvara AcArya<type>alt<LP>5684<k1P>gaReSadEvajYa
    401:<L>11100.1<k1>nfsiMhaAcArya<k2>nfsiMha AcArya<type>alt<LP>11100<k1P>narasiMha
    496:<L>13957.1<k1>SuBaMkara<k2>SuBaMkara<type>alt<LP>13957<k1P>pragalBaAcArya
    778:<L>22017.1<k1>dIkzita<k2>dIkzita<type>alt<LP>22017<k1P>vAsudevaaDvarin
    814:<L>23353.1<k1>veNkawanATa<k2>veNkawanATa<type>alt<LP>23353<k1P>veNkawaAcArya
    815:<L>23359.1<k1>veNkaweSa<k2>veNkaweSa<type>alt<LP>23359<k1P>veNkawaAcArya
    903:<L>26044.1<k1>SrInivAsatIrTa<k2>SrInivAsatIrTa<type>alt<LP>26044<k1P>SrInivAsaAcArya
    951:<L>28306.1<k1>darSanAcArya<k2>darSanAcArya<type>alt<LP>28306<k1P>sudarSanaAcArya
    952:<L>28306.2<k1>darSanArya<k2>darSanArya<type>alt<LP>28306<k1P>sudarSanaAcArya
    956:<L>28551.1<k1>viSvarUpa<k2>viSvarUpa<type>alt<LP>28551<k1P>sureSvaraAcArya

I think all the 'aA' in 'k1' or 'k1P' should be changed to 'A'.

@drdhaval2785 agree?
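
A possible way to apply the proposed change, sketched in Python. The function name `fix_sandhi` is hypothetical, and the sketch assumes each record is a single line of `<tag>value` fields as in the listing above. It rewrites only 'aA' inside the k1 and k1P fields and leaves k2 alone; the lowercase 'aa' in the record for 22017 is not touched by this rule.

```python
import re

# Match a k1 or k1P field: the tag, then everything up to the next '<'.
FIELD_RE = re.compile(r"<(k1|k1P)>([^<]*)")


def fix_sandhi(line):
    """Replace 'aA' by 'A' in the k1 and k1P fields of one hwextra line."""
    def repl(match):
        tag, value = match.group(1), match.group(2)
        return "<{}>{}".format(tag, value.replace("aA", "A"))
    return FIELD_RE.sub(repl, line)


sample = ("<L>11100.1<k1>nfsiMhaAcArya<k2>nfsiMha AcArya"
          "<type>alt<LP>11100<k1P>narasiMha")
print(fix_sandhi(sample))
```

Running this over a copy of acc_hwextra.txt and diffing against the original would show exactly which records change before anything is committed.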

@funderburkjim
Contributor Author

local document extension

After the global document search term step mentioned above, there is one more step (keydoc2.txt) which revises the local document definitions.

An abstract statement of this process might be: For a given dictionary X, merge all documents which have a common search term.

The example of Burnouf with guru and gurvI might help.
Before the global search term step, the relevant items (in keydoc_norm.txt) for Burnouf show
two documents:

  1. guru
  2. gurvI

These documents are, at this stage, unrelated.

After the global merge step, the relevant items (in keydoc_merge.txt for Burnouf) still show two documents, but with additional search terms.

  1. guru gurvI,guruH
    • gurvI is a search term for guru in BUR because it is a search term for guru in MW
    • guruH is a search term for guru, because in SKD guru is a normalized spelling search term for guruH
  2. gurvI guru,gurvvI
    • guru is a search term for gurvI in BUR because it is a search term for gurvI in MW
    • gurvvI is a search term for gurvI in BUR because gurvI is a search term for gurvvI in SHS, SKD, VCP, WIL and YAT.

The last step merges these two documents, so now there is only 1 combined document in burnouf (keydoc2.txt):

  1. guru,gurvI guruH,gurvvI

These are merged because the two documents share search terms:
in this case, both 'guru' and 'gurvI' are search terms common to the two documents.

So that is how the new, two-headword, document occurs in Burnouf.
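
The merge rule ("for a given dictionary X, merge all documents which have a common search term") could be sketched in Python as below. The function name and the `(headwords, terms)` pair layout are assumptions made for illustration, with each document's terms taken to include its own headwords; the input mirrors the two Burnouf documents after the global step.

```python
def merge_documents(docs):
    """Merge documents of one dictionary that share any search term.

    docs: iterable of (headwords, terms) pairs, each a set of spellings.
    Returns a list of merged (headwords, terms) pairs.
    """
    merged = []
    for heads, terms in docs:
        heads, terms = set(heads), set(terms)
        kept = []
        for m_heads, m_terms in merged:
            if terms & m_terms:
                # Shared search term: absorb the earlier group.
                heads |= m_heads
                terms |= m_terms
            else:
                kept.append((m_heads, m_terms))
        kept.append((heads, terms))
        merged = kept
    return merged


burnouf = [
    ({"guru"}, {"guru", "gurvI", "guruH"}),
    ({"gurvI"}, {"gurvI", "guru", "gurvvI"}),
]
for heads, terms in merge_documents(burnouf):
    print(sorted(heads), sorted(terms - heads))
```

Because a new document absorbs every earlier group it overlaps with, chains of overlaps also end up in one group; whether the actual keydoc2.txt step behaves the same way on chains is not confirmed by this thread.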

@gasyoun
Member

gasyoun commented Feb 8, 2020

The last step merges these two documents, so now there is only 1 combined document in burnouf (keydoc2.txt):

Now let's think about how it can and should live together with simple. And let's at least document what kind of relations each dictionary gives between words. There are antonyms in GRA, for example, and we have never even tried to mark them up.
Or another approach. giri is based on guru, that is based on root gir as per Kossowich, but Wilson gives E. gṝ.

@drdhaval2785

@drdhaval2785 agree?
I agree

@drdhaval2785 drdhaval2785 reopened this Dec 14, 2020
@drdhaval2785 drdhaval2785 added the documentation Improvements or additions to documentation label Dec 14, 2020
@gasyoun
Member

gasyoun commented Dec 14, 2020

https://www.sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/sample/dalglob1.php is left. But there was a more modern version of it anyway, no, @funderburkjim ?


3 participants