Simple-search: hanumat #167
Technical reason why word frequency fails here
Actually, neither spelling (hanumat, hanUmat) appears in the word frequency file. The word frequency file uses 'ant' instead of 'at' in the spelling: hanumant.
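One way to bridge this mismatch, sketched under assumptions: the frequency table is keyed by SLP1 spelling, and the helper name `frequency_with_stem_alias` and the counts are hypothetical (illustrative only, not taken from the real word frequency file). When an -at headword has no entry of its own, the lookup falls back to its -ant spelling.

```python
# Hypothetical frequency table keyed by SLP1 spelling.
# The counts are illustrative, NOT real data from the word frequency file.
WORD_FREQ = {"hanumant": 120, "hanUmant": 40}


def frequency_with_stem_alias(headword: str, freq: dict) -> int:
    """Return a frequency for `headword`, trying the -ant spelling for -at stems.

    Assumption: the frequency file spells these stems with -ant
    (e.g. 'hanumant') while dictionary headwords use -at ('hanumat').
    """
    if headword in freq:
        return freq[headword]
    if headword.endswith("at"):
        alias = headword[:-2] + "ant"  # hanumat -> hanumant
        return freq.get(alias, 0)
    return 0  # unknown words rank last
```

With such an alias, sorting the headword candidates by this frequency would put the short-u 'hanumat' ahead of 'hanUmat'.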
Can't we have a variant with
When checking which form it should be, tādṛ́śī or tādṛ́śā: Entered: 0 results, "no results found". Entered
what I wanted. No simple solution, @funderburkjim?
When searching for
One part of the simple search has to do with HK (Harvard-Kyoto transliteration). The assumption is that the user might be using a
I wonder what would happen if the user's input string were lower-cased first. Does this sound worth a try? This would work for 'krisn'.
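A minimal sketch of the lower-casing idea, assuming the input is first tried exactly as typed (capitals are meaningful in HK, e.g. A = ā, S = ṣ, R = ṛ) and only then in lower-cased forms. `case_candidates` is a hypothetical helper, not taken from the actual simple search code.

```python
def case_candidates(user_input: str) -> list[str]:
    """Spellings to try, in order, for a possibly mis-capitalized query.

    1. the raw input (it may be deliberate HK),
    2. the input with only its first letter lower-cased ('Krisn' -> 'krisn'),
    3. the fully lower-cased input ('KRISN' -> 'krisn').
    """
    candidates = [user_input]
    if user_input[:1].isupper():
        candidates.append(user_input[:1].lower() + user_input[1:])
    candidates.append(user_input.lower())
    # Drop duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(candidates))
```

Trying the raw form first keeps genuine HK input such as 'kRSNa' working, while a first-letter or all-caps query still gets a lower-case fallback.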
hanumat problem appears solved
Simple search with 'hanumat' yields: hanumat, hanūmat; so the more common short-u word is now first. Simple search with 'hanumant' yields: hanumat, hanūmat, hanumanta.
No simple solution to tadrsi
The desired word is tadrSa, which can be found from 'tadrsa'. But how to generalize? If we allow a = i everywhere, will there be many false positives?
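One way to limit false positives, sketched as a hypothetical rule: allow the a = i substitution only for the final vowel, where a feminine -ī form (tādṛśī) meets its -a stem (tādṛśa). `final_vowel_variants` is an illustrative name; each candidate would still go through the normal simple search.

```python
def final_vowel_variants(word: str) -> list[str]:
    """Candidates for a word possibly typed in a feminine -i/-ī form.

    Hypothetical rule: only the *final* i or I (SLP1/HK I = ī) is
    replaced by a; internal vowels are left untouched.
    """
    variants = [word]
    if word and word[-1] in "iI":
        variants.append(word[:-1] + "a")  # tadrsi -> tadrsa
    return variants
```

Allowing a = i at every position would turn a word with n such vowels into up to 2^n spellings; restricting it to the final vowel adds at most one candidate, though words like 'pati' would still produce a spurious 'pata' that a headword check must discard.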
@gasyoun Are there any other open questions on simple search besides KRISN and tadrsi?
None that I'm aware of.
I guess this won't be the only case where a user capitalizes the first letter or even types everything in capitals. Let's have a solution for that; HK is still not an issue. I was surprised he could not find a thing.
When searching in GRA
Also, no results in MW. Clearly a bug, probably related to the various representations of vocalic 'r' (SLP1 'f'). When we get the Cologne reorg done (as discussed today), I'll spend some time trying to make simple search more robust. This will probably be 3+ months from now.
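If the bug is indeed in how vocalic r is represented, one hedged approach is to generate SLP1 variants before the lookup. The sketch assumes only two input forms need handling, precomposed IAST ṛ and ASCII 'ri'; `vocalic_r_variants` is hypothetical, and other forms (plain r before a consonant as in 'krsna') are left out.

```python
import itertools
import unicodedata


def vocalic_r_variants(query: str) -> set[str]:
    """Spellings of `query` with possible vocalic r rewritten to SLP1 'f'.

    Hypothetical rules: IAST 'ṛ' (U+1E5B) always means vocalic r;
    ASCII 'ri' *may* mean vocalic r (as in 'krishna'), so each 'ri'
    is tried both as written and as 'f'. HK 'R' is not mapped,
    because 'R' means ṇ in SLP1.
    """
    # NFC turns r + combining dot below into the precomposed U+1E5B.
    q = unicodedata.normalize("NFC", query).replace("\u1e5b", "f")
    parts = q.split("ri")
    variants = set()
    # Between adjacent parts, either keep 'ri' or substitute 'f'.
    for joins in itertools.product(("ri", "f"), repeat=len(parts) - 1):
        s = parts[0]
        for join, part in zip(joins, parts[1:]):
            s += join + part
        variants.add(s)
    return variants
```

Every 'ri' doubles the number of variants, so in practice the set would need to be filtered against the headword list.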
Great, half a year is not an issue. Simple search is what makes a difference, so we can certainly wait.
In MW, a rather strange result, 0 results ("no results found"):
I was looking for
P.S. It would be nice if I could search for Cyrillic кришна and get krsna.
А а a a a
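A rough sketch of a Cyrillic front end, assuming a simple letter-by-letter Russian-to-ASCII map (ш → s, ч → c, дж → j are illustrative choices, not an established convention of this project). The resulting Latin string would then be passed to simple search like any other query. The map and `cyrillic_to_latin` are hypothetical.

```python
# Hypothetical Russian Cyrillic -> ASCII map for Sanskrit words.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "и": "i", "й": "y", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "х": "h", "ч": "c", "ш": "s", "щ": "s", "ь": "", "ъ": "",
    "ы": "y", "э": "e", "ю": "yu", "я": "ya",
}


def cyrillic_to_latin(text: str) -> str:
    """Transliterate lower-cased Cyrillic input; other characters pass through."""
    text = text.lower().replace("дж", "j")  # digraph first: дж = Sanskrit j
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)
```

Under this map кришна becomes 'krisna' and Сурья becomes 'surya', both of which are ordinary Latin queries for simple search.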
Now if I search for
Notice that 'nis' is in bold? That's the UI clue that it is what you typed. Not enough of a clue?
Sometimes I notice too late. We invented it. And even I tend to forget.
Seems so. When I'm 100% sure that the word I search for will be there, I forget to check whether a more frequent word is shown instead, especially with shorter words.
Thanks, so the ranking is above all by frequency, right? The 2nd one is still based on frequency, right? Another thought. If I search
If I enter
Another example is when I've heard a word and am not quite sure how it should be written, like
This question made me think of a spelling checker (for Sanskrit).
vyudpati has two problems:
If we simple-search 'vyutpati', we get the result 'vyutpatti'! So the 'dp' -> 'tp' transformation would
For instance, 'pati' also resolves as 'pati', 'patti' and
Let's plan a call on that. @drdhaval2785, are you there?
So the spellchecker should handle anti-sandhi cases first.
Like trying to double any consonant?
So anti-sandhi would do the job, got it.
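The two anti-sandhi moves discussed here can be sketched as follows, under stated assumptions: SLP1 input, devoicing of a voiced stop before a voiceless consonant (dp → tp), and doubling of one intervocalic consonant at a time (pati → patti). All names are hypothetical and the sound classes are simplified.

```python
# Simplified SLP1 sound classes (assumptions, not a full phonology).
VOICED_TO_VOICELESS = {"g": "k", "j": "c", "q": "w", "d": "t", "b": "p"}
VOICELESS = set("kKcCwWtTpPSzs")
VOWELS = set("aAiIuUfFxXeEoO")


def devoice_clusters(word: str) -> str:
    """Devoice a voiced stop that directly precedes a voiceless consonant."""
    out = list(word)
    for i in range(len(out) - 1):
        if out[i] in VOICED_TO_VOICELESS and out[i + 1] in VOICELESS:
            out[i] = VOICED_TO_VOICELESS[out[i]]  # vyudpati -> vyutpati
    return "".join(out)


def doubled_consonant_variants(word: str) -> list[str]:
    """Variants with exactly one intervocalic consonant doubled."""
    variants = []
    for i in range(1, len(word) - 1):
        c = word[i]
        if c not in VOWELS and word[i - 1] in VOWELS and word[i + 1] in VOWELS:
            variants.append(word[:i] + c + word[i:])  # pati -> patti
    return variants


def anti_sandhi_candidates(word: str) -> list[str]:
    """Original word, its devoiced form, then doubled variants, without duplicates."""
    base = devoice_clusters(word)
    return list(dict.fromkeys([word, base] + doubled_consonant_variants(base)))
```

With these rules 'vyudpati' yields 'vyudpati', 'vyutpati' and 'vyutpatti'. Most doublings will not be real words, so each candidate would still have to be checked against the headwords.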
I am OK with a call.
I think we already discovered the way long ago.
And lost it a few times as well. How about a call on the 23rd of August?
The 23rd of August is tentatively OK with me.
The 2-grams and 3-grams for headwords are in https://github.com/sanskrit-lexicon/csl-apidev/tree/master/simple-search/ngram1; these are used in simple_search.php.
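How simple_search.php actually uses these files is not shown in this thread. One plausible use, sketched here as an assumption, is pruning generated spelling variants: a candidate survives only if each of its character n-grams occurs in some headword. The function names are hypothetical, and the index is built in memory rather than read from the ngram1 files.

```python
from typing import Iterable


def char_ngrams(word: str, n: int) -> set[str]:
    """All contiguous character n-grams of `word` (empty if len(word) < n)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def build_ngram_index(headwords: Iterable[str], n: int) -> set[str]:
    """Union of the n-grams of every headword."""
    index: set[str] = set()
    for hw in headwords:
        index |= char_ngrams(hw, n)
    return index


def is_plausible(candidate: str, index: set[str], n: int) -> bool:
    """True if every n-gram of `candidate` occurs in some headword.

    A candidate shorter than n has no n-grams and therefore passes.
    """
    return char_ngrams(candidate, n) <= index
```

A filter like this is cheap, so it could sit between variant generation and the more expensive frequency ranking.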
Last updated 2 years ago. Have we not cleaned up any dirt in the meantime that could make the list shorter?
This capital-letter issue is only partially solved. I was going to work further on it this week, but
There is one field where simple search has not been used yet. As it is a separate library, it can also be used to search for Sanskrit words outside Cologne. One of the most notorious places for spelling every single Sanskrit word in 100 wrong ways is DLI (Digital Library of India), including DLI at archive.org.
If I enter
If I enter
If I enter false form
If I enter
All good examples of current limitations. I still haven't had an opportunity to work on simple search in nearly a month now. Too many strings pulling me in other directions.
I know. If my voice counts for anything, I would pause all the integrations for a month or so.
I was tracking a word from a book
The closeness of the spelling to what was typed should weigh more in the sorting algorithm than pure frequency, @funderburkjim.
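A sketch of that ordering, assuming closeness is measured by Levenshtein edit distance to the typed string, with frequency used only as a tie-breaker and the list cut to the top 5. The names and the frequency counts are illustrative, not taken from the real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def rank(query: str, candidates: list[str], freq: dict, top_k: int = 5) -> list[str]:
    """Sort by closeness to `query` first, then by descending frequency."""
    key = lambda w: (levenshtein(query, w), -freq.get(w, 0))
    return sorted(candidates, key=key)[:top_k]


# Illustrative counts only: the short-u form is the more frequent one.
FREQ = {"hanumat": 50, "hanUmat": 10, "hanumanta": 5}
```

Here a user typing the long-u spelling 'hanUmat' gets it first even though the illustrative counts make 'hanumat' more frequent; a pure frequency sort would reverse the first two.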
|
@funderburkjim If I look for
@funderburkjim In addition to the 6 others, these 4 remain there as well. As per the UI, I believe it's top-5. @drdhaval2785, agree?
@funderburkjim
@funderburkjim
@funderburkjim Cyrillic mode: if we type "surya" with Latin letters, the needed one is the 2nd:
This issue is devoted to the question raised regarding the handling of 'hanumat' in simple search.
[reference] (#156 (comment))
Agree that we should get hanumat as the best choice. We should also get this if the user supplied 'hanuman'.
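One hedged way to get 'hanumat' from 'hanuman': treat a final -man/-mAn or -van/-vAn as a possible nominative of a -mat/-vat stem (hanumān, bhagavān) and also try the -at and -ant stems. The rule and `possessive_stem_candidates` are hypothetical; genuine -man stems such as 'brahman' would produce spurious candidates ('brahmat'), which a headword lookup would have to discard.

```python
def possessive_stem_candidates(query: str) -> list[str]:
    """Add -mat/-vat (and -mant/-vant) stems for nominative-looking endings.

    Hypothetical rule: 'hanuman' or 'hanumAn' (from nominative hanumān)
    also tries 'hanumat' and 'hanumant'.
    """
    cands = [query]
    for suffix in ("man", "mAn", "van", "vAn"):
        if query.endswith(suffix):
            stem = query[: -len(suffix)] + suffix[0]  # keep the m or v
            cands += [stem + "at", stem + "ant"]
            break
    return cands
```

Since the -ant spelling is what the word frequency file uses, generating both stems lets the ordinary frequency ranking decide which one comes first.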
How to accomplish this is not known. Maybe the comments will come up with a solution.