AE alternate headword patterns #19
Comments
Please explain what you want to get. |
Excite -ment
Excitement
Excite -ed
Excited
etc.
|
In total, three kinds of regexes in ae.xml gave me what I wanted
|
Also see one more peculiar tendency of the dictionary which we may have to parse.
Here |
@funderburkjim and @gasyoun |
Understood.
Agree, and again - add a tag stating what it was in the book. Or make it |
I opened
As I understand it, the work done for MW and partly for PWG has not been done here, and compounds have not been formed, right? Because in |
Marcis, I think you misposted AP issue in AE issue.
|
All equal, right @funderburkjim ? |
Good observation on AP, but I agree that it needs reposting under an AP issue. The AP question is similar to the AE question in one important aspect: the solution requires more advanced parsing of the entry, and the addition of tags. |
In working a lot with AE corrections with Sampada, I've also felt the need for 'improving' Apte's presentation. |
single quote pattern
Another pattern is material in opening and closing single quotes, like
Within such a context, the X+period pattern usu. means some abbreviation. The first thing I would do is to look for errors in matching open-closed single quotes; these would need to be cleaned up before parsing this pattern. |
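The quote-matching check could be sketched as follows. This is a rough illustration, assuming each quoted span opens and closes on the same line; `unbalanced_quote_lines` is a hypothetical helper, and the quote characters are parameters because the thread does not say which characters the digitization uses.

```python
def unbalanced_quote_lines(lines, open_q="'", close_q="'"):
    """Yield (line_number, line) for lines whose single quotes look unmatched."""
    for number, line in enumerate(lines, start=1):
        if open_q == close_q:
            # Same character opens and closes: only parity can be checked.
            bad = line.count(open_q) % 2 == 1
        else:
            depth = 0
            bad = False
            for ch in line:
                if ch == open_q:
                    depth += 1
                elif ch == close_q:
                    depth -= 1
                    if depth < 0:  # a close with no preceding open
                        bad = True
                        break
            bad = bad or depth != 0
        if bad:
            yield number, line
```

With a plain ASCII quote, apostrophes in English words will show up as false positives, so the output is a list to review by hand rather than a list of confirmed errors.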
do we need pyparsing ?
It might be that we need to learn how to use more sophisticated parsers to help us in such tasks. In particular, pyparsing. It seems likely, at least in case of AE, that each entry could be parsed into a meaningful data structure; but this can't be done just with regular expressions. pyparsing (and other similar parsers, such as 'ply') have a learning curve. |
A vote in favor of sub headwords in AE
@Shalu411 recently submitted via the Correction Form the following:
Adding such 'embedded' headwords as 'distinct' to the searchable words for AE would solve this problem. This note just made to emphasize the interest and importance of doing such an enhancement. Incidentally, Sampada is now past the half-way point in the page-by-page corrections to AE (reference), with 3500+ corrections made thus far. |
The subheadword issue. The one I'm longing for the most + upasarga dhatu combinations extracted.
Means we can expect 7000 corrections. Sampada is a hero. |
Namaste |
Some verification, @funderburkjim ? |
@Shalu411 Hi! We'll have to think together how to organize things so you can help. This comment is to get us started. We have two kinds of extra headwords in AE:
The Alternate headwords are probably easiest to start with.
Suggested first task
Here's a good starter task, I think. In a few cases, the alternate is spelled incompletely, like 'Amid,-st'. In such cases, the task is to
After edit:
So the first task is just to do this for all the cases where it is relevant to do so. @Shalu411 OK? |
@funderburkjim I'm working as a translator. So if a word is misspelled (mis-not-glued well together) she adds : and the correct form after and that's all? Only English words, right? |
Yes -- Just need to have the correct spelling for alternate headwords. Just English words. |
Understood, @Shalu411 ? |
Namaste, Sorry for my silence for a while. Yes, I understand things clearly now. Will do the needful.. and will be here for any clarifications. Yes, it's a simple and quick one. Happy to be back, really. :) |
First issue- 037:BliTe, BliTesome,:5513,5517 Capital letters. And blite is a very Old English word :) Keep it? After Edit- 037:BliTe, BliTesome,:5513,5517:Blite, Blitesome |
Next issue- What is (1) here? |
I see that in this list, there are words that are not related to same head-word as well. Ignore, once I see that all spelling is right? |
Next doubt- |
Why in |
|
Next issue- |
One issue off-task--
Same issue with all Cologne pages in my browser. It is intact with Chrome. :) |
So there can be numerous cases affected by this bug of mixing T/th? BTW why AE list interface does not allow to input English words and only Devanagari? |
Guess so.
A good point. None of the English dictionaries allow English words in list mode. |
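On the T/th question: one way to list candidates for review is to flag headwords with a capital letter in the middle of a word, as in 'BliTe'. This is only a sketch; `suspicious_headwords` is a hypothetical helper, and it assumes the headword field is a comma-separated list as in the correction lines quoted in this thread.

```python
import re

# A capital letter directly after a lowercase letter inside one word.
INTERNAL_CAPITAL = re.compile(r"[a-z][A-Z]")


def suspicious_headwords(headword_field: str):
    """Return the words in a comma-separated field that contain an internal capital."""
    words = [w.strip() for w in headword_field.split(",")]
    return [w for w in words if w and INTERNAL_CAPITAL.search(w)]


print(suspicious_headwords("BliTe, BliTesome,"))
```

The check only finds where to look; deciding the right spelling (here 'Blithe', since 'T' stands for 'th' in SLP1) still needs a person or a word list.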
|
@Shalu411 Here are my comments on your questions. If I overlooked any, please remind me. Review your solutions in light of these comments, and make any changes needed.
BliTe make lower case 'T'
It is blithe [p= 037] : Blithe, Blithesome,. (I see Serge noticed this). Probably the list you are working with was generated by my local version of AE, before
Chirk , (1) What is 1?
Although our digitization has the digit '1', I think it should be the letter 'l'
So in this case the alternate headword must be 'chirl' -- that's my guess.
Askance, Askew, Aslant
not related to same head-word as well ? I think these are adjectives which are loosely related in meaning (check definitions to confirm). Thus,
Discussive, Discutient
Again, using Google for definitions, I see that these have similar (medical) meanings. Put Discutient as alternate.
Nacre, Naker
Again, using Google, both words relate to some kind of drum. (nacre has other meanings also). So put Naker as alternate
So that explains how 'nacre' can also mean drum.
070:Concentric,-cal,:10149,10150:Concentric,Concentrical
Don't put comma at end.
Spaces
Spaces between words in your after-edit form are optional. OK either way.
Blank scan page 99 in FireFox
Since I show the page just fine using Mozilla browser, I suspect that it was some kind of temporary glitch in the internet transmission that caused you to have a blank page. Try it again. I suspect you will get this page now.
212:Hysterics, (Hysteria,)
Leave the paren? Do similarly for Neither (nor).
219:Inamorata,-Inamorato,:29750,29751
Leave the '-'? No, drop it. Similarly for the other two you mentioned.
340:Pers, ire,:45552,45559:Perspire
This is a print error. Please mark it with a '?' so I'll remember to give it different handling: Also, similarly mark any others that need special handling.
435:Somer-sault, -set,:57776,57777:Somer-sault, Somer-set
Leave the hyphen or drop?
484:Vantage,-ground,:64418,64420:Vantage,Vantage-ground,
Your solution looks good. |
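The correction lines quoted above appear to follow a colon-separated layout: page, original headword text, a pair of numbers, and an optional after-edit field. A minimal sketch of a reader for that layout follows, under the assumption that no field contains a colon. `Correction` and `parse_correction` are hypothetical names, and the meaning of the two numbers is not stated in this thread, so they are kept as plain integers.

```python
from typing import List, NamedTuple, Optional


class Correction(NamedTuple):
    page: str
    original: str
    numbers: List[int]            # meaning not stated in the thread
    edited: Optional[List[str]]   # None when no after-edit field is present


def parse_correction(line: str) -> Correction:
    """Split one 'page:original:n1,n2[:after-edit]' line into its fields."""
    parts = line.strip().split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 colon-separated fields: {line!r}")
    page, original, nums = parts[0], parts[1], parts[2]
    numbers = [int(n) for n in nums.split(",") if n.strip()]
    edited = None
    if len(parts) == 4:
        # Rule from the comment above: no comma at the end of the after-edit form.
        if parts[3].rstrip().endswith(","):
            raise ValueError("after-edit field must not end with a comma")
        # Spaces between words are optional, so strip them.
        edited = [w.strip() for w in parts[3].split(",") if w.strip()]
    return Correction(page, original, numbers, edited)


print(parse_correction("070:Concentric,-cal,:10149,10150:Concentric,Concentrical"))
```

Rejecting a trailing comma outright is a design choice for this sketch; a more forgiving reader could strip it and emit a warning instead.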
@Shalu411 After you make any adjustments (such as per previous comment), you need to get the The best way might be to add your file as a second file in the Gist that you started from. To do this,
Give it a try! If you get stuck, we can go to some plan B. Then post a comment here so I'll know the corrections are ready for me. |
This is question regarding your 'subheadword' work on AE (e.g. aehw3.txt) . Is this ready for further work? I'm thinking that
What do you think? (also, I'm not sure whether aehw3.txt is the file to work further with). |
This has to do with Preferences. For English-Sanskrit dictionaries, set
|
Left it for quite some time. Will need time to figure out where I left, and what remains to be done. Will give green signal when I myself am confident. |
Started to use pyenchant for English dictionaries. |
Seems so
Agree
Too tough even for me
Where can Usha find the 1572 words in question, Dhaval? |
By experiment, there are (in aehw3.txt) 1572 instances of '@0'. From this we see an unexpected phenomenon. The author introduces (under headword abound) the related Also the aehw3.txt does not mention
Maybe the fact that '-Abundant' starts with capital letter is significant. Maybe the rule for subwords is:
|
Probably indicates a need for change to UI for list displays for English-Sanskrit dictionaries. My current thinking in regard to the 'Preferences' used in list displays is that it should be completely In regard to English-Sanskrit dictionaries, the list Display preferences is currently really not applicable, Have added this concern to my TODO list. Perhaps it should also be mentioned in a separate 'Cologne' repository issue labeled enhancement. |
@Shalu411 nothing about it in the Preface?
Yes, yes, yes!
Jim would be the only one I can imagine who feels comfortable enough with SLP1.
Yap, next to unusable.
Hail to number 35 💯 |
As it turns out, this is a bit deeper than what I thought it would be. |
Thanks, Dhaval. |
@funderburkjim and @gasyoun Jim may like to generate the displays for @Shalu411. |
Jim, @Shalu411 is ready and waiting for your instructions. |
Want to get the first task results before beginning another task (see this comment above). |
@Shalu411 Have not yet received your final corrections on alternate headwords. ? |
Namaste |
Greetings-
|
#19 (comment) |
@Shalu411, |
Apte-AE-1.txt |
Ting tong.. Everything Ok? |
@funderburkjim seems to be busy, but guess it's fine. |
@Shalu411 Yes, I didn't read my email for a couple of days, busy with PW. In looking at your Apte-AE-1.txt file I notice a problem with this line: Did you forget to expand the Among line as you did for the preceding line? Would you correct the |
Key2 has two headwords e.g.
{@Afire, Aflame,@}
{@-ment@} etc. English suffixes.
These both need to be examined in detail. English suffix application must have been a trodden path. Need to do review of literature.
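As a starting point for that examination, the two kinds of `{@...@}` spans can be separated with a regular expression. This is a sketch, assuming the spans never nest and that a leading hyphen marks a suffix; `classify_spans` and its labels are hypothetical names, not part of any existing Cologne tooling.

```python
import re

# Non-greedy match of the text between '{@' and '@}'.
MARKED_SPAN = re.compile(r"\{@(.*?)@\}")


def classify_spans(entry_text: str):
    """Return (kind, content) pairs for every {@...@} span in an entry."""
    results = []
    for match in MARKED_SPAN.finditer(entry_text):
        content = match.group(1).strip()
        if content.startswith("-"):
            kind = "suffix"        # e.g. {@-ment@}
        elif "," in content.rstrip(","):
            kind = "alternates"    # e.g. {@Afire, Aflame,@}
        else:
            kind = "single"
        results.append((kind, content))
    return results


print(classify_spans("{@Afire, Aflame,@} ... {@-ment@}"))
```

Counting how many spans fall into each kind across the whole file would show how much of the problem is alternate headwords and how much is suffix expansion.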