WIL separating genders #10

Open
drdhaval2785 opened this issue Dec 24, 2019 · 12 comments
Labels
enhancement New feature or request

Comments

@drdhaval2785
Contributor

drdhaval2785 commented Dec 24, 2019

Problem

The gender markers (masculine, feminine, neuter, etc.) are not marked with a div, so in the display they are merged with the previous line. In the following entry, see m. (-मः), f. (-मा), n. (-मं). Ideally, each should start on a new line, with some kind of div markup.

As this is a major correction, I am noting it here.

Sample

राम mfn. (-मः-मा-मं)
1 Black.
2 White.
3 Beautiful, pleasing. m. (-मः)
1 A name common to three incarnations of VIṢṆU, or PARAŚURĀMA, the son of the Muni JAMADAGNI, born at the commencement of the second or Tretā Yug, for the purpose of punishing the tyrannical kings of the Kṣatriya race; RĀMACANDRA, the son of DAŚARATHA, king of Oude, born at the close of the second age, to destroy the demons who infested the earth, and especially RĀVAṆA the Daitya sovereign of Ceylon; and BALARĀMA, the elder and half-brother of KṚṢṆA, the son of ROHIṆĪ, born at the end of the Dvāpara or third age.
2 A name of VARUṆA, regent of the waters.
3 A horse.
4 A sort of deer. f. (-मा)
1 A woman, a female, a pleasing or beautiful female.
2 Asafoetida.
3 A river. n. (-मं)
1 A potherb, (Chenopodium album.)
2 A sort of Costus, (C. speciosus.)
E. रम to sport, aff. घञ्।
@gasyoun
Member

gasyoun commented Dec 24, 2019

m. (-मः), f. (-मा), n. (-मं).

Do you think the end of one is the beginning of another?

@funderburkjim
Contributor

For reference, here are the pieces that would be involved in adding the markup suggested above.

  • wil.txt -- the part relevant to the rAma example is shown below
  • wil.xml -- ditto
  • make_xml.py -- currently, this adds the 'div' markup that puts the enumerated items on separate lines
  • function sthndl_div, in webtc/basicdisplay.php -- this governs how markup of the form <div n="X"> in
    xxx.xml is converted to html in the displays. The interpretation depends on the dictionary.
    Search for wil to see how div markup is interpreted in the Wilson dictionary.

@funderburkjim
Contributor

rAma in wil.txt

<L>32263<pc>704<k1>rAma<k2>rAma
{#rAma#}¦ mfn. ({#-maH-mA-maM#})
.²1 Black.
.²2 White.
.²3 Beautiful, pleasing. m. ({#-maH#})
.²1 A name common to three incarnations of VIṢṆU, or PARAŚURĀMA, the son of
the {%Muni%} JAMADAGNI, born at the commencement of the second or {%Tretā%} 
{%Yug,%} for the purpose of punishing the tyrannical kings of the {%Kṣatriya%}
race; RĀMACANDRA, the son of DAŚARATHA, king of {%Oude,%} born at the close
of the second age, to destroy the demons who infested the earth, and especially
RĀVAṆA the {%Daitya%} sovereign of {%Ceylon;%} and BALARĀMA, the elder and
half-brother of KṚṢṆA, the son of ROHIṆĪ, born at the end of the
{%Dvāpara%} or third age.
.²2 A name of VARUṆA, regent of the waters.
.²3 A horse.
.²4 A sort of deer. f. ({#-mA#})
.²1 A woman, a female, a pleasing or beautiful female.
.²2 Asafoetida.
.²3 A river. n. ({#-maM#})
.²1 A potherb, (Chenopodium album.)
.²2 A sort of Costus, (C. speciosus.)
.E. {#rama#} to sport, aff. {#GaY.#}

Note that currently there are no div tags in the digitization.

@funderburkjim
Contributor

rAma in wil.xml

The record for rAma is one long single-line text string in wil.xml. For readability here, I've
manually inserted linebreaks.

<H1><h><key1>rAma</key1><key2>rAma</key2></h><body> 
<s>rAma</s>  mfn. (<s>-maH-mA-maM</s>) 
<div n="1">1 Black. </div>
<div n="1">2 White. </div>
<div n="1">3 Beautiful, pleasing. m. (<s>-maH</s>) </div>
<div n="1">1 A name common to three incarnations of VIṢṆU, or PARAŚURĀMA, the son of  the 
<i>Muni</i> JAMADAGNI, born at the commencement of the second or <i>Tretā</i>   <i>Yug,</i> for the purpose of punishing the tyrannical kings of the <i>Kṣatriya</i>  race; 
RĀMACANDRA, the son of DAŚARATHA, king of <i>Oude,</i> born at the close  of the second age, to destroy the demons who infested the earth, and especially  
RĀVAṆA the <i>Daitya</i> sovereign of <i>Ceylon;</i> and 
BALARĀMA, the elder and  half-brother of KṚṢṆA, the son of ROHIṆĪ, born at the end of the  
<i>Dvāpara</i> or third age. </div>
<div n="1">2 A name of VARUṆA, regent of the waters. </div>
<div n="1">3 A horse. </div>
<div n="1">4 A sort of deer. f. (<s>-mA</s>) </div>
<div n="1">1 A woman, a female, a pleasing or beautiful female. </div>
<div n="1">2 Asafoetida. </div>
<div n="1">3 A river. n. (<s>-maM</s>) </div>
<div n="1">1 A potherb, (Chenopodium album.) </div>
<div n="1">2 A sort of Costus, (C. speciosus.) </div>
<div n="E">E. <s>rama</s> to sport, aff. <s>GaY.</s>  </div>
</body><tail><L>32263</L><pc>704</pc></tail></H1>
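
To see the problem in the xml itself, one can list the divs whose text ends with a gender marker. The following is a minimal Python sketch, not part of the csl code: the function name and the regex are illustrative assumptions, and the record is an abbreviated copy of the rAma record above.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated copy of the rAma record (several senses omitted for brevity).
record = (
    '<H1><h><key1>rAma</key1><key2>rAma</key2></h><body>'
    '<s>rAma</s>  mfn. (<s>-maH-mA-maM</s>) '
    '<div n="1">1 Black. </div>'
    '<div n="1">2 White. </div>'
    '<div n="1">3 Beautiful, pleasing. m. (<s>-maH</s>) </div>'
    '<div n="1">4 A sort of deer. f. (<s>-mA</s>) </div>'
    '<div n="1">3 A river. n. (<s>-maM</s>) </div>'
    '<div n="1">1 A potherb, (Chenopodium album.) </div>'
    '</body><tail><L>32263</L><pc>704</pc></tail></H1>'
)

# Assumed pattern: gender abbreviation plus a parenthesised ending
# at the very end of a div's text.
GENDER_TAIL = re.compile(r'\b(m|f|n|mfn)\. \([^()]*\)\s*$')

def divs_with_trailing_gender(xml_record):
    """Return (sense_text, gender_marker) for each div ending in a gender marker."""
    root = ET.fromstring(xml_record)
    hits = []
    for div in root.iter('div'):
        text = ''.join(div.itertext())  # text with the <s> tags stripped
        m = GENDER_TAIL.search(text)
        if m:
            hits.append((text[:m.start()].strip(), m.group(0).strip()))
    return hits

for sense, gender in divs_with_trailing_gender(record):
    print(gender, '<- currently attached to:', sense)
```

Run over all of wil.xml, a listing like this would give a rough count of how many divs are affected, though the pattern will surely miss some variants.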

@funderburkjim
Contributor

make_xml.py

make_xml.py converts wil.txt to wil.xml.
Part of this conversion is specific to the way the wil.txt is coded and is in the dig_to_xml_specific function.

def dig_to_xml_specific(x):
 """ changes particular to wil digitization
     x is a line of the digitization
 """
 if x.startswith('<H>'):
  # Start of section beginning with a particular letter. Drop this line
  x = ''
 elif re.search(u'^[.]²[0-9]+',x):
  # a division coded by Thomas
  # drop the initial '.²'
  # and start <div n="1">
  x = '<div n="1">' + x[2:]
 elif re.search(r'^[.]E[.]',x):
  # an Etymology division 
  # drop the initial '.'
  # and start <div n="E">
  x = '<div n="E">' + x[1:]
 elif re.search(r'^[.]',x):
  # unknown division
  print("UNKNOWN DIVISION: ", x)
  x =  " " + x
 else:
  # assume a simple continuation line
  x = " " + x
 # In a currently small number of cases (as with root 'RI'), sub-meanings
 # are coded with superscript letters, as '^a'. We'll code these as
 # <div n="2">
 x = re.sub(r'[\^]','<div n="2">',x)
 return x

Note: The above only inserts the opening div tag, based on a regex. For example,
.²3 in dig.txt becomes <div n="1">3.
This div markup is not an empty tag, so it requires a closing </div> tag. This closing tag is
inserted at the proper spot by the close_divs function of make_xml.py. The close_divs function
is fairly general, not specific to the wil dictionary.
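
The idea behind the closing step can be sketched as follows. This is NOT the actual close_divs from make_xml.py, only a simplified illustration under two assumptions: every div opens at the start of a line (as produced by dig_to_xml_specific), and a div ends just before the next div opens or at the end of the entry. The function name is hypothetical.

```python
def close_open_divs(lines):
    """Append '</div>' to the last line belonging to each div.

    lines: the converted lines of ONE entry, in order.
    Assumption: a div opens only on a line that starts with '<div '.
    """
    out = []
    div_open = False
    for line in lines:
        if line.startswith('<div '):
            if div_open:
                # the previous div ends on the line just before this one
                out[-1] += '</div>'
            div_open = True
        out.append(line)
    if div_open:
        # the last div of the entry is closed at the end of the entry
        out[-1] += '</div>'
    return out

entry = [
    '<s>rAma</s> mfn.',
    '<div n="1">1 A name',
    ' of VARUNA.',
    '<div n="1">2 A horse.',
    '<div n="E">E. rama',
]
print('\n'.join(close_open_divs(entry)))
```

The sketch ignores the nested <div n="2"> case (the '^' sub-meanings), which can open in the middle of a line.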

@funderburkjim
Contributor

approaches to a solution.

A solution would require that

  • markup of another div type (say, <div n="3">) be inserted at the appropriate spots in wil.xml;
    the closing div would also need to be inserted at the appropriate spot.
  • basicdisplay.php would need minor changes so that the new and old divs are displayed as desired.
    Also, a minor change to wil.dtd is probably required so that the new attribute value 3 is recognized
    as valid for wil.xml.

Where to add markup?

The markup will be added by some program. Given the above description, the obvious choice would be to add it in make_xml.py.
HOWEVER,
I think it would actually be better to add all the div markup to wil.txt.
Reason: Deciding where to put the gender divs will be tricky. Thomas had already done
the numerical subdivision markup (.²1 Black.), so all make_xml had to do was convert this to
some xml form. Also, the fact that Thomas had already put all the text relating to a given
div on one line makes the div-closing problem easy.

But for gender divs, the placement is not so clear in the existing digitization. Perhaps most
cases can be handled by simple regex-governed changes, but there will almost surely be
special cases that need to be handled by 'manual corrections'.
The div-closing requirement will probably also need some manual corrections.
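
A regex-governed change of the kind mentioned above might look like the following Python sketch. Everything here is an assumption for illustration: the function name, the pattern, and the use of <div n="3"> as the marker for the new line (the attribute value 3 is only the "say" example from above). Lines that do not fit the pattern are returned unchanged and would be left for manual correction.

```python
import re

# Assumed shape of a trailing gender marker in wil.txt: m. ({#-maH#})
GENDER_TAIL = re.compile(
    r'^(?P<sense>.*\S)\s+(?P<gender>(?:mfn|m|f|n)\. \(\{#[^#]*#\}\))\s*$'
)

def split_gender_tail(line, marker='<div n="3">'):
    """Split a trailing gender marker off a wil.txt line.

    Returns a list of one or two lines. The headword line (which
    contains the broken bar) is never split.
    """
    if '¦' in line:
        return [line]
    m = GENDER_TAIL.match(line)
    if m is None:
        return [line]
    return [m.group('sense'), marker + m.group('gender')]

for raw in ['.²3 Beautiful, pleasing. m. ({#-maH#})',
            '.²1 A potherb, (Chenopodium album.)']:
    print(split_gender_tail(raw))
```

Markers that appear in the middle of a line, or endings with an unusual shape, would not be caught by this pattern.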

make changes in wil.txt

Thus, I would vote for doing a special update of wil.txt in order to implement the improvement
suggested above.
This special update might be viewed as a 4-step process:

  1. wil1.txt: do the div markup consistent with the current version of make_xml.py
    • modify make_xml.py so it can handle wil1.txt and generate a new wil.xml.
      At this point, the newly generated wil.xml should be exactly identical to the previous wil.xml.
  2. wil2.txt: add as much as possible of the gender-div markup to wil1.txt
    • frequently remake a new version of wil.xml, based on wil2.txt, and be sure it is valid xml.
    • also make adjustments to basicdisplay.php to be sure the displays with the new divs look as desired.
    • do all the changes to make_xml and basicdisplay outside the normal Wilson update process.
  3. Handle special cases by some 'manualByLine' changes.
    • Again, test test test
  4. Put the final result back into the main update regimen:
    • revised version of wil.txt in csl-orig
    • revised version of make_xml.py in csl-pywork
    • revised version of basicdisplay.php in csl-websanlexicon.
@funderburkjim
Contributor

What wil1.txt might look like

Version 1: just add the divs as in make_xml.py

   {#rAma#}¦ mfn. ({#-maH-mA-maM#})
  <div n="1">1 Black.</div>
  <div n="1">2 White.</div>
  <div n="1">3 Beautiful, pleasing. m. ({#-maH#})</div>
  <div n="1">1 A name common to three incarnations of VIṢṆU, or PARAŚURĀMA, the ... </div>
  etc.

Version 2: also add the markup as in the 'dig_to_xml_general' part of
make_xml.py. If that were done, wil1.txt would look like:

    <s>rAma</s>¦ mfn. (<s>-maH-mA-maM</s>)
   <div n="1">1 Black.</div>
   <div n="1">2 White.</div>
   <div n="1">3 Beautiful, pleasing. m. (<s>-maH</s>)</div>
   <div n="1">1 A name common to three incarnations of VIṢṆU, or PARAŚURĀMA, the ... </div>
   etc.
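
The difference between the two versions is the inline markup. Judging from the wil.txt and wil.xml samples above, {#...#} becomes <s>...</s> and {%...%} becomes <i>...</i>. A minimal Python sketch of just that conversion follows; the function name is hypothetical and this is not the actual dig_to_xml_general, which may well do more.

```python
import re

def convert_inline_markup(line):
    """Convert wil.txt inline markup to the xml form seen in wil.xml."""
    # {#...#} marks Sanskrit text (SLP1) -> <s>...</s>
    line = re.sub(r'\{#(.*?)#\}', r'<s>\1</s>', line)
    # {%...%} marks italic text -> <i>...</i>
    line = re.sub(r'\{%(.*?)%\}', r'<i>\1</i>', line)
    return line

print(convert_inline_markup('{#rAma#}¦ mfn. ({#-maH-mA-maM#})'))
print(convert_inline_markup('the {%Muni%} JAMADAGNI'))
```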

@gasyoun
Member

gasyoun commented Dec 26, 2019

Put final result back into main update regiment

A lot of things to do, oh boy.

@drdhaval2785
Contributor Author

We are still missing.

m. (<s>-maH</s>) needs to be a separate div.

@funderburkjim
Contributor

We are still missing ... div

My note above shows the approach I think is needed to do this. But, as Marcis noted, it's not
a simple task.
I'm not volunteering to do this, although I agree that adding the gender markup would be an enhancement to the Wilson dictionary, and indeed to many other dictionaries.

@funderburkjim added the enhancement label Dec 28, 2019
@gasyoun
Member

gasyoun commented Dec 30, 2019

I'm not volunteering to do this

Good to know that. Otherwise nothing would be left for the generations to come.

@Andhrabharati

Problem

As the masculine, feminine, neuter etc are not marked with a div marking, they are merged with previous line in the display. In the following entry see m. (-मः), f. (-मा), n. (-मं). They should be ideally on the next line, ideally with some kind of div marking.

As this is a major correction, noted here.

We are still missing.

m. (<s>-maH</s>) needs to be a separate div.

We are still missing ... div

My note above shows the approach I think is needed to do this. But, as Marcis noted, it's not a simple task.

I'm not volunteering to do this

Good to know that. Otherwise nothing would be left for the generations to come.

It is a very simple task, and we have had WIL done that way since 2016, when we added the Skt. dictionaries at andhrabharati.com

image

It did not take even hours, just a couple of minutes of work for us!

4 participants