ruby.odd

<?xml version="1.0" encoding="utf-8"?>
<!-- $Id: ruby.odd 3426 2021-01-08 23:51:50Z syd $ -->
<?xml-model href="http://www.tei-c.org/Vault/P5/3.6.0/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/Vault/P5/3.6.0/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Ruby</title>
        <title type="sub">a TEI customization to demonstrate encoding
        of ruby annotations</title>
        <sponsor>East Asian / Japanese SIG</sponsor>
        <author>
          <persName>Kazuhiro Okada</persName>
          <persName>Satoru Nakamura</persName>
          <persName>Kiyonori Nagasaki</persName>
        </author>
        <respStmt>
          <resp>In cooperation with</resp>
          <persName>So Miyagawa</persName>
          <persName>Yifan Wang</persName>
          <persName>Nobutake Kamiya</persName>
          <persName>Naoki Kokaze</persName>
          <persName>Martin Holmes</persName>
          <persName>Syd Bauman</persName>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>TEI</publisher>
        <availability>
          <licence target="https://creativecommons.org/licenses/by-sa/4.0/">
            <choice>
              <abbr>CC BY-SA 4.0</abbr>
              <expan>Attribution-ShareAlike 4.0 International</expan>
            </choice>
          </licence>
          <ab>Intended for internal TEI use.</ab>
        </availability>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital, based heavily on the ODD snippets in <ref
        target="https://docs.google.com/document/d/1rRgBRYpxUlsnMEHbOPW-92u_AIMNwq9axGtpI1UGgz8">Proposal
        to add <gi>ruby</gi>, <gi>rb</gi>, and <gi>rt</gi> to encode
        ruby glosses for Japanese texts ルビ符号化要素（<gi>ruby</gi>，
        <gi>rb</gi>，<gi>rt</gi>）の追加申請</ref> as of <date
        when="2020-10-25">25 Oct 20</date>.</p>
      </sourceDesc>
    </fileDesc>

    <revisionDesc>
      <change who="#SB" when="2021-01-15">
        <list>
          <item>Martin Holmes and I noticed that <att>target</att> was
          inadvertently a required attribute of <gi>rt</gi>;
          fixed.</item>
          <item>Martin believes (strongly) that a <gi>ruby</gi> should
          have only 1 child <gi>rb</gi>, and I do not disagree
          (although I also do not claim sufficient expertise to be
          sure). Since it is <emph>far</emph> easier to change a “1
          only” rule to “many allowed” than vice-versa, we are limiting
          the content model to <code> rb, rt+ </code>.</item>
        </list>
      </change>      
      <change who="#SB" when="2021-01-08">
        With Martin Holmes, added <att>target</att> to <gi>rt</gi> for
        pointing to a container element rather than empty booundaries
        (<gi>anchor</gi>), including constraint that <att>target</att>
        not be used with either <att>from</att> or <att>to</att>. Also
        changed the constraint specs to require <att>to</att> iff
        <att>from</att>. (We already had constraint requiring
        <att>from</att> iff <att>to</att>.)</change>
      <change who="#SB" when="2021-01-08">
        Convinced by Martin Holmes to permit <gi>ruby</gi> inside
        <gi>w</gi>, so changed the content model of <gi>w</gi>.
      </change>
      <change who="#SB" when="2020-10-25">
        Inspired by the presentation by the authors at today’s
        (virtual) TEI Council meeting, I took it upon myself to
        generate an actual ODD from the snippets in their proposal.
        Major changes (that I can remember):
        <list>
          <item>Made <gi>ruby</gi> a member of model.phrase (it makes
          no sense to declare membership in a macro)</item>
          <item>Changed namespace to something more official (y’all
          deserve it)</item>
          <item>Re-worded descriptions a bit</item>
          <item>Eliminated model.rubyPart, as it was unused</item>
          <item>Eliminated model.rubyLike, as it did not do much — this
          may be a controversial change</item>
          <item>Fixed constraint requiring <att>to</att> iff
          <att>from</att></item>
        </list>
        I did NOT change the names of the <att>to</att> and
        <att>from</att> attributes, which already have uses in TEI. I
        would prefer to see <att>kara</att> and either <att>ni</att>
        or <att>mukatte</att> or some such. I also did NOT change the
        content model of <gi>ruby</gi>, but I did add a comment
        detailing the model to be used if we wish to disallow <code>
        rb, rb+, rt, rt+ </code> (that is, if we do not want a
        <gi>ruby</gi> to contain more than one <gi>rb</gi> AND more
        than one <gi>rt</gi>).
      </change>
    </revisionDesc>
    </teiHeader>

    <text>
    <body>
      <!--<div type="section">
        <head>Summary from original proposal</head>
        <p xml:lang="en" corresp="#n" xml:id="p">The ruby gloss, or
        rubi, or furigana, is a device for giving guidance on
        phonation, as well as presenting another reading for the text,
        which sometimes can be an essential part of the text. The
        structure of ruby glosses has, until recently, been
        established between characters and glosses, rather than words
        and glosses, since over a millennium ago. Such a textual
        structure requires another layer of semantics besides the
        existing set of vocabulary, such as <gi>seg</gi>,
        <gi>note</gi> (or <gi>add</gi>) and <gi>span</gi>. In its
        centuries of history, ruby glosses can also be double-sided.
        Reflecting on the current encoding schema in HTML5, this
        proposal discusses encoding examples attested in real texts,
        taking Taiwanese Bopomofo into consideration.</p>
        <p xml:lang="ja" corresp="#p" xml:id="n">ルビ（振り仮名）は、
        発音の手がかりを与える手段である一方、あらたな読解をテクストに
        与えることも多く、むしろ、それが本体であることすらあった。ルビ
        の構造は、ながらく本行の文字と振られる文字の関係を本位とし、単
        語本位ではなかった。このテクストの構造は、既存の語彙、たとえば
        <gi>seg</gi>、<gi>note</gi>（あるいは<gi>add</gi>）と
        <gi>span</gi>の組み合わせのような意味論では捉えられず、あらた
        な把握を要する。また長い歴史の中では、ルビはテクストの両側に付
        されることもあった。既存のHTMLでの符号化スキーマを参考にしつつ、
        この提案では、実例をもとに符号化の例を検討し、台湾の注音符号に
        及ぶ。</p>
      </div>-->
      
      <div>
        <head>Rough draft of proposed Guidelines section</head>
        
        <div>
          <head>Ruby Annotations</head>
          
          <p>The word <mentioned>ruby</mentioned> (or <mentioned>rubi</mentioned>) refers
          to a particular method of glossing runs of text which is common in East Asian scripts.
          In horizontally-oriented text, ruby annotations typically appear above the text being 
          glossed, while in vertical runs of text they may appear to the left or right, or both, 
          also oriented vertically.  An English example of a ruby annotation might look like this:
            
          </p>
          <figure><graphic width="10rem" url="Images/rbTEI.png" mimeType="image/png"/></figure>
          
          <p>In Japanese, furigana (振り仮名) ruby annotations are often used to provide pronunciation 
            guidance for readers; characters from the largely phonetic hiragana or katakana syllabaries
            accompany Chinese characters, like this:
          </p>
          
          <figure><graphic width="22rem" url="Images/rbNhkEasy.png" mimeType="image/png"/>
            <head>The first line of a news story from NHK News Web Easy intended for Japanese
            learners, in which every Chinese character has a ruby gloss.</head>
          </figure>
          
          <p>Pinyin ruby annotations are also used in Chinese to provide pronunciation guidance,
            and Zhuyin (注音) phonetic symbols (commonly known as <mentioned>bopomofo</mentioned>) are 
            used in Taiwan for the same purpose.</p>
          
          <p>The TEI schema provides many different ways of encoding glosses and annotations, from the
          simple and flexible <gi>note</gi> element to a native implementation of the  Web Annotation
            Data Model (<ptr target="#SASOann"/>), and the Unicode standard also provides a simple mechanism 
            based on <ref target="https://www.unicode.org/versions/Unicode13.0.0/ch23.pdf">Annotation 
              Characters</ref> to express the relationship between components of the main text stream and 
            inline annotations. However, ruby is a particular, distinct, and widely-used 
            form of annotation that appears in script, print, calligraphy, and web pages, and the TEI therefore 
            provides specific  elements for it:
          
          <specList>
            <specDesc key="ruby"/>
            <specDesc key="rb"/>
            <specDesc key="rt"/>
          </specList>
            The <gi>rt</gi> element also has <att>place</att>:
            
            <specList>
              <specDesc key="att.placement" atts="place"/>
            </specList>
          </p>
          
          <p>In its simplest representation, a glossed form consists of an <gi>rb</gi> (ruby base) element containing the
          base form, an <gi>rt</gi> (ruby text) element containing the gloss, and a <gi>ruby</gi> element which
          wraps them together:
          
          <egXML xmlns="http://www.tei-c.org/ns/Examples">
            <ruby>
              <rb>大学</rb>
              <rt place="above">だいがく</rt>
            </ruby>
          </egXML>
            
            Here the word <mentioned>大学</mentioned> (<mentioned>daigaku</mentioned> = university) is provided with
            a phonation gloss in hiragana. In the example above, the full gloss is applied to the complete word, but it might also be broken down
            by character:
            
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <ruby>
                <rb>大</rb>
                <rt place="above">だい</rt>
              </ruby>
                <ruby>
                  <rb>学</rb>
                  <rt place="above">がく</rt>
                </ruby>
            </egXML>
            
            Here is a similar example from Taiwan using bopomofo:
            
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <ruby>
                <rb>瓶</rb>
                <rt place="right">ㄆㄧㄥˊ</rt>
              </ruby>
              <ruby>
                <rb>子</rb>
                <rt place="right">˙ㄗ</rt>
              </ruby>
            </egXML>
            (pinyin <mentioned>píngzi</mentioned> = bottle, taken from 
            <ref target="https://en.wikipedia.org/wiki/Bopomofo">Wikipedia</ref>.)
            
            Where <att>place</att> is not provided, the default assumption is that the 
            ruby gloss is <emph>above</emph> where the text is horizontal, and to the
            <emph>right</emph> of the text where it is vertical. [NOTE: if proposed layout/@rubyPlace
            is implemented, it should be explained here.]
            
          </p>
          
          <p>The same ruby base may be accompanied by more than one gloss. 
            Here, the Japanese word <mentioned>打球場</mentioned> (dakyūba, or <gloss>billiard hall</gloss>)
            is glossed with  two different pronunciations: <q>biriyādo</q> (its English equivalent)
            and <q>dakyū</q>, a phonation guide for the first two characters.
          
          <figure>
            <head><mentioned>Billiard hall</mentioned> with two ruby glosses. 
              <ref target="http://school.nijl.ac.jp/kindai/NIJL/NIJL-01116.html#37">国文学研究資料館所蔵：：英国／龍動新繁昌記</ref>.</head>
            <graphic url="Images/billiardhall.jpg" width="149px" height="276px"/>
          </figure>
            
            This example is intriguing in that the right-side ruby 
            glosses apply to the first and second characters respectively, but
            the left-side gloss applies to the whole word as a unit. We use this 
            instance to exemplify multiple approaches to encoding the same
          phenomena, which may be appropriate for different projects or 
          editorial preferences. First, using the same segmentation approach 
          as demonstrated for <mentioned>大学</mentioned> above, but 
            with nesting:
           
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <p style="writing-mode: vertical-rl">
                [...]
                <ruby>
                  <rb>
                    <ruby>
                      <rb>打</rb>
                      <rt place="right">ダ</rt>
                    </ruby> 
                    <ruby>
                      <rb>球</rb>
                      <rt place="right">キウ</rt>
                    </ruby>
                    場
                  </rb>
                  <rt place="left">ビリヤード</rt>
                </ruby>
                [...]
              </p>
            </egXML>
            
            We could also use a standoff approach with anchors and 
            pointers:
            
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <p style="writing-mode: vertical-rl"> 
                [...]
                <ruby>
                  <rb>
                    <anchor xml:id="da"/>打
                    <anchor xml:id="kyuu"/>球
                     <anchor xml:id="ba"/>場
                    <anchor  xml:id="owari"/>
                  </rb>
                  <rt place="left" from="#da" to="#owari">ビリヤード</rt>
                  <rt place="right" from="#da" to="#kyuu">ダ</rt>
                  <rt place="right" from="#kyuu" to="#ba">キウ</rt>
                </ruby> 
                [...]
              </p>
            </egXML>
            
            Alternatively, if the encoding itself already includes segmentation below
            the word level, we can use the existing elements instead of adding <gi>anchor</gi>s:
            
            <egXML xmlns="http://www.tei-c.org/ns/Examples">
              <p style="writing-mode: vertical-rl"> 
                [...]
                  <ruby>
                    <rb xml:id="dakyuuba"><c xml:id="c1">打</c><c xml:id="c2">球</c><c>場</c></rb>
                    <rt place="left" target="#dakyuuba">ビリヤード</rt>
                    <rt place="right" target="#c1">ダ</rt>
                    <rt place="right" target="#c2">キウ</rt>
                  </ruby> 
                
                [...]
              </p>
            </egXML>
            
          </p>
        
        <p>The current support for ruby is rudimentary, and in future releases of the
        Guidelines we expect to see more development of these features and 
        recommendations.</p>
          
        </div>
        
      </div>
      
    </body>
    <back>
      <div>
        <schemaSpec
          xmlns:sch="http://purl.oclc.org/dsdl/schematron" prefix="r_"
          xmlns:r="http://www.tei-c.org/ns/proposal/ruby" ident="ruby"
          source="tei:4.1.0" start="TEI div p ab" >
          <desc>From proposal, then tweaked</desc>
  
          <moduleRef key="tei"/>                              <!-- required -->
          <moduleRef key="core"/>                             <!-- required -->
          <moduleRef key="analysis"/>
          <moduleRef key="certainty"/>
          <moduleRef key="corpus"/>
          <moduleRef key="dictionaries"/>
          <moduleRef key="drama"/>
          <moduleRef key="figures"/>
          <moduleRef key="gaiji"/>
          <moduleRef key="header"/>                           <!-- required -->
          <moduleRef key="iso-fs"/>
          <moduleRef key="linking"/>
          <moduleRef key="msdescription"/>
          <moduleRef key="namesdates"/>
          <moduleRef key="nets"/>
          <moduleRef key="spoken"/>
          <moduleRef key="textcrit"/>
          <moduleRef key="textstructure"/>                    <!-- required -->
          <moduleRef key="transcr"/>
          <moduleRef key="verse"/>
          
          <elementSpec ident="ruby" ns="http://www.tei-c.org/ns/proposal/ruby" mode="add">
            <gloss versionDate="2021-01-30">ruby container</gloss>
            <gloss versionDate="2021-01-31" xml:lang="ja">ルビのためのコンテナ要素。</gloss>
            <desc versionDate="2020-02-28" xml:lang="en">contains a
            passage of base text along with its associated ruby gloss(es).</desc>
            <desc versionDate="2021-01-31" xml:lang="ja">ルビ及びその対象となるテキストを含む。</desc>
            <classes>
              <memberOf key="att.global"/>
              <memberOf key="att.typed"/>
              <memberOf key="model.phrase"/>
            </classes>
            <content>
              <elementRef key="rb" minOccurs="1" maxOccurs="1"/>
              <elementRef key="rt" minOccurs="1" maxOccurs="unbounded"/>
            </content>
            <constraintSpec scheme="schematron" ident="NShack">
              <desc>The TEI Stylesheets that generate schematron-in-relaxng do
              not (yet) generate namespaces; thus this hack.</desc>
              <constraint>
                <sch:ns prefix="r" uri="http://www.tei-c.org/ns/proposal/ruby"/>
              </constraint>
            </constraintSpec>
            <!-- Obviously need <exemplum>s here :-) -->
          </elementSpec>
  
          <elementSpec ident="rt" ns="http://www.tei-c.org/ns/proposal/ruby" mode="add">
            <gloss versionDate="2021-01-30">ruby text</gloss>
            <gloss versionDate="2021-01-31" xml:lang="ja">ルビのテキスト。</gloss>
            <desc versionDate="2020-02-28" xml:lang="en">contains a ruby
            text, an annotation closely associated with a passage of the
            main text.</desc>
            <desc versionDate="2021-01-31" xml:lang="ja">本文の一部と密接な関連を持つ注釈（主に読み方）としてのルビテキストを含む。</desc>
            <classes>
              <memberOf key="att.global"/>
              <memberOf key="att.typed"/>
              <memberOf key="att.placement"/>
              <memberOf key="att.transcriptional"/>
            </classes>
            <content>
              <alternate minOccurs="1" maxOccurs="unbounded">
                <classRef key="macro.phraseSeq"/>
                <classRef key="model.segLike"/>
              </alternate>
            </content>
          <attList>
            <attDef ident="target" usage="opt">
              <desc versionDate="2021-01-08" xml:lang="en">supplies a pointer to the
              base being glossed by this ruby text.</desc>
              <desc versionDate="2021-01-31" xml:lang="ja">ルビテキストの対象へのポインタを示す。</desc>
              <datatype><dataRef key="teidata.pointer"/></datatype>
              <constraintSpec scheme="schematron" ident="rt-target-not-span">
                <!-- Note: this constraint should not be necessary, as
                     the desired semantics should be something we
                     could describe in PureODD using attList/@org. But
                     I don’t think we can. —Syd -->
                <desc>Enforce that <emph>either</emph>
                <att>target</att> or both <att>from</att> and
                <att>to</att> (or none) are used, but not
                <att>target</att> in combination with either
                <att>from</att> or <att>to</att>.</desc>
                <constraint>
                  <sch:report test="../@from | ../@to">When target= is
                  present, neither from= nor to= should be.</sch:report>
                </constraint>
              </constraintSpec>
              <remarks>
                <p>Should point to a single <gi>rb</gi> or an element
                that is inside an <gi>rb</gi>. To refer to multiple
                elements or text nodes at once use <att>from</att> and
                <att>to</att>.</p>
              </remarks>
            </attDef>
            <attDef ident="from" usage="opt" mode="add">
              <desc>points to the starting point of the span of text
                being glossed by this ruby text.</desc>
              <desc versionDate="2021-01-31" xml:lang="ja">ルビテキストの対象範囲の始点を示す。</desc>
              <datatype>
                <dataRef key="teidata.pointer"/>
              </datatype>
              <constraintSpec scheme="schematron" ident="rt-from">
                  <!-- Note: this constraint should not be necessary, as
                       the desired semantics should be something we
                       could describe in PureODD using attList/@org. But
                       I don’t think we can. —Syd -->
                  <desc>Enforce the presence of <att>to</att> iff there
                  is a <att>from</att>.</desc>
                  <constraint>
                    <sch:assert test="../@to" >When from= is present, the to=
                    attribute of <sch:name/> is required.</sch:assert>
                  </constraint>
                </constraintSpec>
              </attDef>
              <attDef ident="to" usage="opt" mode="add">
                <desc>points to the ending point of the span of text
                being glossed.</desc>
                <desc versionDate="2021-01-31" xml:lang="ja">ルビテキストの対象範囲の終点を示す。</desc>
                <datatype>
                  <dataRef key="teidata.pointer"/>
                </datatype>
              <constraintSpec scheme="schematron" ident="rt-to">
                  <!-- Note: this constraint should not be necessary, as
                       the desired semantics should be something we
                       could describe in PureODD using attList/@org. But
                       I don’t think we can. —Syd -->
                  <desc>Enforce the presence of <att>from</att> iff there
                  is a <att>to</att>.</desc>
                  <constraint>
                    <sch:assert test="../@from" >When to= is present, the from=
                    attribute of <sch:name/> is required.</sch:assert>
                  </constraint>
                </constraintSpec>
              </attDef>
            </attList>
          </elementSpec>
  
          <elementSpec ident="rb" ns="http://www.tei-c.org/ns/proposal/ruby" mode="add">
            <gloss versionDate="2021-01-30">ruby base</gloss>
            <gloss versionDate="2021-01-31" xml:lang="ja">ルビの対象となるテキスト。</gloss>
            <desc versionDate="2020-02-28" xml:lang="en">contains the
            base text annotated by a ruby gloss.</desc>
            <desc versionDate="2021-01-31" xml:lang="ja">一つ以上のルビの対象となるテキストを含む。</desc>
            <classes>
              <memberOf key="att.global"/>
              <memberOf key="att.typed"/>
            </classes>
            <content>
              <alternate minOccurs="1" maxOccurs="unbounded">
                <classRef key="macro.phraseSeq"/>
                <classRef key="model.segLike"/>
              </alternate>
            </content>
        </elementSpec>
  
        <elementSpec ident="w" module="analysis" mode="change">
          <content>
            <alternate minOccurs="0" maxOccurs="unbounded">
              <textNode/>
              <classRef key="model.gLike"/>
              <elementRef key="seg"/>
              <elementRef key="w"/>
              <elementRef key="m"/>
              <elementRef key="c"/>
              <elementRef key="pc"/>
              <elementRef key="ruby"/>
              <classRef key="model.global"/>
              <classRef key="model.lPart"/>
              <classRef key="model.hiLike"/>
              <classRef key="model.pPart.edit"/>
            </alternate>
          </content>
        </elementSpec>

        </schemaSpec>
      </div>
    </back>
  </text>
</TEI>