<?xml version="1.0" encoding="utf-8"?>
<!-- $Id: ruby.odd 3426 2021-01-08 23:51:50Z syd $ -->
<?xml-model href="http://www.tei-c.org/Vault/P5/3.6.0/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/Vault/P5/3.6.0/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Ruby</title>
<title type="sub">a TEI customization to demonstrate encoding
of ruby annotations</title>
<sponsor>East Asian / Japanese SIG</sponsor>
<author>
<persName>Kazuhiro Okada</persName>
<persName>Satoru Nakamura</persName>
<persName>Kiyonori Nagasaki</persName>
</author>
<respStmt>
<resp>In cooperation with</resp>
<persName>So Miyagawa</persName>
<persName>Yifan Wang</persName>
<persName>Nobutake Kamiya</persName>
<persName>Naoki Kokaze</persName>
<persName>Martin Holmes</persName>
<persName>Syd Bauman</persName>
</respStmt>
</titleStmt>
<publicationStmt>
<publisher>TEI</publisher>
<availability>
<licence target="https://creativecommons.org/licenses/by-sa/4.0/">
<choice>
<abbr>CC BY-SA 4.0</abbr>
<expan>Attribution-ShareAlike 4.0 International</expan>
</choice>
</licence>
<ab>Intended for internal TEI use.</ab>
</availability>
</publicationStmt>
<sourceDesc>
<p>Born digital, based heavily on the ODD snippets in <ref
target="https://docs.google.com/document/d/1rRgBRYpxUlsnMEHbOPW-92u_AIMNwq9axGtpI1UGgz8">Proposal
to add <gi>ruby</gi>, <gi>rb</gi>, and <gi>rt</gi> to encode
ruby glosses for Japanese texts ルビ符号化要素(<gi>ruby</gi>,
<gi>rb</gi>,<gi>rt</gi>)の追加申請</ref> as of <date
when="2020-10-25">25 October 2020</date>.</p>
</sourceDesc>
</fileDesc>
<revisionDesc>
<change who="#SB" when="2021-01-15">
<list>
<item>Martin Holmes and I noticed that <att>target</att> was
inadvertently a required attribute of <gi>rt</gi>;
fixed.</item>
<item>Martin believes (strongly) that a <gi>ruby</gi> should
have only one child <gi>rb</gi>, and I do not disagree
(although I also do not claim sufficient expertise to be
sure). Since it is <emph>far</emph> easier to change a “1
only” rule to “many allowed” than vice versa, we are limiting
the content model to <code> rb, rt+ </code>.</item>
</list>
</change>
<change who="#SB" when="2021-01-08">
With Martin Holmes, added <att>target</att> to <gi>rt</gi> for
pointing to a container element rather than to empty boundaries
(<gi>anchor</gi>), including a constraint that <att>target</att>
not be used with either <att>from</att> or <att>to</att>. Also
changed the constraint specs to require <att>to</att> iff
<att>from</att>. (We already had a constraint requiring
<att>from</att> iff <att>to</att>.)</change>
<change who="#SB" when="2021-01-08">
Convinced by Martin Holmes to permit <gi>ruby</gi> inside
<gi>w</gi>, so changed the content model of <gi>w</gi>.
</change>
<change who="#SB" when="2020-10-25">
Inspired by the presentation by the authors at today’s
(virtual) TEI Council meeting, I took it upon myself to
generate an actual ODD from the snippets in their proposal.
Major changes (that I can remember):
<list>
<item>Made <gi>ruby</gi> a member of model.phrase (it makes
no sense to declare membership in a macro)</item>
<item>Changed namespace to something more official (y’all
deserve it)</item>
<item>Re-worded descriptions a bit</item>
<item>Eliminated model.rubyPart, as it was unused</item>
<item>Eliminated model.rubyLike, as it did not do much — this
may be a controversial change</item>
<item>Fixed constraint requiring <att>to</att> iff
<att>from</att></item>
</list>
I did NOT change the names of the <att>to</att> and
<att>from</att> attributes, which already have uses in TEI. I
would prefer to see <att>kara</att> and either <att>ni</att>
or <att>mukatte</att> or some such. I also did NOT change the
content model of <gi>ruby</gi>, but I did add a comment
detailing the model to be used if we wish to disallow <code>
rb, rb+, rt, rt+ </code> (that is, if we do not want a
<gi>ruby</gi> to contain more than one <gi>rb</gi> AND more
than one <gi>rt</gi>).
</change>
</revisionDesc>
</teiHeader>
<text>
<body>
<!--<div type="section">
<head>Summary from original proposal</head>
<p xml:lang="en" corresp="#n" xml:id="p">The ruby gloss (also
called rubi or furigana) is a device for giving guidance on
pronunciation; it can also present another reading of the text,
and that reading is sometimes an essential part of the text. For
more than a thousand years, and until recently, the structure of
ruby glosses has linked glosses to characters rather than to
words. Such a textual structure requires a layer of semantics
beyond what existing vocabulary, such as <gi>seg</gi>,
<gi>note</gi> (or <gi>add</gi>), and <gi>span</gi>, can express.
Over their centuries of history, ruby glosses have also sometimes
been double-sided. Drawing on the current encoding scheme in
HTML5, this proposal discusses encoding examples attested in real
texts, taking Taiwanese Bopomofo into consideration.</p>
<p xml:lang="ja" corresp="#p" xml:id="n">ルビ(振り仮名)は、
発音の手がかりを与える手段である一方、あらたな読解をテクストに
与えることも多く、むしろ、それが本体であることすらあった。ルビ
の構造は、ながらく本行の文字と振られる文字の関係を本位とし、単
語本位ではなかった。このテクストの構造は、既存の語彙、たとえば
<gi>seg</gi>、<gi>note</gi>(あるいは<gi>add</gi>)と
<gi>span</gi>の組み合わせのような意味論では捉えられず、あらた
な把握を要する。また長い歴史の中では、ルビはテクストの両側に付
されることもあった。既存のHTMLでの符号化スキーマを参考にしつつ、
この提案では、実例をもとに符号化の例を検討し、台湾の注音符号に
及ぶ。</p>
</div>-->
<div>
<head>Rough draft of proposed Guidelines section</head>
<div>
<head>Ruby Annotations</head>
<p>The word <mentioned>ruby</mentioned> (or <mentioned>rubi</mentioned>) refers
to a method of glossing runs of text that is common in East Asian scripts.
In horizontally oriented text, ruby annotations typically appear above the text being
glossed, while in vertical text they may appear to the left, to the right, or on both sides,
and are themselves oriented vertically. An English example of a ruby annotation might look like this:
</p>
<figure><graphic width="10rem" url="Images/rbTEI.png" mimeType="image/png"/></figure>
<p>In Japanese, furigana (振り仮名) ruby annotations are often used to provide pronunciation
guidance for readers; characters from the largely phonetic hiragana or katakana syllabaries
accompany Chinese characters, like this:
</p>
<figure><graphic width="22rem" url="Images/rbNhkEasy.png" mimeType="image/png"/>
<head>The first line of a news story from NHK News Web Easy, a site intended for learners
of Japanese; every Chinese character has a ruby gloss.</head>
</figure>
<p>Pinyin ruby annotations are also used in Chinese to provide pronunciation guidance,
and Zhuyin (注音) phonetic symbols (commonly known as <mentioned>bopomofo</mentioned>) are
used in Taiwan for the same purpose.</p>
<p>The TEI already provides many different ways of encoding glosses and annotations, from the
simple and flexible <gi>note</gi> element to a native implementation of the Web Annotation
Data Model (<ptr target="#SASOann"/>). The Unicode Standard also provides a simple mechanism,
the <ref target="https://www.unicode.org/versions/Unicode13.0.0/ch23.pdf">Annotation
Characters</ref>, for expressing the relationship between components of the main text stream and
inline annotations. However, ruby is a particular, distinct, and widely used
form of annotation that appears in manuscripts, print, calligraphy, and web pages, and the TEI therefore
provides specific elements for it:
<specList>
<specDesc key="ruby"/>
<specDesc key="rb"/>
<specDesc key="rt"/>
</specList>
The <gi>rt</gi> element is also a member of att.placement, and so carries the <att>place</att> attribute:
<specList>
<specDesc key="att.placement" atts="place"/>
</specList>
</p>
<p>In its simplest representation, a glossed form consists of an <gi>rb</gi> (ruby base) element containing the
base form, an <gi>rt</gi> (ruby text) element containing the gloss, and a <gi>ruby</gi> element which
wraps them together:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ruby>
<rb>大学</rb>
<rt place="above">だいがく</rt>
</ruby>
</egXML>
Here the word <mentioned>大学</mentioned> (<mentioned>daigaku</mentioned>, <gloss>university</gloss>) is given
a pronunciation gloss in hiragana. In this example the gloss applies to the complete word, but it may instead be broken down
character by character:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ruby>
<rb>大</rb>
<rt place="above">だい</rt>
</ruby>
<ruby>
<rb>学</rb>
<rt place="above">がく</rt>
</ruby>
</egXML>
Here is a similar example from Taiwan using bopomofo:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ruby>
<rb>瓶</rb>
<rt place="right">ㄆㄧㄥˊ</rt>
</ruby>
<ruby>
<rb>子</rb>
<rt place="right">˙ㄗ</rt>
</ruby>
</egXML>
(pinyin <mentioned>píngzi</mentioned> = bottle, taken from
<ref target="https://en.wikipedia.org/wiki/Bopomofo">Wikipedia</ref>.)
Where <att>place</att> is not supplied, the ruby gloss is assumed to appear
<emph>above</emph> the base text when the text is horizontal, and to its
<emph>right</emph> when the text is vertical. [NOTE: if the proposed layout/@rubyPlace
is implemented, it should be explained here.]
</p>
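<p>For comparison, the same word could be glossed in pinyin rather than bopomofo. In this
illustrative sketch (constructed for this section, not taken from a source text) no
<att>place</att> is given, so in horizontal text each gloss would be assumed to appear above
its base character:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ruby>
<rb>瓶</rb>
<rt>píng</rt>
</ruby>
<ruby>
<rb>子</rb>
<rt>zi</rt>
</ruby>
</egXML>
</p>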
<p>The same ruby base may be accompanied by more than one gloss.
Here, the Japanese word <mentioned>打球場</mentioned> (dakyūba, or <gloss>billiard hall</gloss>)
is glossed with two different readings: <q>biriyādo</q> (from its English equivalent, <mentioned>billiard</mentioned>)
and <q>dakyū</q>, a pronunciation guide for the first two characters.
<figure>
<head><mentioned>Billiard hall</mentioned> with two ruby glosses.
<ref target="http://school.nijl.ac.jp/kindai/NIJL/NIJL-01116.html#37">国文学研究資料館所蔵::英国/龍動新繁昌記</ref> (held by the National Institute of Japanese Literature).</head>
<graphic url="Images/billiardhall.jpg" width="149px" height="276px"/>
</figure>
This example is intriguing in that the right-side ruby
glosses apply to the first and second characters respectively, but
the left-side gloss applies to the whole word as a unit. We use this
instance to illustrate several approaches to encoding the same
phenomenon, which may suit different projects or
editorial preferences. The first uses the same character-by-character segmentation
as demonstrated for <mentioned>大学</mentioned> above, but
with nesting:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<p style="writing-mode: vertical-rl">
[...]
<ruby>
<rb>
<ruby>
<rb>打</rb>
<rt place="right">ダ</rt>
</ruby>
<ruby>
<rb>球</rb>
<rt place="right">キウ</rt>
</ruby>
場
</rb>
<rt place="left">ビリヤード</rt>
</ruby>
[...]
</p>
</egXML>
We could also use a standoff approach with anchors and
pointers:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<p style="writing-mode: vertical-rl">
[...]
<ruby>
<rb>
<anchor xml:id="da"/>打
<anchor xml:id="kyuu"/>球
<anchor xml:id="ba"/>場
<anchor xml:id="owari"/>
</rb>
<rt place="left" from="#da" to="#owari">ビリヤード</rt>
<rt place="right" from="#da" to="#kyuu">ダ</rt>
<rt place="right" from="#kyuu" to="#ba">キウ</rt>
</ruby>
[...]
</p>
</egXML>
Alternatively, if the encoding already includes segmentation below
the word level, the <att>target</att> attribute can point to those existing elements, so no <gi>anchor</gi>s need to be added:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<p style="writing-mode: vertical-rl">
[...]
<ruby>
<rb xml:id="dakyuuba"><c xml:id="c1">打</c><c xml:id="c2">球</c><c>場</c></rb>
<rt place="left" target="#dakyuuba">ビリヤード</rt>
<rt place="right" target="#c1">ダ</rt>
<rt place="right" target="#c2">キウ</rt>
</ruby>
[...]
</p>
</egXML>
</p>
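<p>This customization also allows <gi>ruby</gi> inside <gi>w</gi>, which may suit projects that
tokenize text into words but gloss it character by character. A minimal sketch, assuming such
word-level tokenization, wraps the two per-character glosses of <mentioned>大学</mentioned> in a
single word token:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<w>
<ruby>
<rb>大</rb>
<rt>だい</rt>
</ruby>
<ruby>
<rb>学</rb>
<rt>がく</rt>
</ruby>
</w>
</egXML>
</p>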
<p>Support for ruby is currently rudimentary, and we expect these features and
recommendations to be developed further in future releases of the
Guidelines.</p>
</div>
</div>
</body>
<back>
<div>
<schemaSpec
xmlns:sch="http://purl.oclc.org/dsdl/schematron" prefix="r_"
xmlns:r="http://www.tei-c.org/ns/proposal/ruby" ident="ruby"
source="tei:4.1.0" start="TEI div p ab" >
<desc>From proposal, then tweaked</desc>
<moduleRef key="tei"/> <!-- required -->
<moduleRef key="core"/> <!-- required -->
<moduleRef key="analysis"/>
<moduleRef key="certainty"/>
<moduleRef key="corpus"/>
<moduleRef key="dictionaries"/>
<moduleRef key="drama"/>
<moduleRef key="figures"/>
<moduleRef key="gaiji"/>
<moduleRef key="header"/> <!-- required -->
<moduleRef key="iso-fs"/>
<moduleRef key="linking"/>
<moduleRef key="msdescription"/>
<moduleRef key="namesdates"/>
<moduleRef key="nets"/>
<moduleRef key="spoken"/>
<moduleRef key="textcrit"/>
<moduleRef key="textstructure"/> <!-- required -->
<moduleRef key="transcr"/>
<moduleRef key="verse"/>
<elementSpec ident="ruby" ns="http://www.tei-c.org/ns/proposal/ruby" mode="add">
<gloss versionDate="2021-01-30">ruby container</gloss>
<gloss versionDate="2021-01-31" xml:lang="ja">ルビのためのコンテナ要素。</gloss>
<desc versionDate="2020-02-28" xml:lang="en">contains a
passage of base text along with its associated ruby gloss(es).</desc>
<desc versionDate="2021-01-31" xml:lang="ja">ルビ及びその対象となるテキストを含む。</desc>
<classes>
<memberOf key="att.global"/>
<memberOf key="att.typed"/>
<memberOf key="model.phrase"/>
</classes>
<content>
<elementRef key="rb" minOccurs="1" maxOccurs="1"/>
<elementRef key="rt" minOccurs="1" maxOccurs="unbounded"/>
</content>
<constraintSpec scheme="schematron" ident="NShack">
<desc>The TEI Stylesheets that generate schematron-in-relaxng do
not (yet) generate namespaces; thus this hack.</desc>
<constraint>
<sch:ns prefix="r" uri="http://www.tei-c.org/ns/proposal/ruby"/>
</constraint>
</constraintSpec>
<!-- Obviously need <exemplum>s here :-) -->
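<exemplum>
<p>A minimal illustrative example (the word and its reading are chosen for illustration only):
a two-character base with a single hiragana gloss.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ruby>
<rb>漢字</rb>
<rt>かんじ</rt>
</ruby>
</egXML>
</exemplum>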
</elementSpec>
<elementSpec ident="rt" ns="http://www.tei-c.org/ns/proposal/ruby" mode="add">
<gloss versionDate="2021-01-30">ruby text</gloss>
<gloss versionDate="2021-01-31" xml:lang="ja">ルビのテキスト。</gloss>
<desc versionDate="2020-02-28" xml:lang="en">contains a ruby
text, an annotation closely associated with a passage of the
main text.</desc>
<desc versionDate="2021-01-31" xml:lang="ja">本文の一部と密接な関連を持つ注釈(主に読み方)としてのルビテキストを含む。</desc>
<classes>
<memberOf key="att.global"/>
<memberOf key="att.typed"/>
<memberOf key="att.placement"/>
<memberOf key="att.transcriptional"/>
</classes>
<content>
<alternate minOccurs="1" maxOccurs="unbounded">
<classRef key="macro.phraseSeq"/>
<classRef key="model.segLike"/>
</alternate>
</content>
<attList>
<attDef ident="target" usage="opt">
<desc versionDate="2021-01-08" xml:lang="en">supplies a pointer to the
base being glossed by this ruby text.</desc>
<desc versionDate="2021-01-31" xml:lang="ja">ルビテキストの対象へのポインタを示す。</desc>
<datatype><dataRef key="teidata.pointer"/></datatype>
<constraintSpec scheme="schematron" ident="rt-target-not-span">
<!-- Note: this constraint should not be necessary, as
the desired semantics should be something we
could describe in PureODD using attList/@org. But
I don’t think we can. —Syd -->
<desc>Enforce that <emph>either</emph>
<att>target</att> or both <att>from</att> and
<att>to</att> (or none) are used, but not
<att>target</att> in combination with either
<att>from</att> or <att>to</att>.</desc>
<constraint>
<sch:report test="../@from | ../@to">When target= is
present, neither from= nor to= should be.</sch:report>
</constraint>
</constraintSpec>
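<exemplum>
<p>An illustrative sketch in which each <gi>rt</gi> uses <att>target</att> to point at a
character-level element inside the <gi>rb</gi>; the identifiers are hypothetical.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ruby>
<rb><c xml:id="ex-kan">漢</c><c xml:id="ex-ji">字</c></rb>
<rt target="#ex-kan">かん</rt>
<rt target="#ex-ji">じ</rt>
</ruby>
</egXML>
</exemplum>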
<remarks>
<p>Should point to a single <gi>rb</gi> or to an element
inside an <gi>rb</gi>. To refer to multiple
elements or text nodes at once, use <att>from</att> and
<att>to</att>.</p>
</remarks>
</attDef>
<attDef ident="from" usage="opt" mode="add">
<desc>points to the starting point of the span of text
being glossed by this ruby text.</desc>
<desc versionDate="2021-01-31" xml:lang="ja">ルビテキストの対象範囲の始点を示す。</desc>
<datatype>
<dataRef key="teidata.pointer"/>
</datatype>
<constraintSpec scheme="schematron" ident="rt-from">
<!-- Note: this constraint should not be necessary, as
the desired semantics should be something we
could describe in PureODD using attList/@org. But
I don’t think we can. —Syd -->
<desc>Require <att>to</att> whenever <att>from</att> is present
(the companion rule on <att>to</att> enforces the converse).</desc>
<constraint>
<sch:assert test="../@to" >When from= is present, the to=
attribute of <sch:name/> is required.</sch:assert>
</constraint>
</constraintSpec>
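<exemplum>
<p>An illustrative sketch in which <att>from</att> and <att>to</att> point to
<gi>anchor</gi>s delimiting the glossed span; the identifiers are hypothetical.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ruby>
<rb><anchor xml:id="ex-start"/>漢字<anchor xml:id="ex-end"/></rb>
<rt from="#ex-start" to="#ex-end">かんじ</rt>
</ruby>
</egXML>
</exemplum>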
</attDef>
<attDef ident="to" usage="opt" mode="add">
<desc>points to the ending point of the span of text
being glossed.</desc>
<desc versionDate="2021-01-31" xml:lang="ja">ルビテキストの対象範囲の終点を示す。</desc>
<datatype>
<dataRef key="teidata.pointer"/>
</datatype>
<constraintSpec scheme="schematron" ident="rt-to">
<!-- Note: this constraint should not be necessary, as
the desired semantics should be something we
could describe in PureODD using attList/@org. But
I don’t think we can. —Syd -->
<desc>Require <att>from</att> whenever <att>to</att> is present
(the companion rule on <att>from</att> enforces the converse).</desc>
<constraint>
<sch:assert test="../@from" >When to= is present, the from=
attribute of <sch:name/> is required.</sch:assert>
</constraint>
</constraintSpec>
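<exemplum>
<p>An illustrative sketch of markup that the rule above is intended to flag: <att>to</att> is
supplied without a matching <att>from</att>. The identifier is hypothetical.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples" valid="false">
<ruby>
<rb>漢字<anchor xml:id="ex-end2"/></rb>
<rt to="#ex-end2">かんじ</rt>
</ruby>
</egXML>
</exemplum>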
</attDef>
</attList>
</elementSpec>
<elementSpec ident="rb" ns="http://www.tei-c.org/ns/proposal/ruby" mode="add">
<gloss versionDate="2021-01-30">ruby base</gloss>
<gloss versionDate="2021-01-31" xml:lang="ja">ルビの対象となるテキスト。</gloss>
<desc versionDate="2020-02-28" xml:lang="en">contains the
base text annotated by a ruby gloss.</desc>
<desc versionDate="2021-01-31" xml:lang="ja">一つ以上のルビの対象となるテキストを含む。</desc>
<classes>
<memberOf key="att.global"/>
<memberOf key="att.typed"/>
</classes>
<content>
<alternate minOccurs="1" maxOccurs="unbounded">
<classRef key="macro.phraseSeq"/>
<classRef key="model.segLike"/>
</alternate>
</content>
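<exemplum>
<p>An illustrative example of a base that is glossed as a whole: the reading
<mentioned>kyō</mentioned> of <mentioned>今日</mentioned> (<gloss>today</gloss>) belongs to
the word and cannot be divided between its two characters.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ruby>
<rb>今日</rb>
<rt>きょう</rt>
</ruby>
</egXML>
</exemplum>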
</elementSpec>
<elementSpec ident="w" module="analysis" mode="change">
<content>
<alternate minOccurs="0" maxOccurs="unbounded">
<textNode/>
<classRef key="model.gLike"/>
<elementRef key="seg"/>
<elementRef key="w"/>
<elementRef key="m"/>
<elementRef key="c"/>
<elementRef key="pc"/>
<elementRef key="ruby"/>
<classRef key="model.global"/>
<classRef key="model.lPart"/>
<classRef key="model.hiLike"/>
<classRef key="model.pPart.edit"/>
</alternate>
</content>
</elementSpec>
</schemaSpec>
</div>
</back>
</text>
</TEI>