JLReq TF meeting notes 2024-01-09 #394

kidayasuo · 2024-01-12T01:02:12Z

JLReq TF meeting notes 2024-01-09

(Japanese translation follows)

Attendees

atsushi, kida, kobatake, makoto, shinyu, suzuki, tajima, tatsuo, toshi, yamamoto, yamashige

Administrative matters

Resolved discussions and reflecting the results in documents

We have GH issues where consensus has been reached, so further discussion isn't necessary, but which are left open because the result has not yet been reflected in the jlreq-d document. Is there a better way to manage them than leaving them open?

→ Agreed to separate issues used for discussion from issues used to reflect the outcome in a document. Closed jlreq/#389.

todo: kida to close the following issues and instead create issues to reflect them in jlreq-d.

Come up with a definition of 全角 that are generic enough to cover non-square typefaces jlreq-d/#35
Warichu and string searching operations jlreq-d/#45
Ideal spacing before fullwidth opening punctuations at the beginning of sentences or lines jlreq-d/#37

JLReq-d document structure

Following the agreement at the last meeting, kida is restructuring the document. The team reviewed the sub-structure of Chapter 2, “How to create Japanese digital text”.

The following is the proposed sub-structure:
2 How to create Japanese digital text
2.1 Characters used by Japanese digital text (previous 1.2)
2.2 Character set
2.3 How to use characters
- Latin characters (proportional, fullwidth, spacing between Latin and Kanji)
- Numbers
- Punctuation
  - Parentheses
  - Dash
  - Others
2.4 Vertical writing
2.5 Romanization and Japanese input

Discussions and consensus:

To clearly indicate that the jlreq-d document is about digital text, and not about Japanese text in general such as handwritten or other non-digital forms, use the term “Japanese digital text” rather than just “Japanese text”.
Mention confusable or easily misused characters in 2.3 as necessary. No independent section for this.
- Mention that < (LESS-THAN SIGN) or its fullwidth form ＜ should not be used as a parenthesis, because it causes issues with line layout.
- cf. Unicode’s confusable character data
- Do not try to come up with an exhaustive list. That is not the role of the jlreq-d document.
Somewhere in the document, mention that web forms should not force users to enter either fullwidth or regular numbers. They should allow flexibility.
Mention the roles of the input or text system. For example, it is not realistic to expect users to pick the right dash out of the large number of dashes in Unicode.
Use 縦組み instead of 縦書き (in English, “vertical setting/composition” instead of “vertical writing”).
Vertical composition has more constraints (e.g. Arabic numerals and Latin text are harder to read). One way to address this is to create or modify the text so that it suits vertical composition. A different approach is to, in a sense, ignore these constraints and set whatever text is given vertically using layout rules, for example rules for how to lay out numbers of different widths. The former is the typical approach, but the latter is certainly possible. Mention it as an alternative approach.
2.5: Romanization is not necessary in jlreq-d.
We should cover issues with Japanese text entry. If there is enough content, make it a section. If not, mention the issues in related sections.
- There are characters that cannot be entered, or are hard to enter, with keyboards.
- What to do with voice input?
- It is not realistic to expect users to choose the right punctuation.
Other topics that should be covered in Chapter 2 or elsewhere:
- Using the right characters supports text search
- Issues with copy & paste
- Digital text is meant to be reused
- PDF and images are not accessible.
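
As an illustration of the web-form point above, a form handler can accept either fullwidth or regular digits and fold them to one form itself, rather than rejecting the input. The following is only a minimal Python sketch, assuming that Unicode NFKC compatibility mapping is acceptable for the field in question; the function name `normalize_digits` is hypothetical and was not discussed at the meeting.

```python
import unicodedata


def normalize_digits(value: str) -> str:
    """Fold fullwidth digits (U+FF10..U+FF19) to ASCII digits.

    All other characters are passed through unchanged, so Japanese
    text mixed into the field is not altered.
    """
    return "".join(
        # NFKC maps each fullwidth digit to its ASCII counterpart.
        unicodedata.normalize("NFKC", ch) if "\uff10" <= ch <= "\uff19" else ch
        for ch in value
    )


if __name__ == "__main__":
    print(normalize_digits("２０２４-01-09"))  # mixed fullwidth and regular digits
```

Restricting the mapping to the digit range, instead of applying NFKC to the whole string, is a deliberate choice in this sketch: full NFKC would also change other characters, which may not be what a given form wants.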

General discussions

Kobayashi-san pointed out that the meaning of writing text differs between writing on paper and writing digital text. When you write on paper, you are writing glyph shapes. The semantics of each character is something the reader infers from context. For example, whether a dash-like character is a prolonged sound mark or an em dash depends on interpretation rather than on a difference in glyph shape.

Writing digital text is (at least in theory) entering character codes according to the semantics of each character. You are not entering glyph shapes. There is a fundamental difference between the two. Kobayashi-san to write something up about this. kida to include this discussion in Chapter 2.
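
To make the dash example above concrete, the sketch below prints the code point and Unicode character name of a few characters that can look alike on screen but carry different semantics. It is a minimal Python illustration only; the selection of characters is an assumption made for this example and is not a list agreed at the meeting, and the helper name `describe` is hypothetical.

```python
import unicodedata

# Characters that may look similar but have distinct character codes.
# This selection is illustrative, not exhaustive.
LOOKALIKES = ["\u30fc", "\u2014", "\u2015", "\u2212", "\uff0d", "\u002d"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(describe(ch))
```

On paper these would all be a horizontal stroke; in digital text each is a separate code point, which is why search, copy & paste, and line layout can behave differently depending on which one was entered.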

Bin-sensei followed up that in the traditional printing workflow, the layout professional would have picked characters (semantics) by interpreting the handwritten original. In the digital workflow, authors themselves now have this role. It is challenging, however: we see characters being misused and confused.

Yamamoto-san: For Latin, most if not all punctuation is covered in Unicode, but that is not the case for Japanese. (kida: we did some research on this topic a few years ago. We should dig it up and come up with a proposal.)

Tajima-san: Re vertical text: there are editors who prefer making single digits fullwidth. If the layout can achieve what they want to achieve, we can just use regular numbers.

Could not cover the following agenda items

Spacing between proportional and fullwidth letters
Reviewing open GitHub issues

管理的な議題

合意のあった議論と、その結果の文書への反映

合意が取れ、そのためにはさらなる議論は必要ありませんが、jlreq-d文書への反映がまだ行われていないためにオープンになっているGH issueがいくつかあります。これらをうまく管理する方法はあるか？

→ 議論用と文書への反映用の課題を分けることに同意。[jlreq/#389]はクローズしました。

todo: kidaは以下の課題をクローズし、代わりにjlreq-dに反映させる課題を作成します。

非正方形の書体に適用できる全角の定義を提案する [jlreq-d/#35]
Warichuおよび文字列検索操作 [jlreq-d/#45]
文や行の先頭の全角の開き括弧の前の理想的なスペーシング [jlreq-d/#37]

JLReq-d文書の構造

前回の会議での合意に従い、kidaさんが文書を再構築しています。チームは第2章「日本語デジタルテキストの作成方法」のサブ構造を検討しました。

以下は木田が提案したサブ構造です。
2 日本語デジタルテキストの作成方法
2.1 日本語デジタルテキストで使用される文字（以前の1.2）
2.2 文字セット
2.3 文字の使用方法
- ラテン文字（比例、全角、ラテン文字と漢字の間のスペーシング）
- 数字
- 句読点
  - 丸括弧
  - ダッシュ
  - その他
2.4 縦書き
2.5 ローマ字表記と日本語入力

議論と合意事項：

jlreq-d文書がデジタルテキストに関するものであり、手書きやその他の非デジタル形式の一般的な日本語テキストに関するものではないことを明示するために、「日本語デジタルテキスト」という用語を使用する。
2.3で混同されやすいまたは誤用される文字を必要に応じて示す。独立したセクションはなし。
- ＜ LESS-THAN SIGNまたはその全角形を丸括弧として使用しないようにメンション。これは行のレイアウトに問題を引き起こします。
- c.f. Unicodeの混同文字データ
- すべてを網羅しようとしないでください。jlreq-d文書の役割ではありません。
どこかでweb formsが全角の数字などを強制してはいけないことを述べる。柔軟性を許可するべきです。
入力またはテキストシステムの役割を示す。例えば、Unicodeの多くのダッシュから適切なダッシュを選ぶことをユーザーに期待するのは現実的ではない。
縦書き（英語でVertical writing）の代わりに縦組み（英語でVertical setting/composition）を使用する。
縦の組版には制約が多い（例：アラビア数字やラテン文字は読みにくい）が、これに対処する方法の一つは、テキストを縦組版に適したものに作成または変更することです。別のアプローチは、一種のレイアウトルールでテキストを縦にすることでこれらの制約を無視することです。前者は典型的なアプローチですが、後者も可能です。これを代替アプローチとして示してください。
2.5のローマ字表記はjlreq-dには不要です。
日本語テキストの入力に関連する問題をカバーすべきです。コンテンツが十分ある場合はセクションにしてください。そうでない場合は関連するセクションでメンションしてください。
- キーボードでは入力できないまたは難しい文字があります。
- 音声入力の取り扱いは？
- ユーザーに正しい句読点を選択することを期待するのは現実的ではありません。
第2章または他の場所でカバーすべき他のトピック。
- 正しい文字を使用してテキスト検索をサポートする
- コピー＆ペーストの問題
- デジタルテキストは再利用されるべきです
- PDFと画像はアクセス可能ではありません。

一般的な議論

小林さんは、紙に書き込む場合とデジタルテキストを書き込む場合では書くことの意味が異なることを指摘した。紙に書く場合は、形、グリフを直接書き込みます。各グリフの意味は読者が文脈を利用して推測するものです。たとえば、棒のように見える形が長音記号なのかMダッシュなのかは、コンテキストによった解釈に依存します。

デジタルテキストを書く場合は、入力されるものは文字コードであって、グリフの形状を入力しているわけではありません。それぞれの文字は（少なくとも理論的には）各々のセマンティクスを持っています。

この二つには根本的な違いがあります。小林さんはこの点について何か書くことを検討中で、kidaさんはこの議論を第2章の前書きに含める予定です。

敏先生は、従来の印刷ワークフローでは、レイアウトのプロフェッショナルが手書きの原文を解釈して文字（セマンティクス）を選択していたことを指摘しました。しかし、デジタルワークフローでは著者自身がこの役割を果たす必要があります。これは挑戦的な課題です。文字の誤用や混同が見られます。

山本さん：ラテン文字に関しては、ほぼすべての必要な句読点がUnicodeでカバーされていますが、日本語の場合はそうではありません。（kida：このトピックについては数年前に調査を行いました。それを掘り起こして提案を出すべきです）

田島さん：縦テキストに関して、一桁の数字を全角にすることを好む編集者もいます。レイアウトが望む結果を得ることができれば、通常の数字を使用できます。

以下の議題はカバーできませんでした

比例および全角文字間のスペーシング
オープンなGitHubの課題のレビュー
