Commit

Update docs for GBK and gb18030
hsivonen committed Oct 24, 2024
1 parent 7f62c7f commit e60a65a
Showing 3 changed files with 18 additions and 14 deletions.
7 changes: 4 additions & 3 deletions doc/GBK.txt
@@ -1,8 +1,9 @@
 /// The decoder for this encoding is the same as the decoder for gb18030.
 /// The encoder side of this encoding is GBK with Windows code page 936 euro
-/// sign behavior. GBK extends GB2312-80 to cover the CJK Unified Ideographs
-/// Unicode block as well as a handful of ideographs from the CJK Unified
-/// Ideographs Extension A and CJK Compatibility Ideographs blocks.
+/// sign behavior and with the changes to two-byte sequences made in GB18030-2022.
+/// GBK extends GB2312-80 to cover the CJK Unified Ideographs Unicode block as
+/// well as a handful of ideographs from the CJK Unified Ideographs Extension A
+/// and CJK Compatibility Ideographs blocks.
 ///
 /// Unlike e.g. in the case of ISO-8859-1 and windows-1252, GBK encoder wasn't
 /// unified with the gb18030 encoder in the Encoding Standard out of concern
9 changes: 5 additions & 4 deletions doc/gb18030.txt
@@ -1,7 +1,8 @@
-/// This encoding matches GB18030-2005 except the two-byte sequence 0xA3 0xA0
-/// maps to U+3000 for compatibility with existing Web content. As a result,
-/// this encoding can represent all of Unicode except for the private-use
-/// character U+E5E5.
+/// This encoding matches GB18030-2022 except the two-byte sequence 0xA3 0xA0
+/// maps to U+3000 for compatibility with existing Web content and the four-byte
+/// sequences for the non-PUA characters that got two-byte sequences still decode
+/// to the same non-PUA characters as in GB18030-2005. As a result, this encoding
+/// can represent all of Unicode except for 19 private-use characters.
 ///
 /// [Index visualization for the two-byte sequences](https://encoding.spec.whatwg.org/gb18030.html),
 /// [Visualization of BMP coverage of the two-byte index](https://encoding.spec.whatwg.org/gb18030-bmp.html)
16 changes: 9 additions & 7 deletions src/lib.rs
@@ -946,9 +946,10 @@ pub static GBK_INIT: Encoding = Encoding {
 ///
 /// The decoder for this encoding is the same as the decoder for gb18030.
 /// The encoder side of this encoding is GBK with Windows code page 936 euro
-/// sign behavior. GBK extends GB2312-80 to cover the CJK Unified Ideographs
-/// Unicode block as well as a handful of ideographs from the CJK Unified
-/// Ideographs Extension A and CJK Compatibility Ideographs blocks.
+/// sign behavior and with the changes to two-byte sequences made in GB18030-2022.
+/// GBK extends GB2312-80 to cover the CJK Unified Ideographs Unicode block as
+/// well as a handful of ideographs from the CJK Unified Ideographs Extension A
+/// and CJK Compatibility Ideographs blocks.
 ///
 /// Unlike e.g. in the case of ISO-8859-1 and windows-1252, GBK encoder wasn't
 /// unified with the gb18030 encoder in the Encoding Standard out of concern
@@ -1690,10 +1691,11 @@ pub static GB18030_INIT: Encoding = Encoding {

 /// The gb18030 encoding.
 ///
-/// This encoding matches GB18030-2005 except the two-byte sequence 0xA3 0xA0
-/// maps to U+3000 for compatibility with existing Web content. As a result,
-/// this encoding can represent all of Unicode except for the private-use
-/// character U+E5E5.
+/// This encoding matches GB18030-2022 except the two-byte sequence 0xA3 0xA0
+/// maps to U+3000 for compatibility with existing Web content and the four-byte
+/// sequences for the non-PUA characters that got two-byte sequences still decode
+/// to the same non-PUA characters as in GB18030-2005. As a result, this encoding
+/// can represent all of Unicode except for 19 private-use characters.
 ///
 /// [Index visualization for the two-byte sequences](https://encoding.spec.whatwg.org/gb18030.html),
 /// [Visualization of BMP coverage of the two-byte index](https://encoding.spec.whatwg.org/gb18030-bmp.html)
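
Reviewer note: the doc comments above refer to "two-byte sequences" and "four-byte sequences" without showing how a byte sequence becomes a position in the index. The following is a minimal sketch of that pointer arithmetic, assuming the gb18030 decoder steps as written in the WHATWG Encoding Standard. It is not code from this commit or from the crate; the function names `two_byte_pointer` and `four_byte_pointer` are hypothetical, and the sketch stops at the pointer without looking anything up in the index tables.

```rust
/// Hypothetical helper: computes the index pointer for a gb18030 two-byte
/// sequence, following the arithmetic in the Encoding Standard's decoder.
/// Returns `None` when the bytes cannot form a two-byte sequence.
pub fn two_byte_pointer(lead: u8, trail: u8) -> Option<u16> {
    if !(0x81..=0xFE).contains(&lead) {
        return None;
    }
    // Trail bytes skip 0x7F, so the upper trail range uses a larger offset.
    let offset = match trail {
        0x40..=0x7E => 0x40,
        0x80..=0xFE => 0x41,
        _ => return None,
    };
    // Each lead byte covers 190 trail positions.
    Some((lead as u16 - 0x81) * 190 + (trail as u16 - offset))
}

/// Hypothetical helper: computes the ranges pointer for a gb18030 four-byte
/// sequence. Returns `None` when the bytes are outside the four-byte shape.
pub fn four_byte_pointer(b1: u8, b2: u8, b3: u8, b4: u8) -> Option<u32> {
    let is_lead = |b: u8| (0x81..=0xFE).contains(&b);
    let is_digit = |b: u8| (0x30..=0x39).contains(&b);
    if !(is_lead(b1) && is_digit(b2) && is_lead(b3) && is_digit(b4)) {
        return None;
    }
    Some(
        (b1 as u32 - 0x81) * (10 * 126 * 10)
            + (b2 as u32 - 0x30) * (10 * 126)
            + (b3 as u32 - 0x81) * 10
            + (b4 as u32 - 0x30),
    )
}
```

Under these assumptions, `two_byte_pointer(0xA3, 0xA0)` yields pointer 6555, which is the slot the doc comment says maps to U+3000 for Web compatibility. The sketch only covers well-formed input shapes; error handling and the index and range lookups that a real decoder performs afterwards are deliberately left out.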
