Is the first example (an octet 0x7F is percent encoded as "%23") correct? #545
Comments
I agree, 0x7F should not percent-encode to %23. That looks like an error that should be fixed.
Octet/Byte

The URL Standard states in section 4.3: "A byte is a sequence of eight bits..." So it redefines the meaning of 'byte' to be an octet.

Other standards are more aware of (and picky about) the difference. For example, the IETF's RFCs avoid 'byte', and the MIME type application/octet-stream (RFC 1341) is deliberately not a byte-stream.

My remark is not about confusion, but about consistent wording across standards. As a minimum concession, one could rephrase the above definition to something like: "Following what is by now widespread usage, a byte in this standard is always used as a synonym of an octet, a sequence of eight bits,..."

I am not going to pursue this issue beyond this post. I just noticed the slightly improper usage of 'byte' in this standard, because I am used to extremely precise wording in documents of this kind.

Wolf Lammen
Interesting, 0x23 was suggested at #503 (comment), so I suppose I did some copypasta. I can create a PR.

It's not the URL Standard that defines byte, but the Infra Standard (in that section indeed), and all WHATWG standards (and quite a few W3C standards) are consistent about using that. We chose that because the difference doesn't come up in practice and it's the term everyone is already familiar with. I wouldn't say it's not precise, though, as it has a rather exact definition that is linked from where it's used.
Thanks for reporting this, Wolf!
Hi,
This is my first post to this group, so I apologize for any violation of rules I might have committed.
I refer to the first example in the examples block at the end of section 1.3 Percent-encoded bytes in the URL standard, that reads
Percent-encode input | 0x7F | "%23"
I don't understand this example and think it is incorrect. If I follow the links and do what they suggest, I come up with "%7F" instead.
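For reference, the check can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard's rule that percent-encoding a byte yields "%" followed by two ASCII upper hex digits representing that byte; the function name `percent_encode_byte` is hypothetical and not taken from the standard.

```python
def percent_encode_byte(byte: int) -> str:
    """Percent-encode a single byte (octet).

    Hypothetical helper: returns "%" followed by two uppercase
    hexadecimal digits representing the byte's value.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte must be in the range 0x00 to 0xFF")
    # 02X pads to two digits and uses uppercase hex letters.
    return f"%{byte:02X}"


if __name__ == "__main__":
    print(percent_encode_byte(0x7F))  # the input from the example
    print(percent_encode_byte(0x23))  # the byte whose encoding the example shows
```

Under that rule, the output "%23" corresponds to the input 0x23, not 0x7F.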
If, for some reason, this particular octet encoding receives special handling, I think this should be pointed out more clearly.
By the way, what you call a byte is in fact an octet. In its original meaning, a byte denotes the smallest addressable unit in memory, which can consist of more or fewer than 8 bits. I concede that this meaning has been out of use for decades, and most people now identify a byte with an octet, the correct name for an 8-bit unit. Maybe one should hint at this alternative somewhere?
Wolf Lammen