A Note on Encodings
Getting encodings right is always a pain, so this document serves as an overview of the (potentially) different encodings involved when using the library. If you use UTF-8 everywhere on your page and your server, you may safely ignore this.
The term "URL encoding" can lead to quite some confusion, because it is easily mistaken for a character encoding. URLs are usually URL encoded (percent-encoded). In most cases, this means replacing a questionable character with its byte value in the form `%xx`, such that a space (for instance) becomes `%20`.
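As a minimal illustration of that mechanism, the Python sketch below percent-encodes a byte string by hand. The helper name `percent_encode` is hypothetical and not part of the library; real code would typically use `urllib.parse.quote_from_bytes`. The sketch assumes the RFC 3986 "unreserved" characters are left alone and every other byte is written as `%` followed by two hex digits. Note that its input is bytes, not text.

```python
# RFC 3986 unreserved characters; every other byte gets escaped.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(data: bytes) -> str:
    """Hypothetical helper: percent-encode a byte string.

    Percent-encoding knows nothing about character encodings; it only
    maps each byte either to itself or to %XX.
    """
    parts = []
    for byte in data:  # iterating over bytes yields ints
        if byte in UNRESERVED:
            parts.append(chr(byte))
        else:
            parts.append(f"%{byte:02X}")
    return "".join(parts)


print(percent_encode(b"foo bar"))  # foo%20bar
```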
The question is: how are characters outside of the ASCII range encoded? The answer is that URL encoding doesn't care about that. URL encoding simply encodes bytes. How those bytes are interpreted after URL decoding is up to the server, and it could really be any encoding. Say we have a query parameter named `foobär` (note the umlaut). How this parameter appears in the URL depends on which character encoding the server uses for its URLs:
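- In ISO-8859-1, `ä` is encoded as `0xE4`, so we'd expect `foob%E4r`.
- In UTF-16 (big-endian), `ä` is encoded as `0x00E4`, so the umlaut becomes `%00%E4`. (Strictly speaking, every other letter also takes two bytes in UTF-16, so the whole name would be `%00f%00o%00o%00b%00%E4%00r`.)
- In UTF-8, `ä` is encoded as the two bytes `0xC3 0xA4`, so we'd expect `foob%C3%A4r`.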
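The three cases can be reproduced with Python's standard `urllib.parse.quote`, whose `encoding` argument selects the character encoding used to turn the string into bytes before percent-encoding. This is only an illustration. Python's plain `utf-16` codec would prepend a byte order mark, so the sketch uses `utf-16-be` instead.

```python
from urllib.parse import quote

name = "foobär"

# Same text, three character encodings, three different URLs.
encoded = {
    enc: quote(name, encoding=enc)
    for enc in ("iso-8859-1", "utf-16-be", "utf-8")
}

for enc, value in encoded.items():
    print(f"{enc:>10}: {value}")
# iso-8859-1: foob%E4r
#  utf-16-be: %00f%00o%00o%00b%00%E4%00r
#      utf-8: foob%C3%A4r
```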
So in the remainder of this article, when we refer to "URL encoding", we don't mean `%xx` encoding, but the character encoding by which the URL-decoded bytes are interpreted.
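The reverse direction shows why this character encoding matters. In the sketch below, Python's standard library is used purely for illustration: `unquote_to_bytes` undoes the `%xx` encoding and yields raw bytes, and only the subsequent `decode` call decides which characters those bytes represent.

```python
from urllib.parse import unquote_to_bytes

# After %xx decoding, the server only has a byte string.
raw = unquote_to_bytes("foob%E4r")  # b"foob\xe4r"

# Read as ISO-8859-1, the bytes spell the intended name.
as_latin1 = raw.decode("iso-8859-1")  # "foobär"

# Read as UTF-8, 0xE4 starts a multi-byte sequence that never completes,
# so it is replaced by U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")  # "foob\ufffdr"

# The opposite mismatch: UTF-8 bytes read as ISO-8859-1 give mojibake.
mojibake = unquote_to_bytes("foob%C3%A4r").decode("iso-8859-1")  # "foobÃ¤r"

print(as_latin1, as_utf8, mojibake)
```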
Since there are several different participants in the communication handled by the library (see Page vs. Client vs. Server), there are also (potentially) different encodings involved; after all, each participant could be configured differently. There are five primary places for strings in the communication, each of which could be encoded differently:
- the rendered page
- URLs to the Client
- URLs to the Server
- raw data (JSON) returned by the Server
- strings stored in `\FACTFinder\Data` objects within the library
Luckily, the FACT-Finder server consistently uses UTF-8. We decided to keep all data within the library stored in UTF-8 as well (because UTF-8 everywhere!). However, we have no control over the encoding you render your shop pages in, or over the encoding your shop expects in the URLs pointing to it (the Client URLs). Both can be configured in the library's XML configuration:
```xml
<encoding>
    <pageContent>UTF-8</pageContent>
    <clientUrl>ISO-8859-1</clientUrl>
</encoding>
```
What will the library do with these configuration values? All URLs intended for the Client (e.g. the URL stored in an `Item` object) will be converted to the configured `clientUrl` encoding before `%xx` URL encoding is applied. Note that the fully URL-encoded strings are then still in UTF-8 (which means that each resulting `%` is represented by the single byte `0x25`). The library itself never applies the `pageContent` encoding. However, you can get your hands on an instance of an `EncodingConverter`, which provides the method `encodeContentForPage`. Use this method to prepare all strings from the library when rendering your shop page.
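To tie the two settings together, here is a hedged Python sketch of what they amount to. `build_client_query`, `encode_content_for_page` and the two constants are hypothetical stand-ins, not the library's actual API or implementation. In Python, `str` is already decoded Unicode, so the library's internal UTF-8 step is implicit here. Rendering characters the page encoding cannot represent as HTML numeric character references (`errors="xmlcharrefreplace"`) is also an assumption made for this sketch.

```python
from urllib.parse import urlencode

# Hypothetical stand-ins for the two <encoding> settings.
CLIENT_URL_ENCODING = "iso-8859-1"
PAGE_CONTENT_ENCODING = "iso-8859-1"


def build_client_query(params: dict[str, str]) -> str:
    """Sketch of the clientUrl step: characters are converted to the
    configured encoding first, then the bytes are %xx-encoded."""
    return urlencode(params, encoding=CLIENT_URL_ENCODING)


def encode_content_for_page(text: str) -> bytes:
    """Sketch of an encodeContentForPage-style conversion.

    Assumption: characters the page encoding cannot represent are
    written as HTML numeric character references.
    """
    return text.encode(PAGE_CONTENT_ENCODING, errors="xmlcharrefreplace")


query = build_client_query({"query": "foobär"})
print(query)  # query=foob%E4r

# The finished query string is pure ASCII, so its UTF-8 and ASCII bytes
# are identical; in particular, "%" is the single byte 0x25.
assert query.encode("utf-8") == query.encode("ascii")

print(encode_content_for_page("foobär €"))  # b'foob\xe4r &#8364;'
```

The `assert` checks the point made above: once `%xx` encoding has been applied, the URL contains only ASCII, so it reads the same whether you treat it as ASCII or UTF-8.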