Internal Error: invalid UTF-8 byte sequence found during decoding - on ü #1047
Comments
See Stack Overflow.
Can someone give me an executive summary of this, so I don't have to do too much research? Is Google serving up invalid UTF-8 (hence Google's problem, IMO) or is it valid (and thus our problem)?
There is another SO page here: https://stackoverflow.com/questions/47108274/read-https-google-com-doesnt-work-anymore-in-red I found both of them earlier by chance when looking for this error. Both claim it is a problem with Google's UTF-8 encoding. I don't know enough about UTF-8 to check myself. But if it were a problem on Google's side, why are there no complaints from people using Python, etc.? It seems only Rebol/R3-Renc/RED have this problem. But the fix works, so I didn't investigate further. ¯\_(ツ)_/¯
Btw I do get |
Ok, I have found the problem:
...snip...
There is this SO commentary regarding the
So Google is serving ISO-8859-1 even though the HTML says it is UTF-8.
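To make the mismatch concrete, here is a minimal Python sketch (not code from this project) of why a strict UTF-8 decoder rejects the ü in "Glück" when the bytes are really ISO-8859-1. The helper name `try_decode` is made up for illustration.

```python
# "Gl\u00fcck" is "Glück"; the escape keeps this file ASCII-only.
TEXT = "Gl\u00fcck"

# In ISO-8859-1 the ü is the single byte 0xFC.
latin1_bytes = TEXT.encode("iso-8859-1")

# In UTF-8 the same ü takes two bytes: 0xC3 0xBC.
utf8_bytes = TEXT.encode("utf-8")


def try_decode(raw: bytes, encoding: str) -> str:
    """Strictly decode raw bytes; describe the failure instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        return f"error: {err.reason} at byte offset {err.start}"


# A lone 0xFC is not a valid UTF-8 sequence, so strict decoding fails.
result_strict = try_decode(latin1_bytes, "utf-8")

# Decoding with the encoding that was actually used recovers the text.
result_latin1 = try_decode(latin1_bytes, "iso-8859-1")

print(latin1_bytes, utf8_bytes)
print(result_strict)
print(result_latin1)
```

Python's `bytes.decode` is strict by default as well, so Python is not immune to this. The sketch only shows what the byte mismatch looks like.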
Well, good to know. :-/ Thanks for digging into it. I've said that there needs to be a clear organization of the meaning of things like READ vs. LOAD, and how it all works. This is yet another piece of evidence that READ needs to stay in the world of bytes. LOAD then needs to be able to automatically sense content types and give you what you want, or give you an error if you do not have a codec for it. Going to have to put some thought into this; one piece of good news is that by being in the browser, we can experiment through the lens of something where all the network basics are taken care of for us. That design could then be reused on the desktop, based on what we learn.
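One way to picture that split, purely as a hedged sketch in Python and not a proposal for the actual READ/LOAD design: the "read" side hands back raw bytes untouched, and a separate "load" step picks a decoder from the declared content type and errors out if it has no codec. The names `declared_charset` and `load_text` are hypothetical, as is the assumption that the charset comes from a `Content-Type` string.

```python
from email.message import Message


def declared_charset(content_type: str, default: str = "utf-8") -> str:
    """Extract the charset parameter from a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset() or default


def load_text(raw: bytes, content_type: str) -> str:
    """Decode bytes already fetched by a byte-level read (hypothetical helper).

    Raises ValueError if no codec exists for the declared charset, and
    UnicodeDecodeError if the bytes do not match the declared charset.
    """
    charset = declared_charset(content_type)
    try:
        return raw.decode(charset)
    except LookupError:
        raise ValueError(f"no codec for charset {charset!r}") from None


# The byte-level read result stays as bytes; decoding is a separate step.
raw = b"Gl\xfcck"
print(load_text(raw, "text/html; charset=ISO-8859-1"))
```

With this shape, a mislabeled response surfaces as a decoding error at the load step, while the caller still has the original bytes to inspect or retry with another codec.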
As an aside @IngoHohmann - the nature of text and binary is now such that they can be aliased between each other with AS. This does not make a copy, while TO does. So above, you are copying a chunk out of a binary, then making another copy in order to do the TO. You could build a single disconnected copy from the binary with as text! copy/part x 1. After AS is used to alias a BINARY! as a TEXT!, however, that binary is constrained so that every modification must keep it valid UTF-8. In this case that's obviously not a problem for you, since you didn't store the copy anywhere else and hence can't access it as a binary (unless you alias it back). But even if you alias it back, the data has still been aliased as TEXT!, so that binary would also have the constraint.
If opened in the Firefox view source window the text is: Glück