vobject does not handle UTF-8 file with BOM #175
Comments
Ok, so we could just strip the BOM?
I don't think using the BOM these days is appropriate. Could you share the name of the application that produced this, and did you also submit a bug report there?
The UTF-8 specification RFC 3629 says
It is worth noting that it also says
While this would apply to vCards, since RFC 6350 requires them to always be UTF-8, there is no mention of whether the BOM is allowed or forbidden.
@evert But please read http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8 for yourself. I retraced my steps to find the guilty application, and as it turns out, it was not the Android application that added the BOM. 'My Contacts Backup' by OBSS Mobile appears to produce a plain, regular UTF-8 file with no BOM. But after I edited the resulting file in LibreOffice (4.3.1.2) and saved it as plain text, it gained a BOM. @staabm
@evert: +1, BOM is a design error…
@Hywan: You may be right. However, there is also: http://en.wikipedia.org/wiki/Robustness_principle
Since the vCard standard does not explicitly forbid it, the BOM should be ignored as per the UTF-8 RFC.
I did a bit of searching, and this is the only 'official' word I saw around the BOM in vCard 4: http://www.ietf.org/mail-archive/web/ietf-types/current/msg00958.html
RFC 3629 also states:
To be fair though, while vCard 4 specifically mentions that only UTF-8 is acceptable, it does not explicitly state that the BOM is forbidden. I'm a little bit on the fence with this one. I don't think BOMs are as widely supported as they were once intended to be, and I don't think it's a common expectation anymore to have to deal with them. Usually I just take the pragmatic route and simply do my best to support the formats that appear in the wild (within reason), but in this case we just have one person who manually opened a vCard in LibreOffice, which is kind of the epitome of 'edge case'. So I'm leaning towards not providing support for this, for two reasons:
As an aside, I am not a proponent of the robustness principle. I think that, from a server perspective, it's good to be strict and throw hard failures when clients mess up. If we don't, a client developer may assume that what they're doing is valid, and that other servers will also consider it valid. I think being strict in what you accept ultimately creates a more interoperable world.
I honestly did not think about the "validate and store file" case, only "validate and import records into database". In the validate-and-store case, I think I must agree with not adhering to the robustness principle. I would still appreciate it if you would recognize the BOM and complain about that specifically, rather than just rejecting the file and leaving the end user in the dark.
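To illustrate the "complain about the BOM specifically" idea, here is a minimal sketch in Python. It is not vobject code (the library under discussion is PHP); the `BomError` class and `check_no_bom` function are hypothetical names made up for this example. It only shows that a strict validator can still tell the user exactly why the input was rejected.

```python
import codecs


class BomError(ValueError):
    """Hypothetical error type: input starts with a UTF-8 byte order mark."""


def check_no_bom(data: bytes) -> bytes:
    """Reject input that begins with a UTF-8 BOM, with a specific message.

    codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    """
    if data.startswith(codecs.BOM_UTF8):
        raise BomError(
            "input starts with a UTF-8 byte order mark (EF BB BF); "
            "remove it and try again"
        )
    return data


# Usage: strict rejection, but with a message the end user can act on.
try:
    check_no_bom(codecs.BOM_UTF8 + b"BEGIN:VCARD\r\n")
except BomError as exc:
    print(f"rejected: {exc}")
```

The design choice here is that the input is still refused, in line with the strict stance above, but the failure names the BOM instead of reporting a generic parse error.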
Rethinking this. Again. But it is not for me to decide anyway. Good luck!
Apparently, notepad.exe also adds a BOM when saving a UTF-8 file.
I believe @afflux's quote from RFC 3629 applies here: It is therefore RECOMMENDED to avoid stripping an initial [...]. As far as we know, no vCard-accepting clients require a BOM to be present, and we know that some clients cannot accept a vCard with an initial BOM (https://code.google.com/p/android/issues/detail?id=10107). We also know that two widely used editors add a BOM when producing UTF-8 files. I'd say we are safely within the threshold for stripping the BOM, if vobject has the capability to edit files. I'll try really hard to leave this issue for now. :-)
On the other hand, it's not a big deal to skip or drop the BOM :-/.
I think I would like to close this issue for the moment. If this issue arises again because more people are running into this, I would certainly reconsider. In the meantime, you can indeed pretty simply skip the BOM. The vobject parsers accept streams for input, and you can just use fgets to skip the stream three bytes ahead (the UTF-8 BOM is the three-byte sequence EF BB BF).
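The skip-ahead workaround can be sketched as follows. This is an illustration in Python rather than PHP, and `skip_utf8_bom` is a hypothetical helper, not part of vobject. It assumes a seekable binary stream, peeks at the first three bytes, and rewinds if they are not a BOM, so that files without one are not damaged.

```python
import io

UTF8_BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF: three bytes


def skip_utf8_bom(stream):
    """Advance a seekable binary stream past a leading UTF-8 BOM, if any."""
    start = stream.tell()
    head = stream.read(len(UTF8_BOM))
    if head != UTF8_BOM:
        # No BOM (or fewer than three bytes available): rewind so that
        # nothing is lost before the stream is handed to a parser.
        stream.seek(start)
    return stream


# Usage: both streams yield the same bytes once the BOM has been skipped.
with_bom = skip_utf8_bom(io.BytesIO(UTF8_BOM + b"BEGIN:VCARD\r\n"))
without_bom = skip_utf8_bom(io.BytesIO(b"BEGIN:VCARD\r\n"))
print(with_bom.read() == without_bom.read())
```

Unconditionally skipping three bytes would corrupt input that has no BOM, which is why the sketch checks first.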
Googling 'utf-8 bom vcard' or 'utf-8 bom vcf' certainly gives the impression that this is an issue. Worse, Joe Sixpack most likely will have no clue whatsoever why it fails, and will leave no clues on the Internet that this was an issue. He'll either just give up or, more likely, fiddle with various tools until it 'magically' just works. But it is your call. I'll try to convince the ownCloud guys to look for and strip the BOM.
I could be persuaded otherwise if there was a fully functional and unit-tested patch in place =)
@evert so basically, this issue could be reopened again?
@evert Done ;-).
I have a UTF-8 vcf-file which vobject appears unable to handle. vobject will flat out not recognize the file. The problem appears to be the Initial Byte Order Mark. http://en.wikipedia.org/wiki/Byte_order_mark.
Removing the BOM enables vobject to parse the file.
Seeing how I found an Android utility which created this file, this is a real-world example.
I have not found anything expressly prohibiting the use of a BOM in RFC 6350, but I'll admit to being neither a skilled reader of RFCs nor a programmer. In any case, it does not appear to be a very hard thing to fix?
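As a rough indication of how small such a fix can be, here is a sketch in Python (not the PHP library in question; `decode_vcard_bytes` is a hypothetical name and the sample card is made up). Python's built-in `utf-8-sig` codec decodes UTF-8 and drops a single leading BOM if one is present, while the plain `utf-8` codec keeps it as the character U+FEFF.

```python
def decode_vcard_bytes(raw: bytes) -> str:
    """Decode UTF-8 input, dropping one leading BOM if present."""
    return raw.decode("utf-8-sig")


# A made-up vCard preceded by the three BOM bytes EF BB BF.
sample = (
    b"\xef\xbb\xbf"
    b"BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Jane Doe\r\nEND:VCARD\r\n"
)

plain = sample.decode("utf-8")       # BOM survives as U+FEFF
cleaned = decode_vcard_bytes(sample)  # BOM removed

print(plain.startswith("BEGIN:VCARD"))
print(cleaned.startswith("BEGIN:VCARD"))
```

With the BOM left in place, the text no longer starts with `BEGIN:VCARD`, which is consistent with a parser not recognizing the file at all.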