UTF-8 and new Documents via Client API #18

jackeywall · 2015-05-08T13:32:01Z

I'm using the Java client API 1.0 in my project to insert (POST) documents into DocumentDB. Many of the DocumentDB JSON documents include text in languages with diacritic marks.

The POJO-to-JSON conversion is performed with FasterXML Jackson. This code has been in production for about 18 months, writing JSON to Mongo, Elasticsearch, and Postgres (JSON and now JSONB), and the UTF-8 encoding of the JSON has never been an issue with those data stores. However, the same JSON written to DocumentDB loses its encoding; e.g.,

ou vítimas de maus tratos da região

becomes:

ou v�timas de maus tratos da regi�o
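The thread doesn't say what the root cause was. As an assumption only, one common mechanism reproduces this exact output: text is encoded with a single-byte charset (ISO-8859-1 here, or a non-UTF-8 platform default) and then decoded as UTF-8. The class and method names below are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Hypothetical mismatch: encode with ISO-8859-1 (as a client relying on a
    // non-UTF-8 default might), then decode as UTF-8 (as a server would).
    // "í" becomes the lone byte 0xED and "ã" becomes 0xE3; neither is valid
    // UTF-8 when followed by an ASCII byte, so each becomes U+FFFD.
    static String mismatchedRoundTrip(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        // new String(byte[], Charset) substitutes U+FFFD for malformed input.
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    // The consistent path: encode and decode with the same charset (UTF-8).
    static String utf8RoundTrip(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of javac's encoding.
        String original = "ou v\u00edtimas de maus tratos da regi\u00e3o";
        System.out.println(utf8RoundTrip(original));
        System.out.println(mismatchedRoundTrip(original));
    }
}
```

The second line printed matches the corrupted text in the report, one replacement character per accented letter.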

Updating the document from the Azure Portal with the accented text works, so the issue appears to be in the client code. I've checked the byte[] for valid UTF-8 before passing the JSON to the client API for document creation, and it is fine; again, the same JSON goes into the other data stores without issues.
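A strict check of the kind described above can be sketched with the JDK alone. The helper name `isValidUtf8` is hypothetical; the point is to use a `CharsetDecoder` set to `REPORT`, because `new String(bytes, UTF_8)` silently replaces bad bytes and would hide the problem.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if every byte sequence is well-formed UTF-8.
    // REPORT makes the decoder throw on the first malformed sequence
    // instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "ou v\u00edtimas de maus tratos da regi\u00e3o";
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(isValidUtf8(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the bytes pass this check but the stored document is still garbled, the corruption is happening after the bytes are handed over, which is consistent with the conclusion that the problem lies in the client.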

Dependencies are managed via Ivy. That isn't relevant to the issue; I'm just not keen on managing the Java DocumentDB source directly and much prefer referencing the library.

I welcome ideas and suggestions - thank you in advance ...

Jack

shipunyc · 2015-05-12T20:07:31Z

Fixed. I will push a new release today.

shipunyc closed this as completed May 12, 2015
