Dump JSON containing multibyte characters #1999

elh-zbeloki · 2020-03-19T14:20:11Z

I want to be able to dump JSON containing multibyte characters.

My std::string contains 3 characters (or glyphs) in UTF-8: Año.

In UTF-8, 'ñ' is represented as two bytes: \xC3 \xB1.

So, my string contains 4 bytes: [\x41, \xC3, \xB1, \x6F]
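To make the byte count concrete, here is a minimal sketch that lists the bytes of the string and counts its UTF-8 code points. The helper names `hex_bytes` and `count_code_points` are made up for this illustration and are not part of any library; the code assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte as two uppercase hex digits,
// separated by single spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        if (!out.empty()) {
            out += ' ';
        }
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}

// Hypothetical helper: count code points by skipping UTF-8 continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For the string built from the literal `"A\xC3\xB1o"`, `hex_bytes` gives `41 C3 B1 6F`, `size()` is 4, and `count_code_points` returns 3.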

The problem appears when I dump the JSON containing that string. The library seems to write each byte as a separate Latin-1 character: {"text": "AÃ±o"}

I expect the dump to contain the following string instead: {"text": "A\u00f1o"}

But I read in other issues of the library that it assumes all strings are UTF-8. Shouldn't this mean that it must be able to know that the "\xC3 \xB1" bytes represent a single character? Am I missing something here?

I usually code on Linux with Emacs and GCC, but for this project I need to use Visual Studio 2017.

elh-zbeloki · 2020-03-19T15:23:02Z

I solved the problem.

I discovered the ensure_ascii option in dump(). If I call dump with ensure_ascii=true, I get the desired output: "{"text": "A\u00f1o"}"
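As far as I know, ensure_ascii is the third parameter of dump(), after the indent width and the indent character, so the call would look like `j.dump(-1, ' ', true)`. With the default (false), the UTF-8 bytes are written through unchanged, which would plausibly show up as "AÃ±o" in a viewer that assumes Latin-1. The following is a rough, standard-library-only sketch of the kind of transformation ensure_ascii performs, not the library's actual implementation. The function name `escape_non_ascii` is hypothetical, and the code assumes valid UTF-8 and does not escape quotes or control characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical sketch: replace every non-ASCII code point in a UTF-8
// string with a \uXXXX escape (a surrogate pair above U+FFFF).
// Assumes valid UTF-8; quotes and control characters are left untouched.
std::string escape_non_ascii(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {  // plain ASCII: copy as-is
            out += static_cast<char>(lead);
            ++i;
            continue;
        }
        // The lead byte tells us how many continuation bytes follow
        // and carries the highest payload bits.
        int extra = 0;
        std::uint32_t cp = 0;
        if ((lead & 0xE0) == 0xC0) {
            extra = 1;
            cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;
            cp = lead & 0x0F;
        } else {
            extra = 3;
            cp = lead & 0x07;
        }
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (int k = 0; k < extra && i < utf8.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        char buf[16];
        if (cp <= 0xFFFF) {
            std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(cp));
        } else {
            // Code points beyond the BMP become a UTF-16 surrogate pair.
            cp -= 0x10000;
            std::snprintf(buf, sizeof buf, "\\u%04x\\u%04x",
                          static_cast<unsigned>(0xD800 + (cp >> 10)),
                          static_cast<unsigned>(0xDC00 + (cp & 0x3FF)));
        }
        out += buf;
    }
    return out;
}
```

Applied to the four bytes of "Año", the two bytes \xC3 \xB1 decode to the code point U+00F1, so the result is the seven ASCII characters `A\u00f1o`.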

Thank you.

elh-zbeloki added the kind: question label Mar 19, 2020

elh-zbeloki closed this as completed Mar 19, 2020
