JSON: unsigned long are base64 encoded #72

ahamez · 2014-03-09T21:09:58Z

When serializing an unsigned long to JSON, cereal outputs some non-human readable data which seems to be in base64. Indeed, looking at the source code (https://github.com/USCiLab/cereal/blob/master/include/cereal/archives/json.hpp#L110), any type with a size >= sizeof(long long) is considered 'exotic' and thus base64-encoded.
Furthermore, the XML output is perfectly fine (by that, I mean readable).

I fail to see why longs are base64 encoded. Is this a desired thing? Also, it means that when compiling the code in 32bits or 64 bits, we have a different output. Finally, it makes unnecessarily difficult to read the JSON output in another language (Python in my case)

AzothAmmo · 2014-03-10T02:02:58Z

This isn't intended to catch anything that isn't as big as a long long (at least 64 bits) or larger that isn't caught by one of the other overloads. Are you saying that it is catching a long for you? What compiler/OS are you running if that is the case?

We can change this so that it serializes the same way as the XML archive, that is into a string based representation of the number instead of the base64 encoded version. rapidjson doesn't provide a native way to serialize numeric types larger than 64 bits, which is why we went with this originally. It might make more sense for us to switch since these are text based archives and supposed to be human readable. Also looking up the specification for long double I just realized that there's really no standard for it, so using a binary serialization would not be portable.

ahamez · 2014-03-10T05:33:03Z

Hello, (and sorry for the triple-post…)

The result is the same using clang (Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)) and gcc (gcc (MacPorts gcc48 4.8.2_0+universal) 4.8.2), on Mac OS X 10.9.2, using the following code.

#include <iostream>
#include <cereal/archives/json.hpp>

struct foo
{
  long x;

  template<typename Archive>
  void serialize(Archive& archive)
  {
    archive(x);
  }
};

int main()
{
  cereal::JSONOutputArchive archive(std::cout);
  archive(foo());
  return 0;
}

Result:

{
    "value0": {
        "value0": "AAAAAAAAAAA="
    }
}

Furthermore, when compiling with -m32, the compiler is unable to find the good overload, as it is ambiguous.

AzothAmmo · 2014-03-10T06:36:19Z

You are right about the overloads on 32 bit compilation. I haven't been able to reproduce the base64 encoding for a long on 64 bit Ubuntu using g++ 4.8, 4.7, or clang 3.3, but I'll keep looking into that.

long should definitely be being serialized as a base 10 number. What are your thoughts on the format for long long and long double? Do you have a preference over base 10 or base 64 for those? Also reading base64 in python should be quite easy (http://docs.python.org/2/library/base64.html).

DrAWolf · 2014-03-10T07:47:04Z

3 ways seem possible/sensible:

All numbers are always saved as base10
Set encoding per file
Set encoding per value
For me, 1. would be fine, if you have lots of time you could implement 2. or 3. in v2.0. The default always being 1.

ahamez · 2014-03-10T12:43:58Z

@AzothAmmo I made the test on a CentOS 6 and long types are serialized as base 10. I still have to find why the behavior is different on Mac OS X. Speaking of Python, I tried decoding base64 using the standard library, but I only obtain a string like b',\x00\x00\x00\x00\x00\x00\x00'. I still must find how to decode that.
EDIT: Here's how to decode in Python:

str = '/+MLVAIAAAA='
str2 = base64.b64decode(str);
i = int.from_bytes(str2, byteorder='little')

As you see, it's not very portable as we have to know the endianness.

@DrAWolf In the short term, I prefer the first solution, as it seems more natural when using the JSON file which is more or less 'human-readable'. But in a distant futur, it would be nice to be able to chose how to encode.

AzothAmmo · 2014-03-13T05:05:36Z

We have a commit (ba2ca7c) in that I think fixes the long being serialized incorrectly. If you can try out the develop branch, it should fix that portion of the issue. We haven't addressed the serializing long long or long double as something other than a base64 string yet. rapidjson's number parser would need to be updated to be able to handle these kinds of numbers if we want to directly write them as "numbers" in the JSON.

ahamez · 2014-03-13T06:25:12Z

It works for 32 bits, but not 64 bits. Comments at https://github.com/USCiLab/cereal/blob/develop/include/cereal/archives/json.hpp#L221 shows that it's indeed the case. Maybe there should be these overloads for 64 bits:

//! Saves a long on a 64 bit system
template <class T> inline
typename std::enable_if<std::is_same<T, long>::value && sizeof(T) == 8, void>::type
saveValue(T t){ saveValue( static_cast<std::int64_t>( t ) ); }
//! Saves an unsigned long on a 64 bit system
template <class T> inline
typename std::enable_if<std::is_same<T, unsigned long>::value && sizeof(T) == 8, void>::type
saveValue(T t){ saveValue( static_cast<std::uint64_t>( t ) ); }

The problem is that it causes an ambiguous call with the following overload ( https://github.com/USCiLab/cereal/blob/develop/include/cereal/archives/json.hpp#L231 ):

template<class T> inline
typename std::enable_if<std::is_arithmetic<T>::value &&
                          (sizeof(T) >= sizeof(long double) || sizeof(T) >= sizeof(long long)), void>::type
saveValue(T const & t)
{
    auto base64string = base64::encode( reinterpret_cast<const unsigned char *>( &t ), sizeof(T) );
    saveValue( base64string );
}

Modifying

(sizeof(T) >= sizeof(long double) || sizeof(T) >= sizeof(long long)), void>::type

to

(sizeof(T) > sizeof(long double) || sizeof(T) > sizeof(long long)), void>::type

removed the ambiguity, but I'm not sure if it's a good idea or not.

AzothAmmo · 2014-03-14T18:55:14Z

Can you post the output of this on your various OS/compiler mixes?

#include <iostream>
#include <type_traits>

int main()
{
  std::cout << std::boolalpha;

  std::cout << "int " << sizeof(int) << std::endl;
  std::cout << "uint " << sizeof(unsigned int) << std::endl;
  std::cout << "long " << sizeof(long) << std::endl;
  std::cout << "unsigned long " << sizeof(unsigned long) << std::endl;
  std::cout << "long long " << sizeof(long long) << std::endl;
  std::cout << "unsigned long long " << sizeof(unsigned long long) << std::endl;

  std::cout << "long int32 " << std::is_same<long, int32_t>::value << std::endl;
  std::cout << "long int64 " << std::is_same<long, int64_t>::value << std::endl;

  std::cout << "ulong uint32 " << std::is_same<unsigned long, uint32_t>::value << std::endl;
  std::cout << "ulong uint64 " << std::is_same<unsigned long, uint64_t>::value << std::endl;

  std::cout << "long long int32 " << std::is_same<long long, int32_t>::value << std::endl;
  std::cout << "long long int64 " << std::is_same<long long, int64_t>::value << std::endl;

  std::cout << "ulong long uint32 " << std::is_same<unsigned long long, uint32_t>::value << std::endl;
  std::cout << "ulong long uint64 " << std::is_same<unsigned long long, uint64_t>::value << std::endl;
}

ahamez · 2014-03-14T19:21:31Z

I used clang on OS X (results are the same with gcc) and gcc on CentOS. I only post results for 64 bits, as there are no differences between Linux and OS X in 32 bits.
Clang version on OS X:

$ clang --version
Apple LLVM version 5.1 (clang-503.0.38) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.1.0
Thread model: posix

GCC version on CentOS:

$ ~/usr/gcc48/bin/g++ --version
g++ (GCC) 4.8.1

clang/OS X:

int 4
uint 4
long 8
unsigned long 8
long long 8
unsigned long long 8
long int32 false
long int64 false
ulong uint32 false
ulong uint64 false
long long int32 false
long long int64 true
ulong long uint32 false
ulong long uint64 true

gcc/CentOS:

int 4
uint 4
long 8
unsigned long 8
long long 8
unsigned long long 8
long int32 false
long int64 true
ulong uint32 false
ulong uint64 true
long long int32 false
long long int64 false
ulong long uint32 false
ulong long uint64 false

longs should now properly serialize under 32 or 64 bit machines. long long, unsigned long long, and long double now serialize as base10 strings instead of base64. see issue #72

AzothAmmo · 2014-03-15T06:49:44Z

Ready to close this if the issue is gone for you. I think what I've committed should cover all the cases, but I don't have a Mac to test on. Now using base10 instead of base64.

ahamez · 2014-03-18T07:08:31Z

Sorry for the delay. Alas, there's still an ambiguous call (the error is similar with GCC 4.8):

$ clang++ -m64 -std=c++11 -I /Users/hal/Desktop/cereal-develop/include foo.cc 
In file included from foo.cc:2:
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:807:8: error: call to member function
      'saveValue' is ambiguous
    ar.saveValue( t );
    ~~~^~~~~~~~~
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:397:9: note: in instantiation of function template
      specialization 'cereal::save<long>' requested here
        save(*self, t);
        ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:322:15: note: in instantiation of function template
      specialization 'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::processImpl<long>' requested here
        self->processImpl( head );
              ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:238:15: note: in instantiation of function template
      specialization 'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::process<long &>' requested here
        self->process( std::forward<Types>( args )... );
              ^
foo.cc:11:5: note: in instantiation of function template specialization
      'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::operator()<long &>' requested here
    archive(x);
    ^
/Users/hal/Desktop/cereal-develop/include/cereal/access.hpp:228:11: note: in instantiation of function template
      specialization 'foo::serialize<cereal::JSONOutputArchive>' requested here
      { t.serialize(ar); }
          ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:364:17: note: in instantiation of function template
      specialization 'cereal::access::member_serialize<cereal::JSONOutputArchive, foo>' requested here
        access::member_serialize(*self, const_cast<T &>(t));
                ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:322:15: note: in instantiation of function template
      specialization 'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::processImpl<foo>' requested here
        self->processImpl( head );
              ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:238:15: note: in instantiation of function template
      specialization 'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::process<foo>' requested here
        self->process( std::forward<Types>( args )... );
              ^
foo.cc:18:10: note: in instantiation of function template specialization
      'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::operator()<foo>' requested here
  archive(foo());
         ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:272:7: note: candidate function
      [with T = long]
      saveValue( T t ){ saveLong( t ); }
      ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:287:7: note: candidate function
      [with T = long]
      saveValue(T const & t)
      ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:222:12: note: candidate function
      void saveValue(bool b)                { itsWriter.Bool_(b)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:224:12: note: candidate function
      void saveValue(int i)                 { itsWriter.Int(i)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:226:12: note: candidate function
      void saveValue(unsigned u)            { itsWriter.Uint(u)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:228:12: note: candidate function
      void saveValue(int64_t i64)           { itsWriter.Int64(i64)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:230:12: note: candidate function
      void saveValue(uint64_t u64)          { itsWriter.Uint64(u64)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:232:12: note: candidate function
      void saveValue(double d)              { itsWriter.Double(d)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:234:12: note: candidate function not viable:
      no known conversion from 'const long' to 'const std::string' (aka 'const basic_string<char,
      char_traits<char>, allocator<char> >') for 1st argument
      void saveValue(std::string const & s) { itsWriter.String(s.c_str(), static_cast<rapidjson::SizeType...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:236:12: note: candidate function not viable:
      no known conversion from 'const long' to 'const char *' for 1st argument
      void saveValue(char const * s)        { itsWriter.String(s)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:276:31: note: candidate template ignored:
      disabled by 'enable_if' [with T = long]
      typename std::enable_if<std::is_same<T, unsigned long>::value &&
                              ^
1 error generated.

AzothAmmo · 2014-03-18T19:38:10Z

Let me know how this one does.

ahamez · 2014-03-19T20:57:54Z

Everything works fine. Thank you!

This was referenced Mar 9, 2014

JSON and base64 encoded unsigned long #70

Closed

JSON: unsigned long are base64 encode #71

Closed

AzothAmmo added the discussion label Mar 10, 2014

AzothAmmo added the bug label Mar 10, 2014

AzothAmmo added this to the v1.0.0 milestone Mar 10, 2014

AzothAmmo added a commit that referenced this issue Mar 15, 2014

note on number format relates #72

d06542a

AzothAmmo added a commit that referenced this issue Mar 18, 2014

Another attempt at #72

b915326

AzothAmmo closed this as completed Mar 19, 2014

brdbrd mentioned this issue Apr 14, 2014

Problems with Json archive, clang/OSX and "long" datatype #94

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JSON: unsigned long are base64 encoded #72

JSON: unsigned long are base64 encoded #72

ahamez commented Mar 9, 2014

AzothAmmo commented Mar 10, 2014

ahamez commented Mar 10, 2014

AzothAmmo commented Mar 10, 2014

DrAWolf commented Mar 10, 2014

ahamez commented Mar 10, 2014

AzothAmmo commented Mar 13, 2014

ahamez commented Mar 13, 2014

AzothAmmo commented Mar 14, 2014

ahamez commented Mar 14, 2014

AzothAmmo commented Mar 15, 2014

ahamez commented Mar 18, 2014

AzothAmmo commented Mar 18, 2014

ahamez commented Mar 19, 2014