Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON: unsigned long are base64 encoded #72

Closed
ahamez opened this issue Mar 9, 2014 · 13 comments
Closed

JSON: unsigned long are base64 encoded #72

ahamez opened this issue Mar 9, 2014 · 13 comments

Comments

@ahamez
Copy link

ahamez commented Mar 9, 2014

When serializing an unsigned long to JSON, cereal outputs some non-human readable data which seems to be in base64. Indeed, looking at the source code (https://github.com/USCiLab/cereal/blob/master/include/cereal/archives/json.hpp#L110), any type with a size >= sizeof(long long) is considered 'exotic' and thus base64-encoded.
Furthermore, the XML output is perfectly fine (by that, I mean readable).

I fail to see why longs are base64 encoded. Is this a desired thing? Also, it means that when compiling the code in 32bits or 64 bits, we have a different output. Finally, it makes unnecessarily difficult to read the JSON output in another language (Python in my case)

@AzothAmmo
Copy link
Contributor

This isn't intended to catch anything that isn't as big as a long long (at least 64 bits) or larger that isn't caught by one of the other overloads. Are you saying that it is catching a long for you? What compiler/OS are you running if that is the case?

We can change this so that it serializes the same way as the XML archive, that is into a string based representation of the number instead of the base64 encoded version. rapidjson doesn't provide a native way to serialize numeric types larger than 64 bits, which is why we went with this originally. It might make more sense for us to switch since these are text based archives and supposed to be human readable. Also looking up the specification for long double I just realized that there's really no standard for it, so using a binary serialization would not be portable.

@ahamez
Copy link
Author

ahamez commented Mar 10, 2014

Hello, (and sorry for the triple-post…)

The result is the same using clang (Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)) and gcc (gcc (MacPorts gcc48 4.8.2_0+universal) 4.8.2), on Mac OS X 10.9.2, using the following code.

#include <iostream>
#include <cereal/archives/json.hpp>

struct foo
{
  long x;

  template<typename Archive>
  void serialize(Archive& archive)
  {
    archive(x);
  }
};

int main()
{
  cereal::JSONOutputArchive archive(std::cout);
  archive(foo());
  return 0;
}

Result:

{
    "value0": {
        "value0": "AAAAAAAAAAA="
    }
}

Furthermore, when compiling with -m32, the compiler is unable to find the good overload, as it is ambiguous.

@AzothAmmo
Copy link
Contributor

You are right about the overloads on 32 bit compilation. I haven't been able to reproduce the base64 encoding for a long on 64 bit Ubuntu using g++ 4.8, 4.7, or clang 3.3, but I'll keep looking into that.

long should definitely be being serialized as a base 10 number. What are your thoughts on the format for long long and long double? Do you have a preference over base 10 or base 64 for those? Also reading base64 in python should be quite easy (http://docs.python.org/2/library/base64.html).

@AzothAmmo AzothAmmo added the bug label Mar 10, 2014
@AzothAmmo AzothAmmo added this to the v1.0.0 milestone Mar 10, 2014
@DrAWolf
Copy link

DrAWolf commented Mar 10, 2014

3 ways seem possible/sensible:

  1. All numbers are always saved as base10
  2. Set encoding per file
  3. Set encoding per value
    For me, 1. would be fine, if you have lots of time you could implement 2. or 3. in v2.0. The default always being 1.

@ahamez
Copy link
Author

ahamez commented Mar 10, 2014

@AzothAmmo I made the test on a CentOS 6 and long types are serialized as base 10. I still have to find why the behavior is different on Mac OS X. Speaking of Python, I tried decoding base64 using the standard library, but I only obtain a string like b',\x00\x00\x00\x00\x00\x00\x00'. I still must find how to decode that.
EDIT: Here's how to decode in Python:

str = '/+MLVAIAAAA='
str2 = base64.b64decode(str);
i = int.from_bytes(str2, byteorder='little')

As you see, it's not very portable as we have to know the endianness.

@DrAWolf In the short term, I prefer the first solution, as it seems more natural when using the JSON file which is more or less 'human-readable'. But in a distant futur, it would be nice to be able to chose how to encode.

@AzothAmmo
Copy link
Contributor

We have a commit (ba2ca7c) in that I think fixes the long being serialized incorrectly. If you can try out the develop branch, it should fix that portion of the issue. We haven't addressed the serializing long long or long double as something other than a base64 string yet. rapidjson's number parser would need to be updated to be able to handle these kinds of numbers if we want to directly write them as "numbers" in the JSON.

@ahamez
Copy link
Author

ahamez commented Mar 13, 2014

It works for 32 bits, but not 64 bits. Comments at https://github.com/USCiLab/cereal/blob/develop/include/cereal/archives/json.hpp#L221 shows that it's indeed the case. Maybe there should be these overloads for 64 bits:

//! Saves a long on a 64 bit system
template <class T> inline
typename std::enable_if<std::is_same<T, long>::value && sizeof(T) == 8, void>::type
saveValue(T t){ saveValue( static_cast<std::int64_t>( t ) ); }
//! Saves an unsigned long on a 64 bit system
template <class T> inline
typename std::enable_if<std::is_same<T, unsigned long>::value && sizeof(T) == 8, void>::type
saveValue(T t){ saveValue( static_cast<std::uint64_t>( t ) ); }

The problem is that it causes an ambiguous call with the following overload ( https://github.com/USCiLab/cereal/blob/develop/include/cereal/archives/json.hpp#L231 ):

template<class T> inline
typename std::enable_if<std::is_arithmetic<T>::value &&
                          (sizeof(T) >= sizeof(long double) || sizeof(T) >= sizeof(long long)), void>::type
saveValue(T const & t)
{
    auto base64string = base64::encode( reinterpret_cast<const unsigned char *>( &t ), sizeof(T) );
    saveValue( base64string );
}

Modifying

(sizeof(T) >= sizeof(long double) || sizeof(T) >= sizeof(long long)), void>::type

to

(sizeof(T) > sizeof(long double) || sizeof(T) > sizeof(long long)), void>::type

removed the ambiguity, but I'm not sure if it's a good idea or not.

@AzothAmmo
Copy link
Contributor

Can you post the output of this on your various OS/compiler mixes?

#include <iostream>
#include <type_traits>

int main()
{
  std::cout << std::boolalpha;

  std::cout << "int " << sizeof(int) << std::endl;
  std::cout << "uint " << sizeof(unsigned int) << std::endl;
  std::cout << "long " << sizeof(long) << std::endl;
  std::cout << "unsigned long " << sizeof(unsigned long) << std::endl;
  std::cout << "long long " << sizeof(long long) << std::endl;
  std::cout << "unsigned long long " << sizeof(unsigned long long) << std::endl;

  std::cout << "long int32 " << std::is_same<long, int32_t>::value << std::endl;
  std::cout << "long int64 " << std::is_same<long, int64_t>::value << std::endl;

  std::cout << "ulong uint32 " << std::is_same<unsigned long, uint32_t>::value << std::endl;
  std::cout << "ulong uint64 " << std::is_same<unsigned long, uint64_t>::value << std::endl;

  std::cout << "long long int32 " << std::is_same<long long, int32_t>::value << std::endl;
  std::cout << "long long int64 " << std::is_same<long long, int64_t>::value << std::endl;

  std::cout << "ulong long uint32 " << std::is_same<unsigned long long, uint32_t>::value << std::endl;
  std::cout << "ulong long uint64 " << std::is_same<unsigned long long, uint64_t>::value << std::endl;
}

@ahamez
Copy link
Author

ahamez commented Mar 14, 2014

I used clang on OS X (results are the same with gcc) and gcc on CentOS. I only post results for 64 bits, as there are no differences between Linux and OS X in 32 bits.
Clang version on OS X:

$ clang --version
Apple LLVM version 5.1 (clang-503.0.38) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.1.0
Thread model: posix

GCC version on CentOS:

$ ~/usr/gcc48/bin/g++ --version
g++ (GCC) 4.8.1

clang/OS X:

int 4
uint 4
long 8
unsigned long 8
long long 8
unsigned long long 8
long int32 false
long int64 false
ulong uint32 false
ulong uint64 false
long long int32 false
long long int64 true
ulong long uint32 false
ulong long uint64 true

gcc/CentOS:

int 4
uint 4
long 8
unsigned long 8
long long 8
unsigned long long 8
long int32 false
long int64 true
ulong uint32 false
ulong uint64 true
long long int32 false
long long int64 false
ulong long uint32 false
ulong long uint64 false

AzothAmmo added a commit that referenced this issue Mar 15, 2014
longs should now properly serialize under 32 or 64 bit machines.

long long, unsigned long long, and long double now serialize as base10
strings instead of base64.

see issue #72
AzothAmmo added a commit that referenced this issue Mar 15, 2014
@AzothAmmo
Copy link
Contributor

Ready to close this if the issue is gone for you. I think what I've committed should cover all the cases, but I don't have a Mac to test on. Now using base10 instead of base64.

@ahamez
Copy link
Author

ahamez commented Mar 18, 2014

Sorry for the delay. Alas, there's still an ambiguous call (the error is similar with GCC 4.8):

$ clang++ -m64 -std=c++11 -I /Users/hal/Desktop/cereal-develop/include foo.cc 
In file included from foo.cc:2:
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:807:8: error: call to member function
      'saveValue' is ambiguous
    ar.saveValue( t );
    ~~~^~~~~~~~~
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:397:9: note: in instantiation of function template
      specialization 'cereal::save<long>' requested here
        save(*self, t);
        ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:322:15: note: in instantiation of function template
      specialization 'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::processImpl<long>' requested here
        self->processImpl( head );
              ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:238:15: note: in instantiation of function template
      specialization 'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::process<long &>' requested here
        self->process( std::forward<Types>( args )... );
              ^
foo.cc:11:5: note: in instantiation of function template specialization
      'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::operator()<long &>' requested here
    archive(x);
    ^
/Users/hal/Desktop/cereal-develop/include/cereal/access.hpp:228:11: note: in instantiation of function template
      specialization 'foo::serialize<cereal::JSONOutputArchive>' requested here
      { t.serialize(ar); }
          ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:364:17: note: in instantiation of function template
      specialization 'cereal::access::member_serialize<cereal::JSONOutputArchive, foo>' requested here
        access::member_serialize(*self, const_cast<T &>(t));
                ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:322:15: note: in instantiation of function template
      specialization 'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::processImpl<foo>' requested here
        self->processImpl( head );
              ^
/Users/hal/Desktop/cereal-develop/include/cereal/cereal.hpp:238:15: note: in instantiation of function template
      specialization 'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::process<foo>' requested here
        self->process( std::forward<Types>( args )... );
              ^
foo.cc:18:10: note: in instantiation of function template specialization
      'cereal::OutputArchive<cereal::JSONOutputArchive, 0>::operator()<foo>' requested here
  archive(foo());
         ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:272:7: note: candidate function
      [with T = long]
      saveValue( T t ){ saveLong( t ); }
      ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:287:7: note: candidate function
      [with T = long]
      saveValue(T const & t)
      ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:222:12: note: candidate function
      void saveValue(bool b)                { itsWriter.Bool_(b)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:224:12: note: candidate function
      void saveValue(int i)                 { itsWriter.Int(i)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:226:12: note: candidate function
      void saveValue(unsigned u)            { itsWriter.Uint(u)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:228:12: note: candidate function
      void saveValue(int64_t i64)           { itsWriter.Int64(i64)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:230:12: note: candidate function
      void saveValue(uint64_t u64)          { itsWriter.Uint64(u64)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:232:12: note: candidate function
      void saveValue(double d)              { itsWriter.Double(d)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:234:12: note: candidate function not viable:
      no known conversion from 'const long' to 'const std::string' (aka 'const basic_string<char,
      char_traits<char>, allocator<char> >') for 1st argument
      void saveValue(std::string const & s) { itsWriter.String(s.c_str(), static_cast<rapidjson::SizeType...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:236:12: note: candidate function not viable:
      no known conversion from 'const long' to 'const char *' for 1st argument
      void saveValue(char const * s)        { itsWriter.String(s)...
           ^
/Users/hal/Desktop/cereal-develop/include/cereal/archives/json.hpp:276:31: note: candidate template ignored:
      disabled by 'enable_if' [with T = long]
      typename std::enable_if<std::is_same<T, unsigned long>::value &&
                              ^
1 error generated.

AzothAmmo added a commit that referenced this issue Mar 18, 2014
@AzothAmmo
Copy link
Contributor

Let me know how this one does.

@ahamez
Copy link
Author

ahamez commented Mar 19, 2014

Everything works fine. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants