The definition of SciSerialize, a serialization format based on JSON and MessagePack.

# Definition of SciSerialize

SciSerialize is a serialization format for scientific data. It builds on JSON and adds a simple type-coding system: every supported data type is converted to valid JSON types, so it can be stored in both the JSON and MessagePack formats. The goal is a format that works across platforms and programming languages. This definition specifies the output representation of every data type that a serializer must support, whatever language it is implemented in. Each implementation is free to decide how a type is represented inside its own language, but it is recommended to keep that representation simple and similar to the reference implementation. Serializers may be extended with language-specific types and representations; these extensions should be switched off by default so that serializers in different languages stay compatible when used with their defaults. A reference implementation of SciSerialize is available in Python.

## Formats

All data types supported by SciSerialize are reduced to the data types supported by JSON:

| Type | Example | Comment |
|---|---|---|
| Number | 123.987 | double-precision floating-point |
| String | "Hello" | double-quoted UTF-8, escaped with backslashes |
| Boolean | true/false | |
| Array | [1.0, 2.0, "a", "b"] | like lists in Python or cell arrays in MATLAB |
| Object | {"keyString": "value", "Object containing Numbers array": [1.1, 1.2, 1.3]} | key-value stores, like JavaScript objects |
| Null | null | empty values |

Binary data is handled differently in JSON and MessagePack: in JSON it is stored as a base64 string, while in MessagePack the raw binary data is packed directly.
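As a minimal sketch of the JSON side, the snippet below wraps raw bytes in an object with a `__base64__` key, which is the shape that appears in the ndarray example later in this document. The helper names `encode_bytes_for_json` and `decode_bytes_from_json` are hypothetical and not part of the reference implementation.

```python
import base64
import json


def encode_bytes_for_json(raw: bytes) -> dict:
    """Wrap raw bytes in a JSON-safe object holding a base64 string."""
    return {"__base64__": base64.b64encode(raw).decode("ascii")}


def decode_bytes_from_json(obj: dict) -> bytes:
    """Recover the raw bytes from the wrapper object."""
    return base64.b64decode(obj["__base64__"])


raw = b"\x00\x01\xfe\xff"
wrapped = encode_bytes_for_json(raw)
text = json.dumps(wrapped)          # plain JSON text, safe to store or send
restored = decode_bytes_from_json(json.loads(text))
```

For MessagePack no such wrapper is needed, because the format has a native binary type.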

## Data Types

### Date Time ISO String Coder

Serialized representation:

{
    "__type__": "datetime",
    "isostr": "2015-02-18T21:36:32.528617"
}

Example from reference implementation:

In [1]: import sciserialize
In [2]: import datetime
In [3]: s = sciserialize.dumps(datetime.datetime.now())
In [4]: s
Out[4]: '{"isostr": "2015-02-18T21:40:23.511717", "__type__": "datetime"}'
In [5]: t = sciserialize.loads(s)
In [6]: t
Out[6]: datetime.datetime(2015, 2, 18, 21, 40, 23, 511717)
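The representation above can be produced with the standard library alone. The following is a sketch under assumptions, not the reference implementation: the function names `encode_datetime` and `decode_datetime` are made up for illustration, and only naive datetimes (without time zone) are considered, since this definition does not say how time zones should be handled.

```python
import datetime
import json


def encode_datetime(dt: datetime.datetime) -> dict:
    """Turn a datetime into the tagged JSON object shown above."""
    return {"__type__": "datetime", "isostr": dt.isoformat()}


def decode_datetime(obj: dict) -> datetime.datetime:
    """Rebuild the datetime from its ISO 8601 string."""
    return datetime.datetime.fromisoformat(obj["isostr"])


original = datetime.datetime(2015, 2, 18, 21, 36, 32, 528617)
text = json.dumps(encode_datetime(original))
restored = decode_datetime(json.loads(text))
```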

### Time Delta Coder

Serialized representation:

JSON:
{
    "__type__": "timedelta",
    "days": 0,
    "seconds": 11,
    "microsec": 626512
}

MessagePack:
b'\x84\xa8microsec\xce\x00\t\x8fP\xa7seconds\x0b\xa8__type__\xa9timedelta\xa4days\x00'

Example from reference implementation:

In [11]: t1 = datetime.datetime.now()
In [12]: t2 = datetime.datetime.now()
In [13]: td = t2 - t1
In [14]: td
Out[14]: datetime.timedelta(0, 11, 626512)
In [15]: s = sciserialize.dumps(td)
In [16]: s
Out[16]: '{"microsec": 626512, "seconds": 11, "__type__": "timedelta", "days": 0}'
In [17]: decoded = sciserialize.loads(s)
In [18]: decoded
Out[18]: datetime.timedelta(0, 11, 626512)
In [19]: b = sciserialize.packb(td)
In [20]: b
Out[20]: b'\x84\xa8microsec\xce\x00\t\x8fP\xa7seconds\x0b\xa8__type__\xa9timedelta\xa4days\x00'
In [21]: import sys
In [22]: sys.getsizeof(b)
Out[22]: 82
In [23]: sys.getsizeof(s)
Out[23]: 120
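A time delta coder can be sketched in a few lines with the standard library. The function names below are hypothetical; the field names `days`, `seconds` and `microsec` follow the representation above. Python normalizes a `timedelta` so that `seconds` and `microseconds` are never negative, which means negative durations survive the round trip as well.

```python
import datetime
import json


def encode_timedelta(td: datetime.timedelta) -> dict:
    """Split a timedelta into its normalized integer components."""
    return {
        "__type__": "timedelta",
        "days": td.days,
        "seconds": td.seconds,
        "microsec": td.microseconds,
    }


def decode_timedelta(obj: dict) -> datetime.timedelta:
    """Rebuild the timedelta from its three integer components."""
    return datetime.timedelta(
        days=obj["days"],
        seconds=obj["seconds"],
        microseconds=obj["microsec"],
    )


td = datetime.timedelta(0, 11, 626512)
restored = decode_timedelta(json.loads(json.dumps(encode_timedelta(td))))

# A negative duration is stored as days=-1, seconds=86399, microsec=0.
negative = datetime.timedelta(seconds=-1)
restored_negative = decode_timedelta(encode_timedelta(negative))
```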

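### N-dimensional Array Coder

Serialized representation (JSON; in MessagePack the "bytes" field holds the raw binary data instead of a base64 object):

{
    "__type__": "ndarray",
    "shape": [3, 4, 5],
    "dtype": "float64",
    "bytes": {"__base64__": "<base64-encoded array buffer>"}
}

Example from reference implementation: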

In [24]: from numpy.random import randn
In [25]: x = randn(3, 4, 5)
In [26]: x
Out[26]:
array([[[ 1.05344858,  0.0291041 ,  0.38289843, -0.98107994,  0.3139919 ],
        [ 0.73767174,  2.12341246, -2.04463471,  0.32919843,  0.67947691],
        [-1.96649368, -1.81407016, -0.95784464, -1.68542805,  0.87043148],
        [ 0.71586721, -1.56975362, -1.74745641,  1.05711026,  1.14616359]],

       [[-1.30953889, -1.42069983,  0.24024504,  0.67984107, -0.11183973],
        [-0.83789866, -0.04427722,  0.18577593, -0.12175868,  0.34695681],
        [ 0.93977527, -1.07114551,  0.06305212, -0.82623412, -2.11478308],
        [-0.08274785,  0.18192578, -0.14476797, -1.24072587,  1.03656297]],

       [[ 1.03339121,  0.14559811, -0.69360654,  1.17136044,  1.24462986],
        [ 0.82171074,  1.1502607 , -0.69005395, -0.81639491,  0.82231422],
        [-1.53028073,  1.04233296, -0.62064313,  2.30226706,  1.0182505 ],
        [-0.04815908,  0.55817617,  0.07667377, -0.82511246,  0.44404845]]])
In [27]: x.shape
Out[27]: (3, 4, 5)
In [28]: s = sciserialize.dumps(x)
In [29]: s
Out[29]: '{"shape": [3, 4, 5], "dtype": "float64", "bytes": {"__base64__": "K4Ik5eza8D93oqobd82dP4eaOm9ogdg/HsiHwAFl778bk4x2cRjUP80XhMIBm+c/hzvqq7/8AECOCQBxaVsAwBeBeE2WEdU/IdaoXUa+5T+DjjkUwnb/v8iSkm9uBv2/lKZJzqmm7r/bimhng/f6vw0bRx+T2us/jQ1cWGLo5j/yvhL4tR35vxgOZ9mU9fu/sC35c+zp8D8boDejr1byPw7c5Azf8/S/6Lk/vi+79r/X9Sd+WcDOP0W7jA1CweU/ApkfUoehvL8Gfj/cENDqv+TZvOWAq6a/sgWPbIHHxz8k7B6rkyu/vxDCJVGKNNY/Slfhl6MS7j+Kh3t5aSPxv5bfTwYvJLA/Zs6hiYJw6r8erJlkE+sAwJCckI/2LrW/geMkCFhJxz8Qp5m5wYfCvwn5pF4D2vO/A1dRDMOV8D8R84I5xYjwP8SViXH1osI/LkXgVgYy5r9E1wNy5L3yP5husAAB6vM/H7iWUnRL6j+36B/Ed2fyP7c1PAfsFOa/TiUpNugf6r8btp/rZVDqP/SpZqUHfPi/YzIcUmWt8D9bQeP9Ttzjvxnopv0KawJAZ6ZECMFK8D95WPSlTqiov5B2NUSU3OE/wTz9X+Sgsz/aEcI9Umfqv3UCWDRKa9w/"}, "__type__": "ndarray"}'
In [30]: xdec = sciserialize.loads(s)
In [31]: x == xdec
Out[31]:
array([[[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]]], dtype=bool)

And with MessagePack:

In [32]: b = sciserialize.packb(x)
In [33]: b
Out[33]: b"\x84\xa5shape\x93\x03\x04\x05\xa5dtype\xa7float64\xa5bytes\xc5\x01\xe0+\x82$\xe5\xec\xda\xf0?w\xa2\xaa\x1bw\xcd\x9d?\x87\x9a:oh\x81\xd8?\x1e\xc8\x87\xc0\x01e\xef\xbf\x1b\x93\x8cvq\x18\xd4?\xcd\x17\x84\xc2\x01\x9b\xe7?\x87;\xea\xab\xbf\xfc\x00@\x8e\t\x00qi[\x00\xc0\x17\x81xM\x96\x11\xd5?!\xd6\xa8]F\xbe\xe5?\x83\x8e9\x14\xc2v\xff\xbf\xc8\x92\x92on\x06\xfd\xbf\x94\xa6I\xce\xa9\xa6\xee\xbf\xdb\x8ahg\x83\xf7\xfa\xbf\r\x1bG\x1f\x93\xda\xeb?\x8d\r\\Xb\xe8\xe6?\xf2\xbe\x12\xf8\xb5\x1d\xf9\xbf\x18\x0eg\xd9\x94\xf5\xfb\xbf\xb0-\xf9s\xec\xe9\xf0?\x1b\xa07\xa3\xafV\xf2?\x0e\xdc\xe4\x0c\xdf\xf3\xf4\xbf\xe8\xb9?\xbe/\xbb\xf6\xbf\xd7\xf5'~Y\xc0\xce?E\xbb\x8c\rB\xc1\xe5?\x02\x99\x1fR\x87\xa1\xbc\xbf\x06~?\xdc\x10\xd0\xea\xbf\xe4\xd9\xbc\xe5\x80\xab\xa6\xbf\xb2\x05\x8fl\x81\xc7\xc7?$\xec\x1e\xab\x93+\xbf\xbf\x10\xc2%Q\x8a4\xd6?JW\xe1\x97\xa3\x12\xee?\x8a\x87{yi#\xf1\xbf\x96\xdfO\x06/$\xb0?f\xce\xa1\x89\x82p\xea\xbf\x1e\xac\x99d\x13\xeb\x00\xc0\x90\x9c\x90\x8f\xf6.\xb5\xbf\x81\xe3$\x08XI\xc7?\x10\xa7\x99\xb9\xc1\x87\xc2\xbf\t\xf9\xa4^\x03\xda\xf3\xbf\x03WQ\x0c\xc3\x95\xf0?\x11\xf3\x829\xc5\x88\xf0?\xc4\x95\x89q\xf5\xa2\xc2?.E\xe0V\x062\xe6\xbfD\xd7\x03r\xe4\xbd\xf2?\x98n\xb0\x00\x01\xea\xf3?\x1f\xb8\x96RtK\xea?\xb7\xe8\x1f\xc4wg\xf2?\xb75<\x07\xec\x14\xe6\xbfN%)6\xe8\x1f\xea\xbf\x1b\xb6\x9f\xebeP\xea?\xf4\xa9f\xa5\x07|\xf8\xbfc2\x1cRe\xad\xf0?[A\xe3\xfdN\xdc\xe3\xbf\x19\xe8\xa6\xfd\nk\x02@g\xa6D\x08\xc1J\xf0?yX\xf4\xa5N\xa8\xa8\xbf\x90v5D\x94\xdc\xe1?\xc1<\xfd_\xe4\xa0\xb3?\xda\x11\xc2=Rg\xea\xbfu\x02X4Jk\xdc?\xa8__type__\xa7ndarray"
In [34]: xunp = sciserialize.unpackb(b)
In [35]: x==xunp
Out[35]:
array([[[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],

       [[ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]]], dtype=bool)

In [37]: sys.getsizeof(b)
Out[37]: 564
In [38]: sys.getsizeof(s)
Out[38]: 781
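To make the byte layout concrete, here is a dependency-free sketch that uses `struct` instead of NumPy. It assumes little-endian float64 values in row-major (C) order, which is what the example bytes above appear to contain; other dtypes and byte orders are not covered. The function names `encode_array` and `decode_array` are hypothetical.

```python
import base64
import json
import struct
from math import prod


def encode_array(flat_values, shape):
    """Encode a flat, row-major list of floats together with its shape."""
    if prod(shape) != len(flat_values):
        raise ValueError("shape does not match the number of values")
    # "<" = little-endian, "d" = 8-byte double (assumed layout, see above).
    raw = struct.pack("<%dd" % len(flat_values), *flat_values)
    return {
        "__type__": "ndarray",
        "shape": list(shape),
        "dtype": "float64",
        "bytes": {"__base64__": base64.b64encode(raw).decode("ascii")},
    }


def decode_array(obj):
    """Return (flat_values, shape) from the tagged JSON object."""
    if obj["dtype"] != "float64":
        raise ValueError("this sketch only handles float64")
    raw = base64.b64decode(obj["bytes"]["__base64__"])
    values = list(struct.unpack("<%dd" % (len(raw) // 8), raw))
    return values, tuple(obj["shape"])


values = [1.5, -2.0, 0.25, 3.0, 4.0, 5.5]
encoded = encode_array(values, (2, 3))
decoded = decode_array(json.loads(json.dumps(encoded)))
```

With NumPy available, the same layout would typically come from the array's own buffer and be restored by reinterpreting the bytes with the stored dtype and shape.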
  • Data Frame (containing a columns array, a rows array, and a Multidimensional Array). The ndarray must be 2-dimensional.

  • Boolean Array (Bytes)

## Functional Requirements

  • Encoder for each data type
  • Decoder for each data type
  • JSON serializer functions:
    • dumps()
    • loads()
  • MessagePack serializer functions:
    • packb()
    • unpackb()

## Architecture

  • sciserialize
    • coders

      • Date Time
      • Time Delta
      • Multidimensional Array (Bytes)
      • Boolean Array (Bytes)
      • ... and already supported JSON types
      • encode_types() ... recursive encoder function
      • decode_types() ... recursive decoder function
    • serializers

      • dumps()
      • loads()
      • packb()
      • unpackb()
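The layering above, with coders below and serializers on top, could be wired together roughly as follows. This is a sketch under assumptions rather than the reference implementation: the `CODERS` registry and its layout are hypothetical, and only the date time and time delta coders are included. `encode_types()` and `decode_types()` walk nested containers recursively, and `dumps()` and `loads()` are thin wrappers around the standard `json` module.

```python
import datetime
import json

# Hypothetical registry: type name -> (Python type, encoder, decoder).
CODERS = {
    "datetime": (
        datetime.datetime,
        lambda dt: {"isostr": dt.isoformat()},
        lambda d: datetime.datetime.fromisoformat(d["isostr"]),
    ),
    "timedelta": (
        datetime.timedelta,
        lambda td: {"days": td.days, "seconds": td.seconds,
                    "microsec": td.microseconds},
        lambda d: datetime.timedelta(days=d["days"], seconds=d["seconds"],
                                     microseconds=d["microsec"]),
    ),
}


def encode_types(data):
    """Recursively replace supported objects by tagged JSON objects."""
    for name, (py_type, encoder, _decoder) in CODERS.items():
        if isinstance(data, py_type):
            return {"__type__": name, **encoder(data)}
    if isinstance(data, dict):
        return {key: encode_types(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [encode_types(item) for item in data]
    return data  # already a plain JSON type


def decode_types(data):
    """Recursively turn tagged JSON objects back into Python objects."""
    if isinstance(data, dict):
        type_name = data.get("__type__")
        if isinstance(type_name, str) and type_name in CODERS:
            return CODERS[type_name][2](data)
        return {key: decode_types(value) for key, value in data.items()}
    if isinstance(data, list):
        return [decode_types(item) for item in data]
    return data


def dumps(data):
    return json.dumps(encode_types(data))


def loads(text):
    return decode_types(json.loads(text))


record = {
    "start": datetime.datetime(2015, 2, 18, 21, 36, 32, 528617),
    "durations": [datetime.timedelta(0, 11, 626512)],
    "label": "run",
}
restored = loads(dumps(record))
```

`packb()` and `unpackb()` would follow the same pattern, handing the result of `encode_types()` to a MessagePack library instead of `json`.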
