Skip to content

wikimedia/restbase-mod-table-spec

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

restbase-mod-table-spec

RESTBase is a caching / storing API proxy.

This module contains the shared table storage specification, and provides functional tests against this spec. Those tests are executed against the Cassandra and SQLite backends.

JSON table schema example

Example:

{
    table: 'example',
    // Attributes are typed key-value pairs
    attributes: {
        name: 'string',
        property: 'string',
        tid: 'timeuuid',
        length: 'int',
        value: 'string'
    },
    // Primary index structure: The order of index components matters. Simple
    // single-level range queries are supported below the hash key level.
    index: [
        { type: 'hash', attribute: 'name' },
        { type: 'range', order: 'asc', attribute: 'property' },
        { type: 'range', order: 'desc', attribute: 'tid' }
    },
    // Optional secondary indexes on the attributes
    secondaryIndexes: {
        by_tid: {
            { type: 'hash', attribute: 'tid' },
            // Primary key attributes are included implicitly
            // Project some additional attributes into the secondary index
            { type: 'proj', attribute: 'length' }
        }
    }
}

Supported types

  • blob: arbitrary-sized blob; in practice, should be single-digit MB at most (at least for Cassandra backend)
  • set<T>: A set of type T.
  • int: A 32-bit signed integer.
  • varint: A variable-length (arbitrary range) integer. Backends support at least a 64 bit signed integer. Note that there might be further limitations in client platforms; for example, Javascript can only represent 52bits at full integer precision in its Number type. Since our server-side implementation decodes JSON to doubles, this is also the maximum range the we currently support in practice. We might add support for an alternative JSON string representation of larger integers in the future.
  • long: A 64-bit signed long. Javascript only represents 52 bits in its Number type, so longs should be represented as strings in clients.
  • decimal: Decimal number.
  • float: Single-precision (32-bit) floating point number.
  • double: Double-precision (64-bit) floating point number.
  • boolean: A boolean.
  • string: An UTF8 string.
  • timeuuid: A version 1 UUID as a string. Sorted by timestamp.
  • uuid: A version 4 UUID as a string.
  • timestamp: ISO 8601 timestamp as a string.
  • json: A JSON sub-object (as an embedded object, not a string), which is transparently parsed back to JSON.

Secondary index consistency

Queries on secondary indexes are eventually consistent by default. While new entries are inserted along with the data, it is possible that false positives are returned for a short time after the primary request was acknowledged. We will also support optional strongly consistent secondary index requests at the cost of cross-checking the index match with the actual data, at least on some backends.

Custom TTL

A custom TTL can be set for individual objects on PUT requests by providing a special _ttl integer attribute. Its value indicates the amount of time (in seconds) after which the record will be removed from storage.

To select a TTL of a row, provide withTTL: true key in the query.

Queries

Select the first 50 entries:

{
    table: 'example',
    limit: 50
}

Limit the query to 'Tom':

{
    table: 'example',
    attributes: {
        name: 'Tom'
    },
    limit: 50
}

Limit the query to 'Tom', and select properties that are greater than 'a', and smaller or equal to 'c'. Also, only select the 'value' column:

{
    table: 'example',
    attributes: {
        name: 'Tom',
        property: {
            gt: 'a',
            le: 'c'
        }
    },
    // Only select the 'value' column
    proj: ['value']
    limit: 50
}

Now, descend down the primary index tree one level further & perform a range query on the tid key:

{
    table: 'example',
    attributes: {
        name: 'Tom',
        property: 'foo', // Note: needs to be fixed
        tid: {
            le: '30b68d20-6ba1-11e4-b3d9-550dc866dac4'
        }
    },
    limit: 50
}

Finally, perform an index on the by_tid secondary index:

{
    table: 'example',
    index: 'by_tid',
    attributes: {
        tid: '30b68d20-6ba1-11e4-b3d9-550dc866dac4'
    },
    limit: 50
}

As you can see, these queries always select a contiguous slice of indexed data, which is fairly efficient. The downside is that you can only query what you indexed for.

API alternative to consider: REST URLs for GET queries

Due to the tree structure of primary & secondary indexes, simple prefix equality or range queries pretty naturally map to URLs like /example/Tom/foo, or /example//by_id/30b68d20-6ba1-11e4-b3d9-550dc866dac4 for a secondary index query (note the // separator). More complex queries could be supported with query string syntax like /example/Tom/foo/?le=30b68d20-6ba1-11e4-b3d9-550dc866dac4&limit=50.

The current implementation uses the JSON syntax described above exclusively (as GET or POST requests with a body), but for external APIs the URL-based API looks very promising. This is not yet implemented, and needs more thinking though of all the details before we expose a path-based API externally.

About

Shared spec and tests for RESTBase table storage

Resources

Code of conduct

Stars

Watchers

Forks

Packages

No packages published