Xacto is a tool for manipulating data with JavaScript, heavily inspired by the amazing Q/Kdb and Mathematica. It includes a general utility API and an in-memory database implementation.
Work at a high level, using a uniform set of functions that behave the same way against different types of in-memory values, files, tables, remote resources (not yet implemented), etc.
Pretty new. Some glaring omissions. Don't trust with your important data just yet. See "Bugs" below.
- JavaScript really sucks but we're stuck with it. I want to write JS that's as concise and meaningful as Q by Kx.
- One size does not fit all. As my needs become more complex, I want to tune how I store and scan my data. I want to be able to spawn mini-databases and hybrids.
- Every external dependency is a risk factor. Remove as many as possible. I want to work without MySQL or Redis or 500 MB of npms or anything but Node itself.
- Convenience functions that abstract away JavaScript's frustratingly patchy standard library and perform in a uniform way for most data types.
- Column-oriented in-memory tables
- Updates logged to disk and replayed at startup
- TypedArray vector columns for integers (byte, short, int)
- Regular JavaScript value columns (can contain any type, including other tables)
- Create your own table types
- Create your own column types
- Fast-ish, or at least written with performance in mind
- Regular JavaScript-style functions instead of SQL or homegrown query languages.
- No magic or JavaScript puffery; simple, concise code, written in a crude but familiar style, with minimal state.
- Zero dependencies (at least for now)
- Not too "objecty" (prototypal inheritance is leading cause of teen suicide)
- Code fairly dense, easy to scan, designed for trendy wide screens
For my own purposes I need this to be pretty fast so that was a primary concern when designing the system.
- many functions use plain ole for(;;) - still faster than all those lovely callbacks, but makes the code less terse and flexible than I'd like
- uses typed arrays to store integer values (and hopefully floats and other types soon; see below)
- uses Map instead of objects for table column handling - this should allow for tables with almost any number of columns, which creates interesting opportunities
- uses Set for internal row lists in critical sections
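To make the ideas in the list above concrete, here is a rough sketch of a column store built from a Map, a typed array, and a Set. It is not Xacto's actual internal layout; the names `columns` and `scanLessThan` are made up for illustration.

```javascript
// Hypothetical sketch, not Xacto's real internals: a tiny column store.
// Columns live in a Map keyed by column name.
const columns = new Map();
columns.set('age', Int32Array.from([38, 4, 4]));   // typed vector column for ints
columns.set('name', ['tom', 'arca', 'tyler']);      // regular JS value column

// Scan one column with a plain for loop and collect matching row indices in a Set.
function scanLessThan(cols, colName, limit) {
  const col = cols.get(colName);
  const rows = new Set();
  for (let i = 0; i < col.length; i++) {
    if (col[i] < limit) rows.add(i);
  }
  return rows;
}

const young = scanLessThan(columns, 'age', 10);
console.log([...young]); // [ 1, 2 ]
```

Because each column is its own array, a scan like this only touches the values it needs.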
Xacto presents a number of handy functions for working with JavaScript objects and pure values.
Goals:
- Use as few global "verbs" as possible - have one understood meaning of each
- Make each verb behave logically for each type of data/collection (find the path of least surprise)
- Add the first level of required sugar to make it edible by humans
Generally, X's verbs take the "data" or "from" thing as the first argument with the operation or value as the second.
Dies if !cond showing text
Returns a random item from value.
Value must be a string or array at this time.
Deep recursion into collection. Applies func to every "leaf" value.
Optionally, supply {type: "string"} in opts to select which types of nodes should have func invoked on them.
The collection is returned with the results of func inserted in place of the previous values.
func is called as f(value, path, opts, collection). The path value can be used to figure out where you are in collection. It is an array of indices.
> let X=require('../xacto')();
> let z=['tom',23,'bob',function(){return 999}];
> let myfun=function(s){return s.toUpperCase()};
> let z1=X.deep(z,myfun,{type:'string'});
> z1
[ 'TOM', 23, 'BOB', [Function] ]
This is useful for recursing deep into objects to find or manipulate specific values.
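If the path argument is hard to picture, the following standalone sketch may help. It is a hypothetical re-implementation (`deepSketch` is not part of Xacto), and it assumes a copy is returned rather than the original being modified in place, which may not match what Xacto does.

```javascript
// Hypothetical re-implementation for illustration; not Xacto's own deep().
function deepSketch(collection, func, opts, path, root) {
  opts = opts || {};
  path = path || [];
  root = root === undefined ? collection : root;
  const isContainer = (v) => v !== null && typeof v === 'object';
  if (!isContainer(collection)) {
    // Leaf: call func only when no type filter is given or the type matches.
    if (opts.type && typeof collection !== opts.type) return collection;
    return func(collection, path, opts, root);
  }
  // Container: rebuild it with each child replaced by its processed value.
  const out = Array.isArray(collection) ? [] : {};
  for (const key of Object.keys(collection)) {
    const index = Array.isArray(collection) ? Number(key) : key;
    out[index] = deepSketch(collection[key], func, opts, path.concat([index]), root);
  }
  return out;
}

const tree = ['tom', 23, { pets: ['arca'] }];
const seenPaths = [];
const upper = deepSketch(tree, (s, path) => {
  seenPaths.push(path); // record where each string leaf was found
  return s.toUpperCase();
}, { type: 'string' });
console.log(upper);     // [ 'TOM', 23, { pets: [ 'ARCA' ] } ]
console.log(seenPaths); // [ [ 0 ], [ 2, 'pets', 0 ] ]
```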
Creates a keyed object (dictionary) from a list of keys and a list of values.
> X.dict(['name','age'],['tom',38])
{name:'tom',age:38}
Prints text and exits with error code 1.
Return value without the first n items.
Negative n will remove items from the end of value.
> const X=require('xacto')();
> const r=X.range(0,10)
> X.drop(r,7)
[7,8,9]
> X.drop(r,-7)
[0,1,2]
Prints value and returns it; use in the middle of expressions to debug values.
> let z=emit(get_thing(),'thing result')*4+emit(other_func(),'other')
thing result 6
other 12
36
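The trick is simply "print, then return the same value". A minimal stand-in, assuming the label is printed before the value as in the output above (`emitSketch` is a made-up name, and Xacto's real formatting may differ):

```javascript
// Hypothetical stand-in for emit(); Xacto's real output format may differ.
function emitSketch(value, label) {
  console.log(label, value); // print the label and the value...
  return value;              // ...then hand the value back unchanged
}

const total = emitSketch(6, 'thing result') * 4 + emitSketch(12, 'other');
console.log(total); // 36
```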
For arrays: returns an array of f(x[i],i,opts) for each item in x.
> X.each(X.range(1, 10+1), function(x){return x*3})
[ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30 ]
For objects and Maps, each preserves keys. It returns {k:f(x[k],k,opts), j:f(x[j],j,opts), ...}:
> let rec={name:'Arca',species:'super cute pomeranian'};
> X.each(rec, function(x){return x.toUpperCase()})
{ name: 'ARCA', species: 'SUPER CUTE POMERANIAN' }
each also works for tables. Starting from a CSV string, supply the column definitions when importing:
> let cols={name:'string',age:'int',species:'string'};
> let tblConf={tableCols:cols};
> let tbl=X.imp("tom,38,human\narca,4,dog\ntyler,4,human","csv",false,tblConf);
> tbl.each(function(row){return row.age*2})
[ 76, 8, 8 ]
In the case of arrays of objects or tables, each allows you to specify a column name instead of a function to extract all values of that column:
> X.each(tbl, 'age')
[38, 4, 4]
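For the curious, the dispatch described in this section can be approximated in plain JavaScript. This is only a sketch under my own assumptions (`eachSketch` is a made-up name, and tables are left out since they need Xacto itself):

```javascript
// Hypothetical sketch of each() for arrays, Maps and plain objects.
function eachSketch(x, f, opts) {
  // A string in place of a function means "pluck this key from every item".
  const fn = typeof f === 'string' ? (item) => item[f] : f;
  if (Array.isArray(x)) return x.map((item, i) => fn(item, i, opts));
  if (x instanceof Map) {
    const out = new Map();
    for (const [k, v] of x) out.set(k, fn(v, k, opts));
    return out;
  }
  const out = {};
  for (const k of Object.keys(x)) out[k] = fn(x[k], k, opts);
  return out;
}

console.log(eachSketch([1, 2, 3], (x) => x * 3));                   // [ 3, 6, 9 ]
console.log(eachSketch({ name: 'Arca' }, (x) => x.toUpperCase()));  // { name: 'ARCA' }
console.log(eachSketch([{ age: 38 }, { age: 4 }], 'age'));          // [ 38, 4 ]
```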
Performs a deep equality test between x and y.
Synonym for head
Transform dictionaries with arrays of values (like {a:[1,2,3],b:[4,5,6]}) into arrays of flattened dictionaries (like [{a:1,b:4},{a:2,b:5},{a:3,b:6}]).
> const X=require('./xacto')()
> const z={'name':['tom','arca','tyler'],age:[38,4*7,4]}
> z
{ name: [ 'tom', 'arca', 'tyler' ], age: [ 38, 28, 4 ] }
> X.flip(z)
[ { name: 'tom', age: 38 },
{ name: 'arca', age: 28 },
{ name: 'tyler', age: 4 } ]
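The transformation itself is small. A standalone sketch, assuming every array in the dictionary has the same length (`flipSketch` is a made-up name, not Xacto's code):

```javascript
// Hypothetical sketch of flip(): dictionary of arrays -> array of dictionaries.
// Assumes all value arrays have the same length as the first one.
function flipSketch(dictOfArrays) {
  const keys = Object.keys(dictOfArrays);
  const n = keys.length ? dictOfArrays[keys[0]].length : 0;
  const rows = [];
  for (let i = 0; i < n; i++) {
    const row = {};
    for (const k of keys) row[k] = dictOfArrays[k][i];
    rows.push(row);
  }
  return rows;
}

console.log(flipSketch({ a: [1, 2, 3], b: [4, 5, 6] }));
// [ { a: 1, b: 4 }, { a: 2, b: 5 }, { a: 3, b: 6 } ]
```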
For dictionaries (objects), returns the keys.
For lists, returns an array of its indices.
Return the index-th item in collection. index can also be an array. Works for all types.
Returns the Xacto handler for a given filename's extension. Mostly used by load() and save().
If collection is an object, returns whether or not value is one of its properties.
If collection is an array or other container, returns whether or not value is one of its members.
Error otherwise.
Returns the first item in value
Appends value to collection. This works for tables, arrays, etc.
If collection is an object, and it has a member named ins that is a function, this will return collection.ins(x).
If collection is an object and value is an object, each of the values in value will be set in collection, overriding previous values with the same keys.
Intersection. Returns the common values in x and y.
Membership test. Returns true if value is in collection.
Synonym for tail
Return the length of value. Works for most types, including tables.
If value is an object with a 'len' member, returns value.len().
If value is an object with a 'length' member or a string, returns value.length
If value is a dictionary, return the number of keys
Interpret resource and retrieve it, calling callback(err,data) when done.
resource is generally a filename. You can define your own handlers to, say, automatically decode .json files when loaded. See the Resources section below for more.
This callback style (error as first arg, result as second) is meant to emulate the Node.js built-in API. The built-in filesystem extension handlers allow you to supply null as callback and invoke their synchronous APIs. This is handy during server startup and to avoid callback hell when you can spare the performance.
The meaning of options is specific to the resource handler.
See also the converse of this function: save(resource,data,callback?).
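The two calling styles (callback vs. null for synchronous) can be sketched with nothing but Node's fs module. This handles .json only and skips handler lookup entirely; `loadJsonSketch` and the temp file name are made up for the example and are not part of Xacto.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical loader for .json files only; Xacto's real load() dispatches on handlers.
function loadJsonSketch(resource, callback) {
  if (callback === null || callback === undefined) {
    // No callback: use the synchronous API and return the decoded data directly.
    return JSON.parse(fs.readFileSync(resource, 'utf8'));
  }
  fs.readFile(resource, 'utf8', (err, text) => {
    if (err) return callback(err);
    let data;
    try { data = JSON.parse(text); } catch (e) { return callback(e); }
    callback(null, data); // Node convention: error first, result second
  });
}

// Write a throwaway file so the example is self-contained.
const file = path.join(os.tmpdir(), 'xacto-load-sketch.json');
fs.writeFileSync(file, JSON.stringify({ name: 'tom', age: 38 }));

const syncData = loadJsonSketch(file, null);   // synchronous form
console.log(syncData.age);                     // 38

loadJsonSketch(file, (err, data) => {          // callback form
  if (err) throw err;
  console.log(data.name);                      // tom
});
```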
JSON decode
JSON encode
Returns x with y appended.
This might go away in favor of ins.
Return the higher of m and n
Return the lower of m and n
Project arguments x, y, and/or z onto function func. Returns a new function.
Similar to currying. Returns a version of func with arguments already applied. Use undefined to indicate an empty value that must be applied when calling the resulting function.
> const X=require('./xacto')();
> const pointlessfunc=function(a, b, c){ return 'Hello '+a+', '+b+', '+c };
> const f=X.proj(pointlessfunc,'Tom', undefined, 'Tyler')
> f
[Function: bound ]
> f('Arca')
'Hello Tom, Arca, Tyler'
Currently only works with functions that take three arguments or fewer.
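For a feel of how projection with undefined placeholders can work, here is a standalone sketch. It is not Xacto's implementation (`projSketch` is a made-up name), and unlike proj it does not cap the number of arguments.

```javascript
// Hypothetical sketch of projection; undefined marks a slot to fill in later.
function projSketch(func, ...preset) {
  return function (...rest) {
    let next = 0;
    // Fill each undefined slot, left to right, with the call-time arguments.
    const args = preset.map((a) => (a === undefined ? rest[next++] : a));
    // Any call-time arguments left over are appended at the end.
    return func(...args, ...rest.slice(next));
  };
}

const hello = (a, b, c) => 'Hello ' + a + ', ' + b + ', ' + c;
const f = projSketch(hello, 'Tom', undefined, 'Tyler');
console.log(f('Arca')); // Hello Tom, Arca, Tyler
```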
Returns a random integer from (0..n]
Returns an array of integers from min to max-1.
Optionally calls func(i) for each integer. You can use this to apply a range of numbers to a function, generate test data, etc.
Select the items in collection matching predicate. Works for most types.
See "Select" below.
Attempt to stringify x. Simple values like numbers become strings. Objects with a toString method, such as a Buffer, have the result of that method returned.
Container types are returned as JSON.
Sum array. Numbers only for now. Only arrays for now.
If nullvalue is supplied, string conversion will be attempted.
Return the first n items in value.
Negative n will return items from the end of value.
// Xacto currently pollutes globals, so you don't have to use X. in front of verb names
> const r=range(0,10)
> take(r,3)
[0,1,2]
> take(r,-3)
[7,8,9]
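The sign convention for take (and for drop, described earlier) maps neatly onto slice. A standalone sketch for arrays and strings, using made-up names rather than Xacto's own functions:

```javascript
// Hypothetical sketches of take() and drop() built on slice().
function takeSketch(value, n) {
  // Positive n: the first n items. Negative n: the last |n| items.
  return n >= 0 ? value.slice(0, n) : value.slice(n);
}
function dropSketch(value, n) {
  // Positive n: everything after the first n. Negative n: everything but the last |n|.
  return n >= 0 ? value.slice(n) : value.slice(0, n);
}

const r = Array.from({ length: 10 }, (_, i) => i); // [0, 1, ..., 9]
console.log(takeSketch(r, 3));   // [ 0, 1, 2 ]
console.log(takeSketch(r, -3));  // [ 7, 8, 9 ]
console.log(dropSketch(r, 7));   // [ 7, 8, 9 ]
console.log(dropSketch(r, -7));  // [ 0, 1, 2 ]
```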
Return last item in value
Returns the type of value, with some additions over standard typeof:
- Undefined values return undef
- Objects that are arrays return array (saves a trip through Array.isArray)
- Numbers that have no fractional part return int
- All other numbers return float
- Functions return func
- Otherwise, typeof(value) is returned.
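Those rules translate almost line for line into code. A sketch, assuming Number.isInteger is an acceptable test for "no fractional part" (`typeSketch` is a made-up name, not Xacto's function):

```javascript
// Hypothetical sketch of the type rules listed above.
function typeSketch(value) {
  if (value === undefined) return 'undef';
  if (Array.isArray(value)) return 'array';
  if (typeof value === 'number') return Number.isInteger(value) ? 'int' : 'float';
  if (typeof value === 'function') return 'func';
  return typeof value; // everything else falls through to plain typeof
}

console.log(typeSketch(3));          // int
console.log(typeSketch(3.5));        // float
console.log(typeSketch([1, 2]));     // array
console.log(typeSketch(undefined));  // undef
console.log(typeSketch(() => 1));    // func
console.log(typeSketch('tom'));      // string
```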
Shortcut for Array.isArray
Returns true if value is a collection type (object or array)
Return true if value is an object, but not an array. Eventually this should also try to ensure this is a "flat" object with no functions as members, etc.
Returns true if value is a function
Updates key in collection with value. Works with tables, arrays, and objects.
For objects, key should be an array of strings.
key can be an array of indices. value should be an array of the same size.
Returns keys of collection that match predicate.
If collection is something like an array and predicate is a function, where returns the indices where the function returns true:
> X.where([1, 2, 3, 4, 5],function(x){return x%2==0})
[1,3]
If predicate is omitted, an array of all of the element indices is returned.
For usage with tables, see "Where" below.
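For plain arrays the behavior boils down to a loop that collects indices. A sketch of just the array case (`whereSketch` is a made-up name, not Xacto's where):

```javascript
// Hypothetical sketch of where() for arrays only.
function whereSketch(collection, predicate) {
  const hits = [];
  for (let i = 0; i < collection.length; i++) {
    // With no predicate every index matches.
    if (predicate === undefined || predicate(collection[i])) hits.push(i);
  }
  return hits;
}

console.log(whereSketch([1, 2, 3, 4, 5], (x) => x % 2 === 0)); // [ 1, 3 ]
console.log(whereSketch(['a', 'b', 'c']));                     // [ 0, 1, 2 ]
```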
Shortcut for undefined. I hate typing.
Xacto's file handling features come in the form of two functions: load and save.
> X.save("./test.json",myData)
> myData2=X.load("./test.json")
> X.assert(X.equal(myData,myData2),"ugh")
See lib/filehandlers.js for a sense of how these are constructed while these negligent docs remain unfinished.
Xacto databases live in their own folder which is specified when the Xacto instance is created.
> var X=require('xacto');
> // open database folder. existing database and logs will be automatically loaded.
> X=X('./testdb/')
The first time you reference a table, you have to define its schema. You can also give it a name.
> students=X.table('students',{name:'string',age:'int',species:'string'})
If you don't give the table a name, you won't be able to refer to it by its string name elsewhere in your application. Using a string to refer to a table is useful because you don't have to pass it around to all of your code that may need to do data manipulation.
You can reference the table by a string of its name using X.ins (surprisingly handy in some situations) or via a table reference.
> X.ins('students', {name:'Tom',age:38,species:"Programmer"})
> // alternative forms:
> X.tbl.students.ins({name:'Arca',age:4,species:"Elegant Pomeranian"})
> students.ins({name:'Tyler',age:4,species:"Lil Bebe"})
See also the full explanation of ins() above.
Search for values matching predicate or find rows where predicate(row) returns true.
If you omit the predicate, all values are returned.
Always returns an array of records. The array is empty if no match is found.
> // generate 1000 numbers from 0..100 and find those that are 42
> X.sel([X.randN(100, 1000)], 42)
> X.sel('students', {name:'Tom'})
[{name:"Tom",age:38,species:"Programmer"}]
> students.sel({age:function(a){return a < 10;}})
[{name:"Arca",age:4,species:"Elegant Pomeranian"},
{name:"Tyler",age:4,species:"Lil Bebe"}]
TODO query capabilities in detail
where is used to search for values much like sel. where returns the indices that match the predicate instead of the rows or matching values themselves. In other words, where returns an array of integers, but sel returns an array of records/objects.
> X.where('students', {name:'Tom'})
[0]
> students.where({age:function(a){return a < 10;}})
[1,2]
where always returns an array. It will be empty if no matches are found. You can use X.len() to check any type of value's length.
Internally, sel often uses where to perform its searches.
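The relationship between the two can be sketched over a plain array of records. This is a guess at the general shape, not Xacto's table code; `rowMatches`, `tableWhereSketch` and `tableSelSketch` are made-up names, and the record predicate is assumed to accept either literal values or test functions per column, as in the examples above.

```javascript
// Hypothetical sketch: sel() built on top of where() for an array of records.
function rowMatches(row, predicate) {
  // Every key in the predicate must match: by test function or by strict equality.
  return Object.keys(predicate).every((k) =>
    typeof predicate[k] === 'function' ? predicate[k](row[k]) : row[k] === predicate[k]);
}
function tableWhereSketch(rows, predicate) {
  const hits = [];
  for (let i = 0; i < rows.length; i++) {
    if (predicate === undefined || rowMatches(rows[i], predicate)) hits.push(i);
  }
  return hits; // indices only
}
function tableSelSketch(rows, predicate) {
  return tableWhereSketch(rows, predicate).map((i) => rows[i]); // indices -> records
}

const students = [
  { name: 'Tom', age: 38, species: 'Programmer' },
  { name: 'Arca', age: 4, species: 'Elegant Pomeranian' },
  { name: 'Tyler', age: 4, species: 'Lil Bebe' },
];
console.log(tableWhereSketch(students, { name: 'Tom' }));        // [ 0 ]
console.log(tableWhereSketch(students, { age: (a) => a < 10 })); // [ 1, 2 ]
console.log(tableSelSketch(students, { name: 'Tom' }));
```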
Update items in collection matching predicate.
predicate may be a function or, in the case of a table, a record/object.
> X.upd('students', {name:'Tom'}, {age:0}); // to be young again
upd() can also be used for non-table types. See the upd section above for more.
When you create a table, you can supply a list of "backends" that are attached to it. These are like plugins or storage engines.
One of them is the logger. This will record all ins and upd operations performed against the table since the time it was created.
If you don't want to maintain an update log, you can save your table whenever you want with table.save('whatever.json').
The logger has a variety of options. To start, an example, with all options specified:
> let logopts={
    replay:true,
    flush:{
      time:60 * 1000,
      rows:100
    },
    rotate:1,
    interval:2 * 1000,
    unlink:false,
    verbose:true
  };
> X.table('recipes',{id:'int',title:'string',ingredients:'any'},[X.mem, X.logger(logopts)])
Use verbose:true to see debugging information about the logger's behavior. This is recommended when in development. You don't want to have any blank areas in your understanding of your database's on-disk state.
When you first initialize the table and its associated logger, replay:true will request that it replay existing logs. If you'd like to do this on your own, you can use X.logger.replay().
To replay, it will scan XHOME/*.log.json for log files. If it finds any, it will apply their contents to the table. These are done as synchronous operations and may slow the start of your app if the logs are numerous. You can set unlink:true to remove each log file as it's consumed, but you'll need to save/reload your initial table state some other way if you want to persist data across many executions of your program.
Information about what logs were loaded with replay can be accessed via the array X.logger.logStats.
After starting, the logger runs every interval milliseconds (2 seconds by default). If you set interval to 0, it won't run, but you can run it manually with X.logger.check().
Each time it runs, it examines the number of items in its update log and when it last saved its state to disk. If there are more than flush.rows items OR if it's been longer than flush.time since the log was written to disk, it will save the log as XHOME/$TIME.log.json.
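The flush decision described above is just two comparisons. A sketch of that check and of the log file naming, under my own assumptions: `shouldFlushSketch` and `logFileNameSketch` are made-up helpers, not part of X.logger, and all times are in milliseconds.

```javascript
const path = require('path');

// Hypothetical helper mirroring the rule above: flush when the pending row count
// exceeds flush.rows OR when more than flush.time ms have passed since the last write.
function shouldFlushSketch(pendingRows, lastFlushMs, nowMs, flush) {
  const tooManyRows = pendingRows > flush.rows;
  const tooOld = nowMs - lastFlushMs > flush.time;
  return tooManyRows || tooOld;
}

// Hypothetical helper for the XHOME/$TIME.log.json naming scheme.
function logFileNameSketch(xhome, nowMs) {
  return path.join(xhome, nowMs + '.log.json');
}

const flushOpts = { time: 60 * 1000, rows: 100 };
console.log(shouldFlushSketch(101, 0, 1000, flushOpts));  // true  (too many rows)
console.log(shouldFlushSketch(5, 0, 1000, flushOpts));    // false (few rows, recent write)
console.log(shouldFlushSketch(5, 0, 61000, flushOpts));   // true  (last write too old)
console.log(logFileNameSketch('testdb', 1500000000000));
```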
The time values used here (including in the log file name) have a millisecond resolution as per JavaScript conventions.
If you want to log everything and never risk losing an update, set flush.rows to 1.
Please note that once the logger is operational your script will have pending timeouts and thus will not exit after finishing execution.
If you don't want the logger to run on its own, you can set interval to 0, and then use X.logger.flush() to save state on your own schedule. Then your script will exit on its own correctly too.
Major bugs:
- currently pollutes globals. trying to find a better structure
- enumerations (columns grouped by unique values) do not currently work.
- logger needs a way to remove logs and take snapshots or some combination thereof. logger should be some kind of quasi-global behavior, rather than table specific.
- there's something odd about converting some TypedArrays to buffers for loading/storing. In particular, floats seem to be saved as ints. I'm still looking into this.
- deep() bombs on some trees (nested arrays). Fix imminent.
- impending inevitable showdown with Promises. Thinking caps required.
See also the TODO list on top of lib/xacto.js
Dumb for loop speed: https://jsperf.com/for-vs-foreach/37
fileHandlers={'.json':{load(f):{..},save(f,x):{..},import,export()},'.txt':{..},'.csv':{..}}