Storing raw JS objects and querying according to pointer equality === #248

Open
CMCDragonkai opened this issue Jan 24, 2018 · 20 comments

@CMCDragonkai

I want to use datascript to store "pointers" to raw JS objects, and be able to query for them.

So I did a small test to see if it could work.

First with a class which works:

const d = require('datascript');
class DummyObj {}
const obj1 = new DummyObj;
const obj2 = new DummyObj;
const db = d.empty_db();
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 2, 'obj', obj2]]);

d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj" ?obj]]', db1, obj1); // only entity 1
d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj" ?obj]]', db1, obj2); // only entity 2

Second with just a normal object:

const d = require('datascript');
const obj1 = { x: 1 };
const obj2 = { x: 1};
const db = d.empty_db();
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 2, 'obj', obj2]]);

d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj" ?obj]]', db1, obj1); // []
d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj" ?obj]]', db1, obj2); // []

But the second one doesn't work; it just returns nothing.

Also, are certain objects ever directly serialised? What determines whether to compare objects by their serialised form vs by their object pointer?

@tonsky
Owner

tonsky commented Jan 24, 2018

Objects are not serialized, but they are compared using CLJS compare, (compare o1 o2). That means if you ever look up by value, you can only look up by primitives: strings, numbers, cljs keywords. If it works for some objects, it’s most probably just by accident.

@CMCDragonkai
Author

CMCDragonkai commented Jan 24, 2018

What does CLJS compare do with objects that are instantiated from classes? The above code shows that it works for new DummyObj.

This would be a very useful function for me, because I need a table that I can query that stores pointers. And some rows may store the same pointer. And then I would update all rows with the same pointer to point to something new. It would support an immutable keyless B+tree.

@tonsky
Owner

tonsky commented Jan 24, 2018

Here’s the code:

https://github.com/clojure/clojurescript/blob/9ddd356d344aa1ebf9bd9443dd36a1911c92d32f/src/main/cljs/cljs/core.cljs#L2345-L2369

I guess maybe the first case falls under (identical? (type x) (type y))? Not sure.

@rauhs
Contributor

rauhs commented Jan 24, 2018

Extending the type to be IComparable is probably tough if you use a mangled JS build. Though it'd be easy if you built datascript yourself (you can see shadow-cljs to get you a webpack compatible build of datascript). Right now the simplest (hacky) workaround would be to always add an array and give the first element in the array an "ID" that's unique and primitive (string,number,bool) and store the actual payload (your object) as the second value of the array:

[12 {x: 12, other: "bar"}]
[15 {x: 15, other: "foo"}]

Those values will be comparable in CLJS and will also be sorted properly (important for initializing the DB, which uses arr.sort()).

FWIW, I think CLJS could be more lax about this and allow all objects to cljs.core/compare as long as both have a valueOf function, which is required to return a primitive value by the JS standard. Though I'm not sure such a change would be accepted (feel free to open a ticket about it).

@CMCDragonkai
Author

CMCDragonkai commented Jan 24, 2018

I really need it to compare based on pointer equality of the object itself. Are you saying to tag each object created with a special unique id before inserting into datascript?

Also, I'm sure there are certain value types that cannot be ordered; I wouldn't think of pointers to objects as being ordered. So I'm not sure what benefit sorting brings in this situation.

@CMCDragonkai
Author

I found that identical? in CLJS maps directly to ===: https://stackoverflow.com/a/13005218/582917

@rauhs
Contributor

rauhs commented Jan 24, 2018

If you store values in Datascript and query by them (as above) you absolutely need to make your values comparable. If you really just care for identity and don't have a natural ordering for your values then I'd do the following:

  1. Generate a unique ID for each object (1, 2, 3, ...) and add this ID to your JS object, which also has pointers.
  2. Attach this unique ID to an indexed datascript attribute, something like object/pointers-id.
  3. Attach the actual payload (your JS object with pointers) to some other datascript attribute, something like object/pointers.

Then only ever query by object/pointers-id and get the value from the entity on the pull. That'd be less hacky and scale well.
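
In plain JavaScript, the three steps could look roughly like the sketch below. It is only an illustration: `tagObject` and `toTxData` are hypothetical helpers that are not part of DataScript, and the arrays they produce are just the transaction data that would then be handed to d.db_with.

```javascript
// Hypothetical helpers, not part of DataScript.
let nextId = 1;

// Step 1: give each object a unique, primitive (and therefore comparable) id.
function tagObject(obj) {
  if (obj.id === undefined) {
    obj.id = nextId++;
  }
  return obj;
}

// Steps 2 and 3: one datom for the id (the attribute meant to be indexed)
// and one datom for the payload object itself.
function toTxData(eid, obj) {
  const tagged = tagObject(obj);
  return [
    [':db/add', eid, 'object/pointers-id', tagged.id],
    [':db/add', eid, 'object/pointers', tagged],
  ];
}

class DummyObj {}
const obj1 = new DummyObj();
const obj2 = new DummyObj();

// Transaction data for two entities; each object appears by reference.
const txData = [...toTxData(1, obj1), ...toTxData(2, obj2)];
```

Queries would then bind the numeric id (obj1.id) instead of the object, and the object itself would be read back from the pulled entity.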

@CMCDragonkai
Author

Thanks for the advice, however I'm not familiar with clojure. What would those 3 steps look like in JS?

@CMCDragonkai
Author

Still, I'm confused about why there would be different behaviour between using just {} and new DummyObj. The code pointed to by @tonsky doesn't appear to deal with the difference. In JS, typeof gives 'object' for both, and both are instanceof Object. The only difference is that obj1.constructor === DummyObj and ({}).constructor === Object.
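
The observation can be checked directly with standard JavaScript. In this small sketch, `report` is a hypothetical helper introduced only for illustration.

```javascript
class DummyObj {}
const literal = { x: 1 };
const instance = new DummyObj();

// Collect the three properties discussed above for a given object.
const report = (o) => ({
  typeofTag: typeof o,
  isObject: o instanceof Object,
  constructorName: o.constructor.name,
});

console.log(report(literal));  // constructorName is 'Object'
console.log(report(instance)); // constructorName is 'DummyObj'
```

Only the constructor tells the two apart, so any code that branches on it will treat literals and class instances differently.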

@CMCDragonkai
Author

CMCDragonkai commented Jan 24, 2018

Another test:

const d = require('datascript');
const obj1 = { x: 1 };
const obj2 = { x: 1};
const db = d.empty_db();
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 2, 'obj', obj2]]);

d.pull(db1, '[*]', 1).obj === obj1; // false (there was a parentheses typo here)

It shows that these are no longer the same object. That must mean datascript must be doing a shallow or deep copy of the normal object that is being inserted. (Later I found out that it was in fact a deep copy.)

I think the docs should make clear that when inserting JS objects, if they are literal objects, they get copied, while if they are class instantiated objects, they are inserted by reference. This occurs even when the class instantiated objects are deeply nested.

@tonsky
Owner

tonsky commented Jan 24, 2018

That must mean datascript must be doing a shallow or deep copy of the normal object that is being inserted.

DataScript certainly does not do that. Check your tests

@CMCDragonkai
Author

CMCDragonkai commented Jan 24, 2018

@tonsky Have you tried running this?

const d = require('datascript');
const obj1 = { x: 1 };
const obj2 = { x: 1};
const db = d.empty_db();
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 2, 'obj', obj2]]);

d.pull(db1, '[*]', 1).obj === obj1; // false

Here obj1, which is just the plain object {x: 1}, is added into the DB. When I pull it back out and compare it with obj1 using ===, it returns false. I'm running on Node v8.7.0. I copied the code verbatim and ran it, and that's what happens. If it's not copying it, then what is d.pull(db1, '[*]', 1).obj?

I've tested again with new Object({a:1}); the result is the same as for a normal literal object. But as soon as it is a class instantiation, then it does return true when doing ===. It even happens for deep objects.

@rauhs
Contributor

rauhs commented Jan 24, 2018

I don't know the JS side API of datascript so I can't help you there. Forget about the difference between Obj vs Class instance. Neither will work; you just can't query by values which are not comparable. Try implementing my idea above. Pseudo code:

const db = d.empty_db({'obj-id': {':db/index': true}});
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 1, 'obj-id', obj1.id],
                           [':db/add', 2, 'obj', obj2], [':db/add', 2, 'obj-id', obj2.id]]);

// Now query by obj id:
d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj-id" ?obj]]', db1, obj1.id);

Use a factory method to get you a new object with a newly generated id.

@tonsky
Owner

tonsky commented Jan 24, 2018

I’m sorry, you’re right. DS does try to convert entities to CLJS values and back:

(->> (js->clj entities)
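
A rough JavaScript model of such a round trip may help explain the results reported earlier in the thread. It is a sketch under a stated assumption, not DataScript's or ClojureScript's actual code: it assumes that only arrays and objects whose constructor is exactly Object get rebuilt, and that everything else passes through untouched. `roundTrip` is a hypothetical name.

```javascript
// A model of a js->clj / clj->js round trip, NOT the real implementation.
// Assumption: only arrays and objects whose constructor is exactly Object are rebuilt.
function roundTrip(value) {
  if (Array.isArray(value)) {
    return value.map(roundTrip);
  }
  if (value !== null && typeof value === 'object' && value.constructor === Object) {
    const copy = {};
    for (const key of Object.keys(value)) {
      copy[key] = roundTrip(value[key]);
    }
    return copy;
  }
  return value; // primitives, class instances, and anything else pass through
}

class DummyObj {}
const literal = { x: 1 };
const instance = new DummyObj();
const nested = { inner: instance };

console.log(roundTrip(literal) === literal);       // false: a fresh copy
console.log(roundTrip(instance) === instance);     // true: same reference
console.log(roundTrip(nested).inner === instance); // true: nested instance survives
```

Under that assumption, literal objects come back as deep copies while class instances keep their identity, even when nested inside a literal.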

@CMCDragonkai
Author

CMCDragonkai commented Jan 25, 2018

@rauhs Just a clarification, does this mean datascript cannot index things that are not ordered (like using hash indexing)? I just tried it:

Error: Cannot compare [object Object] to [object Object]

@tonsky
Owner

tonsky commented Jan 25, 2018

Usually it can store incomparable values. You can’t store them in cardinality-many attributes, and you can’t make them indexed or unique. Otherwise it should be fine.

@CMCDragonkai
Author

CMCDragonkai commented Feb 1, 2018

I'm making an adapter to make sure all my object keys are given unique numbers so they can be indexed by datascript.

But I had a thought experiment as to whether datascript in the future could index JS objects. Well I found that other than ES6 Map and WeakMap, there's no other easy way to index object keys in JS. But I looked at Facebook's immutable.js codebase, and here's their implementation for "hashing" JS objects that can be used as keys in their Immutable Map and Ordered Map. https://github.com/facebook/immutable-js/blob/7f4e61601d92fc874c99ccf7734d6f33239cec8c/src/Hash.js#L85-L153
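
As a point of comparison, the WeakMap route mentioned above can give every object a stable numeric stand-in without touching the object itself. This is a minimal sketch only; it is neither the immutable.js implementation nor a DataScript feature, and `identityId` is a hypothetical helper.

```javascript
// Hypothetical helper: maps each object to a stable numeric id by identity.
const ids = new WeakMap();
let counter = 0;

function identityId(obj) {
  let id = ids.get(obj);
  if (id === undefined) {
    id = ++counter;   // first time this object is seen: assign the next number
    ids.set(obj, id); // WeakMap keys are compared by identity, not by contents
  }
  return id;
}

const a = { x: 1 };
const b = { x: 1 };

console.log(identityId(a) === identityId(a)); // true: same object, same id
console.log(identityId(a) === identityId(b)); // false: equal contents, different identity
```

The numbers are primitives, so they can be compared and sorted, and the WeakMap does not keep the objects alive once nothing else references them.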

Maybe a feature request for the future?

There's also a discussion about this feature: immutable-js/immutable-js#84 Previously immutable.js also couldn't store objects as keys, but after that commit, objects could be stored as keys for immutable sets, maps and orderedmap.

@tonsky
Owner

tonsky commented Feb 1, 2018

cool, thanks

@CMCDragonkai
Author

CMCDragonkai commented Feb 9, 2018

BTW @rauhs, even if I use object tagging to allow object keys to be indexed by proxy of the numeric tag, I still need to make sure my objects are class instantiated (not new Object(), as that doesn't work), because as demonstrated before, datascript copies literal objects on insertion. I just tried with the pull API, and it did this again. However, the entity API is strange: instead of giving back my object, it gives back some different kind of object (seems like another entity itself).

I think the docs should make clear that when inserting JS objects, if they are literal objects, they get copied, while if they are class instantiated objects, they are inserted by reference. This occurs even when the class instantiated objects are deeply nested.
#248 (comment)

I hope one day this feature will be made explicit: the ability to make sure even literal objects are stored by reference and not copied.


Found another hack to get referenced objects: Object.create(null) creates an object with an undefined constructor.
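
What such an object looks like can be checked with standard JavaScript alone. The sketch below does not touch DataScript; it only shows the properties that distinguish the object from a normal literal.

```javascript
const bare = Object.create(null);
bare.x = 1;

console.log(bare.constructor);            // undefined
console.log(Object.getPrototypeOf(bare)); // null
console.log(typeof bare.toString);        // 'undefined'
console.log(Object.keys(bare));           // [ 'x' ]
```

One trade-off: with no prototype, the object has no inherited methods, so code that calls bare.toString() or bare.hasOwnProperty() on it will throw.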


3 participants