model update doing two reads? #104

Open
jacobg opened this issue May 27, 2015 · 5 comments

Comments

@jacobg
Contributor

jacobg commented May 27, 2015

From looking at djangoappengine.db.compiler.SQLUpdateCompiler, it seems that an update reads a model entity from the datastore a second time in order to save. That is, if I have Django code like this:

my_entity = MyModel.objects.get(id=id) # reads from datastore
my_entity.my_field = 'something'
my_entity.save() # reads a second time, then writes to datastore

The first line of code obviously reads the model from the datastore. But based on the SQLUpdateCompiler.update_entity implementation, it looks like the model is read a second time, then updated with the changed values, and saved.

Am I understanding that correctly?

@aburgel
Member

aburgel commented May 27, 2015

SQLUpdateCompiler is not used when saving. It's used when calling update, which in SQL-land just does an update in place. That's not possible on App Engine, where everything has to be gets and puts. So SQLUpdateCompiler will do a get, then update the object, and then a put to save it.
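
To make the difference concrete, here is a minimal sketch that counts reads and writes for the two paths. It does not use djangoappengine or the real datastore API: FakeDatastore, update_in_place and save_instance are hypothetical names invented for this illustration, and the store is just an in-memory dict.

```python
class FakeDatastore:
    """Hypothetical in-memory stand-in for the datastore; counts gets and puts."""

    def __init__(self):
        self._rows = {}
        self.gets = 0
        self.puts = 0

    def get(self, key):
        self.gets += 1
        return dict(self._rows[key])  # return a copy, like a freshly read entity

    def put(self, key, entity):
        self.puts += 1
        self._rows[key] = dict(entity)


def update_in_place(store, key, **changes):
    """Emulates an update() call where no in-place update exists: get, modify, put."""
    entity = store.get(key)
    entity.update(changes)
    store.put(key, entity)


def save_instance(store, key, entity):
    """Emulates saving an instance that is already loaded: one put, no extra get."""
    store.put(key, entity)


store = FakeDatastore()
store.put(1, {"my_field": "old"})  # seed one entity (1 put so far)

# The update() path: one get plus one put.
update_in_place(store, 1, my_field="something")

# The get-then-save path from the question: the only read is the explicit get.
entity = store.get(1)
entity["my_field"] = "something else"
save_instance(store, 1, entity)

print(store.gets, store.puts)
```

In this sketch the update() path costs one get and one put, while the save path adds only a put on top of the read the caller already did.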

@jacobg
Contributor Author

jacobg commented May 27, 2015

Thanks Alex. Ok so I stepped through the debugger and can see that SQLInsertCompiler gets used for saves.

My next step (unrelated to the discussion so far) is I'd like in mapreduce to use the MutationPool to do batch updates and deletes. My initial thinking of the approach was to convert the Django model object to a GAE entity object, and then pass it to the pool. But now it seems perhaps inadvisable to mix the APIs like that. So I want to consider deriving a GAEMutationPool, modifying SQLInsertCompiler and SQLDeleteCompiler to somehow figure out that it should add the entity to the mutation pool instead of calling Put/Delete directly. But it looks like that involves some nasty overrides to mapreduce, as it doesn't look like it was designed to be customized. Any thoughts?
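
As a rough illustration of the idea (route writes into a pool that flushes in batches instead of calling put directly), here is a minimal sketch. It is not the mapreduce MutationPool or any djangoappengine code: FakeDatastore, BatchPool, write_entity and the max_size parameter are all hypothetical names made up for this example.

```python
class FakeDatastore:
    """Hypothetical in-memory store that counts batched write calls."""

    def __init__(self):
        self.rows = {}
        self.put_calls = 0

    def put_multi(self, entities):
        # One call writes every entity in the batch.
        self.put_calls += 1
        self.rows.update(entities)


class BatchPool:
    """Hypothetical pool: buffers writes and flushes them once max_size is reached."""

    def __init__(self, store, max_size=3):
        self.store = store
        self.max_size = max_size
        self._pending = {}

    def put(self, key, entity):
        self._pending[key] = entity
        if len(self._pending) >= self.max_size:
            self.flush()

    def flush(self):
        if self._pending:
            self.store.put_multi(self._pending)
            self._pending = {}


def write_entity(store, key, entity, pool=None):
    """Stand-in for the compiler's write step: use the pool if one was supplied."""
    if pool is not None:
        pool.put(key, entity)
    else:
        store.put_multi({key: entity})


store = FakeDatastore()
pool = BatchPool(store, max_size=3)

for i in range(7):
    write_entity(store, i, {"n": i}, pool=pool)
pool.flush()  # write whatever is still buffered

print(store.put_calls, len(store.rows))
```

With seven entities and a batch size of three, the sketch makes three write calls rather than seven. The open question in a real setup is how the compiler learns which pool to use.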

@jacobg
Contributor Author

jacobg commented May 27, 2015

Ok I have a solution: https://gist.github.com/jacobg/6a78a8f90e44c3a8993c

It's the path of least resistance, and is based on a "batch_op_class" hint that gets passed to the db compiler. A similar solution could be made for deletes. What do you think?

@aburgel
Member

aburgel commented May 28, 2015

Looks like a reasonable solution, but I'm not sure I'd want that merged in. I can't really tell if the App Engine mapreduce thing is still supported. They seem to be building new data pipeline tools in Java, and I imagine they'd want to follow the same pattern with Python.

@jacobg
Contributor Author

jacobg commented May 28, 2015

Thanks for taking a look at it.
