-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make pagination simpler? #640
Comments
Definitely not a crazy person. I would like this to be simpler as well.
var query = dataset.createQuery('Lion').limit(10);
dataset.runQuery(query, function(err, results, nextQuery) {}); To get all of the results, dataset.runQuery(query).on('data', function(entity) {
// entity.key = ...
// entity.data = ...
}); I don't like returning a function in the callback. The modular approach we have now is imo the best way to go. We return a prepared query that the user sends back to However, I think something like |
I didn't notice the Either way, it doesn't fix the issue of "I know this is going to be a boatload of results. I want to page through them. Please don't make me sad while doing it."
That's an overall limit, not a per-page limit. There's no way (AFAIK) to say "I want up to 500 results, in chunks of 10 at a time please". We just get chunks of N back (I don't know how big N is...)
That makes me sad. Let's look at this again, assuming I can have unlimited results coming back, I want to page through them all, but I might want to stop once I've found the one I want: (Disclaimer -- I have no idea if this even works -- but the fact that I'm having a bit of trouble getting this to do what I want is... telling. Either I'm not so hot in Javascript, or the pattern is a bit tricky.) var query = dataset.createQuery('Person');
var callback = function(err, entities, endCursor, apiResponse) {
for (var i in entities) {
var entity = entities[i];
if (entity['name'] == 'Stephen') {
console.log('Found him!');
return;
}
}
// Whoops -- didn't find him yet.. let's go again.
var nextQuery = query.start(endCursor);
dataset.runQuery(nextQuery, callback);
}
dataset.runQuery(query, callback); I can't define the callback in a simple way in-line, can I ? I'd totally much rather var query = dataset.createQuery('Person');
dataset.runQuery(query, function(err, entities, nextPage) {
for (var i in entities) {
var entity = entities[i];
if (entity['name'] == 'Stephen') {
console.log('Found him!');
return; // I'm done -- no more need to page through these results.
}
}
nextPage();
}); Where Couldn't we also make nextPage have a property called |
I can't address all of the issues at the moment, but just wanted to show some code. How it works currently: var query = dataset.createQuery('Person');
var callback = function(err, entities, nextQuery) {
for (var i in entities) {
var entity = entities[i];
if (entity['name'] == 'Stephen') {
console.log('Found him!');
return;
}
}
// Whoops -- didn't find him yet.. let's go again.
dataset.runQuery(nextQuery, callback);
}
dataset.runQuery(query, callback); I don't mind either way. At one point, I suggested the P.S. good luck finding me! |
Yea that seems super repetitive and makes me sad. I think we should reopen the discussion of |
So rather than: var query = dataset.createQuery('Person');
var callback = function(err, entities, nextQuery) {
// check if found, if so then return
// ...
// Else whoops -- didn't find him yet.. let's go again.
dataset.runQuery(nextQuery, callback);
}
dataset.runQuery(query, callback); We have: var query = dataset.createQuery('Person');
var callback = function(err, entities, nextPage) {
// check if found, if so then return
// ...
// Else whoops -- didn't find him yet.. let's go again.
nextPage();
}
dataset.runQuery(query, callback); That doesn't seem to make things much better, and now I'm unsure what happens once nextPage() returns because with no callback it looks like a synchronous call... I like the explicit passing of callback because it feels less magical, and as you can see in the first example above, it's really not all that more complicated. |
Wait, I think you're missing something. You can leave the second option's function anonymous and in-line it without repeating yourself...: var query = dataset.createQuery('Person');
dataset.runQuery(query, function(err, entities, nextPage) {
// check if found, if so then return
// ...
// Else whoops -- didn't find him yet.. let's go again.
nextPage();
}); Which is much nicer than the previous IMO.
It feels awfully repetitive to me, and requires me to change my pattern if I want to deal with more than one page. That is: if you want a single page of results (which we don't define how big a "page" is), in-line the callback function. If you want more than that... well, you have to pull your callback function out, name it, and then make sure you call the next query at the end of it, with itself as the callback. Note that our documentation only shows inlined callbacks -- aka, all of our documentation shows people how to deal with either 1 or 2 pages only -- never "page until you're done". If we were to in-line this to "page until you're done", taking the example from the var query = dataset.createQuery('Lion');
// Retrieve 5 companies.
transaction.runQuery(query, function(err, entities, endCursor, apiResponse) {
console.log("First page");
var callback = function(err, entities, endCursor, apiResponse) {
console.log("Next n pages...");
var nextQuery = query.start(endCursor);
transaction.runQuery(nextQuery, callback);
};
// Use `endCursor` as the starting cursor for your next query.
var nextQuery = query.start(endCursor);
transaction.runQuery(nextQuery, callback);
}); Which looks stupidly repetitive just to page through the results... It feels to me like we can do much better while still providing the level of control we want. |
It's not: var query = dataset.createQuery('Lion');
// Retrieve 5 companies.
transaction.runQuery(query, function(err, entities, endCursor, apiResponse) {
console.log("First page");
var callback = function(err, entities, endCursor, apiResponse) {
console.log("Next n pages...");
var nextQuery = query.start(endCursor);
transaction.runQuery(nextQuery, callback);
};
// Use `endCursor` as the starting cursor for your next query.
var nextQuery = query.start(endCursor);
transaction.runQuery(nextQuery, callback);
}); It's: var query = dataset.createQuery('Lion');
// Retrieve 5 companies.
var callback = function(err, entities, nextQuery, apiResponse) {
if (nextQuery) {
transaction.runQuery(nextQuery, callback);
}
};
transaction.runQuery(query, callback); We return an entire I'm with Ryan that a function call that doesn't take a callback is sync. That's just how every function call in JavaScript is, so it's in a JS programmer's memory. I think we could say "Well, not for us!" and do it our own way, but I don't know if that's even better in this case. Inline function vs named function isn't that much of a simplification win in my opinion. I think we can find a happy middle point by allowing an option on the query to get all of the results before executing the callback. For the user who doesn't care, and doesn't want to deal with streams, that seems perfect. Back to pagination, I was once told Datastore basically decided on the fly how many results it could return based on available resources at the time of request. So, we couldn't chunk results in a defined manner. |
If that's true, then our docs are broken :( See http://googlecloudplatform.github.io/gcloud-node/#/docs/master/datastore/dataset?method=runQuery
For me it's huge -- it means rearranging code, coming up with a name for the callback, and changing the flow of things. Instead of "ok runQuery -> here's the callback" it's exactly backwards "Here's a callback, now runQuery with it". One thing that might be useful would be to provide some more context, so I don't have to pre-define and name my callback: var query = dataset.createQuery('Lion');
transaction.runQuery(query, function(err, entities, nextQuery, apiResponse) {
if (nextQuery) {
transaction.runQuery(nextQuery, this.callback); // or just `this` ?
}
}); |
Our library shouldn't try to simplify the language, that's when things are magical and confusing. Naming a function that's going to be re-used makes sense. If they want it to run once, inline and set "runAllNextQueriesForMe: true". But no, our docs aren't broken. Just my head. I thought for sure we returned |
My argument is that it's not being reused in the way that I would name When we're not allowing anyone to authoritatively get all the results and call my anonymous function for those results and stop running when I want to stop (
Cool - no problem. @idosela just suggested something interesting. What if we provided a So the code might look like: var query = dataset.createQuery('Person');
transaction.runQueryUntil(query, function(err, entities, apiResponse) {
if (someMatchOn(entities)) {
console.log('Found him!');
return false;
}
}); or var query = dataset.createQuery('Person');
transaction.runQueryUntil(query, function(err, entities, done, apiResponse) {
if (someMatchOn(entities)) {
console.log('Found him!');
done();
}
}); Maybe we could introduce this into the var query = dataset.createQuery('Person');
transaction.runQuery(query).on('data', function(entity) {
if (entity.name == 'Stephen') {
return false;
}
}); If we introduce that -- I'd recommend that the |
Re the |
When running in stream mode, you can end it with var queryStream = dataset.runQuery(query);
// ... later
queryStream.end(); That's all we do when the results are fully retrieved: https://github.com/GoogleCloudPlatform/gcloud-node/blob/d2883f6b64c9d9d372e6d4da683654664aa39a08/lib/datastore/request.js#L559
I'm on board. RE: dataset.runQuery(query).on('data', function (entity) {
if (entity.something) {
// got it!
this.end();
}
}); We can definitely update our docs to show this type of flow.
Even if they could choose the page size, they would still have to control if and when to get the next page's results, right? And I don't think we agree on the anonymous usage here. It's named so that they can call it again, if they choose. Here's what we're looking at: // 1.
var onResultSet = function(err, results, nextQuery) {
if (results[0].name === 'JJ') {
console.log('Found JJ!');
return;
}
dataset.runQuery(nextQuery, onResultSet);
};
var query = dataset.createQuery('Person');
dataset.runQuery(query, onResultSet);
// 2.
var query = dataset.createQuery('Person');
dataset.runQuery(query, function(err, results, continueRunningQuery) {
if (results[0].name === 'JJ') {
console.log('Found JJ!');
return;
}
continueRunningQuery();
}); But as I said before, I'm fine with either. As a user, I would prefer to write in the first style. The second one is truly more confusing for me. I wouldn't expect a function to re-run when I execute some other variable. That feels like we're trying to out-smart the language, and sure, we can try, but for a developer who writes in that language, they get confused by this. They know how to name functions and call them again. If we go with the second option, a good variable name in place of |
I am 100% on board for the following: dataset.runQuery(query).on('data', function (entity) {
if (entity.something) {
// got it!
this.end();
}
}); It is clear, concise, and makes pagination not really an issue.
The point I was making was that option 2 above means I can define the logic in-line, no need to choose a name. If I was to use something in multiple places (as in I want to use it in multiple places), then I should be required to name it. I don't want to use it multiple times here -- I'm forced to because of an implementation detail in our API. That IMO means I shouldn't have to name it... |
Worth noting, we use the nextQuery pattern quite a bit in the library, e.g. The most sane and modular solution to this problem is the one where we hand back a prepared query. The user can run that query and provide an entirely different callback if they wanted / queue it to run later / whatever. But with both of our preferred solutions to this problem, the user can do the same exact things. Who can help make the final call? |
Are you referring to the call between sending back |
Oh sorry, I meant as far as providing a function to get the next results vs a Query object with the cursor set for the user to continue paging. Everything in your last comment I'm +1 on. |
I think if we provide (and recommend) that the way you page through results is with |
🤖 I have created a release \*beep\* \*boop\* --- ## [6.2.0](https://www.github.com/googleapis/nodejs-translate/compare/v6.1.0...v6.2.0) (2021-04-07) ### Features * added v3beta1 proto for online and batch document translation ([#639](https://www.github.com/googleapis/nodejs-translate/issues/639)) ([513c21a](https://www.github.com/googleapis/nodejs-translate/commit/513c21aabf2ece278708bca56a94bde33e80f072)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
(Going to use Datastore as the example)
In the docs from
Datastore.runQuery
Seems like in datastore there's no way to set the page size, however even so, it seems that paging is super manual here, which bums me out.
What if we could make it so that...
Or even more explicitly, let the user tell you "just get them all please":
?
Let me know if I'm a crazy person here... @ryanseys @stephenplusplus We're having a similar discussion about pagination being explicit versus implicit on googleapis/google-cloud-python#895
The text was updated successfully, but these errors were encountered: