Paginated queries are too slow #110

ricardobeat · 2013-06-03T08:43:56Z

The can fetch a paginated set, with various options test is timing out on mocha (see #108). Takes 3-4 seconds on my machine.

{"page":2,"limit":15,"where":{"status":"published"},"status":"published"}
query: 120ms
count: 93ms
1: 217ms
{"page":5,"limit":15,"where":{"status":"published"},"status":"published"}
query: 513ms
count: 594ms
2: 1111ms
{"page":1,"limit":30,"where":{"status":"published"},"status":"published"}
query: 277ms
count: 198ms
{"page":2,"limit":10,"where":{"language":"fr","status":"published"},"status":"published"}
query: 212ms
count: 280ms
{"page":2,"limit":10,"where":{},"status":"all"}
query: 52ms
count: 7ms

The reason is the way OFFSET/LIMIT works in sqlite, loading everything into memory. The manual warns Do not try to implement a scrolling window using LIMIT and OFFSET and recommends using a comparison instead: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

The text was updated successfully, but these errors were encountered:

ricardobeat · 2013-06-03T08:44:24Z

[6/3/13 5:15:28 AM] ricardosub: apparently SQLite loads all selected fields into memory when using an offset
[6/3/13 5:15:47 AM] ricardosub: so the pagination queries should probably omit the content fields
[6/3/13 5:16:19 AM] Gabor Javorszky: imagine TNW being loaded into memory
[6/3/13 5:16:20 AM] Gabor Javorszky: :D
[6/3/13 5:17:13 AM] Hannah Wolfe: Interesting, we would need the content fields, so would end up doing n=limit extra queries right?
[6/3/13 5:17:43 AM] ricardosub: in the post listings you can lazy-load content
[6/3/13 5:18:05 AM] ricardosub: in the admin at least
[6/3/13 5:18:21 AM] ricardosub: but if the information is correct, doing the n extra queries would be faster than loading everything within the offset query
[6/3/13 5:18:48 AM] Hannah Wolfe: Admin performance is not as important as frontend performance - and the themes would expect to get content_html at least
[6/3/13 5:19:38 AM] Hannah Wolfe: although don't get me wrong, a well-performing admin is certainly prefered over one that runs badly
[6/3/13 5:20:56 AM] ricardosub: another options is WHERE date>:lastdate instead of offsets
[6/3/13 5:21:03 AM] ricardosub: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor - What Not To Do section
[6/3/13 5:21:22 AM] ricardosub: Do not try to implement a scrolling window using LIMIT and OFFSET
[6/3/13 5:22:12 AM] ricardosub: that sounds like the best thing to do
[6/3/13 5:22:56 AM] Hannah Wolfe: very interesting
[6/3/13 5:24:14 AM] ricardosub: but you have to keep state in the app... so if you jump directly to a page it needs to first do an OFFSET query selecting only the dates, and then proceed to fetch the posts
[6/3/13 5:36:57 AM] Hannah Wolfe: Get max(date) from blah limit blah should be a quick query I hope

ErisDS · 2013-06-03T09:09:29Z

I have added extra timeout to the test which should be removed once this work is completed.

JohnONolan · 2013-06-03T09:11:48Z

Can we standardise code based "todo's" ? Want it to be easy to bulk search a codebase and find all of them, so they're being tracked.

My preference is:
// TODO: [comment]

tgriesser · 2013-06-03T15:41:46Z

Would a solution here be to do a (pseudocode):

SELECT "id" from "tableName" WHERE -conditionals- ORDER BY -column-

and then implement pagination based on the ids:

SELECT * FROM "tableName" WHERE -conditionals- AND "id" > x ORDER BY -column- LIMIT N

If so, we could then probably cache the result of the "id" select statement somewhere and then invalidate when modifications are made to that table.

re: @JohnONolan - that looks good for todo's syntax.

ErisDS · 2013-06-03T15:57:46Z

We can't rely on id, but we can rely on publish date - which is the order posts should pretty much always be in.

tgriesser · 2013-06-03T16:02:46Z

I think we should be able to rely on id here, the publish_date would just be the value for the ORDER BY clause.

ricardobeat · 2013-06-03T22:23:43Z

@tgriesser hannah has a point, since we are considering using UUIDs they wouldn't be sortable, unless we use a non-standard generator for that purpose: http://50.112.33.48:3000/posts/7AQtgRtqJPCeq42b6.

There are also many situations where ids will be out-of-order: programmed publish dates, imported posts, updated posts.

tgriesser · 2013-06-03T22:39:43Z

Oh right, wasn't really thinking when I wrote that...

tgriesser · 2013-06-03T22:42:19Z

Or maybe what I was really thinking of here is that you could get/cache the ID's as the result of a particular query and then do a WHERE "id" IN ('ids') and specify the subset of IDs you're looking to retrieve results for.

ErisDS · 2013-06-04T07:16:36Z

So we can either use published date (which doesn't exist yet btw), or we can use some sort of query cache to remember ids for queries.

What are the pros and cons of each? To me, published date seems like a relatively light-weight solution - but very specific, meanwhile caching seems like a more generic solution that we might be able to apply to all queries to keep things running fast.

I am wondering if we could / should do both. We are likely to want to do sqlite specific query optimisations, so implementing these so that they are only for sqlite in future when we have options is another problem to consider, but not one that needs solving now.

I think at this stage we should build out more of the core functionality, getting started on more complex query building, and then when we have a clearer idea of what this looks like and which bits are slow we could add a cache layer/and or optimisations.

tgriesser · 2013-06-04T12:36:15Z

The problem with just using published date, is that there's no context of a "page" of posts. For example there's no way to immediately browse to page "4" of your posts, because without knowing the total number of posts (filtered / ordered by conditions) and their dates there's no way to tell what date you would use to define the beginning of page "4".

Like your suggestion I'd say let's build out more core functionality, but let's also not try to use the published date as a the pagination, as that's probably an overly simple solution for what we need. We're also not likely to run into this as an issue in the very immediate future (other than the tests), until we start to have a lot of posts.

I've been planning on building some pagination support into a widget for Bookshelf anyway, so maybe I could account for the sqlite case in that and that we could sort of drop that in place when it's done.

ErisDS · 2013-06-04T13:23:28Z

I understand that replacing offset with date requires state, or an initial query to figure out which date to start at for a particular page. This sort of specialisation / optimisation is troublesome, but I imagine that we will want to do quite a lot of database specific query munging, especially if we want to support different databases.

Therefore, having one widget/helper to do pagination isn't really solving the underlying issue of needing to support advanced, specialised behaviour. At the same time, it is an enormous problem and certainly one to keep rattling around at the back of our brains whilst developing features.

One short term micro-optimisation, would be to only fetch content or content_html depending on whether we were in the admin or frontend. Not sure how that might work though.

ErisDS · 2013-07-09T09:34:27Z

Not sure if this has happened for anyone else, but for me the pagination tests are now pretty much always taking 4-5 seconds. Granted that test requires several pagination queries to run, but the fact that it is slowing down over time is quite worrying.

I'm assuming changes to SQLite are at fault, and not changes in our codebase... but it would be good to get some sort of profiling going so we can be sure we're not making things worse.

It would be good to start looking into how we could address this issue, either through bookshelf or changes to Ghost.

ErisDS · 2013-09-01T23:49:36Z

@tgriesser any further thoughts on building a pagination module into bookshelf / knex?

Otherwise, @ricardobeat is this something you might like to take a stab at?

ErisDS · 2013-09-08T14:51:56Z

Punted to 0.4

…is done

* master: Adding proper copyright info for Ghost Foundation Amending pagination test to have a longer timeout until TryGhost#110 is done server half of TryGhost#83, posts are draft by default, browse shows published by default Adding proper copyright info for Ghost Foundation

ErisDS · 2014-11-03T16:13:40Z

I think this is going to be solved by upgrading to knex 0.7 although there are some considerations

jaswilli · 2014-12-28T18:55:45Z

I'm going to close this as we're now using knex 0.7 and at this point I'm not sure what specifically this issue is tracking.

There is a bug in these version where Portal does not load correctly after a successful Stripe Checkout. We are reverting until we can determine the cause of the bug and fix it * Revert "v0.11.1" This reverts commit 828c4d5. * Revert "Fixed incorrect link/path handling" This reverts commit 9d853be. * Revert "v0.11.0" This reverts commit 16c2224. * Revert "Updated portal direct checkout links to use path" This reverts commit d26fad1. * Revert "Added direct links for monthly/yearly checkout" This reverts commit bbea4f7. * Revert "Added user select style to plan container" This reverts commit a665ca5. * Revert "Refined copy" This reverts commit 32d4949. * Revert "Account home page refinements" This reverts commit 6587eca. * Revert "Notification refinements" This reverts commit 23c75e3. * Revert "Added name to welcome notification" This reverts commit 7220049. * Revert "Refined notification" This reverts commit d1c0915.

This reverts commit 741bfb6. The revert was done to resolve an urgent bug fix which was breaking Portal script in case of any notification

ErisDS added a commit that referenced this issue Jun 3, 2013

Amending pagination test to have a longer timeout until #110 is done

278e0cf

ErisDS mentioned this issue Jul 14, 2013

[EPIC] Advanced post features #271

Closed

14 tasks

ErisDS mentioned this issue Sep 4, 2013

Content management screen loading takes too long #395

Closed

ErisDS mentioned this issue Mar 18, 2014

Posts API Response #2347

Closed

ErisDS added the help wanted label Jul 16, 2014

ErisDS mentioned this issue Sep 2, 2014

Changelog for 0.7 knex/knex#460

Closed

morficus pushed a commit to morficus/Ghost that referenced this issue Sep 4, 2014

Amending pagination test to have a longer timeout until TryGhost#110 …

f3bbeeb

…is done

ErisDS modified the milestones: Next Backlog, Future Backlog Nov 3, 2014

jaswilli closed this as completed Dec 28, 2014

snyk-bot mentioned this issue Apr 7, 2022

[Snyk] Fix for 1 vulnerabilities amgir/Ghost#56

Open

snyk-bot mentioned this issue Jun 19, 2022

[Snyk] Fix for 1 vulnerabilities amgir/Ghost#62

Open

daniellockyer pushed a commit that referenced this issue Oct 5, 2022

Revert "Revert to 0.10.17 (#110)"

28ed0fb

This reverts commit 741bfb6. The revert was done to resolve an urgent bug fix which was breaking Portal script in case of any notification

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Paginated queries are too slow #110

Paginated queries are too slow #110

ricardobeat commented Jun 3, 2013

ricardobeat commented Jun 3, 2013

ErisDS commented Jun 3, 2013

JohnONolan commented Jun 3, 2013

tgriesser commented Jun 3, 2013

ErisDS commented Jun 3, 2013

tgriesser commented Jun 3, 2013

ricardobeat commented Jun 3, 2013

tgriesser commented Jun 3, 2013

tgriesser commented Jun 3, 2013

ErisDS commented Jun 4, 2013

tgriesser commented Jun 4, 2013

ErisDS commented Jun 4, 2013

ErisDS commented Jul 9, 2013

ErisDS commented Sep 1, 2013

ErisDS commented Sep 8, 2013

ErisDS commented Nov 3, 2014

jaswilli commented Dec 28, 2014

Paginated queries are too slow #110

Paginated queries are too slow #110

Comments

ricardobeat commented Jun 3, 2013

ricardobeat commented Jun 3, 2013

ErisDS commented Jun 3, 2013

JohnONolan commented Jun 3, 2013

tgriesser commented Jun 3, 2013

ErisDS commented Jun 3, 2013

tgriesser commented Jun 3, 2013

ricardobeat commented Jun 3, 2013

tgriesser commented Jun 3, 2013

tgriesser commented Jun 3, 2013

ErisDS commented Jun 4, 2013

tgriesser commented Jun 4, 2013

ErisDS commented Jun 4, 2013

ErisDS commented Jul 9, 2013

ErisDS commented Sep 1, 2013

ErisDS commented Sep 8, 2013

ErisDS commented Nov 3, 2014

jaswilli commented Dec 28, 2014