Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paginated queries are too slow #110

Closed
ricardobeat opened this issue Jun 3, 2013 · 17 comments
Closed

Paginated queries are too slow #110

ricardobeat opened this issue Jun 3, 2013 · 17 comments
Labels
bug [triage] something behaving unexpectedly help wanted [triage] Ideal issues for contributors to help with
Milestone

Comments

@ricardobeat
Copy link
Contributor

The can fetch a paginated set, with various options test is timing out on mocha (see #108). Takes 3-4 seconds on my machine.

{"page":2,"limit":15,"where":{"status":"published"},"status":"published"}
query: 120ms
count: 93ms
1: 217ms
{"page":5,"limit":15,"where":{"status":"published"},"status":"published"}
query: 513ms
count: 594ms
2: 1111ms
{"page":1,"limit":30,"where":{"status":"published"},"status":"published"}
query: 277ms
count: 198ms
{"page":2,"limit":10,"where":{"language":"fr","status":"published"},"status":"published"}
query: 212ms
count: 280ms
{"page":2,"limit":10,"where":{},"status":"all"}
query: 52ms
count: 7ms

The reason is the way OFFSET/LIMIT works in sqlite, loading everything into memory. The manual warns Do not try to implement a scrolling window using LIMIT and OFFSET and recommends using a comparison instead: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

@ricardobeat
Copy link
Contributor Author

[6/3/13 5:15:28 AM] ricardosub: apparently SQLite loads all selected fields into memory when using an offset
[6/3/13 5:15:47 AM] ricardosub: so the pagination queries should probably omit the content fields
[6/3/13 5:16:19 AM] Gabor Javorszky: imagine TNW being loaded into memory
[6/3/13 5:16:20 AM] Gabor Javorszky: :D
[6/3/13 5:17:13 AM] Hannah Wolfe: Interesting, we would need the content fields, so would end up doing n=limit extra queries right?
[6/3/13 5:17:43 AM] ricardosub: in the post listings you can lazy-load content
[6/3/13 5:18:05 AM] ricardosub: in the admin at least
[6/3/13 5:18:21 AM] ricardosub: but if the information is correct, doing the n extra queries would be faster than loading everything within the offset query
[6/3/13 5:18:48 AM] Hannah Wolfe: Admin performance is not as important as frontend performance - and the themes would expect to get content_html at least
[6/3/13 5:19:38 AM] Hannah Wolfe: although don't get me wrong, a well-performing admin is certainly prefered over one that runs badly
[6/3/13 5:20:56 AM] ricardosub: another options is WHERE date>:lastdate instead of offsets
[6/3/13 5:21:03 AM] ricardosub: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor - What Not To Do section
[6/3/13 5:21:22 AM] ricardosub: Do not try to implement a scrolling window using LIMIT and OFFSET
[6/3/13 5:22:12 AM] ricardosub: that sounds like the best thing to do
[6/3/13 5:22:56 AM] Hannah Wolfe: very interesting
[6/3/13 5:24:14 AM] ricardosub: but you have to keep state in the app... so if you jump directly to a page it needs to first do an OFFSET query selecting only the dates, and then proceed to fetch the posts
[6/3/13 5:36:57 AM] Hannah Wolfe: Get max(date) from blah limit blah should be a quick query I hope

@ErisDS
Copy link
Member

ErisDS commented Jun 3, 2013

I have added extra timeout to the test which should be removed once this work is completed.

@JohnONolan
Copy link
Member

Can we standardise code based "todo's" ? Want it to be easy to bulk search a codebase and find all of them, so they're being tracked.

My preference is:
// TODO: [comment]

@tgriesser
Copy link
Contributor

Would a solution here be to do a (pseudocode):

SELECT "id" from "tableName" WHERE -conditionals- ORDER BY -column- 

and then implement pagination based on the ids:

SELECT * FROM "tableName" WHERE -conditionals- AND "id" > x ORDER BY -column- LIMIT N

If so, we could then probably cache the result of the "id" select statement somewhere and then invalidate when modifications are made to that table.

re: @JohnONolan - that looks good for todo's syntax.

@ErisDS
Copy link
Member

ErisDS commented Jun 3, 2013

We can't rely on id, but we can rely on publish date - which is the order posts should pretty much always be in.

@tgriesser
Copy link
Contributor

I think we should be able to rely on id here, the publish_date would just be the value for the ORDER BY clause.

@ricardobeat
Copy link
Contributor Author

@tgriesser hannah has a point, since we are considering using UUIDs they wouldn't be sortable, unless we use a non-standard generator for that purpose: http://50.112.33.48:3000/posts/7AQtgRtqJPCeq42b6.

There are also many situations where ids will be out-of-order: programmed publish dates, imported posts, updated posts.

@tgriesser
Copy link
Contributor

Oh right, wasn't really thinking when I wrote that...

@tgriesser
Copy link
Contributor

Or maybe what I was really thinking of here is that you could get/cache the ID's as the result of a particular query and then do a WHERE "id" IN ('ids') and specify the subset of IDs you're looking to retrieve results for.

@ErisDS
Copy link
Member

ErisDS commented Jun 4, 2013

So we can either use published date (which doesn't exist yet btw), or we can use some sort of query cache to remember ids for queries.

What are the pros and cons of each? To me, published date seems like a relatively light-weight solution - but very specific, meanwhile caching seems like a more generic solution that we might be able to apply to all queries to keep things running fast.

I am wondering if we could / should do both. We are likely to want to do sqlite specific query optimisations, so implementing these so that they are only for sqlite in future when we have options is another problem to consider, but not one that needs solving now.

I think at this stage we should build out more of the core functionality, getting started on more complex query building, and then when we have a clearer idea of what this looks like and which bits are slow we could add a cache layer/and or optimisations.

@tgriesser
Copy link
Contributor

The problem with just using published date, is that there's no context of a "page" of posts. For example there's no way to immediately browse to page "4" of your posts, because without knowing the total number of posts (filtered / ordered by conditions) and their dates there's no way to tell what date you would use to define the beginning of page "4".

Like your suggestion I'd say let's build out more core functionality, but let's also not try to use the published date as a the pagination, as that's probably an overly simple solution for what we need. We're also not likely to run into this as an issue in the very immediate future (other than the tests), until we start to have a lot of posts.

I've been planning on building some pagination support into a widget for Bookshelf anyway, so maybe I could account for the sqlite case in that and that we could sort of drop that in place when it's done.

@ErisDS
Copy link
Member

ErisDS commented Jun 4, 2013

I understand that replacing offset with date requires state, or an initial query to figure out which date to start at for a particular page. This sort of specialisation / optimisation is troublesome, but I imagine that we will want to do quite a lot of database specific query munging, especially if we want to support different databases.

Therefore, having one widget/helper to do pagination isn't really solving the underlying issue of needing to support advanced, specialised behaviour. At the same time, it is an enormous problem and certainly one to keep rattling around at the back of our brains whilst developing features.

One short term micro-optimisation, would be to only fetch content or content_html depending on whether we were in the admin or frontend. Not sure how that might work though.

@ErisDS
Copy link
Member

ErisDS commented Jul 9, 2013

Not sure if this has happened for anyone else, but for me the pagination tests are now pretty much always taking 4-5 seconds. Granted that test requires several pagination queries to run, but the fact that it is slowing down over time is quite worrying.

I'm assuming changes to SQLite are at fault, and not changes in our codebase... but it would be good to get some sort of profiling going so we can be sure we're not making things worse.

It would be good to start looking into how we could address this issue, either through bookshelf or changes to Ghost.

@ErisDS
Copy link
Member

ErisDS commented Sep 1, 2013

@tgriesser any further thoughts on building a pagination module into bookshelf / knex?

Otherwise, @ricardobeat is this something you might like to take a stab at?

@ErisDS
Copy link
Member

ErisDS commented Sep 8, 2013

Punted to 0.4

morficus pushed a commit to morficus/Ghost that referenced this issue Sep 4, 2014
morficus pushed a commit to morficus/Ghost that referenced this issue Sep 4, 2014
* master:
  Adding proper copyright info for Ghost Foundation
  Amending pagination test to have a longer timeout until TryGhost#110 is done
  server half of TryGhost#83, posts are draft by default, browse shows published by default
  Adding proper copyright info for Ghost Foundation
@ErisDS
Copy link
Member

ErisDS commented Nov 3, 2014

I think this is going to be solved by upgrading to knex 0.7 although there are some considerations

@ErisDS ErisDS modified the milestones: Next Backlog, Future Backlog Nov 3, 2014
@jaswilli
Copy link
Contributor

I'm going to close this as we're now using knex 0.7 and at this point I'm not sure what specifically this issue is tracking.

daniellockyer pushed a commit that referenced this issue Oct 5, 2022
There is a bug in these version where Portal does not load correctly after a successful
Stripe Checkout. We are reverting until we can determine the cause of the bug and fix it

* Revert "v0.11.1"

This reverts commit 828c4d5.

* Revert "Fixed incorrect link/path handling"

This reverts commit 9d853be.

* Revert "v0.11.0"

This reverts commit 16c2224.

* Revert "Updated portal direct checkout links to use path"

This reverts commit d26fad1.

* Revert "Added direct links for monthly/yearly checkout"

This reverts commit bbea4f7.

* Revert "Added user select style to plan container"

This reverts commit a665ca5.

* Revert "Refined copy"

This reverts commit 32d4949.

* Revert "Account home page refinements"

This reverts commit 6587eca.

* Revert "Notification refinements"

This reverts commit 23c75e3.

* Revert "Added name to welcome notification"

This reverts commit 7220049.

* Revert "Refined notification"

This reverts commit d1c0915.
daniellockyer pushed a commit that referenced this issue Oct 5, 2022
This reverts commit 741bfb6.

The revert was done to resolve an urgent bug fix which was breaking Portal script in case of any notification
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug [triage] something behaving unexpectedly help wanted [triage] Ideal issues for contributors to help with
Projects
None yet
Development

No branches or pull requests

5 participants