Database Indexes to speed up the requests #510

Closed
1 task done
enaut opened this issue May 30, 2019 · 41 comments · Fixed by #637

Comments

@enaut commented May 30, 2019

Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)

Is your feature request related to a problem? Please describe.
The problem is: loading the main Nextcloud News page takes about 15+ seconds. After enabling the slow query log on the server, I found that each page load triggers several queries that take 5+ seconds on the database. By adding some indexes I was able to cut those in half, or in some cases even from 5 sec down to 0.002 sec.

It might be slow for me because I like to archive and have 88720 news items stored.

Describe the solution you'd like
It would help a lot if the indexes were added when installing/updating the app.

I'm not good with PHP, nor with MySQL, which is why I had the possible indexes generated by some web tool.

I found that:

ALTER TABLE `oc_news_items` ADD INDEX `oc_news_items_idx_starred_id` (`starred`,`feed_id`);

makes the query that counts the starred items take 0.002 sec instead of 5 sec.

All the indexes I added are:

ALTER TABLE `oc_news_feeds` ADD INDEX `oc_news_feeds_idx_at_id` (`deleted_at`,`user_id`);
ALTER TABLE `oc_news_items` ADD INDEX `oc_news_items_idx_unread_id_id` (`unread`,`feed_id`,`id`);
ALTER TABLE `oc_news_feeds` ADD INDEX `oc_news_feeds_idx_id_at_id` (`folder_id`,`deleted_at`,`user_id`);
ALTER TABLE `oc_news_folders` ADD INDEX `oc_news_folders_idx_at` (`deleted_at`);
ALTER TABLE `oc_news_items` ADD INDEX `oc_news_items_idx_starred_id` (`starred`,`feed_id`);
ALTER TABLE `oc_news_items` ADD INDEX `oc_news_items_idx_unread_id` (`unread`,`feed_id`);
ALTER TABLE `oc_news_feeds` ADD INDEX `oc_news_feeds_idx_user_id_at_id` (`user_id`,`deleted_at`,`id`);

As said, I'm not good with any of this, which is why it is probably best if someone else actually does the optimizing, but with these indexes I was able to cut the page load time in half in the Nextcloud News web interface, and by a lot more in the Nextcloud News app (before 30 sec, now 1 sec).
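For reference, the starred count comes from a query shaped roughly like the following (a hypothetical sketch only; the exact statement the app issues may differ):

SELECT COUNT(*) FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' WHERE `items`.`starred` = 1; -- an index on (starred, feed_id) can serve both the filter and the join column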

@enaut (Author) commented Jun 3, 2019

Hmm, I don't get why, but the queries were getting slower and slower with the indexes, so maybe those are not so good - I don't know, the first results were really good though. Do I have to update the indexes "manually"?

@marco44 (Contributor) commented Feb 20, 2020

Hi,

Is this still an issue? As a full-time DBA, I could give a hand.

@Grotax (Member) commented Feb 20, 2020

Yeah, nothing has changed regarding this.
Would be cool if you could help.

@enaut (Author) commented Feb 20, 2020

@marco44 yes, the issue is still there: opening the News app takes about 20 seconds to load (no supercomputer here ;) but I think this could be way faster.

@marco44 (Contributor) commented Feb 21, 2020

Ok. The thing is, I don't have any performance problem here, so we'll need to find out what goes wrong with your database. We need a way to identify which queries take time in your environment. There's much better tooling for this with PostgreSQL, but we'll do with what we have… so you need to activate slow query logging on your system, see https://blog.toadworld.com/2017/08/09/logging-and-analyzing-slow-queries-in-mysql for instance.

The best approach may be to set long_query_time to 0, do a run, then turn it off again (you don't want gigabytes of log), and then find out what took time in the log. Then we'll know what we need to optimize (mysqldumpslow may help here).
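For instance, a minimal sketch of toggling it at runtime (MySQL/MariaDB; the file path is only an example, it must be writable by the server, and SET GLOBAL needs the SUPER privilege):

SET GLOBAL slow_query_log_file = '/var/log/mariadb/slow-queries.log'; -- example path
SET GLOBAL long_query_time = 0;   -- log every statement for one run
SET GLOBAL slow_query_log = 'ON';
-- reproduce one slow page load, then switch logging back off:
SET GLOBAL slow_query_log = 'OFF';
SET GLOBAL long_query_time = 10;  -- back to the default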

@marco44 (Contributor) commented Feb 21, 2020

By the way, the first query on my site is this:

nextcloud=# explain analyze SELECT MAX("items"."id") AS "max_id" FROM "oc_news_items" "items" JOIN "oc_news_feeds" "feeds" ON "feeds"."id" = "items"."feed_id" AND "feeds"."user_id" = 'marc';
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4177.16..4177.17 rows=1 width=8) (actual time=28.121..28.121 rows=1 loops=1)
   ->  Hash Join  (cost=9.17..4109.79 rows=26948 width=8) (actual time=0.133..25.846 rows=26383 loops=1)
         Hash Cond: (items.feed_id = feeds.id)
         ->  Seq Scan on oc_news_items items  (cost=0.00..4024.75 rows=27575 width=16) (actual time=0.010..5.035 rows=27575 loops=1)
         ->  Hash  (cost=8.10..8.10 rows=86 width=8) (actual time=0.096..0.096 rows=86 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Seq Scan on oc_news_feeds feeds  (cost=0.00..8.10 rows=86 width=8) (actual time=0.031..0.073 rows=86 loops=1)
                     Filter: ((user_id)::text = 'marc'::text)
                     Rows Removed by Filter: 2
 Planning Time: 0.337 ms
 Execution Time: 28.214 ms
(11 rows)

That's a query that many SQL engines won't be able to rewrite smartly. This way of writing it is much more efficient:

nextcloud=# explain analyze select  items.id  FROM "oc_news_items" "items" JOIN "oc_news_feeds" "feeds" ON "feeds"."id" = "items"."feed_id" AND "feeds"."user_id" = 'marc' order by id desc limit 1;
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.92 rows=1 width=8) (actual time=0.061..0.061 rows=1 loops=1)
   ->  Nested Loop  (cost=0.43..13193.78 rows=26948 width=8) (actual time=0.058..0.058 rows=1 loops=1)
         ->  Index Scan Backward using oc_news_items_pkey on oc_news_items items  (cost=0.29..8683.28 rows=27575 width=16) (actual time=0.034..0.034 rows=1 loops=1)
         ->  Index Scan using oc_news_feeds_pkey on oc_news_feeds feeds  (cost=0.14..0.16 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=1)
               Index Cond: (id = items.feed_id)
               Filter: ((user_id)::text = 'marc'::text)
 Planning Time: 0.980 ms
 Execution Time: 0.141 ms
(8 rows)

This comes from the fact that "max" is an aggregate function like any other (count, avg, etc.); SQL optimizers only special-case it in trivial situations (when there is no join, for instance). With the ORDER BY + LIMIT we are explicit about what we want. I don't know whether MySQL can benefit from this too, but in my case it's a very big gain.
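For MariaDB/MySQL the same rewrite would look roughly like this (just a sketch in backtick syntax; whether its optimizer actually takes advantage of it is exactly what needs testing):

SELECT `items`.`id` AS `max_id` FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`user_id` = 'marc' ORDER BY `items`.`id` DESC LIMIT 1;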

@marco44 (Contributor) commented Feb 21, 2020

Another query that could be optimized (the 2nd one in my environment):

SELECT "items".* FROM "oc_news_items" "items" JOIN "oc_news_feeds" "feeds" ON "feeds"."id" = "items"."feed_id" AND "feeds"."deleted_at" = 0 AND "feeds"."user_id" = 'marc' AND "items"."unread" = '1' LEFT OUTER JOIN "oc_news_folders" "folders" ON "folders"."id" = "feeds"."folder_id" WHERE "feeds"."folder_id" = 0 OR "folders"."deleted_at" = 0 ORDER BY "items"."id" DESC LIMIT 40;

Two things can improve it without too much change:

nextcloud=# explain analyze SELECT "items".* FROM "oc_news_items" "items" JOIN "oc_news_feeds" "feeds" ON "feeds"."id" = "items"."feed_id" AND "feeds"."deleted_at" = 0 AND "feeds"."user_id" = 'marc' AND "items"."unread" = '1' LEFT OUTER JOIN "oc_news_folders" "folders" ON "folders"."id" = "feeds"."folder_id" WHERE "feeds"."folder_id" = 0 OR "folders"."deleted_at" = 0 ORDER BY "items"."id" DESC LIMIT 40;
                                                                                  QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..381.57 rows=40 width=1188) (actual time=0.047..32.401 rows=2 loops=1)
   ->  Nested Loop Left Join  (cost=0.43..9004.89 rows=945 width=1188) (actual time=0.046..32.399 rows=2 loops=1)
         Join Filter: (folders.id = feeds.folder_id)
         Rows Removed by Join Filter: 7
         Filter: ((feeds.folder_id = 0) OR (folders.deleted_at = 0))
         ->  Nested Loop  (cost=0.43..8885.68 rows=945 width=1196) (actual time=0.030..32.379 rows=2 loops=1)
               ->  Index Scan Backward using oc_news_items_pkey on oc_news_items items  (cost=0.29..8690.76 rows=967 width=1188) (actual time=0.021..29.950 rows=967 loops=1)
                     Filter: unread
                     Rows Removed by Filter: 26616
               ->  Index Scan using oc_news_feeds_pkey on oc_news_feeds feeds  (cost=0.14..0.20 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=967)
                     Index Cond: (id = items.feed_id)
                     Filter: ((deleted_at = 0) AND ((user_id)::text = 'marc'::text))
                     Rows Removed by Filter: 1
         ->  Materialize  (cost=0.00..1.10 rows=7 width=16) (actual time=0.006..0.008 rows=4 loops=2)
               ->  Seq Scan on oc_news_folders folders  (cost=0.00..1.07 rows=7 width=16) (actual time=0.009..0.011 rows=6 loops=1)
 Planning Time: 0.612 ms
 Execution Time: 32.466 ms
(17 rows)
create index test_marc on oc_news_items (unread,id);
CREATE INDEX
# explain analyze SELECT "items".* FROM "oc_news_items" "items" JOIN "oc_news_feeds" "feeds" ON "feeds"."id" = "items"."feed_id" AND "feeds"."deleted_at" = 0 AND "feeds"."user_id" = 'marc' AND "items"."unread" = '1' LEFT OUTER JOIN "oc_news_folders" "folders" ON "folders"."id" = "feeds"."folder_id" WHERE "feeds"."folder_id" = 0 OR "folders"."deleted_at" = 0 ORDER BY "items"."id" DESC LIMIT 40;
                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..100.07 rows=40 width=1188) (actual time=0.154..9.642 rows=2 loops=1)
   ->  Nested Loop Left Join  (cost=0.43..2354.40 rows=945 width=1188) (actual time=0.151..9.638 rows=2 loops=1)
         Join Filter: (folders.id = feeds.folder_id)
         Rows Removed by Join Filter: 7
         Filter: ((feeds.folder_id = 0) OR (folders.deleted_at = 0))
         ->  Nested Loop  (cost=0.43..2235.19 rows=945 width=1196) (actual time=0.112..9.588 rows=2 loops=1)
               ->  Index Scan Backward using test_marc on oc_news_items items  (cost=0.29..2040.28 rows=967 width=1188) (actual time=0.087..3.220 rows=967 loops=1)
                     Index Cond: (unread = true)
                     Filter: unread
               ->  Index Scan using oc_news_feeds_pkey on oc_news_feeds feeds  (cost=0.14..0.20 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=967)
                     Index Cond: (id = items.feed_id)
                     Filter: ((deleted_at = 0) AND ((user_id)::text = 'marc'::text))
                     Rows Removed by Filter: 1
         ->  Materialize  (cost=0.00..1.10 rows=7 width=16) (actual time=0.013..0.018 rows=4 loops=2)
               ->  Seq Scan on oc_news_folders folders  (cost=0.00..1.07 rows=7 width=16) (actual time=0.019..0.024 rows=6 loops=1)
 Planning Time: 2.200 ms
 Execution Time: 9.782 ms
(17 rows)

Then the query can be optimized a bit more. If you look at the plan, you see that "Filter: ((feeds.folder_id = 0) OR (folders.deleted_at = 0))" is done as a last step.
That's because the query optimizer didn't convert "feeds"."folder_id" = 0 OR "folders"."deleted_at" = 0 into "folders"."id" = 0 OR "folders"."deleted_at" = 0, which is logically equivalent given the join condition "folders"."id" = "feeds"."folder_id".

explain analyze SELECT "items".* FROM "oc_news_items" "items" JOIN "oc_news_feeds" "feeds" ON "feeds"."id" = "items"."feed_id" AND "feeds"."deleted_at" = 0 AND "feeds"."user_id" = 'marc' AND "items"."unread" = '1' LEFT OUTER JOIN "oc_news_folders" "folders" ON "folders"."id" = "feeds"."folder_id" WHERE "folders"."id" = 0 OR "folders"."deleted_at" = 0 ORDER BY "items"."id" DESC LIMIT 40;
                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..98.82 rows=40 width=1188) (actual time=0.086..6.726 rows=2 loops=1)
   ->  Nested Loop  (cost=0.43..2324.91 rows=945 width=1188) (actual time=0.084..6.722 rows=2 loops=1)
         Join Filter: (feeds.folder_id = folders.id)
         Rows Removed by Join Filter: 7
         ->  Nested Loop  (cost=0.43..2235.19 rows=945 width=1196) (actual time=0.053..6.684 rows=2 loops=1)
               ->  Index Scan Backward using test_marc on oc_news_items items  (cost=0.29..2040.28 rows=967 width=1188) (actual time=0.036..2.319 rows=967 loops=1)
                     Index Cond: (unread = true)
                     Filter: unread
               ->  Index Scan using oc_news_feeds_pkey on oc_news_feeds feeds  (cost=0.14..0.20 rows=1 width=16) (actual time=0.004..0.004 rows=0 loops=967)
                     Index Cond: (id = items.feed_id)
                     Filter: ((deleted_at = 0) AND ((user_id)::text = 'marc'::text))
                     Rows Removed by Filter: 1
         ->  Materialize  (cost=0.00..1.14 rows=7 width=8) (actual time=0.012..0.015 rows=4 loops=2)
               ->  Seq Scan on oc_news_folders folders  (cost=0.00..1.10 rows=7 width=8) (actual time=0.017..0.022 rows=6 loops=1)
                     Filter: ((id = 0) OR (deleted_at = 0))
 Planning Time: 1.255 ms
 Execution Time: 6.827 ms
(17 rows)

The filter has moved into the scan of the folders table, which means checking it against 6 records instead of about 1000.

Going faster than this would mean no longer using purely "technical" (surrogate) keys in the tables, but natural keys (having the user in all 3 tables, etc.). That's a database design question and would require much more work (removing the id columns and using natural keys). Or we could duplicate this data in all tables, which would make maintaining integrity harder. But having a way to tell, directly in oc_news_items, which records belong to a user would make finding them much faster (like 1 or 2 orders of magnitude) - see the sketch below.
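Just to illustrate the idea, a hypothetical sketch of that duplication (nothing like this exists in the app today; the column name is made up, and the data would have to be kept in sync by the application or by triggers):

ALTER TABLE `oc_news_items` ADD COLUMN `user_id` VARCHAR(64); -- hypothetical owner column
UPDATE `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` SET `items`.`user_id` = `feeds`.`user_id`; -- backfill from feeds
CREATE INDEX `items_user_id_id` ON `oc_news_items` (`user_id`, `id`); -- lets "latest 40 items of a user" walk the index backwards instead of joining everything first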

BTW, is giving you query optimizations like this helping?

@SMillerDev (Contributor)

I'd much rather have the optimizations of pull requests. I doubt putting them in the issue like this is going to create any meaningful change.

@marco44 (Contributor) commented Feb 21, 2020

I presume you mean the optimizations IN pull requests?

@SMillerDev (Contributor)

Yeah, suggest the changes as a pull request instead of/in addition to the comments here. Otherwise we'd need to wait for someone else to copy these over to a PR and implement them.

@marco44 (Contributor) commented Feb 21, 2020

Ok. We'll see about implementation once we're sure the rewrites and indexes solve anything (I'm much more proficient with PostgreSQL, so some rewrites may not be efficient with MySQL). I'll need @enaut's feedback (I don't have a Nextcloud MySQL database with lots of data in it to test). Or do we have test dumps with lots of data in them somewhere?

@enaut (Author) commented Feb 21, 2020

Hey, thanks for helping! Here is the slow query log (with long_query_time 0):
https://gist.github.com/enaut/dbd59c050710d81da45bf30dffacbda9

@marco44 (Contributor) commented Feb 21, 2020

Ok, so your queries are more or less all from the same family (all similar to the second one I posted). Can you post the plan for this query (i.e. what this returns):

EXPLAIN ANALYZE SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' AND `items`.`unread` = '1' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;

And how many records are there in oc_news_items? (select count(*) from oc_news_items)

@enaut (Author) commented Feb 21, 2020

I had to omit the analyze for this to work...

MariaDB [clouddb]> select count(*) from oc_news_items
    -> ;
+----------+
| count(*) |
+----------+
|   125610 |
+----------+
1 row in set (0.062 sec)

MariaDB [clouddb]> EXPLAIN ANALYZE SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' AND `items`.`unread` = '1' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'ANALYZE SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feed' at line 1
MariaDB [clouddb]> EXPLAIN SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' AND `items`.`unread` = '1' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
| id   | select_type | table   | type   | possible_keys                                               | key                      | key_len | ref                     | rows | Extra                                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
|    1 | SIMPLE      | feeds   | ALL    | PRIMARY,news_feeds_user_id_index,news_feeds_folder_id_index | NULL                     | NULL    | NULL                    |   36 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | folders | eq_ref | PRIMARY                                                     | PRIMARY                  | 8       | clouddb.feeds.folder_id |    1 | Using where                                  |
|    1 | SIMPLE      | items   | ref    | news_items_feed_id_index                                    | news_items_feed_id_index | 8       | clouddb.feeds.id        | 1329 | Using index condition; Using where           |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
3 rows in set (0.057 sec)

@marco44 (Contributor) commented Feb 21, 2020

Yeah, EXPLAIN ANALYZE is a MySQL thing; MariaDB doesn't even have it.
Ok, so it seems it's about the same problem as with PostgreSQL, but much worse. And your database is not much bigger than mine, so the runtime is just crazy. Can you try the two things I wrote about before?

First, run the query as is (without EXPLAIN). It should take ~7 s.

Then replace WHERE `feeds`.`folder_id` = 0 with WHERE `folders`.`id` = 0 and re-run the query. I hope it may be faster, but as I said before, I'm much better with PostgreSQL (and its optimizer is also much better).

And then create the index (create index test_marc on oc_news_items (unread,id), or something like that) and try once more. Post the results, and if it's still slow, post an EXPLAIN of the last try.

@enaut (Author) commented Feb 21, 2020

There might be a difference in data since my news entries are all from one user?

Ok, now, so that I do not mix things up...
Changing feeds.folder_id = 0 did not help; times were still around 8 sec.
After creating an index: create index test_marc on oc_news_items (unread,id);

  • SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' AND `items`.`unread` = '1' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
    time: 0.02s
  • SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' AND `items`.`unread` = '1' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `folders`.`id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
    time: 7.8s (still bad)

So the original version is way faster with the index.

Oh, and the News app itself does not really load faster (even with the index).

@marco44 (Contributor) commented Feb 22, 2020

I've built a dummy database of about the same size as yours, and I'm a bit lost; I don't really have an explanation for what you see. The rewrite doesn't make any difference here with MySQL (it does with PostgreSQL, which can take advantage of it), but the index helps a lot for this query, with and without the rewrite. Without the index I'm at 0.5 s (but on a very fast machine). What kind of machine are we dealing with, so I have a rough idea of the normal ratio between your tests and mine?

Anyway, if the index has sped up this query, great, let's move on to the next one. Please re-do a slow query log run (erase the previous file, so we start from scratch).

@marco44 (Contributor) commented Feb 22, 2020

Looking at the queries, the starred items count will be slow too.
Digging a bit more into this made me realize that maybe my first index wasn't the best choice for MariaDB. So can you do this?

drop index test_marc on oc_news_items;
create index test_marc on oc_news_items (unread,feed_id);
create index test_marc2 on oc_news_items (starred,feed_id);

This should make finding unread and starred items much faster. If it's still not good enough, please do the slow-query routine again.

@enaut (Author) commented Feb 22, 2020

This makes the overview on the left load as fast as expected.
However, the messages themselves are still "slow":

# mysqldumpslow 

Reading mysql slow query log from /var/log/mariadb/slow-queries.log
Count: 2  Time=3.68s (7s)  Lock=0.00s (0s)  Rows_sent=0.0 (0), Rows_examined=142010.0 (284020), Rows_affected=0.0 (0), clouddbuser[clouddbuser]@localhost
  SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = N AND `feeds`.`user_id` = 'S' AND `items`.`unread` = 'S' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = N OR `folders`.`deleted_at` = N ORDER BY `items`.`id` DESC LIMIT N

@marco44 (Contributor) commented Feb 22, 2020

Ok, I think that for this one, the first index I proposed is better suited. I tried to get rid of an index, but we may need it still. Please add:

create index test_marc3 on oc_news_items (unread,id);

and tell me...

@enaut (Author) commented Feb 22, 2020

There is still one query taking its time...

Reading mysql slow query log from /var/log/mariadb/slow-queries.log
Count: 1  Time=14.16s (14s)  Lock=0.00s (0s)  Rows_sent=40.0 (40), Rows_examined=218003.0 (218003), Rows_affected=0.0 (0), clouddbuser[clouddbuser]@localhost
  SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = N AND `feeds`.`user_id` = 'S' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = N OR `folders`.`deleted_at` = N ORDER BY `items`.`id` DESC LIMIT N

more details:

SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;

[...]
40 rows in set (16.310 sec)
MariaDB [clouddb]> explain SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
| id   | select_type | table   | type   | possible_keys                                               | key                      | key_len | ref                     | rows | Extra                                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
|    1 | SIMPLE      | feeds   | ALL    | PRIMARY,news_feeds_user_id_index,news_feeds_folder_id_index | NULL                     | NULL    | NULL                    |   36 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | folders | eq_ref | PRIMARY                                                     | PRIMARY                  | 8       | clouddb.feeds.folder_id |    1 | Using where                                  |
|    1 | SIMPLE      | items   | ref    | news_items_feed_id_index                                    | news_items_feed_id_index | 8       | clouddb.feeds.id        | 1334 | Using index condition                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
3 rows in set (0.001 sec)
MariaDB [clouddb]> analyze SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+---------+----------+------------+----------------------------------------------+
| id   | select_type | table   | type   | possible_keys                                               | key                      | key_len | ref                     | rows | r_rows  | filtered | r_filtered | Extra                                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+---------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | feeds   | ALL    | PRIMARY,news_feeds_user_id_index,news_feeds_folder_id_index | NULL                     | NULL    | NULL                    |   36 |   36.00 |    94.44 |      94.44 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | folders | eq_ref | PRIMARY                                                     | PRIMARY                  | 8       | clouddb.feeds.folder_id |    1 |    0.97 |   100.00 |     100.00 | Using where                                  |
|    1 | SIMPLE      | items   | ref    | news_items_feed_id_index                                    | news_items_feed_id_index | 8       | clouddb.feeds.id        | 1334 | 3204.32 |   100.00 |     100.00 | Using index condition                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+---------+----------+------------+----------------------------------------------+
3 rows in set (12.787 sec)

The machine this runs on is a virtual server with 4 cores and 14 GB of RAM. Due to the virtualization, some things that are not in cache take a while... but the database is usually fast enough.

@enaut (Author) commented Feb 22, 2020

Wait... it is fast now... but when checking the "show read items" button it is slow again.

@marco44 (Contributor) commented Feb 22, 2020

I'm starting to wonder whether you're seeing caching effects. Maybe you should reserve some memory for the DB cache instead of relying on the operating system... I think you should raise your innodb_buffer_pool_size (I have no real experience with this on MariaDB) to make sure your block cache doesn't get pushed out by I/O on the machine. Maybe also double-check you're not overcommitting memory; databases really hate it when they think data is in memory but it has been swapped out by the hypervisor.
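A sketch of what I mean (the size is just an example; pick something that fits your data set and leaves memory for the rest of the system - on older versions the variable can only be changed in the server config file, not at runtime):

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size'; -- current setting
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%'; -- how the pool is being used
SET GLOBAL innodb_buffer_pool_size = 1073741824; -- 1 GiB, as an example; recent MariaDB/MySQL can resize online, otherwise set it in my.cnf and restart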

@enaut (Author) commented Feb 22, 2020

I increased the innodb_buffer_pool_size from 128 MB to 1 GB. It did not change much; 1 GB is the recommended size for my setup according to: https://www.taste-of-it.de/mysql-mariadb-feintuning-von-innodb_buffer_pool_size/
If I understand it correctly, it is using 600 MB.

The following query still takes 7.4 seconds:

SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;

8 GB of RAM are free - so plenty...
I have no access to the hypervisor, so I cannot say whether it is swapped out. But it is theoretically labeled as guaranteed RAM, so I think it shouldn't be swapped. Also, the second and third time I issue the query it still takes the same amount of time, so I don't think it is a swapping issue.

@enaut (Author) commented Feb 22, 2020

But all in all, with those indexes it is already much faster! Loading the news panel now only takes one second where it was ~20 sec before. Only when activating "show read articles" does it take a little longer.

But I think the above query can become a sub-second one too!

@enaut (Author) commented Feb 22, 2020

Increasing the innodb_buffer_pool_size did boost performance in general - a little bit everywhere - so thanks for the hint.

@marco44 (Contributor) commented Feb 23, 2020

I don't get why this last query takes so long. It's the most trivial of them all. There's no filtering at all; it's just "give me the 40 items with the largest id". The primary key of items should be enough (and it is on my MariaDB test db). What's the query plan?

@enaut (Author) commented Feb 23, 2020

I don't know what a query plan is.
One of these? #510 (comment)

When removing the ORDER BY `items`.`id` DESC, the query is instantaneous. Does that help?

@marco44 (Contributor) commented Feb 23, 2020

Oh. Yes, query plans are what EXPLAIN displays, sorry :)

If you remove the ORDER BY, the query can just take whatever records it wants, so yes, it would be faster. But that's not what we want :)

So, yes, please give me the plan.

@enaut (Author) commented Feb 24, 2020

I think it's the "same ones" as above:

MariaDB [clouddb]> explain SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
| id   | select_type | table   | type   | possible_keys                                               | key                      | key_len | ref                     | rows | Extra                                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
|    1 | SIMPLE      | feeds   | ALL    | PRIMARY,news_feeds_user_id_index,news_feeds_folder_id_index | NULL                     | NULL    | NULL                    |   36 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | folders | eq_ref | PRIMARY                                                     | PRIMARY                  | 8       | clouddb.feeds.folder_id |    1 | Using where                                  |
|    1 | SIMPLE      | items   | ref    | news_items_feed_id_index                                    | news_items_feed_id_index | 8       | clouddb.feeds.id        | 1335 | Using index condition                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+----------------------------------------------+
3 rows in set (0.003 sec)

MariaDB [clouddb]> analyze SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `feeds`.`folder_id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+---------+----------+------------+----------------------------------------------+
| id   | select_type | table   | type   | possible_keys                                               | key                      | key_len | ref                     | rows | r_rows  | filtered | r_filtered | Extra                                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+---------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | feeds   | ALL    | PRIMARY,news_feeds_user_id_index,news_feeds_folder_id_index | NULL                     | NULL    | NULL                    |   36 |   36.00 |    94.44 |      94.44 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | folders | eq_ref | PRIMARY                                                     | PRIMARY                  | 8       | clouddb.feeds.folder_id |    1 |    0.97 |   100.00 |     100.00 | Using where                                  |
|    1 | SIMPLE      | items   | ref    | news_items_feed_id_index                                    | news_items_feed_id_index | 8       | clouddb.feeds.id        | 1335 | 3206.09 |   100.00 |     100.00 | Using index condition                        |
+------+-------------+---------+--------+-------------------------------------------------------------+--------------------------+---------+-------------------------+------+---------+----------+------------+----------------------------------------------+
3 rows in set (6.748 sec)

@marco44 (Contributor) commented Feb 24, 2020

Ok, so that's absurd… The correct way of doing this is to scan the items primary key backwards, fetch records, and check whether they match in feeds and folders (which most will). Here's the plan in my environment with something similar…

MariaDB [marc]> analyze SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'marc' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `folders`.`id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
+------+-------------+---------+--------+-------------------------------------------------------------+---------+---------+----------------------+--------+--------+----------+------------+-------------+
| id   | select_type | table   | type   | possible_keys                                               | key     | key_len | ref                  | rows   | r_rows | filtered | r_filtered | Extra       |
+------+-------------+---------+--------+-------------------------------------------------------------+---------+---------+----------------------+--------+--------+----------+------------+-------------+
|    1 | SIMPLE      | items   | index  | NULL                                                        | PRIMARY | 8       | NULL                 | 206491 | 40.00  |   100.00 |     100.00 |             |
|    1 | SIMPLE      | feeds   | eq_ref | PRIMARY,news_feeds_folder_id_index,news_feeds_user_id_index | PRIMARY | 8       | marc.items.feed_id   | 1      | 1.00   |    97.73 |     100.00 | Using where |
|    1 | SIMPLE      | folders | eq_ref | PRIMARY                                                     | PRIMARY | 8       | marc.feeds.folder_id | 1      | 1.00   |   100.00 |     100.00 | Using where |
+------+-------------+---------+--------+-------------------------------------------------------------+---------+---------+----------------------+--------+--------+----------+------------+-------------+
3 rows in set (0.001 sec)

It answers in like 5ms.

In your case, it does the whole join of all records in the 3 tables, then sorts the result by descending id and returns the top 40, which seems plain wrong.

So maybe the problem is that it thinks the table is too fragmented, or that it has bad statistics…
Please try these steps (and redo the analyze of the query in between, to see which one solves the issue, if anything does):
OPTIMIZE TABLE oc_news_items; # this will defragment the table
ANALYZE TABLE oc_news_items; # collect some stats

@enaut (Author) commented Feb 24, 2020

So I ran the two commands (not sure about the second one because I expected something more verbose):

MariaDB [clouddb]> OPTIMIZE Table oc_news_items;
+-----------------------+----------+----------+-------------------------------------------------------------------+
| Table                 | Op       | Msg_type | Msg_text                                                          |
+-----------------------+----------+----------+-------------------------------------------------------------------+
| clouddb.oc_news_items | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| clouddb.oc_news_items | optimize | status   | OK                                                                |
+-----------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (2 min 14.812 sec)

MariaDB [clouddb]> ANALYZE table oc_news_items;     
+-----------------------+---------+----------+----------+
| Table                 | Op      | Msg_type | Msg_text |
+-----------------------+---------+----------+----------+
| clouddb.oc_news_items | analyze | status   | OK       |
+-----------------------+---------+----------+----------+
1 row in set (0.031 sec)

Afterwards the query still takes about 6 sec:

MariaDB [clouddb]> analyze SELECT `items`.* FROM `oc_news_items` `items` JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'enaut' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `folders`.`id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
+------+-------------+---------+------+-------------------------------------------------------------+----------------------------+---------+--------------------+------+---------+----------+------------+----------------------------------------------+
| id   | select_type | table   | type | possible_keys                                               | key                        | key_len | ref                | rows | r_rows  | filtered | r_filtered | Extra                                        |
+------+-------------+---------+------+-------------------------------------------------------------+----------------------------+---------+--------------------+------+---------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | folders | ALL  | PRIMARY                                                     | NULL                       | NULL    | NULL               |    7 |    7.00 |   100.00 |     100.00 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | feeds   | ref  | PRIMARY,news_feeds_user_id_index,news_feeds_folder_id_index | news_feeds_folder_id_index | 8       | clouddb.folders.id |    2 |    4.71 |   100.00 |     100.00 | Using index condition; Using where           |
|    1 | SIMPLE      | items   | ref  | news_items_feed_id_index                                    | news_items_feed_id_index   | 8       | clouddb.feeds.id   | 1186 | 3272.18 |   100.00 |     100.00 | Using index condition                        |
+------+-------------+---------+------+-------------------------------------------------------------+----------------------------+---------+--------------------+------+---------+----------+------------+----------------------------------------------+
3 rows in set (6.510 sec)

@marco44 (Contributor) commented Feb 24, 2020

Well, then I don't know what to say. MariaDB's optimizer is being dumb in your case; the indexes are correct. And I'm not good enough with it to know what to do when it picks the wrong index.

But the cause here is the news_items_feed_id_index index. I didn't have it, although it's in the configuration file (maybe I made a mistake when copying from my PostgreSQL database, as it's present there). Without the index the query is fast; with it I get your plan...

Maybe try to drop the index to see how it goes...
drop index news_items_feed_id_index on oc_news_items;
to create it back:
create index news_items_feed_id_index on oc_news_items(feed_id);

@enaut (Author) commented Feb 24, 2020

So, now if I change that query:

MariaDB [clouddb]> SELECT `items`.* FROM `oc_news_items` `items` IGNORE INDEX (news_items_feed_id_index) JOIN `oc_news_feeds` `feeds` ON `feeds`.`id` = `items`.`feed_id` AND `feeds`.`deleted_at` = 0 AND `feeds`.`user_id` = 'friedi' LEFT OUTER JOIN `oc_news_folders` `folders` ON `folders`.`id` = `feeds`.`folder_id` WHERE `folders`.`id` = 0 OR `folders`.`deleted_at` = 0 ORDER BY `items`.`id` DESC LIMIT 40;
Empty set (0.001 sec)

Everything is good! Could you do a pull request for this?

@marco44 (Contributor) commented Feb 24, 2020

Can you try to use the app thoroughly, to be sure that removing this index has no bad consequences?

@enaut (Author) commented Feb 24, 2020

Should I delete the index? Because otherwise I would have to change the PHP code, wouldn't I - and I'm not really into PHP...

@marco44 (Contributor) commented Feb 24, 2020

What do you mean? You can keep the index dropped; it shouldn't be a problem (except for performance, which is exactly what you should test).

@marco44 (Contributor) commented Feb 24, 2020

@Grotax: should I just edit the database.xml? There are 3 indexes to add and 1 to remove. Will that be sufficient? Is there a way I can test what migration this would produce?

@SMillerDev (Contributor)

Just edit the database.xml. I don't think there's a way to check the migration though.

@enaut (Author) commented Feb 24, 2020

I did drop the index now. Together with the new indexes, the News app is a lot faster - it now has the speed I'd expect! The Android app also benefits from the speedup when synchronizing with the server.
Thank you @marco44!

marco44 pushed a commit to marco44/news that referenced this issue Feb 24, 2020
Close nextcloud#510

Hopefully this will solve performance with MySQL/MariaDB.
There is no way to prove this though. See nextcloud#510

Signed-off-by: Marc Cousin <marc.cousin@people-doc.com>
@marco44 (Contributor) commented Feb 24, 2020

Ok, I made the PR. When you get the app upgrade, @enaut, don't forget to drop the test indexes, or rename them to the ones in the PR:
news_items_unread_feed_id, news_items_starred_feed_id and news_items_unread_id
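In plain SQL that should correspond to something like this (a sketch, assuming the PR keeps the column pairs we tested above under the new names):

DROP INDEX test_marc ON oc_news_items;
DROP INDEX test_marc2 ON oc_news_items;
DROP INDEX test_marc3 ON oc_news_items;
CREATE INDEX news_items_unread_feed_id ON oc_news_items (unread, feed_id);
CREATE INDEX news_items_starred_feed_id ON oc_news_items (starred, feed_id);
CREATE INDEX news_items_unread_id ON oc_news_items (unread, id);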

Grotax pushed a commit to Talon24/news that referenced this issue Mar 1, 2020
Close nextcloud#510

Hopefully this will solve performance with MySQL/MariaDB.
There is no way to prove this though. See nextcloud#510

Signed-off-by: Marc Cousin <marc.cousin@people-doc.com>