
Query on oc_filecache uses wrong index - Cron job runs very long #24401

Open · JohnWalkerx opened this issue Nov 26, 2020 · 89 comments

Labels: 0. Needs triage, 25-feedback, bug, feature: database, feature: filesystem, performance 🚀

@JohnWalkerx commented Nov 26, 2020

➡️ 🚧 Possible workaround (works for me):

  1. Kill all running cron.php jobs. Or just restart your machine.
  2. Disable your cron job during the rescan (just to be sure it doesn't start scanning too).
  3. Run occ files:scan-app-data on your instance to rescan the appdata folder (see the shell sketch below).
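
A minimal shell sketch of these steps, assuming a typical setup where the web server user is www-data and Nextcloud lives in /var/www/nextcloud (adjust paths to your installation, and comment out your own crontab entry rather than deleting it):

# 1. kill cron.php runs that are still hanging around
sudo pkill -f 'php.*cron.php'

# 2. temporarily disable the Nextcloud cron entry (edit the crontab and comment the line out)
sudo crontab -u www-data -e

# 3. rescan the appdata folder
cd /var/www/nextcloud
sudo -u www-data php occ files:scan-app-data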

How to use GitHub

  • Please use the 👍 reaction to show that you are affected by the same issue.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Steps to reproduce

(0. It began after I ran the Preview Generator over 1553047 files (not only pictures), so a lot of previews were generated.)

  1. Run cron.php every 5 or 15 minutes.

Expected behaviour

Each cron job finishes before the next one starts. CPU load is not very high.

Actual behaviour

After some time, multiple cron jobs are running in parallel. The CPU is at 100% because of mysqld.
This is caused by a very slow query:

SELECT `path` FROM nextcloud.oc_filecache WHERE `storage` = 1 AND `size` < 0 ORDER BY `fileid` DESC

If I enable the slow query log, I get the following output:

# User@Host: ncuser[ncuser] @ localhost []  Id: 21805
# Query_time: 253.154251  Lock_time: 0.000205 Rows_sent: 191412  Rows_examined: 1454739
SET timestamp=1603198875;
SELECT `path` FROM `oc_filecache` WHERE (`storage` = 1) AND (`size` < 0) ORDER BY `fileid` DESC;

If I run EXPLAIN on this query, I get the following:

# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'SIMPLE', 'oc_filecache', NULL, 'ref', 'fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size', 'fs_storage_path_hash', '8', 'const', '788173', '33.33', 'Using where; Using filesort'

So it uses the wrong index, fs_storage_path_hash.
I would expect it to use the fs_storage_size index.

If I execute the query and force the index, it looks fine:

SELECT `path` FROM nextcloud.oc_filecache FORCE INDEX (fs_storage_size) WHERE `storage` = 1 AND `size` < 0 ORDER BY `fileid` DESC
# 190921 row(s) returned	3.654 sec / 0.335 sec

Output of EXPLAIN:

# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'SIMPLE', 'oc_filecache', NULL, 'range', 'fs_storage_size', 'fs_storage_size', '16', NULL, '358248', '100.00', 'Using index condition; Using filesort'

Why doesn't it use the fs_storage_size index?
On a fresh NC20 test instance it does.

Server configuration detail

Operating system: Linux 5.4.0-54-generic #60-Ubuntu SMP Fri Nov 6 10:37:59 UTC 2020 x86_64 / Ubuntu 20.04.29 LTS

Webserver: Apache (apache2handler)

Database: mysql 8.0.22

PHP version:

7.3.24-3+ubuntu20.04.1+deb.sury.org+1
Modules loaded: Core, date, libxml, openssl, pcre, zlib, filter, hash, Reflection, SPL, session, sodium, standard, apache2handler, mysqlnd, PDO, xml, apcu, bcmath, calendar, ctype, curl, dom, mbstring, fileinfo, ftp, gd, gettext, gmp, iconv, igbinary, imagick, intl, json, exif, mysqli, pdo_mysql, apc, posix, readline, redis, shmop, SimpleXML, sockets, sysvmsg, sysvsem, sysvshm, tokenizer, wddx, xmlreader, xmlwriter, xsl, zip, Phar, Zend OPcache

Nextcloud version: 19.0.5 - 19.0.5.2

Updated from an older Nextcloud/ownCloud or fresh install: Updated from 19.0.4, but the problem existed there as well.

Where did you install Nextcloud from: unknown

Signing status

Array
(
)

List of activated apps
Enabled:
 - accessibility: 1.5.0
 - activity: 2.12.1
 - announcementcenter: 3.8.1
 - apporder: 0.11.0
 - audioplayer: 2.13.1
 - bbb: 1.1.2
 - bookmarks: 3.4.9
 - bruteforcesettings: 2.0.1
 - calendar: 2.0.4
 - cloud_federation_api: 1.2.0
 - comments: 1.9.0
 - contacts: 3.4.2
 - contactsinteraction: 1.0.0
 - cospend: 1.0.5
 - dav: 1.15.0
 - deck: 1.2.2
 - federatedfilesharing: 1.9.0
 - federation: 1.9.0
 - files: 1.14.0
 - files_markdown: 2.3.1
 - files_pdfviewer: 1.8.0
 - files_rightclick: 0.16.0
 - files_sharing: 1.11.0
 - files_trashbin: 1.9.0
 - files_versions: 1.12.0
 - files_videoplayer: 1.8.0
 - firstrunwizard: 2.8.0
 - forms: 2.0.4
 - gpxpod: 4.2.2
 - groupfolders: 7.1.1
 - impersonate: 1.6.1
 - issuetemplate: 0.7.0
 - logreader: 2.4.0
 - lookup_server_connector: 1.7.0
 - mail: 1.4.1
 - maps: 0.1.6
 - nextcloud_announcements: 1.8.0
 - notes: 3.6.4
 - notifications: 2.7.0
 - oauth2: 1.7.0
 - password_policy: 1.9.1
 - photos: 1.1.0
 - polls: 1.6.2
 - previewgenerator: 2.3.0
 - privacy: 1.3.0
 - provisioning_api: 1.9.0
 - ransomware_protection: 1.7.0
 - richdocuments: 3.7.10
 - serverinfo: 1.9.0
 - settings: 1.1.0
 - sharebymail: 1.9.0
 - spreed: 9.0.6
 - support: 1.2.1
 - survey_client: 1.7.0
 - systemtags: 1.9.0
 - tasks: 0.13.6
 - text: 3.0.1
 - theming: 1.10.0
 - twofactor_backupcodes: 1.8.0
 - twofactor_totp: 5.0.0
 - twofactor_u2f: 6.0.0
 - updatenotification: 1.9.0
 - viewer: 1.3.0
 - workflowengine: 2.1.0
Disabled:
 - admin_audit
 - encryption
 - files_external
 - onlyoffice
 - phonetrack
 - recommendations
 - sharerenamer
 - user_ldap

Configuration (config/config.php)
{
    "instanceid": "***REMOVED SENSITIVE VALUE***",
    "passwordsalt": "***REMOVED SENSITIVE VALUE***",
    "secret": "***REMOVED SENSITIVE VALUE***",
    "trusted_domains": [
        "cloud.example.de"
    ],
    "datadirectory": "***REMOVED SENSITIVE VALUE***",
    "overwrite.cli.url": "https:\/\/cloud.example.de\/",
    "htaccess.RewriteBase": "\/",
    "dbtype": "mysql",
    "version": "19.0.5.2",
    "dbname": "***REMOVED SENSITIVE VALUE***",
    "dbhost": "***REMOVED SENSITIVE VALUE***",
    "dbport": "",
    "dbtableprefix": "oc_",
    "mysql.utf8mb4": true,
    "dbuser": "***REMOVED SENSITIVE VALUE***",
    "dbpassword": "***REMOVED SENSITIVE VALUE***",
    "installed": true,
    "memcache.local": "\\OC\\Memcache\\APCu",
    "memcache.locking": "\\OC\\Memcache\\Redis",
    "filelocking.enabled": "true",
    "redis": {
        "host": "***REMOVED SENSITIVE VALUE***",
        "port": 0,
        "dbindex": 0,
        "timeout": 1.5
    },
    "logfile": "\/var\/log\/nextcloud\/nextcloud.log",
    "logtimezone": "Europe\/Berlin",
    "loglevel": 2,
    "log_rotate_size": "10485760",
    "default_language": "de",
    "default_locale": "de_DE",
    "maintenance": false,
    "mail_smtpmode": "smtp",
    "mail_smtpauthtype": "LOGIN",
    "mail_from_address": "***REMOVED SENSITIVE VALUE***",
    "mail_domain": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpauth": 1,
    "mail_smtpsecure": "ssl",
    "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
    "mail_smtpport": "465",
    "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
    "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
    "trashbin_retention_obligation": "30, 90",
    "updater.release.channel": "stable",
    "app_install_overwrite": [
        "calendar"
    ],
    "preview_max_x": "2048",
    "preview_max_y": "2048",
    "jpeg_quality": "60"
}

Are you using external storage, if yes which one: No

Are you using encryption: No

Are you using an external user-backend, if yes which one: No

Client configuration

Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:82.0) Gecko/20100101 Firefox/82.0

Operating system: -

Logs

There are no relevant logs

JohnWalkerx added the "0. Needs triage" and "bug" labels on Nov 26, 2020
@kesselb (Contributor) commented Nov 27, 2020

@nickvergessen (Member) commented:

Maybe the index is missing on your instance?

Can you run:

sudo -u www-data php occ db:add-missing-indices

@JohnWalkerx (Author) commented Nov 27, 2020

I would say the index fs_storage_size is already there. See possible_keys in the first EXPLAIN output of the query.


Output of db:add-missing-indices:

Check indices of the share table.
Check indices of the filecache table.
Check indices of the twofactor_providers table.
Check indices of the login_flow_v2 table.
Check indices of the whats_new table.
Check indices of the cards table.
Check indices of the cards_properties table.
Check indices of the calendarobjects_props table.
Check indices of the schedulingobjects table.
Check indices of the oc_properties table.
Done.

I will observe if anything has changed.

@kesselb (Contributor) commented Nov 27, 2020

[screenshots]

On 19.0.4 (updated from 16.0.0).

@JohnWalkerx (Author) commented Nov 28, 2020

The problem still exists.
[screenshot]
After the query (e.g. process ID 21787) is done, its query time ("Zeit") starts counting again. It looks like the query is executed over and over again.

Currently two cron jobs are running. The calling interval is 15 minutes.
[screenshot]
In the past, after some time there were 3 cron jobs executing this query.

CPU graph over the last 24 hours:
[screenshots]

@nickvergessen (Member) commented:

On a fresh NC20 test instance it uses this index.

Well this means our database structure and code are correct. Maybe you can run some repair/optimize commands on your database?

@JohnWalkerx (Author) commented Nov 29, 2020

I tried the following today:

  • ANALYZE TABLE nextcloud.oc_filecache
  • OPTIMIZE TABLE nextcloud.oc_filecache. But for the InnoDB engine this has only a very limited effect.
  • Recreated the fs_storage_size index.
  • Recreated the table via dump and replaced it.

Nothing helped. The query optimizer still wants to use the fs_storage_path_hash index.

I'm not the only one who has this issue. There are others on the forum.

Maybe it depends on the size of the table. The table of my clean NC20 instance is very small.

Here is the info on my indexes:
[screenshot]

I also found a page in the MySQL docs: Optimizer-Related Issues

Is it possible to force the index in the query from Nextcloud?

@nickvergessen (Member) commented:

Is it possible to force the index in the query from Nextcloud?

Not really, as this is not supported by all databases Nextcloud supports.

Also, just for the record, I get different results depending on the storage. E.g. for us, storage with ID 1 does not exist. Result:

MariaDB [oc]> EXPLAIN SELECT `path` FROM oc_filecache WHERE `storage` = 1 AND `size` < 0 ORDER BY `fileid` DESC;
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+------+-----------------------------+
| id   | select_type | table        | type | possible_keys                                                                | key                  | key_len | ref   | rows | Extra                       |
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+------+-----------------------------+
|    1 | SIMPLE      | oc_filecache | ref  | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size | fs_storage_path_hash | 8       | const |    1 | Using where; Using filesort |
+------+-------------+--------------+------+------------------------------------------------------------------------------+----------------------+---------+-------+------+-----------------------------+
1 row in set (0.001 sec)

If I change the ID to 3 it uses the correct index:

MariaDB [oc]> EXPLAIN SELECT `path` FROM oc_filecache WHERE `storage` = 3 AND `size` < 0 ORDER BY `fileid` DESC;
+------+-------------+--------------+-------+------------------------------------------------------------------------------+-----------------+---------+------+------+---------------------------------------+
| id   | select_type | table        | type  | possible_keys                                                                | key             | key_len | ref  | rows | Extra                                 |
+------+-------------+--------------+-------+------------------------------------------------------------------------------+-----------------+---------+------+------+---------------------------------------+
|    1 | SIMPLE      | oc_filecache | range | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size | fs_storage_size | 16      | NULL |   52 | Using index condition; Using filesort |
+------+-------------+--------------+-------+------------------------------------------------------------------------------+-----------------+---------+------+------+---------------------------------------+
1 row in set (0.001 sec)

@JohnWalkerx (Author) commented Dec 9, 2020

On my instance, storage with ID 1 is the normal data directory.

I looked a little at the processes/queries which mysqld is running,
and I'm wondering why the same process runs the same query again and again.
Why does Nextcloud run this query multiple times?

While I'm typing this comment, two cron jobs are running and each job has executed the query at least 3 times.
EDIT: After 10 hours, still the same two cron job processes. A third cron job has also joined in with the same query.

@feutl commented Dec 10, 2020

I want to jump onto this bandwagon ;) I am not sure if it is the same problem, but at least people searching should get linked through the different articles and sites. Therefore I'm linking the Nextcloud forum post I started and used to collect some additional threads and sites as reference:
https://help.nextcloud.com/t/background-job-scanfiles-triggered-via-cron-php-breaks-my-system/95971

I fixed my problem (for now) by cleaning up my photo previews on disk and in the database.

Besides this, why "scan files" is run twice a day is still unclear to me; I thought this would not be necessary if everything is pushed via the standard API / interfaces.

@JohnWalkerx (Author) commented:

Does anyone know what this query is used for?

I dug a little deeper and had a look at the output of this query.
I get 179465 rows with this kind of data (sorted by the path column):

'appdata_myinstanceid'
'appdata_myinstanceid/preview'
'appdata_myinstanceid/preview/0'
'appdata_myinstanceid/preview/0/0'
'appdata_myinstanceid/preview/0/0/0/0/a'
'appdata_myinstanceid/preview/0/0/0/0/a/0'
'appdata_myinstanceid/preview/0/0/0/0/a/0/f'
'appdata_myinstanceid/preview/0/0/0/2'
'appdata_myinstanceid/preview/0/0/0/2/7'
'appdata_myinstanceid/preview/0/0/0/2/7/4'
'appdata_myinstanceid/preview/0/0/0/2/a'
'appdata_myinstanceid/preview/0/0/0/2/a/c'
'appdata_myinstanceid/preview/0/0/0/2/a/c/0'
'appdata_myinstanceid/preview/0/0/0/3/8'
'appdata_myinstanceid/preview/0/0/0/3/8/f'
'appdata_myinstanceid/preview/0/0/0/3/8/f/e'
'appdata_myinstanceid/preview/0/0/0/3/f'
'appdata_myinstanceid/preview/0/0/0/3/f/a'
'appdata_myinstanceid/preview/0/0/0/3/f/a/c'
...

The output is just directories in the preview folder (except the first one).
For example, at the end of the folder path appdata_myinstanceid/preview/0/0/0/2/a/c/0 there is a directory 36492 containing some previews.
I checked some directories and most of them look like this.

There are also a lot of numbered directories directly in the preview directory containing some previews.

Maybe this information can point someone in the right direction.
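
To get a feeling for how much of the filecache is still marked as unscanned (size < 0) per storage, i.e. the rows this query keeps walking over, a quick diagnostic along these lines can help (a sketch; it only uses columns already discussed above, adjust the table prefix to your setup):

SELECT `storage`, COUNT(*) AS unscanned_rows
FROM `oc_filecache`
WHERE `size` < 0
GROUP BY `storage`
ORDER BY unscanned_rows DESC;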

@robertugo2 commented:
I am running a MariaDB database and I am affected by the same issue. I used the Preview Generator as well. I am using the Docker instance of Nextcloud version 20.0.3.

@nickvergessen (Member) commented:

appdata should not really be scanned by this. So I guess this is where the problem comes from.

@feutl commented Dec 15, 2020

@JohnWalkerx @robertugo2
This is what I was using too, the Preview Generator, but after deleting all the preview folders/files from the file system and the database and getting rid of the preview-generator app, the folders appear again. They are needed for the previews generated while browsing pictures with Nextcloud.

So the preview folders/files themselves are fine, but somehow the sheer number of folders, or something else, breaks the system.

@nickvergessen
This is a question I have been trying to get answered for months now; no one in the forum had time to tell me why an OC ScanFiles is even needed in the first place.

@kesselb (Contributor) commented Dec 15, 2020

'appdata_myinstanceid'
'appdata_myinstanceid/preview'
'appdata_myinstanceid/preview/0'
'appdata_myinstanceid/preview/0/0'
'appdata_myinstanceid/preview/0/0/0/0/a'
'appdata_myinstanceid/preview/0/0/0/0/a/0'
'appdata_myinstanceid/preview/0/0/0/0/a/0/f'
'appdata_myinstanceid/preview/0/0/0/2'
'appdata_myinstanceid/preview/0/0/0/2/7'
'appdata_myinstanceid/preview/0/0/0/2/7/4'
'appdata_myinstanceid/preview/0/0/0/2/a'
'appdata_myinstanceid/preview/0/0/0/2/a/c'
'appdata_myinstanceid/preview/0/0/0/2/a/c/0'
'appdata_myinstanceid/preview/0/0/0/3/8'
'appdata_myinstanceid/preview/0/0/0/3/8/f'
'appdata_myinstanceid/preview/0/0/0/3/8/f/e'
'appdata_myinstanceid/preview/0/0/0/3/f'
'appdata_myinstanceid/preview/0/0/0/3/f/a'
'appdata_myinstanceid/preview/0/0/0/3/f/a/c'

They have the same storage id as your files? On my dev instance the previews have a different storage id than my files.

@feutl commented Dec 15, 2020

@kesselb
What do you mean by storage id?
My previews are stored in the same "appdata" folder as everything else, like appstore and dashboard and so on.

@kesselb (Contributor) commented Dec 15, 2020

@feutl that was not a question for you. If you are interested, read the issue from the beginning; then you may know why I was asking another person for the storage id. Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.

@nickvergessen (Member) commented:

This is a question I have been trying to get answered for months now; no one in the forum had time to tell me why an OC ScanFiles is even needed in the first place.

Yes, it's needed, but it shouldn't run through appdata_*.
The thing is, it seems that:

$query->select($query->func()->max('user_id'))
	->from('filecache', 'f')
	->innerJoin('f', 'mounts', 'm', $query->expr()->eq('storage_id', 'storage'))
	->where($query->expr()->lt('size', $query->createNamedParameter(0, IQueryBuilder::PARAM_INT)))
	->groupBy('storage_id')
	->setMaxResults(self::USERS_PER_SESSION);

Returns the appdata_* thing for you too, instead of only real users.
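
For readers less familiar with the query builder: this builds roughly the following SQL (a sketch, assuming the default oc_ table prefix; the LIMIT value comes from self::USERS_PER_SESSION and is not quoted here):

SELECT MAX(`user_id`)
FROM `oc_filecache` `f`
INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage`
WHERE `size` < 0
GROUP BY `storage_id`
LIMIT <USERS_PER_SESSION>;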

Can you after:

$users = $this->getUsersToScan();

add:

\OC::$server->getLogger()->error(json_encode($users), ['app' => 'server-24401']);

and then check your log file for server-24401 regularly and report what it logged?

@feutl commented Dec 15, 2020

@kesselb
Sorry, but if you do not refer to a post or a person, it is really hard to know who a comment belongs to. And just to note, I read the whole thread and still did not know what your comment, coming right after mine, referred to or whom it was addressed to - except for the timing, which made me wonder... Anyhow, let's keep the attitude up and stay polite.

@nickvergessen
What if I tell you that the function private function getUsersToScan(): array is missing entirely from my ScanFiles.php?
I am running NC 20.0.3, upgraded from, I think, NC13 or NC14 over time as new versions got released.

@kesselb (Contributor) commented Dec 15, 2020

Sorry, but if you do not refer to a post or a person, it is really hard to know who a comment belongs to.

[screenshot of the issue thread]

@feutl of course my comment was totally out of nowhere unrelated to anything that has been written before.

What if I tell you that the function private function getUsersToScan(): array is missing entirely from my ScanFiles.php?

$users = $this->userManager->search('', self::USERS_PER_SESSION, $offset);
if (!count($users)) {
	// No users found, reset offset and retry
	$offset = 0;
	$users = $this->userManager->search('', self::USERS_PER_SESSION);
}

@feutl add the code to line 106.

@JohnWalkerx (Author) commented:

They have the same storage id as your files?

@kesselb
Storage ID 1 is local::/path/to/data/directory/
Storage ID 2 and above are home::Username1 or share::foohex
So the filecache of storage ID 1 contains the appdata, updater and user directories.
For the other storage IDs there are also entries in the filecache.
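
For anyone who wants to check this mapping on their own instance: the numeric storage IDs used in oc_filecache come from the oc_storages table, so something like the following shows which ID belongs to which backend (a sketch, assuming the default oc_ prefix):

SELECT `numeric_id`, `id` FROM `oc_storages` ORDER BY `numeric_id`;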

JohnWalkerx reopened this Dec 15, 2020
@kesselb (Contributor) commented Dec 15, 2020

@JohnWalkerx mind checking the oc_mounts table for a user with storage_id = 1? Nevermind ;)

public function getIncomplete() {
	$query = $this->getQueryBuilder();
	$query->select('path')
		->from('filecache')
		->whereStorageId()
		->andWhere($query->expr()->lt('size', $query->createNamedParameter(0, IQueryBuilder::PARAM_INT)))
		->orderBy('fileid', 'DESC');
	$result = $query->execute();
	$path = $result->fetchColumn();
	$result->closeCursor();
	return $path;
}

This is the code responsible for the query.

occ files:scan --all --unscanned will trigger a background scan (a very similar code path to the automated background scan)

@JohnWalkerx (Author) commented:

add:

\OC::$server->getLogger()->error(json_encode($users), ['app' => 'server-24401']);

I added this line. I also don't have the function getUsersToScan(), so I added it like @kesselb described.
Let's see what happens.

occ files:scan --all --unscanned will trigger a background scan (a very similar code path to the automated background scan)

@kesselb
I ran this command to simulate it, but it doesn't trigger the long-running query.
This is the output I get:

sudo -u www-data php ./occ files:scan --all --unscanned
Starting scan for user 1 out of 13 (User1)
Starting scan for user 2 out of 13 (User2)
Starting scan for user 3 out of 13 (User3)
Starting scan for user 4 out of 13 (User4)
Starting scan for user 5 out of 13 (User5)
Starting scan for user 6 out of 13 (User6)
Starting scan for user 7 out of 13 (User7)
Starting scan for user 8 out of 13 (User8)
Starting scan for user 9 out of 13 (User9)
Starting scan for user 10 out of 13 (User10)
Starting scan for user 11 out of 13 (User11)
Starting scan for user 12 out of 13 (User12)
Starting scan for user 13 out of 13 (User13)
+---------+-------+--------------+
| Folders | Files | Elapsed time |
+---------+-------+--------------+
| 0       | 0     | 00:00:01     |
+---------+-------+--------------+

So we'll have to wait until the background job runs the scan.

@JohnWalkerx (Author) commented:

I got the following debug line output:

{
    "reqId": "ks9Kp2ovRkkr6dfJ8G2W",
    "level": 3,
    "time": "2020-12-16T01:15:02+01:00",
    "remoteAddr": "",
    "user": "--",
    "app": "server-24401",
    "method": "",
    "url": "--",
    "message": {
        "User1": {},
        "User2": {},
        "User3": {},
        "User4": {},
        "User5": {},
        "User6": {},
        "User7": {},
        "User8": {},
        "User9": {},
        "User10": {},
        "User11": {},
        "User12": {},
        "User13": {}
    },
    "userAgent": "--",
    "version": "19.0.6.2"
}

Currently one cron job is running and is executing the getIncomplete() query again and again.

Just for the record: in oc_mounts, storage ID 1 is used by two users for some special folders.

@OswaldKolle1977 commented:

@Cebrain I solved it yesterday!
For me, it was the web proxy server. I replaced the "manual" apache2 proxy with the nginx Docker version - now all the cron jobs finish in less than a minute!
(And additionally Nextcloud Office is working.)
Hope that helps someone!

@Cebrain commented Nov 13, 2022

@Cebrain I solved it yesterday! For me, it was the web proxy server. I replaced the "manual" apache2 proxy with the nginx Docker version - now all the cron jobs finish in less than a minute! (And additionally Nextcloud Office is working.) Hope that helps someone!

Great to hear!
Thanks for this information. At the moment I use the official Docker image; I will check how much work it would be to go with nginx.

The best workaround for me at the moment is to limit the memory: /usr/bin/php8.1 -d memory_limit=512M -f /var/www/nextcloud_hgv/cron.php

With this, the jobs still run long (15 minutes) but aren't killing my server.
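
For reference, wired into a crontab this workaround would look something like the following (a sketch only: the 5-minute schedule and the www-data user are assumptions, while the php binary and cron.php path are taken from the comment above):

# edit the crontab of the web server user
sudo crontab -u www-data -e

# run Nextcloud background jobs every 5 minutes with a capped memory limit
*/5 * * * * /usr/bin/php8.1 -d memory_limit=512M -f /var/www/nextcloud_hgv/cron.php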

@OswaldKolle1977 commented Nov 13, 2022

Great to hear! Thanks for this information. At the moment I use the official Docker image; I will check how much work it would be to go with nginx.

This is super easy - I followed this video: https://youtu.be/JgrPcQIQuF8

@szaimen (Contributor) commented Jan 23, 2023

Hi, please update to 24.0.9 or better 25.0.3 and report back whether it fixes the issue. Thank you!

My goal is to add a label like e.g. 25-feedback to this ticket once the bug can be reproduced on an up-to-date major Nextcloud version. However, this is not going to work without your help. So thanks for all your effort!

If you don't manage to reproduce the issue in time and the issue gets closed, but you can reproduce the issue afterwards, feel free to create a new bug report with up-to-date information by following this link: https://github.com/nextcloud/server/issues/new?assignees=&labels=bug%2C0.+Needs+triage&template=BUG_REPORT.yml&title=%5BBug%5D%3A+

szaimen added the "needs info" and "0. Needs triage" labels and removed the "1. to develop" label on Jan 23, 2023
@tacruc (Contributor) commented Feb 26, 2023

@szaimen I think I can reproduce this on NC 25.0.4.
I have this MySQL process running for at least 10 minutes, multiple times:

SELECT `user_id` FROM `oc_filecache` `f` INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1

My digging points me at \OCA\Files\BackgroundJob\ScanFiles::getUserToScan

Explain Output:

id select_type table type possible_keys key key_len ref rows Extra  
1 SIMPLE m index mounts_storage_index,mount_user_storage mount_user_storage 266 NULL 468 Using index
1 SIMPLE f ref fs_storage_path_hash,fs_parent_name_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,search_photos_test,fs_size,fs_storage_path_prefix,memories_parent_mimetype search_photos_test 8 owncloud.m.storage_id 1038 Using where

@tacruc (Contributor) commented Feb 27, 2023

I just profiled a similar command; I don't know if it is helpful:
SELECT DISTINCT * FROM oc_filecache AS f INNER JOIN oc_mounts AS m ON m.storage_id = f.storage WHERE size < 0 AND parent > 1;

Order State Time
1 Starting 13 µs
2 Waiting For Query Cache Lock 3 µs
3 Starting 5 µs
4 Checking Query Cache For Query 37 µs
5 Checking Permissions 4 µs
6 Opening Tables 19 µs
7 After Opening Tables 5 µs
8 System Lock 5 µs
9 Table Lock 4 µs
10 Waiting For Query Cache Lock 17 µs
11 Init 814 µs
12 Optimizing 25 µs
13 Statistics 5 ms
14 Preparing 24 µs
15 Creating Tmp Table 38 µs
16 Sorting Result 9 µs
17 Executing 3 µs
18 Sending Data 1000 s
19 Creating Sort Index 32 µs
20 Removing Tmp Table 6 µs
21 Creating Sort Index 8 µs
22 End Of Update Loop 7 µs
23 Query End 2 µs
24 Commit 4 µs
25 Closing Tables 3 µs
26 Unlocking Tables 2 µs
27 Closing Tables 12 µs
28 Starting Cleanup 2 µs
29 Freeing Items 8 µs
30 Updating Status 15 µs
31 Logging Slow Query 76 µs
32 Reset For Next Command 18 µs

@tdalbo92 commented Mar 12, 2023

Since upgrading to 25 I've hit this too, but not just with cron - all file uploads choke and die on this query. Here's the EXPLAIN for one:

EXPLAIN SELECT `filecache`.`fileid`, `storage`, `path`, `path_hash`, `filecache`.`parent`, `filecache`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size`, `metadata_etag`, `creation_time`, `upload_time` FROM `oc_filecache` `filecache` LEFT JOIN `oc_filecache_extended` `fe` ON `filecache`.`fileid` = `fe`.`fileid` WHERE (`name` IN ('.noimage', '.nomedia')) AND (`storage` = 2):

*** row 1 ***
          table:  filecache
           type:  ref
  possible_keys:  fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix
            key:  fs_storage_path_hash
        key_len:  8
            ref:  const
           rows:  311619
          Extra:  Using where
*** row 2 ***
          table:  fe
           type:  eq_ref
  possible_keys:  PRIMARY
            key:  PRIMARY
        key_len:  8
            ref:  nextcloud.filecache.fileid
           rows:  1
          Extra:  Using where

This issue is currently show-stopper for me. I've tried disabling redis, migrating out of the snap installation, migrating to postgres (and then back again to mysql) with no luck.

@tdalbo92 commented:
Update: Ran an EXPLAIN ANALYZE on this query. The fs_storage_path_hash index seems... suboptimal?

-> Nested loop left join  (cost=37248.21 rows=62324) (actual time=34365.876..34365.876 rows=0 loops=1)
    -> Filter: (filecache.`name` in ('.noimage','.nomedia'))  (cost=15434.88 rows=62324) (actual time=34365.875..34365.875 rows=0 loops=1)
        -> Index lookup on filecache using fs_storage_path_hash (storage=2)  (cost=15434.88 rows=311619) (actual time=0.528..34327.399 rows=635029 loops=1)
    -> Filter: (filecache.fileid = fe.fileid)  (cost=0.25 rows=1) (never executed)
        -> Single-row index lookup on fe using PRIMARY (fileid=filecache.fileid)  (cost=0.25 rows=1) (never executed)

@tacruc (Contributor) commented Mar 17, 2023

➡️ 🚧 Possible workaround (works for me):

1. Kill all running cron.php jobs. Or just restart your machine.

2. Disable your cron job during the rescan (just to be sure it doesn't start scanning too).

3. Run `occ files:scan-app-data` on your instance to rescan the appdata folder.

This workaround didn't work for me.

@tdalbo92 commented Apr 9, 2023

To update my comment here - adding an index on name fixed all my table lock issues.
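
For context, the kind of index meant here would look roughly like this on MySQL/MariaDB (a sketch only: the index name and the 64-character prefix are arbitrary choices, not something Nextcloud ships or that was specified in the comment above):

ALTER TABLE `oc_filecache` ADD INDEX `fs_name_prefix` (`name`(64));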

@xxmathias commented:

➡️ 🚧 Possible workaround (works for me):

1. Kill all running cron.php jobs. Or just restart your machine.

2. Disable your cron job during the rescan (just to be sure it doesn't start scanning too).

3. Run `occ files:scan-app-data` on your instance to rescan the appdata folder.

This workaround didn't work for me.

Not working for me either. I still have this problem.

@Forza-tng commented Sep 25, 2023

I had the same issue, with slow queries logged and the mariadb process constantly using high CPU.

I'm on Nextcloud 27.1.1 with MariaDB 10.6.14.

The slow query log, and also SHOW FULL PROCESSLIST;, showed the following (the path_hash was different in every query):

SELECT 
  `filecache`.`fileid`, 
  `storage`, 
  `path`, 
  `path_hash`, 
  `filecache`.`parent`, 
  `filecache`.`name`, 
  `mimetype`, 
  `mimepart`, 
  `size`, 
  `mtime`, 
  `storage_mtime`, 
  `encrypted`, 
  `etag`, 
  `permissions`, 
  `checksum`, 
  `unencrypted_size`, 
  `metadata_etag`, 
  `creation_time`, 
  `upload_time` 
FROM 
  `oc_filecache` `filecache` 
  LEFT JOIN `oc_filecache_extended` `fe` ON `filecache`.`fileid` = `fe`.`fileid` 
WHERE 
  (`storage` = 27) 
  AND (
    `path_hash` = '3d3867fe61097ed2ceed36fd182d435b'
  )

My solution was to create an additional index that occ db:add-missing-indices did not create.

MariaDB [nextcloud]> CREATE INDEX idx_storage_path_hash ON oc_filecache (storage, path_hash);
MariaDB [nextcloud]> show index from oc_filecache;
+--------------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table        | Non_unique | Key_name               | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+--------------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| oc_filecache |          0 | PRIMARY                |            1 | fileid      | A         |     1149608 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | fs_mtime               |            1 | mtime       | A         |     1149608 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | fs_size                |            1 | size        | A         |      574804 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | fs_id_storage_size     |            1 | fileid      | A         |     1149608 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | fs_id_storage_size     |            2 | storage     | A         |     1149608 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | fs_id_storage_size     |            3 | size        | A         |     1149608 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | fs_storage_path_prefix |            1 | storage     | A         |        2600 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | fs_storage_path_prefix |            2 | path        | A         |     1149608 |     64   | NULL   | YES  | BTREE      |         |               | NO      |
| oc_filecache |          1 | fs_parent              |            1 | parent      | A         |     1149608 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | idx_storage_path_hash  |            1 | storage     | A         |        2223 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| oc_filecache |          1 | idx_storage_path_hash  |            2 | path_hash   | A         |     1149608 |     NULL | NULL   |      | BTREE      |         |               | NO      |
+--------------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
11 rows in set (0,003 sec)

EXPLAIN now shows the new index being used for this query:

+------+-------------+-----------+--------+----------------------------------------------+-----------------------+---------+----------------------------+------+-----------------------+
| id   | select_type | table     | type   | possible_keys                                | key                   | key_len | ref                        | rows | Extra                 |
+------+-------------+-----------+--------+----------------------------------------------+-----------------------+---------+----------------------------+------+-----------------------+
|    1 | SIMPLE      | filecache | ref    | fs_storage_path_prefix,idx_storage_path_hash | idx_storage_path_hash | 138     | const,const                | 1    | Using index condition |
|    1 | SIMPLE      | fe        | eq_ref | PRIMARY                                      | PRIMARY               | 8       | nextcloud.filecache.fileid | 1    | Using where           |
+------+-------------+-----------+--------+----------------------------------------------+-----------------------+---------+----------------------------+------+-----------------------+
2 rows in set (0,022 sec)

@bugsyb commented Oct 27, 2023

My PostgreSQL also shows high CPU and slow queries, even with the index already in place as per below (I didn't have to create it recently; it seems it was created automatically):

nextcloud=# SELECT schemaname, tablename, indexname FROM pg_indexes WHERE tablename = 'oc_filecache' ; 
 schemaname |  tablename   |        indexname         
------------+--------------+--------------------------
 public     | oc_filecache | oc_filecache_pkey
 public     | oc_filecache | fs_id_storage_size
 public     | oc_filecache | fs_mtime
 public     | oc_filecache | fs_parent_name_hash
 public     | oc_filecache | fs_size
 public     | oc_filecache | fs_storage_mimepart
 public     | oc_filecache | fs_storage_mimetype
 public     | oc_filecache | fs_storage_path_hash
 public     | oc_filecache | fs_storage_size
 public     | oc_filecache | memories_parent_mimetype
(10 rows)

@DjSlash commented May 16, 2024

I found this bug after seeing this behavior on my instance. After some research I realized that I had not tuned the mariadb server after moving to a new machine. I followed the recommendations, and now the issue is resolved. So it might be worth checking your SQL server. :)
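
For readers wondering what kind of tuning is meant: the snippet below is only an illustrative sketch, not taken verbatim from the Nextcloud documentation, and the right values depend entirely on your hardware, but these are the knobs usually involved for MariaDB/MySQL behind Nextcloud:

# /etc/mysql/mariadb.conf.d/99-nextcloud.cnf  (example path)
[mysqld]
innodb_buffer_pool_size = 1G          # size to a generous share of available RAM
innodb_io_capacity      = 4000        # higher value suits SSD-backed storage
transaction_isolation   = READ-COMMITTED
binlog_format           = ROW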

@bugsyb commented May 21, 2024

Would you mind mentioning which performance tuning you have in mind? Looking at the suggested settings for MySQL and mapping them to Postgres, it would just mean decreasing the memory available for queries. Happy to hear if there is anything specific.
My observation is that Postgres is often used with default settings even in enterprise-grade solutions (historically it was the primary choice; since then MySQL has gained much better performance, though Postgres is still used in a lot of places).

Unsure which area should be tuned further, though :(

@muchachagrande commented:

My instance ran into this problem 5 days ago after updating from 28.0.6 to 29.0.1.
The first thing was that NC did a full scan creating previews of everything. I have a NAS attached with many, many files, so it took around two days and the previews folder now occupies many gigabytes.
After that full scan, NC started this never-ending query in the cron jobs, so Postgres is always working and eating the CPU.
My workaround was the referenced occ files:scan-app-data, but I deleted the previews folder first. For my usage it isn't worth having pregenerated previews of all my files.

@DjSlash commented Jun 4, 2024

@bugsyb apologies for answering so late. In my situation I had mariadb running with the default (Debian) settings. Following the documentation was enough to counter the behavior described in this issue. Regarding psql, I wouldn't know, sorry.

(On the machine where I had this Nextcloud instance before, I had put effort into mariadb ages ago, but I had forgotten about that. Now I just don't have time to dive into it further.)
