compactor: File name is too long error #844
Comments
The newest versions have a newer |
Hi, we had a similar issue reported for 0.6.0 (#1491) and are looking for more data from the author on this, especially which object storage is used. It looks super suspicious and might actually mean the object storage client we use (e.g |
We use S3 object storage (sorry, I thought I'd put that in). Looking at the generated pathname, it seems to contain the same timestamp repeated over and over again. I am in principle happy to bump a subset of compactors to v0.7.0 and see if it goes away. With the last error message, the constructed pathname is 4106 characters long (if we account for the ":" at the end), thus exceeding FILENAME_MAX by 10 bytes:
I'm happy to start digging through the code and see where that is being constructed. |
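The length arithmetic in the comment above can be sketched in a few lines of Go. This is only an illustration: the 4096-byte limit is the one implied by that comment, the block ID is taken from the log in the issue description, and the base directory `/tmp/compact` and the helper `depthUntilTooLong` are assumptions made for the example, not Thanos code.

```go
package main

import "fmt"

// depthUntilTooLong is a hypothetical helper: it appends "/"+id to base
// over and over and returns how many levels were added when the path
// first becomes longer than limit bytes.
func depthUntilTooLong(base, id string, limit int) int {
	path := base
	depth := 0
	for len(path) <= limit {
		path += "/" + id
		depth++
	}
	return depth
}

func main() {
	// Block ID from the log in the issue; base directory is an assumption.
	const blockID = "01D3MZ4NWJ8Q5YGSCB5F3SCDFT"
	depth := depthUntilTooLong("/tmp/compact", blockID, 4096)
	fmt.Printf("path exceeds the limit after %d nested levels\n", depth)
}
```

Each level adds 27 bytes (a 26-character ID plus the separator), so only a runaway recursion reaches a path of several thousand bytes, which is consistent with a listing that keeps returning the same entry.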
In terms of configuration:
I see that we should probably configure our repository to start mirroring thanosio/thanos in addition to / instead of improbable/thanos. |
If you want, I can try to generate a PR with some extra debugging of the directory name in various stages through the code path? |
I've now managed to catch this happening with some extra debugging enabled. The interesting debug logs are from two places in objstore.DownloadDir: one is right after the entry to the function (denoted
It looks as if, under some conditions, the recursive call to bkt.Iter ends up fetching the same thing over and over. Having looked at the S3 storage (in s3browser), I can see that there's no infinite nesting. |
Looking even deeper, it looks as if minio, in some (all?) circumstances, returns the "current directory" as one of the listed items. A small patch that explicitly logs this edge case and skips to the next item in the list does trigger the logging, and avoids the infinite recursion. A cleaner PR will follow once we have had it running for a while. |
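The guard described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `lister`, `fakeBucket` and `walk` are hypothetical names, the in-memory bucket only imitates a listing that reports the directory itself, and `maxDepth` exists solely so that the unguarded variant terminates in the example.

```go
package main

import (
	"fmt"
	"strings"
)

// lister is a hypothetical stand-in for an object-store client: it calls f
// once for every entry directly under dir. Directory entries end in "/".
type lister func(dir string, f func(name string) error) error

// fakeBucket builds a lister over a fixed set of object keys. When echoDir
// is true, it also reports dir itself as an entry, imitating the behaviour
// described in the comment above.
func fakeBucket(keys []string, echoDir bool) lister {
	return func(dir string, f func(string) error) error {
		seen := map[string]bool{}
		var entries []string
		if echoDir {
			seen[dir] = true
			entries = append(entries, dir)
		}
		for _, k := range keys {
			if !strings.HasPrefix(k, dir) || k == dir {
				continue
			}
			rest := strings.TrimPrefix(k, dir)
			name := k
			if i := strings.Index(rest, "/"); i >= 0 {
				name = dir + rest[:i+1] // collapse deeper keys into a sub-directory entry
			}
			if !seen[name] {
				seen[name] = true
				entries = append(entries, name)
			}
		}
		for _, e := range entries {
			if err := f(e); err != nil {
				return err
			}
		}
		return nil
	}
}

// walk collects every file under dir, recursing into sub-directories.
// With skipSelf set, an entry equal to dir is ignored instead of recursed into.
func walk(list lister, dir string, skipSelf bool, depth, maxDepth int, out *[]string) error {
	if depth > maxDepth {
		return fmt.Errorf("recursion deeper than %d at %q", maxDepth, dir)
	}
	return list(dir, func(name string) error {
		if skipSelf && name == dir {
			return nil // the listing returned the current directory; skip it
		}
		if strings.HasSuffix(name, "/") {
			return walk(list, name, skipSelf, depth+1, maxDepth, out)
		}
		*out = append(*out, name)
		return nil
	})
}

func main() {
	bkt := fakeBucket([]string{"blk/meta.json", "blk/chunks/000001"}, true)

	var files []string
	fmt.Println("without guard:", walk(bkt, "blk/", false, 0, 50, &files))

	files = nil
	fmt.Println("with guard:", walk(bkt, "blk/", true, 0, 50, &files), files)
}
```

Without the guard, the walk keeps descending into the same directory until the depth limit trips; with it, the two files are collected and the recursion ends normally.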
Raised PR for fix: #1544 |
#1544) * Ignore object if it is the current directory Signed-off-by: Jamie Poole <jimbobby5@yahoo.com> * Add full-stop Signed-off-by: Jamie Poole <jimbobby5@yahoo.com>
#1544 merged 🎉 so the next release of Thanos should have this fixed! |
🎉 closing this as fixed. Thanks for your contribution, @jimbobby5 ! |
Thanos, Prometheus and Golang version used
-> prometheus instances run in kubernetes with thanos 0.3 side car and image tags used 2.7.1.
-> % thanos --version
type: S3
thanos, version 0.3.0 (branch: HEAD, revision: 837e967)
build user: root@986454de7a63
build date: 20190208-15:24:36
go version: go1.11.5
What happened
Running the compact command fails against a known bucket containing our metrics data. The data itself is currently not huge, maybe around 3 GB so far. The Thanos sidecar keeps pushing the data into our local S3-compatible storage, and we were hoping to keep this data at a lower granularity for several years. Here is the command I used:
What you expected to happen
I was hoping the compact command would process the data :)
How to reproduce it (as minimally and precisely as possible):
Our data is collected by Prometheus from a bunch of netdata agents, limited to the metrics we have identified. It is scraped every 15s at the moment, and all targets are CentOS 6/7. I feel that may not be very relevant, as this looks like a problem with creating a directory locally.
Full logs to relevant components
level=info ts=2019-02-14T19:39:03.453025Z caller=factory.go:39 msg="loading bucket configuration"
level=info ts=2019-02-14T19:39:03.453635Z caller=compact.go:196 msg="retention policy of 5 min aggregated samples is enabled" duration=4320h0m0s
level=info ts=2019-02-14T19:39:03.453659Z caller=compact.go:199 msg="retention policy of 1 hour aggregated samples is enabled" duration=24000h0m0s
level=info ts=2019-02-14T19:39:03.453759Z caller=compact.go:281 msg="starting compact node"
level=info ts=2019-02-14T19:39:03.453862Z caller=compact.go:821 msg="start sync of metas"
level=info ts=2019-02-14T19:39:03.453839Z caller=main.go:308 msg="Listening for metrics" address=0.0.0.0:10902
level=info ts=2019-02-14T19:39:03.658623Z caller=compact.go:827 msg="start of GC"
level=error ts=2019-02-14T19:39:03.808235Z caller=compact.go:265 msg="retriable error" err="compaction failed: compaction: download block 01D3MZ4NWJ8Q5YGSCB5F3SCDFT: create dir: mkdir /tmp/compact/0@{prometheus="kube-monitoring/operator-prometheus-lsf",prometheus_replica="prometheus-operator-prometheus-lsf-0",region="sc",role="lsf-metrics"}/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT/01D3MZ4NWJ8Q5YGSCB5F3SCDFT: file name too long"
^Clevel=info ts=2019-02-14T19:39:08.503792Z caller=main.go:192 msg="caught signal. Exiting." signal=interrupt
level=info ts=2019-02-14T19:39:08.503963Z caller=main.go:184 msg=exiting
Anything else we need to know
Nope, and thanks for looking into this. I hope I didn't miss anything that would help you understand this issue.