-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dont forward samples beyond a certain age #3049
Conversation
86d86b2
to
8be7dec
Compare
Would it make this to be a forwarding rule property, rather than a global setting? |
I'm not sure how this would make sense. I think the main reason why we want this is to prevent that a failed forwarding destination can completely block the write path, we prevent this by stopping to try to send samples to the forwarding destination once they reach a certain age, but as long as there are any samples which we do attempt to forward the write path would remain blocked. So I can't see a use case for the ability to have different age cutoffs on a per forwarding rule basis, or do you see one? |
We're adding the ability to drop them because we know that upstream would drop them anyway, and I thought that would be defined per-metric, but apparently it's not-so-configurable as I thought, so maybe it doesn't make sense. Sorry for the noise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I have some questions and suggestions, please take a look.
} | ||
|
||
// filterSamplesBefore filters a given slice of samples to only contain samples that have timestamps newer or equal to | ||
// the given timestamp. It relies on the samples being sorted by timestamp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It relies on the samples being sorted by timestamp.
Why can we rely on that? Push requests may not have samples sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we added OOO support the ingester required the samples within one series to be sorted as well afaik, otherwise we would have dropped some of them, this follows the same requirement.
However, you're right that now it is possible that we'd forward samples which are older than the cutoff at which we're supposed to start dropping them if they aren't sorted. I'd prefer to not have to sort the samples, so how about we reverse this logic and scan the slice of samples from the beginning to the end and as soon as we see one which is too old we drop the rest? That way if a slice of samples isn't sorted then we'd drop too much as opposed to too little.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this flips the logic as suggested and also adds a unit test for it: 7e9915c2b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow the logic for reversed iteration, but let's keep it as-is. No need to complicate this code more for would have been rejected as out-of-order samples before.
|
||
samplesFiltered := filterSamplesBefore(samplesUnfiltered, dontForwardBeforeTs) | ||
if len(samplesFiltered) < len(samplesUnfiltered) { | ||
err = errSamplesTooOld |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to returning error, we should also include such samples in cortex_discarded_samples_total
metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think updating cortex_discarded_samples_total
would make sense if we'd not ingest them when ingest
is true
, but we do. Maybe there should be a new metric to count the samples that were not forwarded due to the age?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add new reason for discarded samples that were not ingested. Let's do that in next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2448,6 +2448,10 @@ The `limits` block configures default and per-tenant limits imposed by component | |||
# rules. | |||
[forwarding_endpoint: <string> | default = ""] | |||
|
|||
# If set, forwarding drops samples that are older than this duration. If unset | |||
# or 0, no samples get dropped. | |||
[forwarding_drop_before: <int> | default = ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it display <duration>
? I tried it with this patch, and it worked, although I'm not sure which of the cases really helped:
diff --git a/tools/doc-generator/parse/parser.go b/tools/doc-generator/parse/parser.go
index 293de6790..bff59d0a0 100644
--- a/tools/doc-generator/parse/parser.go
+++ b/tools/doc-generator/parse/parser.go
@@ -342,6 +342,8 @@ func getFieldCustomType(t reflect.Type) (string, bool) {
return "url", true
case reflect.TypeOf(time.Duration(0)).String():
return "duration", true
+ case reflect.TypeOf(model.Duration(0)).String():
+ return "duration", true
case reflect.TypeOf(flagext.StringSliceCSV{}).String():
return "string", true
case reflect.TypeOf(flagext.CIDRSliceCSV{}).String():
@@ -422,6 +424,8 @@ func getCustomFieldType(t reflect.Type) (string, bool) {
return "url", true
case reflect.TypeOf(time.Duration(0)).String():
return "duration", true
+ case reflect.TypeOf(model.Duration(0)).String():
+ return "duration", true
case reflect.TypeOf(flagext.StringSliceCSV{}).String():
return "string", true
case reflect.TypeOf(flagext.CIDRSliceCSV{}).String():
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, thanks, will apply that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It leads to panic when generating config descriptor json file:
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x101150a40]
goroutine 1 [running]:
github.com/grafana/mimir/pkg/mimirtool/config.(*InspectedEntry).decodeValue(0x140010fb600, {0x101870760, 0x140010fcb40})
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:284 +0x120
github.com/grafana/mimir/pkg/mimirtool/config.parseDefaultValue(0x140010fb600, {0x0, 0x0})
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:539 +0x108
github.com/grafana/mimir/pkg/mimirtool/config.convertEntryToEntry(0x14000ee36b0)
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:525 +0x1f8
github.com/grafana/mimir/pkg/mimirtool/config.convertEntriesToEntries(...)
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:497
github.com/grafana/mimir/pkg/mimirtool/config.convertEntryToEntry(0x14000eba9a0)
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:517 +0x50c
github.com/grafana/mimir/pkg/mimirtool/config.convertEntriesToEntries(...)
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:497
github.com/grafana/mimir/pkg/mimirtool/config.convertBlockToEntry(...)
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:547
github.com/grafana/mimir/pkg/mimirtool/config.InspectConfigWithFlags({0x101792aa0?, 0x14000d20000?}, 0x1018713e0?)
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:491 +0xe0
github.com/grafana/mimir/pkg/mimirtool/config.InspectConfig({0x10186d280?, 0x14000d20000})
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:478 +0x5c
github.com/grafana/mimir/pkg/mimirtool/config.Describe({0x10186d280?, 0x14000d20000?})
/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:468 +0x24
main.main()
/Users/peter/Grafana/mimir/tools/config-inspector/main.go:14 +0x34
exit status 2
Let's tackle that separately.
force-pushing to rebase |
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
c1c9af4
to
6b4a0be
Compare
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, thank you!
What this PR does
It adds age filtering to the Distributor's forwarding functionality. The age filter can be configured so that samples which are older than a defined duration (for example
10min
) do not get forwarded.