Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dont forward samples beyond a certain age #3049

Merged
merged 23 commits into from
Sep 30, 2022
Merged

Conversation

replay
Copy link
Contributor

@replay replay commented Sep 26, 2022

What this PR does

It adds age filtering to the Distributor's forwarding functionality. The age filter can be configured so that samples which are older than a defined duration (for example 10min) do not get forwarded.

@replay replay marked this pull request as ready for review September 26, 2022 21:37
@replay replay requested a review from a team as a code owner September 26, 2022 21:37
@colega
Copy link
Contributor

colega commented Sep 27, 2022

Would it make this to be a forwarding rule property, rather than a global setting?

@replay
Copy link
Contributor Author

replay commented Sep 27, 2022

Would it make this to be a forwarding rule property, rather than a global setting?

I'm not sure how this would make sense. I think the main reason why we want this is to prevent that a failed forwarding destination can completely block the write path, we prevent this by stopping to try to send samples to the forwarding destination once they reach a certain age, but as long as there are any samples which we do attempt to forward the write path would remain blocked. So I can't see a use case for the ability to have different age cutoffs on a per forwarding rule basis, or do you see one?

@colega
Copy link
Contributor

colega commented Sep 28, 2022

We're adding the ability to drop them because we know that upstream would drop them anyway, and I thought that would be defined per-metric, but apparently it's not-so-configurable as I thought, so maybe it doesn't make sense. Sorry for the noise.

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I have some questions and suggestions, please take a look.

pkg/distributor/distributor.go Show resolved Hide resolved
pkg/distributor/forwarding/forwarding.go Outdated Show resolved Hide resolved
pkg/distributor/forwarding/forwarding.go Outdated Show resolved Hide resolved
pkg/distributor/forwarding/forwarding.go Outdated Show resolved Hide resolved
}

// filterSamplesBefore filters a given slice of samples to only contain samples that have timestamps newer or equal to
// the given timestamp. It relies on the samples being sorted by timestamp.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It relies on the samples being sorted by timestamp.

Why can we rely on that? Push requests may not have samples sorted.

Copy link
Contributor Author

@replay replay Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we added OOO support the ingester required the samples within one series to be sorted as well afaik, otherwise we would have dropped some of them, this follows the same requirement.
However, you're right that now it is possible that we'd forward samples which are older than the cutoff at which we're supposed to start dropping them if they aren't sorted. I'd prefer to not have to sort the samples, so how about we reverse this logic and scan the slice of samples from the beginning to the end and as soon as we see one which is too old we drop the rest? That way if a slice of samples isn't sorted then we'd drop too much as opposed to too little.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this flips the logic as suggested and also adds a unit test for it: 7e9915c2b

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the logic for reversed iteration, but let's keep it as-is. No need to complicate this code more for would have been rejected as out-of-order samples before.

pkg/distributor/forwarding/forwarding.go Show resolved Hide resolved

samplesFiltered := filterSamplesBefore(samplesUnfiltered, dontForwardBeforeTs)
if len(samplesFiltered) < len(samplesUnfiltered) {
err = errSamplesTooOld
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to returning error, we should also include such samples in cortex_discarded_samples_total metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think updating cortex_discarded_samples_total would make sense if we'd not ingest them when ingest is true, but we do. Maybe there should be a new metric to count the samples that were not forwarded due to the age?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add new reason for discarded samples that were not ingested. Let's do that in next PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -2448,6 +2448,10 @@ The `limits` block configures default and per-tenant limits imposed by component
# rules.
[forwarding_endpoint: <string> | default = ""]

# If set, forwarding drops samples that are older than this duration. If unset
# or 0, no samples get dropped.
[forwarding_drop_before: <int> | default = ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it display <duration>? I tried it with this patch, and it worked, although I'm not sure which of the cases really helped:

diff --git a/tools/doc-generator/parse/parser.go b/tools/doc-generator/parse/parser.go
index 293de6790..bff59d0a0 100644
--- a/tools/doc-generator/parse/parser.go
+++ b/tools/doc-generator/parse/parser.go
@@ -342,6 +342,8 @@ func getFieldCustomType(t reflect.Type) (string, bool) {
                return "url", true
        case reflect.TypeOf(time.Duration(0)).String():
                return "duration", true
+       case reflect.TypeOf(model.Duration(0)).String():
+               return "duration", true
        case reflect.TypeOf(flagext.StringSliceCSV{}).String():
                return "string", true
        case reflect.TypeOf(flagext.CIDRSliceCSV{}).String():
@@ -422,6 +424,8 @@ func getCustomFieldType(t reflect.Type) (string, bool) {
                return "url", true
        case reflect.TypeOf(time.Duration(0)).String():
                return "duration", true
+       case reflect.TypeOf(model.Duration(0)).String():
+               return "duration", true
        case reflect.TypeOf(flagext.StringSliceCSV{}).String():
                return "string", true
        case reflect.TypeOf(flagext.CIDRSliceCSV{}).String():

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks, will apply that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It leads to panic when generating config descriptor json file:

[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x101150a40]

goroutine 1 [running]:
github.com/grafana/mimir/pkg/mimirtool/config.(*InspectedEntry).decodeValue(0x140010fb600, {0x101870760, 0x140010fcb40})
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:284 +0x120
github.com/grafana/mimir/pkg/mimirtool/config.parseDefaultValue(0x140010fb600, {0x0, 0x0})
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:539 +0x108
github.com/grafana/mimir/pkg/mimirtool/config.convertEntryToEntry(0x14000ee36b0)
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:525 +0x1f8
github.com/grafana/mimir/pkg/mimirtool/config.convertEntriesToEntries(...)
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:497
github.com/grafana/mimir/pkg/mimirtool/config.convertEntryToEntry(0x14000eba9a0)
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:517 +0x50c
github.com/grafana/mimir/pkg/mimirtool/config.convertEntriesToEntries(...)
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:497
github.com/grafana/mimir/pkg/mimirtool/config.convertBlockToEntry(...)
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:547
github.com/grafana/mimir/pkg/mimirtool/config.InspectConfigWithFlags({0x101792aa0?, 0x14000d20000?}, 0x1018713e0?)
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:491 +0xe0
github.com/grafana/mimir/pkg/mimirtool/config.InspectConfig({0x10186d280?, 0x14000d20000})
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:478 +0x5c
github.com/grafana/mimir/pkg/mimirtool/config.Describe({0x10186d280?, 0x14000d20000?})
	/Users/peter/Grafana/mimir/pkg/mimirtool/config/inspect.go:468 +0x24
main.main()
	/Users/peter/Grafana/mimir/tools/config-inspector/main.go:14 +0x34
exit status 2

Let's tackle that separately.

@replay
Copy link
Contributor Author

replay commented Sep 29, 2022

force-pushing to rebase

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thank you!

@pstibrany pstibrany enabled auto-merge (squash) September 30, 2022 12:50
@pstibrany pstibrany merged commit e19b31a into main Sep 30, 2022
@pstibrany pstibrany deleted the dont_forward_old_samples branch September 30, 2022 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants