Add query-scheduler.libsonnet
#295
Instead of having to include the source file to enable it, can we just have a `cortex_query_scheduler_enabled` in the config? I think it would be more practical to enable/disable.
You're right. Please take a look again.
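A minimal sketch of the flag-based approach suggested in the review, assuming a hidden `_config` object and a `cortex_query_scheduler_enabled` field as named in the comment. The `query_scheduler_deployment` field and the placeholder Deployment body are hypothetical stand-ins for illustration, not the code merged in this PR.

```jsonnet
// Hypothetical sketch: gate the query-scheduler behind a config flag
// instead of requiring users to import query-scheduler.libsonnet themselves.
local lib = {
  _config:: {
    // Flag name taken from the review comment; assumed to default to off.
    cortex_query_scheduler_enabled: false,
  },

  // Placeholder object standing in for a real Kubernetes Deployment.
  // When the flag is off, the field evaluates to an empty object.
  query_scheduler_deployment:
    if $._config.cortex_query_scheduler_enabled then {
      kind: 'Deployment',
      metadata: { name: 'query-scheduler' },
    } else {},
};

// A user opts in by overriding the flag; `$` is late-bound, so the
// conditional above sees the overridden value.
lib {
  _config+:: { cortex_query_scheduler_enabled: true },
}
```

With this shape, enabling or disabling the component is a one-line config override rather than a change to the list of imported files.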
Thanks for working on it! LGTM (modulo a nit) 🙏
* Add query-scheduler.libsonnet.
* CHANGELOG.md
* Use flag to enable query-scheduler.
* Fix image.
* Added mega_user class Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fine-tune blocks storage config Signed-off-by: Marco Pracucci <marco@pracucci.com> * Disable tests by default to fix README instructions Ref grafana/cortex-jsonnet#95 * Run store-gateway without CPU limits Signed-off-by: Marco Pracucci <marco@pracucci.com> * Use v1 API for Deployment and StatefulSet resources * Version bump to v1.1.0 * Actually include the ruler * Update config option name * Added ruler_enabled and alertmanager_enabled flags. (grafana/cortex-jsonnet#116) * Added publish not ready addresses Signed-off-by: Joe Elliott <number101010@gmail.com> * Removed -experimental.tsdb.store-gateway-enabled flag Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added a discovery svc and pointed the querier service at itself Signed-off-by: Joe Elliott <number101010@gmail.com> * lint Signed-off-by: Joe Elliott <number101010@gmail.com> * Added PodDisruptionBudget for store-gateway Signed-off-by: Marco Pracucci <marco@pracucci.com> * Allow to configure the blocks replication factor Signed-off-by: Marco Pracucci <marco@pracucci.com> * Switch store-gateway StatefulSets to Parallel Pod Management Signed-off-by: Marco Pracucci <marco@pracucci.com> * Ruler should use metadata cache as well, if configured. (grafana/cortex-jsonnet#128) Ruler instantiates querier internally, so it can use metadata cache. 
* Allow to customize ingester disk size and class Signed-off-by: Marco Pracucci <marco@pracucci.com> * Version bump to 1.2.0 * refactor: use jaeger-agent-mixin lib got moved: grafana/jsonnet-libshttps://github.com/grafana/cortex-jsonnet/pull/291 used jb-0.4.0 which updates the jsonnetfile.json format * Switch blocks storage ingesters to Parallel pod management policy and 4d retention Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Chunks blocks migration (grafana/cortex-jsonnet#148) * Allow configuring querier with second store engine. * Introduced newIngesterStatefulSet and newIngesterPdb functions. * Rename parameters to be more clear. * refactor(cortex): use first class citizens for: * requiredDuringSchedulingIgnoredDuringExecutionType * portsType These are available from: https://github.com/jsonnet-libs/k8s-alpha * Update blocks storage CLI flags Signed-off-by: Marco Pracucci <marco@pracucci.com> * Do not apply blocks storage config to query-frontend, table-manager and purger Signed-off-by: Marco Pracucci <marco@pracucci.com> * Cleaned up blocks storage config Signed-off-by: Marco Pracucci <marco@pracucci.com> * Apply chunks-store config if primary or secondary store use chunks. (grafana/cortex-jsonnet#160) * Enable table manager when using chunks storage as secondary storage engine for querier. 
(grafana/cortex-jsonnet#161) * fix(ksonnet): backwards compatibility with ksonnet * add overrides config to tsdb store-gateway * Add jsonnet for ingester StatefulSet with WAL (grafana/cortex-jsonnet#72) * Add jsonnet for ingester StatefulSet with WAL Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Add CHANGELOG entry Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix lint Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Fix review comments Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in> * Change max query length to 32 days To allow for comparision over months of 31d Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Fix ruler S3 config option (grafana/cortex-jsonnet#174) * Removed -experimental.tsdb.store-gateway-enabled flag Signed-off-by: Marco Pracucci <marco@pracucci.com> * Use correct config variable for s3 ruler config * restore dropped line Co-authored-by: Marco Pracucci <marco@pracucci.com> * Add support for local ruler_client_type (grafana/cortex-jsonnet#175) * Support Alertmanager HA With this, we can now support increasing the number of replicas for a Cortex AM thus enabling HA. Please note that Alerts themselves are not gossiped between Alertmanagers. Each Ruler needs to send the alert to every Alertmanager available thus the reason why a headless service gets created when the number of replicas is more than 1. * Setup the gossip port * s/isGossiping/isHa * Bump to 3 replicas by default * Bump the cortex image, the latest stable is 1.3 * Fix typo in Alertmanager configuration * Alertmanager configuration tweaks - Introduces the `fallback_config` option to allow an Alertmanager to have a fallback config. - Given the headless service a different name to allow seamless switching between 1 or multiple replicas. The cluster field in the service metadata is immutable which made it impossible to create the new service unless you delete the previous one. 
* Remove different name for a headless service Sadly, we can't have a different name for the headless service as the statefulset is configured to match its name. * Fix ruler s3 storage configuration * Block storage support for s3 * Added Azure support to blocks storage Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed linter Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed the experimental prefix from blocks storage CLI flags Signed-off-by: Marco Pracucci <marco@pracucci.com> * Lower default ingestion limits and create a new overrides user * Address review feedback * Bump default series limit by 50% * Add flusher job for blocks. * Fixed Azure account name/key config Signed-off-by: Marco Pracucci <marco@pracucci.com> * Rename changed flags for 1.4 release. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Make sure only a single ruler rolls out at a time Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Cut 1.4.0 Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add overrides exporter Overrides exporter part of grafana/cortex-tools and exposes runtime overrides and related presets of Cortex as metrics. Signed-off-by: Christian Simon <simon@swine.de> * Refactor limits and overrides Ensure we expose 'extra_small_user' and reference it setting the "default" values. This will raise the limits of the 'small_user' preset to the defaults for `ingester.max-samples-per-query` and `ingester.max-series-per-query`. Signed-off-by: Christian Simon <simon@swine.de> * Removed support for ingester.statefulset_replicas Signed-off-by: Marco Pracucci <marco@pracucci.com> * Switch compactor statefulset to Parallel pod management policy Signed-off-by: Marco Pracucci <marco@pracucci.com> * Cut 1.5.0 release Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add ruler limits Sets default presets for for all the 'users' when it comes to ruler limits. 
* Add for the last user * Enabled compactor sharding Signed-off-by: Marco Pracucci <marco@pracucci.com> * Rollback PR 213 Signed-off-by: Marco Pracucci <marco@pracucci.com> * Re-introduce ruler limits Signed-off-by: Marco Pracucci <marco@pracucci.com> * [fixup] ruler limits config key name Ruler limits have a prefix of `ruler_` on the config key name. This makes the key match and then uses them as the value for the flags. * Removed postings-compression-enabled Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fine-tuned gRPC keepalive pings settings Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed gRPC settings Signed-off-by: Marco Pracucci <marco@pracucci.com> * Release 1.6.0 Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add option to configure unregister ingesters on shutdown Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed config Signed-off-by: Marco Pracucci <marco@pracucci.com> * Improved comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated doc Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed ifs Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed syntax error Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove misleading comment (grafana/cortex-jsonnet#243) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add option to customise the configmap name Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Fix for real Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added bucket index flag, and enable bucket index by default. 
(grafana/cortex-jsonnet#254) * Cleanup blocks storage config Signed-off-by: Marco Pracucci <marco@pracucci.com> * feat: allow for Alertmanager to configure multiple storage backends Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com> * Update cortex/config.libsonnet Co-authored-by: gotjosh <josue@grafana.com> * Update cortex/alertmanager.libsonnet Co-authored-by: gotjosh <josue@grafana.com> * Release 1.7.0. (grafana/cortex-jsonnet#260) * Release 1.7.0. * cortex: config: Fix error message for alertmanager_client_type. * cortex: alertmanager: Remove space in dot notation. * Up metadata connection limits * Add flag to enable streaming of chunks. (grafana/cortex-jsonnet#276) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Add recording rules to calculate Cortex scaling - Update dashboard so it only shows under provisioned services and why - Add sizing rules based on limits. - Add some docs to the dashboard. Signed-off-by: Tom Wilkie <tom@grafana.com> * chore: update lib to use new API paths Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com> * Create 1.8.0 release. (grafana/cortex-jsonnet#282) * Create 1.8.0 release. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Update image tags. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Do not use deprecated Alertmanager cluster flags Signed-off-by: Marco Pracucci <marco@pracucci.com> * fix: Update ksonnet-util vendor lock The previous version `c19a92e586a6752f11745b47f309b13f02ef7147` is incompatible with the library in its current form. For example in `tsdb.libsonnet` L81, we use `pvc.new('ingester-pvc')` but at the locked version, in `ksonnet-util/kausal.libsonnet` the `pvc.new` function takes no arguments. 
Signed-off-by: Jack Baldry <jack.baldry@grafana.com> * Add function to customize compactor statefulset Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add querier_service_ignored_labels (grafana/cortex-jsonnet#291) Co-authored-by: Victor Tsang Hi <victor.tsang.hi@sap.com> * Introduce ingester instance limits to configuration, and add alerts. (grafana/cortex-jsonnet#296) * Introduce ingester instance limits to configuration, and add alerts. * CHANGELOG.md * Address (internal) review feedback. * Add `query-scheduler.libsonnet` (grafana/cortex-jsonnet#295) * Add query-scheduler.libsonnet. * CHANGELOG.md * Use flag to enable query-scheduler. * Fix image. * Replace use of querier.compress-http-responses removed in Cortex 1.9 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Enable index-header lazy loading in store-gateway Signed-off-by: Marco Pracucci <marco@pracucci.com> * Do not use deprecated/removed flag -limits.per-user-override-config Signed-off-by: Marco Pracucci <marco@pracucci.com> * Use new ruler storage config and enable API compression Signed-off-by: Marco Pracucci <marco@pracucci.com> * Changed alertmanager config to use the new storage config Signed-off-by: Marco Pracucci <marco@pracucci.com> * Cut release 1.9.0 Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Mount overrides configmap to alertmanager too Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upgrade memcached Signed-off-by: Marco Pracucci <marco@pracucci.com> * Increase default store-gateway memory request and limit Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix Signed-off-by: Marco Pracucci <marco@pracucci.com> * Set -server.grpc-max-*-msg-size-bytes for ruler and ingester. 
(grafana/cortex-jsonnet#326) * Fixed --alertmanager.cluster.peers Signed-off-by: Marco Pracucci <marco@pracucci.com> * Set empty alertmanager listen address with 1 replica Alertmanager tries to start clustering unless the flag is explicitly set as an empty string https://github.com/prometheus/alertmanager#turn-off-high-availability * Add option to disable anti-affinity in newIngesterStatefulSet() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix alertmanager config change introduced in grafana/cortex-jsonnet#344 Signed-off-by: Marco Pracucci <marco@pracucci.com> * Create another tier with 300K active series The other tiers have a 3x jump except when we go from 100K to 1Mil. I think we should have a 3x jump for the first tier too. Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Improve config settings based on recent learnings Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added functions to create query-frontend and querier deployments Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added function to create query-scheduler deployment Signed-off-by: Marco Pracucci <marco@pracucci.com> * chore: upgrade to latest etcd-operator Brings: grafana/jsonnet-libs#480 * Alertmanager: Allow storage configuration to support Azure The alertmanager configuration did not have support for Azure. Let's add it. * remove new line * Fix comment on medium_small_user config It says it should be 100k + 50%, but that's what extra_small_user is. Here we have 300k, which is 200k + 50%. 
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Remove wrong comment Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Add overrides to compactor Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Split limits config into a variable we can reuse Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Review feedback Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Fix missing ruler limits Damn, missed this in grafana/cortex-jsonnet#391 Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Alertmanager: Add sharding configuration. * Fix `compactor_blocks_retention_period` type in `extra_small_user` (grafana/cortex-jsonnet#395) * Fix `compactor_blocks_retention_period` type in `extra_small_user` The actual type of `compactor_blocks_retention_period` is `model.Duration`. Which comes from prometheus `common` package. The problem is that `model.Duration` have custom JSON unmarshal which treat the incoming value as string. https://github.com/prometheus/common/blob/main/model/time.go#L276 So setting it as integer, won't work when unmarshalling with JSON. NOTE: This won't be an issue for YamlUnmarshal, as it always treating it as string (even though you put it as integer) https://github.com/prometheus/common/blob/main/model/time.go#L307 * update CHANGELOG * Update rule limits to be inline with customer expectations We built the initial rules on guesswork and now we're updating them based on what the customers are asking for. Further, the ruler can be horizontally scaled and we're happy letting our users have more rules! Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com> * Remove max_samples_per_query limit. (grafana/cortex-jsonnet#397) * Remove max_samples_per_query limit. 
* Fixed CHANGELOG.md * Removed chunks storage query sharding config support Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add queryEngineConfig Signed-off-by: Marco Pracucci <marco@pracucci.com> * tsdb: Add multi concurrency and max idle connections store gateway params Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * Update cortex/tsdb.libsonnet Co-authored-by: Marco Pracucci <marco@pracucci.com> * Fix formatting Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * tsdb: Use literal numbers instead of variables Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * cortex: Make ruler object storage support generic Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * Remove ruler-storage.gcs.bucket-name for Azure Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * cortex: Define Azure ruler args Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * Parameterize Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * Further document ingester_stream_chunks_when_using_blocks parameter Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * Add options to disable anti-affinity Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upstream some config improvements Signed-off-by: Marco Pracucci <marco@pracucci.com> * Increased max connections for memcached chunks and index-queries too Signed-off-by: Marco Pracucci <marco@pracucci.com> * Ruler: Pass `-ruler-storage.s3.endpoint` to ruler when using S3. 
This argument is is required, without it, the following error appears: ``` no s3 endpoint in config file ``` * Allow to create custom store-gateway StatefulSets via newStoreGatewayStatefulSet() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix newStoreGatewayStatefulSet() to use input container Signed-off-by: Marco Pracucci <marco@pracucci.com> * Add CI check for jsonnet manifests * Remove additional git diff in check-mixin * Imported cortex-jsonnet CHANGELOG entries from 1.9.0 Signed-off-by: Marco Pracucci <marco@pracucci.com> * Improved CHANGELOG header Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Austin McKinley <54160+amckinley@users.noreply.github.com> Co-authored-by: Tom Wilkie <tomwilkie@users.noreply.github.com> Co-authored-by: Jacob Lisi <jacob.t.lisi@gmail.com> Co-authored-by: Austin McKinley <austin.mckinley@robinhood.com> Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com> Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com> Co-authored-by: Joe Elliott <number101010@gmail.com> Co-authored-by: Joe Elliott <joe.elliott@grafana.com> Co-authored-by: Duologic <jeroen@simplistic.be> Co-authored-by: Jeroen Op 't Eynde <jeroen@grafana.com> Co-authored-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com> Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com> Co-authored-by: Stan Kwong <jpdstan@gmail.com> Co-authored-by: gotjosh <josue@grafana.com> Co-authored-by: forestsword <colsen@adobe.com> Co-authored-by: Jacob Lisi <jlisi@grafana.com> Co-authored-by: Alex Martin <alex@suitupalex.com> Co-authored-by: Tom Wilkie <tom@grafana.com> Co-authored-by: Jack Baldry <jack.baldry@grafana.com> Co-authored-by: Victor Tsang Hi <victor.tsanghi@gmail.com> Co-authored-by: Victor Tsang Hi <victor.tsang.hi@sap.com> Co-authored-by: Nick Pillitteri <nick.pillitteri@grafana.com> Co-authored-by: Steve Simpson <steve.simpson@grafana.com> Co-authored-by: Hamish 
<hamish.forbes@gmail.com> Co-authored-by: Javier Palomo <javier.palomo@grafana.com> Co-authored-by: gotjosh <josue.abreu@gmail.com> Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com> Co-authored-by: Kaviraj <kavirajkanagaraj@gmail.com> Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
What this PR does: This PR adds `query-scheduler.libsonnet` for configuring query-scheduler.

Checklist
* `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`