Import cortex-jsonnet into mimir repo (#506)
* Added mega_user class

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fine-tune blocks storage config

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Disable tests by default to fix README instructions

Ref grafana/cortex-jsonnet#95

* Run store-gateway without CPU limits

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Use v1 API for Deployment and StatefulSet resources

* Version bump to v1.1.0

* Actually include the ruler

* Update config option name

* Added ruler_enabled and alertmanager_enabled flags. (grafana/cortex-jsonnet#116)

* Added publish not ready addresses

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Removed -experimental.tsdb.store-gateway-enabled flag

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added a discovery svc and pointed the querier service at itself

Signed-off-by: Joe Elliott <number101010@gmail.com>

* lint

Signed-off-by: Joe Elliott <number101010@gmail.com>

* Added PodDisruptionBudget for store-gateway

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Allow to configure the blocks replication factor

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Switch store-gateway StatefulSets to Parallel Pod Management

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Ruler should use metadata cache as well, if configured. (grafana/cortex-jsonnet#128)

Ruler instantiates querier internally, so it can use metadata cache.

* Allow to customize ingester disk size and class

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Version bump to 1.2.0

* refactor: use jaeger-agent-mixin

lib got moved: grafana/jsonnet-libs https://github.com/grafana/cortex-jsonnet/pull/291

used jb-0.4.0 which updates the jsonnetfile.json format

* Switch blocks storage ingesters to Parallel pod management policy and 4d retention

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed comment

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Chunks blocks migration (grafana/cortex-jsonnet#148)

* Allow configuring querier with second store engine.

* Introduced newIngesterStatefulSet and newIngesterPdb functions.

* Rename parameters to be more clear.

* refactor(cortex): use first class citizens

for:
* requiredDuringSchedulingIgnoredDuringExecutionType
* portsType

These are available from: https://github.com/jsonnet-libs/k8s-alpha

* Update blocks storage CLI flags

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Do not apply blocks storage config to query-frontend, table-manager and purger

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Cleaned up blocks storage config

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Apply chunks-store config if primary or secondary store use chunks. (grafana/cortex-jsonnet#160)

* Enable table manager when using chunks storage as secondary storage engine for querier. (grafana/cortex-jsonnet#161)

* fix(ksonnet): backwards compatibility with ksonnet

* add overrides config to tsdb store-gateway

* Add jsonnet for ingester StatefulSet with WAL (grafana/cortex-jsonnet#72)

* Add jsonnet for ingester StatefulSet with WAL

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add CHANGELOG entry

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix lint

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Change max query length to 32 days

To allow for comparison over months of 31d

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Fix ruler S3 config option (grafana/cortex-jsonnet#174)

* Removed -experimental.tsdb.store-gateway-enabled flag

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Use correct config variable for s3 ruler config

* restore dropped line

Co-authored-by: Marco Pracucci <marco@pracucci.com>

* Add support for local ruler_client_type (grafana/cortex-jsonnet#175)

* Support Alertmanager HA

With this, we can now increase the number of replicas for a
Cortex AM, thus enabling HA.

Please note that alerts themselves are not gossiped between
Alertmanagers. Each Ruler needs to send the alert to every available
Alertmanager, which is why a headless service gets created when the
number of replicas is more than 1.

* Setup the gossip port

* s/isGossiping/isHa

* Bump to 3 replicas by default

* Bump the cortex image, the latest stable is 1.3

* Fix typo in Alertmanager configuration

* Alertmanager configuration tweaks

- Introduces the `fallback_config` option to allow an Alertmanager to
  have a fallback config.
- Gives the headless service a different name to allow seamless
  switching between 1 or multiple replicas. The cluster field in the
  service metadata is immutable, which made it impossible to create the
  new service unless you delete the previous one.

* Remove different name for a headless service

Sadly, we can't have a different name for the headless service as the
statefulset is configured to match its name.

* Fix ruler s3 storage configuration

* Block storage support for s3

* Added Azure support to blocks storage

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed linter

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Removed the experimental prefix from blocks storage CLI flags

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Lower default ingestion limits and create a new overrides user

* Address review feedback

* Bump default series limit by 50%

* Add flusher job for blocks.

* Fixed Azure account name/key config

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Rename changed flags for 1.4 release.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Make sure only a single ruler rolls out at a time

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Cut 1.4.0

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add overrides exporter

The overrides exporter is part of grafana/cortex-tools and exposes runtime
overrides and related presets of Cortex as metrics.

Signed-off-by: Christian Simon <simon@swine.de>

* Refactor limits and overrides

Ensure we expose 'extra_small_user' and reference it when setting the
"default" values.

This will raise the limits of the 'small_user' preset to the defaults
for `ingester.max-samples-per-query` and
`ingester.max-series-per-query`.

Signed-off-by: Christian Simon <simon@swine.de>

* Removed support for ingester.statefulset_replicas

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Switch compactor statefulset to Parallel pod management policy

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Cut 1.5.0 release

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add ruler limits

Sets default presets for all the 'users' when it comes to ruler
limits.

* Add for the last user

* Enabled compactor sharding

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Rollback PR 213

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Re-introduce ruler limits

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* [fixup] ruler limits config key name

Ruler limits have a prefix of `ruler_` on the config key name. This
makes the key match and then uses them as the value for the flags.

* Removed postings-compression-enabled

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fine-tuned gRPC keepalive pings settings

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed gRPC settings

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Release 1.6.0

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add option to configure unregister ingesters on shutdown

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed config

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Improved comment

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Updated doc

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Removed ifs

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Updated comment

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed syntax error

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Remove misleading comment (grafana/cortex-jsonnet#243)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add option to customise the configmap name

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Fix for real

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added bucket index flag, and enable bucket index by default. (grafana/cortex-jsonnet#254)

* Cleanup blocks storage config

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* feat: allow for Alertmanager to configure multiple storage backends

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* Update cortex/config.libsonnet

Co-authored-by: gotjosh <josue@grafana.com>

* Update cortex/alertmanager.libsonnet

Co-authored-by: gotjosh <josue@grafana.com>

* Release 1.7.0. (grafana/cortex-jsonnet#260)

* Release 1.7.0.

* cortex: config: Fix error message for alertmanager_client_type.

* cortex: alertmanager: Remove space in dot notation.

* Up metadata connection limits

* Add flag to enable streaming of chunks. (grafana/cortex-jsonnet#276)

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Add recording rules to calculate Cortex scaling

- Update dashboard so it only shows under provisioned services and why
- Add sizing rules based on limits.
- Add some docs to the dashboard.

Signed-off-by: Tom Wilkie <tom@grafana.com>

* chore: update lib to use new API paths

Signed-off-by: Jacob Lisi <jacob.t.lisi@gmail.com>

* Create 1.8.0 release. (grafana/cortex-jsonnet#282)

* Create 1.8.0 release.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Update image tags.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Do not use deprecated Alertmanager cluster flags

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* fix: Update ksonnet-util vendor lock

The previous version `c19a92e586a6752f11745b47f309b13f02ef7147` is
incompatible with the library in its current form. For example in
`tsdb.libsonnet` L81, we use `pvc.new('ingester-pvc')` but at the
locked version, in `ksonnet-util/kausal.libsonnet` the `pvc.new`
function takes no arguments.

Signed-off-by: Jack Baldry <jack.baldry@grafana.com>

* Add function to customize compactor statefulset

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add querier_service_ignored_labels (grafana/cortex-jsonnet#291)

Co-authored-by: Victor Tsang Hi <victor.tsang.hi@sap.com>

* Introduce ingester instance limits to configuration, and add alerts. (grafana/cortex-jsonnet#296)

* Introduce ingester instance limits to configuration, and add alerts.

* CHANGELOG.md

* Address (internal) review feedback.

* Add `query-scheduler.libsonnet` (grafana/cortex-jsonnet#295)

* Add query-scheduler.libsonnet.

* CHANGELOG.md

* Use flag to enable query-scheduler.

* Fix image.

* Replace use of querier.compress-http-responses removed in Cortex 1.9

Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>

* Enable index-header lazy loading in store-gateway

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Do not use deprecated/removed flag -limits.per-user-override-config

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Use new ruler storage config and enable API compression

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Changed alertmanager config to use the new storage config

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Cut release 1.9.0

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Mount overrides configmap to alertmanager too

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Upgrade memcached

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Increase default store-gateway memory request and limit

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Set -server.grpc-max-*-msg-size-bytes for ruler and ingester. (grafana/cortex-jsonnet#326)

* Fixed --alertmanager.cluster.peers

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Set empty alertmanager listen address with 1 replica

Alertmanager tries to start clustering unless the flag is explicitly set as an empty string
https://github.com/prometheus/alertmanager#turn-off-high-availability
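
The single-replica behaviour described above can be sketched as follows. This is a minimal illustration in Python rather than the Jsonnet used by the library. The helper name `alertmanager_cluster_args`, the flag name `alertmanager.cluster.listen-address`, and the default address are assumptions made for the example, not taken from this commit.

```python
def alertmanager_cluster_args(replicas, listen_address="0.0.0.0:9094"):
    """Return the clustering-related arguments for a given replica count.

    Hypothetical helper: the flag name and the default address are
    assumptions made for illustration only.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    is_ha = replicas > 1
    return {
        # An explicit empty string turns clustering off. Leaving the flag
        # unset would let Alertmanager try to start clustering anyway.
        "alertmanager.cluster.listen-address": listen_address if is_ha else "",
    }
```

With one replica the helper returns an empty listen address. With more than one it returns the configured address, so clustering stays enabled.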

* Add option to disable anti-affinity in newIngesterStatefulSet()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix alertmanager config change introduced in grafana/cortex-jsonnet#344

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Create another tier with 300K active series

The other tiers have a 3x jump except when we go from 100K to 1Mil. I
think we should have a 3x jump for the first tier too.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Improve config settings based on recent learnings

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added functions to create query-frontend and querier deployments

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added function to create query-scheduler deployment

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* chore: upgrade to latest etcd-operator

Brings: grafana/jsonnet-libs#480

* Alertmanager: Allow storage configuration to support Azure

The alertmanager configuration did not have support for Azure. Let's add it.

* remove new line

* Fix comment on medium_small_user config

It says it should be 100k + 50%, but that's what extra_small_user is.
Here we have 300k, which is 200k + 50%.
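
The arithmetic behind this comment fix can be checked with a short sketch. It is written in Python for illustration only. The function name `limit_with_headroom` is hypothetical, and the 50% headroom is the figure quoted in the message above rather than a value read from the Jsonnet sources.

```python
def limit_with_headroom(target_series, headroom_percent=50):
    """Return a tier limit: the target series count plus a percentage of headroom."""
    return target_series + target_series * headroom_percent // 100


# 100k + 50% is the figure the message attributes to extra_small_user.
print(limit_with_headroom(100_000))  # 150000
# 200k + 50% gives the 300k used for medium_small_user.
print(limit_with_headroom(200_000))  # 300000
```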

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Remove wrong comment

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Add overrides to compactor

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Split limits config into a variable we can reuse

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Review feedback

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Fix missing ruler limits

Damn, missed this in grafana/cortex-jsonnet#391

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Alertmanager: Add sharding configuration.

* Fix `compactor_blocks_retention_period` type in `extra_small_user` (grafana/cortex-jsonnet#395)

* Fix `compactor_blocks_retention_period` type in `extra_small_user`

The actual type of `compactor_blocks_retention_period` is `model.Duration`, which comes
from the Prometheus `common` package.

The problem is that `model.Duration` has a custom JSON unmarshaller which treats the incoming
value as a string.
https://github.com/prometheus/common/blob/main/model/time.go#L276

So setting it as an integer won't work when unmarshalling from JSON.

NOTE: This won't be an issue for YamlUnmarshal, as it always treats the value as a string (even
if you set it as an integer)
https://github.com/prometheus/common/blob/main/model/time.go#L307
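
The failure mode described above can be reproduced with a small analogue. This is a Python sketch under stated assumptions, not the Go code from `prometheus/common`: `duration_from_json` and `parse_duration` are hypothetical names, and the unit table is a simplified stand-in for the real duration grammar.

```python
import json
import re

# Simplified unit table (seconds per unit), an assumption for this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_DURATION_RE = re.compile(r"^(\d+)([smhdwy])$")


def parse_duration(text):
    """Parse a string such as "1d" into a number of seconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid duration string: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def duration_from_json(raw):
    """Decode a JSON value as a duration, accepting only JSON strings."""
    decoded = json.loads(raw)
    if not isinstance(decoded, str):
        # Mirrors the reported behaviour: a bare integer is rejected.
        raise TypeError(f"expected a JSON string, got {type(decoded).__name__}")
    return parse_duration(decoded)
```

Calling `duration_from_json('"1d"')` succeeds, while `duration_from_json('86400')` raises, which is why the preset value has to be written as a string.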

* update CHANGELOG

* Update rule limits to be in line with customer expectations

We built the initial rules on guesswork and now we're updating them
based on what the customers are asking for.

Further, the ruler can be horizontally scaled and we're happy letting
our users have more rules!

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>

* Remove max_samples_per_query limit. (grafana/cortex-jsonnet#397)

* Remove max_samples_per_query limit.

* Fixed CHANGELOG.md

* Removed chunks storage query sharding config support

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add queryEngineConfig

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* tsdb: Add multi concurrency and max idle connections store gateway params

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* Update cortex/tsdb.libsonnet

Co-authored-by: Marco Pracucci <marco@pracucci.com>

* Fix formatting

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* tsdb: Use literal numbers instead of variables

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* cortex: Make ruler object storage support generic

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* Remove ruler-storage.gcs.bucket-name for Azure

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* cortex: Define Azure ruler args

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* Parameterize

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* Further document ingester_stream_chunks_when_using_blocks parameter

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>

* Add options to disable anti-affinity

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Upstream some config improvements

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Increased max connections for memcached chunks and index-queries too

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Ruler: Pass `-ruler-storage.s3.endpoint` to ruler when using S3.

This argument is required; without it, the following error appears:

```
no s3 endpoint in config file
```

* Allow to create custom store-gateway StatefulSets via newStoreGatewayStatefulSet()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix newStoreGatewayStatefulSet() to use input container

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add CI check for jsonnet manifests

* Remove additional git diff in check-mixin

* Imported cortex-jsonnet CHANGELOG entries from 1.9.0

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Improved CHANGELOG header

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Austin McKinley <54160+amckinley@users.noreply.github.com>
Co-authored-by: Tom Wilkie <tomwilkie@users.noreply.github.com>
Co-authored-by: Jacob Lisi <jacob.t.lisi@gmail.com>
Co-authored-by: Austin McKinley <austin.mckinley@robinhood.com>
Co-authored-by: Goutham Veeramachaneni <gouthamve@gmail.com>
Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com>
Co-authored-by: Joe Elliott <number101010@gmail.com>
Co-authored-by: Joe Elliott <joe.elliott@grafana.com>
Co-authored-by: Duologic <jeroen@simplistic.be>
Co-authored-by: Jeroen Op 't Eynde <jeroen@grafana.com>
Co-authored-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com>
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Stan Kwong <jpdstan@gmail.com>
Co-authored-by: gotjosh <josue@grafana.com>
Co-authored-by: forestsword <colsen@adobe.com>
Co-authored-by: Jacob Lisi <jlisi@grafana.com>
Co-authored-by: Alex Martin <alex@suitupalex.com>
Co-authored-by: Tom Wilkie <tom@grafana.com>
Co-authored-by: Jack Baldry <jack.baldry@grafana.com>
Co-authored-by: Victor Tsang Hi <victor.tsanghi@gmail.com>
Co-authored-by: Victor Tsang Hi <victor.tsang.hi@sap.com>
Co-authored-by: Nick Pillitteri <nick.pillitteri@grafana.com>
Co-authored-by: Steve Simpson <steve.simpson@grafana.com>
Co-authored-by: Hamish <hamish.forbes@gmail.com>
Co-authored-by: Javier Palomo <javier.palomo@grafana.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Kaviraj <kavirajkanagaraj@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
1 parent e1431c5 commit 529646f
Showing 28 changed files with 2,291 additions and 5 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/test-build-deploy.yml
@@ -45,6 +45,8 @@ jobs:
run: make BUILD_IN_CONTAINER=false check-doc
- name: Check Mixin
run: make BUILD_IN_CONTAINER=false check-mixin
- name: Check Jsonnet Manifests
run: make BUILD_IN_CONTAINER=false check-jsonnet-manifests
- name: Check White Noise.
run: make BUILD_IN_CONTAINER=false check-white-noise
- name: Check License Header
105 changes: 103 additions & 2 deletions CHANGELOG.md
@@ -180,19 +180,120 @@
* [BUGFIX] Distributor: fix bug in query-exemplar where some results would get dropped. #583
* [BUGFIX] Azure storage: only create HTTP client once, to reduce memory utilization. #605

Mixin:

### Mixin (changes since `grafana/cortex-jsonnet` `1.9.0`)

* [CHANGE] Update grafana-builder dependency: use $__rate_interval in qpsPanel and latencyPanel. [#372](https://github.com/grafana/cortex-jsonnet/pull/372)
* [CHANGE] `namespace` template variable in dashboards now only selects namespaces for selected clusters. [#311](https://github.com/grafana/cortex-jsonnet/pull/311)
* [CHANGE] `CortexIngesterRestarts` alert severity changed from `critical` to `warning`. [#321](https://github.com/grafana/cortex-jsonnet/pull/321)
* [CHANGE] Dashboards: added overridable `job_labels` and `cluster_labels` to the configuration object as label lists to uniquely identify jobs and clusters in the metric names and group-by lists in dashboards. [#319](https://github.com/grafana/cortex-jsonnet/pull/319)
* [CHANGE] Dashboards: `alert_aggregation_labels` has been removed from the configuration and overriding this value has been deprecated. Instead the labels are now defined by the `cluster_labels` list, and should be overridden accordingly through that list. [#319](https://github.com/grafana/cortex-jsonnet/pull/319)
* [CHANGE] Renamed `CortexCompactorHasNotUploadedBlocksSinceStart` to `CortexCompactorHasNotUploadedBlocks`. [#334](https://github.com/grafana/cortex-jsonnet/pull/334)
* [CHANGE] Renamed `CortexCompactorRunFailed` to `CortexCompactorHasNotSuccessfullyRunCompaction`. [#334](https://github.com/grafana/cortex-jsonnet/pull/334)
* [CHANGE] Renamed `CortexInconsistentConfig` alert to `CortexInconsistentRuntimeConfig` and increased severity to `critical`. [#335](https://github.com/grafana/cortex-jsonnet/pull/335)
* [CHANGE] Increased `CortexBadRuntimeConfig` alert severity to `critical` and removed support for `cortex_overrides_last_reload_successful` metric (was removed in Cortex 1.3.0). [#335](https://github.com/grafana/cortex-jsonnet/pull/335)
* [CHANGE] Grafana 'min step' changed to 15s so dashboards show more detail. [#340](https://github.com/grafana/cortex-jsonnet/pull/340)
* [CHANGE] Replace `CortexRulerFailedEvaluations` with two new alerts: `CortexRulerTooManyFailedPushes` and `CortexRulerTooManyFailedQueries`. [#347](https://github.com/grafana/cortex-jsonnet/pull/347)
* [CHANGE] Removed `CortexCacheRequestErrors` alert. This alert was not working because the legacy Cortex cache client instrumentation doesn't track errors. [#346](https://github.com/grafana/cortex-jsonnet/pull/346)
* [CHANGE] Removed `CortexQuerierCapacityFull` alert. [#342](https://github.com/grafana/cortex-jsonnet/pull/342)
* [CHANGE] Changes blocks storage alerts to group metrics by the configured `cluster_labels` (supporting the deprecated `alert_aggregation_labels`). [#351](https://github.com/grafana/cortex-jsonnet/pull/351)
* [CHANGE] Increased `CortexIngesterReachingSeriesLimit` critical alert threshold from 80% to 85%. [#363](https://github.com/grafana/cortex-jsonnet/pull/363)
* [CHANGE] Changed default `job_names` for query-frontend, query-scheduler and querier to match custom deployments too. [#376](https://github.com/grafana/cortex-jsonnet/pull/376)
* [CHANGE] Split `cortex_api` recording rule group into three groups. This is a workaround for large clusters where this group can become slow to evaluate. [#401](https://github.com/grafana/cortex-jsonnet/pull/401)
* [CHANGE] Increased `CortexIngesterReachingSeriesLimit` warning threshold from 70% to 80% and critical threshold from 85% to 90%. [#404](https://github.com/grafana/cortex-jsonnet/pull/404)
* [CHANGE] Raised `CortexKVStoreFailure` alert severity from warning to critical. #493
* [CHANGE] Increase `CortexRolloutStuck` alert "for" duration from 15m to 30m. #493 #573
* [ENHANCEMENT] cortex-mixin: Make `cluster_namespace_deployment:kube_pod_container_resource_requests_{cpu_cores,memory_bytes}:sum` backwards compatible with `kube-state-metrics` v2.0.0. [#317](https://github.com/grafana/cortex-jsonnet/pull/317)
* [ENHANCEMENT] Cortex-mixin: Include `cortex-gw-internal` naming variation in default `gateway` job names. [#328](https://github.com/grafana/cortex-jsonnet/pull/328)
* [ENHANCEMENT] Ruler dashboard: added object storage metrics. [#354](https://github.com/grafana/cortex-jsonnet/pull/354)
* [ENHANCEMENT] Alertmanager dashboard: added object storage metrics. [#354](https://github.com/grafana/cortex-jsonnet/pull/354)
* [ENHANCEMENT] Added documentation text panels and descriptions to reads and writes dashboards. [#324](https://github.com/grafana/cortex-jsonnet/pull/324)
* [ENHANCEMENT] Dashboards: defined container functions for common resources panels: containerDiskWritesPanel, containerDiskReadsPanel, containerDiskSpaceUtilization. [#331](https://github.com/grafana/cortex-jsonnet/pull/331)
* [ENHANCEMENT] cortex-mixin: Added `alert_excluded_routes` config to exclude specific routes from alerts. [#338](https://github.com/grafana/cortex-jsonnet/pull/338)
* [ENHANCEMENT] Added `CortexMemcachedRequestErrors` alert. [#346](https://github.com/grafana/cortex-jsonnet/pull/346)
* [ENHANCEMENT] Ruler dashboard: added "Per route p99 latency" panel in the "Configuration API" row. [#353](https://github.com/grafana/cortex-jsonnet/pull/353)
* [ENHANCEMENT] Increased the `for` duration of the `CortexIngesterReachingSeriesLimit` warning alert to 3h. [#362](https://github.com/grafana/cortex-jsonnet/pull/362)
* [ENHANCEMENT] Added a new tier (`medium_small_user`) so we have another tier between 100K and 1Mil active series. [#364](https://github.com/grafana/cortex-jsonnet/pull/364)
* [ENHANCEMENT] Extend Alertmanager dashboard: [#313](https://github.com/grafana/cortex-jsonnet/pull/313)
* "Tenants" stat panel - shows number of discovered tenant configurations.
* "Replication" row - information about the replication of tenants/alerts/silences over instances.
* "Tenant Configuration Sync" row - information about the configuration sync procedure.
* "Sharding Initial State Sync" row - information about the initial state sync procedure when sharding is enabled.
* "Sharding Runtime State Sync" row - information about various state operations which occur when sharding is enabled (replication, fetch, merge, persist).
* [ENHANCEMENT] Update gsutil command for `not healthy index found` playbook [#370](https://github.com/grafana/cortex-jsonnet/pull/370)
* [ENHANCEMENT] Added Alertmanager alerts and playbooks covering configuration syncs and sharding operation: #377 [#378](https://github.com/grafana/cortex-jsonnet/pull/378)
* `CortexAlertmanagerSyncConfigsFailing`
* `CortexAlertmanagerRingCheckFailing`
* `CortexAlertmanagerPartialStateMergeFailing`
* `CortexAlertmanagerReplicationFailing`
* `CortexAlertmanagerPersistStateFailing`
* `CortexAlertmanagerInitialSyncFailed`
* [ENHANCEMENT] Add recording rules to improve responsiveness of Alertmanager dashboard. [#387](https://github.com/grafana/cortex-jsonnet/pull/387)
* [ENHANCEMENT] Add `CortexRolloutStuck` alert. [#405](https://github.com/grafana/cortex-jsonnet/pull/405)
* [ENHANCEMENT] Added `CortexKVStoreFailure` alert. [#406](https://github.com/grafana/cortex-jsonnet/pull/406)
* [ENHANCEMENT] Use configured `ruler` jobname for ruler dashboard panels. [#409](https://github.com/grafana/cortex-jsonnet/pull/409)
* [ENHANCEMENT] Add ability to override `datasource` for generated dashboards. [#407](https://github.com/grafana/cortex-jsonnet/pull/407)
* [ENHANCEMENT] Use alertmanager jobname for alertmanager dashboard panels [#411](https://github.com/grafana/cortex-jsonnet/pull/411)
* [ENHANCEMENT] Added `CortexDistributorReachingInflightPushRequestLimit` alert. [#408](https://github.com/grafana/cortex-jsonnet/pull/408)
* [ENHANCEMENT] Added `CortexReachingTCPConnectionsLimit` alert. #403
* [ENHANCEMENT] Added "Cortex / Writes Networking" and "Cortex / Reads Networking" dashboards. #405
* [ENHANCEMENT] Improved "Queue length" panel in "Cortex / Queries" dashboard. #408
* [ENHANCEMENT] Add `CortexDistributorReachingInflightPushRequestLimit` alert and playbook. #401
* [ENHANCEMENT] Added "Recover accidentally deleted blocks (Google Cloud specific)" playbook. #475
* [ENHANCEMENT] Added support to multi-zone store-gateway deployments. #608 #615
* [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. [#308](https://github.com/grafana/cortex-jsonnet/pull/308)
* [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. [#335](https://github.com/grafana/cortex-jsonnet/pull/335)
* [BUGFIX] Fixed scaling dashboard to correctly work when a Cortex service deployment spans across multiple zones (a zone is expected to have the `zone-[a-z]` suffix). [#365](https://github.com/grafana/cortex-jsonnet/pull/365)
* [BUGFIX] Fixed rollout progress dashboard to correctly work when a Cortex service deployment spans across multiple zones (a zone is expected to have the `zone-[a-z]` suffix). [#366](https://github.com/grafana/cortex-jsonnet/pull/366)
* [BUGFIX] Fixed rollout progress dashboard to include query-scheduler too. [#376](https://github.com/grafana/cortex-jsonnet/pull/376)
* [BUGFIX] Upstream recording rule `node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate` renamed. [#379](https://github.com/grafana/cortex-jsonnet/pull/379)
* [BUGFIX] Fixed writes/reads/alertmanager resources dashboards to use `$._config.job_names.gateway`. [#403](https://github.com/grafana/cortex-jsonnet/pull/403)
* [BUGFIX] Span the annotation.message in alerts as YAML multiline strings. [#412](https://github.com/grafana/cortex-jsonnet/pull/412)
* [BUGFIX] Fixed "Instant queries / sec" in "Cortex / Reads" dashboard. #445
* [BUGFIX] Fixed and added missing KV store panels in Writes, Reads, Ruler and Compactor dashboards. #448

### Jsonnet (changes since `grafana/cortex-jsonnet` `1.9.0`)

* [CHANGE] Store gateway: set `-blocks-storage.bucket-store.index-cache.memcached.max-get-multi-concurrency`,
`-blocks-storage.bucket-store.chunks-cache.memcached.max-get-multi-concurrency`,
`-blocks-storage.bucket-store.metadata-cache.memcached.max-get-multi-concurrency`,
`-blocks-storage.bucket-store.index-cache.memcached.max-idle-connections`,
`-blocks-storage.bucket-store.chunks-cache.memcached.max-idle-connections`,
`-blocks-storage.bucket-store.metadata-cache.memcached.max-idle-connections` to 100 [#414](https://github.com/grafana/cortex-jsonnet/pull/414)
* [CHANGE] Alertmanager: mounted overrides configmap to alertmanager too. [#315](https://github.com/grafana/cortex-jsonnet/pull/315)
* [CHANGE] Memcached: upgraded memcached from `1.5.17` to `1.6.9`. [#316](https://github.com/grafana/cortex-jsonnet/pull/316)
* [CHANGE] Store-gateway: increased memory request and limit respectively from 6GB / 6GB to 12GB / 18GB. [#322](https://github.com/grafana/cortex-jsonnet/pull/322)
* [CHANGE] Store-gateway: increased `-blocks-storage.bucket-store.max-chunk-pool-bytes` from 2GB (default) to 12GB. [#322](https://github.com/grafana/cortex-jsonnet/pull/322)
* [CHANGE] Ingester/Ruler: set `-server.grpc-max-recv-msg-size-bytes` and `-server.grpc-max-send-msg-size-bytes` to sensible default values (10MB). [#326](https://github.com/grafana/cortex-jsonnet/pull/326)
* [CHANGE] Decreased `-server.grpc-max-concurrent-streams` from 100k to 10k. [#369](https://github.com/grafana/cortex-jsonnet/pull/369)
* [CHANGE] Decreased blocks storage ingesters graceful termination period from 80m to 20m. [#369](https://github.com/grafana/cortex-jsonnet/pull/369)
* [CHANGE] Increase the rules per group and rule groups limits on different tiers. [#396](https://github.com/grafana/cortex-jsonnet/pull/396)
* [CHANGE] Removed `max_samples_per_query` limit, since it only works with chunks and only when using `-distributor.shard-by-all-labels=false`. [#397](https://github.com/grafana/cortex-jsonnet/pull/397)
* [CHANGE] Removed chunks storage query sharding config support. The following config options have been removed: [#398](https://github.com/grafana/cortex-jsonnet/pull/398)
  * `_config` > `queryFrontend` > `shard_factor`
  * `_config` > `queryFrontend` > `sharded_queries_enabled`
  * `_config` > `queryFrontend` > `query_split_factor`
* [CHANGE] Renamed `ruler_s3_bucket_name` and `ruler_gcs_bucket_name` to `ruler_storage_bucket_name`. [#415](https://github.com/grafana/cortex-jsonnet/pull/415)
* [CHANGE] Fine-tuned rolling update policy for distributor, querier, query-frontend, query-scheduler. [#420](https://github.com/grafana/cortex-jsonnet/pull/420)
* [CHANGE] Increased memcached metadata/chunks/index-queries max connections from 4k to 16k. [#420](https://github.com/grafana/cortex-jsonnet/pull/420)
* [CHANGE] Disabled step alignment in query-frontend to be compliant with PromQL. [#420](https://github.com/grafana/cortex-jsonnet/pull/420)
* [CHANGE] Do not limit compactor CPU and request a number of cores equal to the configured concurrency. [#420](https://github.com/grafana/cortex-jsonnet/pull/420)
* [ENHANCEMENT] Add overrides config to compactor. This allows setting retention configs per user. [#386](https://github.com/grafana/cortex-jsonnet/pull/386)
* [ENHANCEMENT] Added 256MB memory ballast to querier. [#369](https://github.com/grafana/cortex-jsonnet/pull/369)
* [ENHANCEMENT] Update `etcd-operator` to latest version (see https://github.com/grafana/jsonnet-libs/pull/480). [#263](https://github.com/grafana/cortex-jsonnet/pull/263)
* [ENHANCEMENT] Add support for Azure storage in Alertmanager configuration. [#381](https://github.com/grafana/cortex-jsonnet/pull/381)
* [ENHANCEMENT] Add support for running Alertmanager in sharding mode. [#394](https://github.com/grafana/cortex-jsonnet/pull/394)
* [ENHANCEMENT] Allow to customize PromQL engine settings via `queryEngineConfig`. [#399](https://github.com/grafana/cortex-jsonnet/pull/399)
* [ENHANCEMENT] Define Azure object storage ruler args. [#416](https://github.com/grafana/cortex-jsonnet/pull/416)
* [ENHANCEMENT] Added the following config options to allow scheduling multiple replicas of the same service on the same node: [#418](https://github.com/grafana/cortex-jsonnet/pull/418)
  * `cortex_distributor_allow_multiple_replicas_on_same_node`
  * `cortex_ruler_allow_multiple_replicas_on_same_node`
  * `cortex_querier_allow_multiple_replicas_on_same_node`
  * `cortex_query_frontend_allow_multiple_replicas_on_same_node`
* [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. [#329](https://github.com/grafana/cortex-jsonnet/pull/329)
* [BUGFIX] Fixed `-distributor.extend-writes` setting on ruler when `unregister_ingesters_on_shutdown` is disabled. [#369](https://github.com/grafana/cortex-jsonnet/pull/369)
* [BUGFIX] Treat `compactor_blocks_retention_period` type as string rather than int. [#395](https://github.com/grafana/cortex-jsonnet/pull/395)
* [BUGFIX] Pass `-ruler-storage.s3.endpoint` to ruler when using S3. [#421](https://github.com/grafana/cortex-jsonnet/pull/421)
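
As an illustration of the `_config` options named in the entries above, the following is a minimal sketch of how an environment might override them. The `lib` object is a hypothetical stand-in for the imported library, and its default values and the bucket name are assumptions made for this example, not the library's actual defaults.

```jsonnet
// Hypothetical stand-in for the library's root object; in a real deployment
// this would be the imported jsonnet library rather than a local object.
local lib = {
  _config:: {
    // Placeholder defaults, assumed for illustration only.
    ruler_storage_bucket_name: null,
    cortex_distributor_allow_multiple_replicas_on_same_node: false,
    cortex_querier_allow_multiple_replicas_on_same_node: false,
  },
};

local env = lib {
  _config+:: {
    // Single option replacing ruler_s3_bucket_name / ruler_gcs_bucket_name.
    ruler_storage_bucket_name: 'example-ruler-bucket',  // hypothetical name
    // Opt in to scheduling several replicas of a service on one node.
    cortex_distributor_allow_multiple_replicas_on_same_node: true,
    cortex_querier_allow_multiple_replicas_on_same_node: true,
  },
};

// Expose the merged (hidden) config so the result can be inspected.
{ config: env._config }
```

The `_config+::` form merges the overrides into the existing hidden field instead of replacing it, so any option not listed keeps the value it had in the base object.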

### Query-tee

* [ENHANCEMENT] Added `/api/v1/query_exemplars` API endpoint support (no results comparison). #168
15 changes: 12 additions & 3 deletions Makefile
@@ -2,7 +2,7 @@
# WARNING: do not commit to a repository!
-include Makefile.local

-.PHONY: all test test-with-race integration-tests cover clean images protos exes dist doc clean-doc check-doc push-multiarch-build-image license check-license format check-mixin check-mixin-jb check-mixin-mixtool checkin-mixin-playbook build-mixin format-mixin push-multiarch-mimir list-image-targets
+.PHONY: all test test-with-race integration-tests cover clean images protos exes dist doc clean-doc check-doc push-multiarch-build-image license check-license format check-mixin check-mixin-jb check-mixin-mixtool checkin-mixin-playbook build-mixin format-mixin check-jsonnet-manifests format-jsonnet-manifests push-multiarch-mimir list-image-targets
.DEFAULT_GOAL := all

# Version number
@@ -39,9 +39,12 @@ UPTODATE := .uptodate
# path to jsonnetfmt
JSONNET_FMT := jsonnetfmt

-# path to the mimir/mixin
+# path to the mimir-mixin
MIXIN_PATH := operations/mimir-mixin

+# path to the mimir jsonnet manifests
+JSONNET_MANIFESTS_PATH := operations/mimir
+
.PHONY: image-tag
image-tag:
@echo $(IMAGE_TAG)
@@ -369,7 +372,6 @@ check-white-noise: clean-white-noise

check-mixin: format-mixin check-mixin-jb check-mixin-mixtool check-mixin-playbook
@echo "Checking diff:"
-git diff
@git diff --exit-code -- $(MIXIN_PATH) || (echo "Please format mixin by running 'make format-mixin'" && false)

@cd $(MIXIN_PATH) && \
@@ -396,6 +398,13 @@ build-mixin: check-mixin-jb
format-mixin:
@find $(MIXIN_PATH) -type f -name '*.libsonnet' -print -o -name '*.jsonnet' -print | xargs jsonnetfmt -i

+check-jsonnet-manifests: format-jsonnet-manifests
+@echo "Checking diff:"
+@git diff --exit-code -- $(JSONNET_MANIFESTS_PATH) || (echo "Please format jsonnet manifests by running 'make format-jsonnet-manifests'" && false)
+
+format-jsonnet-manifests:
+@find $(JSONNET_MANIFESTS_PATH) -type f -name '*.libsonnet' -print -o -name '*.jsonnet' -print | xargs jsonnetfmt -i
+
check-tsdb-blocks-storage-s3-docker-compose-yaml:
cd development/tsdb-blocks-storage-s3 && make check


0 comments on commit 529646f
