
KEDA is not scaling beyond one replica using azure storage queue #6183

Closed
led94 opened this issue Sep 24, 2024 Discussed in #6167 · 9 comments
Comments

@led94

led94 commented Sep 24, 2024

Discussed in #6167

Hi there!

I am having an issue when trying to scale based on an Azure storage queue. It seems KEDA is not properly counting the messages on the queue, and it never goes beyond 1 replica.

Here is my ScaledObject:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-so
  namespace: digitalafe-24246-dev
spec:
  scaleTargetRef:
    name: pet-predictor
  pollingInterval: 30  # Optional. Default: 30 seconds
  cooldownPeriod:  30 # Optional. Default: 300 seconds
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: azure-queue
    metadata:
      queueName: catfish22
      # Does not seem to work: target value for queue length passed to the scaler,
      # so below, if there are 9 messages, 3 pods should be created
      queueLength: '3'
      activationQueueLength: '0'
      connectionFromEnv: CONNECTION_STRING
      accountName: gjmfunky2
      cloud: AzurePublicCloud

The deployment is a custom image which takes a message from the queue, deletes it and then has a cooldown before processing the next one.
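(For context, each replica processes messages one at a time, roughly like the sketch below; this is only a hypothetical illustration assuming the azure-storage-queue Python SDK, not the actual image.)

import os
import time

from azure.storage.queue import QueueClient

def process(body):
    # placeholder for the real work done by the image
    print(body)

# connection string and queue name taken from the ScaledObject above
queue = QueueClient.from_connection_string(
    conn_str=os.environ["CONNECTION_STRING"],
    queue_name="catfish22",
)

while True:
    for message in queue.receive_messages(messages_per_page=1):
        queue.delete_message(message)  # delete the message first, as described above
        process(message.content)
        time.sleep(30)                 # cooldown before the next message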

If there are no messages on the queue it properly scales to 0, but once I add messages it goes to 1 and never scales beyond that. I have tested with 20 messages, but it just stays at 1.

The KEDA version I have is 2.13.0, I am working on AKS, and my Kubernetes version is 1.30.3. In the logs I do not see any errors, just this when I started the test:

2024-09-24T15:11:40Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject minReplicaCount	{"scaledobject.Name": "queue-so", "scaledObject.Namespace": "digitalafe-24246-dev", "scaleTarget.Name": "pet-predictor", "Original Replicas Count": 1, "New Replicas Count": 0}
2024-09-24T15:10:10Z	INFO	scaleexecutor	Successfully updated ScaleTarget	{"scaledobject.Name": "queue-so", "scaledObject.Namespace": "digitalafe-24246-dev", "scaleTarget.Name": "pet-predictor", "Original Replicas Count": 0, "New Replicas Count": 1}
2024-09-24T15:09:40Z	INFO	Initializing Scaling logic according to ScaledObject Specification	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"queue-so","namespace":"digitalafe-24246-dev"}, "namespace": "digitalafe-24246-dev", "name": "queue-so", "reconcileID": "d68b90a6-f7c8-4361-b98f-47af1182fda6"}

There are no log entries on the metrics API server, at least not for the metric from the ScaledObject.

The HPA always shows 0/3 (avg) as the target:

$ kubectl get hpa -n digitalafe-24246-dev 
NAME                REFERENCE                  TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-queue-so   Deployment/pet-predictor   0/3 (avg)   1         10        1          24h

Let me know if I should share anything else.

@led94 led94 closed this as completed Sep 24, 2024
@led94 led94 reopened this Sep 24, 2024
@JorTurFer
Member

Hello,
Do you see any error in the HPA? If you are collecting KEDA metrics (Prometheus or OpenTelemetry), there is a metric that you can use to check the raw value returned by the scaler.

@led94
Author

led94 commented Sep 24, 2024

Hello, do you see any error in the HPA? If you are collecting KEDA metrics (Prometheus or OpenTelemetry), there is a metric that you can use to check the raw value returned by the scaler.

Describing the HPA with no messages on the queue, I get the following:

Name:                                                 keda-hpa-queue-so
Namespace:                                            digitalafe-24246-dev
Labels:                                               app.kubernetes.io/managed-by=Helm
                                                      app.kubernetes.io/name=keda-hpa-queue-so
                                                      app.kubernetes.io/part-of=queue-so
                                                      app.kubernetes.io/version=2.13.0
                                                      scaledobject.keda.sh/name=queue-so
Annotations:                                          meta.helm.sh/release-name: pet-predictor
                                                      meta.helm.sh/release-namespace: digitalafe-24246-dev
CreationTimestamp:                                    Mon, 23 Sep 2024 09:16:00 -0500
Reference:                                            Deployment/pet-predictor
Metrics:                                              ( current / target )
  "s0-azure-queue-catfish22" (target average value):  <unknown> / 3
Min replicas:                                         1
Max replicas:                                         10
Deployment pods:                                      0 current / 0 desired
Conditions:
  Type            Status  Reason             Message
  ----            ------  ------             -------
  AbleToScale     True    SucceededGetScale  the HPA controller was able to get the target's current scale
  ScalingActive   False   ScalingDisabled    scaling is disabled since the replica count of the target is zero
  ScalingLimited  True    TooFewReplicas     the desired replica count is less than the minimum replica count

With 15 messages I get

Name:                                                 keda-hpa-queue-so
Namespace:                                            digitalafe-24246-dev
Labels:                                               app.kubernetes.io/managed-by=Helm
                                                      app.kubernetes.io/name=keda-hpa-queue-so
                                                      app.kubernetes.io/part-of=queue-so
                                                      app.kubernetes.io/version=2.13.0
                                                      scaledobject.keda.sh/name=queue-so
Annotations:                                          meta.helm.sh/release-name: pet-predictor
                                                      meta.helm.sh/release-namespace: digitalafe-24246-dev
CreationTimestamp:                                    Mon, 23 Sep 2024 09:16:00 -0500
Reference:                                            Deployment/pet-predictor
Metrics:                                              ( current / target )
  "s0-azure-queue-catfish22" (target average value):  0 / 3
Min replicas:                                         1
Max replicas:                                         10
Deployment pods:                                      1 current / 1 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from external metric s0-azure-queue-catfish22(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: queue-so,},MatchExpressions:[]LabelSelectorRequirement{},})
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:           <none>

Those messages are not really telling me much if I am honest with you.

I am not using Prometheus or OpenTelemetry; I will take a look.

So far, what's interesting is that the metric goes from unknown to 0.

@JorTurFer
Member

So far, what's interesting is that the metric goes from unknown to 0.

It's because when the replica count is 0, the HPA controller can't compute the average (X/0 is undefined); that's why you see unknown when there are 0 replicas.
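(For reference, a worked example using the standard Kubernetes HPA formula for an AverageValue external metric:

desiredReplicas = ceil(currentReplicas * (metricTotal / currentReplicas) / targetAverageValue)
                = ceil(metricTotal / targetAverageValue)

so with 15 queued messages and queueLength: '3' you would expect ceil(15 / 3) = 5 replicas; a reported metric value of 0, as in the HPA output above, keeps the deployment at 1.)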

@led94
Author

led94 commented Sep 24, 2024

So far, what's interesting is that the metric goes from unknown to 0.

It's because when the replica count is 0, the HPA controller can't compute the average (X/0 is undefined); that's why you see unknown when there are 0 replicas.

OK, understood. But regarding the main issue: the metric never goes over 0, and the deployment is not scaling beyond 1 pod. I do not see any errors in the logs, and access is working, since KEDA at least detects whenever there are messages, but the scaling is not working.

Other scalers are working, so I am not sure whether I am doing something wrong here...

@JorTurFer
Member

That's why I asked about KEDA metrics: you can check the Prometheus metric keda_scaler_metrics_value or the OpenTelemetry metric keda.scaler.metrics.value.

Then you can see the raw metric reported by the scaler to verify what KEDA is seeing during upstream calls.
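(A minimal query, assuming Prometheus scraping of the KEDA operator is enabled and the metric keeps its default labels such as namespace and scaledObject:)

keda_scaler_metrics_value{namespace="digitalafe-24246-dev", scaledObject="queue-so"}

If that series shows the real queue depth while the HPA still reports 0, the problem sits between the metrics API server and the HPA rather than in the scaler itself.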

@led94
Author

led94 commented Sep 26, 2024

Sorry about the delay. I was able to solve the issue by upgrading KEDA, though not exactly by having it on the latest version. I tried the upgrade after someone else tried it on a new cluster and it worked; I used Helm with:

$ helm upgrade keda kedacore/keda --version 2.15.1

But once it finished, I checked the version on each deployment and noticed that the image for the KEDA metrics API server was still ghcr.io/kedacore/keda-metrics-apiserver:2.10.0. It might have been like that for some time, as we were using 2.13 for the rest. I manually edited the deployment and it started working.
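(As an illustration of the check and the manual fix; the deployment/container name keda-operator-metrics-apiserver and the keda namespace are assumptions based on the default Helm chart, adjust to your install:)

# list the images actually deployed
kubectl get deployments -n keda -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.template.spec.containers[*].image}{"\n"}{end}'

# pin the metrics server image to the expected tag
kubectl set image deployment/keda-operator-metrics-apiserver \
  keda-operator-metrics-apiserver=ghcr.io/kedacore/keda-metrics-apiserver:2.15.1 -n keda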

@JorTurFer, do you have any idea how that could have happened? Thanks for your time, btw.

@JorTurFer
Member

JorTurFer commented Sep 27, 2024

Do you mean that just using

helm upgrade keda kedacore/keda --version 2.15.1

the installation is using 2.10.0? I think that's only possible if you provide a pinned image through the values file, but your command isn't passing any values file :/
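(For reference, pinning the metrics server image in the chart values would look roughly like this; the key names are assumed from the kedacore/keda chart:)

image:
  metricsApiServer:
    repository: ghcr.io/kedacore/keda-metrics-apiserver
    tag: "2.10.0"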

@led94
Author

led94 commented Sep 27, 2024

Not that; rather, it does not update the version that is deployed for the metrics API server. That is the initial version I had on my cluster.

@JorTurFer
Member

So, can I close the issue?

@led94 led94 closed this as completed Sep 27, 2024