
Question about Flex Consumption Function App Service Bus Trigger #10706

Open
dsdilpreet opened this issue Dec 19, 2024 · 8 comments
Labels
area: flex-consumption, Needs: Triage (Functions)

Comments

@dsdilpreet

Is your question related to a specific version? If so, please specify:

What language does your question apply to? (e.g. C#, JavaScript, Java, All)

C#

Question

Hi there, I asked this question over here, but this might be a more appropriate place to ask; if not, my apologies. The question is about Service Bus messages staying in the queue for longer on Flex Consumption compared to Consumption. The details are:

I have multiple functions which subscribe to Service Bus topics using the ServiceBusTrigger. All function apps are running on the .NET 8 isolated model. The Service Bus namespace is on the Standard tier. Each function app is pinged from Application Insights every 10 minutes.
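For reference, here is a minimal sketch of the kind of function involved. The topic, subscription, and connection setting names are placeholders, and the shape assumes the .NET 8 isolated worker model with the Microsoft.Azure.Functions.Worker.Extensions.ServiceBus extension:

using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class OrderUpdatedFunction
{
    private readonly ILogger<OrderUpdatedFunction> _logger;

    public OrderUpdatedFunction(ILogger<OrderUpdatedFunction> logger)
    {
        _logger = logger;
    }

    // Runs when a message arrives on the (hypothetical) 'orders' topic,
    // 'order-updated' subscription. 'ServiceBusConnection' is the name of
    // the app setting that holds the Service Bus connection.
    [Function(nameof(OrderUpdatedFunction))]
    public void Run(
        [ServiceBusTrigger("orders", "order-updated", Connection = "ServiceBusConnection")]
        string messageBody)
    {
        _logger.LogInformation("Processing message: {Body}", messageBody);
    }
}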

I have recently migrated from the Windows Consumption plan to the Flex Consumption plan.

On the Consumption plan, when the app scaled down to 0, Service Bus requests would also drop, whereas on Flex Consumption they don't drop when the function apps scale down to 0. They only drop when I turn the apps off.

I understand that Service Bus functions are now scaled independently of other trigger types on the Flex Consumption plan. What I am noticing on Flex is that there is a delay before a message is even picked up from the subscription by the trigger; I have seen delays of almost 2 minutes. I never observed delays of this kind on the Consumption plan.

Image

Is this expected? Is there any configuration or setting I can change in the function app so it checks for messages more frequently? Even with an always ready instance enabled, the delay still seems to be there, although it does seem somewhat reduced; I haven't tested this extensively, though.

Appreciate any insight into how this works internally.

Thank you!

My host.json file

{
    "version": "2.0",
    "logging": {
        "applicationInsights": {
            "samplingSettings": {
                "isEnabled": true,
                "excludedTypes": "Request"
            },
            "enableLiveMetricsFilters": true
        }
    }
}

and appsettings.json

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Function.GetHealthFunction": "Error",
      "Azure.Messaging.ServiceBus": "Warning",
      "Azure.Core": "Warning"
    }
  }
}

These files were not changed when migrating to Flex Consumption.
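For reference, the Service Bus extension also exposes per-instance settings in host.json, none of which I have set. A sketch is below (values are illustrative only, assuming the 5.x Service Bus extension); as I understand it, these control how many messages each running instance processes concurrently rather than how quickly the platform scales from zero:

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "prefetchCount": 100,
            "maxConcurrentCalls": 16,
            "autoCompleteMessages": true
        }
    }
}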

For context, I have randomly picked an event from App Insights from before the changeover to Flex.

Image

@satvu added the area: flex-consumption label on Dec 20, 2024
@dsdilpreet
Author

I have created a little sample to help reproduce this issue.
https://github.com/dsdilpreet/flex-consumption-service-bus-sample

The sample contains a Bicep script to deploy the relevant infrastructure and two function apps, one running Consumption and the other running Flex Consumption. As you can see from the results below, Flex Consumption is significantly slower: it took 8 and 11 seconds for Flex, and just milliseconds for Consumption, to trigger after a message was enqueued to the same topic. I made sure that in both cases exactly one instance was running for each function app, so there was no cold start involved.

Test 1
Consumption
Image
Flex Consumption
Image

Test 2
Consumption
Image
Flex Consumption
Image

@dsdilpreet
Author

Hi @nzthiago! Is this your area of expertise? I would really appreciate your input; this is holding up our Flex Consumption deployment, unfortunately.

@nzthiago
Member

Hi @dsdilpreet - thank you for pinging me, and for sharing a repro. I also added a Linux Consumption app to the test, and for the initial message I can see the same results as you (with Linux Consumption being similar to Flex Consumption). There is likely a "scale from zero" optimization that was done to Windows Consumption that we need to bring to Flex Consumption here.

Can you share what you experience with subsequent messages? I.e., if you wait, say, 30 minutes, and send another message, and then a few more quickly, does the behavior and latency difference change for you? I believe Flex Consumption should be faster for those.

@dsdilpreet
Author

Hi @nzthiago - thanks for getting back.

I don't think it's just cold start at play here. I have run the test you suggested again on my end (still the same setup as the repo). I sent the first message (which should be a cold start, because I hadn't sent anything to the topic for days) and then sent 3 more messages within seconds of the first one.

Message          Windows Consumption    Flex Consumption   Notes
1 (cold start)   ~12s                   ~12s               roughly the same
2 (warm)         pretty much instant    ~9s
3 (warm)         pretty much instant    ~10s
4 (warm)         pretty much instant    ~10s

It seems like Flex Consumption doesn't poll Service Bus as frequently as Windows Consumption does. Do you have any insights on this?

Thanks for your help so far.

@nzthiago
Member

nzthiago commented Jan 14, 2025

@dsdilpreet thank you for the extra tests, appreciate it. We now understand why you are seeing these results. It is related both to how quickly Flex Consumption scales in and to how quickly it checks for new messages in the queue. Anything beyond 30 seconds between tests could have the Flex Consumption app scaled back to zero, which was the case in your tests. Once the queue or topic gets busy, Flex Consumption will scale out and perform faster than Consumption.

We will discuss internally how to improve this, either with faster checks for changes, or by taking longer to scale in, or both. In the meantime, if you need a very fast response to that very first message, it can be mitigated by enabling one Always Ready instance for that function.

Image
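For reference, the same Always Ready setting can also be applied outside the portal. A minimal sketch with the Azure CLI is below; the resource group, app name, and function name are placeholders, and the command assumes the current Flex Consumption scale-config CLI surface:

az functionapp scale config always-ready set \
    --resource-group <resource-group> \
    --name <function-app-name> \
    --settings function:<ServiceBusFunctionName>=1

Since Service Bus triggered functions scale per function on Flex Consumption, the setting is keyed as function:<FunctionName> rather than as an app-wide instance count.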

You would be able to see if the instance gets reused or if it's a new instance by looking at the cloud_RoleInstance field in an App Insights query against the traces table. Here's a sample query:

traces
| parse-where message with "Trigger Details: MessageId: " MessageId ", SequenceNumber: " SequenceNumber ", DeliveryCount: " DeliveryCount ", EnqueuedTimeUtc: " EnqueuedTimeUtc ", " *
| extend LatencyToTriggerMs = datetime_diff("millisecond", timestamp, todatetime(EnqueuedTimeUtc)) 
| project timestamp, EnqueuedTimeUtc, LatencyToTriggerMs, cloud_RoleName, cloud_RoleInstance, MessageId, SequenceNumber, DeliveryCount
| order by EnqueuedTimeUtc asc

With that one Always Ready instance, the Flex Consumption app triggers in milliseconds for the first and subsequent messages:

Image

@dsdilpreet
Author

@nzthiago, you are right. When I send a message very quickly after an instance has started, the Flex Consumption plan does process it pretty much instantly. But the app seems to scale in after about 30 seconds irrespective of traffic, i.e. even if I keep sending messages it will still scale in, and the odd message will start a new instance.

We also tried an always ready instance, and it does seem to remedy the problem, but we have a lot of subscriptions, so always ready instances would have a significant cost implication for our solution.

It would be great if you could make the polling/scaling configurable as you mentioned in your previous comment. Is there any way we can track the progress of this? We would like to use Flex Consumption going forward once this is fixed.

Thank you for replicating this on your end and all your help so far!

@nzthiago
Member

@dsdilpreet - we now have an item in our backlog to introduce a "last instance per function group / individual function remains for 10 minutes" feature, to mitigate the behavior you identified of the app scaling in too quickly. This will likely take a few months to implement and roll out, so the workaround shared above is recommended for now, even though it might not be the best fit for your implementation. I will update our documentation once it does roll out. Thank you for highlighting this! @pragnagopa @alrod FYI.

@dsdilpreet
Author

Thank you, @nzthiago! The rough timeline helps as well.

Would this also address the Service Bus polling frequency issue, i.e. how a message can sometimes sit on the bus for a while before an instance even begins to initialize?
