[Azure websubhub] Analyze performance test results for content-publishing #4508

Closed
ayeshLK opened this issue May 31, 2023 · 3 comments
Labels
Area/AzureWebsubHub Tasks related to Azure websubhub deployment Team/PCM Protocol connector packages related issues Type/Task

ayeshLK commented May 31, 2023

Description:

$subject

Asgardeo perf test results can be found here [1].

[1] - https://docs.google.com/spreadsheets/d/1DV2-bXqFff-UPR76sS3pLpEeohRVtAdPlmpwpEJ1EbA/edit?usp=sharing

@ayeshLK ayeshLK added Type/Task Team/PCM Protocol connector packages related issues Area/AzureWebsubHub Tasks related to Azure websubhub deployment labels May 31, 2023
@ayeshLK ayeshLK self-assigned this May 31, 2023
@ayeshLK ayeshLK moved this from On Hold to In Progress in Ballerina Team Main Board Jun 6, 2023
ayeshLK commented Jun 7, 2023

Analyzed the performance test results for the parameters given in the "Round 2 - After correlation log fix" sheet. The settings used were as follows:

  • Number of tenants: 100
  • Number of SPs per tenant: 10
  • Number of users per tenant: 10

For the concurrency configurations of 1, 20, and 50, there were no HTTP 502 or HTTP 503 responses. However, a few (fewer than 10) request timeouts were observed. These timeouts can likely be attributed to the low timeout value (300 ms) set in the Asgardeo publisher client. To resolve them, the timeout configuration in the Asgardeo publisher client should be updated to the previously agreed value of 5 seconds.

On the other hand, with the concurrency configurations of 100 and 200, a significant number of HTTP 502 and HTTP 503 responses were observed, along with quite a few request timeouts. These HTTP 502 and HTTP 503 responses can be attributed to the resource limits on the hub pods. To improve hub performance and support higher concurrency settings, the resource limits currently applied to the hub pods need to be updated.

@ayeshLK ayeshLK closed this as completed Jun 7, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Ballerina Team Main Board Jun 7, 2023
@ayeshLK ayeshLK reopened this Jun 20, 2023
@ayeshLK ayeshLK moved this from Done to In Progress in Ballerina Team Main Board Jun 20, 2023
ayeshLK commented Jun 20, 2023

The Asgardeo team has conducted round-3 performance tests [1] with the suggested changes. In the meantime, the relevant resource limits were also updated in the azure-websubhub deployment. They made the following observations:

  • No HTTP 502 or HTTP 503 responses from the hub.
  • With the increased concurrency setting and 100 topics, no timeout responses were observed.
  • With the increased concurrency setting and only 1 topic, several timeout responses were observed.

The azure-websubhub development team will complete the following action items to provide a proper analysis:

  • Analyze the network connection stats between the AKS cluster (in which the hub is deployed) and the Azure Service Bus namespace to identify any network delays.
  • Run a load test on the Ballerina Azure Service Bus connector to check for any performance issues at the connector level.
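
One possible starting point for the first action item is a simple TCP connect probe run from a pod inside the AKS cluster. The sketch below is an assumption, not the team's actual tooling: the function names are hypothetical, the target host is passed on the command line, and port 5671 (AMQP over TLS) is assumed as the default. It only measures DNS resolution plus TCP connection setup, not TLS negotiation or AMQP round trips.

```python
import socket
import statistics
import sys
import time


def connect_latencies_ms(host, port, attempts=10, timeout_s=5.0):
    """Open and close `attempts` TCP connections; return each connect time in ms."""
    latencies = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return min, median and max of the measured connect times."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


# Only runs when a host is supplied, e.g. `python probe.py <host> [port]`.
if __name__ == "__main__" and len(sys.argv) > 1:
    target_host = sys.argv[1]
    # Assumption: 5671 (AMQP over TLS) unless another port is given.
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 5671
    stats = summarize(connect_latencies_ms(target_host, target_port))
    print({name: round(value, 2) for name, value in stats.items()})
```

If the connect times seen from inside the cluster turned out to be a large share of the per-request latency, that would point toward the network path rather than the hub itself.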

[1] - https://docs.google.com/spreadsheets/d/1DV2-bXqFff-UPR76sS3pLpEeohRVtAdPlmpwpEJ1EbA/edit?usp=sharing

ayeshLK commented Jun 20, 2023

Conducted a load test on the Ballerina Azure Service Bus connector using the following sample code:

import ballerinax/asb;
import ballerina/os;
import ballerina/http;

final string CONNECTION_STRING = os:getEnv("CONNECTION_STRING");
configurable string TOPIC_NAME = ?;

service / on new http:Listener(9090) {
    private final asb:MessageSender producer;

    function init() returns error? {
        self.producer = check new ({
            entityType: asb:TOPIC,
            topicOrQueueName: TOPIC_NAME,
            connectionString: CONNECTION_STRING
        });
    }

    resource function post test(@http:Payload json payload) returns http:Ok|error? {
        check self.producer->send({
            body: payload.toJsonString().toBytes(),
            contentType: asb:JSON
        });
        return {};
    }
}

Resource constraints:

  • CPU: 2
  • Memory: 1GB
  • Ballerina Max Pool Size: 100

For the load test, we configured a request timeout of 5 seconds (the current configuration for the Asgardeo publisher).

The results are as follows:

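| Concurrency | # Samples | Average (ms) | Median (ms) | 90% Line (ms) | 95% Line (ms) | 99% Line (ms) | Min (ms) | Max (ms) | Error % | Throughput | Received KB/sec | Sent KB/sec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1714 | 349 | 312 | 416 | 486 | 653 | 277 | 5006 | 0.058% | 2.85650 | 0.27 | 0.61 |
| 5 | 6080 | 572 | 611 | 757 | 862 | 1088 | 277 | 5006 | 0.016% | 4.44536 | 0.41 | 0.95 |
| 10 | 75844 | 648 | 602 | 870 | 1112 | 2032 | 277 | 5021 | 0.171% | 22.29318 | 2.14 | 4.78 |
| 20 | 31984 | 645 | 615 | 804 | 921 | 1407 | 277 | 5010 | 0.191% | 11.60340 | 1.12 | 2.49 |
| 50 | 43860 | 650 | 586 | 967 | 1253 | 2346 | 284 | 5021 | 0.157% | 73.07077 | 6.99 | 15.67 |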
As per the results, the error rate is below 1% for all concurrency settings, and all failed requests are due to timeouts.
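
To put the error percentages in absolute terms, the sketch below multiplies each row's sample count by its error rate. The values are copied from the results above; the helper name is hypothetical, and because the percentages are rounded in the report, the counts are estimates rather than exact figures.

```python
# Rows copied from the load-test results: (concurrency, samples, error %).
RESULTS = [
    (1, 1714, 0.058),
    (5, 6080, 0.016),
    (10, 75844, 0.171),
    (20, 31984, 0.191),
    (50, 43860, 0.157),
]


def estimated_failures(samples, error_pct):
    """Approximate number of failed requests implied by a rounded error percentage."""
    return round(samples * error_pct / 100)


if __name__ == "__main__":
    for concurrency, samples, error_pct in RESULTS:
        failed = estimated_failures(samples, error_pct)
        print(f"concurrency={concurrency:>2} samples={samples:>5} estimated_failed={failed}")
```

This works out to roughly 1, 1, 130, 61, and 69 timed-out requests for concurrency 1, 5, 10, 20, and 50 respectively.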
