Introduce testing for identifying regressions which can introduce ttrpc deadlocks #184

rawahars · 2025-01-18T13:10:21Z

As detailed in #72, ttrpc has in the past hit deadlocks on the server side or the client side when mismatched client and server versions were used.

| Version Range | Description | Comments |
| --- | --- | --- |
| v1.0.2 and before | Original deadlock bug | #94 for fixing deadlock in `v1.1.0` |
| v1.1.0 - v1.2.0 | No known deadlock bugs | |
| v1.2.0 - v1.2.4 | Streaming with a new deadlock bug | #107 introduced deadlock in `v1.2.0` |
| After v1.2.4 | No known deadlock bugs | #168 for fixing deadlock in `v1.2.4` |

While the current version of ttrpc has no known deadlocks, we want to introduce a CI regression test that runs the current code against the following versions, in both the server and the client role:

v1.0.2
v1.1.0
v1.2.0
v1.2.4
latest

This issue is filed to discuss how such matrix testing can be introduced for ttrpc.


rawahars · 2025-01-20T09:51:13Z

The objective is to test the latest code against older versions (v1.0.2, v1.1.0, v1.2.0, v1.2.4, current code) by running a matrix of tests to identify potential deadlocks caused by version mismatches. The tests will involve running the latest code as the server with older versions as clients and vice versa.
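To make the matrix concrete, here is a minimal sketch of how the server/client pairs could be enumerated. The `Pair` type and `MixedPairs` function are hypothetical names used only for illustration, and including a latest-versus-latest baseline run is an assumption on my part rather than something already decided.

```go
package main

import "fmt"

// Pair names the ttrpc version used on each side of one stress run.
// Hypothetical type, for illustration only.
type Pair struct {
	Server string
	Client string
}

// MixedPairs returns a baseline pair (latest on both sides, an assumption)
// followed by every combination where one side runs the latest code and the
// other side runs an older release.
func MixedPairs(latest string, older []string) []Pair {
	pairs := make([]Pair, 0, 2*len(older)+1)
	pairs = append(pairs, Pair{Server: latest, Client: latest})
	for _, v := range older {
		pairs = append(pairs, Pair{Server: latest, Client: v}) // new server, old client
		pairs = append(pairs, Pair{Server: v, Client: latest}) // old server, new client
	}
	return pairs
}

func main() {
	older := []string{"v1.0.2", "v1.1.0", "v1.2.0", "v1.2.4"}
	for _, p := range MixedPairs("latest", older) {
		fmt.Printf("server=%s client=%s\n", p.Server, p.Client)
	}
}
```

With the four older versions listed above this yields nine runs: one baseline plus two per older release.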

To test different versions of ttrpc against the latest code, we encounter a challenge with circular dependencies, which are not supported by Go. Using nested Go modules could bypass this limitation, but it introduces a significant drawback: we would only be able to stress test changes when a new tag is created in the package. Consequently, this approach prevents testing changes in each PR. If a potential deadlock is identified in a release, we would need to scan through all the changes to locate the regression, making debugging more time-consuming and less efficient.

Proposed Approach

The proposed approach involves creating a stress tool based on the latest code, backporting it to older releases, and automating tests via a script that builds and orchestrates tests across versions. This script, integrated into GitHub Actions via a make target, ensures early detection of compatibility issues during CI testing.

Steps Involved in the Approach:

Development of the Stress Tool:

A dedicated stress tool will be created, designed to depend on the latest code changes.
This tool will be integrated into the mainline codebase to ensure it remains updated with ongoing development.
The stress tool can be run as either a server or a client. It models a simple client-server interaction: the client sends a continuous, high-volume stream of requests, and the server responds with the same data, which exercises concurrent request handling and lets the client verify each response.
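As a rough illustration of the client side, the sketch below spreads a fixed number of echo requests across a pool of workers and checks that each response matches what was sent. It is not the tool from the linked PR and does not use the ttrpc API: `EchoFunc`, `RunStress`, `memoryEcho`, and `runInMemory` are hypothetical names, and the in-memory echo stands in for a real call to the server.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"sync"
)

// EchoFunc stands in for a client call to the server's echo method.
// Hypothetical signature; a real tool would wrap a ttrpc client here.
type EchoFunc func(ctx context.Context, payload []byte) ([]byte, error)

// RunStress sends `iterations` requests using `workers` goroutines and
// returns the first error seen (call failure or mismatched response).
func RunStress(ctx context.Context, echo EchoFunc, iterations, workers int) error {
	jobs := make(chan int)
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	record := func(err error) { once.Do(func() { firstErr = err }) }

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Workers keep draining jobs even after an error so the
			// feeder below can never block on a pool that has exited.
			for i := range jobs {
				want := []byte(fmt.Sprintf("request-%d", i))
				got, err := echo(ctx, want)
				if err != nil {
					record(fmt.Errorf("request %d: %w", i, err))
					continue
				}
				if !bytes.Equal(got, want) {
					record(fmt.Errorf("request %d: response does not match payload", i))
				}
			}
		}()
	}

feed:
	for i := 0; i < iterations; i++ {
		select {
		case jobs <- i:
		case <-ctx.Done():
			record(ctx.Err())
			break feed
		}
	}
	close(jobs)
	wg.Wait()
	return firstErr
}

// memoryEcho returns a copy of the payload; it only makes the sketch runnable.
func memoryEcho(_ context.Context, p []byte) ([]byte, error) {
	return append([]byte(nil), p...), nil
}

// runInMemory runs the stress loop against the in-memory echo.
func runInMemory(iterations, workers int) error {
	return RunStress(context.Background(), memoryEcho, iterations, workers)
}

func main() {
	if err := runInMemory(1000, 10); err != nil {
		fmt.Println("stress failed:", err)
		return
	}
	fmt.Println("stress completed")
}
```

Note that a call which hangs and ignores its context would leave `RunStress` waiting forever, which is exactly the symptom an outer time limit in the test script is meant to catch.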

Backporting the Stress Tool:

Once the stress tool is created, it will be backported into the older releases against which we need to test the latest code.
For now, these are v1.0.2, v1.1.0, v1.2.0, and v1.2.4. To do this, we would create a branch from each version's tagged code, check in the stress tool, and then cut a new tag.

Script for Automation:

A script will be developed in the mainline codebase to automate the testing process. This script will handle the following tasks:

Build the stress tool using the latest code from the branch the script is executing from.
Pull and build the corresponding stress tool versions from the identified older releases.
Orchestrate the testing by executing a matrix of tests where the latest tool interacts with older versions and vice versa.
For each pair of server and client versions, the script will run the stress tool for a given number of iterations with a specified number of workers (say 100000 and 100 respectively). If the test does not complete within a specified time period (say 5 minutes), the script will terminate the test and exit with an appropriate error code to signal the failure.
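The time limit is what turns a silent deadlock into a visible CI failure. Below is a minimal sketch of that logic, assuming the orchestration is written in Go rather than shell. `RunWithDeadline`, `exitCode`, `quick`, and `hang` are hypothetical helpers, and the exit code 124 for a timeout is an assumption borrowed from the GNU `timeout` convention, not something specified here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrTimedOut reports that a stress run did not finish in time,
// which is how a deadlock would show up in practice.
var ErrTimedOut = errors.New("stress run exceeded its time limit")

// RunWithDeadline runs one server/client pair and gives up after `timeout`.
func RunWithDeadline(timeout time.Duration, run func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	done := make(chan error, 1) // buffered so a late finisher does not block
	go func() { done <- run(ctx) }()

	select {
	case err := <-done:
		if errors.Is(err, context.DeadlineExceeded) {
			return ErrTimedOut
		}
		return err
	case <-ctx.Done():
		return ErrTimedOut
	}
}

// exitCode maps a run result to a process exit code.
// 124 for timeouts is an assumed convention, not part of the proposal.
func exitCode(err error) int {
	switch {
	case err == nil:
		return 0
	case errors.Is(err, ErrTimedOut):
		return 124
	default:
		return 1
	}
}

// quick simulates a pair that completes immediately.
func quick(_ context.Context) error { return nil }

// hang simulates a deadlocked pair that never returns and ignores its context.
func hang(_ context.Context) error { select {} }

func main() {
	fmt.Println("quick run exit code:", exitCode(RunWithDeadline(time.Second, quick)))
	fmt.Println("hung run exit code:", exitCode(RunWithDeadline(50*time.Millisecond, hang)))
}
```

When the server and client are separate processes, as they would be here, `exec.CommandContext` from the standard library can take the same context so that the child processes are killed once the deadline passes.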

Integration with CI/CD Pipeline:

To ensure continuous validation, the stress tool will be invoked through a make target.
This make target will be incorporated into the GitHub Actions workflow as part of the Continuous Integration (CI) process.
During CI testing, the script will execute the stress tests automatically, providing immediate feedback on the compatibility and robustness of the latest changes against older releases.

rawahars · 2025-01-20T09:55:32Z

@dmcgowan Please do take a look and provide feedback when you can!

rawahars mentioned this issue Jan 18, 2025

added the ttrpc stress utility #179 (Open)
