Introduce testing for identifying regressions which can introduce ttrpc deadlocks #184

Open
rawahars opened this issue Jan 18, 2025 · 2 comments

Comments

@rawahars

As detailed in #72, ttrpc has in the past deadlocked on the server side or the client side when mismatched versions were used.

| Version Range     | Description                       | Comments                            |
|-------------------|-----------------------------------|-------------------------------------|
| v1.0.2 and before | Original deadlock bug             | #94 for fixing deadlock in v1.1.0   |
| v1.1.0 - v1.2.0   | No known deadlock bugs            |                                     |
| v1.2.0 - v1.2.4   | Streaming with a new deadlock bug | #107 introduced deadlock in v1.2.0  |
| After v1.2.4      | No known deadlock bugs            | #168 for fixing deadlock in v1.2.4  |

While the current version of ttrpc has no known deadlocks, we want to introduce a CI regression test that runs the current code against the following older versions, in both server and client roles:

  • v1.0.2
  • v1.1.0
  • v1.2.0
  • v1.2.4
  • latest

This issue is filed to discuss how such matrix testing can be introduced for ttrpc.

@rawahars
Author

The objective is to test the latest code against older versions (v1.0.2, v1.1.0, v1.2.0, v1.2.4, current code) by running a matrix of tests to identify potential deadlocks caused by version mismatches. The tests will involve running the latest code as the server with older versions as clients and vice versa.
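The matrix described above could be enumerated roughly as follows. This is only a sketch: the `pair` type and `buildMatrix` function are hypothetical names, and including the latest-against-latest pair as a baseline is an assumption rather than something decided in this issue.

```go
package main

import "fmt"

// pair is one server/client combination to exercise (hypothetical type).
type pair struct {
	Server string
	Client string
}

// buildMatrix returns every combination in which at least one side runs the
// latest code: latest as server against each older client, and each older
// server against the latest client. The latest/latest pair is added as a
// baseline (an assumption, not a requirement stated in the issue).
func buildMatrix(older []string, latest string) []pair {
	pairs := []pair{{Server: latest, Client: latest}}
	for _, v := range older {
		pairs = append(pairs, pair{Server: latest, Client: v})
		pairs = append(pairs, pair{Server: v, Client: latest})
	}
	return pairs
}

func main() {
	older := []string{"v1.0.2", "v1.1.0", "v1.2.0", "v1.2.4"}
	for _, p := range buildMatrix(older, "latest") {
		fmt.Printf("server=%s client=%s\n", p.Server, p.Client)
	}
}
```

With four older versions this yields nine pairs, so the total CI time is bounded by nine times the per-pair timeout.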

Testing different versions of ttrpc against the latest code runs into circular dependencies, which Go does not support. Nested Go modules could work around this limitation, but with a significant drawback: we could only stress test changes once a new tag is created in the package, so changes could not be tested in each PR. If a potential deadlock were then identified in a release, we would need to scan through all the changes since the previous tag to locate the regression, which makes debugging slower and less efficient.

Proposed Approach

The proposed approach involves creating a stress tool based on the latest code, backporting it to older releases, and automating tests via a script that builds and orchestrates tests across versions. This script, integrated into GitHub Actions via a make target, ensures early detection of compatibility issues during CI testing.

Steps Involved in the Approach:

Development of the Stress Tool:

  • A dedicated stress tool will be created, designed to depend on the latest code changes.
  • This tool will be integrated into the mainline codebase to ensure it remains updated with ongoing development.
  • The stress tool can be run as either a server or a client. It models a simple client-server interaction in which the client sends a continuous, high volume of requests and the server responds with the same data, which allows testing of concurrent request handling and verification of responses.

Backporting the Stress Tool:

Once the stress tool is created, it will be backported to the older releases against which we need to test the latest code.
For now these are v1.0.2, v1.1.0, v1.2.0, and v1.2.4. To do this, we would create a branch from each version tag, check in the stress tool, and then cut a new tag.

Script for Automation:

A script will be developed in the mainline codebase to automate the testing process. This script will handle the following tasks:

  • Build the stress tool using the latest code from the branch the script is executing from.
  • Pull and build the corresponding stress tool versions from the identified older releases.
  • Orchestrate the testing by executing a matrix of tests where the latest tool interacts with older versions and vice versa.
  • For each pair of server and client versions, the script will run the stress tool for a given number of iterations with a specified number of workers (say 100000 and 100 respectively). If the test does not complete within a specified time period (say 5 minutes), the script will terminate the test and exit with an appropriate error code to signal the failure.

Integration with CI/CD Pipeline:

  • To ensure continuous validation, the stress tool will be invoked through a make target.
  • This make target will be incorporated into the GitHub Actions workflow as part of the Continuous Integration (CI) process.
    During CI testing, the script will execute the stress tests automatically, providing immediate feedback on the compatibility and robustness of the latest changes against older releases.

@rawahars
Author

@dmcgowan Please do take a look and provide feedback when you can!
