Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Testing] Add debugging feature which leaves integration test containers running after test completes #9626

Merged

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Feb 19, 2021

Motivation

For debugging purposes, it is useful to have the ability to leave the integration test containers running after the test completes. For example, this feature was necessary in investigating the issue #9622 . It was possible to view the log files and find out the issue. If the containers are killed, this options is lost.

Modifications

Adds handling to the initialization and stopping of Pulsar containers and Pulsar cluster so that containers get configured using Testcontainers "reuse mode" which leaves containers running after the test JVM stops. Normally the Testcontainers automatic container cleanup feature stops all containers which weren't explicitly stopped during the tests.
Testcontainers reuse mode must be enabled by setting environment variable TESTCONTAINERS_REUSE_ENABLE=true (or by setting testcontainers.reuse.enable=true in ~/.testcontainers.properties).

The modifications in this PR skip stopping PulsarContainer and PulsarCluster instances if environment variable PULSAR_CONTAINERS_LEAVE_RUNNING=true .

Usage example

In unix shells, one can pass environment variables by prepending the command with the variables, for example:

PULSAR_CONTAINERS_LEAVE_RUNNING=true TESTCONTAINERS_REUSE_ENABLE=true mvn -B -f tests/pom.xml test -DintegrationTests -DredirectTestOutputToFile=false -DtestRetryCount=0 -DfailIfNoTests=false -Dtest=CLITest#testCreateSubscriptionCommand

After the test run, one can use docker ps and docker exec -it [container_name] bash to get a shell in the running container that was left behind the test run when this feature introduced by this PR is active.

After debugging, one can use this command to kill all containers that were left running:

docker kill $(docker ps -q --filter "label=pulsarcontainer=true")

(the solution in this PR labels the containers with "pulsarcontainer=true")

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will save lot of time while working on integration tests!

LGTM

Copy link
Member

@zymap zymap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It would be wonderful if we can add the description into the test README.md, that would be helpful for others who want to debug the tests.

@zymap zymap added this to the 2.8.0 milestone Feb 19, 2021
@lhotari
Copy link
Member Author

lhotari commented Feb 19, 2021

It would be wonderful if we can add the description into the test README.md, that would be helpful for others who want to debug the tests.

@zymap Makes sense. I was thinking of updating the README later. I have plans to add features for enabling attaching a debugger to a container as well as enabling the use of JConsole / Java Mission Control / Java Flight Recorder by enabling JMX for the JVMs in a configurable way. The README could be updated at that time to also cover the feature of this current PR.

@lhotari
Copy link
Member Author

lhotari commented Feb 19, 2021

/pulsarbot run-failure-checks

@bsideup
Copy link
Member

bsideup commented Feb 19, 2021

@lhotari TBH I would advice against this approach. There is no guarantee that the JVM hook will be using stop() (we thought about terminating multiple containers by the session label filter). Not to mention that Ryuk generally should not be disabled, unless your CI does not support it (the reason why the flag was added on a first place).

If you need to the container to remain running, consider using the reusable containers mode.

@lhotari
Copy link
Member Author

lhotari commented Feb 19, 2021

@bsideup Thanks for your comments. In this case, the changes in this PR aren't meant to be used in CI at all. The reason to leave the container running is to debug an issue locally. I've had the impression that it's the reason why TESTCONTAINERS_RYUK_DISABLED=true exists. At least, that's how I've been using it in the past.

Would you also recommend using the reusable container mode for also for the debugging use case that I have described? I guess that would require also enabling testcontainers.reuse.enable in ~/.testcontainers.properties?

I have one concern. The problem with reusable container mode is that it impacts the execution when a container gets reused. I'd simply like to leave the containers after the test and the test JVM completes so that I could open a shell to the container and inspect the state. Perhaps this would be a feature request for Testcontainers?
Is there a workaround with reusable containers by adding a random label to the container etc so that the container would never get reused but just left behind?

@bsideup
Copy link
Member

bsideup commented Feb 19, 2021

@lhotari

I've had the impression that it's the reason why TESTCONTAINERS_RYUK_DISABLED=true exists

We added the flag because BitBucket did not support mounting Docker socket a few years ago, but, since the builds were ephemeral, Ryuk could be omitted.

Is there a workaround with reusable containers by adding a random label to the container etc so that the container would never get reused but just left behind?

Yes. The reusable feature works by hashing the container's definition. If you make the hash unique (e.g. random label / env variable / network alias / network id) then a new container will be started each execution and old containers won't be terminated (one can see it as a disadvantage but in your case this is exactly what you need :D)

@lhotari
Copy link
Member Author

lhotari commented Feb 19, 2021

Thanks for the advice @bsideup . I'll revisit this PR so that it doesn't depend on TESTCONTAINERS_RYUK_DISABLED=true and uses the reuse containers feature as part of the solution. That's cleaner.

@lhotari lhotari marked this pull request as draft February 19, 2021 16:04
…g after test completes

- For debugging purposes, it is useful to have the ability to leave containers running.
  This mode can be activated by setting environment variables
  PULSAR_CONTAINERS_LEAVE_RUNNING=true and TESTCONTAINERS_REUSE_ENABLE=true
  - in this case, use this command afterwards to kill containers that
    were left running:
    docker kill $(docker ps -q --filter "label=pulsarcontainer=true")
@lhotari lhotari force-pushed the lh-add-testcontainers-debugging-feature branch from a0fd5d8 to f24d888 Compare February 22, 2021 09:50
@lhotari
Copy link
Member Author

lhotari commented Feb 22, 2021

@bsideup I have revisited the solution in this PR to use the Testcontainers reuse mode instead of depending on the usage of TESTCONTAINERS_RYUK_DISABLED=true. Would you mind reviewing the changes?

@lhotari
Copy link
Member Author

lhotari commented Feb 22, 2021

/pulsarbot run-failure-checks

@bsideup
Copy link
Member

bsideup commented Feb 22, 2021

@lhotari it is okay :) I had a feeling that the stop() override is unnecessary but you know your usage better 👍

@lhotari
Copy link
Member Author

lhotari commented Feb 22, 2021

/pulsarbot run-failure-checks

@lhotari lhotari marked this pull request as ready for review February 23, 2021 11:53
@sijie sijie merged commit 458bd91 into apache:master Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants