Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding RemainingActivityListener for diagnosis #643

Merged
merged 2 commits into from
Aug 24, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 84 additions & 0 deletions src/main/java/org/jvnet/hudson/test/RemainingActivityListener.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
/*
* The MIT License
*
* Copyright 2023 CloudBees, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/

package org.jvnet.hudson.test;

import hudson.Extension;
import hudson.model.Computer;
import hudson.model.Executor;
import hudson.model.Queue;
import java.util.logging.Logger;
import jenkins.model.Jenkins;

/**
* Checks for any ongoing activity (queue, builds) that might interfere with {@link TemporaryDirectoryAllocator#dispose}.
*/
@Extension
public final class RemainingActivityListener implements EndOfTestListener {

/**
* Set to true if you wish for warnings to be treated as fatal test errors.
*/
private static final boolean fatal = Boolean.getBoolean(RemainingActivityListener.class.getName() + ".fatal");

/**
* Set to true if you wish for test shutdown to just wait for activity to cease (not always appropriate).
*/
private static final boolean wait = Boolean.getBoolean(RemainingActivityListener.class.getName() + ".wait");

@Override
public void onTearDown() throws Exception {
if (wait) {
String problem;
while ((problem = problem()) != null) {
Logger.getLogger(RemainingActivityListener.class.getName()).warning(problem);
Thread.sleep(5_000);
}
} else {
String problem = problem();
if (problem != null) {
if (fatal) {
throw new AssertionError(problem);
} else {
Logger.getLogger(RemainingActivityListener.class.getName()).warning(problem);
}
}
}
}

private static String problem() {
for (Computer c : Jenkins.get().getComputers()) {
for (Executor x : c.getAllExecutors()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is incorrect. QueueTaskFuture#get (which is what JenkinsRule uses to determine build completion) waits until AsyncFutureImpl#set is called to complete the future before notifying consumers. That call is done by WorkUnitContext#synchronizeEnd via Executor#finish1. At that point, Executor#isIdle is still false and the executor is still in Computer#getAllExecutors (which is what the code in this PR is checking). Executor#finish2 is called after Executor#finish1 and calls Computer#removeExecutor which (possibly asynchronously!) calls executors.remove(e) to remove the executor, at which point Computer#getAllExecutors will finally stop returning the non-idle executor. If Executor#finish1 completes before the executor is removed by Executor#finish2 (possible in certain scheduling scenarios) then the code in this PR will result in a false positive. Similarly, if Executor#finish2 decides to run the removal asynchronously (possible in certain scheduling scenarios) then the code in this PR will also result in a false positive.

There are really two things that need to be solved to make this code correct:

  • The code in this PR needs to wait until both Executor#finish1 and Executor#finish2 have been called before calling Computer#getAllExecutors. Currently this code runs after QueueTaskFuture#get which means that Executor#finish1 has been called but Executor#finish2 has not necessarily been called.
  • If Executor#finish2 has been called but has placed its task onto Computer#threadPoolForRemoting (which happens only sometimes) then the code in this PR needs to wait until that task has completed, for example by having Computer#removeExecutor keep track of how many pending removals are in progress.

if (!x.isIdle()) {
return x.getCurrentExecutable() + " still seems to be running, which could break deletion of log files or metadata";
}
}
}
for (Queue.Item q : Queue.getInstance().getItems()) {
return q + " is still scheduled, which if it ever runs, could break deletion of log files or metadata";
}
return null;
}

}