CSMDS-315: Extend report.sh with Kafka custom resource dump (strimzi#22)
CSMDS-321: Dump all Kafka resources in report.sh (strimzi#24)

CSMDS-329: Add all topic describe to report.sh (strimzi#37)

CSMDS-420: Fix report.sh to not fail when Kafka resource is being deleted during script run (strimzi#39)

CSMDS-317: Add java_thread_dump.sh to dump Java threads of all containers o… (strimzi#23)

CSMDS-445: Make cluster arg optional in report.sh (strimzi#47)

This will allow using report.sh on a namespace which only contains a cluster operator.

CSMDS-433: Fix getting a ready kafka broker pod with kubectl when describing topics (strimzi#48)

The head command returns as soon as it has read the first line. If kubectl writes anything more to stdout after that, nothing is left on the right side of the pipe to receive it, so kubectl is terminated by SIGPIPE and the command fails with exit code 141 (128 + 13).
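
The failure mode can be reproduced without a cluster. The sketch below is illustrative only: `list_pods` is a hypothetical stand-in for the kubectl call, and capturing the full output before taking the first line is just one possible workaround, not necessarily the change made in strimzi#48.

```shell
#!/usr/bin/env bash
set -uo pipefail   # -e is left out so both outcomes can be inspected

# Hypothetical stand-in for a kubectl call that prints many pod names.
list_pods() {
  seq 1 200000 | sed 's/^/pod-/'
}

# Fragile: head exits after one line, the producer fails on the closed pipe,
# and pipefail propagates that failure (typically status 141).
fragile_first=$(list_pods 2>/dev/null | head -n 1)
fragile_status=$?

# More robust: let the producer finish, then take the first line in-process.
all_pods=$(list_pods)
safe_status=$?
safe_first=${all_pods%%$'\n'*}

echo "fragile: first=$fragile_first status=$fragile_status"
echo "safe:    first=$safe_first status=$safe_status"
```

Both variants yield the same first line; only the pipeline with `head` reports a non-zero status, which is what aborts a script running under `set -e`.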

CSMDS-450: Get events with -o wide flag in report.sh script (strimzi#51)

CSMDS-458: Update report.sh to be cluster-wide (strimzi#54)

To get a proper diagnostic bundle from a cluster, report.sh should dump all information cluster-wide.
This simplifies the process (the script only needs to be called once) and makes sure that everything needed for diagnosing issues gets captured.

CSMDS-444: Dump license JSON in report.sh (strimzi#56)

CSMDS-444: Use secret.data to capture license content (strimzi#73)

MINOR: Allow report.sh to continue when a resource disappears (strimzi#74)

CSMDS-418: Fix local build issues (strimzi#58)

CSMDS-514: remove --request-timeout flag where it is buggy (strimzi#129)

CSMDS-600: report.sh fails to collect multiple replicasets (strimzi#149)

CSMDS-601: Don't export property files when using report.sh (strimzi#151)

CSMDS-388: Extend report.sh to dump all Kafka Connect CRs and KConnect status (strimzi#150)

CSMDS-598: Tolerating not found entities in report.sh (strimzi#162)

An entity can be deleted between the listing by type and the actual retrieval of that entity.
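
A minimal sketch of this kind of tolerance, assuming a hypothetical `fetch_resource` helper in place of the real kubectl retrieval (the actual report.sh implementation may differ). The helper fails for one name to simulate a resource that disappears after it was listed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for retrieving one resource by name.
# It fails for "topic-b" to simulate a deletion after the listing step.
fetch_resource() {
  local name=$1
  if [[ $name == "topic-b" ]]; then
    echo "Error from server (NotFound): $name" >&2
    return 1
  fi
  echo "kind: KafkaTopic, name: $name"
}

collected=()
skipped=()
for name in topic-a topic-b topic-c; do
  # Guarding the call with `if` keeps a failed retrieval from tripping `set -e`.
  if out=$(fetch_resource "$name" 2>/dev/null); then
    collected+=("$out")
  else
    skipped+=("$name")
  fi
done

echo "collected ${#collected[@]} resources, skipped: ${skipped[*]}"
```

The loop carries on with the remaining names and records what was skipped, so the bundle stays as complete as the cluster state allows.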

CSMDS-637: Add k8s version to report.sh (strimzi#167)

CSMDS-588: Collect kafka-log-dirs output in report.sh (strimzi#172)

CSMDS-815: Add cluster ID and pod top to report.sh (strimzi#201)

CSMDS-803: Dump additional volumes in report.sh (strimzi#203)
urbandan authored and patrik-marton committed Jan 30, 2025
1 parent 2bdac2a commit 808ae22
Showing 3 changed files with 754 additions and 230 deletions.
2 changes: 1 addition & 1 deletion Makefile.docker
@@ -19,7 +19,7 @@ BUILD_ID ?= n/a
 BUILD_COMMIT ?= n/a
 RELEASE_VERSION ?= $(shell cat $(TOPDIR)/release.version)
 DOCKER_PUSH ?= false
-DOCKER_NO_RETAG ?= true
+DOCKER_NO_RETAG ?= false

ifdef DOCKER_ARCHITECTURE
DOCKER_PLATFORM = --platform linux/$(DOCKER_ARCHITECTURE)
177 changes: 177 additions & 0 deletions tools/java_thread_dump.sh
@@ -0,0 +1,177 @@
#!/usr/bin/env bash
# Self contained Strimzi thread dump tool.
set -Eeuo pipefail
if [[ $(uname -s) == "Darwin" ]]; then
  shopt -s expand_aliases
  alias echo="gecho"; alias grep="ggrep"; alias sed="gsed"; alias date="gdate"
fi

error() {
  echo "$@" 1>&2 && exit 1
}

{ # this ensures that the entire script is downloaded #
KUBECTL_INSTALLED=false
OC_INSTALLED=false
KUBE_CLIENT="kubectl"
CONTAINER=""
OUT_DIR=""
DUMPS=1
INTERVAL=5
readonly JCMD_LIST_CMD="jcmd -l | grep -v JCmd"
readonly JCMD_DUMP_CMD_TMPL="jcmd PID Thread.print"

# bash version check
if [[ -z ${BASH_VERSINFO+x} ]]; then
  error "No bash version information available, aborting"
fi
if [[ "${BASH_VERSINFO[0]}" -lt 4 ]]; then
  error "You need bash version >= 4 to run the script"
fi

# kube client check
if [[ -x "$(command -v kubectl)" ]]; then
  KUBECTL_INSTALLED=true
else
  if [[ -x "$(command -v oc)" ]]; then
    OC_INSTALLED=true
    KUBE_CLIENT="oc"
  fi
fi
if [[ $OC_INSTALLED = false && $KUBECTL_INSTALLED = false ]]; then
  error "There is no kubectl or oc installed"
fi

# kube connectivity check
$KUBE_CLIENT version -o yaml --request-timeout=5s 1>/dev/null

readonly USAGE="
Usage: java_thread_dump.sh [options]
This tool dumps the threads of all Java processes running in the containers of a specific pod.
Required:
  --namespace=<string>  Kubernetes namespace.
  --pod=<string>        Pod name. Must be a cluster operator, entity operator, kafka, zookeeper or cruise control pod.
Optional:
  --container=<string>  Container name to limit the thread dump to. By default, all containers are captured with thread dump.
  --out-dir=<string>    Script output directory.
  --dumps=<int>         Number of thread dumps to capture. 1 by default.
  --interval=<int>      Number of seconds to wait between 2 dumps. 5 by default.
"
OPTSPEC=":-:"
while getopts "$OPTSPEC" optchar; do
  case "${optchar}" in
    -)
      case "${OPTARG}" in
        namespace=*)
          NAMESPACE=${OPTARG#*=} && readonly NAMESPACE
          ;;
        pod=*)
          POD=${OPTARG#*=} && readonly POD
          ;;
        container=*)
          CONTAINER=${OPTARG#*=} && readonly CONTAINER
          ;;
        out-dir=*)
          OUT_DIR=${OPTARG#*=}
          OUT_DIR=${OUT_DIR//\~/$HOME} && readonly OUT_DIR
          ;;
        dumps=*)
          DUMPS=${OPTARG#*=} && readonly DUMPS
          ;;
        interval=*)
          INTERVAL=${OPTARG#*=} && readonly INTERVAL
          ;;
        *)
          error "$USAGE"
          ;;
      esac;;
  esac
done
shift $((OPTIND-1))
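# NAMESPACE and POD have no defaults; expand them with ":-" so that "set -u"
# does not abort with "unbound variable" before the usage text can be shown.
if [[ -z ${NAMESPACE:-} || -z ${POD:-} ]]; then
  error "$USAGE"
fi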

if [[ -z $OUT_DIR ]]; then
  OUT_DIR="$(mktemp -d)"
fi

if [[ -z $($KUBE_CLIENT get ns "$NAMESPACE" -o name --ignore-not-found) ]]; then
  error "Namespace $NAMESPACE not found! Exiting"
fi

mkdir -p "$OUT_DIR/dumps"

declare -a containers
if [[ -z $CONTAINER ]]; then
  container_list=$($KUBE_CLIENT get pod -n "$NAMESPACE" "$POD" -ojsonpath="{.spec.containers[*].name}")
  for c in $container_list;
  do
    containers+=("$c")
  done
else
  containers+=("$CONTAINER")
fi

dump_count=0
for (( i=0 ; i<DUMPS ; i++ ));
do
  if [[ $i -ne 0 ]]; then
    echo "Backing off for ${INTERVAL}s"
    sleep "$INTERVAL"
  fi

  for c in "${containers[@]}";
  do
    java_processes_list=$($KUBE_CLIENT exec -n "$NAMESPACE" "$POD" -c "$c" -- /bin/bash -c "$JCMD_LIST_CMD" 2>/dev/null) || true
    if [[ -z "$java_processes_list" ]]; then
      echo "Skipping container $c as it does not have a running Java process"
      continue
    fi

    declare -a jprocesses
    jprocesses=()
    while read -r line
    do
      jprocesses+=("$line")
    done <<< "$java_processes_list"

    mkdir -p "$OUT_DIR/dumps/$c"

    for line in "${jprocesses[@]}"; do
      pid=$(echo "$line" | cut -f1 -d' ')
      main_class=$(echo "$line" | cut -f2 -d' ')

      echo "Dumping threads from container ${c} PID ${pid} main class ${main_class} #${i}"

      dump_file_name="thread_dump-${c}-${pid}-${main_class}"
      if [[ $DUMPS -ne 1 ]]; then
        dump_file_name+="-$i"
      fi
      dump_file_name+=".txt"

      dump_cmd=${JCMD_DUMP_CMD_TMPL/"PID"/"$pid"}
      $KUBE_CLIENT exec -n "$NAMESPACE" "$POD" -c "$c" -- /bin/bash -c "$dump_cmd" > "${OUT_DIR}/dumps/${c}/$dump_file_name"
      ((++dump_count))
    done
  done
done

if [[ $dump_count -eq 0 ]]; then
  error "Could not capture any thread dumps in the specified pod"
fi

FILENAME="tdumps-${NAMESPACE}-${POD}-$(date +"%d-%m-%Y_%H-%M-%S")"
OLD_DIR="$(pwd)"
cd "$OUT_DIR" || exit
zip -qr "$FILENAME".zip ./dumps/
cd "$OLD_DIR" || exit
if [[ $OUT_DIR == *"tmp."* ]]; then
  # keeping the old behavior when --out-dir is not specified
  mv "$OUT_DIR"/"$FILENAME".zip ./
fi
echo "Thread dump collection file $FILENAME.zip created"
} # this ensures that the entire script is downloaded #
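
The `--name=value` options described in the usage text are parsed by giving `getopts` a `-` option that takes an argument, so `--namespace=foo` arrives as option `-` with `OPTARG` set to `namespace=foo`. A reduced sketch of that technique follows; `parse_args` is a hypothetical wrapper introduced only for illustration, and only two of the script's options are shown.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the getopts loop, for illustration only.
parse_args() {
  local OPTIND=1 optchar
  NAMESPACE=""
  POD=""
  # Leading ":" selects silent error reporting; "-:" declares "-" as an
  # option that takes an argument, which is how "--key=value" gets through.
  while getopts ":-:" optchar "$@"; do
    case "$optchar" in
      -)
        case "$OPTARG" in
          namespace=*) NAMESPACE=${OPTARG#*=} ;;
          pod=*)       POD=${OPTARG#*=} ;;
          *)           echo "unknown option --$OPTARG" >&2; return 1 ;;
        esac
        ;;
      *)
        echo "unsupported option" >&2
        return 1
        ;;
    esac
  done
}

parse_args --namespace=kafka --pod=my-cluster-kafka-0
echo "namespace=$NAMESPACE pod=$POD"
```

Initializing `NAMESPACE` and `POD` to empty strings before the loop also keeps later emptiness checks safe under `set -u`.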