This project simulates a high-performance computing (HPC) cluster for distributed password cracking. Using VirtualBox, Warewulf, and Slurm, the cluster scales password-cracking tasks with John the Ripper across multiple nodes, demonstrating HPC management and pentesting capabilities.
The cluster utilizes stateless compute nodes, which are booted over the network using iPXE and provisioned with containerized environments managed by Warewulf. This approach ensures consistency, flexibility, and efficient resource utilization, as the nodes do not require persistent storage and can be quickly reconfigured or rebuilt from the central container image.
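To illustrate the stateless model, the sketch below shows the general Warewulf 4 workflow for importing a node image and registering a compute node. The registry path, hostname, IP, and MAC address are placeholders and the flag names reflect Warewulf 4.5; in this project the real configuration is driven by the Ansible playbooks described later.

```bash
# Import a base Rocky Linux 9 container image for the compute nodes
# (registry path is illustrative)
wwctl container import docker://ghcr.io/warewulf/warewulf-rockylinux:9.4 rockylinux-9

# Register a compute node and point it at that image (values are placeholders)
wwctl node add node1 --ipaddr 10.0.2.11 --hwaddr 08:00:27:aa:bb:01
wwctl node set node1 --container rockylinux-9

# Rebuild overlays so the node picks up its configuration on the next network boot
wwctl overlay build
```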
- Rocky Linux: provides a stable base for the cluster
- VirtualBox: hosts the virtual machines for each cluster node
- Network Boot: iPXE enables compute nodes to boot over the network
- Ansible: automates configuration and deployment tasks across the cluster, streamlining setup and updates
- Warewulf: manages and deploys the operating system and software configurations across compute nodes
- Slurm: workload manager that allocates resources and schedules jobs across the cluster
- John the Ripper: password cracker used to test the cluster's distributed password-cracking jobs
- Munge: provides secure authentication for message passing between nodes
| Component | Version |
|---|---|
| Rocky Linux | 9.4 |
| VirtualBox | 7.0 |
| Ansible | 2.14 |
| Warewulf | 4.5.7 |
| Slurm | 22.05.9 |
| Munge | 0.5.13 |
| John the Ripper | 1.9.0 |
In this section, we’ll set up the virtual infrastructure for the HPC_CryptoCluster project by creating a NAT network, configuring virtual machines, and enabling network boot so compute nodes can receive configurations. The table below shows each VM's specifications:
| Server | Role | CPU | RAM |
|---|---|---|---|
| Control | Controller | 4 | 8 GB |
| Node1 | Compute Node | 2 | 4 GB |
| Node2 | Compute Node | 2 | 4 GB |
| Node3 | Compute Node | 2 | 4 GB |
- Create NAT Network:
  - In VirtualBox, create a NAT Network
  - Disable DHCP
- Create VMs:
  - 1 Controller Node (Rocky 9 ISO attached)
  - 3 Compute Nodes (no installation ISO)
- Setup Network Boot:
  - Assign all nodes to the NAT Network
  - Download and attach the iPXE ISO to each compute node's virtual DVD drive to enable network booting
  - Set the boot order on the compute nodes so they boot from the DVD drive first (a scripted equivalent of these steps is sketched below)
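For those who prefer the command line, the same infrastructure can be stood up with VBoxManage. The sketch below is a rough equivalent of the GUI steps for the controller and one compute node; the network name, ISO filenames, and VM settings are placeholders, and disk creation for the controller is omitted for brevity.

```bash
# Create the NAT network with DHCP disabled (name and CIDR are placeholders)
VBoxManage natnetwork add --netname HPCNet --network "10.0.2.0/24" --enable --dhcp off

# Controller VM with the Rocky 9 installer attached (disk setup omitted)
VBoxManage createvm --name Control --ostype RedHat_64 --register
VBoxManage modifyvm Control --cpus 4 --memory 8192 --nic1 natnetwork --nat-network1 HPCNet
VBoxManage storagectl Control --name IDE --add ide
VBoxManage storageattach Control --storagectl IDE --port 0 --device 0 --type dvddrive --medium Rocky-9.4-x86_64-minimal.iso

# Compute node that boots the iPXE ISO from its virtual DVD drive
VBoxManage createvm --name Node1 --ostype RedHat_64 --register
VBoxManage modifyvm Node1 --cpus 2 --memory 4096 --nic1 natnetwork --nat-network1 HPCNet --boot1 dvd
VBoxManage storagectl Node1 --name IDE --add ide
VBoxManage storageattach Node1 --storagectl IDE --port 0 --device 0 --type dvddrive --medium ipxe.iso
```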
Note: This project is configured and executed as `root` for simplicity and to streamline the setup process.
Power on the Controller node and follow these steps to install the necessary tools and configure the cluster.
- Install Git and Ansible, and clone the project repository:

  ```bash
  dnf install -y git ansible-core
  git clone -b dev https://github.com/Thuynh808/HPC_CryptoCluster
  cd HPC_CryptoCluster
  ansible-galaxy collection install -r requirements.yaml -vv
  ```
- Run the Ansible playbooks to install and configure `Warewulf` and `John the Ripper`:

  ```bash
  ansible-playbook warewulf.yaml -vv
  ansible-playbook john.yaml -vv
  ```
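As an optional spot-check after these playbooks finish, the standard Warewulf CLI (not part of the repo) can confirm that the node image and overlays exist:

```bash
wwctl container list   # the imported Rocky Linux node image should be listed
wwctl overlay list     # the system and runtime overlays should be present
```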
- In the `Warewulf` container image shell, install dependencies and configure `Slurm` for the compute nodes:

  ```bash
  wwctl container shell rockylinux-9
  dnf install -y git ansible-core
  git clone -b dev https://github.com/Thuynh808/HPC_CryptoCluster
  cd HPC_CryptoCluster
  ansible-galaxy collection install -r requirements.yaml -vv
  ansible-playbook slurm-node.yaml -vv
  exit  # rebuild container image
  ```
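Exiting the shell rebuilds the image, per the comment above; if further changes are made later, the image and overlays can also be rebuilt explicitly with standard Warewulf commands:

```bash
wwctl container build rockylinux-9   # rebuild the compute node image
wwctl overlay build                  # regenerate the node overlays
```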
- Set up `Slurm` and `Munge` on the Controller Node to manage the Slurm job scheduler and secure communication:

  ```bash
  ansible-playbook slurm-control.yaml -vv
  ```
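For context, this playbook is expected to render `/etc/slurm/slurm.conf` with node and partition definitions matching the cluster. The snippet below is a minimal sketch of those entries only; the cluster name, controller hostname, and memory value are assumptions, and the playbook's own template is authoritative.

```bash
# Illustrative node/partition entries for slurm.conf (values are assumptions)
cat >> /etc/slurm/slurm.conf <<'EOF'
ClusterName=hpc_cryptocluster
SlurmctldHost=control
NodeName=node[1-3] CPUs=2 RealMemory=3800 State=UNKNOWN
PartitionName=normal Nodes=node[1-3] Default=YES MaxTime=INFINITE State=UP
EOF
```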
Power on the compute nodes to initiate the network boot and connect to the controller node.
Let's verify everything is up and running!
- Confirm the `Warewulf` service is up and the node overlays are configured:

  ```bash
  wwctl node list -l && wwctl node list -n
  wwctl node list -a | tail -9
  systemctl status warewulfd.service --no-pager
  firewall-cmd --list-all
  ```
- Confirm the sample password file is created and run a benchmark test with `John`:

  ```bash
  cd /home/slurm
  ls -l
  cat john_hash.txt
  john --test --format=raw-sha256
  ```
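The hash file itself is generated by the playbook; for reference, raw-SHA256 entries in `user:hash` form can be produced like this (the username and password below are made up for illustration):

```bash
# Append one illustrative raw-SHA256 entry (example user and password only)
printf 'alice:%s\n' "$(printf '%s' 'Password123' | sha256sum | awk '{print $1}')" >> john_hash.txt

# Crack with the matching hash format using John's default modes
john --format=raw-sha256 john_hash.txt
```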
- Confirm `Slurm` and `Munge` are operational and the Munge key is valid:

  ```bash
  systemctl status slurmctld munge --no-pager
  munge -n | ssh node1 unmunge
  ssh node1 systemctl status slurmd
  ```
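If `unmunge` reports an authentication error, a quick generic check is to compare the key's checksum on the controller and a node; the key must be identical everywhere:

```bash
sha256sum /etc/munge/munge.key
ssh node1 sha256sum /etc/munge/munge.key   # checksums should match on every node
```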
- Confirm compute nodes are properly up with network boot and hosts configured:

  ```bash
  ssh node3
  dmesg | head
  cat /etc/hosts
  sinfo -l
  scontrol show node
  ```
This section demonstrates the cluster's functionality through two tests: a single-node password-cracking job and a multi-node distributed job.
- Submit the sbatch password-cracking job on a single compute node:

  ```bash
  cd /home/slurm
  sbatch john_test.sh
  ```
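The actual `john_test.sh` ships with the repository; the sketch below shows what a single-node submission script along these lines could look like. The SBATCH values and John options are assumptions, apart from the hash file and result log paths used elsewhere in this walkthrough.

```bash
#!/bin/bash
#SBATCH --job-name=john-single
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --output=/home/slurm/john_result.log

# Fork two John processes to use both CPUs on the allocated node
john --format=raw-sha256 --fork=2 /home/slurm/john_hash.txt
```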
- Verify the job is submitted and running on a single node:

  ```bash
  sinfo -l
  scontrol show job <JobId>
  ```
- With 2 CPUs available on the node, Slurm can be configured to allocate 2 processes, splitting the job's load between them
- Confirm the finished job and view the results:

  ```bash
  scontrol show job <JobId>
  cat /home/slurm/john_result.log
  ```
- The job ran efficiently and recovered all 10 target passwords within 16 minutes and 22 seconds, confirming the effectiveness of the single-node configuration for password cracking.
- Now we'll submit the distributed job:

  ```bash
  cd /home/slurm
  sbatch john_distributed.sh
  sleep 5
  sinfo -l
  scontrol show job <JobId>
  ```
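Likewise, `john_distributed.sh` is provided by the repo. One common way to spread John across Slurm nodes is its `--node=RANGE/TOTAL` option combined with `--fork`, so each node works on its own slice of the keyspace; the sketch below illustrates that pattern for three nodes with 2 CPUs each, and every value in it is an assumption rather than the repo's actual script.

```bash
#!/bin/bash
#SBATCH --job-name=john-distributed
#SBATCH --nodes=3
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=2
#SBATCH --output=/home/slurm/john_distributed_result.log

# One task per node; each task claims 2 of the 6 logical node numbers and forks across its 2 CPUs
srun bash -c 'john --format=raw-sha256 --fork=2 --node=$((SLURM_NODEID * 2 + 1))-$((SLURM_NODEID * 2 + 2))/6 /home/slurm/john_hash.txt'
```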
- The job is allocated across three nodes (node[1-3]), with each node contributing 2 CPUs for a total of 6 CPUs.
- Confirm the job finished and view the results:

  ```bash
  scontrol show job <JobId>
  cat /home/slurm/john_distributed_result.log
  ```
Analysis:
- The distributed job completed in 6 minutes and 52 seconds, demonstrating a significant reduction in runtime compared to the single-node test (16 minutes and 22 seconds).
- All 10 passwords were successfully recovered, showcasing the cluster's ability to handle distributed workloads efficiently.
- The multi-node distributed test highlights the efficiency and scalability of the cluster. By utilizing three nodes, the runtime dropped from 982 seconds (16:22) to 412 seconds (6:52), a reduction of nearly 58% compared to the single-node test.
This project gave me valuable hands-on experience building an HPC cluster for distributed password cracking using VirtualBox, Warewulf, Slurm, John the Ripper, and Ansible. I tackled challenges like troubleshooting iPXE network boot, fixing Munge authentication, and refining Warewulf overlays, which reinforced the importance of solid infrastructure management.
With Slurm, the cluster reduced the distributed password cracking job runtime by nearly 58%, showing the power of scaling across multiple nodes. Automating setup with Ansible streamlined the process and ensured consistency across the cluster. Overall, this project strengthened my understanding of HPC concepts, automation, and how distributed systems handle real-world tasks like pentesting and other compute-heavy jobs.