
# HPC_CryptoCluster

## Project Overview

This project simulates a high-performance computing (HPC) cluster for distributed password cracking. Using VirtualBox, Warewulf, and Slurm, the cluster scales password-cracking tasks with John the Ripper across multiple nodes, demonstrating HPC management and pentesting capabilities.

The cluster utilizes stateless compute nodes, which are booted over the network using iPXE and provisioned with containerized environments managed by Warewulf. This approach ensures consistency, flexibility, and efficient resource utilization, as the nodes do not require persistent storage and can be quickly reconfigured or rebuilt from the central container image.
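Concretely, Warewulf drives this workflow with its `wwctl` tool. The playbooks later in this README automate the setup, but a manual session would look roughly like the sketch below; the registry path, node name, IP address, and MAC address are illustrative placeholders rather than this project's actual values.

```bash
# Rough Warewulf 4 workflow (all values are placeholders, not project config)

# Import a Rocky Linux 9 container image and build it for provisioning
wwctl container import docker://ghcr.io/warewulf/warewulf-rockylinux:9 rockylinux-9
wwctl container build rockylinux-9

# Define a stateless compute node and point it at the container image
wwctl node add node1 --ipaddr 10.0.2.11 --hwaddr 08:00:27:aa:bb:01
wwctl node set node1 --container rockylinux-9

# Rebuild the overlays so node-specific configuration is picked up at boot
wwctl overlay build
```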

## Components

- Rocky Linux: provides a stable base for the cluster
- VirtualBox: hosts the virtual machines for each cluster node
- Network Boot: iPXE enables compute nodes to boot over the network
- Ansible: automates configuration and deployment tasks across the cluster, streamlining setup and updates
- Warewulf: manages and deploys the operating system and software configurations across compute nodes
- Slurm: workload manager that allocates resources and schedules jobs across the cluster
- John the Ripper: used to test the cluster's distributed password-cracking jobs
- Munge: provides secure authentication for message passing between nodes

## Versions

| Component       | Version |
|-----------------|---------|
| Rocky Linux     | 9.4     |
| VirtualBox      | 7.0     |
| Ansible         | 2.14    |
| Warewulf        | 4.5.7   |
| Slurm           | 22.05.9 |
| Munge           | 0.5.13  |
| John the Ripper | 1.9.0   |

## Environment Setup

In this section, we’ll set up the virtual infrastructure for the HPC_CryptoCluster project by creating a NAT network, configuring virtual machines, and enabling network boot so compute nodes can receive configurations. The table below shows each VM's specifications:

| Server  | Role         | CPU | RAM  |
|---------|--------------|-----|------|
| Control | Controller   | 4   | 8 GB |
| Node1   | Compute Node | 2   | 4 GB |
| Node2   | Compute Node | 2   | 4 GB |
| Node3   | Compute Node | 2   | 4 GB |
- Create NAT Network (a VBoxManage equivalent is sketched after this list):
  - In VirtualBox, create a NAT Network
  - Disable DHCP
- Create VMs:
  - 1 Controller node (Rocky 9 ISO)
  - 3 Compute nodes (no OS ISO)
- Set up Network Boot:
  - Assign all nodes to the NAT Network
  - Download and attach the iPXE ISO to each compute node's virtual DVD drive to enable network booting
  - Set the boot order on the compute nodes

    See Configuration Images

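For those who prefer the command line to the VirtualBox GUI, the NAT network step can also be done with `VBoxManage`. The network name, subnet, and VM name below are illustrative assumptions, not values taken from this project.

```bash
# Create a NAT network with its DHCP server disabled (name and subnet are placeholders)
VBoxManage natnetwork add --netname CryptoClusterNAT --network "10.0.2.0/24" --enable --dhcp off

# Attach an existing VM's first NIC to that NAT network (VM name is a placeholder)
VBoxManage modifyvm "node1" --nic1 natnetwork --nat-network1 CryptoClusterNAT
```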

Note: This project is configured and executed as root for simplicity and to streamline the setup process.


## Installation

Power on the Controller node and follow these steps to install the necessary tools and configure the cluster.

- Install Git, Ansible, and clone the project repository:

  ```bash
  dnf install -y git ansible-core
  git clone -b dev https://github.com/Thuynh808/HPC_CryptoCluster
  cd HPC_CryptoCluster
  ansible-galaxy collection install -r requirements.yaml -vv
  ```
- Run the Ansible playbooks to install and configure Warewulf and John the Ripper:

  ```bash
  ansible-playbook warewulf.yaml -vv
  ansible-playbook john.yaml -vv
  ```
- In the Warewulf container image shell, install dependencies and configure Slurm for the compute nodes:

  ```bash
  wwctl container shell rockylinux-9
  dnf install -y git ansible-core
  git clone -b dev https://github.com/Thuynh808/HPC_CryptoCluster
  cd HPC_CryptoCluster
  ansible-galaxy collection install -r requirements.yaml -vv
  ansible-playbook slurm-node.yaml -vv
  exit  # rebuild the container image on exit
  ```
- Set up Slurm and Munge on the controller node to manage the Slurm job scheduler and secure inter-node communication:

  ```bash
  ansible-playbook slurm-control.yaml -vv
  ```
- Power on the compute nodes to initiate network boot and connect to the controller node
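The slurm-control.yaml and slurm-node.yaml playbooks generate the actual Slurm configuration. For orientation only, a node/partition definition matching the VM specs above (three compute nodes with 2 CPUs and 4 GB RAM each) might look like the fragment below; the partition name and RealMemory value are assumptions, not the project's real settings.

```bash
# Illustrative slurm.conf fragment (not the project's actual file)
cat >> /etc/slurm/slurm.conf <<'EOF'
NodeName=node[1-3] CPUs=2 RealMemory=3800 State=UNKNOWN
PartitionName=normal Nodes=node[1-3] Default=YES MaxTime=INFINITE State=UP
EOF
```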

## Deployment Verification

Let's verify everything is up and running!

- Confirm the Warewulf service is up and the node overlays are configured:

  ```bash
  wwctl node list -l && wwctl node list -n
  wwctl node list -a | tail -9
  systemctl status warewulfd.service --no-pager
  firewall-cmd --list-all
  ```

See Images


- Confirm the sample password file is created and run a benchmark test with John:

  ```bash
  cd /home/slurm
  ls -l
  cat john_hash.txt
  john --test --format=raw-sha256
  ```

See Images

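The john.yaml playbook creates john_hash.txt, whose exact contents aren't reproduced in this README. Since the benchmark above uses --format=raw-sha256, a comparable test file could be built by hand along these lines; the passwords, output path, and wordlist path are made-up examples.

```bash
# Build a raw-SHA-256 hash file that John can read (illustrative values only)
for p in password123 letmein sunshine; do
  printf '%s' "$p" | sha256sum | awk '{print $1}'
done > /home/slurm/demo_hashes.txt

# Crack it with the same format used in the benchmark above
john --format=raw-sha256 --wordlist=/usr/share/john/password.lst /home/slurm/demo_hashes.txt
```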

- Confirm Slurm and Munge are operational and the Munge key is valid across nodes:

  ```bash
  systemctl status slurmctld munge --no-pager
  munge -n | ssh node1 unmunge
  ssh node1 systemctl status slurmd
  ```

See Images

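Key distribution is presumably handled by the playbooks and Warewulf overlays, but if the unmunge check ever fails, manually re-syncing the controller's key to a node is one way to recover (the node name is an example):

```bash
# Manual fallback: copy the controller's munge key to a compute node and restart munge
scp /etc/munge/munge.key node1:/etc/munge/munge.key
ssh node1 'chown munge:munge /etc/munge/munge.key && chmod 400 /etc/munge/munge.key && systemctl restart munge'
```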

- Confirm the compute nodes booted properly over the network and their hosts files are configured:

  ```bash
  ssh node3
  dmesg | head
  cat /etc/hosts
  sinfo -l
  scontrol show node
  ```

See Images



## Testing the Cluster with John the Ripper

This section demonstrates the cluster's functionality through two tests: a single-node password-cracking job and a multi-node distributed job.

### Single Node Test

- Submit the sbatch password-cracking job on a single compute node (a hedged sketch of what a script like john_test.sh might contain appears at the end of this subsection):

  ```bash
  cd /home/slurm
  sbatch john_test.sh
  ```

- Verify the job is submitted and running on a single node:

  ```bash
  sinfo -l
  scontrol show job <JobId>
  ```


- With the node's 2 CPUs, Slurm can be configured to allocate 2 processes to split the job's workload


- Confirm the job finished and view the results:

  ```bash
  scontrol show job <JobId>
  cat /home/slurm/john_result.log
  ```

- The job ran efficiently and recovered all 10 target passwords in 16 minutes and 22 seconds, confirming the effectiveness of the single-node configuration for password cracking.

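The actual john_test.sh lives in the repository and isn't reproduced in this README. As a rough idea of what a single-node submission script like it might contain, here is a hedged sketch; the SBATCH values, file paths, and wordlist location are assumptions.

```bash
#!/bin/bash
#SBATCH --job-name=john_single
#SBATCH --nodes=1
#SBATCH --cpus-per-task=2
#SBATCH --output=/home/slurm/john_result.log

# Single-node crack of the sample hashes; --fork=2 splits the work across the 2 CPUs
john --format=raw-sha256 --wordlist=/usr/share/john/password.lst \
     --fork=2 /home/slurm/john_hash.txt
```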

### Multi-Node Distributed Test

- Now we'll submit the distributed job (a sketch of what a distributed script might look like follows these steps):

  ```bash
  cd /home/slurm
  sbatch john_distributed.sh
  sleep 5
  sinfo -l
  scontrol show job <JobId>
  ```

- The job is allocated across three nodes (node[1-3]), with each node contributing 2 CPUs for a total of 6 CPUs.


- Confirm the job finished and view the results:

  ```bash
  scontrol show job <JobId>
  cat /home/slurm/john_distributed_result.log
  ```
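As with the single-node script, john_distributed.sh itself isn't shown in this README. One common way to spread John the Ripper across Slurm tasks is its --node=N/TOTAL keyspace-splitting option, so the script might look roughly like the sketch below; all SBATCH values, paths, and the wordlist location are assumptions.

```bash
#!/bin/bash
#SBATCH --job-name=john_distributed
#SBATCH --nodes=3
#SBATCH --ntasks=6
#SBATCH --output=/home/slurm/john_distributed_result.log

# Each Slurm task cracks a distinct slice of the keyspace via John's --node option
srun bash -c 'john --format=raw-sha256 --wordlist=/usr/share/john/password.lst --node=$((SLURM_PROCID + 1))/$SLURM_NTASKS /home/slurm/john_hash.txt'
```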

Analysis:

- The distributed job completed in 6 minutes and 52 seconds, demonstrating a significant reduction in runtime compared to the single-node test (16 minutes and 22 seconds).
- All 10 passwords were successfully recovered, showcasing the cluster's ability to handle distributed workloads efficiently.
- The multi-node distributed test highlights the efficiency and scalability of the cluster. By utilizing three nodes, the runtime was reduced by nearly 58% compared to the single-node test.
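- For reference, the arithmetic: 16 min 22 s = 982 s on one node versus 6 min 52 s = 412 s on three nodes, so the reduction is (982 − 412) / 982 ≈ 58%, roughly a 2.4× speedup; the sub-linear scaling is likely due to each task's own startup and scheduling overhead.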



## Conclusion

This project gave me valuable hands-on experience building an HPC cluster for distributed password cracking using VirtualBox, Warewulf, Slurm, John the Ripper, and Ansible. I tackled challenges like troubleshooting iPXE network boot, fixing Munge authentication, and refining Warewulf overlays, which reinforced the importance of solid infrastructure management.

With Slurm, the cluster reduced the distributed password cracking job runtime by nearly 58%, showing the power of scaling across multiple nodes. Automating setup with Ansible streamlined the process and ensured consistency across the cluster. Overall, this project strengthened my understanding of HPC concepts, automation, and how distributed systems handle real-world tasks like pentesting and other compute-heavy jobs.