✨ VastHash

Previously named SXAPH-256 (Shift XOR Add Prime Hash 256-bit).

VastHash is an experimental non-cryptographic hash function designed for prime-sized HashMaps; it minimizes memory accesses and maximizes performance on AVX2 systems while keeping a distribution comparable to multiplication-based hashes.

VastHash was evaluated on password lists with prime-sized HashMaps, so it will perform much worse on non-prime-sized HashMaps, and possibly on other kinds of data.
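For context, a prime-sized HashMap simply keeps a prime number of buckets and reduces the 64-bit hash with a modulo. A minimal illustration (plain Python, not from the repository):

NUM_BUCKETS = 19  # a prime bucket count, as used in the Chi² benchmarks below

def bucket_index(hash_value: int) -> int:
	# A prime modulus depends on every bit of the hash, unlike a
	# power-of-two modulus, which only keeps the low bits.
	return hash_value % NUM_BUCKETS

print(bucket_index(3420395603173502432))  # an index in range(19)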

It was made by brute-forcing every combination of the operators |, &, <<, >>, ^, and +. I found claims online that | and & reduce hash distribution and that ~ doesn't improve it. I wasn't able to reproduce a consistently reduced distribution as those claims suggest, but those operators still weren't the best performers on the list. The Rust compiler will replace - (vpsubq) with + (vpaddq), so subtraction was excluded from the evaluator.

The constants hash_256_a and hash_256_b in VastHash were generated with SymPy's prime generator (sympy.randprime), run through the Chi² test on the password list 500 times, and filtered for the best result. Even so, the final constants were picked manually, because the hash function didn't generalize outside its test set.

The Chi² test is vulnerable to overfitting by guided optimization frameworks such as Optuna's Bayesian optimization, but multiple bucket sizes are used to improve the hash's ability to generalize across differently sized arrays; the search runs on all system cores for maximum core utilization.
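The search script itself isn't included in the README; below is a minimal sketch of the described process under stated assumptions: random 64-bit prime constants (the README used sympy.randprime; sympy.nextprime over a per-task RNG is used here so forked workers don't repeat draws), scored with the squared-deviation statistic from the Chi² benchmark shown later, over several bucket sizes, across all cores with multiprocessing. DATA is random stand-in data; the real search scored chunks of BruteX_password.list.

import multiprocessing as mp
import numpy as np
import sympy

# Stand-in data; the real search scored u64x4 chunks of BruteX_password.list.
DATA = np.random.default_rng(0).integers(0, 2**64, size=(1000, 4), dtype=np.uint64)
BUCKET_SIZES = [11, 13, 17, 19]  # several sizes so the constants generalize

def chi2_score(constants):
	# vast_hash of every chunk: XOR with the candidate constants, sum the lanes.
	hashes = (DATA ^ np.array(constants, dtype=np.uint64)).sum(axis=1)
	score = 0
	for n in BUCKET_SIZES:
		counts = np.bincount((hashes % n).astype(np.int64), minlength=n)
		score += int(((counts - len(hashes) // n) ** 2).sum())
	return score

def trial(seed):
	rng = np.random.default_rng(seed)  # per-task RNG: forked workers stay independent
	# Four random 64-bit primes, one per SIMD lane.
	constants = tuple(int(sympy.nextprime(int(rng.integers(1, 2**63)))) for _ in range(4))
	return chi2_score(constants), constants

if __name__ == '__main__':
	with mp.Pool() as pool:  # one worker per core, as the README describes
		best_score, best_constants = min(pool.map(trial, range(500)))
	print(best_score, best_constants)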

Previously, VastHash's constants were brute-force optimized with Optuna, but Optuna used only about half of each core's throughput, and given its overhead it was no more effective than plain PRNG sampling at optimizing <<, >>, ^, and +. The constants are now brute-forced with SymPy's PRNG.

The current constants were found with SymPy, converted to Rust code, and the final form was reverse-engineered from the Rust compiler's x86 ASM output.

🌟 Features

  • Hashes 65% faster than a SIMD multiply (mul_hash)
  • Outperforms the hash distribution of djb2 and SHA-256 on prime-sized arrays
  • Reduced data dependencies (doesn't rely on previous state except when summing; see the sketch below)
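To make the last point concrete: because each chunk is hashed independently and state is only shared at the final sum, partial hashes can be computed separately (e.g. per thread) and combined afterwards. A minimal Python check, reusing the constants from the original code shown below:

import numpy as np

HASH_256 = np.array([6205865627071447409, 2067898264094941423,
                     1954899363002243873, 9928278621127670147], dtype=np.uint64)

def vast_hash(chunks):
	# Sum of (chunk XOR constants) over all 4-lane chunks, wrapping mod 2**64.
	return np.sum([c ^ HASH_256 for c in chunks])

chunks = [np.full(4, i, dtype=np.uint64) for i in range(8)]
# The two halves can be hashed independently and combined afterwards.
whole = int(vast_hash(chunks))
halves = (int(vast_hash(chunks[:4])) + int(vast_hash(chunks[4:]))) % 2**64
assert whole == halves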

⛈️ Deal-breakers

  • Only works with 256-bit SIMD (AVX2)
  • Gray-box hash function (developed through trial and error)
  • Extremely bug-prone to implement, as it's sensitive to typos
  • Ineffective with non-prime-sized arrays (but still more uniform than mul_hash and djb2)
  • Weak diffusion
  • Trivial to perform collision attacks against (see the sketch after this list)
  • Non-innovative design (due to the restrictive AVX2 instruction set and the requirement to beat a single multiply)
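The collision weakness follows directly from the structure: each chunk is XORed with fixed constants and the results are summed, and since addition commutes, reordering whole chunks never changes the hash. A minimal demonstration, reusing vast_hash from the sketch above:

a = [np.array([1, 2, 3, 4], dtype=np.uint64),
     np.array([5, 6, 7, 8], dtype=np.uint64)]
b = a[::-1]  # the same chunks in swapped order: a different message

# Addition commutes, so chunk order never affects the final hash.
assert vast_hash(a) == vast_hash(b)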

🐍 Original code (Python)

import numpy as np

def vast_hash_impl(input_data, hash_256=np.array([6205865627071447409, 2067898264094941423, 1954899363002243873, 9928278621127670147], dtype=np.uint64)):
	return np.bitwise_xor(input_data, hash_256)

def vast_hash(input_data):
	return np.sum([vast_hash_impl(dat) for dat in input_data])

assert np.array_equal(vast_hash_impl(np.array([123, 123, 123, 123], dtype=np.uint64)), np.array([6205865627071447306, 2067898264094941332, 1954899363002243930, 9928278621127670264], dtype=np.uint64))
assert vast_hash(np.array([[123, 123, 123, 123], [123, 123, 123, 123]], dtype=np.uint64)) == 3420395603173502432
🦀 Ported code (Rust)

#![feature(portable_simd)]
use std::simd::u64x4;
use std::arch::x86_64::{_mm_loadu_si128, _mm_add_epi64, _mm_shuffle_epi32, _mm_cvtsi128_si64};
use std::mem;

#[no_mangle]
#[inline] // Reduces latency
pub fn vast_hash_impl(input_data: u64x4) -> u64x4 {
	input_data ^ u64x4::from_array([6205865627071447409, 2067898264094941423, 1954899363002243873, 9928278621127670147])
}

#[no_mangle]
pub fn vast_hash(input_data: &[u64x4]) -> u64 {
	let mut hash = u64x4::splat(0);
	for &chunk in input_data {
		hash += vast_hash_impl(chunk);
	}
	sum_u64x4_icx(hash)
}

#[no_mangle]
pub fn sum_u64x4_scalar(simd: u64x4) -> u64 {
	let arr: [u64; 4] = unsafe { mem::transmute(simd) };
	arr[0].wrapping_add(arr[1].wrapping_add(arr[2]).wrapping_add(arr[3]))
}

#[no_mangle]
#[inline] // Same speed as the scalar version
pub fn sum_u64x4_icx(simd: u64x4) -> u64 {
	let arr: [u64; 4] = unsafe { mem::transmute(simd) };
	let v1 = unsafe { _mm_loadu_si128(arr.as_ptr() as *const _) };
	let v2 = unsafe { _mm_loadu_si128((arr.as_ptr().add(2)) as *const _) };
	let v3 = unsafe { _mm_add_epi64(v1, v2) };
	let v4 = unsafe { _mm_shuffle_epi32(v3, 0b11101110) }; // (v3, [2, 3, 2, 3])
	let v5 = unsafe { _mm_add_epi64(v3, v4) };
	unsafe { _mm_cvtsi128_si64(v5) as u64 }
}

pub fn main() {
	{ // vast_hash_impl
		let input_data = u64x4::splat(123);
		let result = vast_hash_impl(input_data);
		assert_eq!(result, u64x4::from_array([6205865627071447306, 2067898264094941332, 1954899363002243930, 9928278621127670264]));
	}{ // vast_hash
		let input_data = vec![u64x4::splat(123), u64x4::splat(123)];
		let result = vast_hash(&input_data);
		assert_eq!(result, 3420395603173502432);
	}{ // sum_u64x4_icx
		let simd = u64x4::from_array([1, 2, 3, 4]);
		assert_eq!(sum_u64x4_icx(simd), sum_u64x4_scalar(simd));

		let simd = u64x4::from_array([5, 6, 7, 8]);
		assert_eq!(sum_u64x4_icx(simd), sum_u64x4_scalar(simd));

		let simd = u64x4::splat(0);
		assert_eq!(sum_u64x4_icx(simd), sum_u64x4_scalar(simd));

		let simd = u64x4::splat(u64::MAX);
		assert_eq!(sum_u64x4_icx(simd), sum_u64x4_scalar(simd));
	}
}

🍪 ASM output from Rust port

.LCPI0_0: ; XOR constants
		.quad   6205865627071447409
		.quad   2067898264094941423
		.quad   1954899363002243873
		.quad   -8518465452581881469
vast_hash_impl:
		mov     rax, rdi
		vmovaps ymm0, ymmword ptr [rsi]
		vxorps  ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] ; XOR
		vmovaps ymmword ptr [rdi], ymm0
		vzeroupper
		ret

.LCPI1_0: ; XOR constants
		.quad   6205865627071447409
		.quad   2067898264094941423
		.quad   1954899363002243873
		.quad   -8518465452581881469
vast_hash:
		test    rsi, rsi ; if input_data.len() == 0
		je      .LBB1_1 ; jump to return
		mov     eax, esi
		and     eax, 3
		cmp     rsi, 4
		jae     .LBB1_8
		vpxor   xmm0, xmm0, xmm0 ; set to 0 in 128-bit
		xor     ecx, ecx ; 0
		jmp     .LBB1_5
.LBB1_1:
		vpxor   xmm0, xmm0, xmm0 ; set to 0 in 128-bit
		jmp     .LBB1_2
.LBB1_8: ; Preparing the constants
		and     rsi, -4
		lea     rdx, [rdi + 96] ; Length of constants
		vpxor   xmm0, xmm0, xmm0 ; set to 0 in 128-bit
		xor     ecx, ecx ; 0
		vmovdqa ymm1, ymmword ptr [rip + .LCPI1_0]
.LBB1_9: ; Unrolled `vast_hash_impl` or XORs
		vpxor   ymm2, ymm1, ymmword ptr [rdx - 96]
		vpaddq  ymm0, ymm2, ymm0
		vpxor   ymm2, ymm1, ymmword ptr [rdx - 64]
		vpxor   ymm3, ymm1, ymmword ptr [rdx - 32]
		vpaddq  ymm2, ymm3, ymm2
		vpaddq  ymm0, ymm2, ymm0
		vpxor   ymm2, ymm1, ymmword ptr [rdx]
		add     rcx, 4
		vpaddq  ymm0, ymm2, ymm0
		sub     rdx, -128
		cmp     rsi, rcx
		jne     .LBB1_9
.LBB1_5:
		test    rax, rax ; If end of `input_data`
		je      .LBB1_2 ; Convert and return `hash` as u64
		shl     rcx, 5
		add     rdi, rcx
		shl     eax, 5
		xor     ecx, ecx
		vmovdqa ymm1, ymmword ptr [rip + .LCPI1_0]
.LBB1_7:
		vpxor   ymm2, ymm1, ymmword ptr [rdi + rcx]
		vpaddq  ymm0, ymm2, ymm0
		add     rcx, 32
		cmp     rax, rcx
		jne     .LBB1_7
.LBB1_2: ; `sum_u64x4_icx` inlined; convert and return `hash` as u64
		vextracti128    xmm1, ymm0, 1
		vpaddq  xmm0, xmm1, xmm0
		vpshufd xmm1, xmm0, 238
		vpaddq  xmm0, xmm0, xmm1
		vmovq   rax, xmm0
		vzeroupper
		ret

sum_u64x4_scalar:
		mov     rax, qword ptr [rdi + 8]
		add     rax, qword ptr [rdi + 16]
		add     rax, qword ptr [rdi + 24]
		add     rax, qword ptr [rdi]
		ret

sum_u64x4_icx:
		vmovdqa xmm0, xmmword ptr [rdi + 16]
		vpaddq  xmm0, xmm0, xmmword ptr [rdi]
		vpshufd xmm1, xmm0, 238
		vpaddq  xmm0, xmm0, xmm1
		vmovq   rax, xmm0
		ret

example::main::he27277a11553942e:
		mov     rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
		movzx   eax, byte ptr [rax]
		ret

⏩🚀 Hashing Performance

These benchmarks were run on an i5-9300H CPU; timings will differ on other CPUs.

🥟 8-bit (small) hash

CPU        Bench name          Time
i5-9300H   mul_hash_impl_8     1.3460 ns
i5-9300H   djb2_hash_8         2.1672 ns
i5-9300H   vast_hash_impl_8    1.2216 ns
i5-9300H   fnv_1a_hash_8       2.1814 ns

🍔 256-bit (standard) hash

CPU        Bench name           Time
i5-9300H   mul_hash_impl_256    1.4054 ns
i5-9300H   djb2_hash_256        7.1315 ns
i5-9300H   vast_hash_impl_256   1.2279 ns
i5-9300H   fnv_1a_hash_256      7.1302 ns

🍉 1 MiB (MP3) hash

CPU        Bench name         Time
i5-9300H   mul_hash_1mib      44.530 µs
i5-9300H   djb2_hash_1mib     587.82 µs
i5-9300H   vast_hash_1mib     23.368 µs
i5-9300H   fnv_1a_hash_1mib   1.0407 ms

📄 Miscellaneous

CPU        Bench name         Time
i5-9300H   sum_u64x4_scalar   247.53 ps
i5-9300H   sum_u64x4_icx      247.68 ps

Running command:

cargo bench

🚄🔥 Chi² benchmark (Lower gives better distribution)

test_hash_distribution() tests the distribution of hash values generated by a given function over a set of input data. It computes a chi-squared-style statistic that measures how far the observed hash distribution deviates from a uniform one.

import numpy as np

const_u64x4_chunks = load_test('BruteX_password.list')  # see the load_test sketch below

def test_hash_distribution(func, bucket_sizes=[11, 13, 17, 19], u64x4_chunks=const_u64x4_chunks):
	chi2 = 0
	for bucket_size in bucket_sizes:
		buckets = np.zeros(bucket_size, dtype=np.uint64)
		num_hashes = len(u64x4_chunks)

		for chk in u64x4_chunks:
			# Hash every u64x4 chunk of the entry and sum, as vast_hash does.
			hash_value = np.sum([func(c) for c in chk])
			bucket_index = int(hash_value) % bucket_size
			buckets[bucket_index] += 1

		expected_frequency = num_hashes // bucket_size
		chi2 += np.sum((buckets - expected_frequency) ** 2)
	return int(chi2)
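load_test itself isn't shown in the README. A plausible sketch, assuming it packs each password's bytes into zero-padded 4-lane u64 chunks (the exact padding scheme is an assumption):

import numpy as np

def load_test(path):
	# Hypothetical reconstruction: pad each password's bytes to a 32-byte
	# boundary and view them as rows of four u64 lanes.
	chunk_lists = []
	for pw in open(path, 'rb').read().splitlines():
		width = max(32, (len(pw) + 31) // 32 * 32)
		padded = pw.ljust(width, b'\x00')
		chunks = np.frombuffer(padded, dtype=np.uint64).reshape(-1, 4)
		chunk_lists.append(list(chunks))
	return chunk_lists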

Hash alignment implementations

import hashlib
import zlib
import numpy as np
from fnvhash import fnv1a_32  # assumed source of the fnv1a_32 helper

def sha256_hash(input_bytes):
	# Compute SHA-256 hash
	hash_object = hashlib.sha256(input_bytes)
	hash_hex = hash_object.hexdigest()

	# Convert the hash value to a 4-element array of 64-bit integers
	hash_int = int(hash_hex, 16)
	hash_array = np.array([(hash_int >> (64 * i)) & 0xffffffffffffffff for i in range(4)], dtype=np.uint64)

	# Return the hash value as a 4-element array of 64-bit integers
	return hash_array

def crc32_hash(input_bytes):
	# Compute CRC32 hash
	hash_object = zlib.crc32(input_bytes)

	# The CRC32 hash value is a single 32-bit integer, so we return it as a 1-element array
	hash_array = np.array([hash_object], dtype=np.uint32)

	# Return the hash value as a 1-element array of 32-bit integers
	return hash_array

def fnv1a_32_hash(input_bytes):
	hash_array = np.array([fnv1a_32(bytes(input_bytes))], dtype=np.uint32)
	return hash_array

def djb2_hash(input_bytes):
	hash_value = np.uint32(5381)
	for b in input_bytes:
		hash_value = hash_value * np.uint32(33) + b
	return hash_value

def mul_hash(input_data, hash_256=np.array([140, 91, 171, 62], dtype=np.uint64)):
	return hash_256 * input_data
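
The chi2_benchmark driver invoked below isn't shown in the README either; a minimal sketch of how it presumably wires these implementations into test_hash_distribution (the reporting format is an assumption):

def chi2_benchmark(bucket_sizes, hash_path):
	# Hypothetical driver: reload the chunks for hash_path, score each hash
	# function with the Chi²-style statistic, and report the results.
	chunks = load_test(hash_path)
	funcs = [mul_hash, djb2_hash, vast_hash_impl, fnv1a_32_hash, crc32_hash, sha256_hash]
	for func in funcs:
		score = test_hash_distribution(func, bucket_sizes=bucket_sizes, u64x4_chunks=chunks)
		print(f'{func.__name__}: {score:,}')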

🧮 Prime benchmark

ℹ️ Disclosure: vast_hash was evaluated on BruteX_password.list and the [11, 13, 17, 19] bucket sizes.

Bucket sizes = [11, 13, 17, 19]

Hash file = conficker.txt

Hash name       Non-uniform score
mul_hash        1,237
djb2_hash       8,207
vast_hash       779
fnv1a_32_hash   683
crc32_hash      695
sha256_hash     793

Running command:

chi2_benchmark(bucket_sizes=[11, 13, 17, 19], hash_path='conficker.txt')

Bucket sizes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Hash file = BruteX_password.list

Hash name       Non-uniform score
mul_hash        20,222,418
djb2_hash       8,870,738
vast_hash       70,326
fnv1a_32_hash   86,620
crc32_hash      93,790
sha256_hash     79,962

Running command:

chi2_benchmark(bucket_sizes=[i for i in range(1, 101) if is_prime(i)], hash_path='BruteX_password.list')

Bucket sizes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Hash file = rockyou-65.txt

Hash name       Non-uniform score
mul_hash        986,582,602
djb2_hash       349,747,618
vast_hash       732,138
fnv1a_32_hash   923,010
crc32_hash      739,366
sha256_hash     771,564

Running command:

chi2_benchmark(bucket_sizes=[i for i in range(1, 101) if is_prime(i)], hash_path="rockyou-65.txt")

1️⃣4️⃣6️⃣8️⃣9️⃣🔟 Non-prime benchmark

Bucket sizes = range(1, 101)

Hash file = BruteX_password.list

Hash name       Non-uniform score
mul_hash        144,201,418
djb2_hash       57,182,530
vast_hash       1,956,106
fnv1a_32_hash   377,188
crc32_hash      399,244
sha256_hash     316,864

Running command:

chi2_benchmark(bucket_sizes=range(1, 101), hash_path='BruteX_password.list')
