✨ VastHash

Previously named SXAPH-256 - Shift XOR Add Prime Hash 256-Bit.

VastHash is a non-cryptographic experimental hash function designed for prime-sized HashMap to minimize memory accesses and maximize the performance on AVX2 systems while having distribution comparable to multiplication-based hash.

VastHash is evaluated on passwords for prime-sized HashMap. So, it'll perform much worse on non-prime-sized HashMap and possibly on other data.

It's made from brute-forcing every combination of operators |, &, <<, >>, ^, and +. I found claims online, that | and & reduces hash distribution and ~ doesn't improve. But, I wasn't able to get consistently reduced hash distribution result in these claims. Still, they're not our best performers on the list. The Rust compiler will replace - (vpsubq) with + (vpaddq) so that's excluded from the evaluator.

hash_256_a, and hash_256_b from VastHash are generated with SymPy PRNG (sympy.randprime), then run through the Chi² test with the password 500 times, and filter out the best result. Yet, constants were manually picked due to how the hash function wasn't able to generalize outside its test.

The Chi² test is vulnerable to guided optimization framework like Bayesian optimization from Optuna, but multiple bucket sizes are used to improve the hash's ability to generalize over different sized array; it's done on all system cores to ensure maximum core utilization.

Previously, VastHash was brute-force optimized with Optuna, but it was using only half the performance of each core, and, Optuna wasn't more effective than PRNGs on <<, >>, ^, + optimizations for its overhead. Now, its constants are brute-forced optimized on SymPy PRNG.

Current constants are found by SymPy, then converted to Rust code, and reverse engineered from Rust compiler's x86 ASM output.

🌟 Features

Hash 65% faster than a SIMD multiply (mul_hash)
Outperforms hash distribution of djb2; SHA-256 when on prime-sized array
Reduced data race (Doesn't rely on the same previous state except when summing)

⛈️ Deal-breakers

Only works with 256-bit SIMD (AVX2)
Gray-box hash function (Developed through trial and error)
Extreme bug-prone when implementing as it's sensitive to typos
Ineffective with non-prime sized array (But more uniform than mul_hash and djb2)
Weak diffusion
Trivial to perform collision attacks
Non-innovative design (Due to restrictive AVX2 instruction set and faster than 1 multiply requirement)

🐍 Original code (Python)

import numpy as np

def vast_hash_impl(input_data, hash_256=np.array([6205865627071447409, 2067898264094941423, 1954899363002243873, 9928278621127670147], dtype=np.uint64)):
	return np.bitwise_xor(input_data, hash_256)

def vast_hash(input_data):
	return np.sum([vast_hash_impl(dat) for dat in input_data])

assert vast_hash_impl(np.array([123, 123, 123, 123], dtype=np.uint64)).all() == np.array([6205865627071447306, 2067898264094941332, 1954899363002243930, 9928278621127670264], dtype=np.uint64).all()
assert vast_hash(np.array([[123, 123, 123, 123], [123, 123, 123, 123]], dtype=np.uint64)) == 3420395603173502432

🔩 Rust port

#![feature(portable_simd)]
use std::simd::u64x4;
use std::arch::x86_64::{_mm_loadu_si128, _mm_add_epi64, _mm_shuffle_epi32, _mm_cvtsi128_si64};
use std::mem;

#[no_mangle]
#[inline] // Reduces latency
pub fn vast_hash_impl(input_data: u64x4) -> u64x4 {
	input_data ^ u64x4::from_array([6205865627071447409, 2067898264094941423, 1954899363002243873, 9928278621127670147])
}

#[no_mangle]
pub fn vast_hash(input_data: &[u64x4]) -> u64 {
	let mut hash = u64x4::splat(0);
	for i in 0..input_data.len() {
		hash += vast_hash_impl(input_data[i]);
	}
	sum_u64x4_icx(hash)
}

#[no_mangle]
pub fn sum_u64x4_scalar(simd: u64x4) -> u64 {
	let arr: [u64; 4] = unsafe { std::mem::transmute(simd) };
	arr[0].wrapping_add(arr[1].wrapping_add(arr[2]).wrapping_add(arr[3]))
}

#[no_mangle]
#[inline] // Same speed as the scalar version
pub fn sum_u64x4_icx(simd: u64x4) -> u64 {
	let arr: [u64; 4] = unsafe { mem::transmute(simd) };
	let v1 = unsafe { _mm_loadu_si128(arr.as_ptr() as *const _) };
	let v2 = unsafe { _mm_loadu_si128((arr.as_ptr().add(2)) as *const _) };
	let v3 = unsafe { _mm_add_epi64(v1, v2) };
	let v4 = unsafe { _mm_shuffle_epi32(v3, 0b11101110) }; // (v3, [2, 3, 2, 3])
	let v5 = unsafe { _mm_add_epi64(v3, v4) };
	unsafe { _mm_cvtsi128_si64(v5) as u64 }
}

pub fn main() {
	{ // vast_hash_impl
		let input_data = u64x4::splat(123);
		let result = vast_hash_impl(input_data);
		assert_eq!(result, u64x4::from_array([6205865627071447306, 2067898264094941332, 1954899363002243930, 9928278621127670264]));
	}{ // vast_hash
		let input_data = vec![u64x4::splat(123), u64x4::splat(123)];
		let result = vast_hash(&input_data);
		assert_eq!(result, 3420395603173502432);
	}{ // sum_u64x4_icx
		let simd = u64x4::from_array([1, 2, 3, 4]);
		assert_eq!(sum_u64x4_icx(simd), sum_u64x4_scalar(simd));

		let simd = u64x4::from_array([5, 6, 7, 8]);
		assert_eq!(sum_u64x4_icx(simd), sum_u64x4_scalar(simd));

		let simd = u64x4::splat(0);
		assert_eq!(sum_u64x4_icx(simd), sum_u64x4_scalar(simd));

		let simd = u64x4::splat(u64::MAX);
		assert_eq!(sum_u64x4_icx(simd), sum_u64x4_scalar(simd));
	}
}

🍪 ASM output from Rust port

.LCPI0_0: ; XOR constants
		.quad   6205865627071447409
		.quad   2067898264094941423
		.quad   1954899363002243873
		.quad   -8518465452581881469
vast_hash_impl:
		mov     rax, rdi
		vmovaps ymm0, ymmword ptr [rsi]
		vxorps  ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] ; XOR
		vmovaps ymmword ptr [rdi], ymm0
		vzeroupper
		ret

.LCPI1_0: ; XOR constants
		.quad   6205865627071447409
		.quad   2067898264094941423
		.quad   1954899363002243873
		.quad   -8518465452581881469
vast_hash:
		test    rsi, rsi ; if input_data.len() == 0
		je      .LBB1_1 ; jump to return
		mov     eax, esi
		and     eax, 3
		cmp     rsi, 4
		jae     .LBB1_8
		vpxor   xmm0, xmm0, xmm0 ; set to 0 in 128-bit
		xor     ecx, ecx ; 0
		jmp     .LBB1_5
.LBB1_1:
		vpxor   xmm0, xmm0, xmm0 ; set to 0 in 128-bit
		jmp     .LBB1_2
.LBB1_8: ; Preparing the constants
		and     rsi, -4
		lea     rdx, [rdi + 96] ; Length of constants
		vpxor   xmm0, xmm0, xmm0 ; set to 0 in 128-bit
		xor     ecx, ecx ; 0
		vmovdqa ymm1, ymmword ptr [rip + .LCPI1_0]
.LBB1_9: ; Unrolled `vast_hash_impl` or XORs
		vpxor   ymm2, ymm1, ymmword ptr [rdx - 96]
		vpaddq  ymm0, ymm2, ymm0
		vpxor   ymm2, ymm1, ymmword ptr [rdx - 64]
		vpxor   ymm3, ymm1, ymmword ptr [rdx - 32]
		vpaddq  ymm2, ymm3, ymm2
		vpaddq  ymm0, ymm2, ymm0
		vpxor   ymm2, ymm1, ymmword ptr [rdx]
		add     rcx, 4
		vpaddq  ymm0, ymm2, ymm0
		sub     rdx, -128
		cmp     rsi, rcx
		jne     .LBB1_9
.LBB1_5:
		test    rax, rax ; If end of `input_data`
		je      .LBB1_2 ; Convert and return `hash` as u64
		shl     rcx, 5
		add     rdi, rcx
		shl     eax, 5
		xor     ecx, ecx
		vmovdqa ymm1, ymmword ptr [rip + .LCPI1_0]
.LBB1_7:
		vpxor   ymm2, ymm1, ymmword ptr [rdi + rcx]
		vpaddq  ymm0, ymm2, ymm0
		add     rcx, 32
		cmp     rax, rcx
		jne     .LBB1_7
.LBB1_2: ; `sum_u64x4_icx` inlined; convert and return `hash` as u64
		vextracti128    xmm1, ymm0, 1
		vpaddq  xmm0, xmm1, xmm0
		vpshufd xmm1, xmm0, 238
		vpaddq  xmm0, xmm0, xmm1
		vmovq   rax, xmm0
		vzeroupper
		ret

sum_u64x4_scalar:
		mov     rax, qword ptr [rdi + 8]
		add     rax, qword ptr [rdi + 16]
		add     rax, qword ptr [rdi + 24]
		add     rax, qword ptr [rdi]
		ret

sum_u64x4_icx:
		vmovdqa xmm0, xmmword ptr [rdi + 16]
		vpaddq  xmm0, xmm0, xmmword ptr [rdi]
		vpshufd xmm1, xmm0, 238
		vpaddq  xmm0, xmm0, xmm1
		vmovq   rax, xmm0
		ret

example::main::he27277a11553942e:
		mov     rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
		movzx   eax, byte ptr [rax]
		ret

⏩🚀 Hashing Performance

This is tested with i5-9300H CPU. The speed will be different from different CPUs.

🥟 8-bit (small) hash

CPU	Bench name	Time (ps/ns/µs/ms/s)
i5-9300H	mul_hash_impl_8	1.3460 ns
	djb2_hash_8	2.1672 ns
	vast_hash_impl_8	1.2216 ns
	fnv_1a_hash_8	2.1814 ns

🍔 256-bit (standard) hash

CPU	Bench name	Time (ps/ns/µs/ms/s)
i5-9300H	mul_hash_impl_256	1.4054 ns
	djb2_hash_256	7.1315 ns
	vast_hash_impl_256	1.2279 ns
	fnv_1a_hash_256	7.1302 ns

🍉 1 MiB (MP3) hash

CPU	Bench name	Time (ps/ns/µs/ms/s)
i5-9300H	mul_hash_1mib	44.530 µs
	djb2_hash_1mib	587.82 µs
	vast_hash_1mib	23.368 µs
	fnv_1a_hash_1mib	1.0407 ms

📄 Miscellaneous

CPU	Bench name	Time (ps/ns/µs/ms/s)
i5-9300H	sum_u64x4_scalar	247.53 ps
	sum_u64x4_icx	247.68 ps

Running command:

cargo bench

🚄🔥 Chi² benchmark (Lower gives better distribution)

test_hash_distribution() tests the distribution of hash values generated by a given function for a set of input data. It calculates the chi-squared statistic to measure the deviation of the observed hash distribution from the expected uniform distribution.

const_u64x4_chunks = load_test('BruteX_password.list')

def test_hash_distribution(func, bucket_sizes=[11, 13, 17, 19], u64x4_chunks=const_u64x4_chunks):
	chi2 = 0
	for bucket_size in bucket_sizes:
		buckets = np.zeros(bucket_size, dtype=np.uint64)
		num_hashes = len(u64x4_chunks)

		for i, chk in enumerate(u64x4_chunks):
			hash_value = np.sum([func(c) for c in chk])
			bucket_indices = int(hash_value) % bucket_size
			buckets[bucket_indices] += 1

		expected_frequency = num_hashes // bucket_size
		chi2 += np.sum((buckets - expected_frequency) ** 2)
	return int(chi2)

Hash alignment implementations

def sha256_hash(input_bytes):
	# Compute SHA-256 hash
	hash_object = hashlib.sha256(input_bytes)
	hash_hex = hash_object.hexdigest()

	# Convert the hash value to a 4-element array of 64-bit integers
	hash_int = int(hash_hex, 16)
	hash_array = np.array([(hash_int >> (64 * i)) & 0xffffffffffffffff for i in range(4)], dtype=np.uint64)

	# Return the hash value as a 4-element array of 64-bit integers
	return hash_array

def crc32_hash(input_bytes):
	# Compute CRC32 hash
	hash_object = zlib.crc32(input_bytes)

	# The CRC32 hash value is a single 32-bit integer, so we return it as a 1-element array
	hash_array = np.array([hash_object], dtype=np.uint32)

	# Return the hash value as a 1-element array of 32-bit integers
	return hash_array

def fnv1a_32_hash(input_bytes):
	hash_array = np.array([fnv1a_32(bytes(input_bytes))], dtype=np.uint32)
	return hash_array

def djb2_hash(input_bytes):
	hash_value = np.uint32(5381)
	for b in input_bytes:
		hash_value = hash_value * np.uint32(33) + b
	return hash_value

def mul_hash(input_data, hash_256=np.array([140, 91, 171, 62], dtype=np.uint64)):
	return hash_256 * input_data

🧮 Prime benchmark

ℹ️ Disclosure: vast_hash were evaluated on the BruteX_password.list and [11, 13, 17, 19] bucket sizes.

Bucket sizes = `[11, 13, 17, 19]`

Hash file = conficker.txt

Hash name	Non-uniform score
mul_hash	1,237
djb2_hash	8,207
vast_hash	779
fnv1a_32_hash	683
crc32_hash	695
sha256_hash	793

Running command:

chi2_benchmark(bucket_sizes=[11, 13, 17, 19], hash_path='conficker.txt')

Bucket sizes = `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]`

Hash file = BruteX_password.list

Hash name	Non-uniform score
mul_hash	20,222,418
djb2_hash	8,870,738
vast_hash	70,326
fnv1a_32_hash	86,620
crc32_hash	93,790
sha256_hash	79,962

chi2_benchmark(bucket_sizes=[i for i in range(1, 101) if is_prime(i)], hash_path='BruteX_password.list')

Bucket sizes = `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]`

Hash file = rockyou-65.txt

Hash name	Non-uniform score
mul_hash	986,582,602
djb2_hash	349,747,618
vast_hash	732,138
fnv1a_32_hash	923,010
crc32_hash	739,366
sha256_hash	771,564

Running command:

chi2_benchmark(bucket_sizes=[i for i in range(1, 101) if is_prime(i)], hash_path="rockyou-65.txt")

1️⃣4️⃣6️⃣8️⃣9️⃣🔟 Non-prime benchmark

Bucket sizes = `range(1, 101)`

Hash file = BruteX_password.list

Hash name	Non-uniform score
mul_hash	144,201,418
djb2_hash	57,182,530
vast_hash	1,956,106
fnv1a_32_hash	377,188
crc32_hash	399,244
sha256_hash	316,864

Running command:

chi2_benchmark(bucket_sizes=range(1, 101), hash_path='BruteX_password.list')

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
vast_hash		vast_hash
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
UNLICENSE		UNLICENSE
vast_hash_opti.py		vast_hash_opti.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

✨ VastHash

🌟 Features

⛈️ Deal-breakers

🐍 Original code (Python)

🔩 Rust port

🍪 ASM output from Rust port

⏩🚀 Hashing Performance

🥟 8-bit (small) hash

🍔 256-bit (standard) hash

🍉 1 MiB (MP3) hash

📄 Miscellaneous

🚄🔥 Chi² benchmark (Lower gives better distribution)

Hash alignment implementations

🧮 Prime benchmark

Bucket sizes = `[11, 13, 17, 19]`

Bucket sizes = `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]`

Bucket sizes = `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]`

1️⃣4️⃣6️⃣8️⃣9️⃣🔟 Non-prime benchmark

Bucket sizes = `range(1, 101)`

About

Releases

Packages

Languages

License

baiango/VastHash

Folders and files

Latest commit

History

Repository files navigation

✨ VastHash

🌟 Features

⛈️ Deal-breakers

🐍 Original code (Python)

🔩 Rust port

🍪 ASM output from Rust port

⏩🚀 Hashing Performance

🥟 8-bit (small) hash

🍔 256-bit (standard) hash

🍉 1 MiB (MP3) hash

📄 Miscellaneous

🚄🔥 Chi² benchmark (Lower gives better distribution)

Hash alignment implementations

🧮 Prime benchmark

Bucket sizes = [11, 13, 17, 19]

Bucket sizes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Bucket sizes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

1️⃣4️⃣6️⃣8️⃣9️⃣🔟 Non-prime benchmark

Bucket sizes = range(1, 101)

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Bucket sizes = `[11, 13, 17, 19]`

Bucket sizes = `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]`

Bucket sizes = `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]`

Bucket sizes = `range(1, 101)`

Packages