
x1.4 Faster LruCache #5251

Merged (7 commits) on Feb 9, 2023
Conversation

benaadams (Member) commented on Feb 8, 2023:

Changes

  • Strongly typed references
  • Devirtualize the dictionary key by using a KeccakKey struct rather than the Keccak class
  • Remove the secondary data structure that maintained least-recently-used order
  • Use the whole set of Keccak bytes in GetHashCode to reduce dictionary collisions
  • Add Unsafe.AreSame + .Length comparisons to some span compares, for an early exit when the references are the same (see the sketch after this list)
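
As an illustration of the last bullet, here is a hedged sketch (not the PR's actual code) of a span compare with a same-reference early exit; Unsafe.AreSame and MemoryMarshal.GetReference are real BCL APIs, while SpanCompare.FastEquals is a hypothetical helper name:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class SpanCompare
{
    public static bool FastEquals(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        // Spans of different lengths can never be equal.
        if (a.Length != b.Length) return false;

        // Same starting reference plus same length means the same memory:
        // skip the byte-by-byte comparison entirely.
        if (Unsafe.AreSame(ref MemoryMarshal.GetReference(a), ref MemoryMarshal.GetReference(b)))
            return true;

        return a.SequenceEqual(b);
    }
}
```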

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Performance tests (before/after)

benaadams changed the title from "Faster cache" to "Faster LruCache" on Feb 8, 2023
LukaszRozmej (Member) commented:

Do we need KeccakKey when we already have ValueKeccak?
Wonder how it benchmarks, also compared to #2978

LukaszRozmej requested a review from Scooletz on February 8, 2023
benaadams (Member, Author) commented:

> Do we need KeccakKey when we already have ValueKeccak?

KeccakKey is just a wrapper that devirtualizes the dictionary by using a struct, rather than a class, as the key. Generics share a single JITted code path for all class type arguments, so calls on the generic type (like .Equals inside Dictionary) behave like interface dispatch; for structs, the JIT produces a concrete implementation per struct, so .Equals and .GetHashCode become direct, inlinable calls for struct keys.
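
For illustration, a minimal sketch of the struct-key idea (simplified; not the PR's exact type, which wraps Keccak rather than raw bytes). Because Dictionary<KeccakKey, TValue> is specialized per struct type argument, the Equals and GetHashCode below compile to direct, inlinable calls:

```csharp
using System;
using System.Buffers.Binary;

public readonly struct KeccakKey : IEquatable<KeccakKey>
{
    public readonly byte[] Bytes;

    public KeccakKey(byte[] bytes) => Bytes = bytes;

    public bool Equals(KeccakKey other)
    {
        // Fast path: the same array reference is trivially equal.
        if (ReferenceEquals(Bytes, other.Bytes)) return true;
        if (Bytes is null || other.Bytes is null) return false;
        return Bytes.AsSpan().SequenceEqual(other.Bytes);
    }

    public override bool Equals(object obj) => obj is KeccakKey key && Equals(key);

    // Placeholder reading the first 4 bytes; the PR folds in all 32
    // (see the hash-code discussion further down).
    public override int GetHashCode() =>
        Bytes is null ? 0 : BinaryPrimitives.ReadInt32LittleEndian(Bytes);
}
```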

ValueKeccak would add copying and allocation-size costs on the conversions from Keccak, and it would lose the fast path when the Bytes arrays are reference-equal (which may or may not be common). It would be interesting to investigate, but getting the full benefit would probably be a larger task: moving everything to ValueKeccak and dropping Keccak.

> Wonder how it benchmarks, also compared to #2978

ConcurrentLru uses a ConcurrentDictionary and 3 x ConcurrentQueues, so it is much heavier on memory usage; how it compares depends on how much contention you experience on the caches.

LukaszRozmej (Member) commented:

> Wonder how it benchmarks, also compared to #2978

> ConcurrentLru uses a ConcurrentDictionary and 3 x ConcurrentQueues, so it is much heavier on memory usage; how it compares depends on how much contention you experience on the caches.

Good question; I think the only way to know is to profile the node as a whole and try to see the differences.

benaadams (Member, Author) commented:

~x1.4 improvement

BenchmarkDotNet=v0.13.4, OS=Windows 11 (10.0.22623.1250)
Intel Core i5-9400F CPU 2.90GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET SDK=7.0.200-preview.22628.1
[Host] : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2
Job-UJOYGK : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2

| Method             | MaxCapacity | ItemsCount | Mean        | Error    | Ratio | Alloc Ratio |
|--------------------|------------:|-----------:|------------:|---------:|------:|------------:|
| WithItems          | 16          | 1          | 101.1 ns    | 0.27 ns  | 0.78  | 0.91        |
| WithItems_Previous | 16          | 1          | 130.2 ns    | 0.62 ns  | 1.00  | 1.00        |
| WithItems          | 16          | 2          | 144.7 ns    | 0.58 ns  | 0.71  | 0.88        |
| WithItems_Previous | 16          | 2          | 205.2 ns    | 2.75 ns  | 1.00  | 1.00        |
| WithItems          | 16          | 8          | 451.9 ns    | 5.10 ns  | 0.70  | 0.78        |
| WithItems_Previous | 16          | 8          | 647.0 ns    | 4.94 ns  | 1.00  | 1.00        |
| WithItems          | 16          | 32         | 1,977.0 ns  | 5.19 ns  | 0.69  | 0.72        |
| WithItems_Previous | 16          | 32         | 2,878.1 ns  | 10.62 ns | 1.00  | 1.00        |
| WithItems          | 16          | 64         | 4,448.1 ns  | 9.15 ns  | 0.69  | 0.72        |
| WithItems_Previous | 16          | 64         | 6,439.9 ns  | 54.36 ns | 1.00  | 1.00        |
| WithItems          | 32          | 1          | 126.2 ns    | 0.62 ns  | 0.80  | 0.95        |
| WithItems_Previous | 32          | 1          | 158.1 ns    | 1.15 ns  | 1.00  | 1.00        |
| WithItems          | 32          | 2          | 171.2 ns    | 0.68 ns  | 0.76  | 0.93        |
| WithItems_Previous | 32          | 2          | 223.9 ns    | 1.40 ns  | 1.00  | 1.00        |
| WithItems          | 32          | 8          | 466.6 ns    | 3.99 ns  | 0.69  | 0.84        |
| WithItems_Previous | 32          | 8          | 676.8 ns    | 5.14 ns  | 1.00  | 1.00        |
| WithItems          | 32          | 32         | 1,731.8 ns  | 15.45 ns | 0.71  | 0.72        |
| WithItems_Previous | 32          | 32         | 2,441.5 ns  | 8.47 ns  | 1.00  | 1.00        |
| WithItems          | 32          | 64         | 4,146.0 ns  | 56.99 ns | 0.71  | 0.72        |
| WithItems_Previous | 32          | 64         | 5,858.2 ns  | 41.07 ns | 1.00  | 1.00        |
| WithItems          | 128         | 1          | 242.2 ns    | 2.25 ns  | 0.88  | 0.98        |
| WithItems_Previous | 128         | 1          | 274.8 ns    | 2.07 ns  | 1.00  | 1.00        |
| WithItems          | 128         | 2          | 291.6 ns    | 2.56 ns  | 0.82  | 0.97        |
| WithItems_Previous | 128         | 2          | 354.9 ns    | 6.84 ns  | 1.00  | 1.00        |
| WithItems          | 128         | 8          | 593.9 ns    | 8.64 ns  | 0.76  | 0.93        |
| WithItems_Previous | 128         | 8          | 783.7 ns    | 7.19 ns  | 1.00  | 1.00        |
| WithItems          | 128         | 32         | 1,713.1 ns  | 4.10 ns  | 0.69  | 0.83        |
| WithItems_Previous | 128         | 32         | 2,489.2 ns  | 11.77 ns | 1.00  | 1.00        |
| WithItems          | 128         | 64         | 3,342.2 ns  | 7.24 ns  | 0.69  | 0.77        |
| WithItems_Previous | 128         | 64         | 4,848.3 ns  | 43.01 ns | 1.00  | 1.00        |

benaadams marked this pull request as ready for review on February 8, 2023
benaadams changed the title from "Faster LruCache" to "x1.4 Faster LruCache" on Feb 8, 2023
benaadams requested a review from LukaszRozmej and removed the request for Scooletz on February 8, 2023
Scooletz (Contributor) left a comment:

General comments/remarks.

In src/Nethermind/Nethermind.Core/Caching/LruKeyCache.cs (outdated):
{
if (Bytes is null) return 0;

long v0 = Unsafe.ReadUnaligned<long>(ref MemoryMarshal.GetArrayDataReference(Bytes));
Scooletz (Contributor) commented:

Have you noticed collisions at all? This is Keccak on its own, so it should have a low collision rate. Also, as GetHashCode returns an int, we'll still use only 32 bits, so:

if (Bytes is null) return 0;
return Unsafe.ReadUnaligned<int>(ref MemoryMarshal.GetArrayDataReference(Bytes));

LukaszRozmej (Member) commented:

Good question; if Keccak has an equal distribution, its first 4 bytes should also have an equal distribution, and mixing all of them might be overkill?

benaadams (Member, Author) commented on Feb 9, 2023:

It is a world where people brute-force Keccak hashes to get vanity addresses [1][2] with as many zeros as possible; 32 bits of consecutive zeros (eight zero nibbles) takes a fairly small amount of time to find [1], and GPU brute-forcing has become so powerful that you can inverse brute-force from an 8+ character vanity address back to the private key if a deterministic random start was used [3].

The distribution of the returns from GetHashCode over user input can lead to a hash-code collision attack on Dictionary, causing a DoS, as handling collisions is O(n²) (e.g. CVE-2011-3414 [4]).

So, for example, if someone sent 1000+ txns whose Keccak hashes all started with 8 zeros (via brute-forced txn construction), and the hash code only took the first 4 bytes, it is entirely possible to turn that into a DoS of the execution client. Whereas using all 256 bits makes that an impossibility with current hardware.

[1] https://github.com/1inch/profanity2
[2] https://github.com/johguse/ERADICATE2
[3] https://blog.1inch.io/a-vulnerability-disclosed-in-profanity-an-ethereum-vanity-address-tool-68ed7455fc8c
[4] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3414
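
To make this concrete, here is a hedged sketch of what "use the whole set of Keccak bytes" can look like as a replacement for the placeholder GetHashCode in the KeccakKey sketch above (the PR's actual mixing may differ): read the 32-byte hash as four longs and fold every bit into the final 32-bit value.

```csharp
// Member of the KeccakKey sketch above; requires
// System.Runtime.CompilerServices and System.Runtime.InteropServices.
public override int GetHashCode()
{
    if (Bytes is null) return 0;

    // Read the 32-byte Keccak hash as four unaligned 64-bit lanes.
    ref byte start = ref MemoryMarshal.GetArrayDataReference(Bytes);
    long v0 = Unsafe.ReadUnaligned<long>(ref start);
    long v1 = Unsafe.ReadUnaligned<long>(ref Unsafe.Add(ref start, 8));
    long v2 = Unsafe.ReadUnaligned<long>(ref Unsafe.Add(ref start, 16));
    long v3 = Unsafe.ReadUnaligned<long>(ref Unsafe.Add(ref start, 24));

    long folded = v0 ^ v1 ^ v2 ^ v3;       // fold 256 bits into 64
    return (int)(folded ^ (folded >> 32)); // fold 64 bits into 32
}
```

With this, an attacker would have to control all 256 bits of the hash to force collisions, rather than just the first 32.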

benaadams (Member, Author) commented on Feb 9, 2023:

tl;dr: it is safer to fold in all the bits, since some Keccak hashes are directly determined by EOAs, and hashes with 8 leading zeros (32 bits) are brute-forceable.

Scooletz (Contributor) commented:

Interesting finding about the vanity addresses. Looks good then!

benaadams (Member, Author) commented:

I may have dabbled in vanity addresses 😅

benaadams (Member, Author) commented:

Also that was my CVE report for .NET 🤣

> It is a world where people brute-force Keccak hashes to get vanity addresses […] (quoting benaadams' explanation above)

Great explanation! FYI, the Profanity vanity-address vulnerability caused a large hack of a famous market maker recently.

LukaszRozmej merged commit 43a4eab into NethermindEth:master on Feb 9, 2023
benaadams deleted the cache branch on February 9, 2023