Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of Rfc2898DeriveBytes #24897

Closed
GrabYourPitchforks opened this issue Feb 2, 2018 · 29 comments · Fixed by #48107
Closed

Improve performance of Rfc2898DeriveBytes #24897

GrabYourPitchforks opened this issue Feb 2, 2018 · 29 comments · Fixed by #48107
Labels
api-approved API was approved in API review, it can be implemented area-System.Security tenet-performance Performance related issue
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Feb 2, 2018

See aspnet/DataProtection#272 for full context.

ASP.NET has its own implementation of RFC2898 because it needs top-of-the-line performance for password hashing in order to have acceptable latency for login scenarios. If we bring these performance improvements to the framework's implementation of these APIs, ASP.NET can remove its own implementation and rely on the standard classes.

API Proposal

PBKDF2 technically starts with a password (that's what the "P" is), which is generally thought of as a string, but the algorithm operates on the bytes of a password, causing the always fun encoding selection problem.

Therefore there are two primary entrypoints, and then spanification thereof:

  • static byte[] PBKDF2(byte[] password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithm, int outputLength)
  • static byte[] PBKDF2(string password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithm, int outputLength) (calls the byte-based version after Encoding.UTF8)
namespace System.Security.Cryptography
{
    public partial class Rfc2898DeriveBytes
    {
        public static byte[] Pbkdf2DeriveBytes(
            byte[] password,
            byte[] salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            int outputLength);

        public static byte[] Pbkdf2DeriveBytes(
            ReadOnlySpan<byte> password,
            ReadOnlySpan<byte> salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            int outputLength);

        // uses destination.Length as outputLength.
        public static void Pbkdf2DeriveBytes(
            ReadOnlySpan<byte> password,
            ReadOnlySpan<byte> salt,
            Span<byte> destination,
            int iterations,
            HashAlgorithmName hashAlgorithm);

        public static byte[] Pbkdf2DeriveBytes(
            string password,
            byte[] salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            int outputLength);

        public static byte[] Pbkdf2DeriveBytes(
            ReadOnlySpan<char> password,
            ReadOnlySpan<byte> salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            int outputLength);

        public static void Pbkdf2DeriveBytes(
            ReadOnlySpan<char> password,
            ReadOnlySpan<byte> salt,
            Span<byte> destination,
            int iterations,
            HashAlgorithmName hashAlgorithm);
    }
}
@bartonjs
Copy link
Member

bartonjs commented Feb 2, 2018

It's about as good as it's going to get. It was fixed up in dotnet/corefx#23269.

We can't really drop any lower because the .NET API is "streaming" and the lower level API is one-shot.

@GrabYourPitchforks
Copy link
Member Author

What if we added to this class static methods that performed a one-shot without streaming? That would solve our immediate issue where we don't want this logic duplicated in ASP.NET, and it should make this functionality easier to consume for the 99% of callers who just want one-shot anyway.

@natemcmaster
Copy link
Contributor

Just for comparison...with aspnet dataprotection on Win 10 x64 with .NET Core 2.1.0-preview2-26225-03, we still get ~20% better perf calling into bcrypt.

Method Iterations Mean Error StdDev Median Op/s Gen 0 Allocated
Win8Pbkdf2ProviderSHA512 1000 829.6 us 14.39 us 21.54 us 817.1 us 1,205.446 - 128 B
Win7Pbkdf2ProviderSHA512 1000 1,626.6 us 26.23 us 39.25 us 1,626.6 us 614.795 - 88 B
Rfc2898DeriveBytesSHA512 1000 1,003.7 us 21.98 us 32.90 us 988.1 us 996.311 - 896 B
Win8Pbkdf2ProviderSHA512 10000 8,284.0 us 194.27 us 290.78 us 8,132.7 us 120.714 - 129 B
Win7Pbkdf2ProviderSHA512 10000 16,196.1 us 379.42 us 567.90 us 15,838.9 us 61.743 - 90 B
Rfc2898DeriveBytesSHA512 10000 10,025.3 us 194.45 us 291.04 us 9,830.3 us 99.747 - 896 B
Win8Pbkdf2ProviderSHA512 100000 82,289.8 us 971.97 us 1,454.81 us 81,385.1 us 12.152 - 132 B
Win7Pbkdf2ProviderSHA512 100000 158,191.2 us 960.96 us 1,438.33 us 157,826.4 us 6.321 - 92 B
Rfc2898DeriveBytesSHA512 100000 99,191.0 us 866.50 us 1,296.94 us 99,289.2 us 10.082 - 896 B

Code:
Win8Pbkdf2Provider
Win7Pbkdf2Provider

The benchmark code:

using System.Security.Cryptography;
using System.Text;
using BenchmarkDotNet.Attributes;
using Microsoft.AspNetCore.Cryptography.KeyDerivation;
using Microsoft.AspNetCore.Cryptography.KeyDerivation.PBKDF2;

namespace PBKDF2Benchmark
{
    public class HashBenchmark
    {
        [Params(1000, 10000, 100000)]
        public int Iterations { get; set; }
        private static readonly byte[] _salt = GenerateSalt();
        private static readonly Win7Pbkdf2Provider _win7Provider = new Win7Pbkdf2Provider();
        private static readonly Win8Pbkdf2Provider _win8Provider = new Win8Pbkdf2Provider();

        private static byte[] GenerateSalt()
        {
            using (var random = new RNGCryptoServiceProvider())
            {
                var result = new byte[64];
                random.GetBytes(result);
                return result;
            }
        }

        [Benchmark]
        public byte[] Win8Pbkdf2ProviderSHA512()
        {
            var key = _win8Provider.DeriveKey("Hello World", _salt, KeyDerivationPrf.HMACSHA512, Iterations, 512 / 8);
            return key;
        }

        [Benchmark]
        public byte[] Win7Pbkdf2ProviderSHA512()
        {
            var key = _win7Provider.DeriveKey("Hello World", _salt, KeyDerivationPrf.HMACSHA512, Iterations, 512 / 8);
            return key;
        }

        [Benchmark]
        public byte[] Rfc2898DeriveBytesSHA512()
        {
            byte[] _password = Encoding.UTF8.GetBytes("Hello World");
            var rfc = new Rfc2898DeriveBytes(_password, _salt, Iterations, HashAlgorithmName.SHA512);
            return rfc.GetBytes( 512 / 8 );
        }
    }
}

@Tornhoof
Copy link
Contributor

Tornhoof commented Dec 8, 2018

Is it allowed to add/use the Unsafe library in System.Security.Cryptography.Algorithms?

tl;dr: We could gain another ~5%.
If we:

  1. replace the current per byte xor handling with an uint xor handling:
    https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/System.Security.Cryptography.Algorithms/src/System/Security/Cryptography/Rfc2898DeriveBytes.cs#L277-L280

  2. replace the temp buffer here:
    https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/System.Security.Cryptography.Algorithms/src/System/Security/Cryptography/Rfc2898DeriveBytes.cs#L257 with a stackalloc

  3. Move the arrayPool thing outside of func
    https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/System.Security.Cryptography.Algorithms/src/System/Security/Cryptography/Rfc2898DeriveBytes.cs#L261
    to here:
    https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/System.Security.Cryptography.Algorithms/src/System/Security/Cryptography/Rfc2898DeriveBytes.cs#L184-L186
    (I don't know why the ArrayPool was used in Func() the same buffer could be reused above).

  4. move the throw CryptographicException in a static method

then benchmarks show the following, where Rfc2898DeriveBytesSHA512 is the current implementation, KeyDerivationPBKDF2SHA512 is the one from DataProtection and Rfc2898DeriveBytesSHA512Mod is the one with the changes above.

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17763.168 (1809/October2018Update/Redstone5)
Intel Core i9-9900K CPU 3.60GHz, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=3.0.100-preview-009812
  [Host]   : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT
  ShortRun : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT

Job=ShortRun  IterationCount=3  LaunchCount=1  
WarmupCount=3  
Method Iterations Mean Error StdDev Ratio RatioSD Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
Rfc2898DeriveBytesSHA512 1000 674.3 us 808.89 us 44.338 us 1.25 0.09 - - - 856 B
KeyDerivationPBKDF2SHA512 1000 540.5 us 56.41 us 3.092 us 1.00 0.00 - - - 128 B
Rfc2898DeriveBytesSHA512Mod 1000 615.6 us 56.71 us 3.109 us 1.14 0.00 - - - 760 B
Rfc2898DeriveBytesSHA512 10000 6,470.6 us 159.19 us 8.726 us 1.20 0.00 - - - 856 B
KeyDerivationPBKDF2SHA512 10000 5,371.2 us 45.89 us 2.515 us 1.00 0.00 - - - 128 B
Rfc2898DeriveBytesSHA512Mod 10000 6,116.1 us 239.27 us 13.115 us 1.14 0.00 - - - 760 B
Rfc2898DeriveBytesSHA512 100000 64,819.1 us 945.27 us 51.813 us 1.21 0.00 - - - 856 B
KeyDerivationPBKDF2SHA512 100000 53,742.1 us 3,080.32 us 168.843 us 1.00 0.00 - - - 134 B
Rfc2898DeriveBytesSHA512Mod 100000 61,247.5 us 1,670.11 us 91.544 us 1.14 0.00 - - - 760 B

The modified class:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Buffers;
using System.Text;
using System.Diagnostics;
using System.Runtime.CompilerServices;

using Internal.Cryptography;

namespace System.Security.Cryptography
{
    public class Rfc2898DeriveBytes : DeriveBytes
    {
        private const int MinimumSaltSize = 8;
        
        private readonly byte[] _password;
        private byte[] _salt;
        private uint _iterations;
        private HMAC _hmac;
        private readonly int _blockSize;

        private byte[] _buffer;
        private uint _block;
        private int _startIndex;
        private int _endIndex;

        public HashAlgorithmName HashAlgorithm { get; }

        public Rfc2898DeriveBytes(byte[] password, byte[] salt, int iterations)
            : this(password, salt, iterations, HashAlgorithmName.SHA1)
        {
        }

        public Rfc2898DeriveBytes(byte[] password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithm)
        {
            if (salt == null)
                throw new ArgumentNullException(nameof(salt));
            if (salt.Length < MinimumSaltSize)
                throw new ArgumentException(SR.Cryptography_PasswordDerivedBytes_FewBytesSalt, nameof(salt));
            if (iterations <= 0)
                throw new ArgumentOutOfRangeException(nameof(iterations), SR.ArgumentOutOfRange_NeedPosNum);
            if (password == null)
                throw new NullReferenceException();  // This "should" be ArgumentNullException but for compat, we throw NullReferenceException.

            _salt = salt.CloneByteArray();
            _iterations = (uint)iterations;
            _password = password.CloneByteArray();
            HashAlgorithm = hashAlgorithm;
            _hmac = OpenHmac();
            // _blockSize is in bytes, HashSize is in bits.
            _blockSize = _hmac.HashSize >> 3;

            Initialize();
        }

        public Rfc2898DeriveBytes(string password, byte[] salt)
             : this(password, salt, 1000)
        {
        }

        public Rfc2898DeriveBytes(string password, byte[] salt, int iterations)
            : this(password, salt, iterations, HashAlgorithmName.SHA1)
        {
        }

        public Rfc2898DeriveBytes(string password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithm)
            : this(Encoding.UTF8.GetBytes(password), salt, iterations, hashAlgorithm)
        {
        }

        public Rfc2898DeriveBytes(string password, int saltSize)
            : this(password, saltSize, 1000)
        {
        }

        public Rfc2898DeriveBytes(string password, int saltSize, int iterations)
            : this(password, saltSize, iterations, HashAlgorithmName.SHA1)
        {
        }

        public Rfc2898DeriveBytes(string password, int saltSize, int iterations, HashAlgorithmName hashAlgorithm)
        {
            if (saltSize < 0)
                throw new ArgumentOutOfRangeException(nameof(saltSize), SR.ArgumentOutOfRange_NeedNonNegNum);
            if (saltSize < MinimumSaltSize)
                throw new ArgumentException(SR.Cryptography_PasswordDerivedBytes_FewBytesSalt, nameof(saltSize));
            if (iterations <= 0)
                throw new ArgumentOutOfRangeException(nameof(iterations), SR.ArgumentOutOfRange_NeedPosNum);

            _salt = Helpers.GenerateRandom(saltSize);
            _iterations = (uint)iterations;
            _password = Encoding.UTF8.GetBytes(password);
            HashAlgorithm = hashAlgorithm;
            _hmac = OpenHmac();
            // _blockSize is in bytes, HashSize is in bits.
            _blockSize = _hmac.HashSize >> 3;

            Initialize();
        }

        public int IterationCount
        {
            get
            {
                return (int)_iterations;
            }

            set
            {
                if (value <= 0)
                    throw new ArgumentOutOfRangeException(nameof(value), SR.ArgumentOutOfRange_NeedPosNum);
                _iterations = (uint)value;
                Initialize();
            }
        }

        public byte[] Salt
        {
            get
            {
                return _salt.CloneByteArray();
            }

            set
            {
                if (value == null)
                    throw new ArgumentNullException(nameof(value));
                if (value.Length < MinimumSaltSize)
                    throw new ArgumentException(SR.Cryptography_PasswordDerivedBytes_FewBytesSalt);
                _salt = value.CloneByteArray();
                Initialize();
            }
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                if (_hmac != null)
                {
                    _hmac.Dispose();
                    _hmac = null;
                }

                if (_buffer != null)
                    Array.Clear(_buffer, 0, _buffer.Length);
                if (_password != null)
                    Array.Clear(_password, 0, _password.Length);
                if (_salt != null)
                    Array.Clear(_salt, 0, _salt.Length);
            }
            base.Dispose(disposing);
        }

        public override byte[] GetBytes(int cb)
        {
            Debug.Assert(_blockSize > 0);

            if (cb <= 0)
                throw new ArgumentOutOfRangeException(nameof(cb), SR.ArgumentOutOfRange_NeedPosNum);
            byte[] password = new byte[cb];

            int offset = 0;
            int size = _endIndex - _startIndex;
            if (size > 0)
            {
                if (cb >= size)
                {
                    Buffer.BlockCopy(_buffer, _startIndex, password, 0, size);
                    _startIndex = _endIndex = 0;
                    offset += size;
                }
                else
                {
                    Buffer.BlockCopy(_buffer, _startIndex, password, 0, cb);
                    _startIndex += cb;
                    return password;
                }
            }

            Debug.Assert(_startIndex == 0 && _endIndex == 0, "Invalid start or end index in the internal buffer.");

            byte[] ui = ArrayPool<byte>.Shared.Rent(_blockSize);
            try
            {
                while (offset < cb)
                {
                    byte[] T_block = Func(ui);
                    int remainder = cb - offset;
                    if (remainder > _blockSize)
                    {
                        Buffer.BlockCopy(T_block, 0, password, offset, _blockSize);
                        offset += _blockSize;
                    }
                    else
                    {
                        Buffer.BlockCopy(T_block, 0, password, offset, remainder);
                        offset += remainder;
                        Buffer.BlockCopy(T_block, remainder, _buffer, _startIndex, _blockSize - remainder);
                        _endIndex += (_blockSize - remainder);
                        return password;
                    }
                }
            }
            finally
            {
                Array.Clear(ui, 0, _blockSize);
                ArrayPool<byte>.Shared.Return(ui);
            }
            return password;
        }

        public byte[] CryptDeriveKey(string algname, string alghashname, int keySize, byte[] rgbIV)
        {
            // If this were to be implemented here, CAPI would need to be used (not CNG) because of
            // unfortunate differences between the two. Using CNG would break compatibility. Since this
            // assembly currently doesn't use CAPI it would require non-trivial additions.
            // In addition, if implemented here, only Windows would be supported as it is intended as
            // a thin wrapper over the corresponding native API.
            // Note that this method is implemented in PasswordDeriveBytes (in the Csp assembly) using CAPI.
            throw new PlatformNotSupportedException();
        }

        public override void Reset()
        {
            Initialize();
        }

        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Security", "CA5350", Justification = "HMACSHA1 is needed for compat. (https://github.com/dotnet/corefx/issues/9438)")]
        private HMAC OpenHmac()
        {
            Debug.Assert(_password != null);

            HashAlgorithmName hashAlgorithm = HashAlgorithm;

            if (string.IsNullOrEmpty(hashAlgorithm.Name))
                throw new CryptographicException(SR.Cryptography_HashAlgorithmNameNullOrEmpty);

            if (hashAlgorithm == HashAlgorithmName.SHA1)
                return new HMACSHA1(_password);
            if (hashAlgorithm == HashAlgorithmName.SHA256)
                return new HMACSHA256(_password);
            if (hashAlgorithm == HashAlgorithmName.SHA384)
                return new HMACSHA384(_password);
            if (hashAlgorithm == HashAlgorithmName.SHA512)
                return new HMACSHA512(_password);

            throw new CryptographicException(SR.Format(SR.Cryptography_UnknownHashAlgorithm, hashAlgorithm.Name));
        }

        private void Initialize()
        {
            if (_buffer != null)
                Array.Clear(_buffer, 0, _buffer.Length);
            _buffer = new byte[_blockSize];
            _block = 1;
            _startIndex = _endIndex = 0;
        }

        // This function is defined as follows:
        // Func (S, i) = HMAC(S || i) ^ HMAC2(S || i) ^ ... ^ HMAC(iterations) (S || i) 
        // where i is the block number.
        private byte[] Func(byte[] ui)
        {
            int tempLength = _salt.Length + sizeof(uint);
            Span<byte> temp = tempLength > 100
                ? new byte[tempLength]
                : stackalloc byte[tempLength];
            _salt.CopyTo(temp);
            Helpers.WriteInt(_block, temp, _salt.Length);

            Span<byte> uiSpan = new Span<byte>(ui, 0, _blockSize);

            if (!_hmac.TryComputeHash(temp, uiSpan, out int bytesWritten) || bytesWritten != _blockSize)
            {
                ThrowCryptographicException();
            }

            byte[] ret = new byte[_blockSize];
            uiSpan.CopyTo(ret);

            for (int i = 2; i <= _iterations; i++)
            {
                if (!_hmac.TryComputeHash(uiSpan, uiSpan, out bytesWritten) || bytesWritten != _blockSize)
                {
                    ThrowCryptographicException();
                }

                // The allowed block sizes are all multiples of 4
                // SHA1 is 160bit -> it needs uint, everything could use ulong, which would be faster
                // Replace with https://github.com/dotnet/corefx/issues/30746
                int rounds = _blockSize >> 2;
                ref uint[] retUint = ref Unsafe.As<byte[], uint[]>(ref ret);
                ref uint[] uiUint = ref Unsafe.As<byte[], uint[]>(ref ui);
                for (int j = 0; j < rounds; j++)
                {
                    retUint[j] ^= uiUint[j];
                }
            }

            // increment the block count.
            _block++;
            return ret;

        }

        private static void ThrowCryptographicException()
        {
            throw new CryptographicException();
        }
    }
}

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@bartonjs bartonjs added api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed enhancement Product code improvement that does NOT require public API changes/additions labels Sep 28, 2020
@bartonjs bartonjs modified the milestones: Future, 6.0.0 Sep 28, 2020
@vcsjones
Copy link
Member

vcsjones commented Jan 26, 2021

A straw man proposal:

namespace System.Security.Cryptography
{
    public class Rfc2898DeriveBytes
    {
        public static byte[] Pbkdf2DeriveBytes(
            byte[] password,
            byte[] salt,
            int iterations,
            int length,
            HashAlgorithmName hashAlgorithm);

        public static byte[] Pbkdf2DeriveBytes(
            ReadOnlySpan<byte> password,
            ReadOnlySpan<byte> salt,
            int iterations,
            int length,
            HashAlgorithmName hashAlgorithm);

        public static void Pbkdf2DeriveBytes(
            ReadOnlySpan<byte> password,
            ReadOnlySpan<byte> salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            Span<byte> destination);

        public static byte[] Pbkdf2DeriveBytes(
            string password,
            byte[] salt,
            int iterations,
            int length,
            HashAlgorithmName hashAlgorithm);

        public static byte[] Pbkdf2DeriveBytes(
            ReadOnlySpan<char> password,
            ReadOnlySpan<byte> salt,
            int iterations,
            int length,
            HashAlgorithmName hashAlgorithm);

        public static void Pbkdf2DeriveBytes(
            ReadOnlySpan<char> password,
            ReadOnlySpan<byte> salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            Span<byte> destination);
    }
}

Thoughts:

  1. We could add more convenience overloads that pick a default HashAlgorithmName. SHA256 there makes the most sense, but is a different default than the instance-based one.
  2. Could add any number of combinations of span, array, throwing, bool returning overloads. I opted for the smalled API set that I think covers the majority of cases without some cartesian product of everything possible. The simple case is covered, anything more complicated can use the span-filling approach.
  3. No real opinion on where these APIs should live or be named. There is already a member called DeriveBytes on the base class, so my first instinct was unavailable.

@Tornhoof
Copy link
Contributor

I ran the benchmark from the original issue again:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=5.0.102
  [Host]     : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
  DefaultJob : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT

Method Iterations Mean Error StdDev
Rfc2898DeriveBytesSHA512 1000 491.8 μs 0.73 μs 0.68 μs
Rfc2898DeriveBytesSHA512Mod 1000 464.0 μs 0.46 μs 0.43 μs
KeyDerivationPBKDF2SHA512 1000 430.2 μs 0.28 μs 0.25 μs
Rfc2898DeriveBytesSHA512 10000 4,835.1 μs 4.67 μs 3.90 μs
Rfc2898DeriveBytesSHA512Mod 10000 4,628.6 μs 5.39 μs 4.78 μs
KeyDerivationPBKDF2SHA512 10000 4,297.7 μs 1.61 μs 1.43 μs
Rfc2898DeriveBytesSHA512 100000 48,963.7 μs 54.02 μs 42.18 μs
Rfc2898DeriveBytesSHA512Mod 100000 46,562.8 μs 27.41 μs 25.64 μs
KeyDerivationPBKDF2SHA512 100000 42,994.6 μs 38.23 μs 33.89 μs

The difference between KeyDerivation is down to ~15% and with switching the byte xor in Func() to be uint based, see #26651 the difference gets down to ~10% on an amd zen3.
I expect having a static one shot implementation without all that stream support (e.g. like the KeyDerivation type) probably should get us close to the same speed as KeyDerivation.

@vcsjones
Copy link
Member

I expect having a static one shot implementation without all that stream support

This also opens the doors to us for platform native implementations (BCryptDeriveKeyPBKDF2, PKCS5_PBKDF2_HMAC, CCKeyDerivationPBKDF). On the surface it looks like the low-level implementation could be allocation free (in .NET terms) on all platforms except < Win10.

@bartonjs
Copy link
Member

Oh, you beat me to a strawman. Nice.

We could add more convenience overloads that pick a default HashAlgorithmName.

Nope. We don't want defaults in crypto API, because eventually all defaults are bad. One day SHA-2 will also be a broken algorithm.

No real opinion on where these APIs should live or be named.

The existing class seems right. DeriveKey was my 5 minutes ago strawman to keep the abbreviation out and better suggest that the purpose is for deriving keys, not IVs.

public static bool TryPbkdf2DeriveBytes

Can the proposed trys fail (other than, perhaps, a length-zero destination)? To me this is more like RNG.Fill... you give it a buffer, it writes as much to it as you asked.

Could add any number of combinations of span, array, throwing, bool returning overloads

I like your set, except that I'd make the Span-writing ones void with Fill semantics. Better than my set from the other issue.

@vcsjones
Copy link
Member

I'd make the Span-writing ones void with Fill semantics.

Yeah that was me trying to switch to decaf coffee. I updated mine.

I kept all the method names the same. We could consider making the Span-filling ones named Fill and the byte[] returning ones called get like RNG since there is no length other than whatever it is you want us to fill.

@MichalPetryka
Copy link
Contributor

I expect having a static one shot implementation without all that stream support

This also opens the doors to us for platform native implementations (BCryptDeriveKeyPBKDF2, PKCS5_PBKDF2_HMAC, CCKeyDerivationPBKDF). On the surface it looks like the low-level implementation could be allocation free (in .NET terms) on all platforms except < Win10.

What's the reason for an allocation in older versions of Windows?

@bartonjs
Copy link
Member

What's the reason for an allocation in older versions of Windows?

Windows 10 has singleton values (which they pseudo-handles) for hash algorithms. Older Windowses don't, so we'd need to instantiate the hash algorithm. While we /could/ write it with IntPtr and try-finally, the done thing is a SafeHandle.

@vcsjones
Copy link
Member

Windows 10 has HMAC pseudo handles https://docs.microsoft.com/en-us/windows/win32/seccng/cng-algorithm-pseudo-handles.

For < Windows 10, we could end up allocating a handle.

@vcsjones
Copy link
Member

PKCS5_PBKDF2_HMAC

Bah. That's OpenSSL 1.1+. @bartonjs I don't suppose you've been thinking about dropping 1.0 anytime soon? Seems like we could shim it for older openssls.

@bartonjs
Copy link
Member

I don't suppose you've been thinking about dropping 1.0 anytime soon?

I haven't looked at the supported OS matrix for .NET 6. Eyeballing it:

  • Ubuntu 16.04 will be out of support
  • Ubuntu 18.04 will still be in support. It shipped with 1.0.2 but security updates claim it's on 1.1.1 now. This is the weird one.
  • RHEL7 will be out of support
  • RHEL8 looks like 1.1(.1)
  • Didn't look at the others yet.

So... maybe? But to be conservative I'm probably going to make the loader support 3.0/1.1/1.0 for 6. Unless that proves impossible, then I'll have to have a conversation with release management around my ability to support that wide of a range of systems.

Seems like we could shim it for older openssls.

Yep, we'd just define CryptoNative_PBKDF2 (or whatever) and set up a FALLBACK_FUNCTION for if 1.0 is the loaded version.

@GrabYourPitchforks
Copy link
Member Author

The one-shot HashAlgorithm implementations (on Windows) already use pseudo-handles if available, falling back to temporary allocations if pseudo-handles are not available. (See code.)

@Tornhoof
Copy link
Contributor

This also opens the doors to us for platform native implementations

Would that help that much? the dataprotection pbkdf2 function on win10 is not much faster and that's using windows' bcrypt api, assuming this is true for all the other implementations too, why would you want to have native implementations? I understand for a large perf gain, which in this case doubles as a security gain too, but for somehting in single digit % range. The source code complexity of PBKDF2 isn't high and it can be well tested, so that can't be it either.

@vcsjones
Copy link
Member

vcsjones commented Jan 26, 2021

@GrabYourPitchforks

The one-shot HashAlgorithm implementations (on Windows) already use pseudo-handles if available,

We have not psuedo-ified the HMAC algorithms which are different handles I believe. BCRYPT_HMAC_SHA256_ALG_HANDLE vs BCRYPT_SHA256_ALG_HANDLE for example. Maybe if #40012 got some traction I could add it though.

Edit: Sorry if I misunderstood, perhaps you were just pointing out a place we are already using pseudo handles.

@Tornhoof

Would that help that much?

Maybe not, but I think then the HMAC one-shot becomes a pre-req for this. Regardless I can bench it. But if, say, we want to have something for the dataprotection to use, whatever we add can't be slower, or they'll keep using their own implementation.

@GrabYourPitchforks
Copy link
Member Author

@vcsjones Yeah, it's a possible source that we could copy + paste from if we wanted to extend the same benefit to HMAC and KDF handles. It and a few related refactorings showed a modest increase in throughput. Need to determine if it's worth taking the churn in this code.

@vcsjones
Copy link
Member

vcsjones commented Feb 7, 2021

I noodled with a proof-of-concept for this, I think the API proposal is reasonable, aside from the name Pbkdf2DeriveBytes(...) but maybe that can be solved in API review.


We could implement this in managed code if we wanted, which might actually be the right way. OpenSSL doesn't have this until OpenSSL v1.1, and macOS has zero documentation for CCKeyDerivationPBKDF, though they do "document" it by shipping headers for it.

If we want to implement in managed code and allocation-free, then this proposal is blocked on #40012 since we have no way to do no-allocating HMAC operations.

@bartonjs
Copy link
Member

bartonjs commented Feb 8, 2021

We could implement this in managed code if we wanted, which might actually be the right way.

Doing it through only one P/Invoke will cut down on at least (iteration*blocks)-1 transitions, which should allow higher iteration counts at the same clock time. Assuming that only OpenSSL 1.0 is a problem, we'd just have to implement PBKDF2 as a fallback function.

@bartonjs bartonjs added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Feb 8, 2021
@vcsjones
Copy link
Member

vcsjones commented Feb 8, 2021

Assuming that only OpenSSL 1.0 is a problem

That appears to be the case. CNG is known to be good, and macOS's API is straight forward enough.

One thing that needs to be checked are minimums across platforms for consistency. OpenSSL follows SP800-132 for minimums:

https://github.com/openssl/openssl/blob/a829b735b645516041b55746e013692babd8cd31/providers/implementations/kdfs/pbkdf2.c#L292-L305

So the salt size, number of iterations, and number of bytes to extract from the KDF will have some minimum. As far as how that affects the API proposals, I don't think it does, other than they will throw an Argument{OutOfRange}Exception.

@Tornhoof
Copy link
Contributor

Tornhoof commented Feb 8, 2021

Some of the min Limits might affect Test vectors, a few published ones include Iteration Count = 1.

@vcsjones vcsjones mentioned this issue Feb 10, 2021
9 tasks
@vcsjones
Copy link
Member

Update based on work in the draft PR:

  1. OpenSSL 1.0.x has the right function. No shim required. I misread some docs.
  2. The minimums are not applicable to the called function in OpenSSL. MacOS and CNG do not appear to have minimums, either. So these APIs will not enforce any particular minimum on salt, extract, or iterations (other than sanity checks, like no negative iterations).

@GrabYourPitchforks
Copy link
Member Author

Nit: We should use the throwing (non-substituting) version of UTF-8 when performing the chars -> bytes conversion.

@vcsjones
Copy link
Member

@GrabYourPitchforks added to the TODO in the PR.

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 16, 2021
@terrajobst
Copy link
Member

terrajobst commented Feb 16, 2021

Video

  • Let's drop the name DeriveBytes from the method name because it doesn't really add anything
  • We considered a separate type but it seems the difference is streaming vs non-streaming and static vs non-static seems to convey is sufficiently.
namespace System.Security.Cryptography
{
    public partial class Rfc2898DeriveBytes
    {
        public static byte[] Pbkdf2(
            byte[] password,
            byte[] salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            int outputLength);

        public static byte[] Pbkdf2(
            ReadOnlySpan<byte> password,
            ReadOnlySpan<byte> salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            int outputLength);

        public static void Pbkdf2(
            ReadOnlySpan<byte> password,
            ReadOnlySpan<byte> salt,
            Span<byte> destination,
            int iterations,
            HashAlgorithmName hashAlgorithm);

        public static byte[] Pbkdf2(
            string password,
            byte[] salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            int outputLength);

        public static byte[] Pbkdf2(
            ReadOnlySpan<char> password,
            ReadOnlySpan<byte> salt,
            int iterations,
            HashAlgorithmName hashAlgorithm,
            int outputLength);

        public static void Pbkdf2(
            ReadOnlySpan<char> password,
            ReadOnlySpan<byte> salt,
            Span<byte> destination,
            int iterations,
            HashAlgorithmName hashAlgorithm);
    }
}

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 16, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 22, 2021
@omajid
Copy link
Member

omajid commented Mar 12, 2021

Is there an issue tracking the removal/replacement of the PBKDF2 implementation ASP.NET Core?

@vcsjones
Copy link
Member

Is there an issue tracking the removal/replacement of the PBKDF2 implementation ASP.NET Core?

Hm, not anymore. There was dotnet/aspnetcore#2508 but it's been closed for a while.

@omajid
Copy link
Member

omajid commented Mar 15, 2021

Thanks. I filed dotnet/aspnetcore#30941

@ghost ghost locked as resolved and limited conversation to collaborators Apr 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Security tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants