This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[release/3.1] Improve Expression.Lambda<TDelegate> performance (dotnet/runtime#32768) #42874

Closed
wants to merge 1 commit into release/3.1

Conversation

@cston (Member) commented Feb 26, 2020

Port of dotnet/runtime#32768
Fixes dotnet/runtime#32767

Description

An Expression lambda delegate cache was added in .NET Core 2.0. When there are many calls to Expression.Lambda<TDelegate>() overloads with distinct delegate types, those calls miss the cache, and the result is a performance regression.
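
To illustrate the cache-miss pattern (a purely hypothetical repro sketch, not code from this PR), the loop below manufactures a previously unseen delegate type on every iteration, so each generic Lambda<TDelegate>() call is keyed by a new type and misses the factory cache:

using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

namespace LambdaCacheMissSketch
{
    class Program
    {
        static void Main()
        {
            // Locate the generic Expression.Lambda<TDelegate>(Expression, params ParameterExpression[]) overload.
            MethodInfo lambdaDefinition = typeof(Expression).GetMethods()
                .Single(m => m.Name == nameof(Expression.Lambda)
                          && m.IsGenericMethodDefinition
                          && m.GetParameters().Length == 2
                          && m.GetParameters()[1].ParameterType == typeof(ParameterExpression[]));

            // Each iteration builds a delegate type that has not been seen before
            // (Func<int>, Func<Func<int>>, Func<Func<Func<int>>>, ...), so every
            // Lambda<TDelegate>() call below uses a new cache key. A real workload
            // (e.g. many compiler-generated queries over anonymous types) produces
            // far more distinct delegate types than the cache can hold.
            Type delegateType = typeof(Func<int>);
            Expression body = Expression.Constant(0);

            for (int i = 0; i < 100; i++)
            {
                lambdaDefinition.MakeGenericMethod(delegateType)
                    .Invoke(null, new object[] { body, Array.Empty<ParameterExpression>() });

                // Wrap for the next round: the next delegate type returns the previous one.
                body = Expression.Lambda(delegateType, body);
                delegateType = typeof(Func<>).MakeGenericType(delegateType);
            }
        }
    }
}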

Customer Impact

Performance regression with many calls to Expression.Lambda<TDelegate>() with distinct delegate types. A customer reported this impact during a migration from .NET Framework to .NET Core.

Packaging Impact

None

Regression

Regression relative to .NET Core 1.0 and .NET Framework.

Testing

See benchmark testing in dotnet/runtime#32767.

Risk

Low. The change avoids the cache and allocates a distinct Expression<TDelegate> each time.

cc @ericstj

@cston marked this pull request as ready for review February 26, 2020 18:12
@cston modified the milestones: 3.1.4, 3.1.x Feb 26, 2020
@cston added the Servicing-consider (issue for next servicing release review) label Feb 26, 2020
@cston (Member, Author) commented Feb 26, 2020

cc @vchirikov

@danmoseley (Member) commented:

@Anipik I naively assumed that servicing something in netcoreapp (<IsNETCoreApp>true</IsNETCoreApp>) did not require any package/versioning changes. Can you help me understand the versioning change?

@Anipik commented Feb 27, 2020

did not require any package/versioning changes. Can you help me understand the versioning change?

Right, we don't require versioning changes when the library is part of the shared framework. I thought we did this for consistency, but I checked and we have never done this before.

@cston, can you please remove the assemblyVersion update? Sorry for the mix-up.

@cston (Member, Author) commented Feb 27, 2020

@Anipik, I've reverted the changes to packageIndex.json and Directory.Build.props.

@Anipik added the * NO MERGE * (the PR is not ready for merge yet; see discussion for detailed reasons) label Feb 27, 2020
@vchirikov commented:

@Anipik, could you explain why this has the "no merge" label?

@Anipik commented Feb 28, 2020

Because the branch is currently closed for changes. I will remove the label and merge the change when the branch opens for 3.1.4.

@vchirikov commented:

Ok, thanks for the explanation.

@leecow (Member) commented Mar 3, 2020

Test with EF and we'll hold this for a while to determine if there are more customers hitting this.

@leecow removed the Servicing-consider (issue for next servicing release review) label Mar 5, 2020
@vchirikov commented Mar 5, 2020

@leecow, did you miss adding Servicing-approved, or will this not be in 3.1.4?
It is not clear why, or for how long, you want to hold this fix.

@AlexUstinov commented Mar 5, 2020

@leecow, actually every project that uses LINQ must be hit to one degree or another.
For example, let's take a look at the following sample:

using System.Linq;

namespace CompilerGeneratedExpressionLambda
{
    public class DataItem
    {
        public long Id { get; set; }
        public string Value { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var allItems = Enumerable.Empty<DataItem>().AsQueryable();
            var filteredItems =
                from item in allItems
                where item.Value == args[0]
                select new {
                    item.Id,
                    item.Value
                };
        }
    }
}

This sample contains just a single, very simple LINQ query, but even here the compiler emitted two calls to the Expression.Lambda<TDelegate>() method in the resulting IL code. Those are:

// the first is for the where clause predicate
IL_008c: call         class [System.Linq.Expressions]System.Linq.Expressions.Expression`1<!!0/*class [System.Runtime]System.Func`2<class CompilerGeneratedExpressionLambda.DataItem, bool>*/> [System.Linq.Expressions]System.Linq.Expressions.Expression::Lambda<class [System.Runtime]System.Func`2<class CompilerGeneratedExpressionLambda.DataItem, bool>>(class [System.Linq.Expressions]System.Linq.Expressions.Expression, class [System.Linq.Expressions]System.Linq.Expressions.ParameterExpression[])
 
// the second is for the select projection
IL_0138: call         class [System.Linq.Expressions]System.Linq.Expressions.Expression`1<!!0/*class [System.Runtime]System.Func`2<class CompilerGeneratedExpressionLambda.DataItem, class '<>f__AnonymousType0`2'<int64, string>>*/> [System.Linq.Expressions]System.Linq.Expressions.Expression::Lambda<class [System.Runtime]System.Func`2<class CompilerGeneratedExpressionLambda.DataItem, class '<>f__AnonymousType0`2'<int64, string>>>(class [System.Linq.Expressions]System.Linq.Expressions.Expression, class [System.Linq.Expressions]System.Linq.Expressions.ParameterExpression[])

As we can see, just a few hundred methods like this one in a project can lead to almost 100% cache misses in the current Expression.Lambda<TDelegate>() implementation.

@danmoseley (Member) commented:

@vchirikov, in our tactics meeting, they discussed risk vs. benefit. Changing code in this library has historically had a higher regression rate because it is complex. Certainly this change is localized, but the ask was that we give it more time in master (it will go out in 5.0 previews, for example) unless evidence emerges that it's having a significant impact on a wider range of customers. Do you believe that we're misreading risk or value here?

@vchirikov commented Mar 5, 2020

I think your experienced programmers can predict the risk of this fix.
We could also ask the author of the regression whether it was a mistake or not, and what he thinks about this fix. (@bartdesmet, are you here?)
And what about your performance benchmarks? I believe the impact on a complex application is quite a bit larger than you think. Can we run your benchmarks (if you don't agree with mine) on the two runtimes, with and without this fix? (Can somebody from the https://github.com/aspnet/benchmarks/ team help us?)
Also, @stephentoub, you've written great articles about .NET performance improvements; maybe you can lend us your expertise on this regression and its impact on a big, complex application?

@bartdesmet (Contributor) commented:

FWIW, the perf regression caused by the original PR was certainly not intended.

The change in that PR wanted to ensure that all lambda node creations are subject to parameter count checks in order to pick optimized variants of the type for low parameter counts. The original code was always newing up the most expensive variant. To centralize the layout optimization, all code ended up getting routed through CreateLambda helpers, which started to hit the cache that was added before. The intersection of both changes led to the regression.

As long as the new code goes through Create helpers, it will be fine and still have the layout optimization.
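
To make the layout point concrete, here is a purely illustrative sketch (hypothetical MyLambda* types and a MyLambdaFactory.Create helper, not the actual System.Linq.Expressions internals) of specialized node variants chosen by parameter count through a centralized Create helper; the idea is that any creation path routed through such a helper keeps the layout optimization, with or without a delegate-type cache:

using System;

namespace LambdaLayoutSketch
{
    // Hypothetical node types for illustration only; not the System.Linq.Expressions classes.
    abstract class MyLambda<TDelegate>
    {
        public abstract int ParameterCount { get; }
    }

    // Specialized variant for parameterless lambdas: no parameter storage at all.
    sealed class MyLambda0<TDelegate> : MyLambda<TDelegate>
    {
        public override int ParameterCount => 0;
    }

    // Specialized variant for a single parameter, held in a field instead of a collection.
    sealed class MyLambda1<TDelegate> : MyLambda<TDelegate>
    {
        public MyLambda1(string p0) => Parameter0 = p0;
        public string Parameter0 { get; }
        public override int ParameterCount => 1;
    }

    // General variant for any parameter count: the "most expensive" layout.
    sealed class MyLambdaN<TDelegate> : MyLambda<TDelegate>
    {
        public MyLambdaN(string[] parameters) => Parameters = parameters;
        public string[] Parameters { get; }
        public override int ParameterCount => Parameters.Length;
    }

    static class MyLambdaFactory
    {
        // Centralized Create helper: every creation path goes through the parameter-count
        // check, so the cheapest layout is always picked, independent of any delegate-type cache.
        public static MyLambda<TDelegate> Create<TDelegate>(params string[] parameters) =>
            parameters.Length switch
            {
                0 => new MyLambda0<TDelegate>(),
                1 => new MyLambda1<TDelegate>(parameters[0]),
                _ => new MyLambdaN<TDelegate>(parameters),
            };
    }

    class Program
    {
        static void Main()
        {
            MyLambda<Func<int>> zero = MyLambdaFactory.Create<Func<int>>();
            MyLambda<Func<string, int>> one = MyLambdaFactory.Create<Func<string, int>>("x");
            Console.WriteLine($"{zero.ParameterCount}, {one.ParameterCount}"); // 0, 1
        }
    }
}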

@AlexUstinov commented:

I'd like to share the workaround we use to overcome this issue. It might be interesting to someone who hits the same problem but can't wait until 5.0 for a resolution.
Here it is:

using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

namespace LambdaFactoryCacheResizer
{
    class Program
    {
        static void Main(string[] args)
        {
            ResizeLambdaFactoryDelegateCache(262_144);
        }

        // A temp. fix for https://github.com/dotnet/corefx/pull/42874
        static void ResizeLambdaFactoryDelegateCache(int newSize)
        {
            const BindingFlags anyStatic = BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic;
            const BindingFlags anyInstance = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
            // Grab Expression's internal static factory cache field. This relies on an
            // implementation detail (the s_lambdaFactories field) and may break in a future release.
            var lambdaFactoriesFieldInfo = typeof(Expression).GetField("s_lambdaFactories", anyStatic);
            var cacheType = lambdaFactoriesFieldInfo.FieldType;

            // Find the cache type's (int capacity) constructor.
            var cacheTypeCtor = cacheType.GetConstructors(anyInstance)
                .SingleOrDefault(c => {
                    var parameters = c.GetParameters();
                    return parameters.Length == 1 && parameters[0].ParameterType == typeof(int);
                });

            // Replace the built-in cache with a much larger instance.
            var cache = cacheTypeCtor.Invoke(new [] { (object) newSize } );
            lambdaFactoriesFieldInfo.SetValue(null, cache);
        }
    }
}

As you can see, we simply replace the internal factory delegate cache with a much bigger instance.
This fix allowed us to bring performance back to normal; without it we observed up to a 100x slowdown on certain operations.
The following are the results of our benchmarks for different cache sizes:

|            Method | TypeCount | ExpressionLambdaCacheSize |        Mean |     Error |   StdDev |
|------------------ |---------- |-------------------------- |------------:|----------:|---------:|
| Expression_Lambda |     10000 |                        50 | 1,235.04 ms | 17.091 ms | 2.645 ms |
| Expression_Lambda |     10000 |                     32768 |   347.48 ms |  8.769 ms | 1.357 ms |
| Expression_Lambda |     10000 |                     65536 |   215.38 ms |  3.486 ms | 0.539 ms |
| Expression_Lambda |     10000 |                    131072 |   144.49 ms |  4.530 ms | 0.701 ms |
| Expression_Lambda |     10000 |                    262144 |    96.53 ms |  1.092 ms | 0.169 ms |
| Expression_Lambda |     10000 |                    524288 |    73.07 ms |  2.336 ms | 0.361 ms |
| Expression_Lambda |     10000 |                   1048576 |    60.11 ms |  2.758 ms | 0.427 ms |
| Expression_Lambda |     10000 |                   2097152 |    52.75 ms |  2.062 ms | 0.319 ms |
| Expression_Lambda |     10000 |                   4194304 |    49.93 ms |  1.836 ms | 0.284 ms |

I also want to show with this comment what ugly hacks we have to use in order to keep the other .NET Core 3.1 benefits.

@vchirikov commented:

@bartdesmet Yes, I totally agree.
For clarity: is there a risk of merging this fix, in your opinion?

@stephentoub (Member) commented:

Is there a risk of merging this fix

There is always risk.

@vchirikov commented:

It's really strange to hear about the risk of deploying a one-line bug fix that was approved, when you don't check the deployed LTS SDK package :)

So, if we don't see this fix in 3.1.3 or in 3.1.4, when will it be available in a 3.1 LTS release?

Could you give us a rough estimate of when it will be released?

@stephentoub (Member) commented Mar 6, 2020

It's really strange to hear about the risk of deploying a one-line bug fix that was approved, when you don't check the deployed LTS SDK package :)

That's just further evidence that every change, big or small, has the potential to break someone in some way, whether due to the change itself or just the act of trying to make a fix.

when will it be available

This issue has been reported only by you so far. You have a workaround. As Lee wrote earlier, "we'll hold this for a while to determine if there are more customers hitting this."

Thanks for your interest in this.

@AlexUstinov commented:

@stephentoub, you are right, we have a workaround and we'll survive. But what makes us unhappy is Microsoft's decision to keep this pretty major thing as is just because no one else has reported it.
Personally, I don't think waiting for customers to report a problem is the best way to measure impact. As an alternative approach, the company could ASK some customers (or do it internally) to measure the impact of a KNOWN issue rather than wait for someone to rediscover an issue that is unknown to them.

I believe you can all see that this particular issue is hardly discoverable. First of all, this internal cache of 64 elements means that all attempts to isolate the issue by locally generating load against the problematic method inevitably fail. Secondly, in the vast majority of cases these LINQ queries are used to access a database, and database access often leads to unexpected spikes. People may just spend time optimizing their DB access layer rather than paying attention to a very low-level method the compiler automatically puts into their code. This is what we did. And those optimizations even gave some improvements, simply because in many cases it is possible to use fewer LINQ expressions in code to achieve the same goals.

Finally, we decided to route all of the problematic read-only production load onto two subsets of machines in parallel: the .NET Core 3.1 and .NET Framework 4.7.2 versions of the same code. We took traces on both and then compared them. We expected to see something inside Data.SqlClient but instead found a huge difference in Expression.Lambda<TDelegate>().

We were lucky we had these two production-ready versions for different platforms to compare; otherwise I don't even know how we would have approached this. You observe perfect performance when load is low. When load is medium, you observe higher CPU (+10% to +15%) and all response times are elevated, but nothing too bad. But when load goes high, you sporadically observe huge spikes (up to 100x slower) in response times.

We were really excited when this PR was created. It was very unusual to see how fast Microsoft moved to try to make developers' lives better. But now we feel only huge disappointment; I'm sorry to say this, guys.

PS. As one good friend of mine says, "Everyone has to suffer!". :)

@stephentoub (Member) commented Mar 6, 2020

We take LTS releases very seriously, with a very high bar, and we have to weigh the very real chance of breaking one party in exchange for fixing an issue for someone else. The more "someone else"s we know a fix will help, the more that outweighs the downside of a potential break. The vast majority of fixes we put into the master branch are never even reviewed for a servicing branch. In this case, it was reviewed, and the answer wasn't "no", the answer was "we'll take our time and wait and see". I hope you can appreciate the conditions we have to weigh. Thanks, again.

@AlexUstinov commented:

This position is totally understandable, but my point is different.
What if, instead of waiting for someone else, you tried to look for someone else?
I believe you have that power.
Anyway, thank you for the healthy discussion.

@aik-jahoda commented:

@cston, this PR has been open for a long time. Did you consider closing it and targeting .NET 5, which is not LTS, or .NET 6?

cc @stephentoub @Anipik

@danmoseley (Member) commented:

I am closing this as I'm not aware of a plan to take this forward.

@danmoseley closed this Nov 27, 2021