Change bound checking in SSE/AVX intrinsics to avoid pointer overflow #821

briancylui · 2018-09-05T00:39:40Z

Aims to solve #980

Suggested by @ahsonkhan to avoid integer overflow in bound checking inside SSE/AVX intrinsics implementation, i.e. change all while (pCurrent + 8 OR 4 <= pEnd) into while (pEnd - pCurrent >= 8 OR 4).

Perf tests results before and after the change are shown below:

Before the change:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain

Type	Method	Mean	Error	StdDev
AvxPerformanceTests	SumU	159.4 us	1.104 us	0.9784 us
NativePerformanceTests	SumU	283.5 us	5.492 us	4.8687 us
SsePerformanceTests	SumU	281.2 us	1.472 us	1.3045 us
AvxPerformanceTests	AddU	276.1 us	3.018 us	2.520 us
NativePerformanceTests	AddU	330.1 us	3.585 us	3.178 us
SsePerformanceTests	AddU	325.6 us	6.883 us	7.926 us

After the change:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain

Type	Method	Mean	Error	StdDev
AvxPerformanceTests	SumU	183.5 us	3.621 us	3.023 us
NativePerformanceTests	SumU	281.6 us	5.261 us	4.921 us
SsePerformanceTests	SumU	294.1 us	2.080 us	1.946 us
AvxPerformanceTests	AddU	296.3 us	5.185 us	4.850 us
NativePerformanceTests	AddU	335.1 us	3.053 us	2.707 us
SsePerformanceTests	AddU	345.0 us	2.155 us	1.800 us

Both SSE and AVX implementations are slower by 10-20% after this change.

In my opinion, after seeing the perf results, I may not recommend merging this PR. I may wait until the alternative suggested by @tannergooding in an earlier PR review has been implemented (2nd item under "Functionality" in briancylui#2):

var remainder = count % elementsPerIteration;
float* pEnd = pdst + (count - remainder);
while (pDstCurrent < pEnd)
{ … }

Another question I have is: would pDstCurrent + 8 OR 4 ever have the possibility to result in integer overflow? According to my knowledge, pEnd is initialized as pDstCurrent + count, and there are Contract.Asserts in the wrapper class to check that count does not exceed the original array length. I'm not sure, and am open to any PR comments and advice.

cc: @danmosemsft @eerhardt @tannergooding @ahsonkhan

eerhardt · 2018-09-05T01:32:28Z

I’m not sure I follow the reasoning. These are pointer operations. So the concern is that we are less than 4 or 8 elements away from the end of the memory? Would the OS ever let us get that close?
I also agree with the point that we are checking the array/span length above these methods, so we are guaranteed to be within the bounds of memory.

briancylui · 2018-09-05T01:35:44Z

@ahsonkhan: I may share the same concern as @eerhardt - would love to learn and hear back.

ahsonkhan · 2018-09-05T01:37:36Z

Another question I have is: would pDstCurrent + 8 OR 4 ever have the possibility to result in integer overflow

~~Imo, given these are public APIs that anyone can call (with potentially invalid inputs), the inputs should be verified within the method body.~~ Being explicit about assertions like src length == dst length would be good (and also act as self-documentation).
Edit: Nevermind, the class is internal.

According to my knowledge, pEnd is initialized as pDstCurrent + count, and there are Contract.Asserts in the wrapper class to check that count does not exceed the original array length. I'm not sure, and am open to any PR comments and advice.

I am not too familiar with the code base here, but don't contract.asserts run in debug mode only? Are these checks unnecessary in release?

Both SSE and AVX implementations are slower by 10-20% after this change.

How about something like the following:

int i = 0;
for (; i < src.Length - 8; i += 8)
{
    Vector256<float> srcVector = Avx.LoadVector256(pSrcCurrent);
    Vector256<float> dstVector = Avx.LoadVector256(pDstCurrent);

    result256 = Avx.Add(result256, Avx.Multiply(srcVector, dstVector));

    pSrcCurrent += 8;
    pDstCurrent += 8;
}

if (src.Length - i <= 4)
{
    ...
    i += 4;
}
while (i < src.Length)
{
    ...
    i++;
}

The number of assembly instruction is essentially identical here (saves a lea, costs an extra add): https://www.diffchecker.com/MywBeoFH (before in red, after in green)

Just my two cents. I would leave it up to others who have more context in this space to validate.

Would the OS ever let us get that close?

No, it won't. I discussed this with @GrabYourPitchforks, and its a general coding guideline to avoid arithmetic overflows like this (on the, albeit unlikely, chance the OS behavior changes in the future).

briancylui · 2018-09-05T06:25:20Z

@ahsonkhan: Thank you for your comments! My apologies that I may have given the wrong hint that the Sse/AvxIntrinsics class is public during our previous conversation - it's actually internal after checking. Thank you very much for pointing that out!

Regarding Contracts.Assert being run in Debug only, it may be a very good point for future follow-up. If the APIs currently do not do any length checking in Release, probably we should do that in the future.

Thank you for the link to the DiffChecker - it looks amazing! Following your logic, would changing while (pCurrent + 8 OR 4 <= pEnd) into while (pCurrent <= pEnd - 8 OR 4) have a similar effect?

ahsonkhan · 2018-09-05T09:31:32Z

Following your logic, would changing while (pCurrent + 8 OR 4 <= pEnd) into while (pCurrent <= pEnd - 8 OR 4) have a similar effect?

That has a similar concern (but with underflow). If src.Length < 8, and in the unlikely chance that the array starts at a memory address close to the beginning of the address space, then (pCurrent <= pEnd - 8) would be true, even though it shouldn't. That is because comparisons are done as if they are unsigned integers.

https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/unsafe-code-pointers/pointer-comparison

The comparison operators compare the addresses of the two operands as if they are unsigned integers.

For example:

// Assume:
src.Length = 6;
pCurrent = 4;
pEnd = pCurrent + src.Length // 4 + 6 * 4 = 28;

if (pCurrent <= pEnd - 8)
{
   // 4 <= 28 - (8 * 4)
   // 4 <= 28 - 32
   // 4 <= (ulong)-4
   // We don't expect to be here, but we will since -4 as a ulong is a really large number.
}

GrabYourPitchforks · 2018-09-06T00:50:12Z

So the concern is that we are less than 4 or 8 elements away from the end of the memory? Would the OS ever let us get that close?

@ahsonkhan and I spoke about this at length yesterday.

With current operating systems, no. But I don't know what OSes we'll be running on in 10 years. Maybe we'll have a fully managed OS like Singularity, and maybe it'll give us access to the full range of addressable memory. I can't predict the future, and I don't want to chance having to chase down logic errors in this code in ten years' time if there's a theoretical overflow condition that we can identify and address now.

GrabYourPitchforks · 2018-09-06T01:05:43Z

FWIW, there is an exception that both C# and C allow when comparing pointers. The element just past the end of the array is always addressable, and it's always guaranteed to compare greater than the address of any element in the array.

That is:

T* ptr; // = pointer to first element in array
size_t numElements; // total number of elements in the array

for (size_t i = 0; i < numElements; i++) {
    assert(&ptr[i] < &ptr[numElements], "This is guaranteed by the language.");
}

A corollary to this is that T* end = ptr + numElements is guaranteed not to integer overflow (so the array can't be placed at the very, very end of addressable memory), but T* end = ptr + numElements + 1 has no such guarantee.

Note: the element just past the end of the array is addressable but not necessarily dereferenceable. That is, T element = ptr[numElements] still has undefined behavior, such as populating the register with garbage or even AVing.

eerhardt · 2018-09-06T01:15:13Z

Ah, ok. I did some more reading, looked at this code again and I understand the concern. The array may be at the very end of memory space. And when pDstCurrent is less than 8 elements from the end, then while (pDstCurrent + 8 <= pDstEnd) will go out of bounds, and potentially overflow. And if it overflows, it will wrap around making condition true.

I think it makes sense using the for (; i < src.Length - 8; i += 8) pattern that @ahsonkhan proposes above or the remainder pattern suggested by @tannergooding in an earlier PR review.

briancylui · 2018-09-06T02:39:46Z

Thanks for all the comments. After some reading, I agree that the problem is potential illegal memory access due to incrementing the pCurrent pointer out of bounds of the Span<float> object. Accessing out-of-bound object can already cause security concerns, and it would be even worse if the pointer got incremented into illegal memory. I should probably change the title of this PR to "avoid pointer overflow" or "avoid illegal memory access" to avoid any possible confusion, and look forward to implementing this fix tomorrow.

A very good takeaway from @ahsonkhan and @GrabYourPitchforks's comments is that we may not want to increment/decrement any pointer for bound checking, and I think it is also a very good opportunity to follow through on @eerhardt and @tannergooding's suggestion, which will pave the way for implementing double-computing. I will spend some time tomorrow to look into how to implement these fixes concretely.

Thank you very much for all the comments!

…sics

briancylui · 2018-09-07T00:16:05Z

Pushed a new commit and deleted the previous commit to show you all a sample of the changes I will make globally.

In the new commit, only the AddScalarU intrinsic has been changed in its SSE and AVX implementations. Once you think that the changes look fine to you, or you may some suggestions to edit them, the changes can be made on a global scale to the rest of the intrinsics.

The performance test results are shown below:

After the change:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain

Type	Method	Mean	Error	StdDev	Median
AvxPerformanceTests	AddScalarU	142.9 us	2.8338 us	7.0045 us	140.1 us
NativePerformanceTests	AddScalarU	191.2 us	2.3916 us	1.8672 us	190.9 us
SsePerformanceTests	AddScalarU	172.7 us	0.7552 us	0.5896 us	172.6 us

Before the change:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain

Type	Method	Mean	Error	StdDev	Median
AvxPerformanceTests	AddScalarU	141.8 us	1.623 us	1.439 us	141.5 us
NativePerformanceTests	AddScalarU	191.8 us	4.514 us	10.901 us	186.5 us
SsePerformanceTests	AddScalarU	180.3 us	3.653 us	7.125 us	177.1 us

The performance results are comparable after the change: SEE got faster by a bit, while AVX got slower by just a bit. The second use of Math.DivRem for countSse and remainderSse may have made the AVX implementation a bit slower.

Look forward to all your comments. Since tomorrow is the last day of my internship, and I have other work items to finish before I go, I don't think I can finish this PR by tomorrow, but I will keep my branch in my fork up so that other contributors can use your previous commits. While I will leave this PR open for the moment, please feel free to close it whenever you feel appropriate. Thank you.

briancylui · 2018-09-07T17:51:06Z

test MachineLearning-CI please

tannergooding · 2018-09-07T17:52:51Z

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

@@ -20,6 +20,10 @@ internal static class AvxIntrinsics
    {
        private static readonly Vector256<float> _absMask256 = Avx.StaticCast<int, float>(Avx.SetAllVector256(0x7FFFFFFF));

+        // The count of 32-bit floats in Vector256<T>
+        private const int AvxAlignment = 8;


This isn't alignment, but rather the number of 32-bit elements that Vector256 can hold...

Maybe Vector256SingleElementCount or something similar...

What about Vector256FloatCount? Is there any preference for SingleElement?

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

tannergooding · 2018-09-07T17:57:42Z

src/Microsoft.ML.CpuMath/SseIntrinsics.cs

                }

-                while (pDstCurrent < pDstEnd)
+                for (int i = 0; i < remainder; i++)
                {
                    Vector128<float> dstVector = Sse.LoadScalarVector128(pDstCurrent);


This would be way more readable as pDstCurrent[i] += scalar, and it should produce the same code.

The various scalar intrinsics are really meant for places where you have to interop between Vector and Scalar code, or where you need some scalar operations which aren't expressible in normal C# code.

Yes, perf test results show that this change improves the runtime significantly:

After all changes, including changing scalar intrinsics to indexed code:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4) Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores .NET Core SDK=3.0.100-alpha1-20180720-2 [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT Toolchain=InProcessToolchain

Type Method Mean Error StdDev

AvxPerformanceTests AddScalarU 172.1 us 2.589 us 2.422 us

NativePerformanceTests AddScalarU 216.9 us 1.785 us 1.491 us

SsePerformanceTests AddScalarU 209.7 us 1.492 us 1.246 us

After partial changes (removing the 2nd time of Math.DivRem):

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4) Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores .NET Core SDK=3.0.100-alpha1-20180720-2 [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT Toolchain=InProcessToolchain

Type Method Mean Error StdDev

AvxPerformanceTests AddScalarU 193.3 us 3.6653 us 3.4285 us

NativePerformanceTests AddScalarU 215.4 us 0.7014 us 0.6218 us

SsePerformanceTests AddScalarU 238.2 us 4.1955 us 3.5034 us

Before the change:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4) Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores .NET Core SDK=3.0.100-alpha1-20180720-2 [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT Toolchain=InProcessToolchain

Type Method Mean Error StdDev

AvxPerformanceTests AddScalarU 200.3 us 3.919 us 4.513 us

NativePerformanceTests AddScalarU 249.4 us 4.400 us 4.116 us

SsePerformanceTests AddScalarU 235.3 us 1.393 us 1.235 us

…calar operations

briancylui · 2018-09-07T19:30:32Z

Pushed a new commit responding to @tannergooding's latest comments. Since the following suggested changes show a significant +10% perf improvement, I will propagate these changes globally to get end-to-end perf results as soon as possible today:

After all changes, including changing scalar intrinsics to indexed code:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain

Type	Method	Mean	Error	StdDev
AvxPerformanceTests	AddScalarU	172.1 us	2.589 us	2.422 us
NativePerformanceTests	AddScalarU	216.9 us	1.785 us	1.491 us
SsePerformanceTests	AddScalarU	209.7 us	1.492 us	1.246 us

After partial changes (removing the 2nd time of Math.DivRem):

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain

Type	Method	Mean	Error	StdDev
AvxPerformanceTests	AddScalarU	193.3 us	3.6653 us	3.4285 us
NativePerformanceTests	AddScalarU	215.4 us	0.7014 us	0.6218 us
SsePerformanceTests	AddScalarU	238.2 us	4.1955 us	3.5034 us

Before the change:

BenchmarkDotNet=v0.11.1, OS=Windows 10.0.17134.228 (1803/April2018Update/Redstone4)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-alpha1-20180720-2
  [Host] : .NET Core 3.0.0-preview1-26710-03 (CoreCLR 4.6.26710.05, CoreFX 4.6.26708.04), 64bit RyuJIT

Toolchain=InProcessToolchain

Type	Method	Mean	Error	StdDev
AvxPerformanceTests	AddScalarU	200.3 us	3.919 us	4.513 us
NativePerformanceTests	AddScalarU	249.4 us	4.400 us	4.116 us
SsePerformanceTests	AddScalarU	235.3 us	1.393 us	1.235 us

ahsonkhan · 2018-09-07T23:23:31Z

we may not want to increment/decrement any pointer for bound checking

Right.

Since the following suggested changes show a significant +10% perf improvement

Nice! The implementation can be reasoned about more easily now as well.
+ We removed the risks related to overflow :)

…r all AVX intrinsics, except MatMul's and those involving AbsMask

ahsonkhan · 2018-09-08T00:01:17Z

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs

                }

-                for (int i = 0; i < remainder - 4; i++)
+                for (int i = 0; i < remainder % 4; i++)


I wouldn't use the relatively expensive modulo operator here.

Maybe, do remainder -= 4 in the above if (remainder >= 4) block and then your for loop can just be:

for (int i = 0; i < remainder; i++) { ... }

briancylui · 2018-09-08T00:41:12Z

The latest commit contains the global replacement of scalar operations by indexed code for almost all AVX intrinsics (except for MatMul's and those methods that involve AbsMask). Changes to SSE intrinsics have not been done yet. The latest perf results are shown at the bottom of briancylui#1. Please note that now it is ready to do end-to-end perf testing using the KMeansAndLogisticRegression benchmark in test\Microsoft.ML.Benchmarks, but it is only that I didn't have enough time to test it by the end of my internship.

Thank you everyone for your feedback! I will keep my branches in my fork up there, and feel free to continue developing using my commits. Have a nice day!

markusweimer · 2018-09-21T17:26:21Z

This PR does not reference an issue. Can you please file one and reference it in the PR description?

briancylui · 2018-09-21T17:30:54Z

@ahsonkhan raised the issue and suggested the change. Anyone familiar with the issue is welcome to carry on this project after my internship if time allows.

briancylui · 2018-09-21T17:54:55Z

Created an issue #980, which this PR now references. The issue may need some polishing, but it solves the problem for the time being. Hope it helps other people interested take on this project!

danmoseley · 2018-09-24T23:49:55Z

Seems this is almost done. @tannergooding will finish it up in a little bit.

Zruty0 · 2018-10-05T16:29:48Z

@tannergooding , are you working on this?

tannergooding · 2018-10-05T16:50:21Z

It's on my backlog, but lower priority than a few other items right now.

shauheen · 2018-10-19T18:51:51Z

@tannergooding should we close the PR and open it when you think you can update the branch?

tannergooding · 2018-10-19T19:11:50Z

That is fine with me.

briancylui changed the title ~~Change bound checking in SSE/AVX intrinsics to avoid integer overflow~~ Change bound checking in SSE/AVX intrinsics to avoid pointer overflow Sep 6, 2018

Removed out-of-bound pointer access for AddScalarU SSE and AVX intrin…

2e0033e

…sics

briancylui force-pushed the SecureBoundChecks branch from da7b5a7 to 2e0033e Compare September 7, 2018 00:08

tannergooding reviewed Sep 7, 2018

View reviewed changes

src/Microsoft.ML.CpuMath/AvxIntrinsics.cs Outdated Show resolved Hide resolved

tannergooding reviewed Sep 7, 2018

View reviewed changes

Respond to PR feedback: remove 2nd Math.DivRem and use indexing for s…

a9558f4

…calar operations

Respond to PR feedback: Replaced scalar operations by indexed code fo…

f26f112

…r all AVX intrinsics, except MatMul's and those involving AbsMask

ahsonkhan reviewed Sep 8, 2018

View reviewed changes

briancylui mentioned this pull request Sep 21, 2018

Avoids pointer overflow in bound checking of SSE/AVX intrinsics #980

Open

danmoseley assigned tannergooding Sep 24, 2018

tannergooding closed this Oct 19, 2018

eerhardt mentioned this pull request Mar 26, 2019

CpuMath Enhancement: Make bound checking of loops in hardware intrinsics more efficient #2939

Closed

ghost locked as resolved and limited conversation to collaborators Mar 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change bound checking in SSE/AVX intrinsics to avoid pointer overflow #821

Change bound checking in SSE/AVX intrinsics to avoid pointer overflow #821

briancylui commented Sep 5, 2018 •

edited

Loading

eerhardt commented Sep 5, 2018 •

edited

Loading

briancylui commented Sep 5, 2018

ahsonkhan commented Sep 5, 2018 •

edited

Loading

briancylui commented Sep 5, 2018

ahsonkhan commented Sep 5, 2018 •

edited

Loading

GrabYourPitchforks commented Sep 6, 2018

GrabYourPitchforks commented Sep 6, 2018 •

edited

Loading

eerhardt commented Sep 6, 2018 •

edited

Loading

briancylui commented Sep 6, 2018

briancylui commented Sep 7, 2018

briancylui commented Sep 7, 2018

tannergooding Sep 7, 2018

briancylui Sep 7, 2018

tannergooding Sep 7, 2018

briancylui Sep 7, 2018

briancylui commented Sep 7, 2018 •

edited

Loading

ahsonkhan commented Sep 7, 2018

ahsonkhan Sep 8, 2018 •

edited

Loading

briancylui commented Sep 8, 2018

markusweimer commented Sep 21, 2018

briancylui commented Sep 21, 2018

briancylui commented Sep 21, 2018

danmoseley commented Sep 24, 2018

Zruty0 commented Oct 5, 2018

tannergooding commented Oct 5, 2018

shauheen commented Oct 19, 2018

tannergooding commented Oct 19, 2018

Change bound checking in SSE/AVX intrinsics to avoid pointer overflow #821

Change bound checking in SSE/AVX intrinsics to avoid pointer overflow #821

Conversation

briancylui commented Sep 5, 2018 • edited Loading

Aims to solve #980

Before the change:

After the change:

eerhardt commented Sep 5, 2018 • edited Loading

briancylui commented Sep 5, 2018

ahsonkhan commented Sep 5, 2018 • edited Loading

briancylui commented Sep 5, 2018

ahsonkhan commented Sep 5, 2018 • edited Loading

GrabYourPitchforks commented Sep 6, 2018

GrabYourPitchforks commented Sep 6, 2018 • edited Loading

eerhardt commented Sep 6, 2018 • edited Loading

briancylui commented Sep 6, 2018

briancylui commented Sep 7, 2018

After the change:

Before the change:

briancylui commented Sep 7, 2018

tannergooding Sep 7, 2018

Choose a reason for hiding this comment

briancylui Sep 7, 2018

Choose a reason for hiding this comment

tannergooding Sep 7, 2018

Choose a reason for hiding this comment

briancylui Sep 7, 2018

Choose a reason for hiding this comment

After all changes, including changing scalar intrinsics to indexed code:

After partial changes (removing the 2nd time of Math.DivRem):

Before the change:

briancylui commented Sep 7, 2018 • edited Loading

After all changes, including changing scalar intrinsics to indexed code:

After partial changes (removing the 2nd time of Math.DivRem):

Before the change:

ahsonkhan commented Sep 7, 2018

ahsonkhan Sep 8, 2018 • edited Loading

Choose a reason for hiding this comment

briancylui commented Sep 8, 2018

markusweimer commented Sep 21, 2018

briancylui commented Sep 21, 2018

briancylui commented Sep 21, 2018

danmoseley commented Sep 24, 2018

Zruty0 commented Oct 5, 2018

tannergooding commented Oct 5, 2018

shauheen commented Oct 19, 2018

tannergooding commented Oct 19, 2018

briancylui commented Sep 5, 2018 •

edited

Loading

eerhardt commented Sep 5, 2018 •

edited

Loading

ahsonkhan commented Sep 5, 2018 •

edited

Loading

ahsonkhan commented Sep 5, 2018 •

edited

Loading

GrabYourPitchforks commented Sep 6, 2018 •

edited

Loading

eerhardt commented Sep 6, 2018 •

edited

Loading

briancylui commented Sep 7, 2018 •

edited

Loading

ahsonkhan Sep 8, 2018 •

edited

Loading