
How to call trigonometric functions? #85

Closed
AlexBatsev opened this issue Apr 29, 2020 · 14 comments

AlexBatsev commented Apr 29, 2020

It looks like it's not possible to use basic trigonometric functions without enabling Algorithms. A call to e.g. Math.Sin() is translated to SinF, which is implemented only in Algorithms. It is really strange that these functions are missing from IntrinsicMath, where one would look for them first. After examining the OpenCL source generated by ILGPU+Algorithms, I've noticed that calls to XMath.Sin/Cos are translated to plain sin()/cos() calls, which is correct, so the logic of placing them among the more complex Algorithms escapes me.
I'm reluctant to use Algorithms because it has not been updated to the latest ILGPU 0.8.0, which contains the dynamic SharedMemory allocation and RuntimeKernelConfig that I really need. How can I get around this problem, given that I do not really need Algorithms, only basic trigonometry, dynamic SharedMemory, and RuntimeKernelConfig?

MoFtZ (Collaborator) commented Apr 30, 2020

@AlexBatsev, for your purpose, I think you can just use the standard System.Math class provided by the .NET framework - ILGPU will translate System.Math calls to the hardware intrinsic, if it exists.

However, not every function in System.Math is supported as a hardware intrinsic. For example, Cuda provides a hardware intrinsic for Sin(float), but not for Sin(double). If you try to use System.Math.Sin(double) on Cuda accelerators without ILGPU.Algorithms, you will get a runtime failure.

AlexBatsev (Author) commented Apr 30, 2020

@MoFtZ, thank you very much for your answer. I'm using only floats in my kernels, no doubles at all, and need just the natively supported versions of sin and cos. That is why your suggestion to use System.Math was the first thing I tried. It doesn't work though! Even if I run the simplest possible kernel, like

    static void MyKernel(Index1 index, ArrayView<float> dataView, int constant) 
        => dataView[index] = (float)Math.Sin(index + constant);

I get the following error on Cuda, which forces me to enable Algorithms:
NotSupportedIntrinsicException: The function 'SinF' does not have an intrinsic implementation for this backend. 'EnableAlgorithms' from the Algorithms library not invoked?
If I were able to use trigonometric functions from Math I wouldn't have created this issue.

MoFtZ (Collaborator) commented Apr 30, 2020

@AlexBatsev, just to confirm, are you using Cuda or OpenCL? In your original post, you mentioned looking into the OpenCL source code. However, in your follow-up post, you are trying to use Cuda.

If Cuda, the reason System.Math is failing is that it only provides Sin(double), which cannot be mapped to a hardware intrinsic.

If you are using .NET Core, you should instead use System.MathF which provides Sin(float). ILGPU will then correctly translate this to the hardware intrinsic.

Alternatively, if you are only going to use single-precision floats, you could try setting the Context to use 32-bit floats only:

using (var context = new Context(ContextFlags.Force32BitFloats))
{ /* do stuff */ }

I'm not sure what other side-effects this will have, so use at your own risk. For example, even simple addition and division of double-precision floats will probably be treated as single-precision floats.
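For reference, a minimal sketch of the System.MathF approach mentioned above (assuming .NET Core, where MathF is available; MathF.Sin takes a float, so no double-precision intrinsic is needed):

    // Sketch only: MathF.Sin(float) should map to the single-precision hardware
    // intrinsic, avoiding the Math.Sin(double) path that fails without Algorithms.
    static void MyKernel(Index1 index, ArrayView<float> dataView, int constant)
        => dataView[index] = MathF.Sin(index + constant);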

AlexBatsev (Author) commented Apr 30, 2020

@MoFtZ, thanks again.

  1. Using ContextFlags.Force32BitFloats indeed fixes the initial problem of The function 'SinF' does not have an intrinsic.... I don't have doubles in my kernels whatsoever, and I hope that this flag will not introduce any additional problems, such as wrong data alignment of custom structs in buffers. Using System.MathF is not an option, as we are on .NET Framework.
  2. I want my kernels to be runnable on all GPUs, not only NVIDIA's, which is why correct OpenCL code is important. The reason I initially referred to OpenCL is that the generated OpenCL source looks like C, which I can understand, whereas I'm not familiar with the low-level Cuda assembly instructions ILGPU generates (strange though, I was sure that Cuda code is very similar to OpenCL).
  3. Additional tests show that ContextFlags.Force32BitFloats did introduce new problems.
    Given the same kernel

static void MyKernel(Index index, ArrayView<float> dataView) => dataView[index] = (float)Math.Sin(index);

let’s compare 2 versions.

  • a) with Algorithms (var context = new Context(); context.EnableAlgorithms();). In this case the OpenCL version of the kernel declaration looks like

kernel void ILGPUKernel(
    global float* view_0,
    struct internal_type_10 var1,
    struct kernel_type_0 var2)
{ ...

and the results produced by all four accelerators (CPU, CUDA, OpenCL on GPU, OpenCL on CPU) are identical to within accuracy.

  • b) without Algorithms (var context = new Context(ContextFlags.Force32BitFloats);). The kernel translates to something like

kernel void ILGPUKernel(
    global double* view_0,
    int var1,
    struct type_96_kernel var2)
{ ...

One may notice that the buffer is declared as double* instead of float* as in case a). This is surprising, unless one assumes that the double type is redefined to become a float when the Force32BitFloats flag is used (if such a redefinition is possible at all). The results calculated by the kernel prove that no such redefinition happens: for OpenCL on the GPU they are simply wrong, and OpenCL on the CPU crashes.
So I'm back to my initial situation, where I'm forced to use Algorithms (and XMath, just in case) and stick to ILGPU 0.7 with some heavy and ugly workarounds to get dynamic SharedMemory.

MoFtZ (Collaborator) commented Apr 30, 2020

Thanks for the clarification @AlexBatsev, that clears things up a lot.

The latest code on the ILGPU.Algorithms repository looks like it is already compatible with ILGPU v0.8.0-beta2. I would suggest that you download the repository and create your own nuget package e.g. ILGPU.Algorithms v0.8.0-alex1. You can then use this nuget package in your code - I have successfully done this many times.

This will allow you to use EnableAlgorithms and XMath in your code, without Force32BitFloats. Once the official ILGPU.Algorithms nuget package is updated, you can replace your temporary packages with the official ones.

MoFtZ (Collaborator) commented Apr 30, 2020

Also, with regard to the double* in the generated OpenCL code, this looks like a bug.

m4rs-mt (Owner) commented Apr 30, 2020

@AlexBatsev Thanks for posting this question. I always recommend using the XMath class, which provides overloads for all operations using float instead of double values. Using Force32BitFloats is considered a "pro feature" and should be used with extreme caution. However, I am going to investigate this problem and provide a fix in one of the next commits. I am currently restructuring some parts of the type generation anyway.
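A minimal sketch of that recommendation (assuming the ILGPU.Algorithms package is referenced and EnableAlgorithms() has been called on the Context):

    using ILGPU;
    using ILGPU.Algorithms;

    // XMath.Sin takes a float, so it should map to the single-precision intrinsic
    // (or to the Algorithms software implementation where no intrinsic exists).
    static void MyKernel(Index1 index, ArrayView<float> dataView, int constant)
        => dataView[index] = XMath.Sin(index + constant);

    // Host side (sketch):
    // using (var context = new Context())
    // {
    //     context.EnableAlgorithms();
    //     /* load and launch MyKernel on an accelerator */
    // }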

m4rs-mt self-assigned this Apr 30, 2020
MoFtZ (Collaborator) commented Apr 30, 2020

Just in time @m4rs-mt - I was just looking into a fix for that double* issue, but now you can do it instead 😄

m4rs-mt (Owner) commented Apr 30, 2020

> the generated OpenCL source looks like C, which I can understand, whereas I'm not familiar with the low-level Cuda assembly instructions ILGPU generates (strange though, I was sure that Cuda code is very similar to OpenCL)

Cuda code itself can be considered an extension of the C/C++ language. However, the Cuda compiler compiles Cuda code into PTX assembly. ILGPU provides a specially tuned PTX backend that emits the required assembly instructions directly, without requiring the Cuda compiler to compile source code. Please note further that ILGPU has been designed to emit low-level assembly code (like PTX, SPIR-V, or even LLVM-IR) in the first place, rather than "source" code. That's why the OpenCL source code typically looks a bit strange 😄

m4rs-mt (Owner) commented Apr 30, 2020

> Just in time @m4rs-mt - I was just looking into a fix for that double* issue, but now you can do it instead 😄

I see. Well, you can also dive into this issue if you like 😄 If the problem is related to the TypeContext or the IRTypeContext, it is better to leave it to me. But you never know in advance 😃

MoFtZ (Collaborator) commented Apr 30, 2020

I found one issue here - the loop sets Float32 to use the word "double".

It looks like there are one or two other spots that might need investigation too.

AlexBatsev (Author) commented

@MoFtZ, @m4rs-mt, thanks a lot for your feedback. I'll try the approach of building a temporary package for Algorithms.
By the way, it would be much easier for users of ILGPU if they were forced to use ILGPU-only versions of all math functions, trigonometry included, with System.Math completely prohibited. When I had just started using ILGPU and asked myself "what should I substitute for calls to Math.Sin inside kernels?", my first move was to open the ILGPU documentation and search for a System.Math analogue. Imagine my surprise when I didn't find trigonometry inside IntrinsicMath!
For users it would be advantageous to be forced to play by GPU rules, understanding clearly what their code is being compiled to. It might also simplify your life as an ILGPU developer if you just threw an exception on an "unknown" function like Math.Sin, referring the user to IntrinsicMath.

m4rs-mt (Owner) commented May 2, 2020

@AlexBatsev Makes sense to me. We should extend the documentation to point users toward the more specifically designed classes.

m4rs-mt (Owner) commented Jun 16, 2020

I will close this issue for now, as the problems mentioned have either been resolved or are currently being processed in different issues.

m4rs-mt closed this as completed Jun 16, 2020