Reviewed ILGPU documentation. #776

Merged 2 commits on Apr 1, 2022
6 changes: 3 additions & 3 deletions Docs/Debugging-and-Profiling.md
@@ -3,15 +3,15 @@ Debugging with the software emulation layer is very convenient due to the very g
Currently, detailed kernel debugging is only possible with the CPU accelerator.
However, we are currently extending the debugging capabilities to also emulate different GPUs in order to test your algorithms on "virtual GPU devices" without needing direct access to the actual GPU hardware (more information about this feature can be found [here](https://github.com/m4rs-mt/ILGPU/pull/402)).

Assertions on GPU hardware devices can be enabled using the `ContextFlags.EnableAssertions` flag (disabled by default when a `Debugger` is not attached to the application).
Assertions on GPU hardware devices can be enabled using the `Assertions()` method of `Context.Builder` (disabled by default when a `Debugger` is not attached to the application).
Note that enabling assertions this way will cause them to be enabled in `Release` builds as well.
Be sure to disable assertions again if you want the best runtime performance.
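As a minimal sketch (assuming the ILGPU 1.x `Context.Create` overload that accepts a builder callback, and a CUDA device at index 0), enabling assertions via the builder looks like this:

```c#
// Sketch: enable assertions on GPU hardware via the builder API.
// Default() applies the standard configuration before the override.
using var context = Context.Create(builder => builder
    .Default()
    .Assertions());
using var accelerator = context.CreateCudaAccelerator(0);
// Debug.Assert(...) calls inside kernels are now active, even in Release builds.
```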

Source-line based debugging information can be turned on via the flag `ContextFlags.EnableDebugInformation` (disabled by default).
Source-line based debugging information can be turned on via the `DebugSymbols()` method of `Context.Builder` (disabled by default).
Note that only the new portable PDB format is supported.
Enabling debug information is essential for identifying problems and hitting breakpoints on GPU hardware.
It is also very useful for kernel profiling as you can link the profiling insights to your source lines.
You may want to disable inlining via `ContextFlags.NoInlining` to significantly increase the accuracy of your debugging information at the expense cost of runtime performance.
You may want to disable inlining via the `Inlining()` method of `Context.Builder` to significantly increase the accuracy of your debugging information at the expense of runtime performance.
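A combined sketch of both settings (the `InliningMode.Disabled` enum value is an assumption; check the overload your ILGPU version exposes):

```c#
// Sketch: turn on source-line debug information and disable inlining
// for more accurate debugging information.
using var context = Context.Create(builder => builder
    .Default()
    .DebugSymbols()
    .Inlining(InliningMode.Disabled)); // assumed enum value
```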

*Note that the inspection of variables, registers, and global memory on GPU hardware is currently not supported.*

4 changes: 2 additions & 2 deletions Docs/Dynamically-Specialized-Kernels.md
@@ -29,8 +29,8 @@ class ...

static void ...(...)
{
using var context = new Context();
using var accl = new CudaAccelerator(context);
using var context = Context.CreateDefault();
using var accl = context.CreateCudaAccelerator(0);

var genericKernel = accl.LoadStreamKernel<ArrayView<int>, int>(GenericKernel);
...
103 changes: 4 additions & 99 deletions Docs/Inside-ILGPU.md
@@ -3,11 +3,8 @@
ILGPU features a modern parallel processing, transformation and compilation model.
It allows parallel code generation and transformation phases to reduce compile time and improve overall performance.

However, parallel code generation in the frontend module is disabled by default.
It can be enabled via the enumeration flag `ContextFlags.EnableParallelCodeGenerationInFrontend`.

The global optimization process can be controlled with the enumeration `OptimizationLevel`.
This level can be specified by passing the desired level to the `ILGPU.Context` constructor.
This level can be specified by passing the desired level to the `Optimize` method of `Context.Builder`.
If the optimization level is not explicitly specified, the level is automatically set to `OptimizationLevel.O1`.

The `OptimizationLevel.O2` level uses additional transformations that increase compile time but yield potentially better GPU code.
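A short sketch of selecting the level through the builder (assuming the `Context.Create` builder-callback overload):

```c#
// Sketch: request the O2 optimization level for potentially better GPU code
// at the cost of longer compile times.
using var context = Context.Create(builder => builder
    .Default()
    .Optimize(OptimizationLevel.O2));
```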
@@ -35,32 +32,6 @@ It can be used to manually compile kernels for a specific platform.
Note that **you do not have to create custom backend instances** on your own when using the ILGPU runtime.
Accelerators already carry associated and configured backends that are used for high-level kernel loading.

```c#
class ...
{
static void Main(string[] args)
{
using (var context = new Context())
{
// Creats a user-defined MSIL backend for .Net code generation
using (var cpuBackend = new DefaultILBackend(context))
{
// Use custom backend
}

// Creates a user-defined backend for NVIDIA GPUs using compute capability 5.0
using (var ptxBackend = new PTXBackend(
context,
PTXArchitecture.SM_50,
TargetPlatform.X64))
{
// Use custom backend
}
}
}
}
```

## IRContext

An `IRContext` manages and caches intermediate-representation (IR) code, which can be reused during the compilation process.
@@ -70,19 +41,6 @@ An `IRContext` is not tied to a specific `Backend` instance and can be reused ac
Note that the main ILGPU `Context` already has an associated `IRContext` that is used for all high-level kernel-loading functions.
Consequently, users are not required to manage their own contexts in general.

```c#
class ...
{
static void Main(string[] args)
{
var context = new Context();

var irContext = new IRContext(context);
// ...
}
}
```

## Compiling Kernels

Kernels can be compiled manually by requesting a code-generation operation from the backend yielding a `CompiledKernel` object.
@@ -93,30 +51,6 @@ Alternatively, you can cast a `CompiledKernel` object to its appropriate backend

We recommend that you use the [high-level kernel-loading concepts of ILGPU](ILGPU-Kernels) instead of the low-level interface.

```c#
class ...
{
public static void MyKernel(Index index, ...)
{
// ...
}

static void Main(string[] args)
{
using var context = new Context();
using var b = new PTXBackend(context, ...);
// Compile kernel using no specific KernelSpecialization settings
var compiledKernel = b.Compile(
typeof(...).GetMethod(nameof(MyKernel), BindingFlags.Public | BindingFlags.Static),
default);

// Cast kernel to backend-specific PTXCompiledKernel to access the PTX assembly
var ptxKernel = compiledKernel as PTXCompiledKernel;
System.IO.File.WriteAllBytes("MyKernel.ptx", ptxKernel.PTXAssembly);
}
}
```

## Loading Compiled Kernels

Compiled kernels have to be loaded by an accelerator first before they can be executed.
@@ -131,35 +65,6 @@ An accelerator object offers different functions to load and configure kernels:
* `LoadKernel`
Loads explicitly and implicitly grouped kernels. However, implicitly grouped kernels will be launched with a group size that is equal to the warp size

```c#
class ...
{
static void Main(string[] args)
{
...
var compiledKernel = backend.Compile(...);

// Load implicitly grouped kernel with an automatically determined group size
var k1 = accelerator.LoadAutoGroupedKernel(compiledKernel);

// Load implicitly grouped kernel with custom group size
var k2 = accelerator.LoadImplicitlyGroupedKernel(compiledKernel);

// Load any kernel (explicitly and implicitly grouped kernels).
// However, implicitly grouped kernels will be dispatched with a group size
// that is equal to the warp size of its associated accelerator
var k3 = accelerator.LoadKernel(compiledKernel);

...

k1.Dispose();
k2.Dispose();
// Leave K3 to the GC
// ...
}
}
```

## Direct Kernel Launching

A loaded kernel can be dispatched using the `Launch` method.
@@ -169,7 +74,7 @@ For performance reasons, we strongly recommend the use of typed kernel launchers
```c#
class ...
{
static void MyKernel(Index index, ArrayView<int> data, int c)
static void MyKernel(Index1D index, ArrayView<int> data, int c)
{
data[index] = index + c;
}
Expand Down Expand Up @@ -210,7 +115,7 @@ These loading methods work similarly to the these versions, e.g. `LoadAutoGroupe
```c#
class ...
{
static void MyKernel(Index index, ArrayView<int> data, int c)
static void MyKernel(Index1D index, ArrayView<int> data, int c)
{
data[index] = index + c;
}
@@ -225,7 +130,7 @@ class ...
using (var k = accelerator.LoadAutoGroupedKernel(compiledKernel))
{
var launcherWithCustomAcceleratorStream =
k.CreateLauncherDelegate<AcceleratorStream, Index, ArrayView<int>>();
k.CreateLauncherDelegate<AcceleratorStream, Index1D, ArrayView<int>>();
launcherWithCustomAcceleratorStream(someStream, buffer.Extent, buffer.View, 1);

...
22 changes: 11 additions & 11 deletions Docs/Kernels.md
@@ -28,7 +28,7 @@ Use explicitly grouped kernels for full control over GPU-kernel dispatching.
class ...
{
static void ImplicitlyGrouped_Kernel(
[Index|Index2|Index3] index,
[Index1D|Index2D|Index3D] index,
[Kernel Parameters]...)
{
// Kernel code
@@ -93,24 +93,24 @@ In contrast to older versions of ILGPU, all kernels loaded with these functions
```c#
class ...
{
static void MyKernel(Index index, ArrayView<int> data, int c)
static void MyKernel(Index1D index, ArrayView<int> data, int c)
{
data[index] = index + c;
}

static void Main(string[] args)
{
...
var buffer = accelerator.Allocate<int>(1024);
var buffer = accelerator.Allocate1D<int>(1024);

// Load a sample kernel MyKernel using one of the available overloads
var kernelWithDefaultStream = accelerator.LoadAutoGroupedStreamKernel<
Index, ArrayView<int>, int>(MyKernel);
Index1D, ArrayView<int>, int>(MyKernel);
kernelWithDefaultStream(buffer.Extent, buffer.View, 1);

// Load a sample kernel MyKernel using one of the available overloads
var kernelWithStream = accelerator.LoadAutoGroupedKernel<
Index, ArrayView<int>, int>(MyKernel);
Index1D, ArrayView<int>, int>(MyKernel);
kernelWithStream(someStream, buffer.Extent, buffer.View, 1);

...
@@ -126,7 +126,7 @@ However, if you require custom control over the low-level kernel-compilation pro

Starting with version [v0.10.0](https://github.com/m4rs-mt/ILGPU/releases/tag/v0.10.0), ILGPU offers the ability to immediately compile and launch kernels via the accelerator methods (similar to those provided by other frameworks).
ILGPU exposes direct `Launch` and `LaunchAutoGrouped` methods via the `Accelerator` class using a new strong-reference based kernel cache.
This cache is used for the new launch methods only and can be disabled via the flag `ContextFlags.DisableKernelLaunchCaching`.
This cache is used for the new launch methods only and can be disabled via the `Caching(CachingMode.NoKernelCaching)` method of `Context.Builder`.
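A one-line sketch of disabling the cache through the builder (using the `CachingMode` value named above; assumes the `Context.Create` builder-callback overload):

```c#
// Sketch: disable the strong-reference kernel-launch cache.
using var context = Context.Create(builder => builder
    .Default()
    .Caching(CachingMode.NoKernelCaching));
```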

```c#
class ...
@@ -136,7 +136,7 @@ class ...

}

static void MyImplicitKernel(Index1 index, ...)
static void MyImplicitKernel(Index1D index, ...)
{

}
@@ -152,10 +152,10 @@ class ...
accl.Launch(stream, MyKernel, < MyKernelConfig >, ...);

// Launch implicitly grouped MyKernel using the default stream
accl.LaunchAutoGrouped(MyImplicitKernel, new Index1(...), ...);
accl.LaunchAutoGrouped(MyImplicitKernel, new Index1D(...), ...);

// Launch implicitly grouped MyKernel using the given stream
accl.LaunchAutoGrouped(stream, MyImplicitKernel, new Index1(...), ...);
accl.LaunchAutoGrouped(stream, MyImplicitKernel, new Index1D(...), ...);
}
}
```
@@ -173,9 +173,9 @@ var ptxKernel = launcher.GetCompiledKernel() as PTXCompiledKernel;
System.IO.File.WriteAllText("Kernel.ptx", ptxKernel.PTXAssembly);
```

You can specify the context flag `ContextFlags.EnableKernelStatistics` to query additional information about compiled kernels.
You can use the `DebugSymbols()` method of `Context.Builder` to enable additional information about compiled kernels.
This includes local functions and consumed local and shared memory.
After enabling the flag, you can get the information from a compiled kernel launcher delegate instance via:
After enabling this option, you can get the information from a compiled kernel launcher delegate instance via:
```c#
// Get kernel information from a kernel launcher instance
var information = launcher.GetKernelInfo();
4 changes: 2 additions & 2 deletions Docs/Math-Functions.md
@@ -7,13 +7,13 @@ The algorithms library offers the `XMath` class that has support for all common
Using the 32-bit overloads ensures that the operations are performed on 32-bit floats on the GPU hardware.
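For illustration, a kernel sketch using a 32-bit `XMath` overload (assumes the `ILGPU.Algorithms` package is referenced and `XMath.Sin(float)` is available, as described above):

```c#
// Sketch: prefer 32-bit XMath overloads inside kernels.
static void MyKernel(Index1D index, ArrayView<float> data)
{
    // XMath.Sin(float) stays in 32-bit precision on the GPU,
    // unlike System.Math.Sin, which operates on doubles.
    data[index] = XMath.Sin(data[index]);
}
```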

### Fast Math
Fast-math can be enabled using the `ContextFlags.FastMath` flag and enables the use of fast (and unprecise) math functions.
Fast-math can be enabled using the `Math(MathMode.Fast)` method of `Context.Builder` and enables the use of fast (but less precise) math functions.
Unlike previous versions, the fast-math mode applies to all math instructions, even to default math operations like `x / y`.
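A minimal sketch of enabling it (assuming the `Context.Create` builder-callback overload):

```c#
// Sketch: enable fast (less precise) math for the whole context.
using var context = Context.Create(builder => builder
    .Default()
    .Math(MathMode.Fast));
```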

### Forced 32-bit Math
Your kernels might rely on third-party functions that are not under your control.
These functions typically depend on the default .Net `Math` class and thus work on 64-bit floating-point operations.
You can force the use of 32-bit floating-point operations in all cases using the `ContextFlags.Force32BitMath` flag.
You can force the use of 32-bit floating-point operations in all cases using the `Math(MathMode.Fast32BitOnly)` method of `Context.Builder`.
Caution: all doubles will be treated as floats to circumvent issues with third-party code.
However, this also affects the address computations of array-view elements.
Avoid using this mode unless you know exactly what you are doing.
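The corresponding builder call, using the `MathMode` value named above, would look roughly like this:

```c#
// Sketch: force 32-bit math everywhere; all doubles are demoted to floats.
using var context = Context.Create(builder => builder
    .Default()
    .Math(MathMode.Fast32BitOnly));
```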
20 changes: 10 additions & 10 deletions Docs/Memory-Buffers-and-Views.md
@@ -10,18 +10,18 @@ *Should be* refers to the fact that all memory buffers will be automatically relea
```c#
class ...
{
public static void MyKernel(Index index, ...)
public static void MyKernel(Index1D index, ...)
{
// ...
}

static void Main(string[] args)
{
using var context = new Context();
using var context = Context.CreateDefault();
using var accelerator = ...;

// Allocate a memory buffer on the current accelerator device.
using (var buffer = accelerator.Allocat<int>(1024))
using (var buffer = accelerator.Allocate1D<int>(1024))
{
...
} // Dispose the buffer after performing all operations
@@ -45,7 +45,7 @@ You can even enable bounds checks in `Release` builds by specifying the context
```c#
class ...
{
static void MyKernel(Index index, ArrayView<int> view1, ArrayView<float> view2)
static void MyKernel(Index1D index, ArrayView<int> view1, ArrayView<float> view2)
{
ConvertToFloatSample(
view1.GetSubView(0, view1.Length / 2),
@@ -61,10 +61,10 @@ class ...
static void Main(string[] args)
{
...
using (var buffer = accelerator.Allocat&lt...&gt(...))
using (var buffer = accelerator.Allocate1D<...>(...))
{
var mainView = buffer.View;
var subView = mainView.GetSubView(0, 1024);
var subView = mainView.SubView(0, 1024);
}
}
}
@@ -86,7 +86,7 @@ mad.lo.u64 %rd4, %rd3, 4, %rd1;
```

When accessing views using 32-bit indices, the resulting index operation will be performed on 32-bit offsets for performance reasons.
As a result, this operation can overflow when using a 2D 32-bit based `Index2`, for instance.
As a result, this operation can overflow when using a 2D 32-bit based `Index2D`, for instance.
If you already know that your offsets will not fit into a 32-bit integer, you have to use 64-bit offsets in your kernel.

If you rely on 64-bit offsets, the emitted indexing operation will be slightly more expensive in terms of register usage and computational overhead (at least conceptually). The actual runtime difference depends on your kernel program.
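A kernel sketch using a 64-bit index (assuming `LongIndex1D` is the 64-bit counterpart of `Index1D` and that `ArrayView` exposes a 64-bit indexer, per the discussion above):

```c#
// Sketch: use a 64-bit index type when offsets may exceed 32 bits.
static void MyKernel(LongIndex1D index, ArrayView<int> view)
{
    // The address computation below is performed on 64-bit offsets.
    view[index] = 42;
}
```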
@@ -104,18 +104,18 @@ class ...
public VariableView<int> Variable;
}

static void MyKernel(Index index, DataView view)
static void MyKernel(Index1D index, DataView view)
{
// ...
}

static void Main(string[] args)
{
// ...
using (var buffer = accelerator.Allocat<...>(...))
using (var buffer = accelerator.Allocate1D<...>(...))
{
var mainView = buffer.View;
var firstElementView = mainView.GetVariableView(0);
var firstElementView = mainView.VariableView(0);
}
}
}
4 changes: 2 additions & 2 deletions Docs/Primer_00.md
@@ -9,7 +9,7 @@ CUDA / OpenCL with the ease of use of C#.

This tutorial is a little different now because we are going to be looking at ILGPU 1.0.0.

ILGPU should work on any 64bit platform that .Net supports. I have even used it on the inexpensive nvidia jetson nano with pretty decent cuda performance.
ILGPU should work on any 64-bit platform that .Net supports. I have even used it on the inexpensive Nvidia Jetson Nano with pretty decent CUDA performance.

Technically ILGPU supports F# but I don't use F# enough to really tutorialize it. I will be sticking to C# in these tutorials.

@@ -21,7 +21,7 @@ If enough people care I can record a short video of this process, but I expect t
2. Create a new C# project.
![dotnet new console](Images/newProject.png?raw=true)
3. Add the ILGPU package
![dotnet add ILGPU](Images/beta.png?raw=true)
![dotnet add package ILGPU](Images/beta.png?raw=true)
4. ??????
5. Profit
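The steps above boil down to a few CLI commands (requires the .NET SDK; the project name is just a placeholder):

```shell
# Create a new console project and add the ILGPU package to it.
dotnet new console -o MyILGPUApp
cd MyILGPUApp
dotnet add package ILGPU
dotnet run
```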
