Exposed CapabilityContext constructors #889

m4rs-mt · 2022-11-08T14:28:11Z

This PR exposes the CapabilityContext constructors of the Cuda and OpenCL worlds to allow Backend instance creation without an Accelerator instance. This allows us to precompile kernels offline without having an actual Accelerator at hand.

The following code snippet demonstrates offline compilation of a PTX (Cuda) kernel:

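using System.IO;
using System.Reflection;
using ILGPU;
using ILGPU.Backends.EntryPoints;
using ILGPU.Backends.PTX;
using ILGPU.Runtime.CPU;
using ILGPU.Runtime.Cuda;

public static class Program
{
    // Trivial copy kernel: each thread copies one element from input to output.
    private static void TestKernel(Index1D index, ArrayView<int> input, ArrayView<int> output)
    {
        output[index] = input[index];
    }

    public static void Main()
    {
        // No Cuda device is required; the context only needs some accelerator description.
        using var context = Context.Create(builder => builder
            .CPU(new CPUDevice(2, 1, 1)) // Use a very simplistic CPU accelerator instance
            // .Assertions() // Uncomment to use assertions
            // .Debug() // Uncomment to enable debug symbols
            .Optimize(OptimizationLevel.O2));

        // Create the PTX backend directly, without a CudaAccelerator instance.
        using var backend = new PTXBackend(
            context,
            CudaArchitecture.SM_70,
            CudaInstructionSet.ISA_70,
            null);

        // Describe the kernel entry point via reflection.
        var entryPoint = EntryPointDescription.FromExplicitlyGroupedKernel(
            typeof(Program).GetMethod(nameof(TestKernel),
            BindingFlags.NonPublic | BindingFlags.Static));

        // Compile with the default kernel specialization and write the PTX assembly to disk.
        var compiledKernel = backend.Compile(entryPoint, default) as PTXCompiledKernel;
        File.WriteAllText("Output.ptx", compiledKernel!.PTXAssembly);
    }
}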

The output of this program should look (somewhat) like:

//
// Generated by ILGPU v1.3.0
//

.version 7.0
.target sm_70
.address_size 64

.visible .entry Kernel_TestKernel(
	.param .b32 _index_1942,
	.param .align 8 .b8 _input_1946[32],
	.param .align 8 .b8 _output_1960[32]
)
{
	.reg .b16	%rs<3>;
	.reg .b32	%r<3>;
	.reg .b64	%rd<10>;

	ld.param.b32	%r1, [_index_1942];
	ld.param.b64	%rd7, [_input_1946];
	cvta.to.global.u64	%rd1, %rd7;
	ld.param.b64	%rd2, [_input_1946+8];
	ld.param.b64	%rd3, [_input_1946+16];
	ld.param.b8	%rs1, [_input_1946+24];
	ld.param.b64	%rd7, [_output_1960];
	cvta.to.global.u64	%rd4, %rd7;
	ld.param.b64	%rd5, [_output_1960+8];
	ld.param.b64	%rd6, [_output_1960+16];
	ld.param.b8	%rs2, [_output_1960+24];

	mul.wide.u32	%rd8, %r1, 4;
	add.u64	%rd7, %rd4, %rd8;
	mul.wide.u32	%rd9, %r1, 4;
	add.u64	%rd8, %rd1, %rd9;
	ld.global.b32	%r2, [%rd8];
	st.global.b32	[%rd7], %r2;
	ret;

}

m4rs-mt added the enhancement label Nov 8, 2022

m4rs-mt added this to the v1.3 milestone Nov 8, 2022

Exposed CapabilityContext constructors to allow offline compilation of kernels.

897113e

m4rs-mt force-pushed the exposed_capability_contexts branch from 4e65ac1 to 897113e Compare November 8, 2022 14:32

MoFtZ approved these changes Nov 10, 2022

m4rs-mt merged commit 511d6e7 into master Nov 11, 2022

m4rs-mt deleted the exposed_capability_contexts branch November 11, 2022 03:20
