Exposed CapabilityContext constructors #889

Merged
merged 1 commit into from
Nov 11, 2022

@m4rs-mt m4rs-mt commented Nov 8, 2022

This PR exposes the CapabilityContext constructors of the Cuda and OpenCL worlds to allow Backend instance creation without an Accelerator instance. This allows us to precompile kernels offline without having an actual Accelerator at hand.

The following code snippet demonstrates offline compilation of a PTX (Cuda) kernel:

private static void TestKernel(Index1D index, ArrayView<int> input, ArrayView<int> output)
{
    output[index] = input[index];
}

public static void Main()
{
    using var context = Context.Create(builder => builder
        .CPU(new CPUDevice(2, 1, 1)) // Use a very simplistic CPU accelerator instance
        // .Assertions() // Uncomment to use assertions
        // .Debug() // Uncomment to enable debug symbols
        .Optimize(OptimizationLevel.O2));

    using var backend = new PTXBackend(
        context,
        CudaArchitecture.SM_70,
        CudaInstructionSet.ISA_70,
        null);

    var entryPoint = EntryPointDescription.FromExplicitlyGroupedKernel(
        typeof(Program).GetMethod(nameof(TestKernel),
        BindingFlags.NonPublic | BindingFlags.Static));

    var compiledKernel = backend.Compile(entryPoint, default) as PTXCompiledKernel;
    File.WriteAllText("Output.ptx", compiledKernel!.PTXAssembly);
}

The output of this program should look (somewhat) like:

//
// Generated by ILGPU v1.3.0
//

.version 7.0
.target sm_70
.address_size 64

.visible .entry Kernel_TestKernel(
	.param .b32 _index_1942,
	.param .align 8 .b8 _input_1946[32],
	.param .align 8 .b8 _output_1960[32]
)
{
	.reg .b16	%rs<3>;
	.reg .b32	%r<3>;
	.reg .b64	%rd<10>;

	ld.param.b32	%r1, [_index_1942];
	ld.param.b64	%rd7, [_input_1946];
	cvta.to.global.u64	%rd1, %rd7;
	ld.param.b64	%rd2, [_input_1946+8];
	ld.param.b64	%rd3, [_input_1946+16];
	ld.param.b8	%rs1, [_input_1946+24];
	ld.param.b64	%rd7, [_output_1960];
	cvta.to.global.u64	%rd4, %rd7;
	ld.param.b64	%rd5, [_output_1960+8];
	ld.param.b64	%rd6, [_output_1960+16];
	ld.param.b8	%rs2, [_output_1960+24];

	mul.wide.u32	%rd8, %r1, 4;
	add.u64	%rd7, %rd4, %rd8;
	mul.wide.u32	%rd9, %r1, 4;
	add.u64	%rd8, %rd1, %rd9;
	ld.global.b32	%r2, [%rd8];
	st.global.b32	[%rd7], %r2;
	ret;

}
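
To confirm that an offline build targeted the requested architecture and instruction set, the header directives of the generated file can be checked against the values passed to the backend. The following is only a minimal sketch that uses the .NET base class library and no ILGPU API; `PtxHeaderCheck` and `ReadHeader` are hypothetical names introduced for illustration, and the expected values (`7.0`, `sm_70`) are taken from the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PtxHeaderCheck
{
    // Hypothetical helper: collects the values of the PTX header directives
    // (.version, .target, .address_size) from the given PTX text.
    // Comment lines and all other lines are ignored.
    public static Dictionary<string, string> ReadHeader(string ptx)
    {
        var header = new Dictionary<string, string>();
        foreach (var rawLine in ptx.Split('\n'))
        {
            var line = rawLine.Trim();
            foreach (var key in new[] { ".version", ".target", ".address_size" })
            {
                if (line.StartsWith(key + " ", StringComparison.Ordinal))
                    header[key] = line.Substring(key.Length).Trim();
            }
        }
        return header;
    }

    public static void Main()
    {
        // Assumes the snippet above already wrote Output.ptx to the working directory.
        var header = ReadHeader(File.ReadAllText("Output.ptx"));

        // ISA_70 is expected to show up as ".version 7.0" (as in the listing above).
        Console.WriteLine(
            header.TryGetValue(".version", out var version) && version == "7.0"
                ? "PTX ISA version matches"
                : "Unexpected PTX ISA version");

        // SM_70 is expected to show up as ".target sm_70"; a prefix check is used
        // because the target directive may carry additional options.
        Console.WriteLine(
            header.TryGetValue(".target", out var target) &&
            target.StartsWith("sm_70", StringComparison.Ordinal)
                ? "Target architecture matches"
                : "Unexpected target architecture");
    }
}
```

A check like this could run as a build step after offline compilation, so that a mismatch between the requested and the emitted target is caught before the PTX file is shipped.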

@m4rs-mt m4rs-mt added this to the v1.3 milestone Nov 8, 2022
@m4rs-mt m4rs-mt merged commit 511d6e7 into master Nov 11, 2022
@m4rs-mt m4rs-mt deleted the exposed_capability_contexts branch November 11, 2022 03:20