Some kernel loaders (LoadAutoGroupedXXXKernel) provide additional information about the "optimal" number of threads per group and/or the number of groups needed to achieve maximum occupancy on a given device. However, this functionality is not available for all overloads. Furthermore, there have been requests (like #76) for additional information about the number of functions, the amount of stack memory, and the maximum call depth of all functions in a final compile unit (a CompiledKernel instance), similar to the output of ptxas -v filename. This feature might require extending the CompiledKernel class to store additional information about the compiled functions. A sample implementation of such an extension is shown below (the types can be nested in the CompiledKernel class):
    /// <summary>
    /// Contains information about functions.
    /// </summary>
    public readonly struct FunctionInfo
    {
        #region Instance

        /// <summary>
        /// Constructs a new function information object.
        /// </summary>
        public FunctionInfo(
            string name,
            MethodBase method,
            int localMemorySize,
            int maxCallDepth)
        {
            if (localMemorySize < 0)
                throw new ArgumentOutOfRangeException(nameof(localMemorySize));
            if (maxCallDepth < 0)
                throw new ArgumentOutOfRangeException(nameof(maxCallDepth));

            Name = name ?? throw new ArgumentNullException(nameof(name));
            Method = method;
            LocalMemorySize = localMemorySize;
            MaxCallDepth = maxCallDepth;
        }

        #endregion

        #region Properties

        /// <summary>
        /// The name of the compiled function inside the kernel.
        /// </summary>
        public string Name { get; }

        /// <summary>
        /// Returns the managed method reference (if any).
        /// </summary>
        public MethodBase Method { get; }

        /// <summary>
        /// Returns the local memory size in bytes.
        /// </summary>
        public int LocalMemorySize { get; }

        /// <summary>
        /// Returns the estimated maximum call depth.
        /// </summary>
        public int MaxCallDepth { get; }

        #endregion
    }

    /// <summary>
    /// Provides detailed information about compiled kernels.
    /// </summary>
    public class KernelInfo
    {
        #region Instance

        /// <summary>
        /// Constructs a new kernel information object.
        /// </summary>
        /// <param name="sharedAllocations">All shared allocations.</param>
        /// <param name="functions">
        /// An array containing detailed function information.
        /// </param>
        public KernelInfo(
            in AllocaKindInformation sharedAllocations,
            ImmutableArray<FunctionInfo> functions)
        {
            SharedAllocations = sharedAllocations;
            Functions = functions;
        }

        #endregion

        #region Properties

        /// <summary>
        /// Returns detailed information about all shared allocations.
        /// </summary>
        /// <remarks>
        /// This information will be populated if the flag
        /// <see cref="ContextFlags.EnableKernelInformation"/> is set.
        /// </remarks>
        public AllocaKindInformation SharedAllocations { get; }

        /// <summary>
        /// Returns information about all functions in the compiled kernel.
        /// </summary>
        /// <remarks>
        /// This array will be populated if the flag
        /// <see cref="ContextFlags.EnableKernelInformation"/> is set.
        /// </remarks>
        public ImmutableArray<FunctionInfo> Functions { get; }

        #endregion

        #region Methods

        /// <summary>
        /// Dumps kernel information to the standard console output.
        /// </summary>
        public void Dump() => Dump(Console.Out);

        /// <summary>
        /// Dumps kernel information to the given text writer.
        /// </summary>
        /// <param name="writer">The text writer.</param>
        public virtual void Dump(TextWriter writer)
        {
            if (writer == null)
                throw new ArgumentNullException(nameof(writer));

            // Shared memory
            if (SharedAllocations.TotalSize > 0)
            {
                writer.WriteLine("Shared Memory:");
                writer.Write("\tTotal Size: ");
                writer.Write(SharedAllocations.TotalSize);
                writer.WriteLine(" bytes");

                foreach (var alloc in SharedAllocations)
                {
                    writer.Write("\t");
                    writer.Write(alloc.ElementType.ToString());
                    writer.Write('[');
                    writer.Write(alloc.ArraySize);
                    writer.Write("] ");
                    writer.Write(alloc.TotalSize);
                    writer.WriteLine(" bytes");
                }
            }

            // Information about methods, calls and local memory sizes
            if (!Functions.IsDefaultOrEmpty)
            {
                writer.WriteLine("Functions:");
                for (int i = 0, e = Functions.Length; i < e; ++i)
                {
                    ref readonly var functionRef = ref Functions.ItemRef(i);
                    var methodName = functionRef.Method?.Name ?? functionRef.Name;
                    writer.Write('\t');
                    writer.WriteLine(methodName);

                    // Keep the value and its unit on the same line
                    writer.Write("\t\tLocal Memory: ");
                    writer.Write(functionRef.LocalMemorySize);
                    writer.WriteLine(" bytes");

                    writer.Write("\t\tMax Call Depth: ");
                    writer.WriteLine(functionRef.MaxCallDepth);
                }
            }
        }

        #endregion
    }
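The proposal does not say how the estimated maximum call depth would be determined. One possibility, sketched here purely as an assumption, is a memoized depth-first walk over the call graph of the compile unit that reports the longest call chain and gives up when recursion is found. The `CallDepthSketch` class, the string-keyed call graph, and the function names are hypothetical and are not part of ILGPU.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of ILGPU.
public static class CallDepthSketch
{
    // Returns the number of nested calls on the longest call chain that
    // starts at root, or null if recursion is reachable from root.
    public static int? MaxCallDepth(
        IReadOnlyDictionary<string, string[]> callGraph,
        string root)
    {
        var memo = new Dictionary<string, int>();
        var onStack = new HashSet<string>();

        int? Visit(string function)
        {
            if (memo.TryGetValue(function, out var known))
                return known;
            if (!onStack.Add(function))
                return null; // Recursion: the depth is not statically bounded

            int depth = 0;
            if (callGraph.TryGetValue(function, out var callees))
            {
                foreach (var callee in callees)
                {
                    var calleeDepth = Visit(callee);
                    if (calleeDepth is null)
                        return null;
                    depth = Math.Max(depth, calleeDepth.Value + 1);
                }
            }

            onStack.Remove(function);
            memo[function] = depth;
            return depth;
        }

        return Visit(root);
    }

    public static void Main()
    {
        // Made-up call graph: Kernel calls A and B, A calls C.
        var graph = new Dictionary<string, string[]>
        {
            ["Kernel"] = new[] { "A", "B" },
            ["A"] = new[] { "C" },
            ["B"] = Array.Empty<string>(),
            ["C"] = Array.Empty<string>(),
        };
        Console.WriteLine(MaxCallDepth(graph, "Kernel")); // 2
    }
}
```

A nullable result is used here because FunctionInfo rejects negative depths; how recursion should be reported in the real implementation is left open.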
A link to the existing functionality in the scope of the CompiledKernel class might look like this:
    /// <summary>
    /// Returns information about all functions in the compiled kernel.
    /// </summary>
    /// <remarks>
    /// This instance will be available when the
    /// <see cref="ContextFlags.EnableKernelInformation"/> flag is set.
    /// </remarks>
    public KernelInfo Info { get; }
The code above assumes that a ContextFlags member named EnableKernelInformation is present, so that no unnecessary objects are created when detailed information is not required:
    /// <summary>
    /// Enables detailed kernel information about all compiled kernel functions.
    /// </summary>
    EnableKernelInformation = 1 << 3,
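To illustrate how such a flag could gate object creation, here is a minimal, self-contained sketch. Only the member name EnableKernelInformation and its value come from the proposal; the `None` member, the `KernelInfoGate` class, and the `CreateIfEnabled` helper are hypothetical stand-ins, not existing ILGPU API.

```csharp
using System;

// Stand-in for the real flags enum; only EnableKernelInformation
// (with the value 1 << 3) is taken from the proposal.
[Flags]
public enum ContextFlags
{
    None = 0,
    EnableKernelInformation = 1 << 3,
}

// Hypothetical helper; not part of ILGPU.
public static class KernelInfoGate
{
    // Invokes the factory only if the flag is set and returns null otherwise,
    // so no info object is allocated unless it was requested.
    public static T CreateIfEnabled<T>(ContextFlags flags, Func<T> factory)
        where T : class =>
        (flags & ContextFlags.EnableKernelInformation) != 0
            ? factory()
            : null;

    public static void Main()
    {
        var disabled = CreateIfEnabled(ContextFlags.None, () => "info");
        var enabled = CreateIfEnabled(
            ContextFlags.EnableKernelInformation,
            () => "info");

        Console.WriteLine(disabled is null); // True
        Console.WriteLine(enabled is null);  // False
    }
}
```

Passing a factory delegate rather than a finished object keeps the potentially expensive analysis out of the default code path.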
All kernel loaders should be extended with additional overloads to "return" custom KernelInfo instances. A sample implementation of this class might look like:
    /// <summary>
    /// Provides detailed information about compiled kernels.
    /// </summary>
    public sealed class KernelInfo : CompiledKernel.KernelInfo
    {
        #region Static

        /// <summary>
        /// Creates a new kernel information object from the given
        /// compiled kernel information.
        /// </summary>
        /// <param name="info">The underlying kernel info (if any).</param>
        /// <param name="minGroupSize">The minimum group size (if known).</param>
        /// <param name="minGridSize">The minimum grid size (if known).</param>
        /// <returns>
        /// The created kernel information object, or null if
        /// <paramref name="info"/> is null.
        /// </returns>
        public static KernelInfo CreateFrom(
            CompiledKernel.KernelInfo info,
            int? minGroupSize,
            int? minGridSize) =>
            info is null
            ? null
            : new KernelInfo(
                minGroupSize,
                minGridSize,
                info.SharedAllocations,
                info.Functions);

        #endregion

        #region Instance

        /// <summary>
        /// Constructs a new kernel information object.
        /// </summary>
        /// <param name="minGroupSize">The minimum group size (if known).</param>
        /// <param name="minGridSize">The minimum grid size (if known).</param>
        public KernelInfo(int? minGroupSize, int? minGridSize)
            : this(
                minGroupSize,
                minGridSize,
                new AllocaKindInformation(
                    ImmutableArray<AllocaInformation>.Empty,
                    0),
                ImmutableArray<CompiledKernel.FunctionInfo>.Empty)
        { }

        /// <summary>
        /// Constructs a new kernel information object.
        /// </summary>
        /// <param name="minGroupSize">The minimum group size (if known).</param>
        /// <param name="minGridSize">The minimum grid size (if known).</param>
        /// <param name="sharedAllocations">All shared allocations.</param>
        /// <param name="functions">
        /// An array containing detailed function information.
        /// </param>
        public KernelInfo(
            int? minGroupSize,
            int? minGridSize,
            in AllocaKindInformation sharedAllocations,
            ImmutableArray<CompiledKernel.FunctionInfo> functions)
            : base(sharedAllocations, functions)
        {
            MinGroupSize = minGroupSize;
            MinGridSize = minGridSize;
        }

        #endregion

        #region Properties

        /// <summary>
        /// Returns the estimated group size to gain maximum occupancy on this device.
        /// </summary>
        public int? MinGroupSize { get; }

        /// <summary>
        /// Returns the minimum grid size to gain maximum occupancy on this device.
        /// </summary>
        public int? MinGridSize { get; }

        #endregion

        #region Methods

        /// <summary>
        /// Dumps kernel information to the given text writer.
        /// </summary>
        /// <param name="writer">The text writer.</param>
        public override void Dump(TextWriter writer)
        {
            base.Dump(writer);

            // Group and grid dimensions
            if (MinGroupSize.HasValue)
            {
                writer.Write(nameof(MinGroupSize));
                writer.Write(' ');
                writer.WriteLine(MinGroupSize);
            }
            if (MinGridSize.HasValue)
            {
                writer.Write(nameof(MinGridSize));
                writer.Write(' ');
                writer.WriteLine(MinGridSize);
            }
        }

        #endregion
    }
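As a rough illustration of what "returning" such an instance from a loader overload could look like, the following self-contained sketch uses an out parameter. Everything in it is a hypothetical stand-in: `KernelInfoStub` mimics only the group and grid part of the class sketched in this proposal, `LoadKernel` is not an existing ILGPU loader, and the value 128 is a placeholder rather than a measured result.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the proposed KernelInfo class.
public sealed class KernelInfoStub
{
    public KernelInfoStub(int? minGroupSize, int? minGridSize)
    {
        MinGroupSize = minGroupSize;
        MinGridSize = minGridSize;
    }

    public int? MinGroupSize { get; }
    public int? MinGridSize { get; }

    // Writes only the values that are actually known.
    public void Dump(TextWriter writer)
    {
        if (writer == null)
            throw new ArgumentNullException(nameof(writer));
        if (MinGroupSize.HasValue)
            writer.WriteLine($"{nameof(MinGroupSize)} {MinGroupSize.Value}");
        if (MinGridSize.HasValue)
            writer.WriteLine($"{nameof(MinGridSize)} {MinGridSize.Value}");
    }
}

public static class LoaderSketch
{
    // Hypothetical loader overload: returns the launcher as usual and
    // hands the kernel information back through an out parameter.
    public static Action LoadKernel(Action kernel, out KernelInfoStub info)
    {
        // Placeholder values; a real loader would query the device.
        info = new KernelInfoStub(minGroupSize: 128, minGridSize: null);
        return kernel;
    }

    public static void Main()
    {
        var launcher = LoadKernel(() => { }, out var info);
        launcher();

        var writer = new StringWriter();
        info.Dump(writer);
        Console.Write(writer.ToString()); // MinGroupSize 128
    }
}
```

An out parameter leaves the return type of existing loaders unchanged, which is one way to read the quoted "return" in the proposal; a tuple or a dedicated result type would work as well.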