MetalPerformanceShaders macOS xcode9 beta2
Vincent Dondain edited this page Jun 21, 2017
# MetalPerformanceShaders.framework
diff -ruN /Applications/Xcode9-beta1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h /Applications/Xcode9-beta2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h
--- /Applications/Xcode9-beta1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h 2017-05-20 02:10:25.000000000 -0400
+++ /Applications/Xcode9-beta2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h 2017-06-11 02:09:08.000000000 -0400
@@ -38,7 +38,7 @@
// man -M `xcrun --show-sdk-path -sdk iphoneos9.0`/usr/share/man MPSKernel
//
-/*! @mainpage Metal Shaders - High Performance Kernels on Metal
+/*! @mainpage Metal Performance Shaders - High Performance Kernels on Metal
* @section section_introduction Introduction
*
* MetalPerformanceShaders.framework is a framework of highly optimized compute and graphics shaders that are
@@ -58,6 +58,7 @@
*
* @subsection subsection_usingMPS Using MPS
* To use MPS:
+ * @code
* link: -framework MetalPerformanceShaders
* include: #include <MetalPerformanceShaders/MetalPerformanceShaders.h>
*
@@ -70,14 +71,49 @@
* level headers. iOS 11 already broke source compatibility for lower level headers
* and future releases will probably do so again. The only supported method of
* including MPS symbols is the top level framework header.
+ * @endcode
+ * On macOS, MetalPerformanceShaders.framework is 64-bit only. If you are still supporting
+ * the 32-bit i386 architecture, you can link just your 64-bit slice to MPS using an Xcode
+ * user defined build setting. For example, you can add a setting called LINK_MPS:
+ * @code
+ * LINK_MPS
+ * Debug -framework MetalPerformanceShaders
+ * Intel architecture <leave this part empty>
+ * Release -framework MetalPerformanceShaders
+ * Intel architecture <leave this part empty>
+ * @endcode
+ *
+ * The 64-bit Intel architectures will inherit from the generic definition on the Debug and
+ * Release lines. Next, add $(LINK_MPS) to the Other Linker Flags line in your Xcode build
+ * settings.
+ *
+ * In code segments built for both i386 and x86-64 you will need to keep the i386 segment
+ * from attempting to use MPS. In C, C++ and Objective-C, a simple #ifdef will work fine.
+ * @code
+ * BOOL IsMPSSupported( id <MTLDevice> device )
+ * {
+ * #ifdef __i386__
+ * return NO;
+ * #else
+ * return MPSSupportsMTLDevice(device);
+ * #endif
+ * }
+ * @endcode
+ *
*
* @section section_data Data containers
* @subsection subsection_metal_containers MTLTextures and MTLBuffers
*
* Most data operated on by Metal Performance Shaders must be in a portable data container appropriate
- * for use on the GPU, such as a MTLTexture, MTLBuffer or MPSImage. The first two should be
- * self-explanatory based on your previous experience with Metal.framework. MPS will use these
- * directly when it can.
+ * for use on the GPU, such as a MTLTexture, MTLBuffer or MPSImage/MPSMatrix/MPSVector. The first two
+ * should be self-explanatory based on your previous experience with Metal.framework. MPS will use these
+ * directly when it can. The other three are wrapper classes designed to make MTLTextures and MTLBuffers
+ * easier to use, especially when the data may be packed in the texture or buffer in an unusual order, or
+ * typical notions like texel do not map to the abstraction (e.g. feature channel) well. MPSImages and
+ * MPSMatrices also come in "temporary" variants. Temporary images and matrices aggressively share
+ * memory with one another, saving a great deal of processor time allocating and tearing down textures.
+ * (This uses MTLHeaps underneath, if you are familiar with that feature.) MPS manages the aliasing to
+ * keep you safe. In exchange you must manage the resource readCount.
*
* Most MPSImage and MPSCNN filters operate only on floating-point or normalized texture formats.
* If your data is in a UInteger or Integer MTLPixelFormat (e.g. MTLPixelFormatR8Uint as opposed
@@ -96,7 +132,7 @@
*
* @subsection subsection_mpstemporaryimage MPSTemporaryImages
* The MPSTemporaryImage (subclass of MPSImage) extends the MPSImage to provide advanced caching of
- * unused memory to increase performance and reduce memory footprint. They are intended as fast
+ * reusable memory to increase performance and reduce memory footprint. They are intended as fast
* GPU-only storage for intermediate image data needed only transiently within a single MTLCommandBuffer.
* They accelerate the common case of image data which is created only to be consumed and destroyed
* immediately by the next operation(s) in a MTLCommandBuffer. MPSTemporaryImages provide convenient and
@@ -106,6 +142,32 @@
* You cannot read or write data to a MPSTemporaryImage using the CPU, or use the data in other MTLCommandBuffers.
* Use regular MPSImages for more persistent storage.
*
+ * Why do we need MPSTemporaryImages? Consider what it would be like to write an app without a heap.
+ * All allocations would have to be either on the stack or statically allocated at app launch. You
+ * would find that allocations that persist for the lifetime of the app are very wasteful when an object
+ * is only needed for a few microseconds. Likewise, once the memory is statically partitioned in this way,
+ * it is hard to dynamically reuse memory for other purposes as different tasks are attempted and the needs
+ * of the app change. Finally, having to plan everything out in advance is just plain inconvenient! Isn't it
+ * nicer to just call malloc() or new as needed? Yes, it is. Even if it means we have to also call free(),
+ * find leaks and otherwise manage the lifetime of the allocation through some mechanism like reference counting,
+ * or add __strong and __weak so that ARC can help us, we do it.
+ *
+ * It should therefore be of little surprise that after the heap data structure by JWJ Williams in 1964, the
+ * heap has been a mainstay of computer science since. The heap allocator was part of the C language a decade
+ * later. Yet, 50 years on, why is it not used in GPU programming? Developers routinely still allocate resources
+ * up front that stay live for the lifetime of the program (command buffer). Why would you do that?
+ * MPSTemporaryImages are MPSImages that use memory allocated from a command-buffer-associated heap to store
+ * texels. They only use the memory they need for the part of the command buffer that they need it in, and the
+ * memory is made available for other MPSTemporaryImages that live in another part of the same command buffer.
+ * This allows for a very high level of memory reuse. In the context of a MPSNNGraph, for example, the
+ * InceptionV3 neural network requires 121 MPSImages to hold intermediate results. However, since it uses
+ * MPSTemporaryImages instead, these are reduced to just four physical allocations of the same size as one of
+ * the original images. Do you believe most of your work should be done using MPSTemporaryImages? You should.
+ * You only need the persistent MPSImage for storage needed outside the context of the command buffer, for
+ * example those images that might be read from or written to by the CPU. Use MPSTemporaryImages for
+ * transient storage needs. In aggregate, they are far less expensive than regular MPSImages. Create them,
+ * use them, throw them away, all within a few lines of code. Make more just in time as needed.
+ *
* @section section_discussion The MPSKernel
*
* The MPSKernel is the base class for all MPS kernels. It defines baseline behavior for all MPS
@@ -443,10 +505,16 @@
* filter be applied to the image before down sampling. However, some ringing can occur near high frequency regions
* of the image, making the algorithm less suitable for vector art.
*
- * MetalPerformanceShaders.framework provides a MPSImageLanczosScale function to allow for simple resizing of images into the clipRect
- * of the result image. It can operate with preservation of aspect ratio or not.
+ * MetalPerformanceShaders.framework provides MPSImageScale filters to allow for simple resizing of images into the clipRect
+ * of the result image. They can operate with preservation of aspect ratio or not.
+ *
+ * MPSImageLanczosScale <MPSImage/MPSResample.h> Resize or adjust aspect ratio of an image using a Lanczos filter.
+ * MPSImageBilinearScale <MPSImage/MPSResample.h> Resize or adjust aspect ratio of an image using bilinear interpolation.
*
- * MPSImageLanczosScale <MPSImage/MPSResample.h> Resize or adjust aspect ratio of an image.
+ * Each method has its own advantages. The bilinear method is faster. However, downsampling by more than a factor
+ * of two will lead to data loss, unless a low pass filter is applied before the downsampling operation. The
+ * Lanczos filter method does not have this problem and usually looks better. However, it can lead to ringing
+ * at sharp edges, making it better for photographs than vector art.
*
* @subsection subsection_threshold Image Threshold
* Thresholding operations are commonly used to separate elements of image structure from the rest of an image.
@@ -461,7 +529,18 @@
* MPSImageThresholdTruncate <MPSImage/MPSImageThreshold.h> srcPixel > thresholdVal ? thresholdVal : srcPixel
* MPSImageThresholdToZero <MPSImage/MPSImageThreshold.h> srcPixel > thresholdVal ? srcPixel : 0
* MPSImageThresholdToZeroInverse <MPSImage/MPSImageThreshold.h> srcPixel > thresholdVal ? 0 : srcPixel
+ * MPSImageKeypoint <MPSImage/MPSImageKeypoint.h> return a list of pixels that are greater than a threshold value
*
+ * @subsection subsection_images_statistics Image Statistics
+ * Several statistical operators are available which return statistics for the entire image, or
+ * a subregion. These operators are:
+ *
+ * MPSImageStatisticsMinAndMax <MPSImage/MPSImageStatistics.h> return maximum and minimum values in the image for each channel
+ * MPSImageStatisticsMean <MPSImage/MPSImageStatistics.h> return the mean channel value over the region of interest
+ * MPSImageStatisticsMeanAndVariance <MPSImage/MPSImageStatistics.h> return the mean channel value and variance of each channel over the region of interest
+ *
+ * These filters return the results in a small (1x1 or 2x1) texture. The region over which the
+ * statistical operator is applied is regulated by the clipRectSource property.
*
* @subsection subsection_math Math Filters
* Arithmetic filters take two source images, a primary source image and a secondary source image, as input and
@@ -497,6 +576,7 @@
* MPSCNNNeuronSoftSign <MPSNeuralNetwork/MPSCNNConvolution.h> A SoftSign neuron activation function x/(1+|x|)
* MPSCNNNeuronELU <MPSNeuralNetwork/MPSCNNConvolution.h> A parametric ELU neuron activation function x<0 ? (a*(e**x-1)) : x
* MPSCNNConvolution <MPSNeuralNetwork/MPSCNNConvolution.h> A 4D convolution tensor
+ * MPSCNNConvolutionTranspose <MPSNeuralNetwork/MPSCNNConvolution.h> A 4D convolution transpose tensor
* MPSCNNFullyConnected <MPSNeuralNetwork/MPSCNNConvolution.h> A fully connected CNN layer
* MPSCNNPoolingMax <MPSNeuralNetwork/MPSCNNPooling.h> The maximum value in the pooling area
* MPSCNNPoolingAverage <MPSNeuralNetwork/MPSCNNPooling.h> The average value in the pooling area
@@ -531,6 +611,14 @@
* the application can make a large MPSImage or MPSTemporaryImage and fill in parts of it with multiple layers
* (as long as the destination feature channel offset is a multiple of 4).
*
+ * The standard MPSCNNConvolution operator also does dilated convolution and sub-pixel convolution. There are
+ * also bit-wise convolution operators that can use only a single bit for precision of the weights. The
+ * precision of the image can be reduced to 1 bit in this case as well. The bit {0,1} represents {-1,1}.
+ *
+ * @subsection subsection_RNN Recurrent Neural Networks
+ *
+ * @subsection subsection_matrix_primitives Matrix Primitives
+ *
* Some CNN Tips:
* - Think carefully about the edge mode requested for pooling layers. The default is clamp to zero, but there
* are times when clamp to edge value may be better.
@@ -539,6 +627,8 @@
* of the output image by {kernelWidth-1, kernelHeight-1,0}. The filter area stretches up and to the left
* of the MPSCNNKernel.offset by {kernelWidth/2, kernelHeight/2}. While consistent with other MPS imaging operations,
* this behavior is different from some other CNN implementations.
+ * - If setting the offset and making MPSImages to hold intermediates are taking up a lot of your time,
+ * consider using the MPSNNGraph instead. It will automate these tasks.
* - Please remember:
* MPSCNNConvolution takes weights in the order weight[outputChannels][kernelHeight][kernelWidth][inputChannels / groups]
* MPSCNNFullyConnected takes weights in the order weight[outputChannels][sourceWidth][sourceHeight][inputChannels]
@@ -759,12 +849,25 @@
* MPSImage *inputImage = [[MPSImage alloc] initWithDevice: mtlDevice imageDescriptor: myDescriptor];
* // put some data into the input image here. See MTLTexture.replaceBytes...
* MPSImage * result = [myGraph encodeToCommandBuffer: cmdBuf sourceImages: @[inputImage] ];
+ * [cmdBuf addCompletedHandler: ^(id <MTLCommandBuffer> buf){
+ * // Notify your app that the work is done and the values in result
+ * // are ready for inspection.
+ * }];
* [cmdBuf commit];
- * [cmdBuf waitForCompletion];
+ *
+ * // While we are working on that, encode something else
+ * id <MTLCommandBuffer> cmdBuf2 = mtlCommandQueue.commandBuffer;
+ * MPSImage * result2 = [myGraph encodeToCommandBuffer: cmdBuf2 sourceImages: @[inputImage2] ];
+ * [cmdBuf2 addCompletedHandler: ^(id <MTLCommandBuffer> buf){
+ * // Notify your app that the work is done and the values in result2
+ * // are ready for inspection.
+ * }];
+ * [cmdBuf2 commit];
+ * ...
* @endcode
- * Obviously, if you have more work to do before or after the graph, it might be better to add it to the
- * command buffer before committing it, rather than paying for an extra synchronization from
- * [id <MTLCommandBuffer> waitForCompletion].
+ * The extra synchronization from [id <MTLCommandBuffer> waitForCompletion] should be avoided. It can
+ * be exceptionally costly because the wait for new work to appear allows the GPU clock to spin down.
+ * Performance gains of a factor of two or more are common with -addCompletedHandler:.
*
* @section subsection_mpsnngraph_sizing MPSNNGraph intermediate image sizing and centering
* The MPSNNGraph will automatically size and center the intermediate images that appear in the graph.
@@ -796,7 +899,7 @@
* or the kernel (also passed to you) in your custom destinationImageDescriptorForSourceImages:sourceStates:
* forKernel:suggestedDescriptor: method, or just ignore it and make a new descriptor.
*
- * @section subsection_mpsnngraph_sizing MPSNNGraph intermediate image allocation
+ * @section subsection_mpsnngraph_image_allocation MPSNNGraph intermediate image allocation
* Typically the graph will make MPSTemporaryImages for these, based on the MPSImageDescriptor obtained
* from the padding policy. Temporary images alias one another and can be used to save a lot of memory,
* in the same way that malloc saves memory in your application by allowing you to reserve memory for
@@ -835,7 +938,7 @@
* place using MPSCNNKernel.destinationFeatureChannelOffset rather than by adding an extra copy. Other optimizations
* may be added as framework capabilities improve.
*
- * @section section_samplecode Sample Code
+ * @section section_samplecode Sample Image Processing Example
* @code
* #import <MetalPerformanceShaders/MetalPerformanceShaders.h>
*