
MetalPerformanceShaders macOS xcode9 beta2

Vincent Dondain edited this page Jun 21, 2017 · 1 revision

# MetalPerformanceShaders.framework

diff -ruN /Applications/Xcode9-beta1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h /Applications/Xcode9-beta2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h
--- /Applications/Xcode9-beta1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h	2017-05-20 02:10:25.000000000 -0400
+++ /Applications/Xcode9-beta2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/System/Library/Frameworks/MetalPerformanceShaders.framework/Headers/MetalPerformanceShaders.h	2017-06-11 02:09:08.000000000 -0400
@@ -38,7 +38,7 @@
 //      man -M `xcrun --show-sdk-path -sdk iphoneos9.0`/usr/share/man  MPSKernel
 //
 
-/*! @mainpage Metal Shaders - High Performance Kernels on Metal
+/*! @mainpage Metal Performance Shaders - High Performance Kernels on Metal
  *  @section  section_introduction  Introduction
  *
  *  MetalPerformanceShaders.framework is a framework of highly optimized compute and graphics shaders that are
@@ -58,6 +58,7 @@
  *
  *  @subsection subsection_usingMPS  Using MPS
  *  To use MPS:
+ *   @code
  *      link:     -framework MetalPerformanceShaders
  *      include:  #include <MetalPerformanceShaders/MetalPerformanceShaders.h>
  *
@@ -70,14 +71,49 @@
  *                level headers.  iOS 11 already broke source compatibility for lower level headers
  *                and future releases will probably do so again. The only supported method of
  *                including MPS symbols is the top level framework header.
+ *    @endcode
+ *    On macOS, MetalPerformanceShaders.framework is 64-bit only.  If you are still supporting
+ *    the 32-bit i386 architecture, you can link just your 64-bit slice to MPS using an Xcode
+ *    user-defined build setting.  For example, you can add a setting called LINK_MPS:
+ *    @code
+ *        LINK_MPS
+ *            Debug    -framework MetalPerformanceShaders
+ *                Intel architecture            <leave this part empty>
+ *            Release  -framework MetalPerformanceShaders
+ *                Intel architecture            <leave this part empty>
+ *    @endcode
+ *
+ *    The 64-bit Intel architectures will inherit from the generic definition on the Debug and
+ *    Release lines. Next, add $(LINK_MPS) to the Other Linker Flags line in your Xcode build
+ *    settings.
+ *
+ *    In code built for both i386 and x86-64, you will need to keep the i386 slice
+ *    from attempting to use MPS. In C, C++ and Objective-C, a simple #ifdef works fine.
+ *    @code
+ *        BOOL IsMPSSupported( id <MTLDevice> device )
+ *        {
+ *        #ifdef __i386__
+ *            return NO;
+ *        #else
+ *            return MPSSupportsMTLDevice(device);
+ *        #endif
+ *        }
+ *    @endcode
+ *
  *
  *  @section section_data    Data containers
  *  @subsection subsection_metal_containers  MTLTextures and MTLBuffers
  *
  *  Most data operated on by Metal Performance Shaders must be in a portable data container appropriate
- *  for use on the GPU, such as a MTLTexture, MTLBuffer or MPSImage.  The first two should be 
- *  self-explanatory based on your previous experience with Metal.framework. MPS will use these
- *  directly when it can.
+ *  for use on the GPU, such as a MTLTexture, MTLBuffer or MPSImage/MPSMatrix/MPSVector.  The first two
+ *  should be self-explanatory based on your previous experience with Metal.framework. MPS will use these
+ *  directly when it can.  The other three are wrapper classes designed to make MTLTextures and MTLBuffers
+ *  easier to use, especially when the data may be packed in the texture or buffer in an unusual order, or when
+ *  typical notions like the texel do not map well to the abstraction (e.g. feature channels). MPSImages and
+ *  MPSMatrices also come in "temporary" variants. Temporary images and matrices aggressively share
+ *  memory with one another, saving a great deal of processor time allocating and tearing down textures.
+ *  (This uses MTLHeaps underneath, if you are familiar with that feature.) MPS manages the aliasing to
+ *  keep you safe. In exchange, you must manage the readCount of each temporary resource.
  *
  *  Most MPSImage and MPSCNN filters operate only on floating-point or normalized texture formats.
  *  If your data is in a UInteger or Integer MTLPixelFormat (e.g. MTLPixelFormatR8Uint as opposed
@@ -96,7 +132,7 @@
  *
  *  @subsection subsection_mpstemporaryimage  MPSTemporaryImages
  *  The MPSTemporaryImage (subclass of MPSImage) extends the MPSImage to provide advanced caching of
- *  unused memory to increase performance and reduce memory footprint. They are intended as fast
+ *  reusable memory to increase performance and reduce memory footprint. They are intended as fast
  *  GPU-only storage for intermediate image data needed only transiently within a single MTLCommandBuffer.
  *  They accelerate the common case of image data which is created only to be consumed and destroyed
  *  immediately by the next operation(s) in a MTLCommandBuffer.  MPSTemporaryImages provide convenient and 
@@ -106,6 +142,32 @@
  *  You can not read or write data to a MPSTemporaryImage using the CPU, or use the data in other MTLCommandBuffers.
  *  Use regular MPSImages for more persistent storage.
  *
+ *  Why do we need MPSTemporaryImages?  Consider what it would be like to write an app without a heap.
+ *  All allocations would have to be either on the stack or statically allocated at app launch. You
+ *  would find that allocations that persist for the lifetime of the app are very wasteful when an object
+ *  is only needed for a few microseconds. Likewise, once the memory is statically partitioned in this way,
+ *  it is hard to dynamically reuse memory for other purposes as different tasks are attempted and the needs
+ *  of the app change. Finally, having to plan everything out in advance is just plain inconvenient! Isn't it
+ *  nicer to just call malloc() or new as needed? Yes, it is. Even if it means we have to also call free(),
+ *  find leaks and otherwise manage the lifetime of the allocation through some mechanism like reference counting,
+ *  or add __strong and __weak so that ARC can help us, we do it.
+ *
+ *  It should therefore be of little surprise that, since J. W. J. Williams described the heap data structure
+ *  in 1964, the heap has been a mainstay of computer science. A heap allocator was part of the C language a decade
+ *  later. Yet, 50 years on, why is it not used in GPU programming? Developers routinely still allocate resources
+ *  up front that stay live for the lifetime of the program (command buffer). Why would you do that?
+ *  MPSTemporaryImages are MPSImages that use memory from a heap associated with the command buffer to store
+ *  texels. They only use the memory they need for the part of the command buffer that they need it in, and the
+ *  memory is made available for other MPSTemporaryImages that live in another part of the same command buffer.
+ *  This allows for a very high level of memory reuse. In the context of a MPSNNGraph, for example, the
+ *  InceptionV3 neural network requires 121 MPSImages to hold intermediate results. However, since it uses
+ *  MPSTemporaryImages instead, these are reduced to just four physical allocations of the same size as one of
+ *  the original images. Do you believe most of your work should be done using MPSTemporaryImages? You should.
+ *  You only need the persistent MPSImage for storage needed outside the context of the command buffer, for
+ *  example those images that might be read from or written to by the CPU. Use MPSTemporaryImages for
+ *  transient storage needs. In aggregate, they are far less expensive than regular MPSImages. Create them,
+ *  use them, throw them away, all within a few lines of code. Make more just in time as needed.
+ *
  *  @section section_discussion     The MPSKernel
  *
  *  The MPSKernel is the base class for all MPS kernels. It defines baseline behavior for all MPS 
@@ -443,10 +505,16 @@
  *  filter be applied to the image before down sampling. However, some ringing can occur near high frequency regions 
  *  of the image, making the algorithm less suitable for vector art.
  *
- *  MetalPerformanceShaders.framework provides a MPSImageLanczosScale function to allow for simple resizing of images into the clipRect
- *  of the result image. It can operate with preservation of aspect ratio or not. 
+ *  MetalPerformanceShaders.framework provides MPSImageScale filters to allow for simple resizing of images into the clipRect
+ *  of the result image. They can operate with or without preservation of aspect ratio.
+ *
+ *      MPSImageLanczosScale        <MPSImage/MPSResample.h>   Resize or adjust aspect ratio of an image using a Lanczos filter.
+ *      MPSImageBilinearScale       <MPSImage/MPSResample.h>   Resize or adjust aspect ratio of an image using bilinear interpolation.
  *
- *      MPSImageLanczosScale              <MPSImage/MPSResample.h>   Resize or adjust aspect ratio of an image.
+ *  Each method has its own advantages. The bilinear method is faster. However, downsampling by more than a factor
+ *  of two will lead to data loss, because some source pixels are never sampled, unless a low-pass filter is applied
+ *  before the downsampling operation.  The Lanczos filter method does not have this problem and usually looks better.
+ *  However, it can lead to ringing at sharp edges, making it better suited to photographs than to vector art.
  *
  *  @subsection subsection_threshold     Image Threshold
  *  Thresholding operations are commonly used to separate elements of image structure from the rest of an image. 
@@ -461,7 +529,18 @@
  *      MPSImageThresholdTruncate         <MPSImage/MPSImageThreshold.h>  srcPixel > thresholdVal ? thresholdVal : srcPixel
  *      MPSImageThresholdToZero           <MPSImage/MPSImageThreshold.h>  srcPixel > thresholdVal ? srcPixel : 0
  *      MPSImageThresholdToZeroInverse    <MPSImage/MPSImageThreshold.h>  srcPixel > thresholdVal ? 0 : srcPixel
+ *      MPSImageKeypoint                  <MPSImage/MPSImageKeypoint.h>  return a list of pixels that are greater than a threshold value
  *
+ *  @subsection subsection_images_statistics  Image Statistics
+ *  Several statistical operators are available which return statistics for the entire image, or
+ *  a subregion. These operators are:
+ *
+ *      MPSImageStatisticsMinAndMax       <MPSImage/MPSImageStatistics.h> return maximum and minimum values in the image for each channel
+ *      MPSImageStatisticsMean            <MPSImage/MPSImageStatistics.h> return the mean channel value over the region of interest
+ *      MPSImageStatisticsMeanAndVariance <MPSImage/MPSImageStatistics.h> return the mean channel value and variance of each channel over the region of interest
+ *
+ *  These filters return the results in a small (1x1 or 2x1) texture. The region over which the
+ *  statistical operator is applied is regulated by the clipRectSource property.
  *
  *  @subsection subsection_math     Math Filters
  *  Arithmetic filters take two source images, a primary source image and a secondary source image, as input and
@@ -497,6 +576,7 @@
  *      MPSCNNNeuronSoftSign            <MPSNeuralNetwork/MPSCNNConvolution.h>       A SoftSign neuron activation function x/(1+|x|)
  *      MPSCNNNeuronELU                 <MPSNeuralNetwork/MPSCNNConvolution.h>       A parametric ELU neuron activation function x<0 ? (a*(e**x-1)) : x
  *      MPSCNNConvolution               <MPSNeuralNetwork/MPSCNNConvolution.h>       A 4D convolution tensor
+ *      MPSCNNConvolutionTranspose      <MPSNeuralNetwork/MPSCNNConvolution.h>       A 4D convolution transpose tensor
  *      MPSCNNFullyConnected            <MPSNeuralNetwork/MPSCNNConvolution.h>       A fully connected CNN layer
  *      MPSCNNPoolingMax                <MPSNeuralNetwork/MPSCNNPooling.h>           The maximum value in the pooling area
  *      MPSCNNPoolingAverage            <MPSNeuralNetwork/MPSCNNPooling.h>           The average value in the pooling area
@@ -531,6 +611,14 @@
  *  the application can make a large MPSImage or MPSTemporaryImage and fill in parts of it with multiple layers
  *  (as long as the destination feature channel offset is a multiple of 4).
  *
+ *  The standard MPSCNNConvolution operator also performs dilated convolution and sub-pixel convolution. There are
+ *  also binary convolution operators that use only a single bit of precision for the weights. The
+ *  precision of the image can be reduced to 1 bit in this case as well.  The bits {0,1} represent {-1,1}.
+ *
+ *  @subsection subsection_RNN     Recurrent Neural Networks
+ *
+ *  @subsection subsection_matrix_primitives     Matrix Primitives
+ *
  *  Some CNN Tips:
  *  - Think carefully about the edge mode requested for pooling layers. The default is clamp to zero, but there
  *    are times when clamp to edge value may be better.
@@ -539,6 +627,8 @@
 *    of the output image by {kernelWidth-1, kernelHeight-1,0}. The filter area stretches up and to the left
  *    of the MPSCNNKernel.offset by {kernelWidth/2, kernelHeight/2}. While consistent with other MPS imaging operations,
  *    this behavior is different from some other CNN implementations.
+ *  - If setting the offset and making MPSImages to hold intermediates are taking up a lot of your time,
+ *    consider using the MPSNNGraph instead. It will automate these tasks.
  *  - Please remember:
  *      MPSCNNConvolution takes weights in the order weight[outputChannels][kernelHeight][kernelWidth][inputChannels / groups]
  *      MPSCNNFullyConnected takes weights in the order weight[outputChannels][sourceWidth][sourceHeight][inputChannels]
@@ -759,12 +849,25 @@
  *      MPSImage *inputImage = [[MPSImage alloc] initWithDevice: mtlDevice imageDescriptor: myDescriptor];
  *      // put some data into the input image here. See MTLTexture.replaceBytes...
  *      MPSImage * result = [myGraph encodeToCommandBuffer: cmdBuf sourceImages: @[inputImage] ];
+ *      [cmdBuf addCompletedHandler: ^(id <MTLCommandBuffer> buf){
+ *            // Notify your app that the work is done and the values in result
+ *            // are ready for inspection.
+ *       }];
  *      [cmdBuf commit];
- *      [cmdBuf waitForCompletion];
+ *
+ *      // While we are working on that, encode something else (inputImage2: a second input, filled like inputImage)
+ *      id <MTLCommandBuffer> cmdBuf2 = mtlCommandQueue.commandBuffer;
+ *      MPSImage * result2 = [myGraph encodeToCommandBuffer: cmdBuf2 sourceImages: @[inputImage2] ];
+ *      [cmdBuf2 addCompletedHandler: ^(id <MTLCommandBuffer> buf){
+ *            // Notify your app that the work is done and the values in result2
+ *            // are ready for inspection.
+ *       }];
+ *      [cmdBuf2 commit];
+ *      ...
  *  @endcode
- *  Obviously, if you have more work to do before or after the graph, it might be better to add it to the 
- *  command buffer before committing it, rather than paying for an extra synchronization from 
- *  [id <MTLCommandBuffer> waitForCompletion].
+ *  The extra synchronization from [id <MTLCommandBuffer> waitUntilCompleted] should be avoided. It can
+ *  be exceptionally costly because, while the GPU waits for new work to appear, its clock can spin down.
+ *  Performance increases of a factor of two or more are common with -addCompletedHandler:.
  *
  *  @section  subsection_mpsnngraph_sizing   MPSNNGraph intermediate image sizing and centering
  *  The MPSNNGraph will automatically size and center the intermediate images that appear in the graph.
@@ -796,7 +899,7 @@
  *  or the kernel (also passed to you) in your custom destinationImageDescriptorForSourceImages:sourceStates:
  *  forKernel:suggestedDescriptor: method, or just ignore it and make a new descriptor.
  *
- *  @section  subsection_mpsnngraph_sizing   MPSNNGraph intermediate image allocation
+ *  @section  subsection_mpsnngraph_image_allocation   MPSNNGraph intermediate image allocation
  *  Typically the graph will make MPSTemporaryImages for these, based on the MPSImageDescriptor obtained
  *  from the padding policy. Temporary images alias one another and can be used to save a lot of memory,
  *  in the same way that malloc saves memory in your application by allowing you to reserve memory for 
@@ -835,7 +938,7 @@
  *  place using MPSCNNKernel.destinationFeatureChannelOffset rather than by adding an extra copy. Other optimizations 
  *  may be added as framework capabilities improve.
  *
- *  @section  section_samplecode   Sample Code
+ *  @section  section_samplecode   Sample Image Processing Example
  *      @code
  *       #import <MetalPerformanceShaders/MetalPerformanceShaders.h>
  *
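The MPSTemporaryImage text in the diff above says that temporary images alias one another's memory and that, in exchange, you must manage their readCount. The following is a rough illustration only: a hypothetical CPU-side model in C, not the way MPS actually allocates memory. `SlotPool`, `TempImage`, `temp_image_create`, `temp_image_read` and `run_chain` are invented names. The model assumes that every intermediate fits in one same-sized slot and that the slot returns to the pool as soon as its readCount reaches zero.

```c
#include <assert.h>

/* Hypothetical CPU-side model of readCount-driven memory reuse.
 * This is NOT the MPS allocator; it only illustrates the idea. */

#define MAX_SLOTS 16

typedef struct {
    int in_use[MAX_SLOTS]; /* 1 while a slot backs a live temporary image */
    int live;              /* number of slots currently in use */
    int peak;              /* largest number of slots in use at once */
} SlotPool;

typedef struct {
    int slot;      /* backing slot, or -1 once the memory has been returned */
    int readCount; /* reads remaining before the slot can be reused */
} TempImage;

/* Allocate a temporary image that will be read readCount times. */
static TempImage temp_image_create(SlotPool *pool, int readCount)
{
    TempImage img = { -1, readCount };
    assert(readCount > 0);
    for (int s = 0; s < MAX_SLOTS; s++) {
        if (!pool->in_use[s]) {
            pool->in_use[s] = 1;
            img.slot = s;
            if (++pool->live > pool->peak)
                pool->peak = pool->live;
            break;
        }
    }
    assert(img.slot >= 0 && "slot pool exhausted");
    return img;
}

/* Record one read; when readCount reaches 0 the slot goes back to the pool. */
static void temp_image_read(SlotPool *pool, TempImage *img)
{
    assert(img->slot >= 0 && img->readCount > 0);
    if (--img->readCount == 0) {
        pool->in_use[img->slot] = 0;
        pool->live--;
        img->slot = -1;
    }
}

/* Encode a linear chain of `layers` filters. Each filter's destination is
 * allocated while its source is still live, then the source is read once.
 * Returns the peak number of slots needed. */
static int run_chain(int layers)
{
    SlotPool pool = { {0}, 0, 0 };
    TempImage prev = temp_image_create(&pool, 1);
    for (int i = 1; i < layers; i++) {
        TempImage next = temp_image_create(&pool, 1);
        temp_image_read(&pool, &prev);
        prev = next;
    }
    temp_image_read(&pool, &prev); /* the last consumer */
    return pool.peak;
}
```

In a linear chain, only the current filter's source and destination are live at any moment. As a result, `run_chain` peaks at two slots no matter how many layers it encodes. Branching networks keep more intermediates alive at once, so their peak is higher.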
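The image-scaling text in the diff above warns that bilinear downsampling by more than a factor of two loses data unless a low-pass filter runs first. The C sketch below shows why, using a single row. It assumes a pixel-center mapping `src = (dst + 0.5) * scale - 0.5`, which is an assumption here and not a documented detail of MPSImageBilinearScale. The function names are hypothetical. Bilinear interpolation reads only the two source pixels around each sample point, so a 4x reduction leaves half of the source pixels with zero weight.

```c
#include <assert.h>

/* Compute the two bilinear taps for destination pixel `dst`, assuming the
 * pixel-center mapping src = (dst + 0.5) * (srcWidth / dstWidth) - 0.5. */
static void bilinear_taps(int dst, int srcWidth, int dstWidth,
                          int *i0, int *i1, float *w1)
{
    float scale = (float)srcWidth / (float)dstWidth;
    float src = ((float)dst + 0.5f) * scale - 0.5f;
    if (src < 0.0f) src = 0.0f;                           /* clamp to left edge */
    if (src > (float)(srcWidth - 1)) src = (float)(srcWidth - 1);
    *i0 = (int)src;                                       /* floor, since src >= 0 */
    *i1 = (*i0 + 1 < srcWidth) ? *i0 + 1 : *i0;           /* clamp to right edge */
    *w1 = src - (float)*i0;                               /* weight of right tap */
}

/* Resample one row with bilinear interpolation. */
static void bilinear_resample_row(const float *src, int srcWidth,
                                  float *dst, int dstWidth)
{
    assert(srcWidth > 0 && dstWidth > 0);
    for (int x = 0; x < dstWidth; x++) {
        int i0, i1;
        float w1;
        bilinear_taps(x, srcWidth, dstWidth, &i0, &i1, &w1);
        dst[x] = (1.0f - w1) * src[i0] + w1 * src[i1];
    }
}

/* Count how many source pixels receive a nonzero weight. */
static int bilinear_pixels_used(int srcWidth, int dstWidth)
{
    int used[256] = {0};
    int count = 0;
    assert(srcWidth > 0 && srcWidth <= 256 && dstWidth > 0);
    for (int x = 0; x < dstWidth; x++) {
        int i0, i1;
        float w1;
        bilinear_taps(x, srcWidth, dstWidth, &i0, &i1, &w1);
        if (w1 < 1.0f) used[i0] = 1;
        if (w1 > 0.0f) used[i1] = 1;
    }
    for (int i = 0; i < srcWidth; i++)
        count += used[i];
    return count;
}
```

For an 8-pixel row under this mapping, `bilinear_pixels_used(8, 4)` returns 8 and `bilinear_pixels_used(8, 2)` returns 4. In the 4x case, the skipped pixels contribute nothing to the output, and a low-pass prefilter is what fixes that.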
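The threshold table in the diff above gives each filter's per-pixel rule as a C-style expression. The sketch below restates those rules as scalar C functions. It also adds a row-major scan that collects the pixels above a threshold, which is the kind of list the MPSImageKeypoint entry describes. The function names and the `Keypoint` record are hypothetical and are not MPS API. The real filters run on the GPU and take additional parameters that are not modeled here.

```c
#include <assert.h>

/* Scalar versions of the per-pixel formulas listed in the header table. */
static float threshold_truncate(float src, float t)    { return src > t ? t : src; }
static float threshold_to_zero(float src, float t)     { return src > t ? src : 0.0f; }
static float threshold_to_zero_inv(float src, float t) { return src > t ? 0.0f : src; }

/* Hypothetical keypoint record: pixel coordinate and its value. */
typedef struct { int x, y; float value; } Keypoint;

/* Collect up to maxOut pixels whose value exceeds t, in row-major order.
 * Returns the number of keypoints written to out. */
static int find_keypoints(const float *img, int width, int height, float t,
                          Keypoint *out, int maxOut)
{
    int n = 0;
    assert(width > 0 && height > 0 && maxOut >= 0);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            float v = img[y * width + x];
            if (v > t && n < maxOut) {
                Keypoint k = { x, y, v };
                out[n++] = k;
            }
        }
    return n;
}
```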
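The Image Statistics entries in the diff above report min/max, mean, and mean-and-variance over a region set by `clipRectSource`. The sketch below computes the same quantities for a single channel on the CPU. MPS reports them per channel and writes them into a small texture. In the sketch, `Rect` stands in for `clipRectSource`, and the names are illustrative. One assumption to flag: the variance here is the population variance (divide by N), because the header text does not say which definition MPS uses.

```c
#include <assert.h>

typedef struct { int x, y, width, height; } Rect;            /* stands in for clipRectSource */
typedef struct { float min, max, mean, variance; } RegionStats;

/* Min, max, mean and population variance of one channel over region r. */
static RegionStats region_stats(const float *img, int imgWidth, int imgHeight, Rect r)
{
    RegionStats s;
    double sum = 0.0, sq = 0.0;
    int n = r.width * r.height;
    assert(r.x >= 0 && r.y >= 0 && r.width > 0 && r.height > 0);
    assert(r.x + r.width <= imgWidth && r.y + r.height <= imgHeight);

    /* Pass 1: min, max and sum. */
    s.min = s.max = img[r.y * imgWidth + r.x];
    for (int y = r.y; y < r.y + r.height; y++)
        for (int x = r.x; x < r.x + r.width; x++) {
            float v = img[y * imgWidth + x];
            if (v < s.min) s.min = v;
            if (v > s.max) s.max = v;
            sum += v;
        }
    double mean = sum / n;

    /* Pass 2: squared deviations from the mean (more stable than E[x^2] - mean^2). */
    for (int y = r.y; y < r.y + r.height; y++)
        for (int x = r.x; x < r.x + r.width; x++) {
            double d = img[y * imgWidth + x] - mean;
            sq += d * d;
        }
    s.mean = (float)mean;
    s.variance = (float)(sq / n);
    return s;
}
```

The two-pass variance is a deliberate choice. For regions with a large mean, it avoids the cancellation error of subtracting two large, nearly equal sums.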
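The convolution text in the diff above says the 1-bit operators use bits {0,1} to represent {-1,1}. Under that encoding, a dot product reduces to counting sign mismatches: dot = n - 2 * popcount(a XOR b). This is the standard XNOR/popcount technique, and the C sketch below shows it. It is not MPS's implementation. The bit packing order (bit i holds element i) and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: clears the lowest set bit until none remain. */
static int popcount32(uint32_t v)
{
    int c = 0;
    while (v) {
        v &= v - 1;
        c++;
    }
    return c;
}

/* Dot product of two length-n (n <= 32) vectors of -1/+1 values packed as
 * bits, where bit 0 means -1 and bit 1 means +1. */
static int binary_dot(uint32_t a, uint32_t b, int n)
{
    assert(n > 0 && n <= 32);
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    int mismatches = popcount32((a ^ b) & mask); /* positions whose signs differ */
    return n - 2 * mismatches;                   /* matches minus mismatches */
}

/* Reference dot product on explicit -1/+1 integers. */
static int signed_dot(const int *a, const int *b, int n)
{
    int d = 0;
    for (int i = 0; i < n; i++)
        d += a[i] * b[i];
    return d;
}

/* Pack -1/+1 values into bits: bit i is set when a[i] is +1. */
static uint32_t pack_signs(const int *a, int n)
{
    uint32_t bits = 0;
    assert(n > 0 && n <= 32);
    for (int i = 0; i < n; i++)
        if (a[i] > 0)
            bits |= 1u << i;
    return bits;
}
```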
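One CNN tip in the diff above says the filter area stretches up and to the left of `MPSCNNKernel.offset` by {kernelWidth/2, kernelHeight/2}. The hypothetical C helper below turns that sentence into arithmetic, assuming stride 1 because strides are not modeled. It shows two consequences. With a zero offset, an even-sized kernel reaches one pixel further left than right. With an offset of kernelWidth/2, the window starts at source column 0.

```c
#include <assert.h>

/* Inclusive source bounds read for one destination pixel. Negative values
 * fall outside the image and are handled by the kernel's edge mode. */
typedef struct { int x0, x1, y0, y1; } Window;

/* Hypothetical helper: window = destination + offset, shifted up/left by
 * {kernelWidth/2, kernelHeight/2}, for stride 1 (an assumption). */
static Window filter_window(int dstX, int dstY, int offsetX, int offsetY,
                            int kernelWidth, int kernelHeight)
{
    Window w;
    assert(kernelWidth > 0 && kernelHeight > 0);
    w.x0 = dstX + offsetX - kernelWidth / 2;
    w.x1 = w.x0 + kernelWidth - 1;
    w.y0 = dstY + offsetY - kernelHeight / 2;
    w.y1 = w.y0 + kernelHeight - 1;
    return w;
}
```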
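The diff above reminds readers that MPSCNNConvolution takes weights in the order weight[outputChannels][kernelHeight][kernelWidth][inputChannels / groups]. Weights exported from other tools often use a different order. PyTorch and Caffe, for example, store convolution weights as [out][in][kH][kW]. The C sketch below computes the flat MPS index and reorders such an array. It assumes a dense float array for a single group's channel count, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Flat index for weight[outputChannels][kernelHeight][kernelWidth][cPerGroup],
 * where cPerGroup = inputChannels / groups. */
static size_t mps_conv_weight_index(int o, int ky, int kx, int c,
                                    int kH, int kW, int cPerGroup)
{
    return (((size_t)o * kH + ky) * kW + kx) * cPerGroup + c;
}

/* Reorder weights from [out][in][kH][kW] into the MPS order above. */
static void oihw_to_mps(const float *src, float *dst,
                        int outC, int cPerGroup, int kH, int kW)
{
    assert(outC > 0 && cPerGroup > 0 && kH > 0 && kW > 0);
    for (int o = 0; o < outC; o++)
        for (int c = 0; c < cPerGroup; c++)
            for (int ky = 0; ky < kH; ky++)
                for (int kx = 0; kx < kW; kx++) {
                    size_t s = (((size_t)o * cPerGroup + c) * kH + ky) * kW + kx;
                    dst[mps_conv_weight_index(o, ky, kx, c, kH, kW, cPerGroup)] = src[s];
                }
}
```

Here is a worked example with one output channel, two input channels, and a 1x2 kernel. The source array {0, 1, 2, 3} holds channel 0 as {0, 1} and channel 1 as {2, 3}. In MPS order, the input channel varies fastest, which interleaves them into {0, 2, 1, 3}.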