Performance increase for axis = -1
By reusing memory locations, streaming functions with axis = -1 (default behavior) can show drastic performance improvements.
For example, summing arrays of shape (2048, 2048) shows a 3x speedup on my machine.
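To illustrate the idea of reusing memory locations, here is a minimal sketch of a running sum over a stream of arrays. It is not this library's implementation: the function name running_sum is hypothetical, and the sketch assumes that axis = -1 means combining equally shaped arrays element-wise across the stream. The accumulator is allocated once and NumPy's `out=` argument writes every subsequent result into it in place.

```python
import numpy as np


def running_sum(arrays):
    """Yield the running element-wise sum of a stream of equally shaped arrays.

    Hypothetical helper for illustration only.
    """
    arrays = iter(arrays)
    try:
        first = next(arrays)
    except StopIteration:
        return  # empty stream: nothing to yield

    # Allocate the accumulator once, as a copy so the caller's array is untouched.
    accumulator = np.array(first, copy=True)
    yield accumulator

    for arr in arrays:
        # Write the result into the existing buffer instead of allocating a new array.
        np.add(accumulator, arr, out=accumulator)
        yield accumulator


# Usage: sum three float arrays filled with 1.0, 2.0 and 3.0.
stream = (np.full((4, 4), i, dtype=float) for i in range(1, 4))
total = None
for total in running_sum(stream):
    pass
```

Because the same buffer is yielded at every step, a caller that wants to keep an intermediate result has to copy it before advancing the stream.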