
Implement reference semantics for all Tensors #160

Closed

mratsim opened this issue Nov 25, 2017 · 4 comments

Comments

mratsim (Owner) commented Nov 25, 2017

The current value semantics (copy-on-assignment) are not good enough performance-wise. They require unsafe procs all over the place, which is not ergonomic at all.

I tried copy-on-write as well; see the in-depth (mostly monologue) discussion in issue #157. COW can be implemented with atomic reference counting or a shared/non-shared boolean, but it has a few problems, detailed here:

  • Refcounting/isShared troubles when the tensor is wrapped in a container
  • Performance predictability: "when will it copy or share?" is harder to reason about than always-copy or always-share
  • Workaroundability: since = is overloaded, it is non-trivial to avoid COW; let a = b.unsafeView won't work

So Arraymancer will move to reference semantics: Tensor data is shared by default, and copies must be made explicit.
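
A minimal sketch of the intended behavior, using clone as the explicit-copy proc mentioned below:

import arraymancer

let a = [1, 2, 3, 4].toTensor

var b = a             # assignment shares a's buffer: no copy is made
b[0] = -1             # the write is visible through a as well

var c = a.clone       # the copy must be made explicit
c[1] = -2             # a is unaffected

echo a                # [-1, 2, 3, 4]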

Benefits:

  • CudaTensor already has these semantics
  • No need to sprinkle unsafeSlice all over the place for performance
    • All the unsafe procs can be removed
    • Much less code to maintain
  • Numpy and Julia already work like this
  • Most copies are explicit (except asContiguous and reshape)
    • debugging copy issues will be as simple as grep clone *.nim

Disadvantages:

  • Sharing is implicit: users might forget to use clone and share data by mistake.
    • debugging sharing issues will be harder than a simple grep unsafe *.nim

In the wild

Numpy and Julia have reference semantics, Matlab and R have copy-on-write.

bluenote10 (Contributor) commented Dec 8, 2017

Disadvantages

Isn't the biggest disadvantage that it is no longer clear whether a function call f(data) modifies data or not? I.e., there is no way to see from the call site whether a function will modify a tensor. In practice this means that many calls eventually become f(data.copy()) just to make sure that data isn't modified by f, and so unnecessary copies become ubiquitous. This is one of the biggest issues in the numpy/pandas community, and people try hard to move towards immutable data structures to make complex data flow more controllable. It feels like a missed opportunity to follow the all-mutable behavior of interpreted languages instead of using compile-time guarantees that f(data) cannot modify data.
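
For concreteness, a minimal sketch of the concern, assuming the proposed reference semantics (sumAndReset is a hypothetical user function, not part of the library):

import arraymancer

# The signature takes a non-var Tensor, yet the caller's data can
# still change: rebinding to a var yields a handle to the same buffer.
proc sumAndReset(t: Tensor[int]): int =
  var scratch = t        # shares t's buffer under reference semantics
  for x in scratch:
    result += x
  scratch[0] = 0         # ...so this also mutates the caller's tensor

let input = [1, 2, 3, 4].toTensor
echo sumAndReset(input)  # 10
echo input               # [0, 2, 3, 4]: mutated despite the non-var signature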

mratsim (Owner, Author) commented Dec 9, 2017

All procs that modify the underlying data (or the metadata) in Arraymancer require a var parameter.
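
A minimal sketch of that rule (setFirst is a hypothetical proc, not part of the Arraymancer API):

import arraymancer

# Mutation goes through a var parameter, so the compiler rejects
# calls on immutable (let) tensors.
proc setFirst(t: var Tensor[int], value: int) =
  t[0] = value

var v = [1, 2, 3].toTensor
setFirst(v, 42)        # OK: v is a var

let w = [1, 2, 3].toTensor
# setFirst(w, 42)      # does not compile: w is immutable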

The only gotcha left is:

import arraymancer
let a = [1, 2, 3, 4].toTensor
var b = a          # shares a's buffer under reference semantics
b[2] = 10          # the write is visible through a

echo a # Tensor : [1, 2, 10, 4]

If b were declared as let, this wouldn't compile, so unwanted sharing issues can be found by grepping for var and result variables.
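
A minimal sketch of that compile-time check, under the same assumptions:

import arraymancer
let a = [1, 2, 3, 4].toTensor
let b = a          # still shares a's buffer, but b is immutable
# b[2] = 10        # does not compile: mutating accessors need a var Tensor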

bluenote10 (Contributor) commented Dec 9, 2017

Hm, isn't that gotcha exactly what ruins all guarantees? By the same argument, in the Python world you could simply grep for expressions that modify data, which is almost hopeless in a complex code base.

The biggest need for guaranteed immutability (and the unnecessary-copy workaround) comes from use cases like this: consider that you don't know anything about f, i.e., it is a function provided by a user to some library. Let's assume that this library wants to make sure that data isn't modified when passing it to f. If I understand the behavior correctly, the library must call it via f(data.copy()), because even a non-var function signature does not guarantee that the data is actually immutable. Our Python code bases are full of such copies, which are unnecessary most of the time, but you always have to account for the rare possibility of mutation :(.
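
A minimal sketch of that defensive-copy pattern under the proposed semantics (runPipeline and userCallback are hypothetical names; clone is the explicit-copy proc discussed above):

import arraymancer

# A library that cannot trust a user-supplied callback: even with a
# non-var parameter the callback could rebind the tensor to a var and
# mutate the shared buffer, so the library clones defensively.
proc runPipeline(input: Tensor[int], process: proc (t: Tensor[int])) =
  process(input.clone())  # defensive copy: the callback never sees input's buffer

proc userCallback(t: Tensor[int]) =
  var local = t           # shares t's buffer, but t is already a clone
  local[0] = 999

let input = [1, 2, 3, 4].toTensor
runPipeline(input, userCallback)
echo input                # still [1, 2, 3, 4]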

mratsim (Owner, Author) commented Dec 9, 2017 via email
