
Implement reference semantics for all Tensors #160

Closed

mratsim opened this issue Nov 25, 2017 · 4 comments

Comments

mratsim (Owner) commented Nov 25, 2017

The current value semantics (copy-on-assignment) are not good enough performance-wise. They require unsafe procs all over the place, which is not ergonomic at all.

I tried copy-on-write as well; see the in-depth (mostly monologue) discussion in issue #157. COW can be implemented with atomic reference counting or a shared/non-shared boolean, but it has a few problems, detailed here:

  • Refcounting/isShared troubles when the tensor is wrapped in a container
  • Performance predictability: "when will it copy or share?" is harder to reason about than always-copy or always-share
  • Workaroundability: since = is overloaded, it is non-trivial to avoid COW; let a = b.unsafeView won't work

So Arraymancer will move to reference semantics: Tensor data is shared by default, and copies must be made explicit.
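
A minimal sketch of the intended behavior, using clone as the explicit-copy proc mentioned below:

import arraymancer

let a = [1, 2, 3, 4].toTensor

var b = a             # assignment shares a's buffer: no copy is made
b[0] = -1             # the write is visible through a as well

var c = a.clone       # the copy must be made explicit
c[1] = -2             # a is unaffected

echo a                # [-1, 2, 3, 4]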

Benefits:

  • CudaTensor already has these semantics
  • No need to sprinkle unsafeSlice all over the place for performance
    • All the unsafe procs can be removed
    • Much less code to maintain
  • Numpy and Julia already work like this
  • Most copies are explicit (except asContiguous and reshape)
    • debugging copy issues will be as simple as grep clone *.nim

Disadvantages:

  • Sharing is implicit: users might forget to use clone and share data by mistake.
    • debugging sharing issues will be harder than a simple grep unsafe *.nim

In the wild

Numpy and Julia have reference semantics, Matlab and R have copy-on-write.

bluenote10 (Contributor) commented Dec 8, 2017

Disadvantages

Isn't the biggest disadvantage that it is no longer clear whether a function call f(data) modifies data or not? I.e., there is no way to see from the call site whether a function will modify a tensor. In practice this means that many calls eventually become f(data.copy()) just to make sure that data isn't modified by f, and so unnecessary copies become ubiquitous. This is one of the biggest issues in the numpy/pandas community, and people try hard to move towards immutable data structures to make complex data flow more controllable. It feels like a missed opportunity to follow the all-mutable behavior of interpreted languages instead of using compile-time guarantees that f(data) cannot modify data.
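
For concreteness, a minimal sketch of the concern, assuming the proposed reference semantics (sumAndReset is a hypothetical user function, not part of the library):

import arraymancer

# The signature takes a non-var Tensor, yet the caller's data can
# still change: rebinding to a var yields a handle to the same buffer.
proc sumAndReset(t: Tensor[int]): int =
  var scratch = t        # shares t's buffer under reference semantics
  for x in scratch:
    result += x
  scratch[0] = 0         # ...so this also mutates the caller's tensor

let input = [1, 2, 3, 4].toTensor
echo sumAndReset(input)  # 10
echo input               # [0, 2, 3, 4]: mutated despite the non-var signature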

mratsim (Owner, Author) commented Dec 9, 2017

All procs that modify the underlying data (or the metadata) in Arraymancer require a var parameter.
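
A minimal sketch of that rule (setFirst is a hypothetical proc, not part of the Arraymancer API):

import arraymancer

# Mutation goes through a var parameter, so the compiler rejects
# calls on immutable (let) tensors.
proc setFirst(t: var Tensor[int], value: int) =
  t[0] = value

var v = [1, 2, 3].toTensor
setFirst(v, 42)        # OK: v is a var

let w = [1, 2, 3].toTensor
# setFirst(w, 42)      # does not compile: w is immutable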

The only gotcha left is:

import arraymancer
let a = [1, 2, 3, 4].toTensor
var b = a          # shares a's buffer under reference semantics
b[2] = 10          # the write is visible through a

echo a # Tensor : [1, 2, 10, 4]

If b were declared as let, this wouldn't compile, so unwanted sharing issues can be found by grepping for var and result variables.
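
A minimal sketch of that compile-time check, under the same assumptions:

import arraymancer
let a = [1, 2, 3, 4].toTensor
let b = a          # still shares a's buffer, but b is immutable
# b[2] = 10        # does not compile: mutating accessors need a var Tensor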

bluenote10 (Contributor) commented Dec 9, 2017

Hm, isn't that gotcha exactly what ruins all guarantees? By the same argument, in the Python world you could simply grep for expressions that modify data, which is almost hopeless in a complex code base.

The biggest need for guaranteed immutability (and the unnecessary-copy workaround) comes from use cases like this: consider that you don't know anything about f, i.e., it is a function provided by a user to some library. Let's assume that this library wants to make sure that data isn't modified when passing it to f. If I understand the behavior correctly, the library must call it via f(data.copy()), because even a non-var function signature does not guarantee that the data is actually immutable. Our Python code bases are full of such copies, which are unnecessary most of the time, but you always have to account for the rare possibility of mutation :(.
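
A minimal sketch of that defensive-copy pattern under the proposed semantics (runPipeline and userCallback are hypothetical names; clone is the explicit-copy proc discussed above):

import arraymancer

# A library that cannot trust a user-supplied callback: even with a
# non-var parameter the callback could rebind the tensor to a var and
# mutate the shared buffer, so the library clones defensively.
proc runPipeline(input: Tensor[int], process: proc (t: Tensor[int])) =
  process(input.clone())  # defensive copy: the callback never sees input's buffer

proc userCallback(t: Tensor[int]) =
  var local = t           # shares t's buffer, but t is already a clone
  local[0] = 999

let input = [1, 2, 3, 4].toTensor
runPipeline(input, userCallback)
echo input                # still [1, 2, 3, 4]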

mratsim (Owner, Author) commented Dec 9, 2017 via email
