Skip to content
This repository has been archived by the owner on Jul 10, 2023. It is now read-only.

Dealing with shared ownership (Rc and Arc) #37

Open
SimonSapin opened this issue Feb 10, 2016 · 20 comments
Open

Dealing with shared ownership (Rc and Arc) #37

SimonSapin opened this issue Feb 10, 2016 · 20 comments

Comments

@SimonSapin
Copy link
Member

A central idea behind this crate is that most resources in Rust have one well-defined owner. heap_size_of_children can trace recursively and only consider owned resources. For &T references it can return zero since they usually either point to non-heap memory, or to heap memory owned by something else that will be accounted for there.

This breaks down for types like Rc<T> and Arc<T> that introduce "shared ownership". They do have heap memory that we want to count, but in a fully generic context there is no clear single point where it should be counted.

(In a more specific context there count be a "main" Rc reference for a given resource. For example, following only the "first child" and "next sibling" references in a Kuchiki tree allows traversing the whole tree without duplication.)

In the current version (90dd7e0) this crate has:

impl<T: HeapSizeOf> HeapSizeOf for Arc<T> {
    fn heap_size_of_children(&self) -> usize {
        (**self).heap_size_of_children()
    }
}

This impl is returns an incorrect number for two reasons:

  • It’s too low because it only counts the size of resources owned by T, not the size of T itself and the two reference counts that Arc holds internally. This part is easy to fix.

  • It’s too high because there could be many Arc references to the same T value, and this impl duplicates by that many times the count of memory owned by T.

    One idea that I’ve heard/read (I don’t remember from who, sorry) was to divide by the strong reference count. If all strong references are traced correctly these fractions should add up to the correct value. But the usize return type would have to be changed to f32 or f64 to get remotely accurate results, which is weird at best. And Arc::strong_count and Rc::strong_count are still #[unstable]: Tracking issue for Arc/Rc extras rust-lang/rust#28356

The current version of this crate has another impl:

impl<T> HeapSizeOf for Vec<Rc<T>> {
    fn heap_size_of_children(&self) -> usize {
        // The fate of measuring Rc<T> is still undecided, but we still want to measure
        // the space used for storing them.
        unsafe {
            heap_size_of(self.as_ptr() as *const c_void)
        }
    }
}

My understanding of it as that, in practice, when we can’t have a correct impl of HeapSizeOf, Servo sometimes implement it (incorrectly) returning zero because an underestimated answer is better(?) than no answer.

This Vec<Rc<T>> follows this idea, just making the error smaller: don’t count Rc’s but count the Vec containing them.

Still, I don’t think there’s a reason to treat Rc<T> and Arc<T> differently.

@jdm
Copy link
Member

jdm commented Feb 10, 2016

Yeah, that's an odd inconsistency. I'm surprised we have an implementation for Arc at all.

@asajeffrey
Copy link
Member

If heapsize is going to cope with Rc<T> it looks like it would have to implement a graph-marking algorithm. This would mean something like keeping track of a set of pointers, then on reaching an Rc<T>, checking to see if the pointer is in the set: if it is, return a constant, and if it isn't, add it to the set before recursively calculating the size. This makes calculating heap size quite expensive, as it means keeping track of the set of all visited Rcs.

If we could guarantee that only one heapsize calculation at a time was needed, we could try to cram the marker into the Rc, but this would have to be done unsafely since the heap_size_of_children function only has &self, not &mut self. Arc is probably even trickier, since we'd have to ensure thread-safe access to the marker.

@jdm
Copy link
Member

jdm commented Feb 10, 2016

@asajeffrey What would be the end goal of maintaining that graph?

@asajeffrey
Copy link
Member

Only visiting each Rc<T> once. The pathalogical case is something like:

enum BTree {
    Leaf,
    Node(Rc<BTree>, Rc<BTree>),
}

let tree0 = Rc::new(BTree::Leaf);
let tree1 = Rc::new(BTree::Node(tree0.clone(),tree0.clone()));
let tree2 = Rc::new(BTree::Node(tree1.clone(),tree1.clone()));
...

If you use tree-traversal on treeN, you visit 2^N nodes, with graph-marking you just visit N.

@SimonSapin
Copy link
Member Author

And it’s not (just) about traversal time, it’s to avoid duplicating the count of memory used by T in Arc<T>.

@asajeffrey
Copy link
Member

Indeed, we can avoid having to divide by the ref count by only counting each Rc once.

@nnethercote
Copy link

Firefox's memory profiling still doesn't handle ref-counted types well. For some of the types we return the full size if the refcount is 1, and 0 otherwise. I've thought about doing the divide-by-the-refcount idea (and using integer division) but never gotten around to it. I think using fractional types would be silly and not worth it.

In general it is indeed better to under-measure than over-measure, because you can get the total heap size (from the allocator) and then subtract the size of all known heap data structures and end up with an "unknown" bucket (called "heap-unclassified" in Firefox's about:memory) and you want that "unknown" bucket to be accurate. If you over-measure it could go negative, which is confusing. (Double-counting any memory can have similarly bad effects.)

In general, ensuring correctness of these measurements is a hard problem. When your code says your memory usage is X, how can you trust that? Firefox has a tool called DMD (https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD) that verifies memory reporting by identifying which heap blocks are measure zero times, 1 times, and more than 1 times. It's really important for improving coverage and debugging the measurement code. I was hoping to implement something similar for Rust/Servo but never got around to it, at least partly because it requires (a) wrapping allocations and (b) getting and saving lots of stack traces, both of which are tricky in Rust.

@SimonSapin
Copy link
Member Author

Aww, what’s wrong with reporting 128.00000000000000004 bytes of memory usage? :)

@SimonSapin
Copy link
Member Author

Another "interesting" thing is that Rc and Arc can introduce cycles.

@asajeffrey
Copy link
Member

If we're dealing with cyclic structures, then pretty much we have to either do graph marking, or ignore any Rc with refcount > 1. Are we concerned that ignoring Rc with refcount > 1 would cause reported size to change as Rc's are cloned / dropped?

@SimonSapin
Copy link
Member Author

I think I’d prefer not to have a generic impl<T: HeapSizeOf> HeapSizeOf for Arc<T> (which can not assume anything about how Arc/Rc is used) in this crate at all, but rather answer these questions case by case in specific contexts.

@larsbergstrom
Copy link
Contributor

@asajeffrey I assume that there's also a third point there - you can choose not to ignore any Rc with a refcount > 1 if it's instantiated at a type that can't be cyclic, right? Though I suspect that's getting into a level of plugin wizardry that's a bit silly.

@asajeffrey
Copy link
Member

@larsbergstrom yes, you could try to be clever and get metadata about the type, like can it introduce cycles, but it seems like it would be tricky to get to fly without over-engineering it.

@larsbergstrom
Copy link
Contributor

@asajeffrey I just thought it might be worth looking at, since a lot of the stuff we Rc in servo is basically a heap-allocated integer, a hash table instantiated at immediate types, etc. There may be very few cases where it's possible to create cycles.

@asajeffrey
Copy link
Member

As a hack, we could abuse the refcount, with something like:

   if (refcount < LARGE_NUMBER) {
      refcount += LARGE_NUMBER;
      recursively_calculate_heapsize();
      refcount -= LARGE_NUMBER;
   }

Yuck. It would probably work, but yuck.

@matthewhammer
Copy link

@asajeffrey @larsbergstrom @SimonSapin @nnethercote @jdm Thanks for the really helpful discussion! I appreciate it! My original question to @larsbergstrom and @asajeffrey (over email) was why Rc<_> wasn't implementing this HeapSizeOf trait, but now I have a much better appreciation of the issues involved. It seems tricky to define this trait for shared structures without complicating it with a way to maintain its traversal history.

The context of my original question is a library for Adapton in Rust (See http://adapton.org and https://github.com/cuplv/adapton.rust specifically). My use of Rc in that library may be simpler than in servo, but sharing is widespread, and many uses of Rc are used to create linked structures (like lists) that share common suffixes.

In many cases, I can traverse the shared Rcs such that I visit each only once. This is possible for those Rcs that are indexed by a large global hash map (which implements the library's "memo table", used for incremental reuse, the principle purpose of the library). This memo table creates sharable nodes that point at one another, and are guaranteed to always form a DAG (specifically, there are no cycles, except for well-defined "back edges" that I will simply not traverse). In general, traversing a general DAG while avoiding double-visits requires storing a history of visited nodes (either as a set, or with mutable bits, etc.), but in the case of Adapton in Rust, since the nodes are all indexed by a hash map, the traversal problem is really easy: I just visit each node in the hash table, and do not follow its outgoing edges to other nodes, which all reside in the same table.

However, there are many other Rcs that the library uses to create list-like keys that index this memo table, and those lists share common suffixes. I need to think about a traversal strategy that avoids double-visiting these suffixes, and though it seems doable at the moment, it does not seem obvious to me yet. I'll give it some more thought, since the non-double-visit traversal seems to require something more clever than the traversal for the table of DAG nodes.

In sum, what is clear to me now is that I need to exploit the specific invariants and structure of my library to do a non-double-visit traversal of its DAG structure of memoized nodes. Further, I need to think about what invariants will help me do something clever and (hopefully equally) simple to traverse the keys that index these memoized nodes.

Thanks again for the discussion!

@SimonSapin
Copy link
Member Author

In many cases, I can traverse of the shared Rc<_>s such that I visit each only once. This is possible for those Rc<_>s that are indexed by a large global hash map

[…]

I need to exploit the specific invariants and structure of my library to do a non-double-visit traversal

Yes, this is the kind of thing I had in mind with “case by case specific contexts”.

@matthewhammer
Copy link

@SimonSapin Yes. This stance ("exploit the invariants of the application domain, and do not attempt to give a generic solution") seems right to me now as well, for what it's worth. ;)

SimonSapin added a commit that referenced this issue Mar 6, 2016
This crate has always given a lower bound of the actual heap memory usage:
we use `#[ignore_heap_size_of]` when not doing it is not practical.

This fix #37 with one of the ideas given in the discussion there:
return an accurate result when the reference count is 1
(and ownership is in fact not shared), or zero otherwise.

However, since APIs to access reference counts are not `#[stable]` yet
(see rust-lang/rust#28356),
so always return zero when unstable APIs are not enabled.
@SimonSapin
Copy link
Member Author

I’m currently changing Servo so that it can build with impl<T: HeapSizeOf> HeapSizeOf for Arc<T> removed. The plan (discussed in person with @nox) is to remove that implementation. The issues with it are described in the first message of this issue.

SimonSapin added a commit to servo/servo that referenced this issue Sep 27, 2016
SimonSapin added a commit to servo/servo that referenced this issue Sep 27, 2016
SimonSapin added a commit to servo/servo that referenced this issue Sep 28, 2016
SimonSapin added a commit to servo/servo that referenced this issue Sep 28, 2016
bors-servo pushed a commit to servo/servo that referenced this issue Sep 28, 2016
Use parking_lot::RwLock for PropertyDeclarationBlock

<!-- Please describe your changes on the following line: -->

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1305141

Closes #13176

---

Original PR title: Stop relying on `impl<T: HeapSizeOf> HeapSizeOf for Arc<T>`
servo/heapsize#37 (comment)

This builds on top of that.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #__ (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because refactor

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/13459)
<!-- Reviewable:end -->
bors-servo pushed a commit to servo/servo that referenced this issue Sep 28, 2016
Use parking_lot::RwLock for PropertyDeclarationBlock

<!-- Please describe your changes on the following line: -->

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1305141

Closes #13176

---

Original PR title: Stop relying on `impl<T: HeapSizeOf> HeapSizeOf for Arc<T>`
servo/heapsize#37 (comment)

This builds on top of that.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #__ (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because refactor

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/13459)
<!-- Reviewable:end -->
SimonSapin added a commit to servo/servo that referenced this issue Oct 4, 2016
SimonSapin added a commit to servo/servo that referenced this issue Oct 4, 2016
bors-servo pushed a commit to servo/servo that referenced this issue Oct 4, 2016
Use parking_lot::RwLock for PropertyDeclarationBlock

<!-- Please describe your changes on the following line: -->

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1305141

Closes #13176

---

Original PR title: Stop relying on `impl<T: HeapSizeOf> HeapSizeOf for Arc<T>`
servo/heapsize#37 (comment)

This builds on top of that.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #__ (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because refactor

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/13459)
<!-- Reviewable:end -->
bors-servo pushed a commit to servo/servo that referenced this issue Oct 4, 2016
Use parking_lot::RwLock for PropertyDeclarationBlock

<!-- Please describe your changes on the following line: -->

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1305141

Closes #13176

---

Original PR title: Stop relying on `impl<T: HeapSizeOf> HeapSizeOf for Arc<T>`
servo/heapsize#37 (comment)

This builds on top of that.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #__ (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because refactor

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/13459)
<!-- Reviewable:end -->
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Feb 4, 2017
…Block (from servo:no-arc-heapsize); r=emilio

<!-- Please describe your changes on the following line: -->

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1305141

Closes #13176

---

Original PR title: Stop relying on `impl<T: HeapSizeOf> HeapSizeOf for Arc<T>`
servo/heapsize#37 (comment)

This builds on top of that.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #__ (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because refactor

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

Source-Repo: https://github.com/servo/servo
Source-Revision: aea9545e16fd6ea4a6b1234d1b969457313a5fa7

--HG--
rename : servo/components/style/domrefcell.rs => servo/components/script/dom/bindings/cell.rs
@nnethercote
Copy link

FWIW, in malloc_size_of I've handled Rc/Arc by allowing users provide a table that records which pointers have been measured. This is working well for Stylo.

gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 1, 2019
…Block (from servo:no-arc-heapsize); r=emilio

<!-- Please describe your changes on the following line: -->

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1305141

Closes #13176

---

Original PR title: Stop relying on `impl<T: HeapSizeOf> HeapSizeOf for Arc<T>`
servo/heapsize#37 (comment)

This builds on top of that.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #__ (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because refactor

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

Source-Repo: https://github.com/servo/servo
Source-Revision: aea9545e16fd6ea4a6b1234d1b969457313a5fa7

UltraBlame original commit: 0e74d9fe18cf1cbbab7d7410027a051f94eac90f
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 1, 2019
…Block (from servo:no-arc-heapsize); r=emilio

<!-- Please describe your changes on the following line: -->

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1305141

Closes #13176

---

Original PR title: Stop relying on `impl<T: HeapSizeOf> HeapSizeOf for Arc<T>`
servo/heapsize#37 (comment)

This builds on top of that.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #__ (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because refactor

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

Source-Repo: https://github.com/servo/servo
Source-Revision: aea9545e16fd6ea4a6b1234d1b969457313a5fa7

UltraBlame original commit: 0e74d9fe18cf1cbbab7d7410027a051f94eac90f
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 1, 2019
…Block (from servo:no-arc-heapsize); r=emilio

<!-- Please describe your changes on the following line: -->

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1305141

Closes #13176

---

Original PR title: Stop relying on `impl<T: HeapSizeOf> HeapSizeOf for Arc<T>`
servo/heapsize#37 (comment)

This builds on top of that.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `__` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [ ] These changes fix #__ (github issue number if applicable).

<!-- Either: -->
- [ ] There are tests for these changes OR
- [x] These changes do not require tests because refactor

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

Source-Repo: https://github.com/servo/servo
Source-Revision: aea9545e16fd6ea4a6b1234d1b969457313a5fa7

UltraBlame original commit: 0e74d9fe18cf1cbbab7d7410027a051f94eac90f
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants