Rc/Arc: get rid of "value" terminology #64484

Closed
RalfJung opened this issue Sep 15, 2019 · 19 comments

@RalfJung
Member

The Rc docs use the term "value" to refer to an RcBox instance (and similar for Arc). That's IMO a bad use of terminology: A "value" is something like "5" or "true" that does not have an identity beyond its mathematical interpretation; an RcBox has a location so that even two distinct RcBox that contain the same value (say, both contain "5") are "not the same".

This is particularly bad in the docs for ptr_eq:

Returns true if the two Rcs point to the same value (not just values that compare as equal).

"5" and "5" are the same value, and yet Rc::ptr_eq(&Rc::new(5), &Rc::new(5)) returns false. So IMO the docs are just wrong -- or rather, they are using the term "value" in the wrong way.
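
The mismatch described above can be reproduced with a small sketch. The helper name `compare` is made up for this illustration and is not part of the standard library:

```rust
use std::rc::Rc;

/// Hypothetical helper: returns (contents compare equal, pointers identical).
fn compare(a: &Rc<i32>, b: &Rc<i32>) -> (bool, bool) {
    (a == b, Rc::ptr_eq(a, b))
}

fn main() {
    let a = Rc::new(5);
    let b = Rc::new(5);
    let a2 = Rc::clone(&a);

    // Two separate Rc::new calls: equal contents, different pointers.
    assert_eq!(compare(&a, &b), (true, false));
    // A clone shares its pointer with the Rc it was cloned from.
    assert_eq!(compare(&a, &a2), (true, true));
}
```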

I suggest that we replace all/most uses of "value" in these docs by "reference-counted object" or maybe something involving "instance". I think that better conveys what is happening.

Opinions? Cc @Centril @SimonSapin @gnzlbg

@RalfJung added the A-docs label (Area: Documentation for any part of the project, including the compiler, standard library, and tools) on Sep 15, 2019

@gnzlbg
Contributor

gnzlbg commented Sep 15, 2019

In my mental model Rc<T>, Arc<T>, etc. are "pointers" that point to a "value" of type T, such that Rc::ptr_eq does pointer equality in an analogous way to what *mut T as usize == *mut T as usize does for raw pointers.
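
A rough sketch of this analogy, assuming a toolchain recent enough to have `Rc::as_ptr` (which was stabilized after this thread). The helper `same_address` is hypothetical and only mirrors the raw-pointer comparison by hand; it is not how `Rc::ptr_eq` is necessarily implemented:

```rust
use std::rc::Rc;

/// Hypothetical helper: compares the addresses of the pointed-to `T`s,
/// written out the way one would for raw pointers.
fn same_address<T>(a: &Rc<T>, b: &Rc<T>) -> bool {
    Rc::as_ptr(a) as usize == Rc::as_ptr(b) as usize
}

fn main() {
    let a = Rc::new(String::from("hi"));
    let b = Rc::clone(&a);
    let c = Rc::new(String::from("hi"));

    assert!(same_address(&a, &b));
    assert!(!same_address(&a, &c));

    // For these sized types the hand-written check agrees with Rc::ptr_eq.
    assert_eq!(same_address(&a, &b), Rc::ptr_eq(&a, &b));
    assert_eq!(same_address(&a, &c), Rc::ptr_eq(&a, &c));
}
```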

I understand that these types contain a raw pointer which does not actually point to a T, but to some other type that stores a T along with some meta-data (e.g. reference counts). I don't think that calling this type "value" is a good idea. If we don't have to, I'd rather not mention that such a type exists at all. How these types are actually implemented under the hood is an implementation-detail.

@RalfJung
Member Author

Agreed. I was just looking for a term for what this pointer points to. It doesn't have to be a type, but I think "reference-counted object" is a reasonable choice. Would you agree?

@steveklabnik
Member

A "value" is something like "5" or "true" that does not have an identity beyond its mathematical interpretation;

This might be true in some domains, but not Rust.

I think "reference-counted object" is a reasonable choice. Would you agree?

Rust doesn't have "objects", though we do in the sense you probably intend.

I can appreciate trying to add clarity, but I think the current wording is a good balance between precision and understandability.

@Centril
Contributor

Centril commented Sep 15, 2019

I agree that the current wording is confusing by using the term "value" in two different senses in the same sentence. I would suggest:

Returns true if the two Rc<T>s point to the same reference-counted object rather than just the same value boxed in e.g. Rc::new(value).

I think we can afford a bit more elaboration than the current docs provide.

@gnzlbg
Contributor

gnzlbg commented Sep 15, 2019

"reference-counted object"

It's unclear to me whether "reference-counted object" refers to the T, or to the object storing the reference count. If it refers to the object that stores the reference count, I think this is fine, but as mentioned I would prefer not to mention such an implementation detail here.

If it refers to the T, AFAICT, Rc::ptr_eq does not require a value of type T to exist at all. That is, I think the assert in this unsafe code is guaranteed to pass (playground):

    use std::rc::Rc;

    fn main() {
        unsafe {
            // create two smart pointers to a T
            let mut rc0 = Rc::new(Box::new(0_i32));
            let ptr: *mut Box<i32> = Rc::get_mut(&mut rc0).unwrap();
            let rc1 = rc0.clone();
            // drop the T
            std::ptr::drop_in_place(ptr);
            // those two smart pointers still exist
            dbg!(Rc::strong_count(&rc0)); // prints 2
            // they also compare equal
            assert!(Rc::ptr_eq(&rc0, &rc1)); // ALWAYS OK
            // put a new T behind the Rc
            ptr.write(Box::new(42));
        }
    }

A problem with the term "object" is that we have not defined what that means in Rust anywhere. I'm not sure if "place" would be the right word, but maybe we can just say that "ptr_eq returns whether two Rcs point to the same place" ?

@RalfJung
Member Author

RalfJung commented Sep 16, 2019

@steveklabnik

This might be true in some domains, but not Rust.

I am pretty sure it is. Rust has "values" and "places", and that use of the term "value" aligns with what I said. This is also how we are using the term "value" in the UCG. Both of these uses of "value" also do not align with what Rc does.

So why are you saying this does not apply to Rust?

Rust doesn't have "objects", though we do in the sense you probably intend.

Fair. We do use the term "allocated object" in some places though, e.g. for pointer::offset.

but I think the current wording is a good balance between precision and understandability.

I still disagree, I think the current wording is not just imprecise but plain wrong.

@Centril

Returns true if the two Rcs point to the same reference-counted object rather than just the same value boxed in e.g. Rc::new(value).

Yeah, something like that -- though I would avoid "boxed". Also, this change affects many Rc operations, not just ptr_eq.

@gnzlbg

If it refers to the T, AFAICT, Rc::ptr_eq does not require a value of type T to exist at all. That is, I think the assert in this unsafe code is guaranteed to pass (playground):

I'd say unless ptr_eq documents this, that code can become UB any time.

A problem with the term "object" is that we have not defined what that means in Rust anywhere. I'm not sure if "place" would be the right word, but maybe we can just say that "ptr_eq returns whether two Rcs point to the same place" ?

Yeah, "object" might be not great, which is why I opened an issue and not a PR. ;)
"place" might work. Alternatively, we could use some terminology similar to Vec and speak about the "heap-allocated buffer" or so where the data is stored.

@gnzlbg
Contributor

gnzlbg commented Sep 16, 2019

and speak about the "heap-allocated buffer" or so where the data is stored.

That sounds reasonable to me.

I'd say unless ptr_eq documents this, that code can become UB any time.

Indeed. Rc::ptr_eq only compares the addresses of the buffer where the T is stored. So, for example, if we were to implement it in such a way that it created &Ts internally, then those would be invalid if the T has been dropped, resulting in UB.

@steveklabnik
Member

that use of the term "value" aligns with what I said.

I don't see anything in those that suggests any sort of demands around identity; Rust clearly has values, it's the "does not have any identity beyond mathematical interpretation" bit that I disagree with.

I tend to see this language around OOP languages that are adding "value types" to the currently existing "object types", but Rust doesn't have those. It's all just values.

@RalfJung
Member Author

RalfJung commented Sep 16, 2019

I don't see anything in those that suggests any sort of demands around identity; Rust clearly has values, it's the "does not have any identity beyond mathematical interpretation" bit that I disagree with.

I see, so this is about the identity on values, not what they refer to? But, if "5" is a value in Rust (in let x = 5, for example), then how can there be anything else that makes up its identity? I mean, for two things to be distinct, there has to be something about them that's different, and when just talking about the abstract "5" vs. the abstract "5", I cannot think of what that would be. For pointers we have the notion that two pointers pointing to the same memory location can have distinct provenance, but that would not apply to something like "5" (and for provenance, what really happens is that the mathematical concept of a "pointer" is just way more complicated than most people think).

Also, we usually speak of a value being "stored" somewhere, e.g. the UCG glossary says a value is what gets stored inside a place, and the ptr::eq docs use similar wording. That implies the value is the thing inside the "container" (memory, place, whatever you want to call it), and wouldn't it be rather strange for the identity of that thing to depend on the container? That goes against the very concept of being the "content", as I view it.

I also think that there is no good reason to make value have an extra identity. We clearly need a term for the thing that does not have an extra identity, to be able to talk about "a value stored somewhere", to define the "value representation", and so on -- all those uses of the term "value" refer to an abstract mathematical notion of "value" that is independent of how or where such a value is concretely represented as a sequence of bytes. If we don't call that abstract thing a "value", what do we call it? We had a long bikeshed about this in the UCG and "value" was the best term we found.

From what I can see, the Rc/Arc docs are the only place where we use "value" in a sense that they have extra identity beyond "mathematical value". You claim "this [not having extra identity] is not true in Rust", but do you have any evidence beyond the docs we are discussing here (which I think are just a local mistake we can and should fix)?

Also note that ptr::eq and Rc::ptr_eq actually use inconsistent language when it comes to this: ptr::eq emphasizes that it does not compare the "value they [the pointers] point to", which is the exact opposite of Rc::ptr_eq which says it tests "if the two Rc point to the same value". Would you say ptr::eq's doc is wrong? They can't both be right.

I tend to see this language around OOP languages that are adding "value types" to the currently existing "object types", but Rust doesn't have those. It's all just values.

I have no idea what this has to do with "5" being not the same value as "5"? Indeed Rust is all just values, but the values are what gets stored inside some place in memory, and a "5" stored in two different variables is still the same "value" stored multiple times in different "places".

A different form of identity comes in when talking about a pointer to 5, or more precisely a pointer to some memory location that stores 5. Clearly, two distinct pointers can both point to 5. But this does not contradict "every variable stores a value"; the variable of pointer/reference type just stores a value that is "this address in memory [and this provenance]", i.e., the mathematical value is something like 0x0123400[+provenance], and that is what ptr::eq compares. Two pointers can be different mathematical values (different address or different provenance) but still both have the same value stored at the memory they point to. The value of a variable of pointer/reference type still does not have extra identity beyond this (address it points to + provenance); in particular such a variable does not "change its value" when the memory it points to gets changed.
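
The distinction in the paragraph above can be sketched with `std::ptr::eq`, which compares addresses only, so provenance is not observable in this example. The helper `compare` is a made-up name for illustration, and `Cell` is used only so the pointee can be changed through a shared reference:

```rust
use std::cell::Cell;
use std::ptr;

/// Hypothetical helper: returns (pointers equal, pointee values equal).
fn compare(a: &Cell<i32>, b: &Cell<i32>) -> (bool, bool) {
    (ptr::eq(a, b), a.get() == b.get())
}

fn main() {
    let a = Cell::new(5);
    let b = Cell::new(5);

    // Distinct pointer values whose pointees store the same value.
    assert_eq!(compare(&a, &b), (false, true));

    // Remember the pointer value, then change the memory it points to.
    let before: *const Cell<i32> = &a;
    a.set(6);

    // The pointer value is unchanged; only the stored value differs now.
    assert!(ptr::eq(before, &a));
    assert_eq!(compare(&a, &b), (false, false));
}
```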

This is all getting a bit long-winded because I cannot entirely figure out where our conceptual mismatch is arising from here.

@steveklabnik
Member

steveklabnik commented Sep 17, 2019

Indeed Rust is all just values, but the values are what gets stored inside some place in memory, and a "5" stored in two different variables is still the same "value" stored multiple times in different "places".

I think this is the core of it; I would say that there are two different values, both of which have a bit pattern that corresponds to 5, but they're not conceptually the same thing. That is, identity is memory location.

This is all getting a bit long-winded because I cannot entirely figure out where our conceptual mismatch is arising from here.

I cannot either; I suspect that it's coming from two different terminology traditions. That's usually what happens in these kinds of situations, anyway.

Re-reading the original language again, I think that I have a tolerance for some level of informalism, and so that's why I was fine with the original language, but analyzing more closely, "points to the same value" is kind of a no-op, a re-statement of the same thing, because "points to the same place" is the same as "the same value." It only works because of the contrast with "not just values that compare as equal."

"reference-counted object" or maybe something involving "instance".

My main beef here is that "object" and "instance" will convey the wrong message to the folks coming from OOP languages.

What do you think about:

Returns true if the two Rcs point to the same place, rather than comparing by equality.

This keeps the "place expression" terminology around memory locations, does not use the "value" term we disagree about, and doesn't bring up terms with OOP baggage.

@Centril
Contributor

Centril commented Sep 17, 2019

Some discussion in https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/rc-terminology.

I think we should go with:

Returns true if the two Rc<T>s point to the same allocation, rather than comparing the values with PartialEq of T.
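
To illustrate why the proposed wording separates "same allocation" from "comparing with PartialEq of T", here is a small sketch showing that the two answers can differ in either direction. The helper `relation` is hypothetical, and NaN is used only because it is a familiar value that does not compare equal to itself:

```rust
use std::rc::Rc;

/// Hypothetical helper: returns (same allocation, inner values equal per PartialEq).
fn relation<T: PartialEq>(a: &Rc<T>, b: &Rc<T>) -> (bool, bool) {
    (Rc::ptr_eq(a, b), **a == **b)
}

fn main() {
    let five = Rc::new(5);
    let nan = Rc::new(f64::NAN);

    // Same allocation, equal inner values.
    assert_eq!(relation(&five, &Rc::clone(&five)), (true, true));
    // Different allocations, equal inner values.
    assert_eq!(relation(&five, &Rc::new(5)), (false, true));
    // Same allocation, yet the inner value is not equal to itself.
    assert_eq!(relation(&nan, &Rc::clone(&nan)), (true, false));
    // Different allocations, different inner values.
    assert_eq!(relation(&five, &Rc::new(6)), (false, false));
}
```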

@RalfJung
Member Author

I think this is the core of it; I would say that there are two different values, both of which have a bit pattern that corresponds to 5, but they're not conceptually the same thing. That is, identity is memory location.

But then what is the name for the thing represented by the bit pattern? The thing that is the same?
Reading the reference, expressions denoting such things are called "value expression", so I'd say such things ought to be called "values".

My main beef here is that "object" and "instance" will convey the wrong message to the folks coming from OOP languages.

I actually think the message conveyed is basically right ("good enough"). E.g., comparing two Integer with == in Java is exactly like comparing two Rc<i32> with ptr_eq: both check whether this is the same instance, not two instances of/for the same integer "value".

Returns true if the two Rcs point to the same allocation, rather than comparing the values with PartialEq of T.

Technically not just the allocations have to be the same but also the offsets... but Rc invariants guarantee that. So yeah, "allocation" also works for me.

@gnzlbg
Contributor

gnzlbg commented Oct 17, 2019

I've opened rust-lang/unsafe-code-guidelines#213 to track providing a precise definition for "allocation" at some point, but in the meantime I don't think that resolving that issue should block trying to land this since I find the term "allocation" in the context of Rc hard to misunderstand, but we can always try to explain it a bit more if users start having questions.

@Centril
Contributor

Centril commented Oct 17, 2019

@steveklabnik Do you have any objections to my proposed wording?

@steveklabnik
Member

I had actually thought I had 👍 'd it; it looks good to me.

@Centril
Contributor

Centril commented Oct 17, 2019

Cool; let's see... who is going to implement this? :D @RalfJung maybe wants to?

@RalfJung
Member Author

RalfJung commented Oct 17, 2019 via email

@RalfJung
Member Author

Submitted a PR: #65505

Centril added a commit to Centril/rust that referenced this issue Oct 19, 2019
Rc: value -> allocation

See rust-lang#64484. This does not yet edit `Arc` as I first wanted to be sure we agree on the terminology the way it actually ends up. "value" as a term appears a lot in this file, and sometimes it refers to the value stored inside the `RcBox` while sometimes it refers to the `RcBox` itself. I tried to properly tease these apart but may have made some mistakes. The former should now always be called "inner value" and the latter "allocation".

One area where I was very unsure of the terminology is dropping: the `value` field of the `RcBox` will get dropped *earlier* than the `RcBox` itself if there are weak references. I decided that "dropping the value stored in the allocation" refers to dropping the value field, while "destroying the allocation" refers to actually freeing its backing memory.

r? @Centril
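
The difference between dropping the inner value and destroying the allocation, as described in the commit message, can be observed from safe code. This is only a sketch: the `Noisy` type and the function `after_last_strong_drop` are made up for the illustration, and the freeing of the backing memory itself is not directly observable here.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical payload that records when it is dropped.
struct Noisy<'a>(&'a Cell<bool>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// Returns (inner value dropped?, weak handle still upgradable?) as observed
/// after the last strong handle is gone while a weak handle remains.
fn after_last_strong_drop() -> (bool, bool) {
    let dropped = Cell::new(false);
    let strong = Rc::new(Noisy(&dropped));
    let weak = Rc::downgrade(&strong);

    // Dropping the last strong handle drops the inner value right away.
    drop(strong);

    // The weak handle still refers to the allocation, but it can no longer
    // be upgraded because the inner value is gone.
    let upgradable = weak.upgrade().is_some();
    (dropped.get(), upgradable)
    // `weak` goes out of scope here; only then can the allocation be freed.
}

fn main() {
    assert_eq!(after_last_strong_drop(), (true, false));
}
```
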
@Centril
Contributor

Centril commented Oct 25, 2019

Completed in #65505.

@Centril closed this as completed on Oct 25, 2019