Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement From<String> to avoid copying #4

Closed
fitzgen opened this issue Dec 2, 2015 · 17 comments
Closed

Implement From<String> to avoid copying #4

fitzgen opened this issue Dec 2, 2015 · 17 comments

Comments

@fitzgen
Copy link
Owner

fitzgen commented Dec 2, 2015

No description provided.

@ArtemGr
Copy link
Collaborator

ArtemGr commented Dec 17, 2015

For example?

pub struct InlString (InlinableString);
impl From<String> for InlString {
  fn from (s: String) -> InlString {
    if s.len() <= inlinable_string::inline_string::INLINE_STRING_CAPACITY {
      InlString (InlinableString::Inline (InlineString::from (&s[..])))
    } else {
      InlString (InlinableString::Heap (s))
    }
  }
}

@fitzgen
Copy link
Owner Author

fitzgen commented Dec 18, 2015

Sort of.

We want impl From<String> for InlinableString.

@ArtemGr
Copy link
Collaborator

ArtemGr commented Dec 18, 2015

Here's another example:

impl postgres::types::FromSql for InlString {
  fn from_sql<R: Read> (ty: &postgres::types::Type, raw: &mut R, _: &postgres::types::SessionInfo) -> postgres::Result<InlString> {
    use postgres::error::Error; use postgres::types::Type;
    if *ty != Type::Varchar && *ty != Type::Text {
      return Err (Error::from (io::Error::new (std::io::ErrorKind::Other,
        format! ("InlString] Unknown PostgreSQL type: {:?}", ty))))}

    let mut stack = [0u8; 32];
    let got = try! (raw.read (&mut stack));
    if got <= inlinable_string::inline_string::INLINE_STRING_CAPACITY {
      let s = match from_utf8 (&stack[0..got]) {Ok (s) => s, Err (err) => return Err (Error::Conversion (box err))};
      Ok (InlString (InlinableString::Inline (InlineString::from (s))))
    } else {
      let mut buf = Vec::with_capacity (256);
      buf.extend_from_slice (&stack[0..got]);
      try! (raw.read_to_end (&mut buf));
      let s = match from_utf8 (&buf[..]) {Ok (s) => s, Err (err) => return Err (Error::Conversion (box err))};
      Ok (InlString (InlinableString::Heap (String::from (s))))}}

It might be made into a generic From<Read>.

@ArtemGr
Copy link
Collaborator

ArtemGr commented Dec 18, 2015

We want impl From for InlinableString.

That's impossible to test outside the inlinable_string crate, AFAIK.
(You need the newtype to pin the external traits on external types).

@fitzgen
Copy link
Owner Author

fitzgen commented Dec 18, 2015

This is the inlinable_string crate... I think we are talking past each other...

@ArtemGr
Copy link
Collaborator

ArtemGr commented Dec 18, 2015

Code examples I gave aren't from inlinable_string.

@maciejhirsz
Copy link
Contributor

I should check issues before doing PRs. Just run into this, so #6.

I figured there is no point checking the lenght and doing an inline if you already have owned heap.

@ArtemGr
Copy link
Collaborator

ArtemGr commented Jan 25, 2017

I figured there is no point checking the lenght and doing an inline if you already have owned heap.

@maciejhirsz

On the contrary!

Using the heap means an extra random memory access!

Consider for (key, value) in btree_map {if key.starts_with ("f") {count += 1}}. If the key is InlinableString without a heap reference then we're doing sequential memory scan. Both the CPUs and the operating systems (with their virtual memory subsystems) are supposed to be able to optimize for this (with various level of prefetching).

Consider for s in vec {if s.starts_with ("f") {count += 1}}. No heap means approximately one memory fetch per two strings (assuming the usual 64-byte cache lines). Heap means approximately three memory fetches per two strings.

If you're using a heap reference than you're fetching random pieces from all over the heap. This is especially bad if some of your memory is swapped out.

Another example, a large long-living map. If the strings are stored inline then the memory pressure from that map is minimal (e.g. your program takes less memory). If the strings are on the heap then you're paying the full heap overhead (padding and metadata of the allocator).

@maciejhirsz
Copy link
Contributor

Hah. That does actually make a lot of sense. I've updated the PR.

@fitzgen
Copy link
Owner Author

fitzgen commented Jan 26, 2017

Fixed in #6

@fitzgen fitzgen closed this as completed Jan 26, 2017
@ArtemGr
Copy link
Collaborator

ArtemGr commented Mar 29, 2017

@maciejhirsz Benchmark in rust-lang/rust#40601 (comment) might be interesting regarding how the heap fetches (e.g. sort_large_strings) fare against inline values (e.g. sort_large_big_random), methinks.

@ArtemGr
Copy link
Collaborator

ArtemGr commented Apr 3, 2017

Might be a useful template if someone wants to experiment with benchmarking: https://gist.github.com/ArtemGr/cecca70d27178a1f1210df7bd53cf167

@fitzgen
Copy link
Owner Author

fitzgen commented Apr 3, 2017

@ArtemGr are you interested in co-maintaining this crate?

@ArtemGr
Copy link
Collaborator

ArtemGr commented Apr 3, 2017

@fitzgen Yes, I wouldn't mind joining in.

@fitzgen
Copy link
Owner Author

fitzgen commented Apr 3, 2017

@ArtemGr Great, thanks!

How does this sound for process: we lock down the master branch and require an approval review (presumably from one of us) and passing tests for anything to land on it?

@ArtemGr
Copy link
Collaborator

ArtemGr commented Apr 3, 2017

@fitzgen NP! I never worked with protected branches before, so... sure, looks like an interesting thing to try, at the very least.

@fitzgen
Copy link
Owner Author

fitzgen commented Apr 3, 2017

@ArtemGr cool! Invite sent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants