Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in done-0.0.0-reserve on Rust 1.17 #40192

Closed
brson opened this issue Mar 1, 2017 · 11 comments
Closed

Regression in done-0.0.0-reserve on Rust 1.17 #40192

brson opened this issue Mar 1, 2017 · 11 comments
Labels
regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@brson
Copy link
Contributor

brson commented Mar 1, 2017

brian@ip-10-145-43-250:/mnt2/dev/strcursor⟫ rustc +nightly -Vv
rustc 1.17.0-nightly (be760566c 2017-02-28)
binary: rustc
commit-hash: be760566cf938d11d34c2f6bd90d8fd0f67c2344
commit-date: 2017-02-28
host: x86_64-unknown-linux-gnu
release: 1.17.0-nightly
LLVM version: 3.9

The source is hard to acquire so here's an equivalent test case:

mod list {
    use task::Task;

    use std::fmt;

    pub struct List {
        pub tasks: Vec<Task>,
    }

    impl List {
        pub fn to_plaintext(&mut self) -> String {
            self.tasks.sort();
            self.tasks.iter()
                .map(|task| task.to_plaintext())
                .collect::<Vec<_>>()
                .join("\n")
        }
    }

    impl fmt::Display for List {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            let mut list_string = String::new();
            list_string.push_str("List(\n");
            for task in self.tasks.iter() {
                list_string.push_str(format!("{}", task).as_str());
            }
            list_string.push_str("\n)");
            write!(f, "{}", list_string)
        }
    }

    #[cfg(test)]
    mod tests {
        use super::*;
        use task::*;

        #[test]
        fn test_list_to_plaintext() {
            let mut list = List {
                tasks: vec![
                    Task { name: "done".to_string(), state: TaskState::Done },
                    Task { name: "blocked".to_string(), state: TaskState::Blocked },
                    Task { name: "backlog".to_string(), state: TaskState::Backlog },
                    Task { name: "current".to_string(), state: TaskState::Current },
                ],
            };

            // Should sort and return in correct order
            let expected_list_str =
                "~ current\n\
                 - backlog\n\
                 = blocked\n\
                 + done";

            assert_eq!(list.to_plaintext(), expected_list_str);
        }
    }
}

mod task {
    use std::cmp::Ordering;
    use std::error::Error;
    use std::fmt;

    #[derive(Debug,PartialEq)]
    pub enum TaskError {
        InvalidState(char),
        Malformed,
    }

    impl fmt::Display for TaskError {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            let description = self.description();

            match *self {
                TaskError::InvalidState(token) => write!(
                    f,
                    "{}: {}",
                    description,
                    token),
                TaskError::Malformed => write!(
                    f,
                    "{}",
                    description),
            }
        }
    }

    impl Error for TaskError {
        fn description(&self) -> &str {
            match *self {
                TaskError::InvalidState(_) => "Invalid token for task state",
                TaskError::Malformed => "Malformed definition",
            }
        }

        fn cause(&self) -> Option<&Error> {
            match *self {
                TaskError::InvalidState(_) => None,
                TaskError::Malformed => None,
            }
        }
    }

    #[derive(Debug,Eq,PartialEq,PartialOrd)]
    pub enum TaskState {
        Current,
        Backlog,
        Blocked,
        Done,
    }

    impl TaskState {
        pub fn from_char(c: char) -> Result<TaskState, TaskError> {
            match c {
                '~' => Result::Ok(TaskState::Current),
                '-' => Result::Ok(TaskState::Backlog),
                '=' => Result::Ok(TaskState::Blocked),
                '+' => Result::Ok(TaskState::Done),
                token @ _ => Result::Err(TaskError::InvalidState(token))
            }
        }

        pub fn to_char(&self) -> char {
            match *self {
                TaskState::Current => '~',
                TaskState::Backlog => '-',
                TaskState::Blocked => '=',
                TaskState::Done    => '+',
            }
        }

        fn order_value(&self) -> u8 {
            match *self {
                TaskState::Current => 0,
                TaskState::Backlog => 1,
                TaskState::Blocked => 2,
                TaskState::Done    => 3,
            }
        }
    }

    impl Ord for TaskState {
        fn cmp(&self, other: &TaskState) -> Ordering {
            self.order_value().cmp(&other.order_value())
        }
    }

    #[derive(Debug,Eq,PartialEq,PartialOrd)]
    pub struct Task {
        pub name: String,
        pub state: TaskState,
    }

    impl Task {
        pub fn new(task_str: &str) -> Result<Task, TaskError> {
            let mut lines = task_str.lines();
            if lines.clone().count() == 0 {
                return Result::Err(TaskError::Malformed)
            }

            // First line should always be the task name
            let main_line = lines.next().unwrap();
            let mut main_line_chars = main_line.chars();

            // First character must be symbol for state
            let state = try!(TaskState::from_char(main_line_chars.next().unwrap()));

            let mut passed_leading_whitespace = false;
            let mut name = String::new();

            for next_char in main_line_chars {
                // Skip the leading whitespace
                if !passed_leading_whitespace {
                    if next_char.is_whitespace() {
                        continue;
                    } else {
                        passed_leading_whitespace = true;
                    }
                }

                name.push(next_char)
            }

            Result::Ok(Task {
                name: name,
                state: state,
            })
        }

        pub fn to_plaintext(&self) -> String {
            format!("{} {}", self.state.to_char(), self.name)
        }
    }

    impl fmt::Display for Task {
        fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
            write!(f, "Task(state: {}, name: {})", self.state.to_char(), self.name)
        }
    }

    impl Ord for Task {
        fn cmp(&self, other: &Task) -> Ordering {
            let state_order = self.state.cmp(&other.state);

            match state_order {
                Ordering::Equal => self.name.cmp(&other.name),
                _ => state_order,
            }
        }
    }

    #[cfg(test)]
    mod tests {
        use std::cmp::Ordering;
        use std::collections::HashMap;
        use super::*;

        #[test]
        fn test_task_state_to_char() {
            assert_eq!(TaskState::Current.to_char(), '~');
            assert_eq!(TaskState::Backlog.to_char(), '-');
            assert_eq!(TaskState::Blocked.to_char(), '=');
            assert_eq!(TaskState::Done.to_char(), '+');
        }

        #[test]
        fn test_task_state_from_char() {
            assert_eq!(TaskState::from_char('~').unwrap(), TaskState::Current);
            assert_eq!(TaskState::from_char('-').unwrap(), TaskState::Backlog);
            assert_eq!(TaskState::from_char('=').unwrap(), TaskState::Blocked);
            assert_eq!(TaskState::from_char('+').unwrap(), TaskState::Done);
            assert!(TaskState::from_char('!').is_err());
        }

        #[test]
        fn test_task_state_ord() {
            assert_eq!(TaskState::Current.cmp(&TaskState::Backlog), Ordering::Less);
            assert_eq!(TaskState::Backlog.cmp(&TaskState::Blocked), Ordering::Less);
            assert_eq!(TaskState::Blocked.cmp(&TaskState::Done), Ordering::Less);
        }

        #[test]
        fn test_task_new_states() {
            let mut valid_tasks = HashMap::new();
            valid_tasks.insert("~ current task", Task {
                name: "current task".to_string(),
                state: TaskState::Current,
            });
            valid_tasks.insert("- backlog task", Task {
                name: "backlog task".to_string(),
                state: TaskState::Backlog,
            });
            valid_tasks.insert("= blocked task", Task {
                name: "blocked task".to_string(),
                state: TaskState::Blocked,
            });
            valid_tasks.insert("+ done task", Task {
                name: "done task".to_string(),
                state: TaskState::Done,
            });

            for (task_str, task) in valid_tasks {
                assert_eq!(Task::new(task_str).unwrap(), task);
            }

            let mut invalid_tasks = HashMap::new();
            invalid_tasks.insert("! task name", TaskError::InvalidState('!'));
            invalid_tasks.insert("", TaskError::Malformed);

            for (task_str, task) in invalid_tasks {
                assert_eq!(Task::new(task_str).err().unwrap(), task);
            }
        }

        #[test]
        fn test_task_to_plaintext() {
            let task = Task {
                name: "task name".to_string(),
                state: TaskState::Current,
            };

            assert_eq!(task.to_plaintext(), "~ task name");
        }

        #[test]
        fn test_task_ord() {
            let current_task = Task { name: "current".to_string(), state: TaskState::Current };
            let backlog_task = Task { name: "backlog".to_string(), state: TaskState::Backlog };
            let blocked_task = Task { name: "blocked".to_string(), state: TaskState::Blocked };
            let done_task = Task { name: "done".to_string(), state: TaskState::Done };

            assert_eq!(current_task.cmp(&backlog_task), Ordering::Less);
            assert_eq!(backlog_task.cmp(&blocked_task), Ordering::Less);
            assert_eq!(blocked_task.cmp(&done_task), Ordering::Less);
        }
    }
}

brian@ip-10-145-43-250:~/dev⟫ ./test

running 7 tests
test task::tests::test_task_ord ... ok
test task::tests::test_task_new_states ... ok
test task::tests::test_task_state_from_char ... ok
test list::tests::test_list_to_plaintext ... FAILED
test task::tests::test_task_state_ord ... ok
test task::tests::test_task_state_to_char ... ok
test task::tests::test_task_to_plaintext ... ok

failures:

---- list::tests::test_list_to_plaintext stdout ----
        thread 'list::tests::test_list_to_plaintext' panicked at 'assertion failed: `(left == right)` (left: `"- backlog\n= blocked\n~ current\n+ done"`, right: `"~ current\n- backlog\n= blocked\n+ done"`)', test.rs:55
note: Run with `RUST_BACKTRACE=1` for a backtrace.


failures:
    list::tests::test_list_to_plaintext

test result: FAILED. 6 passed; 1 failed; 0 ignored; 0 measured

Not on 1.16.

@brson brson added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Mar 1, 2017
@alexcrichton
Copy link
Member

This has to do with a broken implementation of PartialOrd and Ord on the Task type above. The PartialOrd implementation is derived but the Ord implementation is implemented manually, and they're different (different orders I believe).

On Rust 1.16.0 the standard <[T]>::sort algorithm uses Ord::cmp to sort items, which produced the expected test results. On Rust 1.17.0, however, we've got #39538 landed which changed to optimizing the usage to using PartialOrd instead. Notably we're now calling PartialOrd::lt insted of Ord::cmp. (cc @stjepang)

I'm not really sure how to classify this, but we'll likely want to talk about this during libs triage.

@rust-lang/libs, thoughts here?

I'm not sure if there's practical implications to changing to use PartialOrd instead of Ord. It's faster but now I'm wondering if that has correctness implications, e.g. with sorting an array of floats...

@alexcrichton alexcrichton added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Mar 1, 2017
@sfackler
Copy link
Member

sfackler commented Mar 1, 2017

Buggy code is buggy, IMO.

@sfackler
Copy link
Member

sfackler commented Mar 1, 2017

I don't believe there are correctness concerns re float arrays since f32 and f64 aren't Ord.

@ghost
Copy link

ghost commented Mar 2, 2017

Interesting. Now we have to decide who is to blame here :)

In this implementation PartialOrd and Ord have inconsistent behaviors. It should come as no surprise that sorting thus produced weird results. Really, sorting can't be blamed if it's given a broken comparator.

So is the specialization to blame then? Well, the specialization rule certainly isn't incorrect. It's merely an optimization that avoids going through several Options and unwraps. Even if we didn't have specialization nor called the sort function, this very same bug could just as well subtly cause a mess somewhere else in the code.

Oh well. Looks like we have to shift the blame onto the user. That said, I'm feeling very sympathetic. The user wrote what they thought was sensible code and it didn't work in very subtle ways.

I believe this is what happened... They wanted to have a custom implementation of Ord. So they implemented it, but then the compiler complained that this also requires an impl of PartialOrd. So they just said "ugh, I'll stick a simple #[derive(PartialOrd)] and hopefully that will make the compiler shut up". This is a very natural reaction, I think, and is definitely something I struggled with as a beginner in Rust.

Unfortunately, implementing Ord properly is a lot of boilerplate. Requiring both PartialOrd and Ord is annoying. In most cases when a user implements Ord, they want partial_cmp to be simply implemented as Some(self.cmp(other)). I'm putting some blame onto the lack of ergonomics.

Another possible scenario is that the user already had #[derive(Ord, PartialOrd)] and then decided that they need to have a custom Ord. So they implemented it, but forgot about PartialOrd. The compiler didn't issue a warning about anything being fishy. Perhaps we should improve our diagnostics?

@sfackler
Copy link
Member

sfackler commented Mar 2, 2017

In most cases when a user implements Ord, they want partial_cmp to be simply implemented as Some(self.cmp(other)). Of course, this doesn't compile. I'm putting some blame onto the lack of ergonomics.

Why doesn't that compile? That's the pattern I use.

@ghost
Copy link

ghost commented Mar 2, 2017

@sfackler You're right. Sorry, I've fixed my comment now.

But anyways... the point is, it's easier to do the wrong thing by writing #[derive(PartialOrd)] than the right by implementing partial_cmp that way. In my experience, fighting with PartialEq and PartialOrd is often something I do just to make the compiler happy, especially if I'm prototyping something. It's natural to reach out for easy shortcuts in those situations. :)

@alexcrichton
Copy link
Member

@sfackler oh yes that's right (Ord not on floats), so I think the problem here is what we should do in the face of disagreeing Ord/PartialOrd implementations.

I'm tempted to say "yes, code can be buggy" in the sense that "libstd can assume that PartialOrd is the same as Ord if both are implemented".

@ollie27
Copy link
Member

ollie27 commented Mar 2, 2017

There's a clippy lint for the same issue but for Hash and PartialEq: derive_hash_xor_eq.

@ghost
Copy link

ghost commented Mar 14, 2017

Do we have a consensus on this?

Suppose that the user derives PartialOrd and implements Ord manually.
Now they call the sort function, which can perform the following operations to test the ordering:

  1. compare(a, b) (belongs to Ord) - we used to do this.
  2. a < b (belongs to PartialOrd) - we do this now.
  3. Maybe a combination of both.

Should sort() rely on Ord or PartialOrd? And why not both?
Well, since sort() has the T: Ord bound, one could argue that it must exclusively rely on Ord. In fact, with the new PartialOrd<A> for [A] where A: Ord specialization, when using a < b we actually do rely on Ord!

I believe Ord and PartialOrd are very closely intermixed and there isn't a clear line between them. Unfortunately, having disagreeing implementations for those traits is going to cause problems in any case. Even if we shy away from specializations and optimizations, problems will still pop up somewhere.

By this reasoning, I wouldn't classify this issue as a regression, but would like to see rustc emit a warning in cases like this one.

@alexcrichton
Copy link
Member

Ok we got a chance to discuss this during triage today and the conclusion was that we're going to close this issue. We think it's fair to say that PartialOrd and Ord must have the same implementation to behave well (and believe it's already documented as such), so I'm going to close this.

@alexcrichton
Copy link
Member

Note that exploring something like a lint in clippy would be great to help catch this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants