Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement unique_combinations. #363

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

meltinglava
Copy link
Contributor

Some questions that needs to be answared:

  • Document differently?
  • Any optimalisations?
  • The algorithm only requires equal Items to be grouped beside each
    other. This is achived by sorting. However if we there is an
    effisient way to group equals then we can have this require Eq
    insted of Ord

Comment on lines -1162 to 1191
/// vec![1, 2], // Note: these are the same
/// vec![1, 2], // Note: these are the same
/// vec![1, 2],
/// vec![2, 2],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep these examples in the documentation of combinations, to highlight the difference to unique_combinations? (Or am I misreading the diff?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this one got removed with the rebase, thow now might be the time to have them there. I am not sure on the matter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might create a table with all types of combinations that shows the same iterator and different output for each of them.

src/unique_combinations.rs Outdated Show resolved Hide resolved
src/unique_combinations.rs Outdated Show resolved Hide resolved
src/unique_combinations.rs Outdated Show resolved Hide resolved
src/unique_combinations.rs Show resolved Hide resolved
@@ -575,6 +576,17 @@ fn combinations() {
vec![3, 4],
]);

let it = vec![1, 2, 2, 3, 4].into_iter().unique_combinations(3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assert that unique_combinations yields the "same" (modulo ordering) elements that would be obtained by manually filtering elements from combinations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this will not always be true, as combinations explore all combinations before getting a new number for the iterator. We see all the numbers as fast as possible as we want to find the end. This is why the LasyBuffer is used. to not need that many next calls from the iterator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understand you correctly. But shouldn't unique_combinations give the same elements that you would get by manually removing duplicates from combinations (possibly sorting the yielded permutations)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true as long as the iterator that combinations is done over is sorted in the first place. Might be a good test with quicksort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some thinking is think we should make thise, however without garanteeing the order of the items that comes out in either cases. So sort bought on input and output. Will implement that this weekend

src/unique_combinations.rs Outdated Show resolved Hide resolved
@meltinglava
Copy link
Contributor Author

I did a rebase so diffs should now make more sence to the current codebase.

There have happend a lot of changes to combinations since the initial release of this pr, so we should see if we can use some of the techniques applied to combinations.

As of the LazyBuffer type, i think if this should be merged with unique combinations, this is to major of a rewrite and think it should be a separate pr if so.

Thanks @phimuemue for reviewing.
As for the last ones I have not marked as resolved. Since the rebase and all of the refer to combinations, can you check if they still apply.

Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool that you could address some of the issues. Still, I think we would profit from bringing UniqueCombinations more in line with Combinations.

src/unique_combinations.rs Show resolved Hide resolved
Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvements.

As a side note: I see you improved readability considerably by changing some line breaks and adjusting some whitespace. I think we surely want these (independent of the combinations). It might simplify things if we had these in a separate PR, though.

src/unique_combinations.rs Show resolved Hide resolved
@@ -575,6 +576,17 @@ fn combinations() {
vec![3, 4],
]);

let it = vec![1, 2, 2, 3, 4].into_iter().unique_combinations(3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understand you correctly. But shouldn't unique_combinations give the same elements that you would get by manually removing duplicates from combinations (possibly sorting the yielded permutations)?

@meltinglava
Copy link
Contributor Author

i have autoformat in my editorconfig on save. I can create an pr with formating for all the files.

Some questions that needs to be answared:
* Document differently?
* Any optimalisations?
* The algorithm only requires equal Items to be grouped beside each
  other. This is achived by sorting. However if we there is an
  effisient way to group equals then we can have this require `Eq`
  insted of `Ord`
position -> indices
next_none -> done
remove len (use indices.len() instead)
removed a comment that did not make sence
renamed local variables to what they are representing
@meltinglava
Copy link
Contributor Author

@phimuemue is there any other blockers than getting formating done on master for this PR (besides the unit testing)?

@phimuemue
Copy link
Member

@phimuemue is there any other blockers than getting formating done on master for this PR (besides the unit testing)?

As always, @jswrenn has the last word. But I still think that the implementation could/should be closer to Combinations::next.

I tried to see if my intuition is wrong, but taking your implementation and transforming piece by piece leads to something very similar to Combinations::next (see my sketch meltinglava/itertools@master...phimuemue:uniq_comb2). (One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None.)

#[test]
fn combinations_and_unique_combinations_has_all_unique_values() {
let mut rng = &mut rand::thread_rng();
let a = [1, 2, 3, 4, 5];
Copy link
Member

@phimuemue phimuemue Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool to have a test for this. Just not sure if having non-determinism such a good idea there, especially if the random test is only run once. (I may be wrong on this one though, if we already have precedence for this kind of test.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest problem here was that i did not find out how you made sure quickcheck made vectors that was garanteed to contain duplicates.

@meltinglava
Copy link
Contributor Author

One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None

I agree to the theory here, thow technicly the break statment here is just sugar for returning in that case. The loops job in all the cases are searching, hence break out / returning when found. Do not find the other one to be drasticly better than the other one.

The reason I am hessitent to do it that way, is that I do not feel like the code is better in terms of readability and logic. If I am not the only one that think this is true, then it is my opinion that combinations should be brought closer to whatever unique_combinations end up looking like if this is a goal that we want.

I have found that it takes quite alot of time to figgure out what an algorithm does when its all based on one mutable number that deals with all the cases.

Here are the pros and cons of each as i see it.

Pros unique:

  • No mut variables used.

Cons:

  • At the moment not the looking like combinations (atm becasue this can change in the future)

Neither one:

  • Same amount of early_return + break + continue statements (4 in bought cases)

Do anyone have anything to add?

@phimuemue
Copy link
Member

Thanks for your feedback.

One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None

I agree to the theory here, thow technicly the break statment here is just sugar for returning in that case. The loops job in all the cases are searching, hence break out / returning when found. Do not find the other one to be drasticly better than the other one.

Regarding the break: If we follow my sketch we should get rid of the break here, by actually exploiting what we know: As stated in line 74

if self.pool[bump_source] < self.pool[bump_target] { // must be true for at least one bump_target

we know that the loop initiated in line 73 (for bump_target in bump_source + 1..pool_len) will not be exhausted without success. This is because line 63 (self.pool[self.indices[i]] == self.pool[i + pool_len - indices_len]) already established an element for whith the loop from line 73 will terminate. We could incorporate this information in the loop of line 73 to get rid of the break, so that it reads more like "find bump_target, increment indices(i) to it, and adjust the remaining indices".

The reason I am hessitent to do it that way, is that I do not feel like the code is better in terms of readability and logic.

You may have a valid point there: I may be biased (because I just compared your new code to the existing one), so another opinion on this could very well inform our choice there.

If I am not the only one that think this is true, then it is my opinion that combinations should be brought closer to whatever unique_combinations end up looking like if this is a goal that we want.

I agree: If UniqueCombinations is found to be the simpler variant, and Combinations would profit from this, we should change it.

I have found that it takes quite alot of time to figgure out what an algorithm does when its all based on one mutable number that deals with all the cases.

You're right: These algorithms are surprisingly tricky to formulate in simple terms. But isn't the essence as follows:

  • Starting at the end, find an index i in indices that can be incremented.
    (In Combinations, we have self.indices[i] == i + self.pool.len() - self.indices.len(), whereas in UniqueCombinations we could have essentially the same "tunneled through self.pool": self.pool[self.indices[i]] == self.pool[i + pool_len - indices_len]).

  • Once we have found this index, increment index(i).
    (In Combinations this is simply self.indices[i] += 1;, while in the UniqueCombinations we could derive a variant without breaks/early-outs that achieves something that increments self.indices(i) to something so that the referenced pool value changes.)

  • Increment the indices after i. For this I would expect the code to be the same in both variants.

I think current Combinations resembles this logic quite well. Do you have another (possibly simpler) view on the problem?

@meltinglava
Copy link
Contributor Author

Just thought how hard it would be doing this with rust iterators. It is qute descriptive if when comments are added. Here we have almost all of combinations, thow i could not get the .iter_mut to return mutable referances. Do also think what I did inside the for loop is wrong. thow this is just an example.

// search for bumpable index
return self
    .indices
    .iter()
    .rev()
    .enumerate()
    .find(|(offset, &index)| self.pool[index] != self.pool[pool_len - 1 - offset])
    .map(|(offset, &index)| (offset, index))
    .clone()
    // when found bump that index and increase the following values
    .map(|(offset, index)| {
        for (o, &mut i) in self
            .indices
            .iter_mut()
            .rev()
            .take(offset + 1)
            .rev()
            .enumerate()
        {
            *i = index + offset + o;
        }
    })

What i like about this version is that its only thefind function does exactly what it says, find the value that fits the predicate, and it does the break / early return handeling

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants