Implement unique_combinations. #363

meltinglava · 2019-08-04T15:43:01Z

Some questions that need to be answered:

Document differently?
Any optimizations?
The algorithm only requires equal items to be grouped next to each
other. This is achieved by sorting. However, if there is an
efficient way to group equal items, we could require Eq
instead of Ord.
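On the Eq question: grouping equal items is possible without Ord, but only by comparing each item against one representative per group found so far. That costs O(n·g) comparisons for g distinct values (O(n²) in the worst case), versus O(n log n) for sorting. A minimal sketch; `group_equal` is a hypothetical helper name, not an itertools function:

```rust
/// Hypothetical helper (not part of itertools): reorder `items` so that equal
/// elements sit next to each other, using only `Eq` (no `Ord`, no `Hash`).
/// Groups keep the order in which their first element appeared.
/// Cost: O(n * g) comparisons, where g is the number of distinct values.
fn group_equal<T: Eq>(items: Vec<T>) -> Vec<T> {
    let mut groups: Vec<Vec<T>> = Vec::new();
    for item in items {
        // Linear scan over the groups found so far: `Eq` allows nothing better.
        match groups.iter_mut().find(|group| group[0] == item) {
            Some(group) => group.push(item),
            None => groups.push(vec![item]),
        }
    }
    // Concatenate the groups back into one vector.
    groups.into_iter().flatten().collect()
}
```

Whether this is worth it probably depends on the inputs: with few distinct values the scan is cheap, with mostly distinct values sorting wins.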

phimuemue · 2020-03-03T18:07:13Z

src/lib.rs

-    ///     vec![1, 2], // Note: these are the same
-    ///     vec![1, 2], // Note: these are the same
+    ///     vec![1, 2],
    ///     vec![2, 2],


Should we keep these examples in the documentation of combinations, to highlight the difference to unique_combinations? (Or am I misreading the diff?)

I think this one got removed with the rebase, though now might be the time to add them back. I am not sure about this.

We might create a table with all types of combinations that shows the same input iterator and the output of each of them.
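Such a table could start from two reference functions that produce the columns for the same input: plain combinations (duplicates kept, like the repeated `vec![1, 2]` in the doc diff) and the unique variant obtained by filtering. Both are hypothetical eager sketches returning `Vec`s, not the itertools adaptors:

```rust
/// Hypothetical reference: all k-element combinations of `pool`, picked by
/// position, in lexicographic order of positions. Equal items in `pool`
/// produce repeated combinations.
fn combinations<T: Clone>(pool: &[T], k: usize) -> Vec<Vec<T>> {
    if k == 0 {
        return vec![Vec::new()];
    }
    let mut out = Vec::new();
    for (i, first) in pool.iter().enumerate() {
        // Take `first`, then every (k - 1)-combination of the items after it.
        for rest in combinations(&pool[i + 1..], k - 1) {
            let mut combo = vec![first.clone()];
            combo.extend(rest);
            out.push(combo);
        }
    }
    out
}

/// Hypothetical reference for the unique variant: sort the input, generate
/// all combinations, then sort and deduplicate the output.
fn unique_by_filtering<T: Clone + Ord>(pool: &[T], k: usize) -> Vec<Vec<T>> {
    let mut sorted = pool.to_vec();
    sorted.sort();
    let mut combos = combinations(&sorted, k);
    // `dedup` only drops *adjacent* repeats, and repeats are not always
    // adjacent (e.g. input [1, 1, 1, 2]), so sort the output first.
    combos.sort();
    combos.dedup();
    combos
}
```

For input `[1, 2, 2]` and `k = 2`, the first returns `[1, 2], [1, 2], [2, 2]` and the second `[1, 2], [2, 2]`.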

src/unique_combinations.rs

phimuemue · 2020-03-03T19:37:38Z

tests/test_std.rs

@@ -575,6 +576,17 @@ fn combinations() {
        vec![3, 4],
        ]);

+    let it = vec![1, 2, 2, 3, 4].into_iter().unique_combinations(3);


Should we assert that unique_combinations yields the "same" (modulo ordering) elements that would be obtained by manually filtering elements from combinations?

I think that this will not always be true, as combinations explores all combinations before pulling a new element from the iterator, whereas we pull all the elements as fast as possible because we need to find the end. This is why the LazyBuffer is used: to avoid needing as many next calls on the iterator.

I am not sure if I understand you correctly. But shouldn't unique_combinations give the same elements that you would get by manually removing duplicates from combinations (possibly sorting the yielded permutations)?

This is true as long as the iterator that combinations runs over is sorted in the first place. Might be a good test with quickcheck.

After some thinking, I think we should add this test, but without guaranteeing the order of the items that come out in either case. So sort both the input and the output. Will implement that this weekend.
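A sketch of that test, sorting both input and output so that no output order is promised. The direct generator is a hypothetical stand-in that backtracks and skips repeated values at each depth, a different technique from the PR's index bumping; the reference is a brute force over all position subsets collected into a `BTreeSet`:

```rust
use std::collections::BTreeSet;

/// Hypothetical direct generator: sort, then backtrack, never choosing the
/// same value twice at the same depth, so each multiset is produced once.
fn unique_combinations<T: Clone + Ord>(items: &[T], k: usize) -> Vec<Vec<T>> {
    let mut pool = items.to_vec();
    pool.sort();
    let mut out = Vec::new();
    backtrack(&pool, 0, k, &mut Vec::with_capacity(k), &mut out);
    out
}

fn backtrack<T: Clone + Eq>(
    pool: &[T],
    start: usize,
    k: usize,
    current: &mut Vec<T>,
    out: &mut Vec<Vec<T>>,
) {
    if current.len() == k {
        out.push(current.clone());
        return;
    }
    for i in start..pool.len() {
        // Equal items are adjacent after sorting: skip values already tried here.
        if i > start && pool[i] == pool[i - 1] {
            continue;
        }
        current.push(pool[i].clone());
        backtrack(pool, i + 1, k, current, out);
        current.pop();
    }
}

/// Reference: every subset of positions with k members (the pool must have
/// fewer than 32 items), collected into a BTreeSet, which sorts the output
/// and removes duplicate combinations in one go.
fn unique_by_brute_force<T: Clone + Ord>(items: &[T], k: usize) -> Vec<Vec<T>> {
    let mut pool = items.to_vec();
    pool.sort();
    let mut seen = BTreeSet::new();
    for mask in 0u32..(1u32 << pool.len()) {
        if mask.count_ones() as usize == k {
            let combo: Vec<T> = (0..pool.len())
                .filter(|&i| (mask & (1u32 << i)) != 0)
                .map(|i| pool[i].clone())
                .collect();
            seen.insert(combo);
        }
    }
    seen.into_iter().collect()
}
```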

src/unique_combinations.rs

meltinglava · 2020-03-05T11:11:53Z

I did a rebase, so the diffs should now make more sense against the current codebase.

There have been a lot of changes to combinations since this PR was first opened, so we should see whether we can use some of the techniques applied to combinations.

As for the LazyBuffer type: if it should be merged with unique_combinations, I think that is too major a rewrite, and it should be a separate PR.

Thanks @phimuemue for reviewing.
As for the remaining comments I have not marked as resolved: since the rebase, and since all of them refer to combinations, can you check whether they still apply?

phimuemue

Cool that you could address some of the issues. Still, I think we would profit from bringing UniqueCombinations more in line with Combinations.

src/unique_combinations.rs

phimuemue

Thanks for the improvements.

As a side note: I see you improved readability considerably by changing some line breaks and adjusting some whitespace. I think we surely want these (independent of the combinations). It might simplify things if we had these in a separate PR, though.

src/unique_combinations.rs

phimuemue · 2020-03-08T14:31:00Z

tests/test_std.rs

@@ -575,6 +576,17 @@ fn combinations() {
        vec![3, 4],
        ]);

+    let it = vec![1, 2, 2, 3, 4].into_iter().unique_combinations(3);


I am not sure if I understand you correctly. But shouldn't unique_combinations give the same elements that you would get by manually removing duplicates from combinations (possibly sorting the yielded permutations)?

meltinglava · 2020-03-08T17:24:29Z

I have autoformat on save in my editorconfig. I can create a PR with formatting for all the files.


position -> indices
next_none -> done
remove len (use indices.len() instead)
removed a comment that did not make sense

renamed local variables to what they are representing

meltinglava · 2020-03-13T14:19:53Z

@phimuemue are there any other blockers for this PR besides getting formatting done on master (and the unit testing)?

phimuemue · 2020-03-14T12:21:58Z

@phimuemue are there any other blockers for this PR besides getting formatting done on master (and the unit testing)?

As always, @jswrenn has the last word. But I still think that the implementation could/should be closer to Combinations::next.

I tried to see if my intuition is wrong, but taking your implementation and transforming piece by piece leads to something very similar to Combinations::next (see my sketch meltinglava/itertools@master...phimuemue:uniq_comb2). (One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None.)
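For comparison, the index-advancing step of plain combinations can be restated roughly as below. This is a simplified re-statement over a fully known pool of length `n` (the real adaptor also has to pull items lazily), and `advance_combination` is a hypothetical name. Note the single early-out, for the exhausted case:

```rust
/// Hypothetical re-statement of a Combinations-style advance step.
/// `indices` holds k strictly increasing positions into a pool of length `n`
/// (requires k <= n). Returns false once the last combination was reached.
fn advance_combination(indices: &mut [usize], n: usize) -> bool {
    let k = indices.len();
    if k == 0 {
        return false; // the single empty combination has no successor
    }
    // 1. From the back, find a slot that has not reached its maximum i + n - k.
    let mut i = k - 1;
    while indices[i] == i + n - k {
        if i == 0 {
            return false; // every slot is at its maximum: done
        }
        i -= 1;
    }
    // 2. Increment it, 3. and place the following slots directly after it.
    indices[i] += 1;
    for j in i + 1..k {
        indices[j] = indices[j - 1] + 1;
    }
    true
}
```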

phimuemue · 2020-03-14T12:23:57Z

tests/test_std.rs

+#[test]
+fn combinations_and_unique_combinations_has_all_unique_values() {
+    let mut rng = &mut rand::thread_rng();
+    let a = [1, 2, 3, 4, 5];


Cool to have a test for this. I am just not sure whether non-determinism is such a good idea there, especially if the random test is only run once. (I may be wrong on this one though, if we already have a precedent for this kind of test.)

My biggest problem here was that I did not find out how you made sure quickcheck generated vectors that were guaranteed to contain duplicates.
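One deterministic way to get inputs that are guaranteed to contain duplicates: draw more values than there are distinct values available, so the pigeonhole principle forces a repeat. The sketch below uses a tiny fixed-seed linear congruential generator instead of `rand`, which also makes failures reproducible; `inputs_with_duplicates` and `has_duplicate` are hypothetical helper names:

```rust
/// Hypothetical test-input generator: `len` values drawn from `0..distinct`
/// with a fixed-seed LCG. If `len > distinct`, the pigeonhole principle
/// guarantees at least one duplicate.
fn inputs_with_duplicates(seed: u64, len: usize, distinct: u64) -> Vec<u64> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            // Knuth's MMIX LCG constants; any full-period LCG would do.
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Use the high bits, which are better distributed in an LCG.
            (state >> 33) % distinct
        })
        .collect()
}

/// True if some value occurs more than once.
fn has_duplicate(values: &[u64]) -> bool {
    let mut sorted = values.to_vec();
    sorted.sort();
    sorted.windows(2).any(|w| w[0] == w[1])
}
```

The same pigeonhole idea should carry over to a property-based test by reducing arbitrary values modulo a small number before use, though that variant is not shown here.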

meltinglava · 2020-03-14T16:28:21Z

One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None

I agree with the theory here, though technically the break statement here is just sugar for returning in that case. In all the cases the loop's job is searching, hence breaking out / returning when found. I do not find one to be drastically better than the other.

The reason I am hesitant to do it that way is that I do not feel the code is better in terms of readability and logic. If I am not the only one who thinks this is true, then in my opinion combinations should be brought closer to whatever unique_combinations ends up looking like, if this is a goal that we want.

I have found that it takes quite a lot of time to figure out what an algorithm does when it is all based on one mutable number that deals with all the cases.

Here are the pros and cons of each as I see it.

Pros of unique_combinations:

No mut variables used.

Cons:

At the moment, it does not look like combinations ("at the moment" because this can change in the future).

Neither one:

Same number of early return + break + continue statements (4 in both cases).

Does anyone have anything to add?

phimuemue · 2020-03-15T14:33:55Z

Thanks for your feedback.

One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None

I agree with the theory here, though technically the break statement here is just sugar for returning in that case. In all the cases the loop's job is searching, hence breaking out / returning when found. I do not find one to be drastically better than the other.

Regarding the break: If we follow my sketch we should get rid of the break here, by actually exploiting what we know: As stated in line 74

if self.pool[bump_source] < self.pool[bump_target] { // must be true for at least one bump_target

we know that the loop initiated in line 73 (for bump_target in bump_source + 1..pool_len) will not be exhausted without success. This is because line 63 (self.pool[self.indices[i]] == self.pool[i + pool_len - indices_len]) already established an element for which the loop from line 73 will terminate. We could incorporate this information in the loop of line 73 to get rid of the break, so that it reads more like "find bump_target, increment indices(i) to it, and adjust the remaining indices".
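Exploiting that invariant, the search for `bump_target` can be written without a `break`: `find` stops at the first larger value, and `expect` documents why it cannot fail. The function `bump_target` is a hypothetical helper for illustration, assuming a sorted pool:

```rust
/// Hypothetical helper: given a sorted `pool` and the position `bump_source`
/// currently used by some slot, return the first later position holding a
/// larger value. The caller must already have checked that a larger value
/// exists (e.g. at the slot's last allowed position), so `expect` cannot fire.
fn bump_target<T: Ord>(pool: &[T], bump_source: usize) -> usize {
    (bump_source + 1..pool.len())
        .find(|&target| pool[bump_source] < pool[target])
        .expect("caller guarantees a larger value exists after bump_source")
}
```

Since the pool is sorted, `slice::partition_point` could find the same position by binary search instead of a linear scan.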

The reason I am hesitant to do it that way is that I do not feel the code is better in terms of readability and logic.

You may have a valid point there: I may be biased (because I just compared your new code to the existing one), so another opinion on this could very well inform our choice there.

If I am not the only one who thinks this is true, then in my opinion combinations should be brought closer to whatever unique_combinations ends up looking like, if this is a goal that we want.

I agree: If UniqueCombinations is found to be the simpler variant, and Combinations would profit from this, we should change it.

I have found that it takes quite a lot of time to figure out what an algorithm does when it is all based on one mutable number that deals with all the cases.

You're right: These algorithms are surprisingly tricky to formulate in simple terms. But isn't the essence as follows:

Starting at the end, find an index i in indices that can be incremented.
(In Combinations, we have self.indices[i] == i + self.pool.len() - self.indices.len(), whereas in UniqueCombinations we could have essentially the same "tunneled through self.pool": self.pool[self.indices[i]] == self.pool[i + pool_len - indices_len]).
Once we have found this index, increment index(i).
(In Combinations this is simply self.indices[i] += 1;, while in the UniqueCombinations we could derive a variant without breaks/early-outs that achieves something that increments self.indices(i) to something so that the referenced pool value changes.)
Increment the indices after i. For this I would expect the code to be the same in both variants.
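The three steps above can be sketched as a standalone iterator. This is a hypothetical type that collects and sorts the whole input up front instead of using LazyBuffer, so it only illustrates the index logic and is not the PR's implementation:

```rust
/// Hypothetical sketch of a UniqueCombinations-style iterator.
struct UniqueCombinations<T> {
    pool: Vec<T>,        // sorted, so equal items are adjacent
    indices: Vec<usize>, // positions of the current combination in `pool`
    first: bool,         // the first combination has not been yielded yet
}

impl<T: Ord + Clone> UniqueCombinations<T> {
    fn new<I: IntoIterator<Item = T>>(iter: I, k: usize) -> Self {
        let mut pool: Vec<T> = iter.into_iter().collect();
        pool.sort();
        UniqueCombinations { pool, indices: (0..k).collect(), first: true }
    }
}

impl<T: Ord + Clone> Iterator for UniqueCombinations<T> {
    type Item = Vec<T>;

    fn next(&mut self) -> Option<Vec<T>> {
        let (n, k) = (self.pool.len(), self.indices.len());
        if k > n {
            return None; // not enough items for a single combination
        }
        if self.first {
            self.first = false;
        } else {
            let pool = &self.pool;
            // 1. From the back, find a slot whose value can still grow, i.e. it
            //    differs from the value at its last allowed position slot + n - k.
            //    The only early-out is the exhausted case.
            let i = (0..k)
                .rev()
                .find(|&slot| pool[self.indices[slot]] != pool[slot + n - k])?;
            // 2. Move it to the first position holding a larger value; one exists
            //    because pool[i + n - k] is larger, so no break is needed.
            let source = self.indices[i];
            let target = (source + 1..n)
                .find(|&t| pool[source] < pool[t])
                .expect("pool[i + n - k] is larger than pool[source]");
            // 3. Place the following slots directly after it.
            for (offset, slot) in (i..k).enumerate() {
                self.indices[slot] = target + offset;
            }
        }
        Some(self.indices.iter().map(|&p| self.pool[p].clone()).collect())
    }
}
```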

I think current Combinations resembles this logic quite well. Do you have another (possibly simpler) view on the problem?

meltinglava · 2020-03-16T15:28:59Z

I just wondered how hard it would be to do this with Rust iterators. It is quite descriptive when comments are added. Here we have almost all of combinations, though I could not get .iter_mut to return mutable references. I also think what I did inside the for loop is wrong, though this is just an example.

// search for bumpable index
return self
    .indices
    .iter()
    .rev()
    .enumerate()
    .find(|(offset, &index)| self.pool[index] != self.pool[pool_len - 1 - offset])
    .map(|(offset, &index)| (offset, index))
    .clone()
    // when found bump that index and increase the following values
    .map(|(offset, index)| {
        for (o, &mut i) in self
            .indices
            .iter_mut()
            .rev()
            .take(offset + 1)
            .rev()
            .enumerate()
        {
            *i = index + offset + o;
        }
    })

What I like about this version is that only the find function does the searching, and it does exactly what it says: it finds the value that fits the predicate and handles the break / early return.
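One way to get the iterator-chain version compiling is to copy the found `(slot, index)` pair out of the shared borrow before mutating, and to bind the loop variable as `i` (a `&mut usize`) rather than with the pattern `&mut i`, which copies the value out so that `*i` no longer type-checks. Using `enumerate().rev()` also keeps the real slot numbers, so the offset arithmetic disappears. A sketch assuming equal items in `pool` are adjacent (sorted or grouped), which is why only `Eq` is needed; `advance` is a hypothetical name:

```rust
/// Hypothetical sketch: advance `indices` to the next unique combination.
/// Requires equal items in `pool` to be adjacent and k <= pool.len().
/// Returns false when there is no next combination.
fn advance<T: Eq>(pool: &[T], indices: &mut [usize]) -> bool {
    let (n, k) = (pool.len(), indices.len());
    // Search for a bumpable slot: `find` does the early return for us.
    let found = indices
        .iter()
        .enumerate()
        .rev()
        .find(|&(slot, &index)| pool[index] != pool[slot + n - k])
        .map(|(slot, &index)| (slot, index)); // copy out, ending the shared borrow
    match found {
        None => false,
        Some((slot, index)) => {
            // First later position with a different value; it exists because
            // pool[slot + n - k] differs from pool[index].
            let target = (index + 1..n)
                .find(|&j| pool[j] != pool[index])
                .expect("pool[slot + n - k] differs from pool[index]");
            // `i` is a `&mut usize` here, so `*i = ...` writes into `indices`.
            for (offset, i) in indices[slot..].iter_mut().enumerate() {
                *i = target + offset;
            }
            true
        }
    }
}
```

With a grouped but unsorted pool such as `[3, 3, 1, 2, 2]` and `k = 2`, starting from positions `[0, 1]`, this should visit `[3, 3], [3, 1], [3, 2], [1, 2], [2, 2]`.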

jswrenn added the waiting-on-review label Aug 18, 2019

phimuemue reviewed Mar 3, 2020

meltinglava force-pushed the master branch from c169713 to 8ba0166 on March 5, 2020 10:58

phimuemue reviewed Mar 5, 2020

src/unique_combinations.rs

phimuemue reviewed Mar 8, 2020

meltinglava added 8 commits March 11, 2020 00:20

make change 1.24 compatible

39cfe93

Corrected some review points

070dc34

position -> indices
next_none -> done
remove len (use indices.len() instead)
removed a comment that did not make sense

typo in documentation

6e4f18d

Bring Uniquecombinations more in line with Combinations

077e91f

added Clone to UniqueCombinations

70c647a

removed an if_else and renamed local variables

6fb5f44

renamed local variables to what they are representing

use feature that now always works, as the minimum is bumped to 1.32

0eef444

meltinglava force-pushed the master branch from 0af3d29 to 0eef444 on March 10, 2020 23:20

Testcase comparison combinations and unique_combinations

4c355e0

meltinglava mentioned this pull request Mar 13, 2020

format all rust files #421

Closed

phimuemue reviewed Mar 14, 2020
