Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Into_group_map_by #406

Merged
merged 14 commits into from
Aug 3, 2020
Merged

Into_group_map_by #406

merged 14 commits into from
Aug 3, 2020

Conversation

Sagebati
Copy link
Contributor

@Sagebati Sagebati commented Feb 10, 2020

Add the into_group_map_by , tried with returning an iterator instead of an hashmap, can ben reverted on demand.

@phimuemue
Copy link
Member

Hi! I see that this has a certain value. The following questions came to my mind:

  • Should we implement into_group_map_by in terms of into_group_map? (This way, any improvements to into_group_map would automatically be obtained by into_group_map_by.)
  • I think it is a good idea to postpone into_group_map_by_fold for now, as it does quite a bit, and maybe we could do that more efficient.
  • I am not sure, but it looks like there are some commits that are not necessary for this PR. For review, it is maybe convenient to rebase the commits first.

@Sagebati
Copy link
Contributor Author

Sagebati commented Feb 10, 2020

Should we implement into_group_map_by in terms of into_group_map? (This way, any improvements to into_group_map would automatically be obtained by into_group_map_by.)

I understand it introduces inconsistency in the API, but for me into_group_map was too specific from the beginning. I think both ways are an good idea. (in terms of into_group_map and not).

Yeah, I couldn't get a proper implementation for group_map_by_fold. I'm sorry I'm not a git pro, and I'll look into it, definitively there is some useless commits.

@jswrenn
Copy link
Member

jswrenn commented Feb 10, 2020

Thanks for the contribution!

Is there a strong reason this should produce an iterator as opposed to the HashMap itself?

I agree with all three of @phimuemue points. In the interest of keeping itertools as simple as possible, I'd like to see the into_group_map_by method just be a wrapper around into_group_map.

@Sagebati
Copy link
Contributor Author

The only reason was to facilitate chaining more methods without using into_iter().
Now into_group_map_by uses into_group_map and returns an HashMap.

@phimuemue
Copy link
Member

phimuemue commented Jul 19, 2020

Hi @Sagebati! I'm trying to resolve some PRs. Are you still interested in getting this into itertools? If so: Could you get rid of the formatting changes not necessary for this commit?

@Sagebati
Copy link
Contributor Author

Hi, I'll do it!

@Sagebati
Copy link
Contributor Author

Formatting done !

@jswrenn
Copy link
Member

jswrenn commented Jul 21, 2020

273b382 inadvertently reverted the return type change you made in a0f329a. Could you make into_group_map_by return a HashMap again?

@Sagebati
Copy link
Contributor Author

Ready!

src/lib.rs Outdated Show resolved Hide resolved
@Sagebati
Copy link
Contributor Author

Sagebati commented Aug 3, 2020

The "waiting_on_author" is still there maybe an error ?

@jswrenn
Copy link
Member

jswrenn commented Aug 3, 2020

Thanks for the ping! This looks good to me!

bors r+

@bors
Copy link
Contributor

bors bot commented Aug 3, 2020

Build succeeded:

@bors bors bot merged commit 600b91c into rust-itertools:master Aug 3, 2020
bors bot added a commit that referenced this pull request Dec 20, 2020
465: Add into_grouping_map for efficient group-and-fold operations r=jswrenn a=SkiFire13

Adds two functions on the `Itertools` trait, `into_grouping_map` and `into_grouping_map_by`. 

`into_grouping_map` expects an iter of `(K, V)` where the `K` will be used as key and `V` as value. 
`into_grouping_map_by` expect only an iter of `V` as values, the keys will be calculated using the provided functions. 

Both of them return a `GroupingMap`, which is just a wrapper on an iterator. Since it introduces a lot of related methods I thought it would be better to separate them from the `Itertools` trait. This also prevents duplicating every method for the `_by` version.

All of these functions have in common the fact they perform efficient group-and-fold operations without allocating temporary vecs like you would normally do if you used `into_group_map` + `into_iter` + `map` + `collect::<HashMap<_, _>>()`.

Here's the possible issues I can see, I would like to hear some feedback before trying to fix any of them:
- name: I initially though of `grouping_by` which is the java/kotlin equivalent, but later changed it to `into_grouping_map` to match the already existing `into_group_map` and to differentiate from `group_by`;
- `fold_first`: the equivalent function in the `Itertools` trait is `fold1` but there's an unstable stdlib function that does the same thing and is called `fold_first`. I decided to be consistent with the stdlib;
- `minmax` return type is the already existing `MinMaxResult`, but the `NoElements` variant is never returned. I didn't want to duplicate that struct thinking it could cause confusion (and if I did what name could I have chosen?);
- `sum` and `product`: They dont' use the `Sum` and `Product` traits but instead require `V: Add<V, Output=V>` and `V: Mul<V, Output=V>`. They're pretty much a wrapper around `fold_first`. I don't really know if they're meaningful or if I should just remove them;
- almost every closure takes as one of the parameters a reference to the key. It bloats them a bit, but could be useful if computing the key is a relatively expensive operation.
- no `scan` function. Even though it is an iterator adapter I could sort of "collect" it into a `Vec` (more like extend) but I don't really see an use for this;
- ~~no integration tests for the `_by` and `_by_key` versions of `min`, `max` and `minmax`. To be fair I was a bit lazy, but I also couldn't find any integration test for the normal `minmax_by` and `minmax_by_key` so I though it was fine;~~ added;
- no benchmark (I don't think it is required but probably would be cool to compare this with `into_group_map`;
- I didn't want to touch the rest of the library, but I guess we could implement `into_group_map` in terms of `into_grouping_map`?

Related issues: #457, #309
Related PR: #406 (in particular relates to the `into_group_map_by_fold` function that was proposed but later removed)

Co-authored-by: SkiFire13 <skifire13.1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants