11. Aggregation and grouping
Oleksandr Zaytsev edited this page Jan 10, 2018 · 1 revision
All code in this section is based on the Tipping dataset:
tips := DataFrame loadTips.
The simplest example of applying the groupBy: operator is grouping the values of a series by the values of another series of the same size.
bill := tips column: #total_bill.
sex := tips column: #sex.
bill groupBy: sex.
The result of this query will be an instance of DataSeriesGrouped, which splits bill into two series, one for each of the unique values of the sex series: 'Male' and 'Female'.
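To make the idea of grouping one series by another concrete, here is a minimal sketch that uses only plain Pharo collections rather than the DataFrame classes. It is not how DataSeriesGrouped is implemented; the sample values and the variable name groups are made up for illustration.

```smalltalk
"Hypothetical sample values, not taken from the tipping dataset."
| bill sex groups |
bill := #(16.99 10.34 21.01 23.68).
sex := #('Female' 'Male' 'Male' 'Male').

"Collect each bill under the label found at the same position in sex."
groups := Dictionary new.
bill with: sex do: [ :amount :label |
    (groups at: label ifAbsentPut: [ OrderedCollection new ]) add: amount ].

(groups at: 'Female') asArray. "#(16.99)"
(groups at: 'Male') asArray.   "#(10.34 21.01 23.68)"
```

Both collections must have the same size, since with:do: pairs their elements position by position.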
Since the series being grouped are usually both columns of the same data frame, there is a handy shortcut:
tips group: #total_bill by: #sex.
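The result of the groupBy: operator is of little use on its own; it becomes useful once it is combined with an aggregation such as aggregate:. The following query sums sepal_length per species. Judging by its column names, it is written against the Iris dataset, so df here is assumed to hold that data rather than the tips: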
df select: #(sepal_length species)
    where: [ :petal_length :petal_width |
        (petal_length < 4.9 and: petal_length > 1.6) and:
        (petal_width < 0.4 or: petal_width > 1.5) ]
    groupBy: #species
    aggregate: #sum.
The result of this query will be a data frame with a single column:
           | sepal_length
-----------+--------------
setosa     | 15.9
versicolor | 18.2
virginica  | 17.1
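The group-then-aggregate step can be sketched with plain Pharo collections as follows. This only illustrates the idea of summing each group; it does not use or reproduce the DataFrame API, and the values, labels, and variable names are hypothetical.

```smalltalk
"Hypothetical values and group labels, chosen only for illustration."
| values labels groups sums |
values := #(5 7 6 4).
labels := #('a' 'b' 'a' 'b').

"Step 1: split the values into one collection per label."
groups := Dictionary new.
values with: labels do: [ :value :label |
    (groups at: label ifAbsentPut: [ OrderedCollection new ]) add: value ].

"Step 2: reduce every group to a single number, here its sum."
sums := Dictionary new.
groups keysAndValuesDo: [ :label :members |
    sums at: label put: (members inject: 0 into: [ :acc :each | acc + each ]) ].

sums at: 'a'. "11"
sums at: 'b'. "11"
```

Replacing the inject:into: block changes the aggregation, for example tracking a maximum instead of a sum, while the grouping step stays the same.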