
11. Aggregation and grouping

Oleksandr Zaytsev edited this page Jan 10, 2018 · 1 revision

All code in this section is based on the Tipping dataset, stored here in the variable tips:

tips := DataFrame loadTips.

The simplest example of applying the groupBy: operator is grouping the values of one series by the values of another series of the same size.

bill := tips column: #total_bill.
sex := tips column: #sex.

bill groupBy: sex.

The result of this query is an instance of DataSeriesGrouped, which splits bill into two series, one for each unique value of the sex series ('Male' and 'Female').
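To make the idea of splitting one series by the values of another more concrete, here is a minimal sketch that uses only core Pharo collections. It is not how DataSeriesGrouped is implemented, and the numbers are made up for illustration rather than taken from the tips data.

```smalltalk
"Toy stand-ins for the two columns (made-up values, not the real tips data)."
| bill sex groups |
bill := #(10 20 30 40).
sex := #('Female' 'Male' 'Male' 'Female').

"Map each unique key to the collection of values that share it."
groups := Dictionary new.
sex with: bill do: [ :key :value |
    (groups at: key ifAbsentPut: [ OrderedCollection new ]) add: value ].

groups at: 'Male'. "an OrderedCollection(20 30)"
```

Each unique value of sex becomes a key, and the bills that share that value are collected under it in their original order.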

Since most of the time the series being grouped are both columns of the same data frame, there is a handy shortcut:

tips group: #total_bill by: #sex.

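The result of the groupBy: operator is of little use on its own; it becomes useful once it is combined with an aggregation. The following query selects rows by a condition, groups them, and sums each group. Note that it refers to columns of the Iris dataset (sepal_length, petal_length, petal_width, species), so df here is assumed to hold the Iris data rather than the tips data: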

df select: #(sepal_length species)
   where: [ :petal_length :petal_width |
      (petal_length < 4.9 and: petal_length > 1.6) and:
      (petal_width < 0.4 or: petal_width > 1.5) ]
   groupBy: #species
   aggregate: #sum.

The result of this query is a data frame with a single column, sepal_length, holding one sum per species:

            |  sepal_length  
------------+--------------
    setosa  |          15.9  
versicolor  |          18.2  
 virginica  |          17.1
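The aggregation step can be pictured as reducing every group to a single value. The sketch below does this for a sum using plain Pharo collections. It only illustrates the idea and does not show the library's aggregate: implementation; the group contents are made-up numbers, not the Iris measurements.

```smalltalk
"Groups as they might look after the grouping step (made-up values)."
| groups sums |
groups := Dictionary new.
groups
    at: 'setosa' put: #(1 2 3);
    at: 'virginica' put: #(10 20).

"Reduce each group to one number by summing its values."
sums := Dictionary new.
groups keysAndValuesDo: [ :key :values |
    sums at: key put: (values inject: 0 into: [ :acc :each | acc + each ]) ].

sums at: 'setosa'. "6"
```

Any other reduction (for example a minimum or an average) could take the place of the inject:into: block in this sketch.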
