New pivots.aggregates.aggregate option for nested aggregates #658

ldhasson · 2021-12-20T05:36:45Z

When doing pivots, it's possible to want two nested levels of aggregates. For example, consider daily weather data with multiple readings per day per zip codes. You want a pivoted view that collects daily averages per zip-code over the past 30 days. The first level would define daily averages, while the final level would create an array of the average values for the last 30 days for each measure, i.e., temperature, aqi, rainfall...

Currently, this is not possible as the aggregate in the inner level is replicated for the outer level. It was meant to calculate simple values typical in most pivoting logic.

 { "name":"Weather_PivotView"
  ,"description":"An aggregate of weather daya for the past 30 days",
   "columns":[
      { "sameas":"weather.zipCode" }
     ,{ "sameas":"weather.taken"  , "expression":"?::DATE", "type":"DATE" }
     ,{ "sameas":"weather.value"  , "aggregate":"AVG"  }
    ]
  ,"subWhere":"weather.taken >= now() - interval '30 days'"
  ,"pivots":[
      { "on": "weather.name"
       ,"aggregates":[
           { "name":"value", "suffix":"_values" }
         ]
       ,"interleave":true
       ,"globals":false
       ,"values":[ { "value":"TEMP"    , "description":"Temperature"        }
                  ,{ "value":"AQI"     , "description":"Air Quality Index." }
                  ,{ "value":"RAINFALL", "description":"Rain Fall"          }
                 ]
      }
    ]
 }

The aggregate definition is supplied only once for the source columns, and the aggregate is replicated in the final pivot logic

with T as (
-- 'An aggregate of weather daya for the past 30 days'
select CODESMASTER.Weather."zipCode" as "zipCode" -- FK to zip code
     , CODESMASTER.Weather."taken"::DATE as "taken" -- Start date for the value
     , CODESMASTER.Weather."name" as "name" -- Name
     , avg(CODESMASTER.Weather."value") as "value" -- value.
  from CODESMASTER.Weather
 where (CODESMASTER.Weather."taken" >= now() - interval '30 days')
   and ( CODESMASTER.Weather."name" in ('TEMP', 'AQI', 'RAINFALL')
       )
 group by 1, 2, 3
) select "zipCode" 
       , "taken" 
     , AVG("value") filter (where "name"= 'TEMP')  as "TEMP_values"
     , AVG("value") filter (where "name"= 'AQI')  as "AQI_values"
     , AVG("value") filter (where "name"= 'RAINFALL')  as "RAINFALL_values"
from T
     group by 1, 2

Not only is this more verbose than needed, it's also somewhat counter intuitive to have AVG twice when the second level AVG in this particular case is effectively aggregating only a single value.

We propose to augment the pivots.aggregates elements with the ability to define its own "aggregate" and "orderBy" and allow pivoted column sources to not be aggregates themselves. This provides for a more natural expression of the aggregation logic needed for pivoting as well as the ability to mix different types of aggregation at each level so as to achieve maybe a better representation of all the data points for the past 30 days in an array.

This will allow something like this:

 { "name":"Weather_PivotView"
  ,"description":"An aggregate of weather daya for the past 30 days",
   "columns":[
      { "sameas":"weather.zipCode" }
     ,{ "sameas":"weather.taken"  , "expression":"?::DATE", "type":"DATE" }
     ,{ "sameas":"weather.value"  , "aggregate":"AVG"  }
    ]
  ,"subWhere":"weather.taken >= now() - interval '30 days'"
  ,"pivots":[
      { "on": "weather.name"
       ,"aggregates":[
            { "name":"taken", "suffix":"_takens", "aggregate":"ARRAY", "orderBy":["taken"] }
           ,{ "name":"value", "suffix":"_values", "aggregate":"ARRAY", "orderBy":["taken"] }
         ]
       ,"interleave":true
       ,"globals":false
       ,"values":[ { "value":"TEMP"    , "description":"Temperature"        }
                  ,{ "value":"AQI"     , "description":"Air Quality Index." }
                  ,{ "value":"RAINFALL", "description":"Rain Fall"          }
                 ]
      }
    ]
 }

The following is then generated with the aggregates at both levels being different.

with T as (
-- 'An aggregate of weather daya for the past 30 days'
select CODESMASTER.Weather."zipCode" as "zipCode" -- FK to zip code
     , CODESMASTER.Weather."taken"::DATE as "taken" -- Start date for the value
     , CODESMASTER.Weather."name" as "name" -- Name
     , avg(CODESMASTER.Weather."value") as "value" -- value.
  from CODESMASTER.Weather
 where (CODESMASTER.Weather."taken" >= now() - interval '30 days')
   and ( CODESMASTER.Weather."name" in ('TEMP', 'AQI', 'RAINFALL')
       )
 group by 1, 2, 3
) select "zipCode" 
     , array_agg("taken" order by T."taken" ASC) filter (where "name"= 'TEMP')  as "TEMP_takens"
     , array_agg("value" order by T."taken" ASC) filter (where "name"= 'TEMP')  as "TEMP_values"
     , array_agg("taken" order by T."taken" ASC) filter (where "name"= 'AQI')  as "AQI_takens"
     , array_agg("value" order by T."taken" ASC) filter (where "name"= 'AQI')  as "AQI_values"
     , array_agg("taken" order by T."taken" ASC) filter (where "name"= 'RAINFALL')  as "RAINFALL_takens"
     , array_agg("value" order by T."taken" ASC) filter (where "name"= 'RAINFALL')  as "RAINFALL_values"
from T
     group by 1

This is a large commit - Cleanup of type management for pivoted columns taking into consideration multiple levels of possible expression definitions and aggregates. - Cleanup of TZ logic - Cleanup of SQL logic to only rely on View._PivotColumns instead of replicating logic from the model validation - Added new rule to catch when incompatible expressions and aggregates clash. - Added new rule to catch when someone defines an expression for a pivot value but has multipple pivot aggregates defined. There ae still places that could be simpler, cleaner... but that'll be for later.

ldhasson added feature P2 feature P1 and removed feature P2 labels Dec 20, 2021

This was referenced Dec 21, 2021

New pivots.globals option to block main aggregates to be repeated in final view outside of any specified pivot value. #660

Closed

Array aggregates in Pivot tables are not properly realized: they are not realized as arrays. #659

Closed

ldhasson added a commit that referenced this issue Dec 21, 2021

#658 New Pivot features and fixes

8d5bb9b

ldhasson mentioned this issue Dec 21, 2021

DateTime aggregates in Pivot tables are not properly handled. #661

Closed

ldhasson added a commit that referenced this issue Dec 26, 2021

#658 Vie wPivot enhancements

1b986e9

ldhasson mentioned this issue Dec 26, 2021

Coalesce on an aggregate used in a pivot is not working as expected #592

Closed

ldhasson closed this as completed Feb 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New pivots.aggregates.aggregate option for nested aggregates #658

New pivots.aggregates.aggregate option for nested aggregates #658

ldhasson commented Dec 20, 2021 •

edited

Loading

New pivots.aggregates.aggregate option for nested aggregates #658

New pivots.aggregates.aggregate option for nested aggregates #658

Comments

ldhasson commented Dec 20, 2021 • edited Loading

ldhasson commented Dec 20, 2021 •

edited

Loading