Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New pivots.aggregates.aggregate option for nested aggregates #658

Closed
ldhasson opened this issue Dec 20, 2021 · 0 comments
Closed

New pivots.aggregates.aggregate option for nested aggregates #658

ldhasson opened this issue Dec 20, 2021 · 0 comments

Comments

@ldhasson
Copy link
Collaborator

ldhasson commented Dec 20, 2021

When doing pivots, it's possible to want two nested levels of aggregates. For example, consider daily weather data with multiple readings per day per zip codes. You want a pivoted view that collects daily averages per zip-code over the past 30 days. The first level would define daily averages, while the final level would create an array of the average values for the last 30 days for each measure, i.e., temperature, aqi, rainfall...

Currently, this is not possible as the aggregate in the inner level is replicated for the outer level. It was meant to calculate simple values typical in most pivoting logic.

 { "name":"Weather_PivotView"
  ,"description":"An aggregate of weather daya for the past 30 days",
   "columns":[
      { "sameas":"weather.zipCode" }
     ,{ "sameas":"weather.taken"  , "expression":"?::DATE", "type":"DATE" }
     ,{ "sameas":"weather.value"  , "aggregate":"AVG"  }
    ]
  ,"subWhere":"weather.taken >= now() - interval '30 days'"
  ,"pivots":[
      { "on": "weather.name"
       ,"aggregates":[
           { "name":"value", "suffix":"_values" }
         ]
       ,"interleave":true
       ,"globals":false
       ,"values":[ { "value":"TEMP"    , "description":"Temperature"        }
                  ,{ "value":"AQI"     , "description":"Air Quality Index." }
                  ,{ "value":"RAINFALL", "description":"Rain Fall"          }
                 ]
      }
    ]
 }

The aggregate definition is supplied only once for the source columns, and the aggregate is replicated in the final pivot logic

with T as (
-- 'An aggregate of weather daya for the past 30 days'
select CODESMASTER.Weather."zipCode" as "zipCode" -- FK to zip code
     , CODESMASTER.Weather."taken"::DATE as "taken" -- Start date for the value
     , CODESMASTER.Weather."name" as "name" -- Name
     , avg(CODESMASTER.Weather."value") as "value" -- value.
  from CODESMASTER.Weather
 where (CODESMASTER.Weather."taken" >= now() - interval '30 days')
   and ( CODESMASTER.Weather."name" in ('TEMP', 'AQI', 'RAINFALL')
       )
 group by 1, 2, 3
) select "zipCode" 
       , "taken" 
     , AVG("value") filter (where "name"= 'TEMP')  as "TEMP_values"
     , AVG("value") filter (where "name"= 'AQI')  as "AQI_values"
     , AVG("value") filter (where "name"= 'RAINFALL')  as "RAINFALL_values"
from T
     group by 1, 2

Not only is this more verbose than needed, it's also somewhat counter intuitive to have AVG twice when the second level AVG in this particular case is effectively aggregating only a single value.

We propose to augment the pivots.aggregates elements with the ability to define its own "aggregate" and "orderBy" and allow pivoted column sources to not be aggregates themselves. This provides for a more natural expression of the aggregation logic needed for pivoting as well as the ability to mix different types of aggregation at each level so as to achieve maybe a better representation of all the data points for the past 30 days in an array.

This will allow something like this:

 { "name":"Weather_PivotView"
  ,"description":"An aggregate of weather daya for the past 30 days",
   "columns":[
      { "sameas":"weather.zipCode" }
     ,{ "sameas":"weather.taken"  , "expression":"?::DATE", "type":"DATE" }
     ,{ "sameas":"weather.value"  , "aggregate":"AVG"  }
    ]
  ,"subWhere":"weather.taken >= now() - interval '30 days'"
  ,"pivots":[
      { "on": "weather.name"
       ,"aggregates":[
            { "name":"taken", "suffix":"_takens", "aggregate":"ARRAY", "orderBy":["taken"] }
           ,{ "name":"value", "suffix":"_values", "aggregate":"ARRAY", "orderBy":["taken"] }
         ]
       ,"interleave":true
       ,"globals":false
       ,"values":[ { "value":"TEMP"    , "description":"Temperature"        }
                  ,{ "value":"AQI"     , "description":"Air Quality Index." }
                  ,{ "value":"RAINFALL", "description":"Rain Fall"          }
                 ]
      }
    ]
 }

The following is then generated with the aggregates at both levels being different.

with T as (
-- 'An aggregate of weather daya for the past 30 days'
select CODESMASTER.Weather."zipCode" as "zipCode" -- FK to zip code
     , CODESMASTER.Weather."taken"::DATE as "taken" -- Start date for the value
     , CODESMASTER.Weather."name" as "name" -- Name
     , avg(CODESMASTER.Weather."value") as "value" -- value.
  from CODESMASTER.Weather
 where (CODESMASTER.Weather."taken" >= now() - interval '30 days')
   and ( CODESMASTER.Weather."name" in ('TEMP', 'AQI', 'RAINFALL')
       )
 group by 1, 2, 3
) select "zipCode" 
     , array_agg("taken" order by T."taken" ASC) filter (where "name"= 'TEMP')  as "TEMP_takens"
     , array_agg("value" order by T."taken" ASC) filter (where "name"= 'TEMP')  as "TEMP_values"
     , array_agg("taken" order by T."taken" ASC) filter (where "name"= 'AQI')  as "AQI_takens"
     , array_agg("value" order by T."taken" ASC) filter (where "name"= 'AQI')  as "AQI_values"
     , array_agg("taken" order by T."taken" ASC) filter (where "name"= 'RAINFALL')  as "RAINFALL_takens"
     , array_agg("value" order by T."taken" ASC) filter (where "name"= 'RAINFALL')  as "RAINFALL_values"
from T
     group by 1
ldhasson added a commit that referenced this issue Dec 21, 2021
ldhasson added a commit that referenced this issue Dec 24, 2021
This is a large commit
- Cleanup of type management for pivoted columns taking into
consideration multiple levels of possible expression definitions and
aggregates.
- Cleanup of TZ logic
- Cleanup of SQL logic to only rely on View._PivotColumns instead of
replicating logic from the model validation
- Added new rule to catch when incompatible expressions and aggregates
clash.
- Added new rule to catch when someone defines an expression for a pivot
value but has multipple pivot aggregates defined.

There ae still places that could be simpler, cleaner... but that'll be
for later.
ldhasson added a commit that referenced this issue Dec 26, 2021
@ldhasson ldhasson closed this as completed Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

1 participant