Skip to content

First Tutorial Part 9: Shadow Realized Views

Laurent Hasson edited this page Sep 16, 2019 · 2 revisions
Previous Main Next
<-- Part 8 Main Part 10 -->

Realizing All The Views

Last but not least, we always have to think about performance. As you may have noted, we are joining the 2 views together and we have an index on Type for the Form table. So things should be OK. However, we realized Test_XYZ_PivotView but not Test_View nor Test_XYZ_Analytics_View.

Let's do that for completeness here... Both Test_XYZ_PivotView and Test_View are fairly fundamental datasets upon which a lot of analysis could be performed, so realizing those 2 makes sense. As for Test_XYZ_Analytics_View, it's where we have interesting knowledge now and therefore, likely to be a data source for other analytics processes (a tableau report, exporting the data to a CSV for offline analysis in an environment like SAS or R maybe...). The Tilda realize segments are:

--------------------------------------------------------------------------------------------------
-- Test_XYZ_PivotView 
--------------------------------------------------------------------------------------------------
  ,"realize":{
       "primary":{ "columns":["formRefnum"] }
      ,"foreign":[
          { "name":"Form" ,  "srcColumns":["formRefnum"    ], "destObject": "Form" }
         ,{ "name":"User"  , "srcColumns":["formUserRefnum"], "destObject": "User" }
        ]
      ,"indices":[
          { "name":"User", "columns":["formUserRefnum"], "orderBy":["formFillDate desc"] }
         ,{ "name":"Type", "columns":["formType"      ], "orderBy":["formFillDate desc"] }
        ]
    }


--------------------------------------------------------------------------------------------------
-- Test_View 
--------------------------------------------------------------------------------------------------
  ,"realize":{
       "primary":{ "columns":["type"] }
    }
 

--------------------------------------------------------------------------------------------------
-- Test_XYZ_Analytics_View
--------------------------------------------------------------------------------------------------
 ,"realize":{
       "primary":{ "columns":["formRefnum"] }
      ,"foreign":[
          { "name":"Form" ,  "srcColumns":["formRefnum"    ], "destObject": "Form" }
         ,{ "name":"User"  , "srcColumns":["formUserRefnum"], "destObject": "User" }
        ]
      ,"indices":[
          { "name":"User", "columns":["formUserRefnum"], "orderBy":["formFillDate desc"] }
         ,{ "name":"Type", "columns":["formType"      ], "orderBy":["formFillDate desc"] }
        ]
    }

Test_XYZ_PivotView and Test_XYZ_Analytics_View are similar in that they share the same grain (formRefnum) and similar columns (userRefnum...). As such, their realize sections are pretty much identical. As for Test_View, it's a fairly simple construct (akin to a Fact table in a data warehouse) where we accumulated some measures per form type, so the realize section is trivial.

Shadow Realized Views

As discussed previously, something interesting happens when you layer multiple views that are realized: a parallel hierarchy is created of equivalent views that go against their realized dependencies. Since Test_XYZ_Analytics_View is (1) realized and (2) reuses other realized views, a _R parallel view will be generated:

create or replace view TILDATMP.TILDATEST_Test_XYZ_Analytics_View_R as 
select /*DoFormulasSuperView*/
"formRefnum" -- COLUMN
     , "formType" -- COLUMN
     , "formFillDateTZ" -- COLUMN
     , "formFillDate" -- COLUMN
     , "formUserRefnum" -- COLUMN
     , "formUserEmail" -- COLUMN
     , "formCountCorrect" -- COLUMN
     , "formTimeMillisTotal" -- COLUMN
     , "testCount" -- COLUMN
     , "testAnswerCountCorrect" -- COLUMN
     , "testTimeMillisAvg" -- COLUMN
     -- Whether the test was passed or not by answering at least 2 out of the 3 questions.
     , (coalesce("formCountCorrect", 0) >= 2)::integer as "isPassed"
     -- Whether the test took longer that the average time spent across all tests.
     , (coalesce("formTimeMillisTotal", 0) > coalesce("testTimeMillisAvg", 0))::integer as "tookLongerThanAverage"
     -- Whether the test was challenging in that:<LI>
     --   <LI>it was passed,</LI>
     --   <LI>and overall, less than a third of answers across all tests were answered correctly.</LI>
     -- </UL>
     , ((coalesce("formCountCorrect", 0) >= 2)::integer=1 AND coalesce("testAnswerCountCorrect", 0) < coalesce("testCount", 0)*3/2)::integer as "wasChallenging"

 from (
-- 'A view of XYZ forms with analytics'
select TILDATEST.Test_XYZ_Realized."formRefnum" as "formRefnum" -- The primary key for this record
     , TILDATEST.Test_XYZ_Realized."formType" as "formType" -- Form template type
     , TILDATEST.Test_XYZ_Realized."formFillDateTZ" as "formFillDateTZ" -- Generated helper column to hold the time zone ID for 'formFillDate'.
     , TILDATEST.Test_XYZ_Realized."formFillDate" as "formFillDate" -- The date the form was filled
     , TILDATEST.Test_XYZ_Realized."formUserRefnum" as "formUserRefnum" -- The primary key for this record
     , TILDATEST.Test_XYZ_Realized."formUserEmail" as "formUserEmail" -- The user's email
     , TILDATEST.Test_XYZ_Realized."countCorrect" as "formCountCorrect" -- Whether the answer is correct or not
     , TILDATEST.Test_XYZ_Realized."timeMillisTotal" as "formTimeMillisTotal" -- Time in milliseconds for the time spent answering the question
     , TILDATEST.Test_Realized."testCount" as "testCount" -- The primary key for this record
     , TILDATEST.Test_Realized."answerCountCorrect" as "testAnswerCountCorrect" -- Whether the answer is correct or not
     , TILDATEST.Test_Realized."timeMillisAvg" as "testTimeMillisAvg" -- Time in milliseconds for the time spent answering the question
  from TILDATEST.Test_XYZ_Realized
     inner join TILDATEST.Test_Realized on Test_Realized."type" = Test_XYZ_Realized."formType"

      ) as T
-- Realized as /*genRealizedColumnList*/"formRefnum" -- COLUMN ,"formType" -- COLUMN ,"formFillDateTZ" -- COLUMN ,"formFillDate" -- COLUMN ,"formUserRefnum" -- COLUMN ,"formUserEmail" -- COLUMN ,"formCountCorrect" -- COLUMN ,"formTimeMillisTotal" -- COLUMN ,"testCount" -- COLUMN ,"testAnswerCountCorrect" -- COLUMN ,"testTimeMillisAvg" -- COLUMN ,"isPassed" -- FORMULA ,"tookLongerThanAverage" -- FORMULA ,"wasChallenging" -- FORMULA
;

It is somewhat similar to Test_XYZ_Analytics_View in terms of the columns defined, but the structure is quite different:

  • The view is generated in the TILDATMP schema and should never be used directly.
  • Instead of joining and using the sub-views Test_XYZ_PivotView and Test_View, it uses the realized tables Test_XYZ_Realized and Test_Realized.
  • A sub-query is created to wrap what would have been the original selects from the views, and joins.
  • Because all the columns have been pre-computed in the realized tables, the SQL fragments for aggregates, pivoting and formulas have disappeared inside the sub-query in favor of reusing the columns from the realized tables directly.

The expectation is that performance is much improved. To Refill the lot, you'd execute in order:

select Refill_Test_XYZ_Realized();
select Refill_Test_Realized();
select TILDATEST.Refill_Test_XYZ_Analytics_Realized();

The order is important in the sense that Test_XYZ and Test must be refilled first before Refill_Test_XYZ_Analytics is refilled. The HTML documentation for each schema will output a dependency graph and an optimal refill order for whatever views you have defined.

Previous Main Next
<-- Part 8 Main Part 10 -->
Clone this wiki locally