- Consider using the following
dbt.utils
in place of the current url-related macros: - Consider recoupling the
extract_hostname_from_url
&extract_query_string_from_url
into a singularparse_url()
macro --> parse_url([hostname|query]):- Checks for
remove_query_parameters
automatically & simpler than having multiple macros with specific names.
- Checks for
-
USER-SCOPE
Metrics:-
days_active
=COUNT(DISTINCT event_date) AS day_count
: The number of days a user has been active on your website or application. - ?
event_count
? =COUNT(*[events]) AS event_count
: The total number of a events a user has triggered on your webiste or application. - Event-related:
-
avg_events_per_session
-
lifetime_events
-
- Pageview-related:
-
avg_page_views_per_session
-
avg_page_view_duration
-
- Session-related:
-
avg_session_duration
-
lifetime_session_duration
SEE BELOW INUser Lifetime Metrics
.
-
-
-
SESSION-SCOPE
Metrics:- Event-related:
-
avg_events_per_session
-
lifetime_events
-
- Pageview-related:
-
avg_page_view_duration
-
- Event-related:
-
PAGE-SCOPE
Metrics:- Expand on Entrances & Exits:
-
entrance_rate
: The percentage of sessions that started on a page or screen (Entrances
/Sessions
). -
exit_rate
: The percentage of sessions that ended on a page or screen (Exits
/Sessions
).
-
- Event-related:
-
events_per_page
-
- Mirroring GA4 UI:
-
views
, just stick withpage_views
? -
users
-
new_users
-
views_per_user
-
total_engagement_duration
-
avg_engagement_duration
-
unique_user_scrolls
- ?
event_count
-- DECIDE HOW TO BREAK THESE ALL OUT -- - ?
conversions
-- DECIDE HOW TO HANDLE GENERAL CONVERSION_EVENT LOGIC -- - ?
total_revenue
-- CHOOSE 1 NAME,event_value
,total_evet_value
,total_revenue
? --
-
- Expand on Entrances & Exits:
- IN GENERAL CONSIDER ADDING THESE FOR EACH METRIC WHERE APPLICABLE:
-
avg_[entity]
-
total_[entity]
-
first_[entity]
-
last_[entity]
-
- User Lifetime Metrics:
-
lifetime_engaged_sessions
: The number of engaged sessions a user had since they first visited your website or application. -
lifetime_engagemenet_duration
:The length of time since a user's first visit that the user was active on your website or application while it was in the foreground. -
lifetime_session_duration
: The total duration of user sessions, from their first session until the current session expires, including time when your website or application is in the background. -
lifetime_ad_revenue
: The ad revenue you generate from someone since their first visit to your website or app. -
LTV
orlifetime_value
: Lifetime value (LTV) shows the total revenue from purchases on your website or application. You can use the data to determine how valuable users are based on additional revenue you generate. -
lifetime_sessions
: ALREADY HAVE THIS -sessions
@ User Level -
lifetime_transactions
ALREADY HAVE THIS -purchases
@ User Level
-
- ...WAY MORE...
- See Automatically Collected Events in GA4.
REASONING: This will help the setup / configuartion to be almost instant. Where every event_name in the stg_ga4_events
model will be automatically detected and create subsequent models for them.
CURRENT PROGRESS: I am so close, but no cigar yet. I have almost figured out how to dynamically handle all event_params
and their assosiated value.<dtype>_type
through macros, one solved, I can implement it to build one enormous Staging Model for all non statically handled events.
- [ ]
- Testing Packages to use:
- A LOT TO DO FOR THIS...
- See the dbt-ga4 Integration Tests for examples.
- See also dbt's docs for testing a new adapter.
- Decide on considerations for handling certain
event_params
, such as:- Google click-related:
gclsrc
andgclid
.- See here for more info.
- Others like:
debug_mode
,term
, and ?clean_event
.
- Google click-related:
- Decide how to handle default events (e.g.,
click
,scroll
, etc.)
- Docs to Add:
- Add a singular
_metric_definitions.md
file to root of themodels
folder. See here for inspo.- Essentially each metric definition would be in this format:
{% docs %} <Enter your metric definition here, like this. [Source](https://like-to-source-file.here/)> {% enddocs %}
- Need to update as metrics & dimensions are added, removed, or changed.
- Essentially each metric definition would be in this format:
- Add in this.
- Restructure to be:
- A single
__schema__.yml
file, for all sources & models at the root of A NEWdocs
folder.- i.e., combine
_core__models.yml
,_ga4__models.yml
&_ga4__sources.yml
. - REMOVE
_core__models.yml
,_ga4__models.yml
&_ga4__sources.yml
.
- i.e., combine
- A single
__macros__.yml
file.- Rename
_macros.yml
, may choose to join this with the__schema__.yml
file later if we can find fix for models failing to compile.
- Rename
- A single
__docs__.md
file, for all model, metric, & macro definitions/descriptions at the root of A NEWdocs
folder.- i.e., combine
_core__docs.md
,_metric_definitions.md
, &_docs.md
. - REMOVE
_core__docs.md
,_metric_definitions.md
, &_docs.md
.
- i.e., combine
- Then also move
__overview__.md
to the root of A NEWdocs
folder.
- A single
- Add a singular
- Docs to Fix:
- Change
__overview__.md
to simply be a high-level overview with links to say theREADME.md
,Projet Style Guide
, and thewhatever else here
for the project.- See here for inspo.
- Change
The general Fixes are as follows:
num_[entity] -->
total_[entity] OR
[entity] (**Recommended**)
// EX: num_page_views --> page_views
-
dim_ga4__users
table:-
num_sessions
-->sessions
-
num_page_views
-->pages_views
-
num_purchases
-->purchases
-
-
dim_ga4__sessions
table: -
fct_ga4__pages
table:- Consider if we should handle
hour
differently? -
total_time_on_page
-->time_on_page
-->total_engagement_duration
- Consider if we should handle
-
fct_ga4__sessions
table:-
count_page_views
-->page_views
- Consider changing
sum_event_value_in_usd
-->event_value
, will want to allign with UA/Internal usage. -
session_engaged
-->engaged_sessions
?
-
-
ga4__events
table:- Inlcude the
is_
prefix to all boolean fields (e.g.session_enagaged
,engaged_session_event
,entrances
), to be like:is_page_view
,is_purchase
, etc. - Add an additiona
page_path
metric.
- Inlcude the
REASONING: Currently there is only a Staging and Mart Models, and no Intermediate layer. Additionally of the following models are breaking the underlying rule of Staging Models: NO JOINS & NO AGGREGATION / RE-GRAINING.
- Handling
conversion_events
:- 1. ADD A
stg_ga4__event_conversions
UPSTREAM INSTEAD OF HAVING BOTHstg_ga4__page_conversions
ANDstg_ga4__session_conversions
. - 2. MOVE
stg_ga4__page_conversions
-->int_ga4__page_conversions
, AND PULL FROMstg_ga4__event_conversions
. - 3. MOVE
stg_ga4__session_conversions
-->int_ga4__session_conversions
, AND PULL FROMstg_ga4__event_conversions
.
- 1. ADD A
- Handling
FIRST
&LAST
events:-
stg_ga4__sessions_first_last_pageviews
-
stg_ga4__users_first_last_pageviews
-
stg_ga4__users_first_last_events
-
- Handling
user_properties
:-
stg_ga4__users_properties
-
stg_ga4__derived_users_properties
-
- Handling all
event_*
related models:- FIX
stg_ga4__event_page_view
, it is trying to join with something up stream.
- FIX
REASONING: Wide & Denomalized. Unlike old school warehousing, in the modern data stack storage is cheap and it’s compute that is expensive and must be prioritized as such, packing these into very wide denormalized concepts that can provide everything somebody needs about a concept as a goal.
-
dim_ga4__sessions
&fct_ga4__sessions
-->ga4__sessions
-
dim_ga4__users
-->ga4__users
-
fct_ga4__pages
-->ga4__pages
- ALSO, HONESTLY SHOULD INCLUDE AN ENRICHED & UNNESTED
ga4__events
TABLE AS WELL.
- Models:
-
ga4__events
-
event_date
-
event_timestamp
-
event_name
-
event_params
-
user_properties
-
user_ltv
-
device
-
geo
-
traffic_source
-
ecommerce
-
items
-
keys
-
user_key
-
session_key
-
page_key
-
event_key
-
- ``
- ``
- ``
- ``
- ``
- ``
-
-
ga4__pages
-
date
-
hour
-
page_path
-
page_title
-
page_key
-
page_views
-
users
-
pageviews_per_user
-
new_users
-
sessions
-
pageviews_per_session
-
entrances
-
entrance_rate
-
exits
-
exit_rate
-
total_engagement_time
-
avg_engagement_duration
-
-
ga4__sessions
-
user_key
-
session_key
-
session_date
-
session_timestamp
-
is_new_user
-
ga_session_number
-
is_engaged_session
-
landing_page
-
default_channel_grouping
-
source
-
medium
-
campaign
-
country
-
city
-
device_category
-
operating_system
-
session_duration
-
page_views
-
session_event_count
-
avg_event_per_page
-
session_value
-
-
ga4__users
-
- Macros:
-
get_position
-
get_first
-
get_last
-
- [x]
-
- Metrics & Dimensions:
-
user_properties
-
user_ltv
-
device
-
device_category
-
mobile_brand_name
-
mobile_model_name
-
mobile_marketing_name
-
mobile_os_hardware_model
-
operating_system
-
operating_system_version
-
vendor_id
-
advertising_id
-
language
-
is_limited_ad_tracking
-
time_zone_offset_seconds
-
browser
-
browser_version
-
-
geo
-
continent
-
sub_continent
-
country
-
region
-
city
-
metro
-
-
traffic_source
-
source
-
medium
-
campaign
-
-
ecommerce
-
items
-
item_id
-
item_name
-
item_brand
-
item_variant
-
item_category
-
item_category2
-
item_category3
-
item_category4
-
item_category5
-
price
-
quantity
-
item_revenue
-
item_refund
-
coupon
-
affiliation
-
location_id
-
item_list_id
-
item_list_name
-
item_list_index
-
promotion_id
-
promotion_name
-
creative_name
-
creative_slot
-
-
- Possible Good Addition: Identify Bot Traffic
- Check this out.
- IDEALS FOR WHY I AM MAKING SOME OF MY CHOICES REVOLVE AROUND: IMPLICIT VS EXPLICIT
- TO WRITE ON THIS EXTENSIVELY ONCE PROJECT IS MORE MATURE...
- See this Stack Overflow Comment, as this may be a better way to implement the
unest_params
macro:- See also dbt's Statement Blocks.
{%- call statement('my_statement', fetch_result=True) -%}
SELECT my_field FROM my_table
{%- endcall -%}
{%- set my_var = load_result('my_statement')['data'][0][0] -%}
-
Consider adding some Metrics / Dimensions for
User-Scoped
illustrating theirmost_freq
&unique_num_of
for the following:device
&device_category
, etc.location
- SURE SOME OTHER GREAT ONES, WILL BRAINSTORM / WHITESTORM WITH OTEHRS.
-
CHECK THIS RESOURCE OUT: Data Modeling for a Customer 360 View.
-
Monitoring Core Web Vitals in BigQuery.
-
ADD IN ABITRARY SCALES OF 1 TO 10 FOR SAY SOMETHING LIKE
user_activity_scale
ORuser_engagement_scale
.- WHERE YOU COULD USE ~
APPROX_QUANTILES(<metrric_name>, 100)[OFFSET(<percentage_amount>)] AS p<percentage_amount>
. - SEE Sample Queries for Audiences FOR MORE INSPORATION.
- WHERE YOU COULD USE ~
-
IDEA FOR HOW TO HANDLE DYNAMIC UNNEST_BY_KEY FOR ALL NON HARD-CODED EVENT MODELS:
- ESSENTIALLY SHOULD PASS A
exclude_event_params
VARIABLE LIST TO EXLUDE FOR EACH EVENT, MOST LIKELY INCLUDING THE FOLLOWING:['ga_session_id', 'page_location', 'ga_session_number', 'session_engaged', 'engagement_time_msec', 'page_title', 'page_refferr', 'source', 'medium', 'campaign']
- ESSENTIALLY SHOULD PASS A
-
These example queries for GA4 are perfect!
- USE THIS, PERFECT SOLUTION!!! :
SELECT COALESCE(value.int_value, value.float_value, value.double_value)
FROM:
- USE THIS, PERFECT SOLUTION!!! :
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value)
FROM UNNEST(event_params)
WHERE key = 'value'
) AS event_value
- IMPLEMENTATION WOULD PROB LOOK LIKE:
{% if value.string_value not null%}
value.string_value
{% else %}
COALESCE(value.int_value, value.float_value, value.double_value)
{% endif %}