-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send request events for cached pages with rack middleware #67
Conversation
@@ -389,6 +389,34 @@ So the comparisons in Ruby would be: | |||
|
|||
The fields do match successfully, but note the the first comparison matches `id` on `id` and `course_id` so the match would be wider than expected in some instances. | |||
|
|||
## Page Caching | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asatwal is it worth adding something like 'Some applications use rack middleware to cache pages. If your application does not use this you can disregard this section.' here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added something to that effect.
README.md
Outdated
@@ -389,6 +389,34 @@ So the comparisons in Ruby would be: | |||
|
|||
The fields do match successfully, but note the the first comparison matches `id` on `id` and `course_id` so the match would be wider than expected in some instances. | |||
|
|||
## Page Caching | |||
|
|||
Any page visit in the App will result in a web request event being sent to Big Query. The event is automatically sent by the Controller after action callback `trigger_request_event`. However, cached pages that are served from rack middleware return early and therefore do not execute any actions in the controller. This means that any cached page visits handled by rack middleware do NOT result in a web request event being sent to Big Query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asatwal BigQuery is all one word (no space). Same in line 415.
config/locales/en.yml
Outdated
rack_page_cached: | ||
description: | | ||
A proc which will be called with the rack env, and which should | ||
return boolean indicating whether the page is cached and will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asatwal 'return a boolean'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, couple of minor qs inline!
@@ -79,6 +81,7 @@ def self.configure | |||
config.queue ||= :default | |||
config.user_identifier ||= proc { |user| user&.id } | |||
config.anonymise_web_request_user_id ||= false | |||
config.rack_page_cached ||= proc { |_rack_env| false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
README.md
Outdated
|
||
```ruby | ||
DfE::Analytics.config.rack_page_cached = proc do |rack_env| | ||
def path; File.join(Rails.root, 'public/cached_pages'); end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have a method here, and not a plain expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is to lazily set the path. But git website had a tantrum about this line, so I've removed the method in anycase :-)
@@ -11,6 +11,9 @@ class Railtie < Rails::Railtie | |||
|
|||
initializer 'dfe.analytics.insert_middleware' do |app| | |||
app.config.middleware.use DfE::Analytics::Middleware::RequestIdentity | |||
|
|||
app.config.middleware.insert_before \ | |||
ActionDispatch::Static, DfE::Analytics::Middleware::SendCachedPageRequestEvent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do if ActionDispatch::Static
isn't in the stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it will just at to the end of the stack, like use
does. But Rails should always have this present. This is what our friend Chat GPT says:
In Rails, the ActionDispatch::Static
middleware is included by default in the middleware stack for new applications. This middleware is responsible for serving static files from the public
directory of your Rails application.
However, it's important to note that the exact middleware stack can be customized in the config/application.rb
file of your Rails application. So, it's possible to remove or reorder the ActionDispatch::Static
middleware in the stack if needed.
That being said, it's generally not recommended to remove the ActionDispatch::Static
middleware, as it's an essential part of serving static assets in a Rails application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💟 ChatGPT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK seems reasonable — so it would only fail to run if ever someone used something other than ActionDispatch::Static
as the caching layer. Maybe worth one line in the docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change is causing issues when precompiling the assets, specifically after updating dfe-analytics
in our app we're starting to see errors when trying to deploy.
> [builder 11/12] RUN RAILS_ENV=production SECRET_KEY_BASE=required-to-run-but-not-used GOVUK_NOTIFY_API_KEY=required-to-run-but-not-used REDIS_URL=redis://required-to-run-but-not-used bundle exec rails assets:precompile:
#18 3.549 rails aborted!
#18 3.549 No such middleware to insert before: ActionDispatch::Static
#18 3.549 /app/config/environment.rb:5:in `<main>'
It sounds like perhaps ActionDispatch::Static
isn't added to the middleware stack when precompiling the assets - I'm not entirely clear on how the middleware stack interacts with the asset precompiling.
We'll revert this change for now in Apply for QTS, but I wonder if there's a way of only inserting the DfE::Analytics::Middleware::SendCachedPageRequestEvent
middleware if the ActionDispatch::Static
middleware exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm we had to revert the upgrade to 1.8.0 on Refer because of an error:
bundle exec rails assets:precompile
#19 3.196 rails aborted!
#19 3.196 No such middleware to insert before: ActionDispatch::Static
https://github.com/DFE-Digital/refer-serious-misconduct/actions/runs/4730348588/jobs/8394175837
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry chaps — fix on way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ticket opened on DI Trello board:
end | ||
|
||
def response | ||
ActionDispatch::Response.new(200, 'Content-Type' => 'text/html') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be 304?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to log a response to BQ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duncanjbrown - Well spotted. Indeed this should be 304. I checked the git website and it returns 304 when returning content from the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this method?
We need a response as the method we call to send the event to BQ requires a response object. The reponse ends up in the BQ database.
@@ -0,0 +1,47 @@ | |||
# frozen_string_literal: true | |||
|
|||
RSpec.describe DfE::Analytics::Middleware::SendCachedPageRequestEvent do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more robust to (also/alternatively) test this through a request spec, and only stub the rack_page_cached?
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a request spec now to test the caching by rack middleware by stubbing rack_page_cached?
. I've had to hack the controller a bit to remove the callback just for this test, but it tests the concept and the middleware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you!
OOI - could we not just always log web visits using middleware? It is relatively easy (IIRC) to know whether it was cached or not and if it was stopped by any other middleware, especially if we inject it high enough. |
end | ||
|
||
def response | ||
ActionDispatch::Response.new(200, 'Content-Type' => 'text/html') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this method?
Interesting provocation. We get the namespace and user_id down in the controller, but I guess you could argue that those could be put into |
Thanks for thew comments @BroiSatse. Great to get your review on this PR also. I had briefly thought about this, and given it a bit more though now that you've mentioned it:
Having said that, it would be nice to have just a single method to send web request events :-). |
|
||
it 'serves a cached page from rack middleware' do | ||
# Indicate to rack middleware that this page is cached | ||
allow(DfE::Analytics).to receive(:rack_page_cached?).and_return(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cool to get ActionDispatch::Static
to do the right thing so we could test this more realistically but I guess we could add that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do that we'd have to implement rails page caching in one of the pages in the dummy App. It's easy enough to do, but might be fiddly get it to work in rspec. I'll open a ticket in the DI team backlog to do this in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Trello card
Trello-1058
Context
We have a big inconsistency tracking homepage visits on the GiT website through BigQuery.
BQ Count: 1637 vs GA Count: 65,263. Please see ticket for details.
After some troubleshooting we found that there is page caching of static pages, the homepage being a static page.
GiT website uses rack-page_caching GEM, that implements caching through rack middleware.
The DfE Analytics syncs web request events with a controller after action callback.
Cached pages are served by rack middleware, returning early and therefore do not execute any actions in the controller. This means that any cached page visits handled by rack middleware do NOT result in a web request event being sent to Big Query.
Changes proposed in this pull request and Guidance
The type and method of caching done by a Rails App is a choice.
The most standard way for rails to cache pages, is by using
actionpack-page_caching
(Or the GEM we use in gitrack-page_caching
). These save cached pages to the public directory.In Rails there is a standard
ActionDispatch::Static middleware
, normally very high up in the stack. It checks for the presence of the requested asset (Eg Javascript, Image, Page etc), and serves the asset from the file cache if it is present (Normally from public but this is configurable).No doubt there are other caching GEMs (Both Rails and Rack), Eg
Rack::Cache
, that serve from a memory cache, such as memcache or Redis.There are also other caching mechanisms such as Action, Fragment, Russian Doll Caching, that serve the cached page from the controller, in which case we don't have a problem with sending of the big query event.
Also adding a header to the request to indicate a cached page, has to be done by the client. The client doesn't know if a requested page is cached, unless there is some content on the page that indicates it is cached or something in the session cookie, and then only after the page is returned
There's also the question of whether the cached page is served from middleware or the controller.
So our specific issue on GiT is a static cached page served by
ActionDispatch::Static
middleware from file. If the page is cached then we need to send the BQ event before hittingActionDispatch::Static
. This is the most common type of caching IMO anyway.Suggestions for adding to DfE Analytics GEM:
SendCachedPageRequestEvent
) beforeActionDispatch::Static
to handle the sending of the event to BQ.config/initializers/dfe_analytics.rb
):rack_page_cached
. This proc should return a boolean indicating whether the page is cached by rackHaving a project configurable proc removes the assumptions about the type and method of caching, allowing the project full control of when to send the web request event to Big Query.