Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send request events for cached pages with rack middleware #67

Merged
merged 4 commits into from
Apr 4, 2023

Conversation

asatwal
Copy link
Collaborator

@asatwal asatwal commented Mar 31, 2023

Trello card

Trello-1058

Context

We have a big inconsistency tracking homepage visits on the GiT website through BigQuery.

BQ Count: 1637 vs GA Count: 65,263. Please see ticket for details.

After some troubleshooting we found that there is page caching of static pages, the homepage being a static page.

GiT website uses rack-page_caching GEM, that implements caching through rack middleware.

The DfE Analytics syncs web request events with a controller after action callback.

Cached pages are served by rack middleware, returning early and therefore do not execute any actions in the controller. This means that any cached page visits handled by rack middleware do NOT result in a web request event being sent to Big Query.

Changes proposed in this pull request and Guidance

The type and method of caching done by a Rails App is a choice.

The most standard way for rails to cache pages, is by using actionpack-page_caching (Or the GEM we use in git rack-page_caching). These save cached pages to the public directory.

In Rails there is a standard ActionDispatch::Static middleware, normally very high up in the stack. It checks for the presence of the requested asset (Eg Javascript, Image, Page etc), and serves the asset from the file cache if it is present (Normally from public but this is configurable).

No doubt there are other caching GEMs (Both Rails and Rack), Eg Rack::Cache, that serve from a memory cache, such as memcache or Redis.

There are also other caching mechanisms such as Action, Fragment, Russian Doll Caching, that serve the cached page from the controller, in which case we don't have a problem with sending of the big query event.

Also adding a header to the request to indicate a cached page, has to be done by the client. The client doesn't know if a requested page is cached, unless there is some content on the page that indicates it is cached or something in the session cookie, and then only after the page is returned

There's also the question of whether the cached page is served from middleware or the controller.

So our specific issue on GiT is a static cached page served by ActionDispatch::Static middleware from file. If the page is cached then we need to send the BQ event before hitting ActionDispatch::Static. This is the most common type of caching IMO anyway.

Suggestions for adding to DfE Analytics GEM:

  • We need to make it obvious that this feature is for only when a cached page is served by rack middleware and not the application controller.
  • DfE Analytics will add middleware (eg SendCachedPageRequestEvent) before ActionDispatch::Static to handle the sending of the event to BQ.
  • The Analytics GEM requires a configurable custom proc methods (In config/initializers/dfe_analytics.rb):
    rack_page_cached. This proc should return a boolean indicating whether the page is cached by rack

Having a project configurable proc removes the assumptions about the type and method of caching, allowing the project full control of when to send the web request event to Big Query.

@asatwal asatwal self-assigned this Mar 31, 2023
@@ -389,6 +389,34 @@ So the comparisons in Ruby would be:

The fields do match successfully, but note the the first comparison matches `id` on `id` and `course_id` so the match would be wider than expected in some instances.

## Page Caching

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asatwal is it worth adding something like 'Some applications use rack middleware to cache pages. If your application does not use this you can disregard this section.' here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added something to that effect.

README.md Outdated
@@ -389,6 +389,34 @@ So the comparisons in Ruby would be:

The fields do match successfully, but note the the first comparison matches `id` on `id` and `course_id` so the match would be wider than expected in some instances.

## Page Caching

Any page visit in the App will result in a web request event being sent to Big Query. The event is automatically sent by the Controller after action callback `trigger_request_event`. However, cached pages that are served from rack middleware return early and therefore do not execute any actions in the controller. This means that any cached page visits handled by rack middleware do NOT result in a web request event being sent to Big Query.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asatwal BigQuery is all one word (no space). Same in line 415.

rack_page_cached:
description: |
A proc which will be called with the rack env, and which should
return boolean indicating whether the page is cached and will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asatwal 'return a boolean'?

Copy link
Contributor

@duncanjbrown duncanjbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, couple of minor qs inline!

@@ -79,6 +81,7 @@ def self.configure
config.queue ||= :default
config.user_identifier ||= proc { |user| user&.id }
config.anonymise_web_request_user_id ||= false
config.rack_page_cached ||= proc { |_rack_env| false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

README.md Outdated

```ruby
DfE::Analytics.config.rack_page_cached = proc do |rack_env|
def path; File.join(Rails.root, 'public/cached_pages'); end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have a method here, and not a plain expression?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to lazily set the path. But git website had a tantrum about this line, so I've removed the method in anycase :-)

@@ -11,6 +11,9 @@ class Railtie < Rails::Railtie

initializer 'dfe.analytics.insert_middleware' do |app|
app.config.middleware.use DfE::Analytics::Middleware::RequestIdentity

app.config.middleware.insert_before \
ActionDispatch::Static, DfE::Analytics::Middleware::SendCachedPageRequestEvent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do if ActionDispatch::Static isn't in the stack?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it will just at to the end of the stack, like use does. But Rails should always have this present. This is what our friend Chat GPT says:

In Rails, the ActionDispatch::Static middleware is included by default in the middleware stack for new applications. This middleware is responsible for serving static files from the public directory of your Rails application.

However, it's important to note that the exact middleware stack can be customized in the config/application.rb file of your Rails application. So, it's possible to remove or reorder the ActionDispatch::Static middleware in the stack if needed.

That being said, it's generally not recommended to remove the ActionDispatch::Static middleware, as it's an essential part of serving static assets in a Rails application.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💟 ChatGPT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK seems reasonable — so it would only fail to run if ever someone used something other than ActionDispatch::Static as the caching layer. Maybe worth one line in the docs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is causing issues when precompiling the assets, specifically after updating dfe-analytics in our app we're starting to see errors when trying to deploy.

https://github.com/DFE-Digital/apply-for-qualified-teacher-status/actions/runs/4609526294/jobs/8146775822

 > [builder 11/12] RUN RAILS_ENV=production     SECRET_KEY_BASE=required-to-run-but-not-used     GOVUK_NOTIFY_API_KEY=required-to-run-but-not-used     REDIS_URL=redis://required-to-run-but-not-used     bundle exec rails assets:precompile:
#18 3.549 rails aborted!
#18 3.549 No such middleware to insert before: ActionDispatch::Static
#18 3.549 /app/config/environment.rb:5:in `<main>'

It sounds like perhaps ActionDispatch::Static isn't added to the middleware stack when precompiling the assets - I'm not entirely clear on how the middleware stack interacts with the asset precompiling.

We'll revert this change for now in Apply for QTS, but I wonder if there's a way of only inserting the DfE::Analytics::Middleware::SendCachedPageRequestEvent middleware if the ActionDispatch::Static middleware exists?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm we had to revert the upgrade to 1.8.0 on Refer because of an error:

bundle exec rails assets:precompile
#19 3.196 rails aborted!
#19 3.196 No such middleware to insert before: ActionDispatch::Static

https://github.com/DFE-Digital/refer-serious-misconduct/actions/runs/4730348588/jobs/8394175837

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry chaps — fix on way!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end

def response
ActionDispatch::Response.new(200, 'Content-Type' => 'text/html')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be 304?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to log a response to BQ

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duncanjbrown - Well spotted. Indeed this should be 304. I checked the git website and it returns 304 when returning content from the cache.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this method?
We need a response as the method we call to send the event to BQ requires a response object. The reponse ends up in the BQ database.

@@ -0,0 +1,47 @@
# frozen_string_literal: true

RSpec.describe DfE::Analytics::Middleware::SendCachedPageRequestEvent do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more robust to (also/alternatively) test this through a request spec, and only stub the rack_page_cached?method?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a request spec now to test the caching by rack middleware by stubbing rack_page_cached?. I've had to hack the controller a bit to remove the callback just for this test, but it tests the concept and the middleware.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you!

@BroiSatse
Copy link
Contributor

OOI - could we not just always log web visits using middleware? It is relatively easy (IIRC) to know whether it was cached or not and if it was stopped by any other middleware, especially if we inject it high enough.

end

def response
ActionDispatch::Response.new(200, 'Content-Type' => 'text/html')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this method?

@duncanjbrown
Copy link
Contributor

OOI - could we not just always log web visits using middleware? It is relatively easy (IIRC) to know whether it was cached or not and if it was stopped by any other middleware, especially if we inject it high enough.

Interesting provocation. We get the namespace and user_id down in the controller, but I guess you could argue that those could be put into RequestLocals and logged on the way out? I guess this would mean checking the request both before and after app.call(env)

@asatwal
Copy link
Collaborator Author

asatwal commented Apr 3, 2023

OOI - could we not just always log web visits using middleware? It is relatively easy (IIRC) to know whether it was cached or not and if it was stopped by any other middleware, especially if we inject it high enough.

Thanks for thew comments @BroiSatse. Great to get your review on this PR also.

I had briefly thought about this, and given it a bit more though now that you've mentioned it:

  • If it's high up in the stack then some of the checks haven't been done. For example routing. So this may be a 404 (Or even an error for that matter, say after input param validation), in which case we may not want to log.

  • Caching is quite project specific, so it's not always possible to know if a page is cached easily. This PR uses standard Rails caching, but what about Rack Cache, or bespoke caching such as Redis / Memcache etc. In that case we couldn't make it very generic.

  • Currently all the projects use the controller callback action to trigger the web request event. To move this to middleware, would be a big change a quite disruptive. So this is the low risk option.

Having said that, it would be nice to have just a single method to send web request events :-).


it 'serves a cached page from rack middleware' do
# Indicate to rack middleware that this page is cached
allow(DfE::Analytics).to receive(:rack_page_cached?).and_return(true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool to get ActionDispatch::Static to do the right thing so we could test this more realistically but I guess we could add that later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do that we'd have to implement rails page caching in one of the pages in the dummy App. It's easy enough to do, but might be fiddly get it to work in rspec. I'll open a ticket in the DI team backlog to do this in the future.

Copy link
Contributor

@duncanjbrown duncanjbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@asatwal asatwal merged commit 9b03ead into main Apr 4, 2023
@asatwal asatwal deleted the send-request-events-for-cached-pages branch April 4, 2023 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants