-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grab region data with less queries #2961
Conversation
some other cleanup and simplification
to allow a large data set in profiling and a smaller one for dev
otherwise we get back 0 values when asking for a specific range, which makes new facilities show incorrect empty values in reports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes for reviewers
if @regions.map(&:region_type).uniq.size != 1 | ||
raise ArgumentError, "RegionSummary must be called with regions of the same region_type" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping it simple here and enforcing that RegionSummary is only called with an array of one type of region.
The alternative is to do some sort of UNION or other more complex SQL to properly handle the different grouping fields required, which is not worth it given how fast these queries are and the fact that we normally only show 2 region types per reporting page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds perfectly fine to me.
Arel.sql(<<-SQL) | ||
COALESCE(SUM(#{field}_under_care::int + #{field}_lost_to_follow_up::int), 0) as #{field}_under_care_with_lost_to_follow_up | ||
SQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to do this manually right now, though may be worth adding it to the mat view itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we sum in memory to keep it in application code?
Update: Oh I see we've just extracted it from elsewhere. Feel free to leave as is for now.
# `select` because the `sum` methods in ActiveRecord can't sum multiple fields. | ||
# We also order by `month_date` because some code in the views expects elements to be ordered by Period from | ||
# oldest to newest - it also makes reading output in specs and debugging much easier. | ||
memoize def facility_state_data(region) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this logic is moved to RegionSummary.
@@ -13,7 +13,7 @@ development: | |||
|
|||
profiling: | |||
<<: *default | |||
database: simple-server_development | |||
database: simple-server_profiling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to have a separate database for performance testing in dev - so you can load a large data set into profiling
and not have to worry about your normal dev work clearing it out.
@@ -0,0 +1,18 @@ | |||
module LogFriend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comments above - a small little dev / test helper
EOL | ||
# Calls RegionSummary for each region_type in our collection of regions -- this is necessary because | ||
# RegionSummary queries for only type of region at a time. | ||
memoize def region_summaries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now grab all the data for all regions with a query per region_type -- so the most queries a Repository call would require is four if we request state, district, block, and facility level data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet!
@@ -224,10 +224,11 @@ def refresh_views | |||
uncontrolled = repo.uncontrolled | |||
expect(controlled[facility_1.slug][jan]).to eq(2) | |||
expect(controlled[facility_2.slug][jan]).to eq(1) | |||
expect(controlled[facility_3.slug][jan]).to eq(0) | |||
empty_value = v2_flag ? nil : 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an edge case for a region with no patients -- Its probably more correct to return the nil
value that the new code will do going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've got a few more moving pieces with this PR now, and I wanna make sure we all have a good understanding of how reports are architected. Could you generate some sort of diagram/documentation of how the summary, caching, schemas, repository, all work together?
@@ -0,0 +1,33 @@ | |||
module Reports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just an extraction of some logic from the "schema"s? Nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, exactly 👍
Arel.sql(<<-SQL) | ||
COALESCE(SUM(#{field}_under_care::int + #{field}_lost_to_follow_up::int), 0) as #{field}_under_care_with_lost_to_follow_up | ||
SQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we sum in memory to keep it in application code?
Update: Oh I see we've just extracted it from elsewhere. Feel free to leave as is for now.
if @regions.map(&:region_type).uniq.size != 1 | ||
raise ArgumentError, "RegionSummary must be called with regions of the same region_type" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds perfectly fine to me.
@@ -67,27 +51,35 @@ def initialize(regions, periods:) | |||
# Adjusted patient counts are the patient counts from three months ago (the adjusted period) that | |||
# are the basis for control rates. These counts DO NOT include lost to follow up. | |||
memoize def adjusted_patients_without_ltfu | |||
regions.each_with_object({}) { |region, result| result[region.slug] = sum(region, :adjusted_patients_under_care) } | |||
values_at("adjusted_patients_under_care") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely think of summing a nested key across objects as a first-class idea when dealing with these reports. Nice to see it in action here 👌
@@ -0,0 +1,18 @@ | |||
module LogFriend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha the number of times I've had to construct this log message in the past, this makes a lot of sense 😂
# Shim a noop method for non dev / test environments | ||
def d(msg) | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👾
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great set of changes! 🙌
Is there a before/after comparison of the performance? The numbers would be very useful to look at to make informed decisions in the future when making design/performance tradeoffs.
@@ -0,0 +1,33 @@ | |||
module Reports | |||
module RegionCaching |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At what point in time do you think we can do away with this caching? Or not use it for most of the queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caching is primarily for the calculated rates now, so that we don't need to do region_count * (periods * fields)
number of calculations on each page load. They are fast calculations, but can also add up when there are so many on details pages.
end | ||
|
||
memoize def uncontrolled | ||
values_at("adjusted_uncontrolled_under_care") | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool to see the values_at
simplicity here. I'm guessing we're still sticking with methods because of V1? When we (eventually) remove V1, these could all go into a hash ya?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup 👍
EOL | ||
# Calls RegionSummary for each region_type in our collection of regions -- this is necessary because | ||
# RegionSummary queries for only type of region at a time. | ||
memoize def region_summaries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet!
@@ -0,0 +1,18 @@ | |||
module LogFriend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rubymine debugger ftw! :)
@@ -0,0 +1,103 @@ | |||
require "rails_helper" | |||
|
|||
RSpec.describe Reports::RegionSummary, {type: :model, reporting_spec: true} do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay! I wasn't too sure about having a separate entity that represents a "sum"mary :p, but having a different place for it's specs convinces me :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup the grouping and especially the summing LTFU + undercare totals definitely warrants it.
added an architecture diagram
@@ -0,0 +1,41 @@ | |||
Repository architecture | |||
```plantuml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done via plantuml, which has lots of IDE addons and export tools if you wanna play with it.
Story card: ch5448
We currently are doing a query per region against
reporting_facility_states
, which adds up quite a bit for pages likeregions#details
that renders a list of many facilities / blocks / etc.These queries can be consolidated to simplify the data access and improve performance (current page load time can be over 2 seconds, which is too slow).
I'll drop some notes inline to help w/ review on this one.
Added an arch diagram in the PR: