Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grab region data with less queries #2961

Merged
merged 44 commits into from
Oct 13, 2021
Merged

Conversation

rsanheim
Copy link
Contributor

@rsanheim rsanheim commented Oct 1, 2021

Story card: ch5448

We currently are doing a query per region against reporting_facility_states, which adds up quite a bit for pages like regions#details that renders a list of many facilities / blocks / etc.

These queries can be consolidated to simplify the data access and improve performance (current page load time can be over 2 seconds, which is too slow).

I'll drop some notes inline to help w/ review on this one.

Added an arch diagram in the PR:

image

@harimohanraj89 harimohanraj89 temporarily deployed to simple-review-pr-2961 October 1, 2021 19:20 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 4, 2021 16:18 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 4, 2021 16:51 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 4, 2021 19:26 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 4, 2021 20:41 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 5, 2021 15:39 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 5, 2021 18:18 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 5, 2021 18:20 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 6, 2021 17:39 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 6, 2021 19:29 Inactive
to allow a large data set in profiling and a smaller one for dev
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 6, 2021 23:18 Inactive
otherwise we get back 0 values when asking for a specific range, which
makes new facilities show incorrect empty values in reports
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 11, 2021 17:57 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 11, 2021 17:59 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 11, 2021 18:06 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 11, 2021 18:06 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 11, 2021 18:11 Inactive
Copy link
Contributor Author

@rsanheim rsanheim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes for reviewers

Comment on lines +46 to +48
if @regions.map(&:region_type).uniq.size != 1
raise ArgumentError, "RegionSummary must be called with regions of the same region_type"
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it simple here and enforcing that RegionSummary is only called with an array of one type of region.

The alternative is to do some sort of UNION or other more complex SQL to properly handle the different grouping fields required, which is not worth it given how fast these queries are and the fact that we normally only show 2 region types per reporting page.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds perfectly fine to me.

Comment on lines +35 to +37
Arel.sql(<<-SQL)
COALESCE(SUM(#{field}_under_care::int + #{field}_lost_to_follow_up::int), 0) as #{field}_under_care_with_lost_to_follow_up
SQL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to do this manually right now, though may be worth adding it to the mat view itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we sum in memory to keep it in application code?

Update: Oh I see we've just extracted it from elsewhere. Feel free to leave as is for now.

# `select` because the `sum` methods in ActiveRecord can't sum multiple fields.
# We also order by `month_date` because some code in the views expects elements to be ordered by Period from
# oldest to newest - it also makes reading output in specs and debugging much easier.
memoize def facility_state_data(region)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this logic is moved to RegionSummary.

@@ -13,7 +13,7 @@ development:

profiling:
<<: *default
database: simple-server_development
database: simple-server_profiling
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to have a separate database for performance testing in dev - so you can load a large data set into profiling and not have to worry about your normal dev work clearing it out.

@@ -0,0 +1,18 @@
module LogFriend
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments above - a small little dev / test helper

EOL
# Calls RegionSummary for each region_type in our collection of regions -- this is necessary because
# RegionSummary queries for only type of region at a time.
memoize def region_summaries
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now grab all the data for all regions with a query per region_type -- so the most queries a Repository call would require is four if we request state, district, block, and facility level data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet!

@@ -224,10 +224,11 @@ def refresh_views
uncontrolled = repo.uncontrolled
expect(controlled[facility_1.slug][jan]).to eq(2)
expect(controlled[facility_2.slug][jan]).to eq(1)
expect(controlled[facility_3.slug][jan]).to eq(0)
empty_value = v2_flag ? nil : 0
Copy link
Contributor Author

@rsanheim rsanheim Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an edge case for a region with no patients -- Its probably more correct to return the nil value that the new code will do going forward.

Copy link
Contributor

@harimohanraj89 harimohanraj89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got a few more moving pieces with this PR now, and I wanna make sure we all have a good understanding of how reports are architected. Could you generate some sort of diagram/documentation of how the summary, caching, schemas, repository, all work together?

@@ -0,0 +1,33 @@
module Reports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just an extraction of some logic from the "schema"s? Nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, exactly 👍

Comment on lines +35 to +37
Arel.sql(<<-SQL)
COALESCE(SUM(#{field}_under_care::int + #{field}_lost_to_follow_up::int), 0) as #{field}_under_care_with_lost_to_follow_up
SQL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we sum in memory to keep it in application code?

Update: Oh I see we've just extracted it from elsewhere. Feel free to leave as is for now.

Comment on lines +46 to +48
if @regions.map(&:region_type).uniq.size != 1
raise ArgumentError, "RegionSummary must be called with regions of the same region_type"
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds perfectly fine to me.

@@ -67,27 +51,35 @@ def initialize(regions, periods:)
# Adjusted patient counts are the patient counts from three months ago (the adjusted period) that
# are the basis for control rates. These counts DO NOT include lost to follow up.
memoize def adjusted_patients_without_ltfu
regions.each_with_object({}) { |region, result| result[region.slug] = sum(region, :adjusted_patients_under_care) }
values_at("adjusted_patients_under_care")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely think of summing a nested key across objects as a first-class idea when dealing with these reports. Nice to see it in action here 👌

@@ -0,0 +1,18 @@
module LogFriend
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha the number of times I've had to construct this log message in the past, this makes a lot of sense 😂

spec/models/reports/schema_v2_spec.rb Outdated Show resolved Hide resolved
# Shim a noop method for non dev / test environments
def d(msg)
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👾

Copy link
Contributor

@ssrihari ssrihari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great set of changes! 🙌

Is there a before/after comparison of the performance? The numbers would be very useful to look at to make informed decisions in the future when making design/performance tradeoffs.

@@ -0,0 +1,33 @@
module Reports
module RegionCaching
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At what point in time do you think we can do away with this caching? Or not use it for most of the queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caching is primarily for the calculated rates now, so that we don't need to do region_count * (periods * fields) number of calculations on each page load. They are fast calculations, but can also add up when there are so many on details pages.

end

memoize def uncontrolled
values_at("adjusted_uncontrolled_under_care")
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool to see the values_at simplicity here. I'm guessing we're still sticking with methods because of V1? When we (eventually) remove V1, these could all go into a hash ya?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup 👍

EOL
# Calls RegionSummary for each region_type in our collection of regions -- this is necessary because
# RegionSummary queries for only type of region at a time.
memoize def region_summaries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet!

@@ -0,0 +1,18 @@
module LogFriend
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rubymine debugger ftw! :)

@@ -0,0 +1,103 @@
require "rails_helper"

RSpec.describe Reports::RegionSummary, {type: :model, reporting_spec: true} do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! I wasn't too sure about having a separate entity that represents a "sum"mary :p, but having a different place for it's specs convinces me :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup the grouping and especially the summing LTFU + undercare totals definitely warrants it.

@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 12, 2021 16:37 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 12, 2021 19:40 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 13, 2021 06:51 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 13, 2021 06:52 Inactive
@rsanheim rsanheim temporarily deployed to simple-review-pr-2961 October 13, 2021 06:56 Inactive
@rsanheim
Copy link
Contributor Author

Added a little architecture diagram in this PR, pasting it here for visibility:

image

@rsanheim rsanheim dismissed harimohanraj89’s stale review October 13, 2021 06:57

added an architecture diagram

@@ -0,0 +1,41 @@
Repository architecture
```plantuml
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done via plantuml, which has lots of IDE addons and export tools if you wanna play with it.

@rsanheim rsanheim merged commit faa040e into master Oct 13, 2021
@rsanheim rsanheim deleted the grab-all-regions-data-one-query branch October 13, 2021 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants