Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Performance for monthly active users API endpoint #234

Merged
merged 2 commits into from
Jul 14, 2020

Conversation

johnbaldwin
Copy link
Contributor

@johnbaldwin johnbaldwin commented Jul 12, 2020

This PR is to improve the performance of Figures API to improve Figures UI page loading and prevent timeouts on retrieving monthly active users, which gets slow as we're currently pulling live data for the current month

Figures currently performs a live query on the edx-platform
courseware.models.StudentModule model, but the query is too slow when there are many StudentModule records. The rough order of magnitude (ROM) for "many" appears to be about 10^6.

There are two commits.

  • First commit updates the data model
  • Second commit updates the pipeline and metrics code

The data model update adds 'mau' field to 'SiteDailyMetrics' model

The pipeline code now populates 'SiteDailyMetrics.mau' field with the latest numbers for the month as of when the pipeline is run.

The slow query causes an API timeout, degrading performance for Figures UI.

This commit provides the storage for this metric so the following commit can update the pipeline code and the 3rd commit to update the API code

Removed the old MAU calculation code which is now dead code and removed the tests

@johnbaldwin johnbaldwin changed the title Added 'mau' field to 'SiteDailyMetrics' model Improve Performance for monthly active users API endpoint Jul 12, 2020
@codecov-commenter
Copy link

codecov-commenter commented Jul 12, 2020

Codecov Report

Merging #234 into master will increase coverage by 0.05%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #234      +/-   ##
==========================================
+ Coverage   91.30%   91.36%   +0.05%     
==========================================
  Files          39       40       +1     
  Lines        1956     1968      +12     
==========================================
+ Hits         1786     1798      +12     
  Misses        170      170              
Impacted Files Coverage Δ
figures/mau.py 100.00% <100.00%> (ø)
figures/metrics.py 87.74% <100.00%> (-0.24%) ⬇️
...s/migrations/0011_add_mau_to_site_daily_metrics.py 100.00% <100.00%> (ø)
figures/models.py 97.02% <100.00%> (+0.03%) ⬆️
figures/pipeline/site_daily_metrics.py 96.82% <100.00%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1565694...887f28e. Read the comment docs.

@johnbaldwin johnbaldwin force-pushed the john/update-sdm-active-users branch from 4eee541 to fdb04b4 Compare July 12, 2020 21:02
@johnbaldwin johnbaldwin marked this pull request as ready for review July 12, 2020 21:06
"""
month_for = datetime.datetime.utcnow()
site_sm = figures.sites.get_student_modules_for_site(site)
curr_sm = site_sm.filter(modified__year=month_for.year,
modified__month=month_for.month)
return curr_sm.values('student__id').distinct().count()
return curr_sm.values('student__id').distinct()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this so we're able to use this function to retrieve the distinct ids, then we can get the count from the querset for when we need the count

@johnbaldwin johnbaldwin force-pushed the john/update-sdm-active-users branch from fdb04b4 to 21d88df Compare July 13, 2020 00:11
@OmarIthawi

This comment has been minimized.

@johnbaldwin
Copy link
Contributor Author

johnbaldwin commented Jul 14, 2020

#234 (comment)

@OmarIthawi I didn't need to do the 3rd commit, which was going to update the viewset. Switching logic at figures.metrics took care of it, thanks to abstraction.

I'll revise the git commit message

@johnbaldwin
Copy link
Contributor Author

Thanks Omar!

The purpose of this is to improve API performance. Figures currently
performs a live query on the `edx-platform`
`courseware.models.StudentModule` model, but the query is too slow when
there are many `StudentModule` records. The rough order of magnitude
(ROM) for "many" appears to be about 10^6.

The slow query causes an API timeout, degrading performance for Figures
UI.

This commit provides the storage for this metric so the following commit
can update the pipeline code and metrics code
The purpose of this is to improve API performance. See the previous
commit with SHA 92ea8c4 for an
explanation.

This commit adds the pipeline code to populate 'SiteDailyMetrics.mau'
with the latest numbers for the month as of yesterday

We also clean up the old live MAU collection from StudentModule,
removing dead coe and the tests that call the dead code
@johnbaldwin johnbaldwin force-pushed the john/update-sdm-active-users branch from 21d88df to 887f28e Compare July 14, 2020 17:11
@johnbaldwin johnbaldwin merged commit a5e41d5 into master Jul 14, 2020
@johnbaldwin johnbaldwin deleted the john/update-sdm-active-users branch July 14, 2020 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants