Reimplement the NumArray class based on an STL template. #1899
Comments
Ran a set of stress tests to compare the feature-branch version of stat_analysis, which uses the new num_array class (backed by vectors), against the development-branch version, which uses the original num_array class (backed by arrays). Below are the command line and the results of running it under "/usr/bin/time -v", which reports memory usage and run time.
Here is the command line I used for both versions of stat_analysis, with "-lookin fake_test_data" repeated 80 times: /usr/bin/time -v ./stat_analysis -lookin fake_test_data [... repeated 80 times ...] -job aggregate_stat -line_type MPR -out_line_type CNT -by OBS_SID,FCST_VAR,FCST_LEV -out MPR_to_CNT_by_station.out
Here is the tar-ball that contains the input files:
Looking at the "time -v" output from the two large stress tests, the new version (from this feature branch) uses quite a bit less memory but takes slightly longer to run; conversely, the older version uses more memory but runs slightly faster. Note that we compare the maximum resident set size: "A process's resident set size is the amount of memory that belongs to it and is currently present (resident) in RAM (real RAM, not swapped or otherwise not-resident)." Here are the results from the feature-branch version (uses the new num_array class):
Here are the results from the development-branch version (uses the original num_array class):
Tried some vector techniques within the num_array class to try to speed up Stat-Analysis on the big stress test. Within num_array.cc, I modified the clear() function to call e.reserve(80) (see the sketch below). I tried several reserve values and ran several smaller stress tests along with some iterations of the large stress test. From Slack: "Initial results from the big stress (with 80 "-lookin fake_test_data") does not show an improvement. I used a reserve value of 80. It took a bit more time to run and used more memory compared to the baseline (from yesterday without reserve):"
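For context, here is a minimal sketch of what that experiment might look like; the vector member name e comes from the commit messages below, everything else is assumed rather than taken from the MET source:

```cpp
// Hypothetical sketch of the reserve() experiment, not the actual MET source.
#include <vector>

class NumArray {
   public:
      void clear() {
         e.clear();       // drop the stored values
         e.reserve(80);   // pre-allocate room for 80 values to reduce reallocations
      }
   private:
      std::vector<double> e;   // assumed storage member, per the commit messages
};
```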
Also tried running with a much bigger reserve value of 880. From Slack: "Here are the results of the large stress test using 880 for the reserve value:"
Today I tried one more test: using the vector swap trick in erase() (a sketch is below). It used slightly less memory but took longer to run, so it is not worth it.
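A minimal sketch of that swap experiment, again assuming the values live in a std::vector<double> member named e:

```cpp
// Hypothetical sketch of the swap experiment in erase(), not the actual MET source.
#include <vector>

class NumArray {
   public:
      void erase() {
         // Swapping with an empty temporary releases the vector's capacity,
         // unlike clear(), which typically keeps the allocation.
         std::vector<double>().swap(e);
      }
   private:
      std::vector<double> e;   // assumed storage member
};
```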
So we will leave the num_array class as is. It uses 23% less memory but takes 5% longer to run, which we deemed acceptable.
Forgot to mention that initially I ran a regression test to make sure the output is exactly the same between this feature branch and the development branch: mkdir -p /d1/projects/MET/MET_pull_requests/met-10.1.0/met-10.1.0_beta4/feature_1899
* Per issue #1899, changed the main storage array (double * e) to vector<double> e. Modifying functions accordingly. SL
* Per issue #1899: removed commented out code. Still in progress. SL
* Per issue #1899: modified more functions to use vector or vector functions instead of arrays. In progress. SL
* Per issue #1899: updated add() functions to use push_back to add values to vector. SL
* Per #1899, correct logic in NumArray::init_from_scratch() to call NumArray::clear() instead of vector::clear().
* Per issue #1899: added some temporary print statements for debugging. SL
* Per issue #1899: added some more temporary print statements for debugging. In progress. SL
* Per issue #1899: added some temporary print statements for debugging. In progress. SL
* Per issue #1899: replaced Nelements with n_elements(). SL
* Per issue #1899: modified extend function some more for using vectors. In progress. SL
* Per issue #1899: continued to modify some functions for using a base vector for storage (instead of array). In progress. SL
* Per issue #1899: removed / added a few temporary print statements for debugging. SL
* Per #1899, this change seems to clearly be a bug. We should be using tc_poly_array_alloc_inc instead of num_array_alloc_inc in tc_poly.cc.
* Per issue #1899: commented out some print statements for now. For running make test. SL
* Per #1899, updating NumArray class to use STL::vector.
* Per #1899, I removed the exact option from NumArray::extend() since after switching to STL::vector, it'll ALWAYS be an exact allocation. Previously, we always rounded up to the next allocation increment, but there's no allocation increment anymore. Updating the other code to remove calls to exact.
* Per #1899, removing debug cout statements.
* Per #1899, cleaning up one leftover debug cout line.
* Per issue #1899: updated the erase() function to use vector.clear() and vector.reserve(). SL
* Per issue #1899: cleaned up clear() and erase() functions. SL

Co-authored-by: Seth Linden <linden@kiowa.rap.ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@ucar.edu>
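To summarize the direction of these commits, here is a minimal sketch of the kind of conversion they describe; the member and method names follow the commit messages, but the bodies are assumptions rather than the actual MET source:

```cpp
// Sketch only: vector-backed storage replacing the raw double* array.
#include <vector>

class NumArray {
   public:
      void add(double v)   { e.push_back(v); }   // append; the vector grows as needed
      void extend(int len) { e.reserve(len); }   // request capacity directly; no "exact"
                                                 // flag and no allocation-increment rounding
      int  n_elements() const { return (int) e.size(); }
   private:
      std::vector<double> e;                     // replaces the double * e storage array
};
```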
Describe the Enhancement
This issue arose during development for MET #1875. For that issue, we found and fixed a bug that caused Stat-Analysis to consume far too much memory. However, testing for that issue revealed an opportunity for additional optimization of memory usage, as described in this comment.
When running jobs over many combinations of station id, variable, level, and lead time specified with the -by option (over 30,000 jobs!), the Stat-Analysis tool consumes a great deal of memory. For each case, Stat-Analysis stores several NumArray objects holding arrays of values (e.g. fcst, obs, climo mean, stdev, and so on). However, in MET version 10.0.0 the default allocation increment for NumArray is 1000, so any array with fewer than 1000 elements consumes the same amount of memory as an array of length 1000.
For the test described in that comment, each case processes fewer than 100 time points. Changing the default NumArray allocation increment from 1000 down to 100 led to approximately a 75% reduction in memory usage for these two tests. The arithmetic is sketched below.
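A small illustration of that allocation-increment arithmetic (assumed, not the actual MET code): a requested length is rounded up to the next multiple of the increment, so short arrays still pay for a full increment.

```cpp
// Illustrative only: round a requested length up to the allocation increment.
#include <cstdio>

int alloc_size(int n, int inc) {
   return ((n + inc - 1) / inc) * inc;   // next multiple of inc that holds n values
}

int main() {
   std::printf("%d\n", alloc_size(100, 1000));   // 1000 slots allocated for 100 values
   std::printf("%d\n", alloc_size(100,  100));   // 100 slots for the same values with increment 100
   return 0;
}
```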
While running tens of thousands of jobs in Stat-Analysis is not done routinely, it is a good stress test and reveals an opportunity for optimization. Recommend that we reimplement the NumArray class using an STL template (like array or vector) rather than managing chunks of memory ourselves. Hopefully that'll slim down our memory usage.
When these changes are available on a feature branch for testing, recommend that we:
Time Estimate
3 days.
Sub-Issues
Consider breaking the enhancement down into sub-issues.
No sub-issues needed.
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
Split 2793541 and 2702691
Define the Metadata
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
No impacts.
Enhancement Checklist
See the METplus Workflow for details.
Branch name:
feature_<Issue Number>_<Description>
Pull request:
feature <Issue Number> <Description>
Select: Reviewer(s) and Linked issues
Select: Repository level development cycle Project for the next official release
Select: Milestone as the next official version