Reimplement the NumArray class based on an STL template. #1899

Closed · 8 of 20 tasks · JohnHalleyGotway opened this issue Aug 31, 2021 · 9 comments · Fixed by #1944

Labels: component: code optimization, MET: Aggregation Tools, priority: medium, reporting: DTC NCAR Base, reporting: DTC NOAA R2O, requestor: Community, type: enhancement
Milestone: MET 10.1.0

JohnHalleyGotway (Collaborator) commented Aug 31, 2021

Describe the Enhancement

This issue arose during development for MET #1875. For that issue, we found and fixed a bug that caused Stat-Analysis to consume far too much memory. However, testing for that issue revealed an opportunity for additional optimization of memory usage, as described in this comment.

When running jobs over many, many combinations of station id, variable, level, and lead time via the -by option (over 30,000 jobs!), the Stat-Analysis tool consumes a great deal of memory. For each case, Stat-Analysis creates several NumArray objects to store arrays of values (i.e. forecast, observation, climo mean, climo stdev, and so on). However, in MET version 10.0.0 the default allocation increment for NumArray is 1000, so any array with length less than 1000 actually consumes the same amount of memory as an array of length 1000.
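To illustrate the cost, here is a hypothetical simplification (a sketch, not the actual MET source) of increment-based growth: every request is rounded up to the next multiple of the allocation increment, so even a 5-element array ends up backed by storage for 1000 doubles.

    #include <cstring>

    // Hypothetical sketch of increment-based growth, for illustration only.
    static const int num_array_alloc_inc = 1000;

    void extend(double *&e, int &n_alloc, int n_needed) {
       if (n_needed <= n_alloc) return;            // current buffer suffices
       int k = n_needed / num_array_alloc_inc;     // round the request up to the
       if (n_needed % num_array_alloc_inc) k++;    // next multiple of the increment,
       int n_new = k * num_array_alloc_inc;        // e.g. 5 -> 1000, 1001 -> 2000
       double *u = new double[n_new];
       if (e) { std::memcpy(u, e, n_alloc * sizeof(double)); delete [] e; }
       e = u;
       n_alloc = n_new;
    }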

For the test described in that comment, each case processes fewer than 100 time points. Changing the default NumArray allocation increment from 1000 down to 100 led to approximately a 75% reduction in memory usage for these two tests.

While running tens of thousands of jobs in Stat-Analysis is not done routinely, it is a good stress test and reveals an opportunity for optimization. Recommend that we reimplement the NumArray class using an STL template (such as std::array or std::vector) rather than managing chunks of memory ourselves. Hopefully that will slim down our memory usage.
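A minimal sketch of that direction, assuming a std::vector<double> member named e (the name used in the eventual pull request); the method names are illustrative, not the final interface:

    #include <vector>

    // Sketch: NumArray backed by std::vector<double>, so the STL manages
    // capacity and each array consumes only roughly what it actually holds.
    class NumArray {
       private:
          std::vector<double> e;                     // replaces double *e

       public:
          void add(double v)             { e.push_back(v); }
          int n_elements() const         { return (int) e.size(); }
          double operator[](int i) const { return e[i]; }
          void clear()                   { e.clear(); }
    };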

When these changes are available on a feature branch for testing, recommend that we:

  • Configure DockerHub to build that feature branch.
  • Coordinate with @lindsayrblank to rerun tests on that feature branch.

Time Estimate

3 days.

Sub-Issues

Consider breaking the enhancement down into sub-issues.
No sub-issues needed.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Split 2793541 and 2702691

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required: John HG
  • Select scientist(s) or no scientist required: No scientist needed but coordinate with @lindsayrblank for testing.

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
JohnHalleyGotway added the type: enhancement, priority: medium, requestor: Community, alert: NEED ACCOUNT KEY, alert: NEED CYCLE ASSIGNMENT, component: code optimization, and MET: Aggregation Tools labels Aug 31, 2021
JohnHalleyGotway added this to the MET 10.1.0 milestone Aug 31, 2021
JohnHalleyGotway self-assigned this Aug 31, 2021
JohnHalleyGotway removed the alert: NEED CYCLE ASSIGNMENT label Aug 31, 2021
sethlinden pushed a commit that referenced this issue Sep 22, 2021
…r<double> e. Modifying functions accordingly. SL
TaraJensen added the reporting: DTC NCAR Base and reporting: DTC NOAA R2O labels and removed the alert: NEED ACCOUNT KEY label Sep 23, 2021
sethlinden pushed a commit that referenced this issue Sep 24, 2021
…tions instead of arrays. In progress. SL
sethlinden pushed a commit that referenced this issue Sep 24, 2021
JohnHalleyGotway added a commit that referenced this issue Sep 27, 2021
…Array::clear() instead of vector::clear().
sethlinden pushed a commit that referenced this issue Oct 4, 2021
sethlinden pushed a commit that referenced this issue Oct 4, 2021
sethlinden pushed a commit that referenced this issue Oct 5, 2021
sethlinden pushed a commit that referenced this issue Oct 6, 2021
…vector for storage (instead of array). In progress. SL
sethlinden pushed a commit that referenced this issue Oct 6, 2021
JohnHalleyGotway added a commit that referenced this issue Oct 7, 2021
…tc_poly_array_alloc_inc instead of num_array_alloc_inc in tc_poly.cc.
sethlinden pushed a commit that referenced this issue Oct 7, 2021
JohnHalleyGotway added a commit that referenced this issue Oct 7, 2021
…fter switching to STL::vector, it'll ALWAYS be an exact allocation. Previously, we always rounded up to the next allocation increment, but there's no allocation increment anymore. Updating the other code to remove calls to exact.
sethlinden pushed a commit that referenced this issue Oct 7, 2021
sethlinden (Contributor) commented

Ran a set of stress tests to compare the feature-branch version of stat_analysis, which uses the new num_array class (backed by vectors), against the development-branch version, which uses the original num_array class (backed by arrays). Below I show the command line and the results of running it with "/usr/bin/time -v", which reports memory usage and runtime.

sethlinden (Contributor) commented

Here is the command line I used for both versions of stat_analysis:

/usr/bin/time -v ./stat_analysis -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -lookin fake_test_data -job aggregate_stat -line_type MPR -out_line_type CNT -by OBS_SID,FCST_VAR,FCST_LEV -out MPR_to_CNT_by_station.out

sethlinden (Contributor) commented Oct 11, 2021

Here is the tar-ball that contains the input files:

fake_test_data.tar.gz

sethlinden (Contributor) commented Oct 11, 2021

Looking at the "time -v" output from the two large stress tests: overall, the new version (from this feature branch) uses quite a bit less memory but takes slightly longer to run, while the older version uses more memory and runs slightly faster.

Note that we look at the maximum resident set size: "A process's resident set size is the amount of memory that belongs to it and is currently present (resident) in RAM (real RAM, not swapped or otherwise not-resident)."

Here are the results from the feature-branch version (new num_array class):

    User time (seconds): 2803.46
    System time (seconds): 17.20
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 47:01.24
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 760896
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 186849
    Voluntary context switches: 75
    Involuntary context switches: 2977
    Swaps: 0
    File system inputs: 0
    File system outputs: 2351864
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Here are the results from the development-branch version (original num_array class):

    User time (seconds): 2661.18
    System time (seconds): 18.28
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 44:39.88
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3276800
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 815905
    Voluntary context switches: 12
    Involuntary context switches: 2442
    Swaps: 0
    File system inputs: 0
    File system outputs: 2351888
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

sethlinden (Contributor) commented

Tried some vector techniques within the num_array class to try to speed up stat_analysis on the big stress test. Within num_array.cc, modified the clear() function to call e.reserve(80). Experimented with several reserve values and ran several smaller stress tests along with some iterations of the large stress test.
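A sketch of that experimental change, assuming the vector member is named e (per the commit history on this issue); 80 was one of several reserve values tried:

    // Experimental NumArray::clear(): drop the contents, then pre-reserve
    // capacity so early push_back calls avoid reallocation.
    void NumArray::clear() {
       e.clear();        // size -> 0; clear() leaves any existing capacity alone
       e.reserve(80);    // reserve only ever grows capacity, never shrinks it
    }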

From Slack: "Initial results from the big stress test (with 80 "-lookin fake_test_data") do not show an improvement. I used a reserve value of 80. It took a bit more time to run and used more memory compared to the baseline (from yesterday, without reserve):"

    User time (seconds): 2819.30
    System time (seconds): 15.80
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 47:15.67
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 967416
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 238427
    Voluntary context switches: 23
    Involuntary context switches: 3016

sethlinden (Contributor) commented

Also tried running with a much larger reserve value of 880.

From Slack: "Here are the results of the large stress test using 880 for the reserve value:"

    User time (seconds): 2748.81
    System time (seconds): 17.02
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 46:06.35
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3486980
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 868346
    Voluntary context switches: 19
    Involuntary context switches: 17044

sethlinden (Contributor) commented

Today, tried one more test: using the swap function in erase(). It actually used slightly less memory but took longer to run, so it is not worth it.
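For reference, this is the standard C++ swap idiom for releasing a vector's capacity (a sketch of the technique, not necessarily the exact change tried); the results of that run follow:

    // Swap trick: swapping with an empty temporary releases the vector's
    // allocated capacity, which clear() alone does not do.
    void NumArray::erase() {
       std::vector<double>().swap(e);   // e is left empty with zero capacity
    }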

    User time (seconds): 2928.14
    System time (seconds): 14.29
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 49:03.63
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 760628
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 11
    Minor (reclaiming a frame) page faults: 186850
    Voluntary context switches: 275
    Involuntary context switches: 3151

sethlinden (Contributor) commented

So we will leave the num_array class as is. Relative to the development branch, it uses roughly 23% of the memory (a maximum resident set size of 760896 versus 3276800 kbytes, about a 77% reduction) but takes about 5% longer to run, which was deemed acceptable.

sethlinden mentioned this issue Oct 14, 2021
sethlinden (Contributor) commented

Forgot to mention that, initially, I ran a regression test to make sure the output is exactly the same between this feature branch and the development branch:

mkdir -p /d1/projects/MET/MET_pull_requests/met-10.1.0/met-10.1.0_beta4/feature_1899
cd /d1/projects/MET/MET_pull_requests/met-10.1.0/met-10.1.0_beta4/feature_1899
/d1/projects/MET/MET_pull_requests/MET/scripts/regression/test_regression.sh feature_1899_num_array develop >& test_regression.log&

JohnHalleyGotway removed their assignment Oct 14, 2021
sethlinden linked a pull request Oct 14, 2021 that will close this issue
sethlinden added a commit that referenced this issue Oct 15, 2021
* Per issue #1899, changed the main storage array (double * e) to vector<double> e. Modifying functions accordingly. SL

* Per issue #1899: removed commented out code. Still in progress. SL

* Per issue #1899: modified more functions to use vector or vector functions instead of arrays. In progress. SL

* Per issue #1899: updated add() functions to use push_back to add values to vector. SL

* Per #1899, correct logic in NumArray::init_from_scratch() to call NumArray::clear() instead of vector::clear().

* Per issue #1899: added some temporary print statements for debugging. SL

* Per issue #1899: added some more temporary print statements for debugging. In progress. SL

* Per issue #1899: added some temporary print statements for debugging. In progress. SL

* Per issue #1899: replaced Nelements with n_elements(). SL

* Per issue #1899: modified the extend function some more for using vectors. In progress. SL

* Per issue #1899: continued to modify some functions for using a base vector for storage (instead of array). In progress. SL

* Per issue #1899: removed / added a few temporary print statements for debugging. SL

* Per #1899, this change seems to clearly be a bug. We should be using tc_poly_array_alloc_inc instead of num_array_alloc_inc in tc_poly.cc.

* Per issue #1899: commented out some print statements for now. For running make test. SL

* Per #1899, updating NumArray class to use STL::vector.

* Per #1899, I removed the exact option from NumArray::extend() since, after switching to STL::vector, it'll ALWAYS be an exact allocation. Previously, we always rounded up to the next allocation increment, but there's no allocation increment anymore. Updating the other code to remove calls to exact.

* Per #1899, removing debug cout statements.

* Per #1899, cleaning up one leftover debug cout line.

* Per issue #1899: updated the erase() function to use vector.clear() and vector.reserve(). SL

* Per issue: #1899, cleaned up clear() and erase() functions. SL

Co-authored-by: Seth Linden <linden@kiowa.rap.ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@ucar.edu>