
Tracing Memory Improvements with Sharrow #754

Open
jpn-- opened this issue Oct 10, 2023 · 1 comment
Labels
Feature (New feature or request), Performance (Changes that improve performance)

Comments

@jpn-- (Member) commented Oct 10, 2023

Is your feature request related to a problem? Please describe.
When running production-scale ActivitySim simulations with Sharrow turned on, tracing consumes a lot of memory, because it forces Sharrow to materialize very large intermediate arrays. For example, when computing utility values in a logit model, we compute $V = X \beta$. The array $X$ has a row for every observation and a column for every data element (i.e., every line in the SPEC file). When not tracing, the data in $X$ is assembled, consumed, and released dynamically by numba one row at a time, so the memory to store all of $X$ is never needed. But for tracing, we need to write a (usually small) subset of the rows of $X$ out to the trace file. Currently, Sharrow has no mechanism to save selected rows from the dynamically created values of $X$, so the only way to trace this data is to create all of the rows, which temporarily uses a massive amount of memory.
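A minimal sketch of the two memory patterns (not Sharrow's actual generated code; the inner loop stands in for the compiled SPEC expressions, and the function names are placeholders):

```python
import numpy as np
import numba


@numba.njit
def utilities_streaming(data, coeffs):
    # Non-tracing path: each row of X is assembled, consumed, and released
    # inside the loop, so the full n_obs x n_terms matrix X never exists.
    n_obs, n_terms = data.shape
    util = np.empty(n_obs)
    x_row = np.empty(n_terms)
    for i in range(n_obs):
        for j in range(n_terms):
            x_row[j] = data[i, j]  # stand-in for evaluating one SPEC expression
        util[i] = np.dot(x_row, coeffs)  # V[i] = X[i, :] @ beta
    return util


def utilities_with_full_trace(data, coeffs):
    # Tracing path today: the entire X matrix is materialized just so a
    # handful of traced rows can be written out; this is the memory spike.
    X = np.asarray(data, dtype=np.float64)  # every row of X in memory at once
    V = X @ coeffs
    return V, X  # X is only needed for the few traced households
```

For a rough sense of scale, at around one million rows and a few hundred SPEC columns in float64, the materialized $X$ in the second function is on the order of a few gigabytes, while the streaming version only ever holds a single row.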

@dhensle pointed out that tracing outside of a full-scale production run might not work when the effects of the full data are important (e.g. in shadow pricing).

Describe the solution you'd like
Sharrow needs additional capabilities to (a) receive instructions about what to trace, and (b) output an array of tracing values that can then be dumped into the tracing outputs.
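One possible shape for that capability, purely as a hypothetical sketch (the function and argument names are illustrative, not an existing Sharrow API), is to pass the sorted row indices of the traced households into the compiled kernel and have it copy out only those rows of $X$ while everything else still streams:

```python
import numpy as np
import numba


@numba.njit
def utilities_with_selective_trace(data, coeffs, trace_rows):
    # trace_rows: sorted integer array of row indices (the traced households).
    n_obs, n_terms = data.shape
    util = np.empty(n_obs)
    traced_x = np.empty((trace_rows.shape[0], n_terms))  # tiny: traced rows only
    x_row = np.empty(n_terms)
    for i in range(n_obs):
        for j in range(n_terms):
            x_row[j] = data[i, j]  # stand-in for evaluating one SPEC expression
        util[i] = np.dot(x_row, coeffs)
        # (b) capture only the rows requested by (a) the tracing instructions
        k = np.searchsorted(trace_rows, i)
        if k < trace_rows.shape[0] and trace_rows[k] == i:
            traced_x[k, :] = x_row
    return util, traced_x
```

The caller gets back the full utility array plus a small traced_x array that can be dumped straight into the trace outputs, without the full $X$ ever being allocated.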

Describe alternatives you've considered
An alternative would be to implement tracing in an all-or-none mode, and selectively re-run only a subset of households through model components. This would probably be fine in most cases, but as noted above may be undesirable if there are interactions that depend on simulating at scale.
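Expressed with the sketch functions from the first code block above (still illustrative, not ActivitySim's actual component interface), this alternative amounts to a full streaming run followed by a tiny traced re-run:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100_000, 200))   # stand-in for the assembled expression data
coeffs = rng.random(200)

# Full-scale pass with tracing off: X is never materialized.
util = utilities_streaming(data, coeffs)

# Re-run only the traced households through the materializing path:
# X here is just 3 x 200, so the memory cost is negligible.
trace_rows = np.array([12, 5_000, 98_765])  # illustrative traced row indices
_, traced_x = utilities_with_full_trace(data[trace_rows], coeffs)
```

The caveat noted above still applies: anything that depends on results computed at full scale (e.g., shadow pricing) would not be reproduced exactly by the small re-run.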

@jpn-- added the Feature (New feature or request) label Oct 10, 2023
@i-am-sijia (Contributor) commented

Adding more context ...

In the Phase 8 data type optimization work, we closely traced the memory usage of the example ARC model and reported at the 9/26/2023 project meeting that turning household tracing on with Sharrow created additional memory spikes (for the reason Jeff described above) as well as additional run time. Below are memory profiling charts showing the difference in memory requirements from turning household tracing on vs. off.

[Memory profiling charts: household tracing on vs. off]

@dhensle added this to Phase 10A Oct 1, 2024
@dhensle added the Performance (Changes that improve performance) label Oct 1, 2024
@jpn-- moved this to Punt in Phase 10A Nov 5, 2024