review sandag one zone vs two zone setup performance #431
Comments
I'll take the lead for this and coordinate with @jpn--. Thanks @esanchez01 for updating the examples. Can you also share the logfiles from your runs? |
@bstabler I sent over the location of the 1 & 2 zone logfiles through email. |
@jpn-- and I did some initial review and found the following:

1 - The expression evaluation for tour_mode_choice is slower for the TWO zone setup than for the ONE zone setup.
ONE ZONE
16/06/2021 23:47:30 - INFO - activitysim.core.mem - trace_memory_info tour_mode_choice.work.simple_simulate.chunk_1.eval_nl.eval_utils.add.expression_values rss: 62.41GB used: 122.29 GB percent: 38.2%
TWO ZONE
16/06/2021 15:22:09 - INFO - activitysim.core.mem - trace_memory_info tour_mode_choice.work.simple_simulate.chunk_1.eval_nl.eval_utils.add.expression_values rss: 62.72GB used: 137.57 GB percent: 43.0%

2 - The expression files are very similar.
ONE ZONE TWO ZONE

3 - Some more specific logging within a process shows roughly the same number of tours being processed in the chunk and similar tour mode shares, but longer runtimes.
ONE ZONE
Line 2703 in mp_households_0-activitysim.log
The chunk runtime is 6 seconds for 7029 tours and the mode share is reported DRIVEALONEFREE 2044
TWO ZONE
Line 2678 in mp_households_0-activitysim.log
The chunk runtime is 166 sec for 7087 tours and the mode share is basically the same DRIVEALONEFREE 2028

4 - So it seems we need to debug/profile a bit to figure out exactly why. The network_los engine/setup, and potentially the MAZ to MAZ costs, are what's different, so it's likely something in there. Without actually profiling, we're wondering about the below note in the quick_loc_df() called by get_mazpairs()....
|
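To put the two chunk timings from the log excerpts on a common footing, here is a minimal sketch that converts them to time per tour. The numbers are copied from the comment above; the function and variable names are made up for illustration and are not part of ActivitySim.

```python
# Chunk timings quoted from the two mp_households_0-activitysim.log excerpts.
chunks = {
    "one_zone": {"seconds": 6.0, "tours": 7029},
    "two_zone": {"seconds": 166.0, "tours": 7087},
}


def ms_per_tour(seconds, tours):
    """Average milliseconds spent per tour in one chunk."""
    return 1000.0 * seconds / tours


per_tour = {name: ms_per_tour(**c) for name, c in chunks.items()}
slowdown = per_tour["two_zone"] / per_tour["one_zone"]

for name, ms in per_tour.items():
    print(f"{name}: {ms:.2f} ms per tour")
print(f"two zone / one zone: {slowdown:.1f}x")
```

With nearly identical tour counts, the two zone chunk comes out roughly 27 times slower per tour, which is why the per-row lookup cost is the first suspect.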
We're going to investigate some more to confirm the hypothesis and to hopefully develop an improvement. We'll look at the DaySim and CTRAMP code for ideas as well. Testing with the full scale setup will be important. This would benefit both 2 and 3 zone setups. |
@esanchez01- I'm in the process of testing this improvement on a full scale version of your model and hoping you can test it as well? |
@bstabler, I manually made the change you mentioned to v0.9.9.1, as this is the version my previous tests ran on (no chunking update), and reran the full SANDAG 2-Zone PSRC-based example. There are some noticeable run time improvements when compared to the equivalent 2-Zone run I shared the log files for:
I will rerun the 3-Zone example with this change as well. |
I like these improvements. Tour mode choice is still quite a bit slower than the 1 zone example, which, if I recall, was a couple minutes, so I'll keep investigating. @esanchez01 - can you post the 1 zone example runtimes by submodel for reference? |
Here are the results of the 3-Zone rerun:
|
Complete run time list for an optimized 1-Zone run:
|
It's still on my list to further investigate the performance difference for trip mode choice after the improvement: 1.2 min for 1 zone versus 15.7 min for 2 zone. |
I ran the sandag 2 zone setup on my server with the maz-maz cost lookup improvement above + the chunking improvement on #444 and trip_mode_choice ran in 10 minutes. I then turned off the od_skims['DISTWALK'] and od_skims['DISTBIKE'] expressions in trip mode choice by using a value of 0.5 instead, restarted the run at trip mode choice, and trip_mode_choice ran in 8.7 minutes. That's a bit faster, but not a lot faster, so I guess the bulk of the runtime difference is somewhere else.

Also, the 2 zone setup does more work. The 1 zone setup uses just the one data set - od_skims['DIST'] at the TAZ level - whereas the 2 zone setup uses the MAZ to MAZ distance table for walk, a separate MAZ to MAZ distance table for bike (with a different set of records), and then both are backstopped by TAZ skims - od_skims['DISTWALK'] and od_skims['DISTBIKE']. Plus, od_skims['DISTWALK'] and od_skims['DISTBIKE'] are each called twice in the expressions. This duplication can be eliminated by calling each only once in the trip_mode_choice_preprocessor and then referencing the result in trip_mode_choice, but I tried this and it saved less than a minute, consistent with the "turning them off" exercise above.

@esanchez01 can you re-run the 1, 2, and 3 zone setups with the maz-maz cost lookup improvement above + the chunking improvement on #444 and repost the submodel runtimes? I'd like to see where the biggest runtime difference is between the 1 and 2 zone setups and then investigate some more. Thanks. |
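The "call it once in the preprocessor, then reference it" idea can be sketched in plain pandas. This is only an illustration: lookup_distance is a hypothetical stand-in for an expensive od_skims['DISTWALK']-style lookup, and the column names, thresholds, and factors are invented, not taken from the SANDAG expression files.

```python
import numpy as np
import pandas as pd

calls = {"n": 0}  # counts how often the expensive lookup runs


def lookup_distance(orig, dest):
    """Hypothetical stand-in for an expensive skim or MAZ to MAZ lookup."""
    calls["n"] += 1
    return np.abs(orig.to_numpy() - dest.to_numpy()) * 0.1


trips = pd.DataFrame({"origin": [1, 2, 3, 4], "destination": [4, 4, 1, 2]})

# Duplicated form: every expression that needs the distance looks it up again.
calls["n"] = 0
walk_available = lookup_distance(trips.origin, trips.destination) < 0.25
walk_time = lookup_distance(trips.origin, trips.destination) * 20.0
duplicated_calls = calls["n"]

# Preprocessor form: look up once, store as a column, reference the column.
calls["n"] = 0
trips["walk_dist"] = lookup_distance(trips.origin, trips.destination)
walk_available_2 = trips.walk_dist < 0.25
walk_time_2 = trips.walk_dist * 20.0
preprocessed_calls = calls["n"]

print(duplicated_calls, preprocessed_calls)
```

Both forms produce the same values; only the number of lookups changes, so the saving is bounded by how expensive a single lookup really is.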
@bstabler, I am in the process of re-running all 3 set ups -- I trained and ran the 1 and 2 zone setups so far. I am attaching the run times for the completed runs and will update as more runs complete. Compared to the 1 zone setup, the 2 zone setup sees a significant run time increase for the mode choice sub models. I am wondering if this is due to Some notes:
Here is the run time summary: timing_log_zones.xlsx UPDATES:
|
In my experience, hybrid_uss performs best. I think hybrid_rss (and really just rss for that matter) doesn't produce very reliable estimates of memory use/need for this purpose. I have some updated performance notes in #444. |
I will run SANDAG 1, 2, and 3 zone examples on my server and report runtimes. |
Once #439 is working, we'll have new SANDAG 1, 2, and 3 example runtimes to inform future work efforts. |
Below are updated runtimes for 100% samples of the in-development example_sandag 1, 2, and 3 zone system setups on my Intel Xeon 2.1 GHz 32 core 512 GB RAM server, configured to use 28 processes and with chunking disabled. The runtimes for the 3 configurations are similar, which suggests the runtime differences we saw earlier between the setups were likely due to chunking triggered by having less RAM available. Long term we can certainly improve chunking, but simply using more RAM for this specific model configuration goes a long way toward a more performant setup.
|
@bstabler, thank you for sharing. If possible, can you add the total (sub model + non sub model) run times? Also, it would be good to know what the RAM usage of each run was to get a sense of how much may be needed and how the different zone systems compare. |
@bstabler, these are interesting results. I am surprised to see that the 1-zone location choices performed worse than the 2-zone system, and some are even worse than the 3-zone system. For example, the trip location times for 1, 2, and 3 zones are 40.5, 37.4, and 35.7 minutes (this is also the most time-consuming step, roughly 30-40% of total runtime). This is counter-intuitive: 1-zone location choice operates at the TAZ level rather than the MAZ level used in the 2 and 3 zone systems, so shouldn't it be faster, or at least no worse, than the 2/3 zone systems? |
I was also a bit confused by these results. While super-encouraging from a 3-zone perspective, it is my understanding that there are simply fewer calculations in the 1- and 2-zone systems due to the absence of TVPB, so why is there no performance advantage to these simple setups? Is this a matter of more time having been spent optimizing the 3-zone? If so, I think we need to dig into the 1- and 2- zone performance. |
@joecastiglione, I agree we need to dig into the 1 and 2 zone performance more, especially since some of us will use the 1 or 2 zone system as our region's production model. |
@bstabler, I am attaching a workbook that breaks down the 1, 2, and 3 zone run times (ran on a SANDAG server) and contains notable logging for certain sub models that appeared to have performance issues. This includes what was covered during the 9/14 technical call. |
Thanks @esanchez01. I looked into the reported runtime differences for your 1, 2, and 3 zone setups and have a couple of findings and some improvements (in the commit above) to the 2 zone setup to reduce runtimes.
SANDAG server
My server
I believe some of the increase in runtimes for trip mode choice from the 1 zone to 2 zone setup is due to memory and chunking since my run without chunking has a more reasonable increase in runtime from 1 zone to 2 zone to 3 zone setup. However, I do not expect my chunking disabled tour mode choice setup to go from 1 min for a 1 zone setup to 14 min for a 2 zone setup so I looked into this.
I then ran the complete 2 zone example again, using the same setup as mentioned in an earlier comment but with these revisions, and below are the revised runtimes. The submodels run about 20% faster, and are actually faster than the 1 zone system setup. I did not revise the 3 zone expression files, but they could also be improved. The 1 zone system files could probably be improved as well, although they use numpy array indexing instead of pandas DataFrame indexing.
We should think about ways to improve MAZ to MAZ impedance storage and retrieval speed. Possible ideas include using numpy arrays, a smarter index/search method, and/or caching. @esanchez01 - can you test on your end and do a more thorough expression file update of your various setups? You may also want to update the full scale maz_to_maz_bike.csv on the activitysim resources repo by dropping DIST if you do not want to blend general distance as well. P.S. I also noticed that the transit skims in the expressions have |
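One way to read the "numpy arrays plus a smarter index/search method" idea is to pack each origin/destination pair into a single integer key, sort once, and answer queries with a binary search. The sketch below is an assumption-laden illustration, not ActivitySim's implementation: the table, the column names OMAZ, DMAZ, and DISTWALK, and the helper functions are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sparse MAZ to MAZ table; column names are illustrative only.
maz_to_maz = pd.DataFrame({
    "OMAZ": [1, 1, 2, 3],
    "DMAZ": [2, 3, 1, 3],
    "DISTWALK": [0.4, 0.9, 0.4, 0.1],
})

# Multiplier for the packed key. Assumes every queried MAZ id lies in
# [0, MAX_MAZ); ids outside that range could collide with a valid pair.
MAX_MAZ = int(maz_to_maz[["OMAZ", "DMAZ"]].to_numpy().max()) + 1


def pair_key(omaz, dmaz):
    """Pack an (origin, destination) pair into one int64 key."""
    return (np.asarray(omaz, dtype=np.int64) * MAX_MAZ
            + np.asarray(dmaz, dtype=np.int64))


# Build the sorted key and value arrays once, up front.
keys = pair_key(maz_to_maz.OMAZ, maz_to_maz.DMAZ)
order = np.argsort(keys)
sorted_keys = keys[order]
sorted_vals = maz_to_maz.DISTWALK.to_numpy()[order]


def lookup(omaz, dmaz):
    """Vectorized lookup; pairs missing from the table come back as NaN."""
    wanted = pair_key(omaz, dmaz)
    pos = np.searchsorted(sorted_keys, wanted)
    pos = np.clip(pos, 0, len(sorted_keys) - 1)
    found = sorted_keys[pos] == wanted
    return np.where(found, sorted_vals[pos], np.nan)


result = lookup([1, 2, 3, 2], [3, 1, 3, 3])
print(result)
```

The NaN entries mark pairs that are absent from the sparse table, which is where a TAZ skim backstop like the one described earlier in the thread would take over. Whether this beats the current lookup at full scale would need to be profiled.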
@bstabler, I included your changes in the SANDAG 2 zone setup and reran on our server. Both the sub models and the overall run ran about 25% faster -- down to ~140 minutes and ~180 minutes, respectively. Tour mode choice saw the most improvement (~19 minutes to 3 minutes) and improvements were seen across other sub models, consistent with what was observed from your test. I am attaching the results here: maz_maz_update_runtime.xlsx I will look further into the expression files and make adjustments. NOTE: The 1, 2, and 3 zone run times I previously shared in the workbook are all of MTC based setups, whereas the SANDAG 2 zone setup is PSRC based (as you tested). I did this to allow a more apples to apples comparison of the different zone systems. I believe this to be the reason why there is a big difference in the 2 zone trip mode choice run time between your test and mine, as you noted. For the reruns, I used the PSRC based setup and observed similar trip mode choice run times as in your tests. |
@esanchez01 - did you fix the /100 issue as well? If so, can you PR your improvements to the fewer-maz-maz-data-calls-for-sandag-two-zone branch? |
This issue will be closed once we update the examples with the reduced and reused expressions |
And we'll create a new issue to look into the trip_destination model since it remains the slowest submodel. |
@bstabler, I have not yet fixed the /100 issue but will get to it, as well as updating additional expressions, soon. Since the SANDAG examples pull MTC and PSRC expression files, I won't be making updates directly to these files but probably just storing the updated expression files in the SANDAG example directories. |
We're going to create a new issue for the /100 issue. |