-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
accessing tap skims is much slower than taz skims #167
Comments
Pull request #190 fixes this, and other performance issues identified by Wu in https://github.com/UDST/activitysim/wiki/Multiple-Zone-Systems-Tests Runtime for all NetworkLOS methods including tap_skims (get_tappairs3d) is now scaling pretty linearly with respect to index vector length. Runtime for best_transit_path (VECTOR_TEST_SIZE 1014699) is now 401.42 seconds (6.7 minutes) on my 2.5 GHz MacBook Pro, down from 6.5 hours on SANDAG 2.30GHz Windows Server. Here are before/after runtimes on my Mac (except for best_transit_path, which I didn't waste time running in before mode, given Wu's experience with before version...)
|
This is great Jeff! I will do a test and update the test memo by 'faking' a SANDAG ABM setting. |
I look forward to seeing the updated test results. There is probably still some room for performance improvement of best_transit_path but probably more on the order of a factor of two then sixty... |
As discussed on our project call today, we'll look into speeding up |
|
There were several issues with the tests in general, and get_taps_mazs in particular that were negatively affecting the benchmarks. First off, the timing for the tests included the overhead of randomly selecting the input vectors/dfs. I was doing this with pandas DataFrame sample, which turns out to be pretty slow for large sample sizes (like 6 seconds to sample 20000000 rows from maz_df.) This time was essentially eliminated by replacing pd.DataFrame.sample with np.random.choice on the maz_df index. I was able to replace the pandas sample with a numpy version in all tests except get_maz_pairs and get_maz_tap_pairs, resulting in improved performance in all those tests. Note that this change did not speed up the actual NetworkLOS functions, it just made the tests more accurate. Secondly, unlike the other test functions, the test function get_taps_mazs() in example_multi/simulation.py was making two calls to get_taps_mazs to exercise the function with and without the attribute selection capability. This resulted in almost doubled runtimes for get_taps_mazs. (I say 'almost' because the attribute selection capability filters out null attribute rows and so returns fewer mazs for each taz and so runs slightly faster than the non-attribute version.) Thirdly, the per unit runtimes for the tests are slightly misleading for comparisons because get_taps_mazs returns an average of 16 maz/tap rows for each input maz. When teh unit runtimes for get_taps_mazs is expressed in terms of rows returned rather than input rows, its performance looks a lot more similar to the other NetworkLOS functions.
New benchmarks:
As I remarked, the timings for the get_taps_mazs attribute feature are slightly faster than the version that just returns the list of pairs (at least for 'drive_distance') due to the filtering of missing values:
Note: these changes are incorporated in the example_multi branch that I just pushed. I will be submitting a pull request to the master branch shortly. |
As part of the multiple-zone system pilot testing, @wusun2 found that accessing tap skims, maz to tap pandas table data, and other data sets, is much slower than accessing the taz skims, which use direct position indexing. The tap skims are using lookups for zone numbers, and the pandas tables are using loc (support for labels) instead of iloc (positions). We need to revise this in the next iteration of the pilot.
The text was updated successfully, but these errors were encountered: