
inspect causing big performance regression #2908

Closed
valentinsulzer opened this issue Apr 30, 2023 · 15 comments · Fixed by #2909
Labels
priority: high To be resolved as soon as possible

Comments

@valentinsulzer
Member

[screenshots: snakeviz profiler output]

This is due to #2862

@agriyakhetarpal can you look into a way to make this faster? Otherwise we should revert; it's not worth the functionality gain

valentinsulzer added the priority: high (To be resolved as soon as possible) label on Apr 30, 2023
@valentinsulzer
Member Author

valentinsulzer commented Apr 30, 2023

This is a huge performance hit in some cases (e.g. repeated solving of the SPM is now 10x slower), so we should fix it immediately and make a 23.4.1 release.
[screenshot: snakeviz profiler output]

Maybe this helps https://stackoverflow.com/questions/17407119/python-inspect-stack-is-slow

@valentinsulzer
Member Author

Also weird that the benchmarks never caught this. Are they working now, @Saransh-cpp?

@agriyakhetarpal
Member

Since this needs an immediate fix I am happy to revert, though I will get back to this and try a faster implementation in the next week; I wasn't aware of such a potential slowdown 😅 Where can I run the profiler shown above?

@valentinsulzer
Member Author

It's snakeviz; you can pip install it and then run the following in an IPython terminal:

> %load_ext snakeviz
> %snakeviz function_to_profile()
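
The same profile can also be collected outside IPython with the standard-library cProfile and then opened with snakeviz from the command line; a minimal sketch, using the SPM constructor as an example target:

import cProfile

import pybamm

# Dump profiling stats to a file, then run `snakeviz spm.prof` in a shell.
cProfile.run("pybamm.lithium_ion.SPM()", "spm.prof")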

@valentinsulzer
Member Author

valentinsulzer commented Apr 30, 2023

Potential solutions are:

  • find a faster implementation of doing the same thing
  • use some form of caching so this is only calculated once per register call
  • add tagging manually as an input to the register function (a rough sketch of this option is included after this list)
  • remove altogether

The functionality this provides is not very critical compared to the performance cost, so I agree just removing it and doing a quick patch release is the best way forward until we figure out a better way
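
For illustration, the manual-tagging option (third bullet above) could look roughly like this; this is a hypothetical sketch, not the actual pybamm.Citations API, and the citation key and tag are just examples:

# Hypothetical sketch of the "manual tagging" option: the caller identifies
# itself explicitly, so register() needs no stack inspection at all.
class Citations:
    def __init__(self):
        self._papers_to_cite = set()
        self._citation_tags = {}

    def register(self, key, caller=None):
        self._papers_to_cite.add(key)
        if caller is not None:
            # Record which model registered this citation.
            self._citation_tags[key] = caller

citations = Citations()
citations.register("Marquis2019", caller="SPM")  # illustrative key and tag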

@agriyakhetarpal
Member

I agree with removing altogether and releasing a patch for now.

  • We can indeed explore a caching option, though some overhead would still remain the first time a model is called and its citation is registered when running an experiment. In that case a faster alternative to inspect.stack() altogether would be better, or we could modify all __call__ methods with a metaclass

  • Maintaining a hard-coded dict mapping models to their citations is also a solution (a rough sketch is below)

I am in the middle of some end-of-semester assignments and submissions, so it will take me a few days to try these approaches. Please go ahead and reopen #1182 for the time being.
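
For illustration, the hard-coded mapping option mentioned above would be little more than a module-level dict; the model names, citation keys, and helper below are hypothetical examples, not existing PyBaMM code:

# Hypothetical sketch of the hard-coded mapping option: look up citations
# from a dict at registration time instead of inspecting the call stack.
MODEL_CITATION_TAGS = {
    "SPM": ["Marquis2019"],
    "DFN": ["Doyle1993"],
}

def register_model_citations(model_name, citations):
    # Register every citation associated with the given model name.
    for key in MODEL_CITATION_TAGS.get(model_name, []):
        citations.register(key)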

valentinsulzer added a commit that referenced this issue May 1, 2023
This was referenced May 1, 2023
valentinsulzer added a commit that referenced this issue May 2, 2023
@Saransh-cpp
Member

Saransh-cpp commented May 2, 2023

> Also weird that the benchmarks never caught this. Are they working now, @Saransh-cpp?

The benchmarks were indeed failing on develop, but the benchmark workflow was fixed after the inspect PR was merged. Also, benchmarks only run on push and do not run on PRs from forks (I am not sure why).

Now they're passing!

@valentinsulzer
Member Author

We should allow them to run on PRs from forks.

@agriyakhetarpal
Member

agriyakhetarpal commented May 5, 2023

@tinosulzer, I tried to profile a faster implementation (it turns out that using inspect.currentframe is miles faster than inspect.stack, because it does not build the full list of frame records). However, I am not sure if I'm interpreting the visualisations I get from snakeviz correctly – here's what I am running:

# In the ipython shell
import pybamm
%load_ext snakeviz
%snakeviz pybamm.lithium_ion.SPM()

Before

[snakeviz screenshot]

and the results for pybamm.citations.register() and for _caller_name(), which gets called internally by register():

ncalls tottime percall cumtime percall filename:lineno(function)
1 7.4e-06 7.4e-06 0.03022 0.03022 citations.py:109(register)
1 9.9e-06 9.9e-06 0.03021 0.03021 citations.py:61(_caller_name)

Now

[snakeviz screenshot]

ncalls tottime percall cumtime percall filename:lineno(function)
1 6.4e-06 6.4e-06 1.35e-05 1.35e-05 citations.py:111(register)
1 4.2e-06 4.2e-06 6.4e-06 6.4e-06 citations.py:61(_caller_name)

which seems to be at least 2000x faster if we compare cumulative times (the earlier version takes ~27% of the time taken to call SPM, which dropped to 0.02% in the latter), assuming I am reading the results correctly. This could be sped up even more by using sys._getframe instead, but using internal functions from sys is generally discouraged.
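
For reference, the essence of the change is roughly the following (a simplified sketch, not the exact citations.py implementation; the frame depth of 2 is illustrative only):

import inspect

def caller_name_slow():
    # inspect.stack() builds a FrameInfo record (with source context) for
    # every frame on the stack, which is what makes it expensive.
    return inspect.stack()[2].function

def caller_name_fast():
    # inspect.currentframe() just returns the current frame object; walking
    # f_back reaches the same caller without building the whole stack.
    frame = inspect.currentframe().f_back.f_back
    return frame.f_code.co_name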

Could you perhaps write the code for a longer and more complex PyBaMM experiment that I can try to profile (such as the ones running behind your screenshots above), just to see whether the results scale properly and there are no performance regressions?

@valentinsulzer
Member Author

> import pybamm
> %load_ext snakeviz
> %snakeviz model = pybamm.lithium_ion.DFN()
> sim = pybamm.Simulation(model) # should be very fast, no need to profile
> %snakeviz sim.solve([0,3600]) 
> %snakeviz sim.solve([0,3600])

The final solve is repeated twice because several things get cached the first time. Ideally, the second time, almost all the time should be spent in the _integrate method of the solver.

Also repeat the whole thing with pybamm.lithium_ion.SPM(), which is faster and therefore any overheads are a larger percentage of the total runtime.

It looks like using inspect.currentframe will be a sufficient fix.

@agriyakhetarpal
Member

Thanks; for initialising the DFN model, the performance hit with register is 0.01–0.03%. With sim.solve([0, 3600]), the mean comes out to be around 0.02%, and on running sim.solve([0, 3600]) again, it comes out to be about 0.15% (15 ncalls). I repeated the process for ten iterations (for SPM as well, below) to make sure I wasn't just seeing statistical outliers and that the behaviour was consistent.

The second time, _integrate always takes ~94–95% of the total time, which, as you said, should be ideal:

[snakeviz screenshot]

Similar results arise for the SPM model: the performance hit for register is 0.01–0.03%, though on the second iteration of sim.solve([0, 3600]), it shoots up to 1.5–1.6% (14 ncalls). Since things are cached, _integrate uses 88% of the total time:

[snakeviz screenshot]

which means registering citations for the SPM model impacts it slightly more than doing so for the DFN model – is this performance reasonable enough to open a PR, or should I try to improve on this further?

@valentinsulzer
Member Author

Sorry, I forgot to reply to the previous discussion. The speedup is great, but it's surprising that it's still being called 14 times on the second solve. Can you find a way to cache the citation tags?

@agriyakhetarpal
Member

Sure, I'll give that a try and convert the PR to a draft. Would you suggest using functools, or would some basic caching with a dictionary suffice? In the best case, the time taken to read a citation from the cache should be negligible compared to registering it.

@valentinsulzer
Member Author

Either is fine

@agriyakhetarpal
Member

Just using @cache from functools (on the assumption that a built-in way is better than writing a custom cache), and without any other changes, halves the share of total time to about ~0.87% (averaged over 10 runs) the second time an SPM simulation is solved with sim.solve([0, 3600]). This value keeps getting smaller in later iterations, but the number of ncalls stays the same for some reason.

[snakeviz screenshot]
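
For context, functools.cache simply memoises on the call arguments, which is why repeated identical register calls become almost free; here is a toy illustration, with hypothetical function and key names rather than the real citations.py code:

from functools import cache

@cache  # functools.cache needs Python 3.9+; lru_cache(maxsize=None) is equivalent
def register_tag(citation_key, caller):
    # Only the first call with a given (citation_key, caller) pair does any
    # work; identical calls afterwards return the memoised result directly.
    print(f"registering {citation_key} from {caller}")
    return citation_key

register_tag("Marquis2019", "SPM")  # does the work
register_tag("Marquis2019", "SPM")  # cache hit, essentially free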
