-
-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Summing unusably slow #30
Comments
This takes way too long on my machine too, with NumPy 1.8.1. I remember summing array elements going faster. It was not very fast, but it was much faster, if my memory serves well. I'm going to have a closer look. On the long term, the real solution is to have |
I did some timing tests, which I reproduce here for reference:
The calculation time is exponential. I did not expect this at all:
The Python built-in
Completely removing NumPy is as slow:
So, the problem lies entirely within The resulting sum depends on 2000 random variables and |
Is there any progress on this issue? I've just started using |
I have not looked at this yet (having two jobs takes a toll!). Now, this looks like an issue that's probably concentrated on a small part of the code, so I left myself a note for early July: I may have time then to investigate this. |
I use the following code in an ipython cell after the necessary imports:
I get the following output: 14054975 function calls in 4.046 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) So most of the time is spent in |
I then installed
Using the command https://gist.github.com/andyfaff/96ed26a32d9899e2ac5f Those lines are a nested for loop. |
Investigating further with |
Wow, thanks for all these tests. I'll have a close look in the coming days. |
I had a look. What you observe is normal: every time one term is added, the program calculates the derivatives of the new sum with respect to all the variables summed so far (this is needed information), which explains the quadratic time. I thought about how to optimize this. The following is a little technical, but it is useful as a reference: When summing independent variables and adding a new term, none of derivatives of the already calculated sum changes. In an ideal world, they would be kept as is in memory, and only the derivative of the sum with respect to the new term would be registered. In a general sum of two correlated terms, only derivatives with respect to the independent variables that are common to the two terms need to be updated in order to get the derivatives of the sum. The problem is that each time a term is added, the program ends up creating a new dependent variable, so all the previously known derivatives must be eventually copied from the already calculated sum (another independent variable) variable) to the new one. This is natural because in code like Now, it would be great to find a mechanism, at least for sums, through which the derivatives of the terms are not copied when adding a new term: this would make the summation linear in time, for independent variables. The difficulty, is that in general, the derivatives should be copied (like in the example of This "no copy of derivative values" optimization would be quite specific to the sum function, since its partial derivatives are 1 (many existing derivatives would thus not have to be updated, as they are technically only multiplied by a factor of 1). That said, this is such an important case that it is worth investigating the issue. I will think again about this in about a month. It does not look so easy, and there would probably be an overhead for all functions but the sum, but it may be worth it—if at all possible. |
PS: the idea below is not the simplest or most flexible one. See the next comments for a better idea (formal, lazy linear expansions of the calculated functions). I think I know how summing could be sped up: by using the collections.ChainMap class in order to store the derivatives of the sum, based on the derivatives of each term with respect to the independent variables. This ChainMap would replace the current This idea is different from the idea of some kind of "hard links" to existing lists of derivatives (see my previous comment). In fact, the derivatives of a variable with respect to its independent variables is fixed, A difficulty is that ChainMap was introduced in Python 3.3. Can it be backported? In about a month, I will:
|
Note to self: Even multiplication could in principle be accelerated similarly, with a generalization of the ChainMap method: when multiplying to variables, the independent variables that are not shared by the factors could have their derivatives represented as pairs (second_factor, derivatives_of_the_first_factor)—and symmetrically for the other factor. The common independent variables would have a derivative which is calculated by the usual chain rule (uv'+u'v); the new derivatives (that are not simply a factor applied to the old ones) would be stored in a new mapping {variable: derivative} put in front of the ChainMap. In order to keep most of the code as is, the new "factored mappings" (factor, derivatives) should behave like dictionaries, with the factor automatically applied to the derivative, when needed. This method would prevent the derivatives of the factors to be duplicated, at the cost of a computing overhead each time that a derivative needs to be evaluated, so it is not clear that it would be always efficient. In order to solve this problem, a more advanced mechanism could be applied: when the (factor, derivatives) has to calculate too many products between the factor and the derivatives, they could all be calculated and the (factor, derivatives) could be replaced by a newly created derivatives mapping (where the factor has been applied to the original derivatives). This model can be generalized to functions of a single argument: the derivatives of f(g(x,…)) are simply the derivatives of g(…) multiplied by a factor (f'(g(…)). Factors could be combined on the fly: the derivatives represented by (factor1, (factor0, derivatives)) would simply be (factor1*factor0, derivatives), pretty much in the same was as the chain rule is currently applied. I am not completely clear yet on the implications of this idea, but I feel that it ultimately points to replacing the current creation of a new derivatives mapping for each operation (which slows down to quadratice time repeated operations like the sum) by a different structure that refers differently to the independent variables involved. To be studied. |
Note to self: I kept thinking about an optimization for sums, that could also work in the general case. I came up with the following principles:
With this method, summing independent variables is linear in time (instead of quadratic). More generally, summing expressions that do not contain identical independent variables is linear too. Even more generally, the calculation time is essentially proportional to the number of variable occurrences (a variable can count for more than one) in the expression (after each term is evaluated). Optimizations can be performed on expression/trees as needed. For example:
This approach is simpler than the ChainMap approach above, which insisted on always evaluating chain rule expressions, while leaving untouched sub-expressions and correcting them for the needed variables (variables that belong to more than one term). It does take more memory, though, but in practice many functions have arity one, so the first optimization above should be good.. Furthermore, the depth of the expression trees should be small (a high tree would be obtained with sin(sin(sin(sin(…)))), which is a rare case—which would be handled by the optimization above, if needed. |
Note to self: the "lazy evaluation of derivatives via formal expression" idea above does not solve the problem of the slowness of sum(). In fact, constructing a sum of numbers with uncertainties is fast (linear in time), but the natural way of calculating the final uncertainty is slow (quadratic, like before). This "natural" way consists in recursively calculating the derivatives of each intermediate sum (i.e. of …+…, then of …+…+…, etc.). This is natural because in general variables depend on each other, so it makes sense to memorize any intermediate result, as in:
This time complexity thus essentially comes from the fact that Python constructs …+…+…+…+… as pairwise sums ((…+…)+…)+…: each sum then "wants" to calculate its derivatives with respect to the variables involved, which yields a quadratic run time. Now, this is not good for summing many numbers. What we want in this case is a gathering of all the terms in ((…+…)+…)+…, without calculating the derivatives of the intermediate results. This can be done in linear time. However, finding a method that also satisfies the requirements of:
One solution, at least for sums, might be to blend the two ideas above:
In effect, the speed up would be based on the calculation of formal expressions, and on their efficient expand-and-collect operation (which collects the numerical coefficient in front of each independent variable in the linear expansion of the calculated function) for some formulas, in particular the sum of many independent variables, where this can be done linearly in time. This architecture might be general enough to allow for more efficient expand-and-collect strategies to be added (maybe the sum is not a case which is too particular). |
PS: ChainMaps actually do not solve the quadratic time problem, because the lookup of shared variables between terms would become linear, for each additional term (formal coefficients stored as {x:3, y:4}, {z: 5}, {t: 6},…, with each new term requiring a linear lookup so as to combine the new term with any existing term with the same variable). The current situation is that the collection of the terms of a long formal sum is quadratic (with the formal sum being calculated linearly in time). Maybe it is possible to introduce a new class that would make the collection linear in time while preserving the generally good idea of doing the collection for intermediate variables? Additional challenge: have both sum() and cumsum() run fast (linear time). |
Note to self: Goals:
Difficulties:
These requirements seem to essentially imply the following approach (which differs from the ideas above, which do not fully work):
To be done:
|
I've also got an issue with slow summing. Solution for me was to write much simpler implementation of Maybe we can do something like this in uncertainties library? C-implementation can also speed things up. |
@sashabaranov Your implementation is restricted to independent variables, which indeed speeds things up because it does not have to keep track of interdependent variables. The goal of the |
Note to self: when combining numbers with uncertainty, maybe a copy could be avoided by checking whether the numbers have any reference to them ( |
Note to self: the idea of the lazy evaluation of derivatives might work if the final result is the only thing which is stored in memory (instead of keeping in memory the derivatives of each intermediate term, as is currently done). Now, a disadvantage is that the resulting absence of "caching" will slow down situations like x = 1±1, y = f(x), z = g(y), if getting the uncertainty on z before the uncertainty on y (because the calculation of derivatives for z will not cache derivatives for y). Maybe the more technical suggestion above of counting references to quantities would help (intermediate results in x = a+b+c+… would not have the same reference count as x—but this is to be tested, as the devil is in the details). |
Some news: I have started implementing a linear-time version of the idea above (https://github.com/lebigot/uncertainties/tree/master_faster_uncert_calc). Summing 5,000 independent variables went from 17 s to 4 s (MacBook early 2015) and the algorithm should be linear instead of quadratic (I have not yet tested this). But this implementation is recursive (basically evaluating the uncertainty on ((a+b)+c)+d+… based on the uncertainty on (a+b)+c, etc.), and this hits Python's recursion limit, which cannot be increased ad infinitum (increasing it to 50,000 and summing a few tens of thousands of terms crashes). However, I now have a non-recursive implementation in mind. I will implement it and see. |
Note: I turned to SymPy for ideas and summed 5,000 formal terms (with no calculation done whatsoever). After 30 minutes, the calculation had not returned. |
Some great news: Progress report on this issue, based on (incomplete) version 84a27f5 on my laptop: summing 5,000 terms takes only 0.070 s with the new version (down from 17 s with the published version: a 250x speedup). Summing 20x more terms (100,000 terms) takes 20x times more time (1.4 s, down from 170 min, a 7,000x speedup), and summing an additional 10x more terms (1,000,000 terms) takes 10x more time (15 s). Thus, the new calculation is indeed linear in time (instead of quadratic), and absolute calculation times are much, much shorter. Very nice. Now, more work is needed until all the tests pass: the The new implementation is in branch https://github.com/lebigot/uncertainties/tree/master_faster_uncert_calc, which will probably be merged into the main branch (master) at some point, since I don't foresee any blocking point. The remaining difficulties seem to involve mostly NumPy support (49 in 54 unit tests pass). |
The master branch now has the (much) faster implementation. All tests pass. |
The latest release now includes the much faster uncertainty calculation. PyPI release: https://pypi.python.org/pypi?:action=display&name=uncertainties&version=3.0.1 |
Thank you Eric. On 16 August 2016 at 07:00, Eric O. LEBIGOT (EOL) notifications@github.com
Dr. Andrew Nelson |
Thank you @andyfaff for your input on this subject, a year ago: this pushed me to think more about the problem and, ultimately, to find a solution. :) |
Hooray! Great job. |
I just want to make sure this is a limitation of the package, and not my own fault. But it seems that summing large arrays is prohibitively slow. I have several arrays with about 20,000 points, each with errors, but it's impossible to find the mean because it takes too long to calculate. To be more specific, I'm getting the mean to normalize the array
Is there any quicker way this can be done? Thanks.
The text was updated successfully, but these errors were encountered: