Primenumber generator, exploring different implementations and languages
- Starting very simple, doing calculations as one would do by hand
Needed about 8 seconds for primes up to 1E6
- Thinking about what numbers don't have to be tested to decrease the number of computations
Needed about 10 seconds for primes up to 1E6, the longer time is because numpy.int64 is already used.
- Using numpy array operations: Numbers spanning over a range of 100k are tested for the same prime at the same time.
Needed less than 3 seconds for primes up to 1E6. Only one core was free. About 1 second comes from the printing of the numbers.
Needed about 14 seconds for primes up to 10E6 when the other 7 cores are free and the numbers are not printed out.
- Using numpy array operations and multiprocessing: Numbers spanning over a range of 100k are tested for the same prime at the same time and that is done on 8 cores in parallel.
Needed about 4 seconds for primes up to 10E6 when running on 8 cores and the numbers are not printed out.
Needed about 95 seconds for primes up to 100E6 (over 10 minutes total CPU time).
- Only doing the necessary calculations, each number and each prime are calculated consecutively
Needed less than 1 seconds for primes up to 1E6.
Needed about 4 seconds for primes up to 10E6 when the other 7 cores are free and the numbers are not printed out. About 2 seconds comes from the printing of the numbers.
Needed about 7 seconds for primes up to 10E6 when using long instead of int.
- Instead of adding one prime at a time to the vector, enlarge the vector in bigger steps and then fill it once the primes are calculated
Needed about 7 seconds for primes up to 10E6, improvement to primes_cpp_v1.cpp is only few percent.
- Based on v1, but uses long long unsigned int
This didn't increase computation time (as seen in python).
- Use multithreading
- Use three different vectors to store the primes, depending on the size of the number, to save memory
This sped up the computation, but the overall CPU time remained more or less the same.
- Only keep necessary primes to calculate the next primes in memory, as otherwise running out memory
- Use only two different vectors to store the primes, depending on the size of the number, as the number of items in first adds more overhead than savings (32 bit numbers might be stored as 64 bit anyways)
Slightly faster on real time, but slightly more CPU time. Saving might have some impact as well.
- When calculating primes, don't use the interface of PrimesBase, but the vector directly.
- Allows the storing the primes in a binary file, which reads and writes quicker. The file is still quite big (400MB until 1E9) and another improvement could be to split into several files or to spend some CPU time to covert it into 62-adecimal number and store as text
- This also shows well how quickly the complexity growths, this version requires 10 times the number of code lines compared to the first versions.
Slightly faster on real time (1 second) and less CPU time (5.6 s) for primes up to 10E6.
- Avoid some checks during the calculation of primes as only a vector with useful primes to check against is given.
- Store primes more inteligent to save storage space: Using 8 bytes (64 bits) for each prime is a lot of storage wasted. To also avoid too big files, files are split for each order of magnitude, starting with "below 1E9". Now primes until 1E9 are stored as 32 bits, and in consecutive files the first prime is stored as 64 bit and then only half the differences between consecutive primes as 16 bit (until 1E11 8 bit would be enough, but that would only save 7GB, not much if calculation is done until 1E14 or more. Although there is a possible improvements: until 1E14 it's rare that half the diff is more than 8 bits, so in these cases one could store just two (or more) 8 bit numbers and add them together to save something in the order of 300 GB).
Not much improvements in CPU time (5.2 s) for primes up to 10E6.
- Only doing the necessary calculations, each number and each prime are calculated consecutively
Needed less than 2 seconds for primes up to 1E6.
Needed about 5 seconds for primes up to 10E6 when the other 7 cores are free and the numbers are not printed out. About 1 second comes from the printing of the numbers. Without compiling through javac another 0.5 seconds are needed.
Improvement with replacing a division with a multiplication: nearly twice as fast for primes up to 10E6.
- Only doing the necessary calculations, creating some lists in preparation to do parallel calculations. This has improved the speed, as the check which primes are not needed to be tested as often, but then handling the lists takes a bit longer.
Needed less than 2 seconds for primes up to 1E6.
Needed about 3 to 4 seconds for primes up to 10E6 when the other 7 cores are free and the numbers are not printed out. Testing primes in ranges of 100 is more efficient than in the ranges of 10k (trade off between uneccessary primes tested and calculating prime*prime to check what primes are not useful).