Parallelization Opportunity #73
There is a dependency in this loop, yes. Some of the loops in the generation can be parallelized, however, and if the logic is changed, then more of the loops can be parallelized. But not all loops are easy to change, and the speedup wouldn't always be linear: the algorithm would have to be changed a bit. Parallelization is much easier (and much closer to linear) if the set is split into multiple sub-sets; basically, if multiple filters are generated that are independent of each other. The disadvantage of that approach is that lookup would be slower.
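A minimal sketch of that splitting approach, assuming the library's public `binary_fuse8_*` API; the shard count, the modulo partitioning, and the `build_sharded` helper are illustrative choices, not part of the library:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include "binaryfusefilter.h"

#define NSHARDS 16 /* illustrative shard count */

/* The partitioning hash must be fixed before any shard is built and
 * reused unchanged at query time. Plain modulo is enough here because
 * the keys fed to binary fuse filters are already 64-bit hashes. */
static inline uint32_t shard_of(uint64_t key) { return (uint32_t)(key % NSHARDS); }

/* Build NSHARDS independent filters. Each shard sees only its own keys,
 * so the populate calls are independent of each other and could run
 * concurrently without touching the library's internal construction loop. */
static bool build_sharded(const uint64_t *keys, size_t n,
                          binary_fuse8_t shards[NSHARDS]) {
  uint64_t *buckets[NSHARDS];
  size_t counts[NSHARDS] = {0}, fill[NSHARDS] = {0};
  for (size_t i = 0; i < n; i++) counts[shard_of(keys[i])]++;
  for (int s = 0; s < NSHARDS; s++) {
    buckets[s] = (uint64_t *)malloc(counts[s] * sizeof(uint64_t));
    if (buckets[s] == NULL) return false; /* cleanup omitted for brevity */
  }
  for (size_t i = 0; i < n; i++) {
    uint32_t s = shard_of(keys[i]);
    buckets[s][fill[s]++] = keys[i];
  }
  bool ok = true;
  /* This loop is the parallelization point: iterations touch disjoint
   * shards, so they can be handed to separate threads. */
  for (int s = 0; s < NSHARDS; s++) {
    if (!binary_fuse8_allocate((uint32_t)counts[s], &shards[s]) ||
        !binary_fuse8_populate(buckets[s], (uint32_t)counts[s], &shards[s]))
      ok = false;
    free(buckets[s]);
  }
  return ok;
}
```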
Is that necessarily true? Suppose I have 10 filters. I get an input, I hash it to a value n between 0 and 9. The hash function is fixed and is selected before any of the filters are constructed. Then I check filter n. There is an additional step, but if the filter is large enough to warrant splitting it up, it should be a negligible cost. (I have a cold, so maybe I am not thinking straight.)
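That routing step can be made concrete with a small sketch, reusing the hypothetical `shard_of` and `NSHARDS` from the snippet above; the only extra work per query is one hash-and-index operation:

```c
/* Lookup against the sharded structure: one extra hash to pick the
 * shard, then a normal single-filter query. */
static inline bool sharded_contain(uint64_t key,
                                   const binary_fuse8_t shards[NSHARDS]) {
  return binary_fuse8_contain(key, &shards[shard_of(key)]);
}
```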
Let me give you some more context on why I want to parallelize that specific loop. In ChalametPIR (https://github.com/itzmeanjan/ChalametPIR), we extend binary fuse filters to key-value maps. Think of a key-value map that we encode as a matrix such that each column of the matrix is a binary fuse filter. The rows represent encoded key-value pairs such that each element of a row keeps only a few significant bits, say 8 (it must be < 32 bits). So when constructing the filters, we do the common work of computing reverse_order, reverse_h, etc. only once (https://github.com/itzmeanjan/ChalametPIR/blob/befe5afbf7c99a58a5a4cf6c9347ece369ef22aa/src/pir_internals/matrix.rs#L338-L339), because that part is the same for all the vertical binary fuse filters. The computation of the fingerprints for a specific key-value pair (encoded as a row of the resulting matrix) is done inside the loop (https://github.com/itzmeanjan/ChalametPIR/blob/befe5afbf7c99a58a5a4cf6c9347ece369ef22aa/src/pir_internals/matrix.rs#L353). Now, if we have a way to parallelize that loop, we can enjoy a huge boost in the performance of the server-setup function of the key-value PIR scheme.
@itzmeanjan Can you elaborate on why you think that this particular loop can be parallelized for a huge boost in performance? It is not at all obvious to me what you have in mind. Could you sketch the algorithm you envision?
@lemire let me know if this helps. Forgive my handwriting 🙏 😆
@itzmeanjan I am afraid that this does not help me. I assume that you have an algorithm in mind and some cost model. Can you spell it out?
Yes: there is one more step in the calculation. But as you wrote, it is most likely a negligible cost; it is just one operation. I think for the problem at hand, it wouldn't matter. I have to admit I do not fully understand the problem to be solved, but I will still try to comment:
So all the filters have the same order. The question is then: instead of multiple filters with the same order, why not use one filter that has more bits per key? I can understand that if the number of bits is not a power of 2 (e.g., 240 bits), then it might be easier to use multiple filters, but that is slower to query. But actually, that is not my main point... I just want to understand why you are using multiple filters that have the same ordering. If I understood correctly, you are interested in performance, so this doesn't match my expectation. It might just show that I don't fully understand the problem... The real point is: I think splitting the filter (or the filters) into multiple smaller filters would resolve the parallelization problem. Let's say you split the large filter into 16 smaller filters: you can build all of them in parallel, without having to change the code in the filters. At the end, you might then have a multi-dimensional array of filters. But I think the problem is solved, and performance would still be very good: actually, construction would be better, because building 16 (or 256, ...) independent filters concurrently is certainly easier (and likely faster) than trying to parallelize the construction of one filter. I think.
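Continuing the earlier hypothetical sketch, one way to realize "build all of them in parallel" without modifying the library is a single OpenMP pragma over the shard loop (error handling omitted):

```c
/* Same shard loop as in the construction sketch above; because each
 * iteration writes only to its own shard and bucket, the iterations
 * can safely run concurrently. Compile with -fopenmp. */
#pragma omp parallel for schedule(dynamic)
for (int s = 0; s < NSHARDS; s++) {
  binary_fuse8_allocate((uint32_t)counts[s], &shards[s]);
  binary_fuse8_populate(buckets[s], (uint32_t)counts[s], &shards[s]);
}
```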
What would probably help me understand the problem you want to solve is if you explained the "end user" lookup/query API and the algorithm you use. I assume it is "get(key) => value / error". The key is a byte array... is the size of the byte array fixed? The value is a byte array... is its size variable? The errors are just probabilistic, right? So with a very, very low probability, you could get a value even if you asked for a key that is not stored?
Hi @lemire, good day.
Is it possible to parallelize this loop?
xor_singleheader/include/binaryfusefilter.h, lines 733 to 747 at commit a5a3630
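For readers without the source open: the referenced lines are the final fingerprint-assignment pass of binary fuse construction. The following is only a rough schematic of its shape, with `fingerprint`, `h`, and `slot_for` as stand-ins for internal helpers (it is not the literal code at those lines), to show why each iteration depends on earlier ones:

```c
/* Keys are visited in the reverse of the peeling order. The slot
 * assigned for key i is derived by XORing fingerprint slots that
 * earlier iterations of this same loop may already have written,
 * so the reads below can observe writes from previous iterations:
 * a loop-carried dependency that blocks naive parallelization. */
for (uint32_t i = size - 1; i != UINT32_MAX; i--) {
  uint64_t hash = reverse_order[i];
  uint8_t f = fingerprint(hash);
  uint32_t h0 = h(0, hash), h1 = h(1, hash), h2 = h(2, hash);
  Fingerprints[slot_for(i)] =
      f ^ Fingerprints[h0] ^ Fingerprints[h1] ^ Fingerprints[h2];
}
```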