localstore/db how to insert items in GC index #1031
Honestly, I have no idea what "add interval to accesstimestamp" means.

@janos and I were discussing the following idea the other day:

For each chunk in the database we store two extra hashes, something like prevhash and nexthash, forming a doubly linked list. We also maintain a list of insertion points (quantiles such as [head], 80%, 50%, and so on).

When a chunk is added to the database and we want to add it at the 80% mark, for example, we look up the hash of the chunk currently sitting at that insertion point, add our new chunk there, and update the insertion point hash as well as the prevhash and nexthash of the neighbouring chunks. When a chunk is served to a retrieval request, it can be inserted at the top of the queue and the [head] insertion point updated accordingly.

For bootstrapping, all quantiles (except [head] of course) will map to the end of the chunk queue. This would happen until the total number of chunks reaches 50% of the total storage size. At that point the 50% marker can be set, while the others (60%, 70%, ... whatever) continue pointing to the last chunk.

GC happens from the tail of the queue, proceeding to prevhash every time.

One main benefit of this idea is that we can maintain multiple insertion points, and we don't have to update every chunk's entry when the queue changes, because position-in-the-queue is a property and not an attribute.
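To make the bookkeeping concrete, here is a minimal in-memory sketch of the idea above (types, field names and marker handling are my own illustration, not localstore code). It shows how an insertion only touches the new entry, its two neighbours and the marker, and how GC walks prevhash from the tail:

```go
package main

import "fmt"

// entry is one chunk's node in the doubly linked GC queue.
type entry struct {
	hash     string
	prevHash string // "" means head
	nextHash string // "" means tail
}

// gcQueue keeps the linked list plus named insertion points (quantile markers).
type gcQueue struct {
	entries map[string]*entry
	head    string
	tail    string
	markers map[string]string // e.g. "80%" -> hash currently at that insertion point
}

func newGCQueue() *gcQueue {
	return &gcQueue{entries: map[string]*entry{}, markers: map[string]string{}}
}

// insertAt links a new chunk right after the chunk referenced by marker and
// moves the marker to the new chunk. Only the new entry, its neighbours and
// the marker are touched. (A real implementation would also have to repair
// markers whose chunk has been garbage collected.)
func (q *gcQueue) insertAt(marker, hash string) {
	at := q.markers[marker]
	e := &entry{hash: hash}
	if at == "" { // bootstrapping: marker still maps to the end of the queue
		if q.tail == "" {
			q.head, q.tail = hash, hash
		} else {
			last := q.entries[q.tail]
			last.nextHash, e.prevHash = hash, last.hash
			q.tail = hash
		}
	} else {
		prev := q.entries[at]
		e.prevHash, e.nextHash = prev.hash, prev.nextHash
		if prev.nextHash != "" {
			q.entries[prev.nextHash].prevHash = hash
		} else {
			q.tail = hash
		}
		prev.nextHash = hash
	}
	q.entries[hash] = e
	q.markers[marker] = hash
}

// collect removes one chunk from the tail; GC proceeds via prevHash.
func (q *gcQueue) collect() (string, bool) {
	if q.tail == "" {
		return "", false
	}
	victim := q.entries[q.tail]
	delete(q.entries, victim.hash)
	q.tail = victim.prevHash
	if q.tail == "" {
		q.head = ""
	} else {
		q.entries[q.tail].nextHash = ""
	}
	return victim.hash, true
}

func main() {
	q := newGCQueue()
	q.insertAt("80%", "chunk-a")
	q.insertAt("80%", "chunk-b")
	h, _ := q.collect()
	fmt.Println("collected", h) // collected chunk-b
}
```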
@homotopycolimit "add interval to accesstimestamp" is meant to refer to @janos solution that at each retrieval access an interval is added to the last accesstimestamp, thereby boosting its rank (add more protection against GC). |
Ah ok.
why global?
@homotopycolimit relations in a doubly linked list require a lock to ensure that no other parallel goroutine is changing the list. If we want to insert a hash at some percentile, we would need to lock the two hashes between which the new one is inserted, but also their neighbours, as some other chunk may need to be inserted among them at the same time, and so on, which in the end amounts to a global lock.
it still feels like a local lock to me :) but I trust your verdict
### Descriptions of different garbage collection implementations

Here are summarized garbage collection suggestions based on meetings with @zelig, @homotopycolimit and @nagydani, and the https://hackmd.io/t-OQFK3mTsGfrpLCqDrdlw document.

Note: Chunks are added to the garbage collection index only after they have been synced.

#### Suggestion 1: Quantiles with accessTimestamp index

GC index encoding would be

**Pros**

Simple algorithm. Fast garbage collection.

**Cons**

Requires a global lock. An additional chunk index for tracking chunk origins is required.

#### Suggestion 2: Quantiles in linked list

GC index structure would be defined

**Pros**

Very simple algorithm to understand and test.

**Cons**

Any operation on the doubly linked list, and keeping track of the quantiles, requires a global lock. Garbage collection iteration is slow and expensive, as it requires ldbstore get instead of the ldbstore iterator. Tracking quantiles requires additional fields. Adding a new quantile, or potential calibration, is an expensive operation, as it requires walking through the whole list with ldbstore get. An additional chunk index for tracking chunk origins is required.

#### Suggestion 3: Ranking function

GC index encoding would be based on a rank, calculated from the chunk's popularity, proximity and origin,
and added as an integer to the accessTimestamp. Garbage collection would just remove chunks from the tail of the queue, as chunks will be ordered with the least important at the end.

The rank effectively increases accessTimestamp by some value calculated from these parameters. The increment grows with popularity, with proximity, and with chunk origin. Chunks with a greater rank will be placed higher in the queue and garbage collected later. The value returned by the rank function would place the chunk at the same point in the future as it would place a chunk that is accessed for the first time at the same proximity and with the same origin.

The problem is that the ranking function can be derived only from guesses and empirical data, based on the chunk dynamics we predict. Every parameter may have a linear or non-linear relation to the rank. With a linear relation we risk that some chunks are placed so high in the queue that they are never garbage collected, even when not accessed for a long time. So a non-linear dependence would be better for the popularity parameter, for example a logarithmic one.

Every parameter needs one or more constants that define its relation to accessTimestamp. The first-order linear constant is in the same unit as accessTimestamp, nanoseconds. If more proximate chunks need to be prioritized in the queue, the simplest way would be to add more time to the result of the ranking function. For example, the constant for proximity could be 10 minutes (a very arbitrary guess), which would be multiplied by the proximity level; since the proximity value rises for closer chunks, the ranking function would multiply proximity by the 10-minute constant. For popularity, it would multiply the logarithm of the chunk's access count by its constant, which may be 1h, for example. A chunk origin enumeration would prioritize ranking by adding different fixed constants based on the enum value. The rank function can be as simple as a linear function of independent parameters.

**Pros**

Chunks in the garbage collection index are completely independent and can be inserted or moved in the queue without a global lock. Iteration over the garbage collection index is the least expensive operation, iterating over leveldb key/value pairs, as keys are already sorted at the database level.

**Cons**

Constants and the formula for the ranking function can be defined only by empirical data and our assumptions about chunk prioritization in time, which may be very subjective, making consensus on it very difficult, or impossible, to achieve. At least two additional indexes need to be added: one for popularity counters and one for tracking chunk origins.

#### Suggestion 4: Self-adjusting quantiles or ranking function

This would be an alternative to the previous suggestions, where the quantiles or the ranking function constants would be calibrated at runtime. This would be a very advanced approach, but it requires discussion and formalization.
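To illustrate the shape the ranking function in Suggestion 3 could take, here is a small sketch assuming a rank of the form accessTimestamp + proximityConstant·po + popularityConstant·log2(accessCount) + originConstant. All names and constant values are placeholder guesses in the spirit of the "10 minutes" / "1h" examples above, not agreed-upon parameters:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Origin enumerates how a chunk entered the store; values and constants below
// are illustrative guesses only.
type Origin int

const (
	OriginSynced Origin = iota
	OriginUpload
	OriginRetrieval
)

var (
	proximityConstant  = 10 * time.Minute // added per proximity order bin
	popularityConstant = time.Hour        // multiplied by log2 of the access count
	originConstant     = map[Origin]time.Duration{
		OriginSynced:    0,
		OriginUpload:    30 * time.Minute,
		OriginRetrieval: 15 * time.Minute,
	}
)

// rank returns the value used to order the GC index: the access timestamp in
// nanoseconds, boosted by proximity, popularity and origin. Larger values sit
// higher in the queue and are garbage collected later.
func rank(accessTimestamp int64, po uint8, accessCount uint64, origin Origin) int64 {
	boost := time.Duration(po) * proximityConstant
	if accessCount > 1 {
		boost += time.Duration(math.Log2(float64(accessCount)) * float64(popularityConstant))
	}
	boost += originConstant[origin]
	return accessTimestamp + int64(boost)
}

func main() {
	now := time.Now().UnixNano()
	fmt.Println(rank(now, 8, 1, OriginSynced) - now)    // proximity boost only
	fmt.Println(rank(now, 8, 16, OriginRetrieval) - now) // plus 4h popularity and origin boost
}
```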
great writeup @janos.
Thanks @homotopycolimit. Yes, I missed the explanation about this notation, which represents encoding of key/value pairs in the database. Key and value are separated by |
nice job @janos. One approach could be: when accessing a chunk
we define a new accessTimestamp; if distance is meant to be important, new chunks could behave as if they were already accessed. WDYT?
@zelig I like the formula for accessTimestamp and also the distance handling only at insertion. In the same way that distance is included at insertion, the origin could be as well, so we would not need to add two more indexes. 👍 We would need to figure out what values we want for the calibration constants.
how about
Those are good, sane starting values.
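A minimal sketch of what "boosting accessTimestamp on access" could look like against a sorted GC index. The in-memory map below stands in for the leveldb-backed index, and the boost value is a placeholder for whatever formula and calibration constants are eventually chosen:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// gcKey is a stand-in for a GC index key ordered by access timestamp.
type gcKey struct {
	accessTimestamp int64
	hash            string
}

// gcIndex stands in for the leveldb index; keys are sorted, so GC can iterate
// from the smallest (oldest) timestamp, like an iterator over sorted keys.
type gcIndex map[gcKey]struct{}

// onAccess moves a single chunk's entry: delete the old key, insert a new one
// with a boosted timestamp. Only this chunk's entry is touched, so no global
// lock over the whole queue is needed.
func (idx gcIndex) onAccess(old gcKey, boost time.Duration) gcKey {
	delete(idx, old)
	renewed := gcKey{
		accessTimestamp: time.Now().UnixNano() + int64(boost),
		hash:            old.hash,
	}
	idx[renewed] = struct{}{}
	return renewed
}

// collect returns hashes in GC order, least protected first.
func (idx gcIndex) collect() []string {
	keys := make([]gcKey, 0, len(idx))
	for k := range idx {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].accessTimestamp < keys[j].accessTimestamp })
	hashes := make([]string, len(keys))
	for i, k := range keys {
		hashes[i] = k.hash
	}
	return hashes
}

func main() {
	idx := gcIndex{}
	a := gcKey{time.Now().UnixNano(), "chunk-a"}
	b := gcKey{time.Now().UnixNano() + 1, "chunk-b"}
	idx[a] = struct{}{}
	idx[b] = struct{}{}
	idx.onAccess(a, time.Hour)  // chunk-a is accessed and boosted an hour ahead
	fmt.Println(idx.collect()) // [chunk-b chunk-a]
}
```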
#### Suggestion 5: Approximating linked list quantiles with timestamps

The access timestamps at the quantiles, as described in Suggestion 1, can be reasonably well approximated by looking at a random sample of chunks and their access timestamps. Thus, newly inserted chunks can have their timestamps set accordingly, putting them very near the desired quantile. Moreover, unlike Suggestion 1, these timestamps need not be tracked and updated with every change in the database, just regularly enough that they remain reasonable. Thus, we approximate the behavior expected from Suggestion 2 at a much lower cost.

**Pros**

The behavior is close to that of Suggestion 2. The costs are similar to those of Suggestion 1.

**Cons**

Random sampling of the chunk database might be tricky. It might require adding a random field to the database and querying the database using that field, to ensure fair sampling.
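A rough sketch of how newly inserted chunks could get their timestamps from a random sample. The sampling source itself is stubbed out with random numbers, since drawing a fair sample from leveldb is exactly the tricky part listed under the cons:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// estimateQuantile returns the access timestamp approximately sitting at the
// requested quantile (0 = oldest / tail, 1 = newest / head), based only on a
// random sample of access timestamps.
func estimateQuantile(sample []int64, q float64) int64 {
	sorted := append([]int64(nil), sample...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	i := int(q * float64(len(sorted)-1))
	return sorted[i]
}

func main() {
	// Stand-in for sampling real chunk access timestamps from the database.
	sample := make([]int64, 1000)
	for i := range sample {
		sample[i] = rand.Int63n(1_000_000)
	}
	// A newly inserted chunk aimed at the 80% quantile would be given roughly
	// this access timestamp instead of time.Now().
	fmt.Println("approx 80% quantile timestamp:", estimateQuantile(sample, 0.8))
}
```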
Further comments (resulting from discussion with @zelig and @homotopycolimit): Insertion into the database happens from four sources:
The latter two happen on any node, the former two only on gateways and end-user nodes.

There are two ways of looking at this issue: the overall health of the network (i.e. chunks being retrievable) and the local interests of the node operator. From the first perspective, the desirable behavior is that chunks close to the node's address (especially those within the "radius of responsibility") are not GC'd. From the latter perspective, a desirable strategy would be one close to a Nash equilibrium, i.e. one where there are no obvious ways to profit more if everyone else follows the same strategy. Neither huge uploads (think of an attacker pushing random stuff into Swarm) nor huge downloads (think of a "Netflix binge") should flush the entire local store of a node.

The nice thing about Suggestion 2 is that it guarantees that a certain fraction of the store is allocated for the most profitable chunks and is guaranteed protection. Synced chunks and especially uploads should not be GC'd before they are synced to nodes closer to their destination.

The average (but not necessarily the actual) order number of a chunk in an ordering by access time is the same as its order number by access rate, assuming that chunks are accessed as a Poisson process (i.e. memoryless, random events). Without knowing anything about a chunk, we can estimate its future rate of access by its distance from the node's address.

If a chunk's address falls within the node's most proximate neighborhood, it should be inserted in the middle of the GC queue: assuming (for a moment) no caching, all other chunks are also there, and the cached chunks got ahead of other chunks only because of their additional popularity in comparison to the most proximate chunks. Each bin's distance from the most proximate neighborhood should halve the distance from the end of the GC queue, as sketched below.
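To make the last rule concrete, a tiny sketch of one reading of it (not a confirmed spec): chunks within the most proximate neighborhood are inserted at the 50% mark, and each bin further out halves the distance from the tail.

```go
package main

import "fmt"

// insertionFraction returns how far from the tail of the GC queue a chunk
// should be inserted (0 = tail, 1 = head), given its proximity order po and
// the node's neighborhood depth. Chunks within the most proximate
// neighborhood (po >= depth) go to the middle; each bin further away halves
// the distance from the tail.
func insertionFraction(po, depth uint8) float64 {
	if po >= depth {
		return 0.5
	}
	f := 0.5
	for i := po; i < depth; i++ {
		f /= 2
	}
	return f
}

func main() {
	const depth = 8
	for po := uint8(5); po <= 9; po++ {
		fmt.Printf("po=%d -> insert at %.4f of the queue length from the tail\n", po, insertionFraction(po, depth))
	}
}
```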
I just want to note here that we can do the quantile-insertion with the doubly linked list without a global lock. (Example: to remove a chunk, lock the chunk and its two neighbours, update the 'next-chunk' link in the previous chunk and the 'previous-chunk' link in the next chunk, and maybe update the quantile pointer.)
@homotopycolimit I am not so sure. In order to know who the neighbours are that you want to lock, you must make a db lookup and lock it with the same operations, which is essentially a global lock plus an extra lookup. Dani's idea of approximating quantiles is a good one. @janos let's decide on an implementation and start working on this.
please explain to me how locking down 3 (or 5) chunks to be edited simultaneously requires a global lock of all chunks? I don't get it.
just imagine you need to prevent modification of the same item simultaneously.
I don't see the deadlock. To prevent a deadlock we need to be able to implement a function: get-a-lock-on-all-5-chunks-or-release-and-try-again. This should be possible without a global lock, no? A global lock is only needed if we are changing state globally, but we are clearly only changing things locally. What am I missing?
For the record, I'm not wedded to the idea of the linked list, I just want to understand why it can't work before we use a random sampling technique to approximate with some effort what the linked list gives us easily.

I am certainly not an engineer, but here is my naive idea: race conditions (when requesting locks) happen when neighbouring chunks are to undergo simultaneous edits... say both chunk i and chunk i+1 have been requested and thus should be moved up the queue. Assuming that the size of the db is large compared to the number of chunks accessed per second, this is a rare occurrence. So when we detect that we cannot get a lock on all the chunks we want, we release them all, wait a random number of milliseconds between 0 and 100, and then try again. Chances are that one of the goroutines will get the needed locks before the other one tries again and the deadlock situation untangles. What is the likelihood of a permanent deadlock under these conditions?

...or maybe, instead of the random delay, the global lock is the fallback method that would only ever be invoked in this situation.
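For what it's worth, the "get-a-lock-on-all-chunks-or-release-and-try-again" idea can be sketched directly with Go's sync.Mutex.TryLock (available since Go 1.18). This is only an illustration of the retry-with-random-backoff scheme described above, not a proposal for the actual locking design:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// lockAllOrRetry tries to lock every per-chunk mutex involved in an edit; if
// any of them is already held, it releases what it got, backs off for a
// random 0-100ms and tries again.
func lockAllOrRetry(locks ...*sync.Mutex) {
	for {
		acquired := make([]*sync.Mutex, 0, len(locks))
		ok := true
		for _, l := range locks {
			if l.TryLock() {
				acquired = append(acquired, l)
				continue
			}
			ok = false
			break
		}
		if ok {
			return // all locks held; the caller may edit the linked-list entries
		}
		for _, l := range acquired {
			l.Unlock()
		}
		time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
	}
}

func unlockAll(locks ...*sync.Mutex) {
	for _, l := range locks {
		l.Unlock()
	}
}

func main() {
	// Three hypothetical per-chunk locks: the chunk and its two neighbours.
	var prev, cur, next sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			lockAllOrRetry(&prev, &cur, &next)
			fmt.Println("goroutine", id, "edits the list")
			unlockAll(&prev, &cur, &next)
		}(i)
	}
	wg.Wait()
}
```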
Every chunk that is encountered is added to the top of the queue. Garbage collection happens at the bottom of the queue.
We make no distinction between chunks that are encountered due to syncing, due to a retrieval request originating elsewhere, due to local data consumption on the node ... This creates multiple problems.
The first thing a new GC queue will need is the basic infrastructure to handle chunks differently depending on context: to insert chunks at different points on a GC queue, and possibly to maintain multiple queues. Each of these contexts might require different GC operations. The ideas discussed above do not delve much into where in the chunk queue each chunk should be inserted, but mostly focus on how to maintain a chunk queue with multiple insertion points in the first place.
Note that the question of the underlying queue, and how to manage insertions, deletions, and reorderings, is separate from the discussion of how we should handle the different scenarios. However, they are not independent, as the discussion about a second GC queue for local data shows.
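As a strawman for that infrastructure, here is a small sketch mapping the context in which a chunk is encountered to an insertion point or to a separate queue. The context names and quantile values are hypothetical placeholders for discussion, not agreed-upon behaviour:

```go
package main

import "fmt"

// ChunkContext describes how a chunk entered the node, i.e. the distinction
// the current GC queue does not make.
type ChunkContext int

const (
	ContextSynced             ChunkContext = iota // received via syncing
	ContextForwardedRetrieval                     // cached while serving someone else's request
	ContextLocalRetrieval                         // fetched for local data consumption
	ContextLocalUpload                            // uploaded locally, must not be GC'd before syncing
)

// insertionPolicy returns where in the main GC queue a chunk should enter
// (0 = tail, 1 = head) and whether it belongs in a separate queue that is
// exempt from normal GC (e.g. unsynced local uploads, or local data).
func insertionPolicy(ctx ChunkContext) (quantile float64, separateQueue bool) {
	switch ctx {
	case ContextLocalUpload:
		return 0, true // protected until synced to nodes closer to the destination
	case ContextLocalRetrieval:
		return 0, true // candidate for a second GC queue for local data
	case ContextForwardedRetrieval:
		return 0.8, false // cached because it was requested: popular, insert high
	default: // ContextSynced
		return 0.5, false // within the area of responsibility: insert mid-queue
	}
}

func main() {
	for _, ctx := range []ChunkContext{ContextSynced, ContextForwardedRetrieval, ContextLocalRetrieval, ContextLocalUpload} {
		q, sep := insertionPolicy(ctx)
		fmt.Printf("context=%d quantile=%.1f separateQueue=%v\n", ctx, q, sep)
	}
}
```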
Let's move this discussion over to https://swarmresear.ch/t/garbage-collection/16/2
Related issues:
anteva:
It seems to me that this is the main entrypoint issue for the new GC effort.
Whoever works on this and has the most knowledge on this topic: please add the following sections to this issue:
Without such a history it will be very difficult to understand any of the decisions made in this effort a few months from now.
cc @zelig @nagydani @homotopycolimit @janos