A story of two indexers #511
Conversation
Force-pushed from 91a8efa to a130a76
optionsUtxoPath  :: Maybe FilePath,
optionsDatumPath :: Maybe FilePath
Curious, is it possible (or desirable) to put the data from the indexers in the same database?
In the case of SQLite, most likely not due to higher contention.
👍
coordinator <- initialCoordinator indexerCount
when (isJust datumPath) $ do
  ch <- atomically . dupTChan $ _channel coordinator
  void . forkIO . datumWorker coordinator ch $ fromJust datumPath
Why not do the dupTChan inside the datumWorker instead of passing it as a parameter?
Actually, thinking about how this works, this may be a multi-threading bug. Let me fix it first :)
OK, so there is no bug.
Moving the channel creation into the function would mean I need to define an inner loop function, which I preferred not to do.
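For context, a minimal sketch of the declined alternative, using a hypothetical `String` event type and a stand-in `Coordinator` in place of the PR's real types:

```haskell
import Control.Concurrent.STM

-- Stand-in for the PR's coordinator record (hypothetical):
data Coordinator = Coordinator { _channel :: TChan String }

-- Duplicating the channel inside the worker means the read loop can no
-- longer be the top-level body of the function; a named inner loop is
-- needed instead:
datumWorker :: Coordinator -> FilePath -> IO ()
datumWorker coordinator _path = do
  ch <- atomically . dupTChan $ _channel coordinator
  let innerLoop = do
        event <- atomically $ readTChan ch
        putStrLn ("indexing " ++ event)  -- the real Datum indexing goes here
        innerLoop
  innerLoop
```

Passing the duplicated channel in as a parameter keeps `datumWorker` a single expression, at the cost of one extra argument.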
👍
void . forkIO . datumWorker coordinator ch $ fromJust datumPath
when (isJust utxoPath) $ do
  ch <- atomically . dupTChan $ _channel coordinator
  void . forkIO . utxoWorker coordinator ch $ fromJust utxoPath
Same as above
:: Maybe FilePath
-> Maybe FilePath
This approach does not seem extensible for multiple indexers. Why not pass a list of workers? They seem to have a similar interface.
It can probably be done, but it's not trivial. You need to provide a specific configuration for each indexer, and the startup may also differ between indexers (not the case here). The issue mentioned two indexers, and I am not sure whether supporting more is on the roadmap; if it is, it should be a separate issue, I think.
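If the workers did share one shape, the wiring could look roughly like this sketch, with a hypothetical `String` event type and stand-in `Coordinator` replacing the PR's real types:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_, void)

-- Stand-in for the PR's coordinator record (hypothetical):
data Coordinator = Coordinator { _channel :: TChan String }

-- One shared shape for every indexer worker:
type Worker = Coordinator -> TChan String -> FilePath -> IO ()

-- Start each worker on its own duplicated broadcast channel:
startWorkers :: Coordinator -> [(Worker, FilePath)] -> IO ()
startWorkers coordinator workers =
  forM_ workers $ \(worker, path) -> do
    ch <- atomically . dupTChan $ _channel coordinator
    void . forkIO $ worker coordinator ch path
```

As the reply notes, this only works while every indexer's configuration really does reduce to a `FilePath`; per-indexer configuration would need a richer type.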
👍
-> TChan (ChainSyncEvent (BlockInMode CardanoMode))
-> FilePath
-> IO ()
datumWorker Coordinator{_barrier} ch path = Datum.open path (Datum.Depth 2160) >>= innerLoop
The security param 2160 shouldn't be hardcoded, right?
I am not sure why it shouldn't be hardcoded in this case. Would you like to expose it as a command-line argument? One for all indexers, or one command-line argument for each? I can write the code if you think it's important to do so.
I would expect the security param to be the same for all indexers, in order to guarantee that rollbacks only modify data that is still in memory.
I'll add a story about querying the security param from the local node directly (without needing to pass it as a CLI param).
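If the parameter were exposed on the command line in the meantime, a single shared option would be enough. A hypothetical optparse-applicative sketch (the flag name and help text are assumptions; 2160 matches the value hardcoded in the diff):

```haskell
import Options.Applicative

-- One security parameter shared by all indexers, defaulting to the
-- value currently hardcoded in datumWorker:
securityParamParser :: Parser Int
securityParamParser =
  option auto
    (  long "security-param"
    <> metavar "K"
    <> value 2160
    <> help "Number of blocks the node may roll back" )
```

Querying the node for it, as proposed above, would remove even this option.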
I am wondering if making the two indexers explicitly concurrent is worth the effort. If we keep concurrency implicit (because most of the computation is pure), the entire PR boils down to
withStream bla bla $ L.impurely S.foldM_ (mconcat [consumer1, consumer2])
It's not clear to me that explicit concurrency brings an advantage in terms of performance. I have found some discussions 1 2 about SQLite and multithreading but I haven't dug into it yet.
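The combined-consumer idea above can be demonstrated with Control.Foldl on its own: `FoldM m a b` is a `Semigroup` when `b` is, and the combined fold feeds every input to both consumers in a single pass. A self-contained sketch with hypothetical `Int` consumers:

```haskell
import qualified Control.Foldl as L

-- Two effectful consumers; L.sink feeds each input to the action and
-- mappends the results:
consumer1, consumer2 :: L.FoldM IO Int [String]
consumer1 = L.sink (\n -> pure ["c1 saw " ++ show n])
consumer2 = L.sink (\n -> pure ["c2 saw " ++ show n])

main :: IO ()
main = do
  -- (<>) runs both folds over the same stream in one pass
  results <- L.foldM (consumer1 <> consumer2) [1, 2, 3]
  mapM_ putStrLn results
```

The same combined fold is what `L.impurely S.foldM_` would consume when driven by a streaming source instead of a list.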
AFAIU each indexer opens a new SQLite database, so there should not be any contention at the connection level; is this right?
I have noticed you use Control.Concurrent.STM.TChan (like I did in plutus-streaming). This is an unbounded channel, and its memory usage will grow without limit if the producer is faster than the consumer, which I believe is likely here. Can you either use a bounded channel or otherwise make sure this unbounded growth doesn't happen?
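For reference, stm's `TBQueue` is the bounded analogue being asked for: `writeTBQueue` blocks when the queue is full, so a fast producer gets backpressure instead of unbounded memory growth. A minimal sketch (the capacity 4096 is an arbitrary example):

```haskell
import Control.Concurrent.STM

main :: IO ()
main = do
  q <- newTBQueueIO 4096               -- fixed capacity caps memory usage
  atomically $ writeTBQueue q "event"  -- blocks if the queue is full
  ev <- atomically $ readTBQueue q
  putStrLn ev
```

Unlike `TChan`, `TBQueue` has no `dup` for broadcast, so a multi-consumer design would need one queue per worker.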
Dismissing my "request changes" so as not to block the author while I am unavailable.
Force-pushed from a130a76 to 6a1b6e1
One of the main complaints that the users of the PAB / chain-index have is related to scalability / efficiency of our implementation. This is why I opted for a slightly less composable implementation, but one that uses the CPU more efficiently.
I think it does, as you are running two indexers in parallel (CPU usage is quite high on all indexers, so spreading them across several cores seems like a win). The only reason this would not be the case is if the disk is not fast enough, but I would not expect that. We could test it, but I would suggest we do that in another issue if anyone thinks the above reasoning is unsound.
Right.
The TChans are artificially bounded by the semaphore. I have also tested the implementation for memory leaks on mainnet and (after fixing the streaming memory leak) everything runs smoothly.
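The bounding described above can be sketched as follows, with a hypothetical `String` event type standing in for the PR's real one: the producer writes one event to the broadcast channel, then blocks on a `QSemN` until every worker has signalled completion, so at most one event is in flight per worker.

```haskell
import Control.Concurrent.QSemN
import Control.Concurrent.STM
import Control.Monad (forever)

-- Producer: publish one event, then wait until all workers have
-- processed it before publishing the next.
produce :: QSemN -> Int -> TChan String -> String -> IO ()
produce barrier nWorkers ch event = do
  atomically $ writeTChan ch event
  waitQSemN barrier nWorkers          -- blocks until every worker signals

-- Worker: process an event, then release one unit of the semaphore.
runWorker :: QSemN -> TChan String -> IO ()
runWorker barrier ch = forever $ do
  event <- atomically $ readTChan ch
  putStrLn ("processed " ++ event)    -- the real indexing goes here
  signalQSemN barrier 1
```

With this lock-step scheme the unbounded `TChan` can never hold more than one unread event per duplicated channel.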
Force-pushed from a1e92fe to 92fdf6e
Force-pushed from 92fdf6e to 265f074
* Remove dependency on quickspec.
* Extract inputs and output from block
* Implement handlers.
* Remove dependency on fromCardanoTx.
* Add some lost queries.
* Add indexer options.
* Add options for running the indexers.
* The two indexers should now run in parallel.
* Fixed a concurrency bug.
* Add some comments.
* Format plutus-chain-index.cabal
Pre-submit checklist: