A story of two indexers #511

raduom · 2022-06-10T07:53:25Z

Pre-submit checklist:

Branch
- Tests are provided (if possible)
- Commit sequence broadly makes sense
- Key commits have useful messages
- Formatting, PNG optimization, etc. are updated
PR
- Self-reviewed the diff
- Useful pull request description
- Reviewer requested

koslambrou · 2022-06-12T09:14:30Z

plutus-chain-index/app/Marconi.hs

+    optionsUtxoPath   :: Maybe FilePath,
+    optionsDatumPath  :: Maybe FilePath


Curious, is it possible (or desirable) to put the data from the indexers in the same database?

In the case of SQLite, most likely not due to higher contention.

koslambrou · 2022-06-12T09:20:58Z

plutus-chain-index/app/Marconi.hs

+      coordinator <- initialCoordinator indexerCount
+      when (isJust datumPath) $ do
+        ch <- atomically . dupTChan $ _channel coordinator
+        void . forkIO . datumWorker coordinator ch $ fromJust datumPath


Why not do the dupTChan inside the datumWorker instead of passing it as a parameter?

Actually thinking about how this works, this is a possible multi-threading bug. Let me fix it first :)

Ok. So there is no bug.

Moving the channel creation into the function would mean defining an inner loop function, which I preferred not to do.
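
For readers following this thread, here is a minimal sketch of why the point at which `dupTChan` is called can matter. It assumes the coordinator's channel behaves like a broadcast `TChan` (an assumption; the PR's `Coordinator` is not shown here), and `worker` and `demo` are hypothetical names. Duplicating in the parent thread, before `forkIO` and before the first write, means the worker cannot miss an event that is written while it is still starting up.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (TChan, atomically, dupTChan, newBroadcastTChanIO, readTChan, writeTChan)
import Control.Monad (forM_, replicateM, void)

-- Hypothetical worker: reads exactly n events from its own duplicate.
worker :: TChan Int -> Int -> IO [Int]
worker ch n = replicateM n (atomically (readTChan ch))

demo :: IO [Int]
demo = do
  source <- newBroadcastTChanIO
  -- Duplicate in the parent thread, before forking and before any write.
  ch   <- atomically (dupTChan source)
  done <- newEmptyMVar
  void . forkIO $ worker ch 3 >>= putMVar done
  -- Every write below happens after the duplicate exists, so none is lost.
  forM_ [1, 2, 3] (atomically . writeTChan source)
  takeMVar done

main :: IO ()
main = demo >>= print
```

If the duplicate were created inside the forked thread instead, any event written before that thread reached `dupTChan` would not be visible to it.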

koslambrou · 2022-06-12T09:21:10Z

plutus-chain-index/app/Marconi.hs

+        void . forkIO . datumWorker coordinator ch $ fromJust datumPath
+      when (isJust utxoPath) $ do
+        ch <- atomically . dupTChan $ _channel coordinator
+        void . forkIO . utxoWorker coordinator ch $ fromJust utxoPath


Same as above

koslambrou · 2022-06-12T09:28:49Z

plutus-chain-index/app/Marconi.hs

+  :: Maybe FilePath
+  -> Maybe FilePath


This approach does not seem extensible for multiple indexers. Why not pass a list of workers? They seem to have a similar interface.

It can probably be done, but it's not trivial. You need to provide specific configurations for each indexer, and the startup may also be different (not the case here). The issue mentioned 2 indexers and I am not sure if supporting multiple indexers is on the bird path, but if it is, it should be a separate issue, I think.
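
A rough sketch of what the "list of workers" suggestion could look like, under the assumption that each indexer can be described by an optional path and a start action. `IndexerSpec`, `enabled`, and `exampleSpecs` are hypothetical names, not part of this PR, and the per-indexer configuration concern raised above is deliberately ignored.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical description of one indexer: a name, its optional database
-- path, and the action that runs it once a path is known.
data IndexerSpec = IndexerSpec
  { specName :: String
  , specPath :: Maybe FilePath
  , specRun  :: FilePath -> IO ()
  }

-- Keep only the indexers whose path was supplied, pairing each name with
-- its ready-to-run action. No isJust/fromJust is needed.
enabled :: [IndexerSpec] -> [(String, IO ())]
enabled = mapMaybe toAction
  where
    toAction spec = fmap (\path -> (specName spec, specRun spec path)) (specPath spec)

exampleSpecs :: [IndexerSpec]
exampleSpecs =
  [ IndexerSpec "datum" (Just "datum.db") (\path -> putStrLn ("datum indexer at " ++ path))
  , IndexerSpec "utxo"  Nothing           (\path -> putStrLn ("utxo indexer at " ++ path))
  ]

main :: IO ()
main = mapM_ snd (enabled exampleSpecs)
```

In a real program each action would presumably be forked with its own duplicated channel rather than run in sequence as it is here.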

koslambrou · 2022-06-12T09:39:15Z

plutus-chain-index/app/Marconi.hs

+  -> TChan (ChainSyncEvent (BlockInMode CardanoMode))
+  -> FilePath
+  -> IO ()
+datumWorker Coordinator{_barrier} ch path = Datum.open path (Datum.Depth 2160) >>= innerLoop


The security param 2160 shouldn't be hardcoded, right?

I am not sure why it shouldn't be hardcoded in this case. Would you like to expose it as a command-line argument? One for all indexers, or one command-line argument for each? I can write the code if you think it's important to do so.

I would expect the security param to be the same for all indexers, in order to guarantee that rollbacks only modify data that is in memory.
I'll add a story about querying the security param from the local node directly (without needing to pass it as a CLI param).
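
To illustrate the point about rollbacks and in-memory data, here is a toy model (not the indexer's actual storage). It assumes a buffer that keeps the newest `depth` events in memory and flushes older ones to disk; `Buffer`, `insert`, and `rollback` are hypothetical names. A rollback of at most `depth` events never has to touch flushed data.

```haskell
-- Toy buffer: newest event first, at most `depth` events kept in memory.
data Buffer a = Buffer
  { depth    :: Int
  , inMemory :: [a]
  , onDisk   :: [a]
  }

emptyBuffer :: Int -> Buffer a
emptyBuffer k = Buffer k [] []

-- Add an event; anything older than `depth` events is flushed to disk.
insert :: a -> Buffer a -> Buffer a
insert x (Buffer k mem disk) = Buffer k keep (flush ++ disk)
  where (keep, flush) = splitAt k (x : mem)

-- Drop the n newest events. Fails if that would reach flushed data.
rollback :: Int -> Buffer a -> Maybe (Buffer a)
rollback n buf
  | n >= 0 && n <= length (inMemory buf) = Just buf { inMemory = drop n (inMemory buf) }
  | otherwise = Nothing

main :: IO ()
main = do
  let buf = foldl (flip insert) (emptyBuffer 2) [1 :: Int, 2, 3, 4]
  print (inMemory buf, onDisk buf)        -- ([4,3],[2,1])
  print (fmap inMemory (rollback 1 buf))  -- Just [3]
  print (fmap inMemory (rollback 3 buf))  -- Nothing
```

If two indexers used different depths, the same chain rollback could be safe for one and reach flushed data in the other, which is the reason for wanting a single shared value.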

andreabedini

I am wondering if making the two indexers explicitly concurrent is worth the effort. If we keep concurrency implicit (because most of the computation is pure), the entire PR boils down to:

withStream bla bla $ L.impurely S.foldM_ (mconcat [consumer1, consumer2])
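
The one-liner above relies on effectful folds forming a monoid. A base-only stand-in for that idea, assuming nothing about the real `consumer1` and `consumer2` (the `Consumer` type and `feed` are hypothetical and replace the library fold types purely for illustration):

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Stand-in for an effectful fold: a consumer handles one event at a time.
newtype Consumer a = Consumer { runConsumer :: a -> IO () }

-- Combining two consumers runs both on every event, in order.
instance Semigroup (Consumer a) where
  Consumer f <> Consumer g = Consumer (\x -> f x >> g x)

instance Monoid (Consumer a) where
  mempty = Consumer (\_ -> pure ())

-- Feed every event to the combined consumer, sequentially, on one thread.
feed :: Consumer a -> [a] -> IO ()
feed = mapM_ . runConsumer

demo :: IO (Int, Int)
demo = do
  count <- newIORef 0
  total <- newIORef 0
  let consumer1 = Consumer (\_ -> modifyIORef' count (+ 1))
      consumer2 = Consumer (\x -> modifyIORef' total (+ x))
  feed (mconcat [consumer1, consumer2]) [1, 2, 3, 4]
  (,) <$> readIORef count <*> readIORef total

main :: IO ()
main = demo >>= print
```

Both consumers see each event before the next one is pulled, so there is no channel and no unbounded buffering, at the cost of running on a single thread.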

It's not clear to me that explicit concurrency brings an advantage in terms of performance. I have found some discussions (1, 2) about SQLite and multithreading but I haven't dug into them yet.

AFAIU each indexer opens a new SQLite database, so there should not be any contention at the connection level. Is this right?

I have noticed you use Control.Concurrent.STM.TChan (like I did in plutus-streaming). This is an unbounded channel and its memory usage will grow without limit if the producer is faster than the consumer. I believe this is likely here. Can you either use a bounded channel or make sure this unbounded growth is prevented by other means?
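
A minimal sketch of the bounded-channel option, using `TBQueue` from the `stm` package. The capacity of 2 and the `demo` function are illustrative assumptions. `writeTBQueue` retries while the queue is full, so a fast producer is throttled to the consumer's pace.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
  (atomically, newTBQueueIO, readTBQueue, writeTBQueue)
import Control.Monad (forM_, replicateM, void)

demo :: IO [Int]
demo = do
  queue <- newTBQueueIO 2   -- at most 2 events are buffered at any time
  done  <- newEmptyMVar
  -- Consumer: reads five events and reports them.
  void . forkIO $ replicateM 5 (atomically (readTBQueue queue)) >>= putMVar done
  -- Producer: blocks on writeTBQueue whenever the queue is full.
  forM_ [1 .. 5] (atomically . writeTBQueue queue)
  takeMVar done

main :: IO ()
main = demo >>= print
```

Unlike a duplicated `TChan`, a `TBQueue` is not a broadcast channel, so with several indexers this approach would need one queue per consumer.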

Dismissing my "request changes" to not block author while I am not available

raduom · 2022-06-14T08:05:21Z

> I am wondering if making the two indexers explicitly concurrent is worth the effort. If we keep concurrency implicit (because most of the computation is pure), the entire PR boils down to.

One of the main complaints that the users of the PAB / chain-index have is related to scalability / efficiency of our implementation. This is why I opted for a slightly less composable implementation, but one that uses the CPU more efficiently.

> It's not clear to me that explicit concurrency brings an advantage in terms of performance. I have found some discussions (1, 2) about SQLite and multithreading but I haven't dug into them yet.

I think it does, as you are running two indexers in parallel (CPU usage is quite high on all indexers, so spreading the work across several cores seems like a win). The only reason this would not hold is if the disk is not fast enough, but I do not expect that to be the case.

We could test it, but I would suggest doing that in a separate issue if anyone thinks the above reasoning is not sound.

> AFAIU each indexer opens a new SQLite database, so there should not be any contention at the connection level. Is this right?

Right.

> I have noticed you use Control.Concurrent.STM.TChan (like I did in plutus-streaming). This is an unbounded channel and its memory usage will grow without limit if the producer is faster than the consumer. I believe this is likely here. Can you either use a bounded channel or make sure this unbounded growth is prevented by other means?

The TChans are artificially bounded by the semaphore. I have also tested the implementation for memory leaks on the main net and (after fixing the streaming memory leak) everything runs smoothly.
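
One way a semaphore can bound a `TChan`, sketched under assumptions: the `Coordinator` record below only borrows the field names visible in the diff (`_channel`, `_barrier`) and adds a hypothetical `_indexerCount`; the actual implementation in this PR may differ. The producer writes one event, then waits until every worker has signalled the barrier, so each duplicated channel holds at most one unread event.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSemN (QSemN, newQSemN, signalQSemN, waitQSemN)
import Control.Concurrent.STM
  (TChan, atomically, dupTChan, newBroadcastTChanIO, readTChan, writeTChan)
import Control.Monad (forM, forM_, replicateM)

-- Hypothetical coordinator, loosely named after the fields seen in the diff.
data Coordinator = Coordinator
  { _channel      :: TChan Int
  , _barrier      :: QSemN
  , _indexerCount :: Int
  }

-- A worker reads one event, signals the barrier, and repeats n times.
worker :: Coordinator -> TChan Int -> Int -> IO [Int]
worker c ch n = replicateM n $ do
  ev <- atomically (readTChan ch)
  signalQSemN (_barrier c) 1
  pure ev

demo :: IO [[Int]]
demo = do
  c <- Coordinator <$> newBroadcastTChanIO <*> newQSemN 0 <*> pure 2
  results <- forM [1 .. _indexerCount c] $ \_ -> do
    ch   <- atomically (dupTChan (_channel c))
    done <- newEmptyMVar
    _    <- forkIO (worker c ch 3 >>= putMVar done)
    pure done
  forM_ [10, 20, 30] $ \ev -> do
    atomically (writeTChan (_channel c) ev)
    -- Block until every worker has consumed this event.
    waitQSemN (_barrier c) (_indexerCount c)
  mapM takeMVar results

main :: IO ()
main = demo >>= print
```

The trade-off is that the producer advances in lockstep with the slowest indexer, which keeps memory flat but gives no buffering to absorb short stalls.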

raduom changed the title ~~Raduom/a story of two indexers~~ A story of two indexers Jun 10, 2022

raduom marked this pull request as ready for review June 10, 2022 11:36

raduom requested review from andreabedini and koslambrou June 10, 2022 11:48

raduom force-pushed the raduom/a-story-of-two-indexers branch from 91a8efa to a130a76 Compare June 10, 2022 11:55

koslambrou reviewed Jun 12, 2022

koslambrou reviewed Jun 12, 2022

andreabedini previously requested changes Jun 13, 2022

raduom force-pushed the raduom/a-story-of-two-indexers branch from a130a76 to 6a1b6e1 Compare June 14, 2022 05:30

raduom force-pushed the raduom/a-story-of-two-indexers branch from a1e92fe to 92fdf6e Compare June 14, 2022 09:25

andreabedini self-requested a review June 14, 2022 09:27

andreabedini approved these changes Jun 14, 2022

raduom requested a review from koslambrou June 14, 2022 20:21

koslambrou approved these changes Jun 15, 2022

raduom added 10 commits June 15, 2022 17:47

Remove dependency on quickspec.

80733d2

Extract inputs and output from block

9262ac5

Implement handlers.

fa42ee4

Remove dependency on fromCardanoTx.

6c573e8

Add some lost queries.

744a075

Add indexer options.

6a90253

Add options for running the indexers.

21f887f

The two indexers should now run in parallel.

c2a0ddf

Fixed a concurrency bug.

4319579

Add some comments.

99dffa9

Format plutus-chain-index.cabal

265f074

raduom force-pushed the raduom/a-story-of-two-indexers branch from 92fdf6e to 265f074 Compare June 15, 2022 15:24

raduom merged commit f20efc3 into IntersectMBO:main Jun 16, 2022

raduom deleted the raduom/a-story-of-two-indexers branch June 16, 2022 05:07
