experimental refactoring of channel communication #1378

mschoch · 2020-04-20T21:49:32Z

this is an attempt to improve the readability of
some scorch goroutine communication by hiding
some of the channel mechanics inside methods
that better describe what they're doing

this is an attempt to improve the readability of some scorch goroutine communication by hiding some of the channel mechanics inside methods that better describe what they're doing

mschoch · 2020-04-20T21:50:40Z

This is work in progress, I don't love all the nouns and verbs yet. But the idea was to see if I could remove some duplicated code, and replace sections that involved a lot of channel mechanics and replace them with types and methods that described what they were doing.

Feedback welcome.

mschoch · 2020-04-20T21:55:24Z

I should mention one other thing. My ultimate goal is to rewrite the main loops of all of these to be a single select statement (not counting inner selects for non-blocking writes where necessary). The introducer is already there, the merger is close (though this PR actually takes a step back in one area), but the persister is currently two very weirdly arranged select statements (fall though one, then block on the other, in a way that ensures we fall through the first one again).

mschoch · 2020-04-20T22:19:46Z

Second commit cleans up the merger to now have just a single select.

mschoch · 2020-04-20T22:53:55Z

Possible bug while reviewing this stuff. There are 4 places where we do the following:

case ew := <-s.persisterNotifier:

However, in 3 of the 4, we also do this:

lastMergedEpoch = ew.epoch

But, in one case we do not:

bleve/index/scorch/persister.go

Lines 222 to 225 in fb361a7

    
           case ew = <-s.persisterNotifier: 
        
           	// if the watchers are already caught up then let them wait, 
        
           	// else let them continue to do the catch up 
        
           	persistWatchers = append(persistWatchers, ew)

The reason I'm asking is that I'd like to fold this into main select case at the top, so in my new version this would be added. I'm probably going to code it up anyway...

mschoch · 2020-04-20T23:11:33Z

Third commit attempts to apply the same process to the persister loop. A few more dicey changes in this version, but wanted to get the big picture in place.

mschoch · 2020-04-21T01:19:34Z

So, I realized I should try to explain the reasoning behind the transformation.

In both the merger and the persister, the main select at the top had a "default" exit path. In one obvious case (the first time through) this seems necessary, we're not yet "waiting" for anything. However, in all subsequent cases, we've done some work, and are now actually "waiting" on something specific (persister waiting for introducer to give it something new to process, and merger waiting for persister to give it something new to process). These "waits" happen at the bottom of the loop, and the expectation is that it means we'll quickly fall through the top select again.

So, with that in mind, the change I've made is to set up our "first time through" case to look like the other case. Both the persister and merger now take an initial root epoch (read from file for the open case, or 0 is fine for the new case). They do the same "wait" for this epoch, as they will after actually doing work to process an epoch later. By making these two cases work the same, we can fold things into a single select case.

If we can agree this is actually correct, we can further refactor out all the long blocks of code for each case into functions. This should eventually create a separation between the logic of "when do to do something" from the "how to actually do something that is needed".

sreekanth-cb · 2020-04-21T10:15:08Z

index/scorch/communication.go

+	*e = append(*e, watcher)
+}
+
+func (e *epochWatchers) NotifySatisfiedWatchers(epoch uint64) {


any better names NotifySatisfiedWatchers -> NotifyEligibleWatchers or anything else?

No, I dislike all the names.

sreekanth-cb · 2020-04-21T10:15:52Z

index/scorch/communication.go

+
+type watcherChan chan *epochWatcher
+
+func (w watcherChan) NotifyUsAfter(epoch uint64, closeCh chan struct{}) (*epochWatcher, error) {


"Us" seems redundant in NotifyUsAfter?

I added "us", because it felt wrong to me that both sides of this seemed to be something like "notify after epoch". But one side was waiting for something, and the other was telling you that it happened. I was trying to add a word that clarified which side of the fence we were on.
Because when you read the code, that part should be clear, but today it isn't.

index/scorch/communication.go

index/scorch/merge.go

mschoch · 2020-04-21T19:53:34Z

Some notes from review with @sreekanth-cb this morning:

I had changed the initial epochWatcher created by the persister and merger to use the current root epoch value, however Sreekanth pointed out a potential problem with this. Imagine, a large data load happened, then finished. Merging would continue on for some time. If at this point the index was closed and reopened, initializing the merger epoch watcher with the current root seq num would cause it to not resume any merge operations. The other alternative would be to initialize it with 0, though we need examine this further.
If we initialize the merger epochWatcher with 0, while the persister would see this right away, it too might not respond right away with the current code (we would add the watcher to the list, but don't check to see if it should be closed on that path any more) So the deeper concern is that the structural changes made here could result in a stuck merger on startup, at least until a mutation comes in (persister waits for introducer, and merger waits for persister). Previously those "default" paths meant some initial initialization took place without these dependencies.
How much of those cases can be simulated/tested with unit tests? This seems possible, but difficult.
The persister has pause logic which tends to fall into 3 paths, short pause, long pause, or no pause. While talking through one case, we were concerned possibly the merger ends up stuck on long pauses, because it seems like the merger is far behind, but doesn't ever catch up. This needs further examination.
During the review, I saw this line: https://github.com/blevesearch/bleve/blob/master/index/scorch/persister.go#L178-L180 which gives some concern, as here we close all the epochWatchers without regard for the epoch they passed in. To me, this is a bit concerning, as the epochWatchers have nicely defined behavior, ping me back when this thing happens. But here we ping back without the check. Sreekanth seems to think this is intended, but I feel like I don't understand it yet.

experimental refactoring of channel communication

412a8c9

this is an attempt to improve the readability of some scorch goroutine communication by hiding some of the channel mechanics inside methods that better describe what they're doing

mschoch added the do not merge label Apr 20, 2020

mschoch requested review from abhinavdangeti and sreekanth-cb April 20, 2020 21:49

improve merger loop to have a single select

83ae1fc

refactor persister loop into single select

0a8784e

sreekanth-cb reviewed Apr 21, 2020

View reviewed changes

remove outdated comment

c034871

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

experimental refactoring of channel communication #1378

experimental refactoring of channel communication #1378

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 21, 2020

sreekanth-cb Apr 21, 2020

mschoch Apr 21, 2020

sreekanth-cb Apr 21, 2020

mschoch Apr 21, 2020

mschoch commented Apr 21, 2020


		type watcherChan chan *epochWatcher

		func (w watcherChan) NotifyUsAfter(epoch uint64, closeCh chan struct{}) (*epochWatcher, error) {

experimental refactoring of channel communication #1378

Are you sure you want to change the base?

experimental refactoring of channel communication #1378

Conversation

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 20, 2020

mschoch commented Apr 21, 2020

sreekanth-cb Apr 21, 2020

Choose a reason for hiding this comment

mschoch Apr 21, 2020

Choose a reason for hiding this comment

sreekanth-cb Apr 21, 2020

Choose a reason for hiding this comment

mschoch Apr 21, 2020

Choose a reason for hiding this comment

mschoch commented Apr 21, 2020