-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mvcc: txns and r/w views #7105
mvcc: txns and r/w views #7105
Conversation
@heyitsanthony Even with the change, a write will still block all reads? |
@xiang90 yes, still WIP |
@heyitsanthony ACK |
ce1b298
to
475c9fe
Compare
d69233a
to
8fb8b74
Compare
seemingly stable PTAL /cc @xiang90 |
mvcc/backend/backend_test.go
Outdated
k, v := rtx.UnsafeRange([]byte("test"), tt.key, tt.end, tt.limit) | ||
rtx.Unlock() | ||
if !reflect.DeepEqual(tt.wkey, k) || !reflect.DeepEqual(tt.wval, v) { | ||
t.Errorf("#%d: want k=%+v, v=%d; got k=%+v, v=%+v", i, tt.wkey, tt.wval, k, v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%+v
for tt.wval
to be consistent?
mvcc/kv.go
Outdated
} | ||
|
||
// txnWriteReadOnly promotes a read txn to a write, panicking if any reads are called. | ||
type txnReadWrite struct{ TxnRead } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment does not match the struct naming, and its implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it should be coerce, not promote
mvcc/backend/read_tx.go
Outdated
rt.txmu.Unlock() | ||
|
||
// FIXME: may have dup keys? | ||
return append(k2, keys...), append(v2, vals...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we only enable write buffer for keys
bucket, dedup should be easy. we never overwrite in keys
bucket, but simply append. i need to check if it is possible to have dups in buffer and current read.tx though.
if we enable write buffer for buckets when keys overwriting might occur, we have to merge the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be safe without needing to hardcode the "keys" bucket so long as it can be guaranteed that range fetches of overwrites only happen in writetxs. I believe this is the case for the current codebase, but I will have to do an audit.
mvcc/backend/read_tx.go
Outdated
return append(k2, keys...), append(v2, vals...) | ||
} | ||
|
||
func (rt *readTx) UnsafeForEach(bucketName []byte, visitor func(k, v []byte) error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have ordering assumption perviously? now we visit write buffer first, which might break ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything using foreach (alarms, hash) make no ordering assumptions, but it may change the hash values if foreach expects the keys to go from minimum to maximum. The dup map can potentially get huge if it's done base txn first.
mvcc/backend/tx_buffer.go
Outdated
return nil | ||
} | ||
|
||
func (txr *txReadBuffer) writeback(txw *txWriteBuffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a little bit confusing. i would expect we write-back from txw to txr.
func (txw *txWriteBuffer) writeback(txr *txReadBuffer) {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the thinking here was txReadBuffer is the receiver so it receives the data from the write buffer, but OK...
mvcc/backend/batch_tx.go
Outdated
|
||
func (t *batchTxBuffered) Unlock() { | ||
if t.pending != 0 { | ||
t.backend.readTx.mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A read tx might block a batch Tx from finishing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm. i guess it is fine.
@heyitsanthony Looks reasonable overall. Have you tried to re-run the read bench to make sure this does solve the contention? |
Running on my desktop: master:
readtx patches:
|
fed846b
to
f7d5ca1
Compare
return | ||
} | ||
switch { | ||
case tw.ranges == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these uint
fields get initialized after counter.Inc
? Can't find where in this commit somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They initialize to 0 in newMetricsTxn
* and are incremented if there's a range/delete/put. If there's only one operation, the metric is treated like a singleton range/delete/put. If there's more than one operation, it only increments the txn counter. The Txn should be thrown away after calling End
so there's no need to reset the values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok makes sense now. Thanks!
mvcc/backend/tx_buffer.go
Outdated
widx++ | ||
} | ||
bb.buf[widx] = bb.buf[ridx] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary blank line?
mvcc/backend/tx_buffer.go
Outdated
if bb.used == bbsrc.used { | ||
return | ||
} | ||
if bytes.Compare(bb.buf[bb.used-bbsrc.used].key, bbsrc.buf[0].key) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does this happen?
Let's say
bb buf ['x','y','z'], used 3
bbsrc buf ['a','b'], used 2
bb.merge.bbsrc and then now
bb.buf ['x','y','z','a','b'], used 5
Now
bb.buf[bb.used-bbsrc.used]==bb.buf[5-2]==bb.buf[3]=='a'
bb.buf[3]=='a'==bbsrc.buf[0]
Seems like bytes.Compare(bb.buf[bb.used-bbsrc.used].key, bbsrc.buf[0].key)
is always 0, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this is off by 1 and the comparison should be <
. The index should be (bb.used-bbsrc.used)-1.
@xiang90 @heyitsanthony Benchmark with 3 nodes
With patch
|
If we run the same tests with more clients (same configuration as in golang/go#18020), we get the best performance ever!
|
36935eb
to
69c7a82
Compare
ReadTxs are designed for read-only accesses to the backend using a read-only boltDB transaction. Since BatchTx's are long-running transactions, all writes to BatchTx will writeback to ReadTx, overlaying the base read-only transaction.
Clean-up of the mvcc interfaces to use txn interfaces instead of an id. Adds support for concurrent read-only mvcc transactions. Fixes etcd-io#7083
Codecov Report
@@ Coverage Diff @@
## master #7105 +/- ##
=========================================
- Coverage 70.47% 70.38% -0.1%
=========================================
Files 237 244 +7
Lines 21238 21342 +104
=========================================
+ Hits 14968 15021 +53
- Misses 5136 5186 +50
- Partials 1134 1135 +1
Continue to review full report at Codecov.
|
Clean-up of the mvcc interfaces to use txn interfaces instead of an id.
Adds support for concurrent read-only mvcc transactions.
Fixes #7083