load(oid) must die! #50

Closed
jimfulton opened this issue Apr 1, 2016 · 5 comments
Comments

@jimfulton (Member) commented Apr 1, 2016

Update: The discussion below primarily pertains to non-RelStorage storages. We ended up moving to a model where ZODB's MVCC implementation lives in an adapter applied to storages other than RelStorage. The discussion below pertains (now) to how that adapter interacts with storages. ZODB connections will still use load to load objects from IMVCCStorage instances, which will either be RelStorage instances or adapted instances.


Currently, we use load(oid) to load current data for an object
unless we've received an invalidation for it. This leads to
complex code (sketched below) that:

  • checks whether an object has been invalidated and, if not,
  • loads the object using load, and
  • if the object was invalidated while loading, reloads it using
    loadBefore.
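
A minimal sketch of that dance (the `invalidated` set and the helper name are hypothetical; `load`/`loadBefore` are the real IStorage methods):

```python
from ZODB.POSException import POSKeyError

def load_state(storage, oid, invalidated, txn_time):
    # Hypothetical sketch of the old logic, not ZODB's actual code.
    if oid not in invalidated:
        data, serial = storage.load(oid)   # loads *current* data
        if oid not in invalidated:         # re-check: still current?
            return data, serial
    # Invalidated before or while loading: fall back to a historical read.
    r = storage.loadBefore(oid, txn_time)
    if r is None:
        raise POSKeyError(oid)
    data, serial, _end = r
    return data, serial
```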

The IStorage.load method is problematic because it's required to
load current data. This leads to delicate ordering of load
results and invalidations, which also provide information about the
current version of an object. In FileStorage, we also have to get a
read lock on the file before we load current data, to prevent writes.

In reviewing NEO, I realized that there's no reason for load to
exist, and in fact, NEO doesn't really implement it.

First of all, it makes no sense to use load if the transaction
time (Connection._txn_time) is known. We should just use
loadBefore with the transaction time.

Second, one can choose a suitable transaction time at the beginning of
a transaction. A suitable transaction time is just past the last
database transaction time (just past because we use loadBefore). We don't want to
read any data that was written later, regardless of what we've gotten
invalidations for. We can use IStorage.lastTransaction() to
determine the last transaction id(/time). For storages with remote
data, like ZEO or NEO, we might choose to request the last transaction
id from the remote server to make sure we've processed any in-flight
transactions. This is what NEO does. (More about this in a separate
issue.)
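
For example, a minimal sketch (assuming a `storage` implementing IStorage and an existing `oid`):

```python
from ZODB.utils import p64, u64

# Choose the transaction time once, at transaction start: one past
# lastTransaction(), so loadBefore() sees exactly the data committed
# up to that point.
txn_time = p64(u64(storage.lastTransaction()) + 1)

r = storage.loadBefore(oid, txn_time)
if r is not None:
    data, serial, _end = r   # object state as of txn_time
```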

By determining the transaction time at the start of a transaction, we
can:

  • Always call loadBefore which can be much more efficiently
    implemented than load().
  • Greatly simplify Connection._setstate.
  • Greatly simplify logic of ZEO, and perhaps NEO or RelStorage.

So, in conclusion:

  • Reimplement transaction handling in Connection to set _txn_time
    at the beginning of a transaction using
    p64(u64(storage.lastTransaction()) + 1).

  • Simplify _setstate by always calling loadBefore using _txn_time
    (a sketch follows this list).

    Update get(oid) in a similar fashion. (This is probably not as big
    an issue in this case, as get(oid) doesn't load object state,
    but I still want to deprecate load.)
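
A minimal sketch (not the actual Connection code) of what the simplified _setstate reduces to:

```python
from ZODB.POSException import POSKeyError

def _setstate(self, obj):
    # One loadBefore call against the fixed _txn_time; no invalidation
    # re-checks, no reload dance.
    r = self._storage.loadBefore(obj._p_oid, self._txn_time)
    if r is None:
        raise POSKeyError(obj._p_oid)
    data, serial, _end = r
    self._reader.setGhostState(obj, data)   # unpickle into the ghost
    obj._p_serial = serial
```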

jimfulton pushed a commit that referenced this issue May 4, 2016
Simplify MVCC by determining transaction start time using lastTransaction.

This implements: #50

Rather than watching invalidations, simply use 1 + the storage's
lastTransaction, which is equivalent to but much simpler than waiting
for the first invalidation after a transaction starts.

More importantly, it means we can always use loadBefore and get away
from load.  We no longer have to worry about ordering of invalidations
and load() results.

Much thanks to NEO for pointing the way toward this simplification!

Implementing this initially caused a deadlock, because DB.open()
called Connection.open() while holding a database lock and
Connection.open() now calls IStorage.lastTransaction(), which acquires a
storage lock. (It's not clear that lastTransaction() really needs a
storage lock.)  Meanwhile, IStorage.tpc_finish() calls a DB function
that requires the DB lock while holding the storage lock.  Fixing this
required moving the call to Connection.open() outside the region where
the DB lock was held.
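
The shape of that deadlock, as a minimal standalone illustration (call paths simplified down to two threads and two locks):

```python
import threading

db_lock = threading.Lock()        # stands in for the DB lock
storage_lock = threading.Lock()   # stands in for the storage lock

def open_path():
    # DB.open() held the DB lock while Connection.open() called
    # lastTransaction(), which acquires the storage lock.
    with db_lock:
        with storage_lock:
            pass

def finish_path():
    # tpc_finish() held the storage lock while calling back into a DB
    # function that needs the DB lock: a classic lock-order inversion.
    with storage_lock:
        with db_lock:
            pass
```

Run concurrently, each path can acquire its first lock and then block forever on the other's; moving Connection.open() outside the DB-locked region makes the locks get taken in one consistent order.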

To debug the problem above, I greatly improved lock-debugging
support. Now all of the ZODB code imports Lock, RLock and Condition
from ZODB.utils. If the DEBUG_LOCKING environment variable is set to a
non-empty value,
then these are wrapped in such a way that debugging information is
printed as they are used. This made spotting the nature of the
deadlock easier.
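
A hedged sketch (not ZODB's actual implementation) of that kind of wrapper:

```python
import os
import threading

if os.environ.get('DEBUG_LOCKING'):
    class Lock(object):
        """threading.Lock wrapped to trace acquire/release."""

        def __init__(self):
            self._lock = threading.Lock()

        def acquire(self, *args):
            print('acquire', id(self._lock), threading.current_thread().name)
            return self._lock.acquire(*args)

        def release(self):
            print('release', id(self._lock), threading.current_thread().name)
            self._lock.release()

        def __enter__(self):
            self.acquire()
            return self

        def __exit__(self, *exc):
            self.release()
else:
    Lock = threading.Lock
```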

Of course, a change this basic broke lots of tests. Most of the
breakage arises from the fact that connections now call
lastTransaction on storages at transaction boundaries.  Lots of tests
didn't clean up databases and connections properly.  I fixed many
tests, but ultimately gave up and added some extra cleanup code that
violated transaction-manager underware (and the underware's privates)
to clear transaction synchronizers in test setup and tear-down.  I plan
to add a transaction manager API for this and to use it in a
subsequent PR.

This change makes database and connection hygiene a bit more important,
especially for tests, because a connection will continue to interact
with storages if it isn't properly closed, which can lead to errors if
the storage is closed.  I chose not to swallow these errors in
Connection, choosing rather to clean up tests.

The thread debugging and test changes make this PR larger than I would
have liked. Apologies in advance to the reviewers.
@jamadden (Member) commented:

On RelStorage, lastTransaction() (even with your partial patch) isn't set until some load() operation is requested. That is, RelStorage doesn't read from the DB by itself (e.g., when a storage is created), and only reading from the DB initializes lastTransaction().

Maybe this isn't a problem with the machinery of Connection; maybe it guarantees a call to load() before trying to use lastTransaction(). (Note that sync() alone isn't sufficient.) But it is a problem when using the storage directly: one must be aware of the ordering requirements, as the sketch below shows.

Maybe this is flat-out a bug in RelStorage, and it should read from the DB if those values are uninitialized.
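
A sketch of that ordering pitfall when driving the storage directly (`storage` and `oid` are assumed to exist):

```python
tid = storage.lastTransaction()   # may still be the uninitialized default
storage.load(oid)                 # a read from the DB primes the last tid
tid = storage.lastTransaction()   # only now reflects the database
```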

@jimfulton (Member, Author) commented:

This is moot because with #66, Connection won't be calling lastTransaction on RelStorage instances.

@jamadden (Member) commented:

Good deal!

Does ClientStorage have a valid/correct lastTransaction() immediately after it is created (in the same way FileStorage does)? If so, it still might make sense to change RelStorage to behave like the other storages in this way (that would eliminate the one special case remaining in zodbconvert).

@jimfulton (Member, Author) commented:

WRT ClientStorage and lastTransaction: its value is "valid" when the storage is created, but may be z64 until it has connected. (z64 is technically valid, but generally way out of date. :))
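
A minimal sketch of guarding against that (`client_storage` is assumed):

```python
from ZODB.utils import z64

tid = client_storage.lastTransaction()
if tid == z64:
    # Technically valid, but likely way out of date: the client has not
    # yet connected / seen any committed transaction.
    pass
```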

jamadden added a commit to zodb/relstorage that referenced this issue Jun 15, 2016
Discussed in zopefoundation/ZODB#50, this makes
it more consistent with ClientStorage and FileStorage.

Adjust zodbconvert to use this fact and remove all the special casing.
This should guarantee that zodbconvert works with
ClientStorage (although this is not covered by unit tests).
mamico pushed a commit to mamico/relstorage that referenced this issue Aug 29, 2016
Discussed in zopefoundation/ZODB#50, this makes
it more consistent with ClientStorage and FileStorage.

Adjust zodbconvert to use this fact and remove all the special casing.
This should guarantee that zodbconvert works with
ClientStorage (although this is not covered by unit tests).

(cherry picked from commit 968d4f0)
@jimfulton (Member, Author) commented:

Long live loadBefore!

navytux added a commit to navytux/ZODB that referenced this issue Nov 30, 2020
This backports to ZODB4's Connection the ZODB5 approach to handling
MVCC: always call storage.loadBefore() instead of the "load the latest
version + loadBefore if we were notified of database changes" approach.

Why?
----

Short answer: because Wendelin.core 2 needs to know at which particular
database state an application-level ZODB connection is viewing the
database, and it is hard to implement such functionality correctly
without this backport. Please see the appendix for the explanation.

What
----

This backports to ZODB4 the minimum necessary part of upstream commit 227953b
(Simplify MVCC by determining transaction start time using lastTransaction) +
follow-up correctness fixes:

zopefoundation#50
zopefoundation#56
zopefoundation#291
zopefoundation#307

In short:

- a Connection is always opened explicitly corresponding to a particular database revision
- a Connection uses only loadBefore with that revision to load objects
- every time a Connection is (re)opened, the result of queued invalidations and
  an explicit query to storage.lastTransaction are carefully merged to refresh
  the Connection's idea of which database state it corresponds to (sketched below).
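
A hedged sketch (not the actual ZODB code) of that merging: the refreshed view must be at least as new as both what lastTransaction() reports and any invalidation already queued.

```python
from ZODB.utils import p64, u64

def refreshed_txn_time(storage, queued_invalidation_tids):
    # Take the newest tid seen on either channel and go one past it,
    # yielding the "before" for subsequent loadBefore calls.
    tids = [storage.lastTransaction()] + list(queued_invalidation_tids)
    return p64(u64(max(tids)) + 1)
```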

The "careful" in last point is important. Historically ZODB5 was first reworked
in commit 227953b (zopefoundation#56) to always
call lastTransaction to refresh state of Connection view. Since there
was no proper synchronisation with respect to process of handling
invalidations, that lead to data corruption issue due to race in
Connection.open() vs invalidations:

zopefoundation#290

That race and the data corruption were fixed in commit b5895a5
(zopefoundation#291) by avoiding the
lastTransaction call and relying only on the invalidation channel when
refreshing the Connection view.

This fix in turn led to another data corruption issue, because in the
presence of client-server reconnections, ZODB storage drivers can partly
skip notifying the client with detailed invalidation messages:

zopefoundation#291 (comment)

A fix to that issue (zopefoundation#307)
proposed to change back to querying the storage for lastTransaction on every
Connection refresh, but to carefully merge the lastTransaction
result with data from the invalidation channel. However, it was found that the
"careful merging" can work correctly only if we require from storage
drivers a particular ordering of invalidation events with respect to the
lastTransaction call and its result:

zopefoundation#307 (comment)

While ZEO already complied with those requirements, NEO had to be
fixed to support them:

zopefoundation#307 (comment)
https://lab.nexedi.com/nexedi/neoppod/commit/a7d101ec
https://lab.nexedi.com/nexedi/neoppod/commit/96a5c01f

Patch details
-------------

We change Connection._txn_time to be a "before" for the database state
to which the Connection's view corresponds. This state is initialized
and updated in Connection._flush_invalidations, the
function that is called both from explicit Connection (re)open and at
transaction boundaries via the Connection.afterCompletion hook.

Object loading is factored into Connection._load, which replaces the old
"load + check invalidated + fall back to loadBefore" game in
Connection._setstate.

Connection.open now calls Connection._flush_invalidations
unconditionally, even for a global cache reset event, because
besides flushing invalidations the latter is now responsible for querying
the storage's lastTransaction.

TmpStore - a "storage" that provides runtime support for savepoints - is
refactored correspondingly to delegate loading of original objects back
to underlying Connection.

DB.close is modified, similarly to ZODB5, to release the DB's Connections
carefully, preventing connections from the DB pool from implicitly
starting new transactions via the afterCompletion hook.

ZODB.nxd_patches is introduced to indicate to client software that this
particular patch is present and can be relied upon.

Tests are updated correspondingly. In 227953b Jim talks about
converting many tests, because

	"Lots of tests didn't clean up databases and connections properly"

and because the new MVCC approach

	"makes database and connection hygiene a bit more important,
	especially for tests, because a connection will continue to interact
	with storages if it isn't properly closed, which can lead to errors if
	the storage is closed."

but he finally implemented automatic cleanup at transaction boundaries
because there were too many tests to fix. We backport only the automatic
cleanup plus the necessary explicit test fixes to keep the diff minimal.

All tests pass. This includes tests for ZODB itself, ZEO and NEO tests
run over the hereby modified ZODB(*), my test programs from

zopefoundation#290	and
zopefoundation/ZEO#155

and ERP5 tests. The upcoming wendelin.core 2 also works with this change.

(*) ZEO, NEO and ERP5 tests fail sometimes, but there is no regression
here, because those tests fail regularly, and in the same way, even
with unmodified ZODB.

Appendix. zconn_at
------------------

This appendix provides motivation for the backport:

For wendelin.core v2 we need a way to know at which particular database
state an application-level ZODB connection is viewing the database. Knowing
that state, the WCFS client library interacts with the WCFS filesystem server
and, in simple terms, requests the server to provide data as of that
particular database state. Let us call zconn_at the function that, for a
client ZODB connection, returns the database state corresponding to its
database view.

Now here is the problem: zconn_at is relatively easy to implement for
ZODB5 - see e.g. here:

https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.13-54-ga6a8f5b/lib/zodb.py#L142-181
https://lab.nexedi.com/nexedi/wendelin.core/commit/3bd82127

however, for ZODB4, since its operational model is not
directly MVCC, it is not that straightforward. Still, even for the older
ZODB4, for every client connection there _is_ such an "at" that corresponds
to that connection's view of the database.

We need ZODB4 support because ZODB4 is currently the version that
Nexedi uses, and my understanding is that it will stay this way for
quite some time. I have the feeling that ZODB5 was reworked in a better
direction, but without enough care for quality, which resulted in
concurrency bugs with data corruption effects like

zopefoundation#290
zopefoundation/ZEO#155
etc.

Even though the first one is now fixed (but it broke other parts, so
ZODB had to be fixed again _and_ NEO had to be fixed for that ZODB
fix to work), I feel that upgrading to ZODB5 at Nexedi will
require a non-negligible amount of QA work, and thus it is better if
we move step by step: even if we eventually upgrade to ZODB5, it is
better to first migrate wendelin.core 1 -> wendelin.core 2 while keeping
the current version of ZODB.

Now please note what would happen if zconn_at gave even a slightly wrong
answer: the WCFS client would ask the WCFS server to provide array data
as of a different database state than the current on-client ZODB
connection. As a result, data accessed via ZBigArray would _not_
correspond to all other data accessed via the regular ZODB mechanism.
It is, in other words, data corruption.
In some scenarios it can be harmless, but generally it is the worst thing
that can happen to a database.

It is good to keep ZODB issue290 in mind when imagining corner cases
that zconn_at has to deal with. Even though that issue is ZODB5-only, it
shows what kind of bugs there can be in a zconn_at implementation for ZODB4.

Just for reference: in Wendelin-based systems there is usually a constant
stream of database ingestions coming from many places simultaneously, plus
many activities crunching on the DB at the same time. And the more clients a
system handles, the higher the level of concurrency. This means that the
problem of correctly handling concurrency in zconn_at is not purely
theoretical, but directly relevant to our systems.

--------

With this backport, zconn_at for ZODB4 becomes trivial and robust to implement:

https://lab.nexedi.com/kirr/wendelin.core/blob/484071b3/lib/zodb.py#L183-195
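
In essence (a hedged sketch of that implementation; the exact code is at the link above):

```python
from ZODB.utils import p64, u64

def zconn_at(zconn):
    # With this backport, zconn._txn_time is a "before": loads see data
    # committed strictly before it, so the viewed state is one tid earlier.
    return p64(u64(zconn._txn_time) - 1)
```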

I would like to thank Joshua Wölfel, whose internship helped this topic
take shape:

https://www.erp5.com/project_section/wendelin-ia-project/forum/Joshua-internship-D8b7NNhWfz

/cc @Nexedi, @jwolf083
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>