Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include both modified and just created objects into invalidations #160

Merged
merged 1 commit into from
Jul 29, 2020

Conversation

navytux
Copy link
Contributor

@navytux navytux commented Jul 10, 2020

Starting from 1999 (b3805a2 "just getting started") only modified - not
just created - objects were included into ZEO invalidation messages:

b3805a2f#diff-52fb76aaf08a1643cdb8fdaf69e37802R126-R127

In 2000 this behaviour was further changed to not send invalidation
message at all if the only objects a transaction has were the created ones:

230ffbe8#diff-52fb76aaf08a1643cdb8fdaf69e37802L163-R163

In 2016 the latter was reconsidered as bug and fixed in ZEO5 because
ZODB5 relies more heavily on MVCC semantic and needs to be notified
about every transaction committed to storage to be able to properly
update ZODB.Connection view:

02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
9613f09b

In 2020, with this patch, I'm proposing to reconsider initial "send only
modified, not created objects" as bug, and include both modified and
just created objects into invalidation messages at least for the
following reasons:

  • a ZODB client (not necessarily native ZODB/py client) can maintain
    raw cache for the storage. If such client tries to load an oid at
    database view when that object did not existed yet, gets "no object"
    reply and stores that information into raw cache, to properly invalidate
    the cache it needs an invalidation message from ZODB server that
    includes created object.

  • tools like zodb watch [1,2,3] don't work properly (give incorrect output)
    if not all objects modified/created by a transaction are included into
    invalidation messages.

  • similarly to zodb watch, a monitoring tool, that would want to be
    notified of all created/modified objects, won't see full
    database-change picture, and so won't work properly without knowing
    which objects were created.

  • wendelin.core 2 - which builds data from ZODB BTrees and data objects
    into virtual filesystem - needs to get invalidation messages with both
    modified and created objects to properly implement its own lazy
    invalidation and isolation protocol for file blocks in OS cache: when
    a block of file is accessed, all clients, that have this block mmaped,
    need to be notified and asked to remmap that block into particular
    revision of the file depending on a client's view of the filesystem and
    database [4,5].

    To compute to where a client needs to remmap the block, WCFS server
    (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
    to see whether client's view of the filesystem is before object creation
    (and then ask that client to pin that block to hole), or after creation
    (and then ask the client to pin that block to corresponding revision).

    This computation needs ZODB server to send invalidation messages in
    full: with both modified and just created objects.

The patch is simple - it removes if serial != b"\0\0\0\0\0\0\0\0"
before queuing oid into ZEOStorage.invalidated, and adjusts the tests
correspondingly. From my point of view and experience, in practice, this
patch should not cause any compatibility break nor performance regressions.

Thanks beforehand,
Kirill

/cc @jimfulton

[1] https://lab.nexedi.com/kirr/neo/blob/ea53a795/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71

@navytux
Copy link
Contributor Author

navytux commented Jul 10, 2020

CI is failing with errors like

821 Failure in test checkTransactionExtensionFromIterator (ZEO.tests.testZEO.FileStorageClientHexTests)
822Traceback (most recent call last):
823  File "/opt/python/2.7.15/lib/python2.7/unittest/case.py", line 329, in run
824    testMethod()
825  File "/home/travis/build/zopefoundation/ZEO/eggs/ZODB-5.6.0-py2.7.egg/ZODB/tests/IteratorStorage.py", line 114, in checkTransactionExtensionFromIterator
826    self.assertNotEqual(extension_bytes, txn.extension_bytes)
827  File "/opt/python/2.7.15/lib/python2.7/unittest/case.py", line 522, in assertNotEqual
828    raise self.failureException(msg)
829AssertionError: '\x80\x03}q\x01U\x03fooK\x01s.' == '\x80\x03}q\x01U\x03fooK\x01s.'

and like

Error in test checkTransactionExtensionFromIterator (ZEO.tests.testZEO.BlobAdaptedFileStorageTests)
738Traceback (most recent call last):
739  File "/opt/python/3.4.8/lib/python3.4/unittest/case.py", line 58, in testPartExecutor
740    yield
741  File "/opt/python/3.4.8/lib/python3.4/unittest/case.py", line 580, in run
742    testMethod()
743  File "/home/travis/build/zopefoundation/ZEO/eggs/ZODB-5.6.0-py3.4.egg/ZODB/tests/IteratorStorage.py", line 101, in checkTransactionExtensionFromIterator
744    revid = self._dostore(oid, data=MinPO(1), extension=ext)
745  File "/home/travis/build/zopefoundation/ZEO/eggs/ZODB-5.6.0-py3.4.egg/ZODB/tests/StorageTestBase.py", line 157, in _dostore
746    r2 = self._storage.tpc_vote(t)
747  File "/home/travis/build/zopefoundation/ZEO/src/ZEO/ClientStorage.py", line 771, in tpc_vote
748    self._check_trans(txn, 'tpc_vote')
749  File "/home/travis/build/zopefoundation/ZEO/src/ZEO/ClientStorage.py", line 489, in _check_trans
750    raise ClientDisconnected(meth, 'on a disconnected transaction')
751ZEO.Exceptions.ClientDisconnected: ('tpc_vote', 'on a disconnected transaction')

That are unrelated to my patch and that were there before with current master.

In my understanding all this errors are caused by ZODB 5.6.0 upgrade, and are likely to go away after #159.

navytux added a commit to navytux/ZEO that referenced this pull request Jul 13, 2020
This is ZEO4 backport of zopefoundation#160

It changes ZEO4 to both include created objects into invalidation
messages, and, in turn, not to skip sending invalidation message at
all if committed transaction only creates objects. Please see original
description for details below.

Extra changes compared to ZEO5 patch:

- tests/servertesting.py: provide callAsyncNoSend to avoid the following crash:

  File ".../ZEO/tests/testZEO2.py", line 156, in ZEO.tests.testZEO2.proper_handling_of_errors_in_restart
  Failed example:
      zs1.tpc_finish('1').set_sender(0, conn1)
  Exception raised:
      Traceback (most recent call last):
        File "/usr/lib/python2.7/doctest.py", line 1315, in __run
          compileflags, 1) in test.globs
        File "<doctest ZEO.tests.testZEO2.proper_handling_of_errors_in_restart[18]>", line 1, in <module>
          zs1.tpc_finish('1').set_sender(0, conn1)
        File ".../ZEO/StorageServer.py", line 408, in tpc_finish
          self.storage.tpc_finish(self.transaction, self._invalidate)
        File ".../ZODB/FileStorage/FileStorage.py", line 741, in tpc_finish
          f(self._tid)
        File ".../ZEO/StorageServer.py", line 418, in _invalidate
          self.invalidated, self.get_size_info())
        File ".../ZEO/StorageServer.py", line 1110, in invalidate
          p.client.invalidateTransaction(tid, invalidated)
        File ".../ZEO/StorageServer.py", line 1466, in invalidateTransaction
          self.rpc.callAsyncNoSend('invalidateTransaction', tid, args)
      AttributeError: Connection instance has no attribute 'callAsyncNoSend'

- testZEO2.proper_handling_of_errors_in_restart: adjust it since now an
  invalidation message is sent and previously it was completely avoided
  because objects in that test are only created.

---- 8< ---- (original description)

Starting from 1999 (b3805a2 "just getting started") only modified - not
just created - objects were included into ZEO invalidation messages:

zopefoundation@b3805a2f#diff-52fb76aaf08a1643cdb8fdaf69e37802R126-R127

In 2000 this behaviour was further changed to not send invalidation
message at all if the only objects a transaction has were the created ones:

zopefoundation@230ffbe8#diff-52fb76aaf08a1643cdb8fdaf69e37802L163-R163

In 2016 the latter was reconsidered as bug and fixed in ZEO5 because
ZODB5 relies more heavily on MVCC semantic and needs to be notified
about every transaction committed to storage to be able to properly
update ZODB.Connection view:

zopefoundation@02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
zopefoundation@9613f09b

In 2020, with this patch, I'm proposing to reconsider initial "send only
modified, not created objects" as bug, and include both modified and
just created objects into invalidation messages at least for the
following reasons:

- a ZODB client (not necessarily native ZODB/py client) can maintain
  raw cache for the storage. If such client tries to load an oid at
  database view when that object did not existed yet, gets "no object"
  reply and stores that information into raw cache, to properly invalidate
  the cache it needs an invalidation message from ZODB server that
  _includes_ created object.

- tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
  if not all objects modified/created by a transaction are included into
  invalidation messages.

- similarly to `zodb watch`, a monitoring tool, that would want to be
  notified of all created/modified objects, won't see full
  database-change picture, and so won't work properly without knowing
  which objects were created.

- wendelin.core 2 - which builds data from ZODB BTrees and data objects
  into virtual filesystem - needs to get invalidation messages with both
  modified and created objects to properly implement its own lazy
  invalidation and isolation protocol for file blocks in OS cache: when
  a block of file is accessed, all clients, that have this block mmaped,
  need to be notified and asked to remmap that block into particular
  revision of the file depending on a client's view of the filesystem and
  database [4,5].

  To compute to where a client needs to remmap the block, WCFS server
  (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
  to see whether client's view of the filesystem is before object creation
  (and then ask that client to pin that block to hole), or after creation
  (and then ask the client to pin that block to corresponding revision).

  This computation needs ZODB server to send invalidation messages in
  full: with both modified and just created objects.

The patch is simple - it removes `if serial != b"\0\0\0\0\0\0\0\0"`
before queuing oid into ZEOStorage.invalidated, and adjusts the tests
correspondingly. From my point of view and experience, in practice, this
patch should not cause any compatibility break nor performance regressions.

Thanks beforehand,
Kirill

/cc @jimfulton

[1] https://lab.nexedi.com/kirr/neo/blob/ea53a795/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71
@navytux
Copy link
Contributor Author

navytux commented Jul 13, 2020

ZEO4 backport: #161.

Copy link
Member

@jamadden jamadden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me.

@navytux
Copy link
Contributor Author

navytux commented Jul 13, 2020

@jamadden, thanks for feedback. Is it ok to apply the patch to master?

@jamadden
Copy link
Member

I would personally prefer to see the changes needed to make the builds green merged, then this rebased on master and merged when it too is green. PR #159 has been approved, it just needs to be merged. It wasn't clear that the author of that PR was satisfied with it though.

@navytux
Copy link
Contributor Author

navytux commented Jul 13, 2020

Thanks. Ok, let's wait for #159 to be resolved first. /cc @jmuchemb

@navytux
Copy link
Contributor Author

navytux commented Jul 13, 2020

For the record: the property that all objects - both modified and just created - are included into invalidation messages is required and can help to remove next_serial from loadBefore return. This, in turn, can help to do 2x less SQL queries in loadBefore for RelStorage and NEO (and maybe other storages too): zopefoundation/ZODB#318 (comment).

navytux added a commit to navytux/ZODB that referenced this pull request Jul 16, 2020
…s and not to skip transactions

Currently invalidate documentation is not clear whether it should be
called for every transaction and whether it should include full set of
objects created/modified by that transaction. Until now this was working
relatively well for the sole purpose of invalidating client ZEO cache,
because for that particular task it is relatively OK not to include just
created objects into invalidation messages, and even to completely skip
sending invalidation if transaction only create - not modify - objects.
Due to this fact the workings of the client cache was indifferent to the
ambiguity of the interface.

In 2016 skipping transactions with only created objects was reconsidered
as bug and fixed in ZEO5 because ZODB5 relies more heavily on MVCC
semantic and needs to be notified about every transaction committed to
storage to be able to properly update ZODB.Connection view:

zopefoundation/ZEO@02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
zopefoundation/ZEO@9613f09b

However just-created objects were not included into invalidation
messages until, hopefully, recently:

zopefoundation/ZEO#160

As ZODB is started to be used more widely in areas where it was not
traditionally used before, the ambiguity in invalidate interface and the
lack of guarantees - for any storage - to be notified with full set of
information, creates at least the following problems:

- a ZODB client (not necessarily native ZODB/py client) can maintain
  raw cache for the storage. If such client tries to load an oid at
  database view when that object did not existed yet, gets "no object"
  reply and stores that information into raw cache, to properly invalidate
  the cache it needs an invalidation message from ZODB server that
  *includes* created object.

- tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
  if not all objects modified/created by a transaction are included into
  invalidation messages.

- similarly to `zodb watch`, a monitoring tool, that would want to be
  notified of all created/modified objects, won't see full
  database-change picture, and so won't work properly without knowing
  which objects were created.

- wendelin.core 2 - which builds data from ZODB BTrees and data objects
  into virtual filesystem - needs to get invalidation messages with both
  modified and created objects to properly implement its own lazy
  invalidation and isolation protocol for file blocks in OS cache: when
  a block of file is accessed, all clients, that have this block mmaped,
  need to be notified and asked to remmap that block into particular
  revision of the file depending on a client's view of the filesystem and
  database [4,5].

  To compute to where a client needs to remmap the block, WCFS server
  (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
  to see whether client's view of the filesystem is before object creation
  (and then ask that client to pin that block to hole), or after creation
  (and then ask the client to pin that block to corresponding revision).

  This computation needs ZODB server to send invalidation messages in
  full: with both modified and just created objects.

Also:

- the property that all objects - both modified and just created -
  are included into invalidation messages is required and can help to
  remove `next_serial` from `loadBefore` return in the future.
  This, in turn, can help to do 2x less SQL queries in loadBefore for
  NEO and RelStorage (and maybe other storages too):
  zopefoundation#318 (comment)

Current state of storages with respect to new requirements:

- ZEO: does not skip transactions, but includes only modified - not
  created - objects. This is fixed by zopefoundation/ZEO#160

- NEO: already implements the requirements in full

- RelStorage: already implements the requirements in full, if I
  understand correctly:

  https://github.com/zodb/relstorage/blob/3.1.2-1-gaf57d6c/src/relstorage/adapters/poller.py#L28-L145

While editing invalidate documentation, use the occasion to document
recently added property that invalidate(tid) is always called before
storage starts to report its lastTransaction() ≥ tid - see 4a6b028
(mvccadapter: check if the last TID changed without invalidation).

/cc @jimfulton, @jamadden, @jmuchemb, @vpelletier, @arnaud-fontaine, @gidzit, @klawlf82, @jwolf083

[1] https://lab.nexedi.com/kirr/neo/blob/049cb9a0/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71
@navytux
Copy link
Contributor Author

navytux commented Jul 16, 2020

Corresponding amendment to ZODB interfaces: zopefoundation/ZODB#319.

navytux added a commit to navytux/ZODB that referenced this pull request Jul 27, 2020
loadAt is new optional storage interface that is intended to replace loadBefore
with more clean and uniform semantic. Compared to loadBefore, loadAt:

1) returns data=None and serial of the removal, when loaded object was found to
   be deleted. loadBefore is returning only data=None in such case. This loadAt
   property allows to fix DemoStorage data corruption when whiteouts in overlay
   part were not previously correctly taken into account.

   zopefoundation#318

2) for regular data records, does not require storages to return next_serial,
   in addition to (data, serial). loadBefore requirement to return both
   serial and next_serial is constraining storages unnecessarily, and,
   while for FileStorage it is free to implement, for other storages it is
   not - for example for NEO and RelStorage, finding out next_serial, after
   looking up oid@at data record, costs one more SQL query:

   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508
   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482

   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264
   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199

   next_serial is not only about execution overhead - it is semantically
   redundant to be there and can be removed from load return. The reason
   I say that next_serial can be removed is that in ZODB/py the only place,
   that I could find, where next_serial is used on client side is in client
   cache (e.g. in NEO client cache), and that cache can be remade to
   work without using that next_serial at all. In simple words whenever
   after

     loadAt(oid, at)  ->  (data, serial)

   query, the cache can remember data for oid in [serial, at] range.

   Next, when invalidation message from server is received, cache entries,
   that had at == client_head, are extended (at -> new_head) for oids that
   are not present in invalidation message, while for oids that are present
   in invalidation message no such extension is done. This allows to
   maintain cache in correct state, invalidate it when there is a need to
   invalidate, and not to throw away cache entries that should remain live.
   This of course requires ZODB server to include both modified and
   just-created objects into invalidation messages

     ( zopefoundation/ZEO#160 ,
       zopefoundation#319 ).

   Switching to loadAt should thus allow storages like NEO and, maybe,
   RelStorage, to do 2x less SQL queries on every object access.

   zopefoundation#318 (comment)

In other words loadAt unifies return signature to always be

   (data, serial)

instead of

   POSKeyError				object does not exist at all
   None					object was removed
   (data, serial, next_serial)		regular data record

used by loadBefore.

This patch:

- introduces new interface.
- introduces ZODB.utils.loadAt helper, that uses either storage.loadAt,
  or, if the storage does not implement loadAt interface, tries to mimic
  loadAt semantic via storage.loadBefore to possible extent + emits
  corresponding warning.
- converts MVCCAdapter to use loadAt instead of loadBefore.
- changes DemoStorage to use loadAt, and this way fixes above-mentioned
  data corruption issue; adds corresponding test; converts
  DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt.
- adds loadAt implementation to FileStorage and MappingStorage.
- adapts other tests/code correspondingly.

/cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch
@navytux
Copy link
Contributor Author

navytux commented Jul 27, 2020

For the record: the property that all objects - both modified and just created - are included into invalidation messages is required and can help to remove next_serial from loadBefore return. This, in turn, can help to do 2x less SQL queries in loadBefore for RelStorage and NEO (and maybe other storages too): zopefoundation/ZODB#318 (comment).

Patch with loadAt - that amends loadBefore semantic as explained above - is here: zopefoundation/ZODB#323.

navytux added a commit to navytux/ZODB that referenced this pull request Jul 27, 2020
loadAt is new optional storage interface that is intended to replace loadBefore
with more clean and uniform semantic. Compared to loadBefore, loadAt:

1) returns data=None and serial of the removal, when loaded object was found to
   be deleted. loadBefore is returning only data=None in such case. This loadAt
   property allows to fix DemoStorage data corruption when whiteouts in overlay
   part were not previously correctly taken into account.

   zopefoundation#318

2) for regular data records, does not require storages to return next_serial,
   in addition to (data, serial). loadBefore requirement to return both
   serial and next_serial is constraining storages unnecessarily, and,
   while for FileStorage it is free to implement, for other storages it is
   not - for example for NEO and RelStorage, finding out next_serial, after
   looking up oid@at data record, costs one more SQL query:

   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508
   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482

   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264
   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199

   next_serial is not only about execution overhead - it is semantically
   redundant to be there and can be removed from load return. The reason
   I say that next_serial can be removed is that in ZODB/py the only place,
   that I could find, where next_serial is used on client side is in client
   cache (e.g. in NEO client cache), and that cache can be remade to
   work without using that next_serial at all. In simple words whenever
   after

     loadAt(oid, at)  ->  (data, serial)

   query, the cache can remember data for oid in [serial, at] range.

   Next, when invalidation message from server is received, cache entries,
   that had at == client_head, are extended (at -> new_head) for oids that
   are not present in invalidation message, while for oids that are present
   in invalidation message no such extension is done. This allows to
   maintain cache in correct state, invalidate it when there is a need to
   invalidate, and not to throw away cache entries that should remain live.
   This of course requires ZODB server to include both modified and
   just-created objects into invalidation messages

     ( zopefoundation/ZEO#160 ,
       zopefoundation#319 ).

   Switching to loadAt should thus allow storages like NEO and, maybe,
   RelStorage, to do 2x less SQL queries on every object access.

   zopefoundation#318 (comment)

In other words loadAt unifies return signature to always be

   (data, serial)

instead of

   POSKeyError				object does not exist at all
   None					object was removed
   (data, serial, next_serial)		regular data record

used by loadBefore.

This patch:

- introduces new interface.
- introduces ZODB.utils.loadAt helper, that uses either storage.loadAt,
  or, if the storage does not implement loadAt interface, tries to mimic
  loadAt semantic via storage.loadBefore to possible extent + emits
  corresponding warning.
- converts MVCCAdapter to use loadAt instead of loadBefore.
- changes DemoStorage to use loadAt, and this way fixes above-mentioned
  data corruption issue; adds corresponding test; converts
  DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt.
- adds loadAt implementation to FileStorage and MappingStorage.
- adapts other tests/code correspondingly.

/cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch
navytux added a commit to navytux/ZODB that referenced this pull request Jul 27, 2020
loadAt is new optional storage interface that is intended to replace loadBefore
with more clean and uniform semantic. Compared to loadBefore, loadAt:

1) returns data=None and serial of the removal, when loaded object was found to
   be deleted. loadBefore is returning only data=None in such case. This loadAt
   property allows to fix DemoStorage data corruption when whiteouts in overlay
   part were not previously correctly taken into account.

   zopefoundation#318

2) for regular data records, does not require storages to return next_serial,
   in addition to (data, serial). loadBefore requirement to return both
   serial and next_serial is constraining storages unnecessarily, and,
   while for FileStorage it is free to implement, for other storages it is
   not - for example for NEO and RelStorage, finding out next_serial, after
   looking up oid@at data record, costs one more SQL query:

   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508
   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482

   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264
   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199

   next_serial is not only about execution overhead - it is semantically
   redundant to be there and can be removed from load return. The reason
   I say that next_serial can be removed is that in ZODB/py the only place,
   that I could find, where next_serial is used on client side is in client
   cache (e.g. in NEO client cache), and that cache can be remade to
   work without using that next_serial at all. In simple words whenever
   after

     loadAt(oid, at)  ->  (data, serial)

   query, the cache can remember data for oid in [serial, at] range.

   Next, when invalidation message from server is received, cache entries,
   that had at == client_head, are extended (at -> new_head) for oids that
   are not present in invalidation message, while for oids that are present
   in invalidation message no such extension is done. This allows to
   maintain cache in correct state, invalidate it when there is a need to
   invalidate, and not to throw away cache entries that should remain live.
   This of course requires ZODB server to include both modified and
   just-created objects into invalidation messages

     ( zopefoundation/ZEO#160 ,
       zopefoundation#319 ).

   Switching to loadAt should thus allow storages like NEO and, maybe,
   RelStorage, to do 2x less SQL queries on every object access.

   zopefoundation#318 (comment)

In other words loadAt unifies return signature to always be

   (data, serial)

instead of

   POSKeyError				object does not exist at all
   None					object was removed
   (data, serial, next_serial)		regular data record

used by loadBefore.

This patch:

- introduces new interface.
- introduces ZODB.utils.loadAt helper, that uses either storage.loadAt,
  or, if the storage does not implement loadAt interface, tries to mimic
  loadAt semantic via storage.loadBefore to possible extent + emits
  corresponding warning.
- converts MVCCAdapter to use loadAt instead of loadBefore.
- changes DemoStorage to use loadAt, and this way fixes above-mentioned
  data corruption issue; adds corresponding test; converts
  DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt.
- adds loadAt implementation to FileStorage and MappingStorage.
- adapts other tests/code correspondingly.

/cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch
@navytux navytux force-pushed the y/invalidation-for-create branch from 587b7d9 to 25c6fa3 Compare July 29, 2020 07:13
@navytux
Copy link
Contributor Author

navytux commented Jul 29, 2020

( rebased the patch to master after #159 has been merged )

@navytux
Copy link
Contributor Author

navytux commented Jul 29, 2020

CI is ok besides pypy which failed in zeo-fan-out.test . I'm not sure whether it is related to my patch or not. For the reference: CI fo #159 itself has two failures in py27 and in pypy: https://travis-ci.org/github/zopefoundation/ZEO/builds/711701287.

Starting from 1999 (b3805a2 "just getting started") only modified - not
just created - objects were included into ZEO invalidation messages:

zopefoundation@b3805a2f#diff-52fb76aaf08a1643cdb8fdaf69e37802R126-R127

In 2000 this behaviour was further changed to not send invalidation
message at all if the only objects a transaction has were the created ones:

zopefoundation@230ffbe8#diff-52fb76aaf08a1643cdb8fdaf69e37802L163-R163

In 2016 the latter was reconsidered as bug and fixed in ZEO5 because
ZODB5 relies more heavily on MVCC semantic and needs to be notified
about every transaction committed to storage to be able to properly
update ZODB.Connection view:

zopefoundation@02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
zopefoundation@9613f09b

In 2020, with this patch, I'm proposing to reconsider initial "send only
modified, not created objects" as bug, and include both modified and
just created objects into invalidation messages at least for the
following reasons:

- a ZODB client (not necessarily native ZODB/py client) can maintain
  raw cache for the storage. If such client tries to load an oid at
  database view when that object did not existed yet, gets "no object"
  reply and stores that information into raw cache, to properly invalidate
  the cache it needs an invalidation message from ZODB server that
  _includes_ created object.

- tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
  if not all objects modified/created by a transaction are included into
  invalidation messages.

- similarly to `zodb watch`, a monitoring tool, that would want to be
  notified of all created/modified objects, won't see full
  database-change picture, and so won't work properly without knowing
  which objects were created.

- wendelin.core 2 - which builds data from ZODB BTrees and data objects
  into virtual filesystem - needs to get invalidation messages with both
  modified and created objects to properly implement its own lazy
  invalidation and isolation protocol for file blocks in OS cache: when
  a block of file is accessed, all clients, that have this block mmaped,
  need to be notified and asked to remmap that block into particular
  revision of the file depending on a client's view of the filesystem and
  database [4,5].

  To compute to where a client needs to remmap the block, WCFS server
  (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
  to see whether client's view of the filesystem is before object creation
  (and then ask that client to pin that block to hole), or after creation
  (and then ask the client to pin that block to corresponding revision).

  This computation needs ZODB server to send invalidation messages in
  full: with both modified and just created objects.

The patch is simple - it removes `if serial != b"\0\0\0\0\0\0\0\0"`
before queuing oid into ZEOStorage.invalidated, and adjusts the tests
correspondingly. From my point of view and experience, in practice, this
patch should not cause any compatibility break nor performance regressions.

Thanks beforehand,
Kirill

/cc @jimfulton

[1] https://lab.nexedi.com/kirr/neo/blob/ea53a795/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71
@navytux navytux force-pushed the y/invalidation-for-create branch from 25c6fa3 to d072663 Compare July 29, 2020 07:35
@dataflake
Copy link
Member

dataflake commented Jul 29, 2020

CI is ok besides pypy which failed in zeo-fan-out.test . I'm not sure whether it is related to my patch or not. For the reference: CI fo #159 itself has two failures in py27 and in pypy: https://travis-ci.org/github/zopefoundation/ZEO/builds/711701287.

You're pointing to an outdated CI result from #159 that has passing test runs after it, so that specific test result doesn't mean much. The latest test run on your PR succeeds as well.

As I have found working on #159, the tests have timing/race condition issues in at least several places. Tests that fail during one run mysteriously succeed the next time. They're not a reliable problem indicator unless you've run them several times and you start noticing which specific tests tend to fail spuriously so you can distinguish test timing issues versus real issues.

@navytux
Copy link
Contributor Author

navytux commented Jul 29, 2020

@dataflake, thanks for feedback. Indeed CI on current master is green. CI for my patch, after plain re-upload, is also green now. I've quickly looked into what happens in zeo-fan-out.test, and it looks like the failure there is related to #155 ("Data corruption due to race condition with external invalidations"):

Now, let's see if we can break it. :)
>>> def f():
... c = db1.open(transaction.TransactionManager())
... r = c.root()
... i = 0
... while i < 100:
... r[1].v -= 1
... r[2].v += 1
... try:
... c.transaction_manager.commit()
... i += 1
... except ZODB.POSException.ConflictError:
... c.transaction_manager.abort()
... c.close()
>>> import threading
>>> threadf = threading.Thread(target=f)
>>> threadg = threading.Thread(target=f)
>>> threadf.start()
>>> threadg.start()
>>> s2 = db2.storage
>>> start_time = time.time()
>>> import os
>>> from ZEO.ClientStorage import _lock_blob
>>> while time.time() - start_time < 999:
... t = tm2.begin()
... if r2[1].v + r2[2].v:
... print('oops', r2[1], r2[2])
... if r2[1].v == 800:
... break # we caught up
... path = s2.fshelper.getBlobFilename(*blob_id)
... if os.path.exists(path):
... ZODB.blob.remove_committed(path)
... _ = s2.fshelper.createPathForOID(blob_id[0])
... blob_lock = _lock_blob(path)
... s2._call('sendBlob', *blob_id, timeout=9999)
... blob_lock.close()
... else: print('Dang')
>>> threadf.join()
>>> threadg.join()

It should not be related to my patch, which only extends invalidation messages contents, not change their timings or ordering.

So @jamadden, @jimfulton, everyone, is it ok to merge?

Kirill

P.S. Yes, ZEO5 state seems to be not very good with respect to races...

Copy link
Member

@jamadden jamadden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@navytux
Copy link
Contributor Author

navytux commented Jul 29, 2020

@jamadden, thanks, merging.

@navytux navytux merged commit ab86bd7 into zopefoundation:master Jul 29, 2020
@navytux navytux deleted the y/invalidation-for-create branch July 29, 2020 12:27
navytux added a commit that referenced this pull request Jul 29, 2020
@navytux
Copy link
Contributor Author

navytux commented Jul 29, 2020

( changelog entry added in 3de995f )

@navytux
Copy link
Contributor Author

navytux commented Jul 29, 2020

( hope it is ok )

@jmuchemb
Copy link
Member

jmuchemb commented Jul 29, 2020

Note that like NEO, ZEO is now more vulnerable to the memory leak that I tried to fix with zopefoundation/ZODB#185. Even if the leak already existed for ZEO, I mean that the new behaviour increases its probability/severity.

@navytux
Copy link
Contributor Author

navytux commented Jul 29, 2020

Note that like NEO, ZEO is now more vulnerable to the memory leak that I tried to fix with zopefoundation/ZODB#185. Even if the leak already existed for ZEO, I mean that the new behaviour increases its probability/severity.

@jmuchemb, thanks for the note. It indeed makes more sense to fix the leak by flushing invalidations immediately instead of queuing them indefinitely for closed connections that are kept in DB pool cache. Should we reopen zopefoundation/ZODB#185 to do this?

/cc @jimfulton, @jmadden, @icemac

navytux added a commit to navytux/ZODB that referenced this pull request Jul 29, 2020
…s and not to skip transactions

Currently invalidate documentation is not clear whether it should be
called for every transaction and whether it should include full set of
objects created/modified by that transaction. Until now this was working
relatively well for the sole purpose of invalidating client ZEO cache,
because for that particular task it is relatively OK not to include just
created objects into invalidation messages, and even to completely skip
sending invalidation if transaction only create - not modify - objects.
Due to this fact the workings of the client cache was indifferent to the
ambiguity of the interface.

In 2016 skipping transactions with only created objects was reconsidered
as bug and fixed in ZEO5 because ZODB5 relies more heavily on MVCC
semantic and needs to be notified about every transaction committed to
storage to be able to properly update ZODB.Connection view:

zopefoundation/ZEO@02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
zopefoundation/ZEO@9613f09b

However just-created objects were not included into invalidation
messages until, hopefully, recently:

zopefoundation/ZEO#160

As ZODB is started to be used more widely in areas where it was not
traditionally used before, the ambiguity in invalidate interface and the
lack of guarantees - for any storage - to be notified with full set of
information, creates at least the following problems:

- a ZODB client (not necessarily native ZODB/py client) can maintain
  raw cache for the storage. If such client tries to load an oid at
  database view when that object did not existed yet, gets "no object"
  reply and stores that information into raw cache, to properly invalidate
  the cache it needs an invalidation message from ZODB server that
  *includes* created object.

- tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
  if not all objects modified/created by a transaction are included into
  invalidation messages.

- similarly to `zodb watch`, a monitoring tool, that would want to be
  notified of all created/modified objects, won't see full
  database-change picture, and so won't work properly without knowing
  which objects were created.

- wendelin.core 2 - which builds data from ZODB BTrees and data objects
  into virtual filesystem - needs to get invalidation messages with both
  modified and created objects to properly implement its own lazy
  invalidation and isolation protocol for file blocks in OS cache: when
  a block of file is accessed, all clients, that have this block mmaped,
  need to be notified and asked to remmap that block into particular
  revision of the file depending on a client's view of the filesystem and
  database [4,5].

  To compute to where a client needs to remmap the block, WCFS server
  (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
  to see whether client's view of the filesystem is before object creation
  (and then ask that client to pin that block to hole), or after creation
  (and then ask the client to pin that block to corresponding revision).

  This computation needs ZODB server to send invalidation messages in
  full: with both modified and just created objects.

Also:

- the property that all objects - both modified and just created -
  are included into invalidation messages is required and can help to
  remove `next_serial` from `loadBefore` return in the future.
  This, in turn, can help to do 2x less SQL queries in loadBefore for
  NEO and RelStorage (and maybe other storages too):
  zopefoundation#318 (comment)

Current state of storages with respect to new requirements:

- ZEO: does not skip transactions, but includes only modified - not
  created - objects. This is fixed by zopefoundation/ZEO#160

- NEO: already implements the requirements in full

- RelStorage: already implements the requirements in full, if I
  understand correctly:

  https://github.com/zodb/relstorage/blob/3.1.2-1-gaf57d6c/src/relstorage/adapters/poller.py#L28-L145

While editing invalidate documentation, use the occasion to document
recently added property that invalidate(tid) is always called before
storage starts to report its lastTransaction() ≥ tid - see 4a6b028
(mvccadapter: check if the last TID changed without invalidation).

/cc @jimfulton, @jamadden, @jmuchemb, @vpelletier, @arnaud-fontaine, @gidzit, @klawlf82, @jwolf083

[1] https://lab.nexedi.com/kirr/neo/blob/049cb9a0/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71
navytux added a commit to navytux/ZODB that referenced this pull request Jul 30, 2020
…s and not to skip transactions

Currently invalidate documentation is not clear whether it should be
called for every transaction and whether it should include full set of
objects created/modified by that transaction. Until now this was working
relatively well for the sole purpose of invalidating client ZEO cache,
because for that particular task it is relatively OK not to include just
created objects into invalidation messages, and even to completely skip
sending invalidation if transaction only create - not modify - objects.
Due to this fact the workings of the client cache was indifferent to the
ambiguity of the interface.

In 2016 skipping transactions with only created objects was reconsidered
as bug and fixed in ZEO5 because ZODB5 relies more heavily on MVCC
semantic and needs to be notified about every transaction committed to
storage to be able to properly update ZODB.Connection view:

zopefoundation/ZEO@02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
zopefoundation/ZEO@9613f09b

However just-created objects were not included into invalidation
messages until, hopefully, recently:

zopefoundation/ZEO#160

As ZODB is started to be used more widely in areas where it was not
traditionally used before, the ambiguity in invalidate interface and the
lack of guarantees - for any storage - to be notified with full set of
information, creates at least the following problems:

- a ZODB client (not necessarily native ZODB/py client) can maintain
  raw cache for the storage. If such client tries to load an oid at
  database view when that object did not existed yet, gets "no object"
  reply and stores that information into raw cache, to properly invalidate
  the cache it needs an invalidation message from ZODB server that
  *includes* created object.

- tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
  if not all objects modified/created by a transaction are included into
  invalidation messages.

- similarly to `zodb watch`, a monitoring tool, that would want to be
  notified of all created/modified objects, won't see full
  database-change picture, and so won't work properly without knowing
  which objects were created.

- wendelin.core 2 - which builds data from ZODB BTrees and data objects
  into virtual filesystem - needs to get invalidation messages with both
  modified and created objects to properly implement its own lazy
  invalidation and isolation protocol for file blocks in OS cache: when
  a block of file is accessed, all clients, that have this block mmaped,
  need to be notified and asked to remmap that block into particular
  revision of the file depending on a client's view of the filesystem and
  database [4,5].

  To compute to where a client needs to remmap the block, WCFS server
  (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
  to see whether client's view of the filesystem is before object creation
  (and then ask that client to pin that block to hole), or after creation
  (and then ask the client to pin that block to corresponding revision).

  This computation needs ZODB server to send invalidation messages in
  full: with both modified and just created objects.

Also:

- the property that all objects - both modified and just created -
  are included into invalidation messages is required and can help to
  remove `next_serial` from `loadBefore` return in the future.
  This, in turn, can help to do 2x less SQL queries in loadBefore for
  NEO and RelStorage (and maybe other storages too):
  zopefoundation#318 (comment)

Current state of storages with respect to new requirements:

- ZEO: does not skip transactions, but includes only modified - not
  created - objects. This is fixed by zopefoundation/ZEO#160

- NEO: already implements the requirements in full

- RelStorage: already implements the requirements in full, if I
  understand correctly:

  https://github.com/zodb/relstorage/blob/3.1.2-1-gaf57d6c/src/relstorage/adapters/poller.py#L28-L145

While editing invalidate documentation, use the occasion to document
recently added property that invalidate(tid) is always called before
storage starts to report its lastTransaction() ≥ tid - see 4a6b028
(mvccadapter: check if the last TID changed without invalidation).

/cc @jimfulton, @jamadden, @jmuchemb, @vpelletier, @arnaud-fontaine, @gidzit, @klawlf82, @jwolf083
/reviewed-on zopefoundation#319
/reviewed-by @dataflake

[1] https://lab.nexedi.com/kirr/neo/blob/049cb9a0/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71
@jmuchemb
Copy link
Member

Should we reopen zopefoundation/ZODB#185 to do this?

It's a PR, not an issue, so no. The suggested implementation is wrong. I will be better to submit a patch in a new PR; actually, I started a draft but it's not finished. Or open an issue, but I don't need it to remember.

navytux added a commit to navytux/ZEO that referenced this pull request Jul 31, 2020
( Upstream commit ab86bd7 )

This is ZEO4 backport of zopefoundation#160

It changes ZEO4 to both include created objects into invalidation
messages, and, in turn, not to skip sending invalidation message at
all if committed transaction only creates objects. Please see original
description for details below.

Extra changes compared to ZEO5 patch:

- tests/servertesting.py: provide callAsyncNoSend to avoid the following crash:

  File ".../ZEO/tests/testZEO2.py", line 156, in ZEO.tests.testZEO2.proper_handling_of_errors_in_restart
  Failed example:
      zs1.tpc_finish('1').set_sender(0, conn1)
  Exception raised:
      Traceback (most recent call last):
        File "/usr/lib/python2.7/doctest.py", line 1315, in __run
          compileflags, 1) in test.globs
        File "<doctest ZEO.tests.testZEO2.proper_handling_of_errors_in_restart[18]>", line 1, in <module>
          zs1.tpc_finish('1').set_sender(0, conn1)
        File ".../ZEO/StorageServer.py", line 408, in tpc_finish
          self.storage.tpc_finish(self.transaction, self._invalidate)
        File ".../ZODB/FileStorage/FileStorage.py", line 741, in tpc_finish
          f(self._tid)
        File ".../ZEO/StorageServer.py", line 418, in _invalidate
          self.invalidated, self.get_size_info())
        File ".../ZEO/StorageServer.py", line 1110, in invalidate
          p.client.invalidateTransaction(tid, invalidated)
        File ".../ZEO/StorageServer.py", line 1466, in invalidateTransaction
          self.rpc.callAsyncNoSend('invalidateTransaction', tid, args)
      AttributeError: Connection instance has no attribute 'callAsyncNoSend'

- testZEO2.proper_handling_of_errors_in_restart: adjust it since now an
  invalidation message is sent and previously it was completely avoided
  because objects in that test are only created.

---- 8< ---- (original description)

Starting from 1999 (b3805a2 "just getting started") only modified - not
just created - objects were included into ZEO invalidation messages:

zopefoundation@b3805a2f#diff-52fb76aaf08a1643cdb8fdaf69e37802R126-R127

In 2000 this behaviour was further changed to not send invalidation
message at all if the only objects a transaction has were the created ones:

zopefoundation@230ffbe8#diff-52fb76aaf08a1643cdb8fdaf69e37802L163-R163

In 2016 the latter was reconsidered as bug and fixed in ZEO5 because
ZODB5 relies more heavily on MVCC semantic and needs to be notified
about every transaction committed to storage to be able to properly
update ZODB.Connection view:

zopefoundation@02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
zopefoundation@9613f09b

In 2020, with this patch, I'm proposing to reconsider initial "send only
modified, not created objects" as bug, and include both modified and
just created objects into invalidation messages at least for the
following reasons:

- a ZODB client (not necessarily native ZODB/py client) can maintain
  raw cache for the storage. If such client tries to load an oid at
  database view when that object did not existed yet, gets "no object"
  reply and stores that information into raw cache, to properly invalidate
  the cache it needs an invalidation message from ZODB server that
  _includes_ created object.

- tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
  if not all objects modified/created by a transaction are included into
  invalidation messages.

- similarly to `zodb watch`, a monitoring tool, that would want to be
  notified of all created/modified objects, won't see full
  database-change picture, and so won't work properly without knowing
  which objects were created.

- wendelin.core 2 - which builds data from ZODB BTrees and data objects
  into virtual filesystem - needs to get invalidation messages with both
  modified and created objects to properly implement its own lazy
  invalidation and isolation protocol for file blocks in OS cache: when
  a block of file is accessed, all clients, that have this block mmaped,
  need to be notified and asked to remmap that block into particular
  revision of the file depending on a client's view of the filesystem and
  database [4,5].

  To compute to where a client needs to remmap the block, WCFS server
  (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
  to see whether client's view of the filesystem is before object creation
  (and then ask that client to pin that block to hole), or after creation
  (and then ask the client to pin that block to corresponding revision).

  This computation needs ZODB server to send invalidation messages in
  full: with both modified and just created objects.

The patch is simple - it removes `if serial != b"\0\0\0\0\0\0\0\0"`
before queuing oid into ZEOStorage.invalidated, and adjusts the tests
correspondingly. From my point of view and experience, in practice, this
patch should not cause any compatibility break nor performance regressions.

Thanks beforehand,
Kirill

/cc @jimfulton

[1] https://lab.nexedi.com/kirr/neo/blob/ea53a795/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71
navytux added a commit to navytux/ZODB that referenced this pull request Jul 31, 2020
loadAt is new optional storage interface that is intended to replace loadBefore
with more clean and uniform semantic. Compared to loadBefore, loadAt:

1) returns data=None and serial of the removal, when loaded object was found to
   be deleted. loadBefore is returning only data=None in such case. This loadAt
   property allows to fix DemoStorage data corruption when whiteouts in overlay
   part were not previously correctly taken into account.

   zopefoundation#318

2) for regular data records, does not require storages to return next_serial,
   in addition to (data, serial). loadBefore requirement to return both
   serial and next_serial is constraining storages unnecessarily, and,
   while for FileStorage it is free to implement, for other storages it is
   not - for example for NEO and RelStorage, finding out next_serial, after
   looking up oid@at data record, costs one more SQL query:

   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508
   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482

   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264
   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199

   next_serial is not only about execution overhead - it is semantically
   redundant to be there and can be removed from load return. The reason
   I say that next_serial can be removed is that in ZODB/py the only place,
   that I could find, where next_serial is used on client side is in client
   cache (e.g. in NEO client cache), and that cache can be remade to
   work without using that next_serial at all. In simple words whenever
   after

     loadAt(oid, at)  ->  (data, serial)

   query, the cache can remember data for oid in [serial, at] range.

   Next, when invalidation message from server is received, cache entries,
   that had at == client_head, are extended (at -> new_head) for oids that
   are not present in invalidation message, while for oids that are present
   in invalidation message no such extension is done. This allows to
   maintain cache in correct state, invalidate it when there is a need to
   invalidate, and not to throw away cache entries that should remain live.
   This of course requires ZODB server to include both modified and
   just-created objects into invalidation messages

     ( zopefoundation/ZEO#160 ,
       zopefoundation#319 ).

   Switching to loadAt should thus allow storages like NEO and, maybe,
   RelStorage, to do 2x less SQL queries on every object access.

   zopefoundation#318 (comment)

In other words loadAt unifies return signature to always be

   (data, serial)

instead of

   POSKeyError				object does not exist at all
   None					object was removed
   (data, serial, next_serial)		regular data record

used by loadBefore.

This patch:

- introduces new interface.
- introduces ZODB.utils.loadAt helper, that uses either storage.loadAt,
  or, if the storage does not implement loadAt interface, tries to mimic
  loadAt semantic via storage.loadBefore to possible extent + emits
  corresponding warning.
- converts MVCCAdapter to use loadAt instead of loadBefore.
- changes DemoStorage to use loadAt, and this way fixes above-mentioned
  data corruption issue; adds corresponding test; converts
  DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt.
- adds loadAt implementation to FileStorage and MappingStorage.
- adapts other tests/code correspondingly.

/cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch
navytux added a commit to zopefoundation/ZODB that referenced this pull request Aug 31, 2020
…s and not to skip transactions

Currently invalidate documentation is not clear whether it should be
called for every transaction and whether it should include full set of
objects created/modified by that transaction. Until now this was working
relatively well for the sole purpose of invalidating client ZEO cache,
because for that particular task it is relatively OK not to include just
created objects into invalidation messages, and even to completely skip
sending invalidation if transaction only create - not modify - objects.
Due to this fact the workings of the client cache was indifferent to the
ambiguity of the interface.

In 2016 skipping transactions with only created objects was reconsidered
as bug and fixed in ZEO5 because ZODB5 relies more heavily on MVCC
semantic and needs to be notified about every transaction committed to
storage to be able to properly update ZODB.Connection view:

zopefoundation/ZEO@02943acd#diff-52fb76aaf08a1643cdb8fdaf69e37802L889-R834
zopefoundation/ZEO@9613f09b

However just-created objects were not included into invalidation
messages until, hopefully, recently:

zopefoundation/ZEO#160

As ZODB is started to be used more widely in areas where it was not
traditionally used before, the ambiguity in invalidate interface and the
lack of guarantees - for any storage - to be notified with full set of
information, creates at least the following problems:

- a ZODB client (not necessarily native ZODB/py client) can maintain
  raw cache for the storage. If such client tries to load an oid at
  database view when that object did not existed yet, gets "no object"
  reply and stores that information into raw cache, to properly invalidate
  the cache it needs an invalidation message from ZODB server that
  *includes* created object.

- tools like `zodb watch` [1,2,3] don't work properly (give incorrect output)
  if not all objects modified/created by a transaction are included into
  invalidation messages.

- similarly to `zodb watch`, a monitoring tool, that would want to be
  notified of all created/modified objects, won't see full
  database-change picture, and so won't work properly without knowing
  which objects were created.

- wendelin.core 2 - which builds data from ZODB BTrees and data objects
  into virtual filesystem - needs to get invalidation messages with both
  modified and created objects to properly implement its own lazy
  invalidation and isolation protocol for file blocks in OS cache: when
  a block of file is accessed, all clients, that have this block mmaped,
  need to be notified and asked to remmap that block into particular
  revision of the file depending on a client's view of the filesystem and
  database [4,5].

  To compute to where a client needs to remmap the block, WCFS server
  (that in turn acts as ZODB client wrt ZEO/NEO server), needs to be able
  to see whether client's view of the filesystem is before object creation
  (and then ask that client to pin that block to hole), or after creation
  (and then ask the client to pin that block to corresponding revision).

  This computation needs ZODB server to send invalidation messages in
  full: with both modified and just created objects.

Also:

- the property that all objects - both modified and just created -
  are included into invalidation messages is required and can help to
  remove `next_serial` from `loadBefore` return in the future.
  This, in turn, can help to do 2x less SQL queries in loadBefore for
  NEO and RelStorage (and maybe other storages too):
  #318 (comment)

Current state of storages with respect to new requirements:

- ZEO: does not skip transactions, but includes only modified - not
  created - objects. This is fixed by zopefoundation/ZEO#160

- NEO: already implements the requirements in full

- RelStorage: already implements the requirements in full, if I
  understand correctly:

  https://github.com/zodb/relstorage/blob/3.1.2-1-gaf57d6c/src/relstorage/adapters/poller.py#L28-L145

While editing invalidate documentation, use the occasion to document
recently added property that invalidate(tid) is always called before
storage starts to report its lastTransaction() ≥ tid - see 4a6b028
(mvccadapter: check if the last TID changed without invalidation).

/cc @jimfulton, @jamadden, @jmuchemb, @vpelletier, @arnaud-fontaine, @gidzit, @klawlf82, @jwolf083
/reviewed-on #319
/reviewed-by @dataflake
/reviewed-by @jmuchemb

[1] https://lab.nexedi.com/kirr/neo/blob/049cb9a0/go/zodb/zodbtools/watch.go
[2] https://lab.nexedi.com/kirr/neo/commit/e0d59f5d
[3] https://lab.nexedi.com/kirr/neo/commit/c41c2907

[4] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/wcfs.go#L94-182
[5] https://lab.nexedi.com/kirr/wendelin.core/blob/1efb5876/wcfs/client/wcfs.h#L20-71
NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Oct 13, 2020
5.2.2 brings in important fix that is needed for wendelin.core 2:
zopefoundation/ZEO#160
NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Oct 14, 2020
5.2.2 brings in important fix that is needed for wendelin.core 2:
zopefoundation/ZEO#160
navytux added a commit to navytux/ZODB that referenced this pull request Mar 16, 2021
loadAt is new optional storage interface that is intended to replace loadBefore
with more clean and uniform semantic. Compared to loadBefore, loadAt:

1) returns data=None and serial of the removal, when loaded object was found to
   be deleted. loadBefore is returning only data=None in such case. This loadAt
   property allows to fix DemoStorage data corruption when whiteouts in overlay
   part were not previously correctly taken into account.

   zopefoundation#318

2) for regular data records, does not require storages to return next_serial,
   in addition to (data, serial). loadBefore requirement to return both
   serial and next_serial is constraining storages unnecessarily, and,
   while for FileStorage it is free to implement, for other storages it is
   not - for example for NEO and RelStorage, finding out next_serial, after
   looking up oid@at data record, costs one more SQL query:

   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L484-508
   https://lab.nexedi.com/nexedi/neoppod/blob/fb746e6b/neo/storage/database/mysqldb.py#L477-482

   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/storage/load.py#L259-L264
   https://github.com/zodb/relstorage/blob/3.1.1-1-ge7628f9/src/relstorage/adapters/mover.py#L177-L199

   next_serial is not only about execution overhead - it is semantically
   redundant to be there and can be removed from load return. The reason
   I say that next_serial can be removed is that in ZODB/py the only place,
   that I could find, where next_serial is used on client side is in client
   cache (e.g. in NEO client cache), and that cache can be remade to
   work without using that next_serial at all. In simple words whenever
   after

     loadAt(oid, at)  ->  (data, serial)

   query, the cache can remember data for oid in [serial, at] range.

   Next, when invalidation message from server is received, cache entries,
   that had at == client_head, are extended (at -> new_head) for oids that
   are not present in invalidation message, while for oids that are present
   in invalidation message no such extension is done. This allows to
   maintain cache in correct state, invalidate it when there is a need to
   invalidate, and not to throw away cache entries that should remain live.
   This of course requires ZODB server to include both modified and
   just-created objects into invalidation messages

     ( zopefoundation/ZEO#160 ,
       zopefoundation#319 ).

   Switching to loadAt should thus allow storages like NEO and, maybe,
   RelStorage, to do 2x less SQL queries on every object access.

   zopefoundation#318 (comment)

In other words loadAt unifies return signature to always be

   (data, serial)

instead of

   POSKeyError				object does not exist at all
   None					object was removed
   (data, serial, next_serial)		regular data record

used by loadBefore.

This patch:

- introduces new interface.
- introduces ZODB.utils.loadAt helper, that uses either storage.loadAt,
  or, if the storage does not implement loadAt interface, tries to mimic
  loadAt semantic via storage.loadBefore to possible extent + emits
  corresponding warning.
- converts MVCCAdapter to use loadAt instead of loadBefore.
- changes DemoStorage to use loadAt, and this way fixes above-mentioned
  data corruption issue; adds corresponding test; converts
  DemoStorage.loadBefore to be a wrapper around DemoStorage.loadAt.
- adds loadAt implementation to FileStorage and MappingStorage.
- adapts other tests/code correspondingly.

/cc @jimfulton, @jamadden, @vpelletier, @jmuchemb, @arnaud-fontaine, @gidzit, @klawlf82, @hannosch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants