perf: optimize row merging #628
Conversation
- Extract the read-rows-acceptance-test.json based tests into their own file
- Update the JSON to match the latest available in https://github.com/googleapis/conformance-tests/tree/main/bigtable/v2
- Use a parameterized pytest test to run all of the scenarios (instead of creating a function for each JSON blob)
- Use JSON protobufs to parse the file

I left a TODO to allow easy updates of the file; unfortunately it is not straightforward, as the canonical protos get renamed for the Python GAPIC.

The next PR will extract the row merging functionality from row_data to make it easier to maintain.
No region tags are edited in this PR. (This comment is generated by snippet-bot.)
Can you add the documentation of row_merger to the Sphinx documentation? We should also update the usages of row_data.Cell / row_data.PartialRowData / row_data.InvalidChunk in the documentation to use row_merger instead.
This is in preparation for extracting row merging into a separate class. See googleapis#628
The row merging logic LGTM. I left some nits that you can fix or not.
* chore: move row value classes out of row_data This is in preparation for extracting row merging into a separate class. See #628 Co-authored-by: Anthonios Partheniou <partheniou@google.com>
# Conflicts: # google/cloud/bigtable/row_data.py
# Conflicts: # google/cloud/bigtable/row_merger.py
This PR rewrites the row merging logic to be more correct and improve performance:
Cells are accumulated in a nested `{family: {qualifier: []}}` structure; since Python dicts preserve insertion order, this should maintain server-side ordering (families in creation order and qualifiers in lexicographic order).

Overall this improves performance by 20% and, in my opinion, is a lot more readable.
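The ordering argument can be demonstrated with a small sketch (illustrative names, not code from this PR): Python dicts preserve insertion order, so appending cells into a nested `{family: {qualifier: []}}` dict keeps families and qualifiers in exactly the order the server emitted them.

```python
# Accumulate cells into a nested {family: {qualifier: [values]}} dict.
# Insertion order of dict keys is guaranteed in Python 3.7+, so the
# structure reflects server-side ordering without any explicit sorting.
cells = {}

def append_cell(family, qualifier, value):
    """Append a cell value, creating family/qualifier buckets on demand."""
    cells.setdefault(family, {}).setdefault(qualifier, []).append(value)

# Cells arrive in server order: families in creation order,
# qualifiers in lexicographic order within each family.
append_cell("cf1", "a", b"v1")
append_cell("cf1", "b", b"v2")
append_cell("cf2", "a", b"v3")

assert list(cells) == ["cf1", "cf2"]          # family creation order kept
assert list(cells["cf1"]) == ["a", "b"]       # qualifier order kept
```

Because ordering falls out of insertion order, no per-row sort pass is needed, which is part of where the performance win comes from.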