-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bad performance for SQLAlchemy >= 1.4 due to disabled caching #436
Comments
Hi @lonvia |
I figured it out. The conversion of the value of a SELECT ST_AsText(ST_GeomFromWKB(%s, %s)) which happens the second time that statement is used in a query, as seen in the output with sqlalchemy echo enabled:
I hacked around this by removing the convertion to a The root cause is relying on the conversion in an expression compile function. I don't think statement compilation is the place for that, and the compiler cache is doing the right thing when faced with the same function being called with the same parameter type (WKBElement). I know psycopg2 will convert a memoryview by default. So I tried making mysqlclient do the same by modifying the engine fixture in the tests: # For other dialects the engine is directly returned
connect_args = {}
if db_url.startswith("mysql://") or db_url.startswith("mariadb://"):
# top of file:
# from MySQLdb.converters import conversions as mysqldb_conv
conv = connect_args['conv'] = mysqldb_conv.copy()
def mv_conv(mv: memoryview, conv_dict: dict):
return mv.tobytes().hex()
conv[memoryview] = mv_conv
current_engine = create_engine(db_url, echo=_engine_echo, connect_args=connect_args)
current_engine.update_execution_options(search_path=["gis", "public"]) But I ran into the problem that
It shouldn't be passing it as a literal... even if that was the intent, the correct syntax is Tracing it to the source for mysqlclient was a dead end, since the code would have to be modified to pass With the driver as a dead end, I looked to sqlalchemy hooks and added this to the engine fixture instead of the connect args: if current_engine.dialect.name in ["mysql", "mariadb"]:
...
@event.listens_for(current_engine, "before_cursor_execute", retval=True)
def before_cursor_execute(
conn, cursor, statement, parameters, context, executemany
):
if isinstance(parameters, (tuple, list)):
parameters = tuple(x.tobytes() if isinstance(x, memoryview) else x
for x in parameters)
elif isinstance(parameters, dict):
for k in parameters:
if isinstance(parameters[k], memoryview):
parameters[k] = parameters[k].tobytes()
return statement, parameters ...and it works, with caching! I think the internals will need some changes to play nice and only pass Patchdiff --git a/geoalchemy2/admin/dialects/mysql.py b/geoalchemy2/admin/dialects/mysql.py
index 5b7ef6c..18dfe03 100644
--- a/geoalchemy2/admin/dialects/mysql.py
+++ b/geoalchemy2/admin/dialects/mysql.py
@@ -172,9 +172,6 @@ def _compile_GeomFromText_MySql(element, compiler, **kw):
def _compile_GeomFromWKB_MySql(element, compiler, **kw):
element.identifier = "ST_GeomFromWKB"
- wkb_data = list(element.clauses)[0].value
- if isinstance(wkb_data, memoryview):
- list(element.clauses)[0].value = wkb_data.tobytes()
compiled = compiler.process(element.clauses, **kw)
srid = element.type.srid
diff --git a/geoalchemy2/types/__init__.py b/geoalchemy2/types/__init__.py
index 7696698..c044413 100644
--- a/geoalchemy2/types/__init__.py
+++ b/geoalchemy2/types/__init__.py
@@ -110,7 +110,7 @@ class _GISType(UserDefinedType):
""" This is the way by which spatial operators are defined for
geometry/geography columns. """
- cache_ok = False
+ cache_ok = True
""" Disable cache for this type. """
def __init__(
@@ -245,7 +245,7 @@ class Geometry(_GISType):
""" The element class to use. Used by the parent class'
``result_processor`` method. """
- cache_ok = False
+ cache_ok = True
""" Disable cache for this type. """
@@ -275,7 +275,7 @@ class Geography(_GISType):
""" The element class to use. Used by the parent class'
``result_processor`` method. """
- cache_ok = False
+ cache_ok = True
""" Disable cache for this type. """
@@ -315,7 +315,7 @@ class Raster(_GISType):
""" The element class to use. Used by the parent class'
``result_processor`` method. """
- cache_ok = False
+ cache_ok = True
""" Disable cache for this type. """
def __init__(self, spatial_index=True, from_text=None, name=None, nullable=True) -> None:
diff --git a/tests/conftest.py b/tests/conftest.py
index df75114..a588d01 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -3,7 +3,7 @@ from pathlib import Path
import pytest
from sqlalchemy import MetaData
-from sqlalchemy import create_engine
+from sqlalchemy import create_engine, event
from sqlalchemy import text
from sqlalchemy.dialects.mysql.base import MySQLDialect
from sqlalchemy.dialects.sqlite.base import SQLiteDialect
@@ -236,6 +236,20 @@ def engine(tmpdir, db_url, _engine_echo, _require_all_dialects):
else:
pytest.skip(reason=msg)
+ @event.listens_for(current_engine, "before_cursor_execute", retval=True)
+ def before_cursor_execute(
+ conn, cursor, statement, parameters, context, executemany
+ ):
+ if isinstance(parameters, (tuple, list)):
+ parameters = tuple(x.tobytes() if isinstance(x, memoryview) else x
+ for x in parameters)
+ elif isinstance(parameters, dict):
+ for k in parameters:
+ if isinstance(parameters[k], memoryview):
+ parameters[k] = parameters[k].tobytes()
+
+ return statement, parameters
+
yield current_engine
current_engine.dispose() |
Ooooh that's super interesting @calebj , thank you very much for digging into this!!! |
Hi there. |
I use PostgreSQL with asyncpg so I have no use cases that need the plugin, but the changes didn't break anything as a side effect when I added it to the engine. My only concern is regarding drivers that properly support conversion functions (most of them do). Checking all of the parameters in each statement is a heavy-handed fix to work around bad behavior of the driver used in the test suite. Most of the others support registering converter functions that work. Unfortunately, there is no universal case that can be added in geoalchemy. It would be nice to be able to disable the hook, maybe with a URL parameter. EDIT: I linked the changes to implement the option as a URL parameter on the PR. |
Thank you very much for your feedback and suggestion @calebj ! |
I understand how the plugin works. The concern is that the workaround applies to all MySQL drivers, when the workaround might only be necessary for some of them. We already know mysqldb misbehaves, but all tests pass using a converter on pymysql. Although it's impossible to test the async drivers aiomysql and asyncmy in this test suite, they should work the same way since they are heavily based on pymysql. That's why I think a parameter to disable the hook is called for. |
Ohhh ok, I misunderstood, I thought you were speaking about dialects instead of drivers, I see your point now, thanks for your explanation! |
#338 has disable caching for all Geometry and Geography types. This effectively disables SQLAlchemy's caching mechanism for anything where geometry columns are involved.
Take this simple table:
Selecting the full table will create a cache key:
while only selecting the geometry column does not:
If I change Geometry to
cache_ok=True
, then I measure about 20% performance gain in my application, so the difference is fairly significant.The text was updated successfully, but these errors were encountered: