Feature scope
Targets (data type handling, batching, SQL object generation, tests, etc.)
Description
In this issue, there was a discussion on refactoring connections.
Background
To sum up, if I understand correctly:
There were a lot of dangling connections because SQLConnector did not encapsulate connection management itself; Taps/Streams and other objects were opening connections (and not closing them), which caused problems.
The agreed solution was to make SQLConnector encapsulate the connection, so that only SQLConnector uses the connection object and connection handling is opaque to other objects.
Problem
The problem I am seeing is that there is no easy way to manage transactions. Right now, SQLConnector uses with self._connect() as conn, conn.begin(): in each method.
But I think this does not allow us to group several operations into a single transaction. I know not every DBMS allows DDL operations inside a transaction, but a lot of them do.
Concretely, I am working on target-postgres here. Right now:
SQLConnector is extended and subclassed into PostgresConnector.
A lot of methods are overridden with a different signature, with a lot of copy & paste from the original method, changing only a couple of lines to wrap everything in one transaction.
Over time, this makes the overridden implementation diverge (a lot) from the Meltano SDK, which is not ideal.
Having a single transaction can make sense in targets, where I might want the ALTER statement for a table and the bulk_insert_records() call to run together.
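For illustration, this is roughly what that grouping looks like in plain SQLAlchemy; the table and column names are made up and the URL is a placeholder:

```python
import sqlalchemy as sa

engine = sa.create_engine("postgresql://localhost/warehouse")  # placeholder URL

# The DDL and the bulk insert share one transaction, so a failed insert also
# rolls back the ALTER (on DBMSs that support transactional DDL, such as Postgres).
with engine.connect() as conn, conn.begin():
    conn.execute(sa.text("ALTER TABLE my_table ADD COLUMN new_col TEXT"))
    conn.execute(
        sa.text("INSERT INTO my_table (id, new_col) VALUES (:id, :val)"),
        [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}],
    )
```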
Ideas
View SQLConnector as just a wrapper around SQLAlchemy that encapsulates some simple operations; it should resemble SQLAlchemy's implementation as much as possible. SQLAlchemy expects commits and connections to be managed by the caller, so maybe this class is not the best place to manage connections. Thinking of targets, maybe SQLSink is a better place? (That's where the current target-postgres implementation manages it.) Therefore, all DDL methods should take a connection or session argument that you can pass to them. If this argument is None, we could default to the current implementation of with self._connect() as conn, conn.begin():. A sketch of what this could look like is below.
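A minimal sketch of that idea, assuming a hypothetical connection= keyword on the connector's DDL methods (the names here are invented, not the current SDK API):

```python
from __future__ import annotations

import sqlalchemy as sa


class SketchConnector:
    """Toy stand-in for SQLConnector; not the real SDK class."""

    def __init__(self, sqlalchemy_url: str) -> None:
        self._engine = sa.create_engine(sqlalchemy_url)

    def _connect(self) -> sa.engine.Connection:
        return self._engine.connect()

    def add_column(
        self,
        table: str,
        column: str,
        connection: sa.engine.Connection | None = None,  # hypothetical argument
    ) -> None:
        ddl = sa.text(f"ALTER TABLE {table} ADD COLUMN {column} TEXT")
        if connection is not None:
            connection.execute(ddl)  # caller owns the transaction
            return
        with self._connect() as conn, conn.begin():  # default: behave like today
            conn.execute(ddl)


def drain_batch(connector: SketchConnector, rows: list[dict]) -> None:
    # The sink (caller) opens one transaction and passes the connection down,
    # so the ALTER and the inserts commit or roll back together.
    with connector._connect() as conn, conn.begin():
        connector.add_column("my_table", "new_col", connection=conn)
        conn.execute(
            sa.text("INSERT INTO my_table (id, new_col) VALUES (:id, :val)"),
            rows,
        )
```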
Create another intermediate class to encapsulate the DDL but not the execution itself. For example, now we have:
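Roughly, each method today both builds the DDL and executes it inside its own short-lived transaction (a paraphrased sketch, not the actual SDK source):

```python
import sqlalchemy as sa


class CurrentStyleConnector:
    """Paraphrased sketch of the current shape; not the actual SDK source."""

    def __init__(self, sqlalchemy_url: str) -> None:
        self._engine = sa.create_engine(sqlalchemy_url)

    def _connect(self) -> sa.engine.Connection:
        return self._engine.connect()

    def create_empty_table(self, full_table_name: str) -> None:
        ddl = sa.text(f"CREATE TABLE {full_table_name} (id INTEGER)")
        with self._connect() as conn, conn.begin():
            conn.execute(ddl)  # committed immediately; cannot join a larger transaction
```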
We could have something like:
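One hypothetical shape for this (invented names, just to make the idea concrete): a small object that only generates DDL, while whoever owns the connection decides when, and in which transaction, to execute it.

```python
import sqlalchemy as sa
from sqlalchemy.sql.elements import TextClause


class DDLBuilder:
    """Only builds SQL statements; never touches a connection."""

    def create_empty_table(self, full_table_name: str) -> TextClause:
        return sa.text(f"CREATE TABLE {full_table_name} (id INTEGER)")

    def add_column(self, full_table_name: str, column_name: str) -> TextClause:
        return sa.text(f"ALTER TABLE {full_table_name} ADD COLUMN {column_name} TEXT")


def prepare_and_load(engine: sa.engine.Engine, rows: list[dict]) -> None:
    ddl = DDLBuilder()
    # The caller groups the DDL and the DML into a single transaction.
    with engine.connect() as conn, conn.begin():
        conn.execute(ddl.create_empty_table("my_table"))
        conn.execute(ddl.add_column("my_table", "new_col"))
        conn.execute(
            sa.text("INSERT INTO my_table (id, new_col) VALUES (:id, :val)"),
            rows,
        )
```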
@qbatten, you were the OP of the issue I linked; if you have any ideas, they would be very welcome!