Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: SQLConnector connection lifecycle handling #1393

Closed
qbatten opened this issue Feb 6, 2023 · 6 comments
Closed

refactor: SQLConnector connection lifecycle handling #1393

qbatten opened this issue Feb 6, 2023 · 6 comments

Comments

@qbatten
Copy link
Contributor

qbatten commented Feb 6, 2023

Feature scope

Other

Description

This is a sub-issue broken out of #1352.

This comment explains this sub-issue in detail.

The idea here is to make the SQLConnector correctly handle it's database connection, and take on sole ownership of that job. In other words:

  • The connection itself is protected
  • Streams & other classes/functions/etc are not allowed to open/close/access a connection (subclasses of SQLConnector are still allowed & expected to do so).
  • The SQLConnector itself opens and closes connections correctly and doesn't leave connections hanging open

Some more detailed next steps here are:

  1. Make a SQLConnector (protected) method, _connect that can be used as a context manager, and use that every single time a connection is opened. (An alternative is just to make sure that any connection opened gets closed as soon as it's done, but I think that's not as nice & is more prone to error.)
  2. Only the SQLConnector can use the connection. No other objects should access or be aware of the SQLConnector's connection; and anything that other objects ask the SQLConnector to do that involve interaction w the database should be accessible via a SQLConnector method that correctly opens and closes the connection for that db interaction.

To fully realize this goal, we need to make breaking changes. In particular:

  • Stop accepting a _connection arg on instantiation
  • Stop storing and accessing _connection altogether
  • Remove connection and create_sqlalchemy_connection. It's a public method that returns a connection. That encourages bad behavior.
  • Remove create_sqlalchemy_engine. There shouldn't be a public method for creating or getting a sqlalchemy engine; also there should just be one engine per SQLConnector.
@qbatten qbatten added kind/Feature New feature or request valuestream/SDK labels Feb 6, 2023
@qbatten
Copy link
Contributor Author

qbatten commented Feb 6, 2023

There is a mostly-not-breaking way to do this, and I'm going to push a draft PR of that soon. It does...sorta break things, but only for subclasses.

The breaking change is as follows: we currently allow someone to pass _connection. In order to move towards the goal of this issue, that must be removed, it's just incompatible with this change.

@tayloramurphy
Copy link
Collaborator

tayloramurphy commented Feb 6, 2023

Thanks for the issue / PR! I'll have the engineering team take a look at the PR.

@BuzzCutNorman
Copy link
Contributor

BuzzCutNorman commented Feb 7, 2023

If you remove create_sqlalchemy_engine would there, be a new way for developers to set engine arguments? I have overloaded it to allow some user configuration options in this portion of my tap-mssql.

@qbatten
Copy link
Contributor Author

qbatten commented Feb 7, 2023

@BuzzCutNorman you should be able to do that exact thing, by overriding the _engine method instead.

E.g.

    def _engine(self) -> Engine:
        """Return the engine object.
        Returns:
            The SQLAlchemy Engine that's attached to this SQLConnector instance.
        """

        if not self._cached_engine:
            eng_prefix = "ep."
            eng_config = {f"{eng_prefix}url":self.sqlalchemy_url,f"{eng_prefix}echo":"False"}

            if self.config.get('sqlalchemy_eng_params'):
                for key, value in self.config['sqlalchemy_eng_params'].items():
                    eng_config.update({f"{eng_prefix}{key}": value})
           self._cached_engine = sqlalchemy.engine_from_config(eng_config, prefix=eng_prefix)
        return cast(Engine, self._cached_engine)

I just copied/pasted/edited that in the github comment text box so the spacing might be off lol. Hope that makes sense though. Basically, instead of create_sqlalchemy_engine, we're just creating and caching a single engine in _engine. So you'd replace the code you linked to with my snippet above. And you just gotta make sure to hold onto the (new!) if not self._cached_engine... and self._cached_engine = sqlalchemy.engine_from_config...{or however the engine is being made} so that the caching behavior is preserved.

Here's the method in the PR that I'm talking about ("_engine")

@BuzzCutNorman
Copy link
Contributor

Thanks, for the explanation and example. much appreciated.

@aaronsteers aaronsteers moved this to Up Next in Office Hours Feb 8, 2023
@aaronsteers aaronsteers moved this from Up Next to Discussed in Office Hours Feb 8, 2023
@qbatten
Copy link
Contributor Author

qbatten commented Feb 14, 2023

Solved by #1394

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

No branches or pull requests

3 participants