Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: db connection analytics #13346

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 13 additions & 11 deletions superset/databases/commands/create.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@
from flask_appbuilder.security.sqla.models import User
from marshmallow import ValidationError

from superset import app
from superset.commands.base import BaseCommand
from superset.dao.exceptions import DAOCreateFailedError
from superset.databases.commands.exceptions import (
Expand All @@ -33,11 +32,9 @@
)
from superset.databases.commands.test_connection import TestConnectionDatabaseCommand
from superset.databases.dao import DatabaseDAO
from superset.extensions import db, security_manager
from superset.extensions import db, event_logger, security_manager

logger = logging.getLogger(__name__)
config = app.config
stats_logger = config["STATS_LOGGER"]


class CreateDatabaseCommand(BaseCommand):
Expand All @@ -54,10 +51,12 @@ def run(self) -> Model:
try:
TestConnectionDatabaseCommand(self._actor, self._properties).run()
except Exception:
db.session.rollback()
stats_logger.incr(
f"db_connection_failed.{database.db_engine_spec.__name__}"
)
with event_logger.log_context(
action="db_connection_failed",
engine=database.db_engine_spec.__name__,
):
db.session.rollback()

raise DatabaseConnectionFailedError()

# adding a new database we always want to force refresh schema list
Expand All @@ -69,8 +68,9 @@ def run(self) -> Model:
security_manager.add_permission_view_menu("database_access", database.perm)
db.session.commit()
except DAOCreateFailedError as ex:
logger.exception(ex.exception)
raise DatabaseCreateFailedError()
with event_logger.log_context(action=f"db_creation_failed.{ex.exception}"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to just use log in places like this. The main difference between log and log_context is that the former, in addition to being a context manager, logs the duration automatically, together with some dashboard/chart metadata that is irrelevant here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true... it also logs the user_id and referrer, so maybe it's still useful?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to either 1) refactor log to do some of the magic in log_context maybe add a bool flag enrich_with_request_context=False, or 2) can we just call the context manager with the with block? Would that just work? or 3) refactor log_context to NOT be be context manager and create a new one say log_context_manager (or better name) that's used only when we want duration / with block

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem is that adding the flag enrich_with_request_context would be a breaking change (our event logger, for example, would have to be updated).

I think the best way here is:

  1. Create a new method in AbstractEventLogger called log_with_context, that does all the job that log_context does but is a function, not a context manager. The method calls log at the end, with all the extracted context.
  2. Change AbstractEventLogger.log_context to use log_with_context for the enrichment, removing the shared code between them.
  3. In this PR, call the new log_with_context. Since it delegates the actual logging to log it will work with all existing event loggers, and is not a breaking change..

Eventually I think we should have the event logger itself be a context manager by implementing __enter__ and __exit__ methods, so we could call:

with event_logger(action="foo"):
   do_something()

Since we can do this by adding methods to AbstractEventLogger it would also not be a breaking change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick clarification that enrich_with_request_context would be false by default to protect backwards compatibity. But maybe function composition is better than a parameter here, I'm ok with with either pattern.

There's a need to really clarify/improve the method in AbstractEventLogger with good/clear names here between functions we can call, content manager(s) and decorator(s)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're merging this branch into @hughhhh's feature branch rather than master, @betodealmeida said that he could do the log_context refactor in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on the refactor right now.

@mistercrunch you're right, I forgot that log takes a **kwargs that would swallow the new argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hughhhh I saw this earlier today, that we're currently logging the entire exception. Can we change this on line 71 and 72 to log just the class? I originally thought since we were already logging it, it would be fine to pass to the event logger, but I think we should actually clean this up as well.

logger.exception(ex.exception)
raise DatabaseCreateFailedError()
return database

def validate(self) -> None:
Expand All @@ -90,4 +90,6 @@ def validate(self) -> None:
if exceptions:
exception = DatabaseInvalidError()
exception.add_list(exceptions)
raise exception

with event_logger.log_context(action="db_connection_failed"):
raise exception
50 changes: 29 additions & 21 deletions superset/databases/commands/test_connection.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,6 @@
from sqlalchemy.engine.url import make_url
from sqlalchemy.exc import DBAPIError, NoSuchModuleError

from superset import app
from superset.commands.base import BaseCommand
from superset.databases.commands.exceptions import (
DatabaseSecurityUnsafeError,
Expand All @@ -33,11 +32,10 @@
)
from superset.databases.dao import DatabaseDAO
from superset.exceptions import SupersetSecurityException
from superset.extensions import event_logger
from superset.models.core import Database

logger = logging.getLogger(__name__)
config = app.config
stats_logger = config["STATS_LOGGER"]


class TestConnectionDatabaseCommand(BaseCommand):
Expand All @@ -58,34 +56,44 @@ def run(self) -> None:
impersonate_user=self._properties.get("impersonate_user", False),
encrypted_extra=self._properties.get("encrypted_extra", "{}"),
)
if database is not None:
database.set_sqlalchemy_uri(uri)
database.db_engine_spec.mutate_db_for_connection_test(database)
username = self._actor.username if self._actor is not None else None
engine = database.get_sqla_engine(user_name=username)
if database is None:
raise DBAPIError(None, None, None)

database.set_sqlalchemy_uri(uri)
database.db_engine_spec.mutate_db_for_connection_test(database)
username = self._actor.username if self._actor is not None else None
engine = database.get_sqla_engine(user_name=username)
with closing(engine.raw_connection()) as conn:
if not engine.dialect.do_ping(conn):
raise DBAPIError(None, None, None)

with event_logger.log_context(
action="test_connection_success", engine=make_url(uri).drivername,
):
return
except (NoSuchModuleError, ModuleNotFoundError):
driver_name = make_url(uri).drivername
raise DatabaseTestConnectionDriverError(
message=_("Could not load database driver: {}").format(driver_name),
)
except DBAPIError as ex:
stats_logger.incr(
f"test_connection_error.{make_url(uri).drivername}.{ex.__class__.__name__}"
)
raise DatabaseTestConnectionFailedError()
with event_logger.log_context(
action=f"test_connection_error.{ex.__class__.__name__}",
engine=make_url(uri).drivername,
):
raise DatabaseTestConnectionFailedError()
except SupersetSecurityException as ex:
stats_logger.incr(
f"test_connection_error.{make_url(uri).drivername}.{ex.__class__.__name__}"
)
raise DatabaseSecurityUnsafeError(message=str(ex))
except Exception as ex:
stats_logger.incr(
f"test_connection_error.{make_url(uri).drivername}.{ex.__class__.__name__}"
)
raise DatabaseTestConnectionUnexpectedError()
with event_logger.log_context(
action=f"test_connection_error.{ex.__class__.__name__}",
engine=make_url(uri).drivername,
):
raise DatabaseSecurityUnsafeError(message=str(ex))
except Exception as ex: # pylint: disable=broad-except
with event_logger.log_context(
action=f"test_connection_error.{ex.__class__.__name__}",
engine=make_url(uri).drivername,
):
raise DatabaseTestConnectionUnexpectedError()

def validate(self) -> None:
database_name = self._properties.get("database_name")
Expand Down
5 changes: 3 additions & 2 deletions superset/utils/log.py
Original file line number Diff line number Diff line change
Expand Up @@ -86,13 +86,14 @@ def log_context( # pylint: disable=too-many-locals
from superset.views.core import get_form_data

start_time = time.time()
referrer = request.referrer[:1000] if request.referrer else None
user_id = g.user.get_id() if hasattr(g, "user") and g.user else None
payload_override = {}

# yield a helper to add additional payload
yield lambda **kwargs: payload_override.update(kwargs)

referrer = request.referrer[:1000] if request.referrer else None
user_id = g.user.get_id() if hasattr(g, "user") and g.user else None

payload = collect_request_payload()
if object_ref:
payload["object_ref"] = object_ref
Expand Down
133 changes: 95 additions & 38 deletions tests/databases/commands_tests.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,24 +20,28 @@

import pytest
import yaml
from sqlalchemy.engine.url import make_url
from sqlalchemy.exc import DBAPIError

from superset import db, security_manager
from superset.commands.exceptions import CommandInvalidError
from superset.commands.importers.exceptions import IncorrectVersionError
from superset.connectors.sqla.models import SqlaTable
from superset.dao.exceptions import DAOCreateFailedError
from superset.databases.commands.create import CreateDatabaseCommand
from superset.databases.commands.exceptions import (
DatabaseCreateFailedError,
DatabaseInvalidError,
DatabaseNotFoundError,
DatabaseSecurityUnsafeError,
DatabaseTestConnectionDriverError,
DatabaseTestConnectionFailedError,
DatabaseTestConnectionUnexpectedError,
)
from superset.databases.commands.export import ExportDatabasesCommand
from superset.databases.commands.importers.v1 import ImportDatabasesCommand
from superset.databases.commands.test_connection import TestConnectionDatabaseCommand
from superset.databases.schemas import DatabaseTestConnectionSchema
from superset.errors import SupersetError
from superset.errors import ErrorLevel, SupersetError, SupersetErrorType
from superset.exceptions import SupersetSecurityException
from superset.models.core import Database
from superset.utils.core import backend, get_example_database
Expand Down Expand Up @@ -523,65 +527,118 @@ def test_import_v1_rollback(self, mock_import_dataset):


class TestTestConnectionDatabaseCommand(SupersetTestCase):
@mock.patch("superset.databases.dao.DatabaseDAO.build_db_for_connection_test")
@mock.patch("superset.databases.commands.test_connection.stats_logger.incr")
@patch("superset.extensions.event_logger.log_context")
@patch("superset.databases.dao.DatabaseDAO.build_db_for_connection_test")
def test_connection_db_exception(
self, mock_stats_logger, mock_build_db_for_connection_test
self, mock_build_db_connection_test, mock_event_logger,
):
"""Test that users can't export databases they don't have access to"""
"""Test that exceptions are being properly logged"""
mock_build_db_connection_test.side_effect = [
DBAPIError("An error occurred!", None, None),
SupersetSecurityException(
SupersetError(
"dummy",
SupersetErrorType.DATASOURCE_SECURITY_ACCESS_ERROR,
ErrorLevel.ERROR,
)
),
Exception("An error occurred!"),
]
database = get_example_database()
mock_build_db_for_connection_test.side_effect = Exception(
"An error has occurred!"
)
db_uri = database.sqlalchemy_uri_decrypted
json_payload = {"sqlalchemy_uri": db_uri}
command_without_db_name = TestConnectionDatabaseCommand(
security_manager.find_user("admin"), json_payload
test_item = DatabaseTestConnectionSchema().load(json_payload)
command = TestConnectionDatabaseCommand(
security_manager.find_user("admin"), test_item
)
with self.assertRaises(DatabaseTestConnectionFailedError):
command.run()

mock_event_logger.assert_called_with(
action="test_connection_error.DBAPIError",
engine=make_url(db_uri).drivername,
)

with self.assertRaises(DatabaseSecurityUnsafeError):
command.run()

mock_event_logger.assert_called_with(
action="test_connection_error.SupersetSecurityException",
engine=make_url(db_uri).drivername,
)

with self.assertRaises(DatabaseTestConnectionUnexpectedError):
command_without_db_name.run()
command.run()

mock_stats_logger.assert_called()
mock_event_logger.assert_called_with(
action="test_connection_error.Exception", engine=make_url(db_uri).drivername
)

@mock.patch("superset.databases.dao.DatabaseDAO.build_db_for_connection_test")
@mock.patch("superset.databases.commands.test_connection.stats_logger.incr")
def test_connection_superset_security_connection(
self, mock_stats_logger, mock_build_db_for_connection_test
@patch("superset.extensions.event_logger.log_context")
def test_connection_db_success(
self, mock_event_logger,
):
"""Test that users can't export databases they don't have access to"""
"""Test that test_connection is logging on success"""
database = get_example_database()
mock_build_db_for_connection_test.side_effect = SupersetSecurityException(
SupersetError(error_type=500, message="test", level="info", extra={})
)
db_uri = database.sqlalchemy_uri_decrypted
json_payload = {"sqlalchemy_uri": db_uri}
command_without_db_name = TestConnectionDatabaseCommand(
security_manager.find_user("admin"), json_payload
test_item = DatabaseTestConnectionSchema().load(json_payload)
command = TestConnectionDatabaseCommand(
security_manager.find_user("admin"), test_item
)
command.run()

with self.assertRaises(DatabaseSecurityUnsafeError):
command_without_db_name.run()
mock_event_logger.assert_called_with(
action="test_connection_success", engine=make_url(db_uri).drivername
)

mock_stats_logger.assert_called()

@mock.patch("superset.databases.dao.DatabaseDAO.build_db_for_connection_test")
@mock.patch("superset.databases.commands.test_connection.stats_logger.incr")
def test_connection_db_api_exc(
self, mock_stats_logger, mock_build_db_for_connection_test
class TestCreateDatabaseCommand(SupersetTestCase):
@patch("superset.databases.dao.DatabaseDAO.create")
@patch("superset.extensions.event_logger.log_context")
@patch(
"superset.databases.commands.test_connection.TestConnectionDatabaseCommand.run"
)
def test_create_database_error(
self, mock_test_connection_db, mock_event_logger, mock_db_create
):
"""Test that users can't export databases they don't have access to"""
"""Test that exceptions are being properly logged"""

database = get_example_database()
mock_build_db_for_connection_test.side_effect = DBAPIError(
statement="error", params={}, orig={}
)
db_uri = database.sqlalchemy_uri_decrypted
json_payload = {"sqlalchemy_uri": db_uri}
command_without_db_name = TestConnectionDatabaseCommand(
command_without_db_name = CreateDatabaseCommand(
security_manager.find_user("admin"), json_payload
)

with self.assertRaises(DatabaseTestConnectionFailedError):
# test with no db name
with self.assertRaises(DatabaseInvalidError):
command_without_db_name.run()

mock_stats_logger.assert_called()
mock_event_logger.assert_called_with(action="db_connection_failed")

# test when connection fails
mock_test_connection_db.side_effect = Exception("An error has occurred!")
mock_db_create.return_value = database
json_payload = {"sqlalchemy_uri": db_uri, "database_name": "foo"}
command = CreateDatabaseCommand(
security_manager.find_user("admin"), json_payload
)

with self.assertRaises(Exception):
command.run()

mock_event_logger.assert_called_with(
action="db_connection_failed", engine=database.db_engine_spec.__name__
)

# test when creation fails
mock_db_create.side_effect = DAOCreateFailedError(
exception=Exception("An error occurred")
)

with self.assertRaises(DatabaseCreateFailedError):
command.run()

mock_event_logger.assert_called_with(
action="db_creation_failed.An error occurred"
)