feat: db connection analytics #13346

eschutho · 2021-02-25T20:30:02Z

SUMMARY

adds analytics logging for db connection successes and failures

TEST PLAN

unit tests

ADDITIONAL INFORMATION

eschutho · 2021-02-25T20:33:00Z

superset/databases/commands/test_connection.py

-            with closing(engine.raw_connection()) as conn:
-                if not engine.dialect.do_ping(conn):
-                    raise DBAPIError(None, None, None)
+                with closing(engine.raw_connection()) as conn:


engine wasn't defined when database was None, so I moved this code under that block.

codecov · 2021-02-25T20:54:15Z

Codecov Report

❗ No coverage uploaded for pull request base (db-connection-logs@d323eb1).
The diff coverage is n/a.

@@                  Coverage Diff                  @@
##             db-connection-logs   #13346   +/-   ##
=====================================================
  Coverage                      ?   54.61%           
=====================================================
  Files                         ?      478           
  Lines                         ?    15982           
  Branches                      ?     4123           
=====================================================
  Hits                          ?     8729           
  Misses                        ?     7253           
  Partials                      ?        0

Flag	Coverage Δ
cypress	`54.61% <0.00%> (?)`

Flags with carried forward coverage won't be shown.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d323eb1...0ceb8f7.

betodealmeida

LGTM. My only concern is that, in theory, we might fail to roll back a transaction. I also left a comment on using log instead of log_context.

betodealmeida · 2021-02-25T21:07:27Z

superset/databases/commands/create.py

+                with event_logger.log_context(action=f"db_connection_failed.{database.db_engine_spec.__name__}"):
+                    db.session.rollback()


This worries me a bit: if there's an error in the context manager, we won't roll back the transaction.

I think it would be better to just log, instead of using a context manager:

try:
    event_logger.log(action=f"db_connection_failed.{database.db_engine_spec.__name__}")
finally:
    db.session.rollback()

This ensures that rollback is always called.

Maybe log should be pessimistic, i.e. guaranteed never to raise, so that we don't have to be careful every time we call it? That would mean wrapping the body of log in a try.
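A minimal sketch of what a "never raises" log could look like. SafeEventLogger and its _emit backend are hypothetical stand-ins, not Superset's actual event logger; the point is only that failures are reported through the standard logging module instead of propagating to the caller.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


class SafeEventLogger:
    """Hypothetical logger whose log() never raises; not Superset's actual class."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def _emit(self, action: str, **kwargs: Any) -> None:
        # Hypothetical backend write; here it fails on an empty action
        if not action:
            raise ValueError("action is required")
        self.records.append({"action": action, **kwargs})

    def log(self, action: str, **kwargs: Any) -> None:
        # Pessimistic wrapper: report failures instead of propagating them,
        # so callers don't need try/finally around every call
        try:
            self._emit(action, **kwargs)
        except Exception:  # pylint: disable=broad-except
            logger.exception("Event logging failed for action %r", action)


event_logger = SafeEventLogger()
event_logger.log("")  # the backend write fails, but nothing is raised here
event_logger.log("db_connection_failed", engine="PostgresEngineSpec")
```

The trade-off is that a broken logger no longer fails loudly, which is why the sketch still records the traceback via logger.exception rather than silently passing.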

@betodealmeida and I were talking about this earlier. Most of the statements before the yield are pretty safe, but to be sure, I moved everything I could below the yield statement. I don't think there's much risk of what remains failing now:

start_time = time.time()
payload_override = {}

lmk if you think this works as a solution @betodealmeida @mistercrunch
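A rough sketch of the "keep the pre-yield part trivial" idea, assuming a contextlib-based context manager. This is a simplified free function, not Superset's actual log_context; the emit callable is a hypothetical stand-in for the real logging backend.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


@contextmanager
def log_context(
    action: str, emit: Callable[..., None], **kwargs: Any
) -> Iterator[Dict[str, Any]]:
    # Only trivial statements before the yield: if anything here raised,
    # the caller's with-body (e.g. db.session.rollback()) would never run.
    start_time = time.time()
    payload_override: Dict[str, Any] = {}
    yield payload_override
    # This part runs after the with-body finishes (and only if it did not
    # raise), so a failure while building or emitting the event can no
    # longer prevent the body from running.
    duration_ms = int((time.time() - start_time) * 1000)
    emit(action=action, duration_ms=duration_ms, **{**kwargs, **payload_override})
```

With this shape, even an emit that raises cannot skip the rollback, because the rollback has already run by the time emit is called.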

betodealmeida · 2021-02-25T21:18:26Z

superset/databases/commands/create.py

@@ -69,8 +64,9 @@ def run(self) -> Model:
            security_manager.add_permission_view_menu("database_access", database.perm)
            db.session.commit()
        except DAOCreateFailedError as ex:
-            logger.exception(ex.exception)
-            raise DatabaseCreateFailedError()
+            with event_logger.log_context(action=f"db_creation_failed.{ex.exception}"):


We might want to just use log in places like this. The main difference between log and log_context is that the latter, in addition to being a context manager, logs the duration automatically, together with some dashboard/chart metadata that is irrelevant here.

true... it also logs the user_id and referrer, so maybe it's still useful?

We may want to either: 1) refactor log to do some of the magic in log_context, maybe by adding a boolean flag enrich_with_request_context=False; 2) call the context manager without a with block (would that just work?); or 3) refactor log_context so it is NOT a context manager, and create a new one, say log_context_manager (or a better name), that's used only when we want the duration / with block.
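On the question of calling the context manager outside a with block: assuming log_context is a generator-based context manager built with contextlib.contextmanager, simply calling it does not run any of its code. The log_context below is a hypothetical stand-in used only to show that behavior.

```python
from contextlib import contextmanager
from typing import Iterator, List

events: List[str] = []


@contextmanager
def log_context(action: str) -> Iterator[None]:
    # Hypothetical stand-in for a generator-based logging context manager
    events.append(f"start:{action}")
    yield
    events.append(f"end:{action}")


# Calling it without `with` only builds the context-manager object;
# the generator body, and therefore the logging, never executes.
unused = log_context("test_connection_success")
print(events)  # []

# Entering it with `with` is what runs the code around the yield.
with log_context("test_connection_success"):
    pass
print(events)  # ['start:test_connection_success', 'end:test_connection_success']
```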

One problem is that adding the flag enrich_with_request_context would be a breaking change (our event logger, for example, would have to be updated).

I think the best way here is:

1. Create a new method in AbstractEventLogger called log_with_context that does all the work log_context does, but as a plain function rather than a context manager. The method calls log at the end, with all the extracted context.

2. Change AbstractEventLogger.log_context to use log_with_context for the enrichment, removing the code the two currently share.

3. In this PR, call the new log_with_context. Since it delegates the actual logging to log, it will work with all existing event loggers and is not a breaking change.
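The three steps above could be sketched roughly as follows. This is a simplified, hypothetical AbstractEventLogger: the _request_context placeholder and the exact keyword arguments passed to log are assumptions for illustration, not Superset's real signatures. The key property is that existing subclasses only implement log and keep working unchanged.

```python
import time
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional


class AbstractEventLogger(ABC):
    @abstractmethod
    def log(
        self, user_id: Optional[int], action: str, *args: Any, **kwargs: Any
    ) -> None:
        """Existing event loggers implement only this; its contract is unchanged."""

    def _request_context(self) -> Dict[str, Any]:
        # Hypothetical placeholder for reading user_id/referrer from the request
        return {"user_id": None, "referrer": None}

    def log_with_context(
        self, action: str, duration_ms: Optional[int] = None, **payload: Any
    ) -> None:
        # Plain function: gather the context, then delegate to log()
        context = self._request_context()
        self.log(
            context["user_id"],
            action,
            records=[payload],
            duration_ms=duration_ms,
            referrer=context["referrer"],
        )

    @contextmanager
    def log_context(self, action: str, **payload: Any) -> Iterator[None]:
        # The context manager only adds timing; enrichment is shared code
        start_time = time.time()
        yield
        duration_ms = int((time.time() - start_time) * 1000)
        self.log_with_context(action, duration_ms=duration_ms, **payload)


class ListEventLogger(AbstractEventLogger):
    """Example logger written before log_with_context existed."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def log(
        self, user_id: Optional[int], action: str, *args: Any, **kwargs: Any
    ) -> None:
        self.calls.append({"user_id": user_id, "action": action, **kwargs})


event_logger = ListEventLogger()
event_logger.log_with_context("db_connection_failed", engine="PostgresEngineSpec")
```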

Eventually I think we should have the event logger itself be a context manager by implementing __enter__ and __exit__ methods, so we could call:

with event_logger(action="foo"):
    do_something()

Since we can do this by adding methods to AbstractEventLogger it would also not be a breaking change.
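A minimal sketch of the eventual with event_logger(action="foo") form. Note that the call syntax also needs a __call__ method in addition to __enter__ and __exit__. EventLogger here is hypothetical and deliberately simplified (it logs to an in-memory list).

```python
import time
from types import TracebackType
from typing import Any, Dict, List, Optional, Type


class EventLogger:
    """Hypothetical logger usable as `with event_logger(action=...)`."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []
        self._frames: List[Dict[str, Any]] = []  # supports nested blocks

    def log(self, action: str, **kwargs: Any) -> None:
        self.calls.append({"action": action, **kwargs})

    def __call__(self, action: str, **kwargs: Any) -> "EventLogger":
        # Remember the arguments for the `with` block that follows
        self._frames.append({"action": action, "payload": kwargs})
        return self

    def __enter__(self) -> "EventLogger":
        self._frames[-1]["start"] = time.time()
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Runs even if the body raised; returning None lets the error propagate
        frame = self._frames.pop()
        duration_ms = int((time.time() - frame["start"]) * 1000)
        self.log(frame["action"], duration_ms=duration_ms, **frame["payload"])


event_logger = EventLogger()
with event_logger(action="foo"):
    pass  # do_something()
```

Because the per-block state lives on the shared logger instance, this simple version is not thread-safe; a real implementation might instead have __call__ return a small, separate context object.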

Quick clarification: enrich_with_request_context would be False by default to protect backwards compatibility. But maybe function composition is better than a parameter here; I'm OK with either pattern.

We really need to clarify/improve the methods in AbstractEventLogger, with good, clear names distinguishing the functions we can call, the context manager(s), and the decorator(s).

since we're merging this branch into @hughhhh's feature branch rather than master, @betodealmeida said that he could do the log_context refactor in a separate PR.

I'm working on the refactor right now.

@mistercrunch you're right, I forgot that log takes **kwargs, which would swallow the new argument.
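A small illustration of why the new flag would not break existing loggers, assuming they override log with a catch-all **kwargs. LegacyEventLogger is hypothetical. The flag is absorbed without a TypeError, but a logger that predates it will also simply ignore it.

```python
from typing import Any, Dict, List, Optional


class LegacyEventLogger:
    """Hypothetical subclass written before the flag existed."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def log(
        self, user_id: Optional[int], action: str, *args: Any, **kwargs: Any
    ) -> None:
        # Unknown keyword arguments, including a new flag, land in **kwargs
        self.calls.append({"user_id": user_id, "action": action, **kwargs})


legacy = LegacyEventLogger()
# Passing the proposed flag does not raise a TypeError: **kwargs absorbs it
legacy.log(None, "db_connection_failed", enrich_with_request_context=True)
```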

mistercrunch · 2021-02-25T23:56:12Z

I feel like we may want to add engine as its own dimension, as opposed to overloading the action type with it.

Eventually I'd like to add the engine to all analytics events that interact with analytics databases, such as viewing a chart or running a SQL Lab query.

I think you can just pass it as a keyword argument, as in event_logger.log_context('test_connection_success', engine=engine)

I made a related comment here: #13223 (comment)
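A toy comparison of the two approaches, using a hypothetical in-memory log sink and illustrative engine values. When the engine is baked into the action string, each engine produces a distinct action; as a separate field, the action stays stable and grouping by engine across actions is a one-liner.

```python
from collections import Counter
from typing import Any, Dict, List

records: List[Dict[str, Any]] = []


def log(action: str, **kwargs: Any) -> None:
    # Hypothetical sink: one row per event, extra kwargs become columns
    records.append({"action": action, **kwargs})


# Overloaded action string: the engine is baked into the action name
log("db_connection_failed.PostgresEngineSpec")
# Separate dimension: the action stays stable and engine is its own field
log("db_connection_failed", engine="PostgresEngineSpec")
log("db_connection_failed", engine="MySQLEngineSpec")
log("test_connection_success", engine="postgresql+psycopg2")

# With a dedicated field, grouping by engine across actions is trivial
by_engine = Counter(r["engine"] for r in records if "engine" in r)
```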

eschutho · 2021-02-26T02:30:25Z

superset/databases/commands/create.py

-                stats_logger.incr(
-                    f"db_connection_failed.{database.db_engine_spec.__name__}"
-                )
+                with event_logger.log_context(action=f"db_connection_failed", engine=database.db_engine_spec.__name__):


@mistercrunch I added the engine here as a kwarg. Also, just as an FYI, the strings we're passing in these two files aren't exactly 1:1. Here we have to use the engine spec name, while in the test_connection file we pass the engine+driver combination. The information is essentially the same, though: any downstream integration can still see the engine in the string that is passed in.

eschutho · 2021-02-26T22:35:40Z

rerun tests

eschutho · 2021-03-04T01:18:02Z

superset/databases/commands/create.py

@@ -69,8 +68,9 @@ def run(self) -> Model:
            security_manager.add_permission_view_menu("database_access", database.perm)
            db.session.commit()
        except DAOCreateFailedError as ex:
-            logger.exception(ex.exception)
-            raise DatabaseCreateFailedError()
+            with event_logger.log_context(action=f"db_creation_failed.{ex.exception}"):


@hughhhh I noticed earlier today that we're currently logging the entire exception. Can we change this on lines 71 and 72 to log just the exception class? I originally thought that since we were already logging it, it would be fine to pass to the event logger, but I think we should clean this up as well.
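A minimal sketch of logging only the exception class in the event, while the full details still go to the application log. DAOCreateFailedError below is a hypothetical stand-in that just carries a wrapped .exception attribute, as the diff uses; events and report_creation_failure are also illustrative.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)
events: List[Dict[str, Any]] = []


class DAOCreateFailedError(Exception):
    """Hypothetical stand-in for the DAO exception, wrapping another error."""

    def __init__(self, exception: Exception) -> None:
        super().__init__(str(exception))
        self.exception = exception


def report_creation_failure(ex: DAOCreateFailedError) -> None:
    # Full details (message, traceback) go to the application log only
    logger.exception(ex.exception)
    # The analytics event gets just the class name, which is low-cardinality
    # and cannot leak connection details from the exception message
    events.append({"action": f"db_creation_failed.{type(ex.exception).__name__}"})


try:
    raise DAOCreateFailedError(ValueError("could not create database"))
except DAOCreateFailedError as err:
    report_creation_failure(err)
```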

eschutho · 2021-03-06T02:13:21Z

@hughhhh is taking this effort over in https://github.com/apache/superset/pull/13468/files

superset-github-bot bot added the preset-io label Feb 25, 2021

pull-request-size bot added the size/L label Feb 25, 2021

eschutho force-pushed the elizabeth/db-connection-analytics branch from e11c306 to eabd614 February 25, 2021 20:31

eschutho commented Feb 25, 2021

betodealmeida requested changes Feb 25, 2021

eschutho changed the title ~~Elizabeth/db connection analytics~~ feature: db connection analytics Feb 25, 2021

eschutho commented Feb 26, 2021

eschutho changed the title ~~feature: db connection analytics~~ feat: db connection analytics Feb 26, 2021

eschutho force-pushed the elizabeth/db-connection-analytics branch 3 times, most recently from f6d8ea1 to 4fe1895 February 26, 2021 18:20

eschutho closed this Feb 26, 2021

eschutho reopened this Feb 26, 2021

eschutho force-pushed the elizabeth/db-connection-analytics branch 2 times, most recently from fb68366 to d4588f6 February 26, 2021 23:57

eschutho added 3 commits March 1, 2021 14:25

add db connection failure to event logger

e9a216d

add db connection success logger

73d9431

make log_context safer before yield

a22d006

eschutho force-pushed the elizabeth/db-connection-analytics branch 2 times, most recently from 4538e2f to bf0ff7b March 1, 2021 23:52

pass engine as separate kwarg to event_logger

0ceb8f7

eschutho force-pushed the elizabeth/db-connection-analytics branch from bf0ff7b to 0ceb8f7 March 1, 2021 23:59

hughhhh mentioned this pull request Mar 3, 2021

feat: refactor on DBEventLogger to allow for context management #13441

Merged

6 tasks

eschutho commented Mar 4, 2021

eschutho closed this Mar 6, 2021

eschutho deleted the elizabeth/db-connection-analytics branch April 16, 2021 23:54
