Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add logs to db test connection and post failures #13223

Closed
wants to merge 19 commits into from

Conversation

hughhhh
Copy link
Member

@hughhhh hughhhh commented Feb 18, 2021

SUMMARY

We need more data on what is causing failures in the db test connection and connection endpoints.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is really helpful! I've been debugging some stuff this week that would really benefit from this.

@codecov-io
Copy link

codecov-io commented Feb 18, 2021

Codecov Report

Merging #13223 (d323eb1) into master (974f447) will increase coverage by 0.78%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #13223      +/-   ##
==========================================
+ Coverage   77.14%   77.92%   +0.78%     
==========================================
  Files         865      868       +3     
  Lines       44961    47701    +2740     
  Branches     5415     5351      -64     
==========================================
+ Hits        34686    37173    +2487     
- Misses      10152    10405     +253     
  Partials      123      123              
Flag Coverage Δ
cypress 58.54% <ø> (ø)
javascript 62.17% <ø> (ø)
mysql 80.33% <100.00%> (?)
postgres 80.37% <100.00%> (?)
presto 80.04% <91.66%> (?)
python 81.82% <100.00%> (+1.00%) ⬆️
sqlite 80.00% <91.66%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/databases/commands/create.py 92.45% <100.00%> (+0.61%) ⬆️
superset/databases/commands/test_connection.py 98.07% <100.00%> (+8.94%) ⬆️
superset/reports/notifications/slack.py 84.21% <0.00%> (-4.98%) ⬇️
superset/common/query_object.py 91.30% <0.00%> (-0.24%) ⬇️
superset/charts/schemas.py 100.00% <0.00%> (ø)
superset/tasks/celery_app.py 0.00% <0.00%> (ø)
superset/reports/notifications/email.py 100.00% <0.00%> (ø)
superset/security/api.py 100.00% <0.00%> (ø)
superset/common/query_actions.py 95.38% <0.00%> (ø)
superset/db_engine_specs/firebird.py 100.00% <0.00%> (ø)
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 974f447...d323eb1. Read the comment docs.

@@ -238,6 +238,7 @@ def post(self) -> Response:
except DatabaseInvalidError as ex:
return self.response_422(message=ex.normalized_messages())
except DatabaseConnectionFailedError as ex:
logger.exception("Database connection failed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we downgrade these from exception logging? Connection failures are an expected part of setting up new database connections and shouldn't wind up in exception tracking systems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to understand the reasons for failures, it would also be a good idea to log the exception. However, beware: these exceptions frequently contain database credentials, which it would be bad to track.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on the level. Just a note, @willbarrett: @logger.exception logs the full traceback plus the message, so it would've helped figuring out the reason.

Copy link
Member

@nytai nytai Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.warning("Database connection failed", exc_info=True) also achieves the same. As the one currently running the error/exception logs portion of our biweekly alerts review, I would greatly appreciate if we reduced the amount of expected exceptions being logged at error and exception level. It would make my job significantly easier, as well as help reduce the strain on our log quota

Copy link
Member

@dpgaspar dpgaspar Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @willbarrett concern, unfortunately some engines dbapi surface exceptions with sensitive information logger.warning("Database connection failed", exc_info=True) will actually log this info. we can lower it to debug, since on normal situations production systems are not set to this level.

Also there's an ongoing effort to remap these exceptions at the engine spec level #12869, so that we can add more granularity to connection failures for example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now wondering if there's a better way to get the info that @betodealmeida and @hughhhh are looking for - would you be able to describe the questions that you're trying to answer with this logging? Then we can support you better in coming to a safe solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to answer the following:
What db engines are failing the most?
What are the test connection and db connection exception error for these db engines?

If we can get this information with other data let us know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What DB engines are failing the most?
I think you could parse the submitted SQLAlchemy URL to pull out the scheme portion of the URL and log that on failure. This would avoid the risk of capturing secure credentials.

What are the the test connection and DB connection exceptions for these engines?
Would it be sufficient to log the exception class from the engine without the message? The class will give you the category of error without including any of the potentially sensitive details. If this would work, you could more easily perform this logging from the command rather than in the endpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great use case for statsd!

Copy link
Member

@willbarrett willbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More logging is great! Let's do it at a warning level or below so we don't pollute error tracking with expected and explicitly handled errors.

@@ -607,6 +608,7 @@ def test_connection( # pylint: disable=too-many-return-statements
TestConnectionDatabaseCommand(g.user, item).run()
return self.response(200, message="OK")
except DatabaseTestConnectionFailedError as ex:
logger.warning(ex)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this log will read out the following:
WARNING:superset.databases.api:Could not load database driver: mssql+pymssql

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is one possible error message - others are not so innocuous and include username and password information. We've seen that with a few different database engines.

@@ -238,6 +238,7 @@ def post(self) -> Response:
except DatabaseInvalidError as ex:
return self.response_422(message=ex.normalized_messages())
except DatabaseConnectionFailedError as ex:
logger.warning("Database connection failed: %s", item["sqlalchemy_uri"])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log will be masked uri

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that item["sqlalchemy_uri" is only updated with the masked version if line 236 is reached - if CreateDatabaseCommand(...).run() fails, as we can expect if this code is reached, then I believe there's a good chance that the full sqlalchemy_uri will be logged.

While the masked URI is better than the full URI, it still exposes username and additional connection information that it might be better not to have in logs.

@@ -607,6 +612,11 @@ def test_connection( # pylint: disable=too-many-return-statements
TestConnectionDatabaseCommand(g.user, item).run()
return self.response(200, message="OK")
except DatabaseTestConnectionFailedError as ex:
engine = item["sqlalchemy_uri"].split("://")[0]
self.incr_stats(
f"db_test_connection_failed.{engine}.{ex.__class__.__name__}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ex.__class__.__name__ will always be recorded as DatabaseTestConnectionFailedError if I understand this correctly. To capture the exception surfaced by the engine, you'd need to alter the command to surface that class name as part of the error object or move this logging into the command.

@mistercrunch
Copy link
Member

Can we add two analytics events on top of the statsd events here too!?

I'd love to throw in the database engine details in these events. Currently we have the API call test_connection logged through the decorator, but I'd add test_connection_success and test_connection_failed on top, with the engine detail.

Check out the event_logger.log_context context_generator, here's an example https://github.com/apache/superset/blob/master/superset/connectors/sqla/models.py#L1492

@pull-request-size pull-request-size bot added size/M and removed size/S labels Feb 24, 2021
@mistercrunch mistercrunch mentioned this pull request Feb 25, 2021
6 tasks
@pull-request-size pull-request-size bot added size/L and removed size/M labels Mar 1, 2021
@hughhhh hughhhh closed this Mar 2, 2021
@hughhhh hughhhh reopened this Mar 2, 2021
@hughhhh hughhhh closed this Mar 2, 2021
@hughhhh hughhhh reopened this Mar 2, 2021
Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@willbarrett
Copy link
Member

LGTM once the build is green

Copy link
Member

@dpgaspar dpgaspar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hughhhh hughhhh closed this Mar 4, 2021
@mistercrunch mistercrunch deleted the db-connection-logs branch March 25, 2024 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants