🐛 TLS/SSL Errors in jobs carried out by Create-A-Derived-Table #3038

Closed
jhpyke opened this issue Jan 19, 2024 · 2 comments
Labels
bug Something isn't working data-platform-apps-and-tools This issue is owned by Data Platform Apps and Tools stale

Comments


jhpyke commented Jan 19, 2024

Describe the bug.

As we saw on the Cloud Platform runners, under certain (as yet unclear) circumstances a dbt job running in a GitHub runner temporarily loses connectivity to AWS. When this occurs, the job fails with the error:

ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1129)

This issue does not appear to depend on timing (the same job run at different times hits the same error), but it DOES appear to depend on workload: if a job experiences these errors, it does so fairly consistently, with few runs entirely unaffected. So far, no way has been found to replicate the errors on a developer machine.

This issue prevents tables associated with the affected job from deploying successfully. Currently the impact is limited to NOMIS Curated, but we need to better understand why the issue is occurring so we can advise on mitigation/prevention.
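
As a starting point for the mitigation discussion, here is a minimal sketch of retrying an operation when it raises `ssl.SSLZeroReturnError`. It is not how Create-A-Derived-Table or dbt currently behaves; `run_with_retry`, its parameters, and the backoff values are hypothetical. Since the outage described here lasts 1-3 minutes, the delays would need to add up to several minutes to be useful in practice.

```python
import ssl
import time


def run_with_retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on SSLZeroReturnError.

    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ssl.SSLZeroReturnError:
            # Give up and surface the original error on the final attempt.
            if attempt == attempts:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Any other exception (for example, a genuine SQL error in a model) is deliberately not caught, so it still fails immediately.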

To Reproduce

  1. Run this workflow (please coordinate with Gwion before doing so, as thrashing the source in S3 could disrupt Airflow jobs via rate limiting if poorly timed)
  2. Wait for the Deploy DBT Models step
  3. During the step, after a few minutes, a temporary drop in connection to AWS will be experienced. Any tables deployed during this time are marked as ERROR. Any tables that depend on them will be marked as SKIP.
  4. After 1-3 minutes, connectivity will restore and the rest of the deployment will proceed as you'd expect.
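
To illustrate why a short outage in step 3 can affect many more tables than the ones actually deployed during it, here is a minimal sketch of the ERROR/SKIP propagation described above. The function and the model names in the usage note are hypothetical; this models the behaviour and is not dbt's own code.

```python
from collections import deque


def propagate_skips(dependents, errored):
    """Return a status per affected model.

    dependents maps a model name to the models that depend on it directly.
    Models in errored are marked ERROR; everything downstream is marked SKIP.
    """
    status = {model: "ERROR" for model in errored}
    queue = deque(errored)
    while queue:
        model = queue.popleft()
        for child in dependents.get(model, []):
            # Only mark each downstream model once.
            if child not in status:
                status[child] = "SKIP"
                queue.append(child)
    return status
```

For example, with `dependents = {"stg_a": ["int_b"], "int_b": ["mart_c"]}` and `errored = ["stg_a"]`, both `int_b` and `mart_c` come back as SKIP even though only `stg_a` hit the connection drop.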

Expected Behaviour

  1. Run the workflow
  2. It passes with no interruptions during deployment, allowing all tables to deploy successfully.
  3. Any errors experienced are SQL-looking errors (i.e. issues with the model code and not our deployment mechanism)

Additional context

I would recommend running the pod during a period when we can easily audit the VPC flow logs, and potentially exec'ing into the pod while it is experiencing the error to better diagnose what is happening. It might also be worth monitoring the pods via Kibana or similar to see whether there is anything obviously anomalous in the resource usage.
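
One way to line up the outage window with the VPC flow logs would be to run a small probe inside the pod that records timestamped TLS handshake results. The following is a rough sketch using only the Python standard library; the function names, the interval, and the choice of endpoint to probe are all assumptions, not something already in the workflow.

```python
import socket
import ssl
import time
from datetime import datetime, timezone


def tls_handshake_ok(host, port=443, timeout=5.0):
    """Return True if a TLS handshake with host:port completes, else False."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError and socket timeouts are both subclasses of OSError.
        return False


def probe(host, attempts, interval, check=tls_handshake_ok, sleep=time.sleep):
    """Run check(host) repeatedly and return a list of (UTC timestamp, ok)."""
    results = []
    for i in range(attempts):
        ok = check(host)
        stamp = datetime.now(timezone.utc).isoformat()
        results.append((stamp, ok))
        print(f"{stamp} {host} {'OK' if ok else 'FAILED'}")
        if i < attempts - 1:
            sleep(interval)
    return results
```

Calling `probe(host, attempts=120, interval=5)` against whichever AWS endpoint the job talks to would cover roughly ten minutes, and the FAILED timestamps can then be compared with the flow logs.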

@jhpyke jhpyke added bug Something isn't working data-platform-apps-and-tools This issue is owned by Data Platform Apps and Tools labels Jan 19, 2024
@jacobwoffenden jacobwoffenden moved this to 👀 TODO in Analytical Platform Feb 15, 2024

This issue is being marked as stale because it has been open for 60 days with no activity. Remove stale label or comment to keep the issue open.

@github-actions github-actions bot added the stale label Mar 20, 2024

This issue is being closed because it has been open for a further 7 days with no activity. If this is still a valid issue, please reopen it. Thank you!

@github-actions github-actions bot closed this as not planned Mar 27, 2024
@github-project-automation github-project-automation bot moved this from 👀 TODO to 🎉 Done in Analytical Platform Mar 27, 2024