airbyte-cdk: Improve Error Handling in Legacy CDK #37576
…koff_time methods to http error handler
As we discussed, out-of-scope for now but interesting to keep in mind is the parsing of the response. For example:
airbyte/airbyte-integrations/connectors/source-salesforce/source_salesforce/rate_limiting.py
Lines 18 to 26 in 07db1ca
# We've had a couple of customers with ProtocolErrors, namely:
# * A self-managed instance during `BulkSalesforceStream.download_data`. This customer had an abnormally high number of ConnectionError
# which seems to indicate problems with his network infrastructure in general. The exact error was: `urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(905 bytes read, 119 more expected)', IncompleteRead(905 bytes read, 119 more expected))`
# * A cloud customer with very long syncs. All those syncs would end up with the following error: `urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))`
# Without much more information, we will make it retryable hoping that performing the same request will work.
exceptions.ChunkedEncodingError,
# We've had examples where the response from Salesforce was not a JSON response. Those cases where error cases though. For example:
# https://github.com/airbytehq/airbyte-internal-issues/issues/6855. We will assume that this is an edge issue and that retry should help
exceptions.JSONDecodeError,
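The pattern in the snippet above, treating a fixed set of transient exception types as retryable, can be sketched roughly as follows. This is a minimal illustration, not the connector's code: `call_with_retries` and `RETRYABLE_EXCEPTIONS` are hypothetical names, and the standard-library `ConnectionError` and `json.JSONDecodeError` stand in for the `requests` exceptions so that the sketch runs without third-party packages.

```python
import json
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Stand-ins (assumption): stdlib types replace the requests exceptions named in the
# snippet above, purely so this sketch has no third-party dependency.
RETRYABLE_EXCEPTIONS: Tuple[Type[BaseException], ...] = (ConnectionError, json.JSONDecodeError)


def call_with_retries(func: Callable[[], T], max_tries: int = 3, delay_seconds: float = 0.0) -> T:
    """Hypothetical helper: call `func`, retrying only on the exception types listed above."""
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except RETRYABLE_EXCEPTIONS:
            if attempt == max_tries:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(delay_seconds)
    # Only reachable when max_tries < 1, because the loop body never ran.
    raise ValueError("max_tries must be at least 1")
```

Any exception type outside the tuple propagates on the first attempt, which keeps the "retry and hope" behaviour limited to the cases that were explicitly judged transient.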
@girarda @brianjlai, I also had the following question: should we remove the dependency on requests from the interfaces? Right now, `HttpRequestSender.send_request` returns objects that are specific to the requests library, and if we change that in the CDK (I think we discussed moving to aiohttp at some point, maybe), we would have to update all the sources using the `HttpRequestSender`. This is a much bigger lift though, hence why I would like your take on this.
…efaultRetryStrategy`, `HttpStatusErrorHandler` classes. Adds new `ErrorMapping` class. Updates `HttpRequestSender` to `HttpClient`.
…MessageParser` classes and default subclasses. Update `HttpClient` for better exception handling and to encapsulate session/caching
`HttpRequestSender` & `HttpErrorHandler` to improve errors…d json error message parser
…er to remove `is_valid_repsonse` method.
@maxi297 This is a good question! I do like the overall idea that the `HttpClient` does not bleed an implementation detail like which request library is being used under the hood. I think our ideal setup would be to return our own Request and Response interfaces that we control instead of the library's PreparedRequest/PreparedResponse. What does concern me, as you mention, is the increase in lift. To avoid scope creep, I think we can leave the interface as it is today and return the request/response from the …
I'm fine with this, yes. We can make the interface pretty similar to the request one anyway to smooth the transition.
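The idea discussed above, returning a response type the CDK controls rather than a library-specific object, could look roughly like this. It is only a sketch under assumptions: `Response` and `from_library_response` are hypothetical names that do not come from the PR, and the attribute names (`status_code`, `headers`, `content`, `text`, `json()`) are chosen to mirror the requests library so that a later transition stays small.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class Response:
    """Hypothetical library-agnostic response; attribute names mirror requests on purpose."""

    status_code: int
    headers: Mapping[str, str] = field(default_factory=dict)
    content: bytes = b""

    @property
    def text(self) -> str:
        # Assumes UTF-8 bodies; undecodable bytes are replaced rather than raising.
        return self.content.decode("utf-8", errors="replace")

    def json(self) -> Any:
        # Raises json.JSONDecodeError when the body is not valid JSON.
        return json.loads(self.text)


def from_library_response(raw: Any) -> Response:
    """Copy the attributes we rely on out of a library-specific response object (duck-typed)."""
    return Response(status_code=raw.status_code, headers=dict(raw.headers), content=raw.content)
```

With an adapter like this, only the conversion function would need to change if the underlying HTTP library were swapped; sources would keep reading `status_code`, `headers`, and `json()` from the same type.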
26047ef to 94bd7a4
I'll confirm everything once I have time to integrate this properly in salesforce but looking at the code, I don't see any blocker. Thanks for addressing all those comments!
one last comment to discuss if you don't mind checking before ✅
Amazing work, Patrick. This looks great, and thanks for the discussion points.
What
Part of an initiative to improve error handling and debugging by providing new interfaces responsible for managing requests and error handling, including sensible error-handling defaults. At this time these interfaces are not integrated into the existing `HttpStream` and are only available for ad-hoc use by connectors.

How
- `HttpClient` - prepares and sends requests, and executes follow-up actions (retrying, raising AirbyteTracedExceptions) depending on the error mapping.
- `ErrorHandler` abstract class and `HttpStatusErrorHandler` - enables mapping of response errors/exceptions to response actions, failure types, and error messages, as well as error mapping lookup. Defaults to `HttpStatusErrorHandler`.
- `ErrorResolution` dataclass - encapsulates response actions, failure types, and error messages for defined exceptions or request status codes.
- `BackoffStrategy` abstract class and `DefaultBackoffStrategy` - manages max_retries, max_time, and `backoff_time`. Defaults to `DefaultBackoffStrategy`.
- `ErrorMessageParser` abstract class and `JsonErrorMessageParser` - to be instantiated in the stream and passed to `HttpClient` to parse error messages in responses. Defaults to `JsonErrorMessageParser`.

Recommended Reading Order
1. `http_client.py`, start w/ `__init__` and then `send_request`
2. `error_handler.py` and `http_status_error_handler.py`
3. `response_models.py`
4. `backoff_strategy.py` and `default_backoff_strategy.py`
5. `rate_limiting.py`
6. `exceptions.py`
7. `json_error_message_parser.py` and `error_message_parser.py`
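
To make the roles described under "How" more concrete, here is a rough, self-contained sketch of how an error mapping, an `ErrorResolution`, and a backoff computation might fit together. Only the names `ErrorResolution` and `backoff_time` come from the description above; the `ResponseAction` values, the field names, the `resolve` function, the example mapping, and the exponential backoff formula are all assumptions made for illustration and are not taken from the CDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional, Type, Union


class ResponseAction(Enum):
    """Assumed set of follow-up actions; the real CDK may define others."""

    SUCCESS = "SUCCESS"
    RETRY = "RETRY"
    FAIL = "FAIL"


@dataclass(frozen=True)
class ErrorResolution:
    """Bundles what to do about an outcome (field names are assumptions)."""

    response_action: ResponseAction
    failure_type: Optional[str] = None
    error_message: Optional[str] = None


ErrorKey = Union[int, Type[Exception]]

# Illustrative mapping only: keys are HTTP status codes or exception types.
EXAMPLE_MAPPING: Mapping[ErrorKey, ErrorResolution] = {
    401: ErrorResolution(ResponseAction.FAIL, "config_error", "Unauthorized. Check your credentials."),
    429: ErrorResolution(ResponseAction.RETRY, "transient_error", "Too many requests."),
    ConnectionError: ErrorResolution(ResponseAction.RETRY, "transient_error", "Connection problem."),
}


def resolve(outcome: Union[int, Exception], mapping: Mapping[ErrorKey, ErrorResolution] = EXAMPLE_MAPPING) -> ErrorResolution:
    """Hypothetical lookup: map a status code or a raised exception to an ErrorResolution."""
    if isinstance(outcome, Exception):
        # Match on exception type, including subclasses of a mapped type.
        for key, resolution in mapping.items():
            if isinstance(key, type) and isinstance(outcome, key):
                return resolution
    elif outcome in mapping:
        return mapping[outcome]
    elif outcome < 400:
        return ErrorResolution(ResponseAction.SUCCESS)
    return ErrorResolution(ResponseAction.FAIL, "system_error", f"Unhandled error: {outcome!r}")


def backoff_time(attempt: int, retry_after: Optional[float] = None, base: float = 1.0, cap: float = 60.0) -> float:
    """Assumed policy: honour a server-provided delay, else capped exponential backoff."""
    if retry_after is not None:
        return retry_after
    return min(cap, base * 2 ** (attempt - 1))
```

In this sketch a client loop would call `resolve` after each attempt, sleep for `backoff_time(attempt)` when the action is `RETRY`, and raise with the resolution's `failure_type` and `error_message` when the action is `FAIL`.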