Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Source S3: Check config settings for CSV file format #20262

Merged
merged 14 commits into from
Dec 14, 2022
2 changes: 1 addition & 1 deletion airbyte-integrations/connectors/source-s3/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -17,5 +17,5 @@ COPY source_s3 ./source_s3
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=0.1.26
LABEL io.airbyte.version=0.1.27
LABEL io.airbyte.name=airbyte/source-s3
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ def check_connection(self, logger: AirbyteLogger, config: Mapping[str, Any]) ->
The error object will be cast to string to display the problem to the user.
"""
try:
self.stream_class(**config)._get_master_schema()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could take way too much time if user has thousands of lines in their files. I believe check should be pretty fast. As an option, could we try doing the same just for a small chunk of data in each of the files?

for file_info in self.stream_class(**config).filepath_iterator():
# TODO: will need to split config.get("path_pattern") up by stream once supporting multiple streams
# test that matching on the pattern doesn't error
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -43,8 +43,8 @@ def test_check_connection(config):
with patch.object(instance.stream_class, "filepath_iterator", MagicMock()):
ok, error_msg = instance.check_connection(logger, config=config)

assert ok
assert not error_msg
assert not ok
assert error_msg


def test_streams(config):
Expand Down
3 changes: 2 additions & 1 deletion docs/integrations/sources/s3.md
Original file line number Diff line number Diff line change
Expand Up @@ -209,7 +209,8 @@ The Jsonl parser uses pyarrow hence,only the line-delimited JSON format is suppo

| Version | Date | Pull Request | Subject |
|:--------|:-----------|:----------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
| 0.1.26 | 2022-11-08 | [19006](https://github.com/airbytehq/airbyte/pull/19006) | Add virtual-hosted-style option |
| 0.1.27 | 2022-12-08 | [20262](https://github.com/airbytehq/airbyte/pull/20262) | Get master schema on check connectio |
| 0.1.26 | 2022-11-08 | [19006](https://github.com/airbytehq/airbyte/pull/19006) | Add virtual-hosted-style option |
| 0.1.24 | 2022-10-28 | [18602](https://github.com/airbytehq/airbyte/pull/18602) | Wrap errors into AirbyteTracedException pointing to a problem file |
| 0.1.23 | 2022-10-10 | [17991](https://github.com/airbytehq/airbyte/pull/17991) | Fix pyarrow to JSON schema type conversion for arrays |
| 0.1.23 | 2022-10-10 | [17800](https://github.com/airbytehq/airbyte/pull/17800) | Deleted `use_ssl` and `verify_ssl_cert` flags and hardcoded to `True` |
Expand Down