Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support to to_singer to build schema from catalog #42

Merged
merged 2 commits into from
Oct 29, 2024
Merged

Conversation

keyn4
Copy link
Contributor

@keyn4 keyn4 commented Oct 28, 2024

Changes:

  • Add support to to_singer to generate a schema using the catalog, it uses env var USE_CATALOG_SCHEMA
  • it will add to the schema additional columns that were added during ETL

Others:

  • Moved Reader to its own module to avoid circular imports.
  • Moved parse_objs to singer.oy to avoid circular imports.

Copy link
Member

@hsyyid hsyyid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keyn4 left some questions for you below

@@ -225,13 +303,21 @@ def to_singer(
Allow or not objects to the parsed, if false defaults types to str.

"""
catalog_schema = os.environ.get("USE_CATALOG_SCHEMA", False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be os.environ.get("USE_CATALOG_SCHEMA", "false") == "true"? Env vars are always strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thank you! 😃

@@ -196,6 +205,75 @@ def deep_convert_datetimes(value):
return value.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
return value

def parse_objs(x):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come you moved this here from etl_utils.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid circular imports, to_export here uses to_singer from singer.py and I wanted to use this function in singer.py functions :c

import pandas as pd
import pyarrow.parquet as pq

class Reader:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this just a full copy paste moving it out of etl_utils or did you change anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes full copy paste, no changes at all

@hsyyid hsyyid merged commit 229971d into master Oct 29, 2024
0 of 4 checks passed
@hsyyid hsyyid deleted the hgi-3800 branch October 29, 2024 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants