-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dagster-tableau] Exploring embedded data sources #27218
base: master
Are you sure you want to change the base?
[dagster-tableau] Exploring embedded data sources #27218
Conversation
…data sources as well in connections
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution @VenkyRules!
I've added some comments. Also, could we add more tests to test these changes?
): | ||
data_source_id = published_data_source_data["luid"] | ||
published_data_source_list = embedded_data_source_data.get("parentPublishedDatasources", []) | ||
if len(published_data_source_list) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[1] I believe you don't need this condition. If published_data_source_list is empty, nothing will be added to data_source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes removed this if condition.
properties=published_data_source_data, | ||
) | ||
) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[1] Here you could test if data_source
which explicitly states that we fallback to the embedded data sources because no published data source was added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added an alternative logic to this. Please review it once.
if data_source_id and data_source_id not in data_source_ids: | ||
data_source_ids.add(data_source_id) | ||
embedded_data_source_data["luid"] = data_source_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment here explaining why we use luid
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below is the code snippet from translator.py which is called by resources.py here it seems luid is mandatory and in case of embedded_data_sources it was missing. Hence we are creating luid's for those using id of embedded_data_sources.
method name -> def from_content_data ()
data_sources_by_id={
data_source.properties["luid"]: data_source
for data_source in content_data
if data_source.content_type == TableauContentType.DATA_SOURCE
},
|
||
data_source_ids = [] | ||
for embedded_data_source in sheet_embedded_data_sources: | ||
embedded_data_source_list = embedded_data_source.get("parentPublishedDatasources", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
embedded_data_source_list = embedded_data_source.get("parentPublishedDatasources", []) | |
published_data_source_list = embedded_data_source.get("parentPublishedDatasources", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name updated to published_data_source_list it makes more sense
for published_data_source in embedded_data_source.get("parentPublishedDatasources", []) | ||
} | ||
|
||
data_source_ids = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data_source_ids
used to be a set - could we still use a set here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yess have updated the implementation. Please review once again.
embedded_data_source_list = embedded_data_source.get("parentPublishedDatasources", []) | ||
if not embedded_data_source_list: | ||
data_source_ids.append(embedded_data_source["id"]) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the else statement here, iterating over an empty list won't add anything to data_source_ids in the code below. Adding a comment before the if not embedded_data_source_list
condition and here would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yess have removed this if statement and iterated over the list so ids would be added if list is not empty.
But have added a flag which lets us know if there were any published data sources already added. And if flag is unchanged we go further and add embedded_data_source id to the list. Please let me know if this logic makes sense to you. Have also added similar comments to these methods.
@maximearmstrong - Thanks for your review and comments. Yes will try to add some tests for these latest changes. |
Summary & Motivation
How I Tested These Changes
Tested on local system with the help of docker desktop