Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dagster-tableau] Exploring embedded data sources #27218

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

VenkyRules
Copy link

Summary & Motivation

  1. Current implementation was fetching limited metadata from tableau which was only limited to id and names, but have added few more fields like upstreamTables and databases details and many more fields.
  2. Earlier we were only showing published data sources and ignoring embedded data sources. With this changes we are showing embedded data sources in case published data sources are not present.

How I Tested These Changes

Tested on local system with the help of docker desktop

Copy link

vercel bot commented Jan 20, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
dagster-docs-legacy ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 20, 2025 11:33am

Copy link
Contributor

@maximearmstrong maximearmstrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution @VenkyRules!

I've added some comments. Also, could we add more tests to test these changes?

):
data_source_id = published_data_source_data["luid"]
published_data_source_list = embedded_data_source_data.get("parentPublishedDatasources", [])
if len(published_data_source_list) > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1] I believe you don't need this condition. If published_data_source_list is empty, nothing will be added to data_source

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes removed this if condition.

properties=published_data_source_data,
)
)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1] Here you could test if data_source which explicitly states that we fallback to the embedded data sources because no published data source was added.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added an alternative logic to this. Please review it once.

if data_source_id and data_source_id not in data_source_ids:
data_source_ids.add(data_source_id)
embedded_data_source_data["luid"] = data_source_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a comment here explaining why we use luid?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is the code snippet from translator.py which is called by resources.py here it seems luid is mandatory and in case of embedded_data_sources it was missing. Hence we are creating luid's for those using id of embedded_data_sources.

method name -> def from_content_data ()

        data_sources_by_id={
            data_source.properties["luid"]: data_source
            for data_source in content_data
            if data_source.content_type == TableauContentType.DATA_SOURCE
        },


data_source_ids = []
for embedded_data_source in sheet_embedded_data_sources:
embedded_data_source_list = embedded_data_source.get("parentPublishedDatasources", [])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
embedded_data_source_list = embedded_data_source.get("parentPublishedDatasources", [])
published_data_source_list = embedded_data_source.get("parentPublishedDatasources", [])

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name updated to published_data_source_list it makes more sense

for published_data_source in embedded_data_source.get("parentPublishedDatasources", [])
}

data_source_ids = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_source_ids used to be a set - could we still use a set here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yess have updated the implementation. Please review once again.

embedded_data_source_list = embedded_data_source.get("parentPublishedDatasources", [])
if not embedded_data_source_list:
data_source_ids.append(embedded_data_source["id"])
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the else statement here, iterating over an empty list won't add anything to data_source_ids in the code below. Adding a comment before the if not embedded_data_source_list condition and here would be useful.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yess have removed this if statement and iterated over the list so ids would be added if list is not empty.
But have added a flag which lets us know if there were any published data sources already added. And if flag is unchanged we go further and add embedded_data_source id to the list. Please let me know if this logic makes sense to you. Have also added similar comments to these methods.

@VenkyRules
Copy link
Author

VenkyRules commented Jan 23, 2025

@maximearmstrong - Thanks for your review and comments. Yes will try to add some tests for these latest changes.

@VenkyRules VenkyRules changed the title Feat/exploring embedded data sources [dagster-tableau] Exploring embedded data sources Jan 24, 2025
@VenkyRules VenkyRules marked this pull request as ready for review January 24, 2025 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants