-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat!: pass datasource_type and datasource_id to form_data #19981
feat!: pass datasource_type and datasource_id to form_data #19981
Conversation
51c96ba
to
026c064
Compare
026c064
to
70339a3
Compare
Codecov Report
@@ Coverage Diff @@
## master #19981 +/- ##
==========================================
- Coverage 66.47% 66.36% -0.12%
==========================================
Files 1726 1726
Lines 64767 64853 +86
Branches 6828 6828
==========================================
- Hits 43055 43038 -17
- Misses 19977 20080 +103
Partials 1735 1735
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
eeec7eb
to
7e2331b
Compare
@@ -23,7 +23,8 @@ | |||
@dataclass | |||
class CommandParameters: | |||
actor: User | |||
dataset_id: int = 0 | |||
datasource_type: str = "" | |||
datasource_id: int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we have datasource_id as required with a default to 0, so I kept with the same pattern and set datasource_type to an empty string. It almost seems like we should have two different classes here when a key exists and when it doesn't so that we can be more explicit about what is required in each case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea to break it into multiple classes 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but if it's ok, I'll put this into a different PR since it's currently one class, and outside this scope.
if state["owner"] != actor.get_user_id(): | ||
raise TemporaryCacheAccessDeniedError() | ||
tab_id = self._cmd_params.tab_id | ||
contextual_key = cache_key( | ||
session.get("_id"), tab_id, dataset_id, chart_id | ||
session.get("_id"), tab_id, datasource_id, chart_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't we need to add datasource_type
or we would get the same cache hit for datasource_id but different datasource_type?
session.get("_id"), tab_id, datasource_id, chart_id | |
session.get("_id"), tab_id, datasource_id, datasource_id, chart_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks for catching!
"chart_id": chart_id, | ||
"form_data": INITIAL_FORM_DATA, | ||
} | ||
resp = client.post("api/v1/explore/form_data", json=payload) | ||
print(resp.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ty
f6e162c
to
d6228fb
Compare
dcba962
to
c17eef2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @eschutho! I left some comments as a result of a first-pass review.
const sliceId = getUrlParam(URL_PARAMS.sliceId); | ||
if (!datasetId && !sliceId) { | ||
if (!datasourceId && !sliceId && !datasourceType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to consider old URLs that use the datasetId
. Maybe read the old parameter and assign it to datasourceId
and set datasourceType
to table
.
We also need to change the alert message 'The URL is missing the dataset_id or slice_id parameters.'
to also include the dataset_type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, sounds good. I think we set a default for type then we won't be able to validate if it's missing.
superset/explore/utils.py
Outdated
return check_dataset_access(datasource_id) | ||
raise DatasourceNotFoundValidationError | ||
|
||
|
||
def check_dataset_access(dataset_id: int) -> Optional[bool]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check_dataset_access
is not being used anymore, right? If so, can we remove it?
superset/explore/utils.py
Outdated
from superset.views.base import is_user_admin | ||
from superset.views.utils import is_owner | ||
|
||
|
||
def check_datasource_access(datasource_id: int, datasource_type: str) -> Optional[bool]: | ||
if datasource_id: | ||
if datasource_type == DatasourceType.TABLE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if check_dataset_access
depends on the datasource_type
. Shouldn’t the check occur independently of the data source type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, it's not dependent because there's only one type, but when we start adding other types, we'll do something like this:
def check_query_access(query_id: int) -> Optional[bool]:
if query_id:
query = QueryDAO.find_by_id(query_id)
if query:
can_access_datasource = security_manager.raise_for_access(
query=query)
if can_access_datasource:
return True
raise QueryNotFoundError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this code because the query functionality isn't ready yet, but I think it may make sense to have it. I'll add it back in.
@@ -23,7 +23,8 @@ | |||
@dataclass | |||
class CommandParameters: | |||
actor: User | |||
dataset_id: int = 0 | |||
datasource_type: str = "" | |||
datasource_id: int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea to break it into multiple classes 👍🏼
|
||
logger = logging.getLogger(__name__) | ||
|
||
CACHE_IMPORT_PATH = "superset.extensions.metastore_cache.SupersetMetastoreCache" | ||
|
||
|
||
class ExploreFormDataCache(Cache): | ||
def get(self, *args: Any, **kwargs: Any) -> Optional[Union[str, Markup]]: | ||
cache = self.cache.get(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a problem here with old keys because the contextual_key
is now being generated with the datasource_type
but the old keys didn't have it so it will miss the cache and return None
for old keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I added a check using the old keys for update and delete as well. I don't really think there's a good way to remap these, unfortunately. The cache persistence is configurable but the default is 7 days.. maybe we can delete this fallback code after three months or so.
@@ -64,7 +65,7 @@ def run(self) -> Optional[str]: | |||
# Generate a new key if tab_id changes or equals 0 | |||
tab_id = self._cmd_params.tab_id | |||
contextual_key = cache_key( | |||
session.get("_id"), tab_id, dataset_id, chart_id | |||
session.get("_id"), tab_id, datasource_id, chart_id, datasource_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will miss the cache here for old keys because they were generated without the datasource_type
. We can add logic to also query for the old format or assume that new keys will be created and think about how to clean the old entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So currently on update, if the first key can't be found, it will create a new key and return it. That seems ok to me.. do you think that could work for this temporary cache?
@@ -67,65 +67,91 @@ def dataset_id() -> int: | |||
return dataset.id | |||
|
|||
|
|||
@pytest.fixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe have one fixture that returns the datasource
with both id
and type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, good call
|
||
from superset.extensions import cache_manager | ||
from superset.utils.core import DatasourceType | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test that checks for retrieval using keys that were generated without a datasource type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, got it on line 38. 👍
@@ -84,7 +88,8 @@ export const URL_PARAMS = { | |||
export const RESERVED_CHART_URL_PARAMS: string[] = [ | |||
URL_PARAMS.formDataKey.name, | |||
URL_PARAMS.sliceId.name, | |||
URL_PARAMS.datasetId.name, | |||
URL_PARAMS.datasourceId.name, | |||
URL_PARAMS.datasourceType.name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michael-s-molina this could be an old URL, too? I'll datasetId back in here..
Thanks for the review and added context @michael-s-molina. I made some small changes based on the feedback, but still have to come back to this and add in the check access query logic that I removed. |
0027b4a
to
2bfdb1b
Compare
ce09ce1
to
0deb744
Compare
if state["owner"] != get_owner(actor): | ||
raise TemporaryCacheAccessDeniedError() | ||
tab_id = self._cmd_params.tab_id | ||
contextual_key = cache_key( | ||
session.get("_id"), tab_id, dataset_id, chart_id | ||
session.get("_id"), tab_id, datasource_id, chart_id, datasource_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: I still need to write up something here for deleting old keys. Unless it's ok that it just expires on it's own. The user will get a 404 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are generating new keys for misses, i think it's fine to let the keys expire
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's do that, it's only for delete and update. Unless anyone else has thoughts.
assert isinstance(command.run(), str) | ||
|
||
|
||
# TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to finish writing up these tests, too.
This is mostly done.. I just found one other thing last minute that I'd like to fix. I'll still welcome any feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
2 things i want to call out though is having a regression testing to make sure this works intended and we aren't deploying an huge bugs into the explore view.
Also is possible for us to add some statsd metrics to track hit vs. misses with the cache. I think this will help us understand how the logic is effecting the end user with pulling data. (this can be another ticket)
/testenv up |
@jinghua-qa Ephemeral environment spinning up at http://54.213.76.156:8080. Credentials are |
Testing are mostly done, tested regression for the following behavior: did not find major issues in explore, found 1 issue with chart share in dashboard: Screen.Recording.2022-06-01.at.12.53.19.PM.mov. |
Thanks @jinghua-qa, I didn't know about that feature! I checked it on my local and then on the ephemeral and I see the highlighting.. is this the expected behavior? Screen.Recording.2022-06-02.at.9.15.38.AM.mov |
38ec60f
to
917dcf4
Compare
🚢 🚢 🚢 |
Ephemeral environment shutdown and build artifacts deleted. |
SUMMARY
With SIP68 we will be deprecating the ConnectorRegistry and instead having a fixed set of datasources. We currently pass both a dataset_id and datasource_type in the form_data, but we are only passing in the dataset_id to the api itself. In cases where there is no form data, we usually default to a "table" datasource, but this PR allows us to be more flexible about having different types of datasources in the future and changes the api to pass in both the datasource_type and id. We are introducing the intent to start using more datasources in SIP81.
The goal with the cache keys and temporary explore state is to be able to read from the existing format (dataset_id) but write to the new format (datasource_id and datasource_type). Any existing keys in the old format would default to a type of 'table'.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
No visual changes
TESTING INSTRUCTIONS
Existing tests have been updated and a few new tests added to account for existing form_data cache structure.
To test, you should be able to go through the entire explore flow without any issues.
ADDITIONAL INFORMATION