Schema browser doesn't show columns for some ping tables #988

Open
sunahsuh opened this issue Sep 23, 2019 · 4 comments

@sunahsuh

I noticed this while trying to look at the schema for telemetry.voice under the "BigQuery (Beta)" source, and I see the same problem for crash, event, and main, which are all direct ping tables. Interestingly, downgrade, first_shutdown, and voice_feedback, which are also direct ping tables, are not affected. @emtwo suggested we might have issues with fetching schemas for views, but judging by BigQuery's schema browser, nearly all of the tables in moz-fx-data-derived-datasets:telemetry are views.

Another oddity I just noticed: the voice schema is visible in the "BigQuery (Alpha)" source.

@emtwo

emtwo commented Oct 2, 2019

I've investigated, and it looks like BigQuery (Alpha) and BigQuery (Beta) actually reference the same views, and other views in the same data source do have visible schemas. So as you said, @sunahsuh, this is unexpected and probably unrelated to the tables being views.

However, timeouts are occurring while the schema processing task runs, so some table information never gets updated or stored. This data source probably just has a lot of tables, and the task doesn't finish processing schemas for all of them in time.

This needs further investigation to see whether the task only slightly exceeds the timeout, in which case we can just increase it, or whether the processing needs to be broken up further in some way.
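To make the "broken up further" option concrete, here is a minimal sketch of processing tables in batches under a time budget, so that a run that hits the limit still saves partial progress and reports what is left for the next run. This is not Redash's actual task code: `process_in_batches`, `process_batch`, and the `clock` parameter are hypothetical names used only for illustration.

```python
import time
from typing import Callable, List


def process_in_batches(
    tables: List[str],
    process_batch: Callable[[List[str]], None],
    batch_size: int,
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Process tables batch by batch within a time budget.

    Returns the tables that were NOT processed because the budget ran out,
    so a later run can pick up where this one stopped.
    All names here are hypothetical; this is a sketch, not Redash's code.
    """
    deadline = clock() + budget_seconds
    for start in range(0, len(tables), batch_size):
        # Check the budget before starting each batch, never mid-batch.
        if clock() >= deadline:
            return tables[start:]
        process_batch(tables[start:start + batch_size])
    return []
```

The budget is only checked between batches, so a single slow batch can still overrun it; a smaller `batch_size` tightens that bound at the cost of more round trips.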

@emtwo

emtwo commented Oct 2, 2019

A link back to the same issue filed elsewhere: https://bugzilla.mozilla.org/show_bug.cgi?id=1584036

@emtwo

emtwo commented Oct 9, 2019

Another update:
We ran the schema processing function manually, without a timeout. It took ~20 min (the timeout is 10 min). This resolved the issue for the time being: telemetry.voice and the other tables should all have visible schemas now.

Essentially, many tables were recently removed and added within a short time frame, so when Redash processed the schema changes, it needed more time than usual to prune the old tables and add the new ones. Every run timed out, so schema updates never completed.

We will need to decide whether to increase the timeout to accommodate such scenarios, rewrite the update function to be more efficient, or both, depending on how frequently big schema changes like this are likely to occur.
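As one illustration of what a more efficient update could look like, the sketch below compares the stored schema with the freshly fetched one and returns only the tables that need to be pruned, added, or rewritten, so unchanged tables are skipped. It assumes both schemas fit in memory as a mapping from table name to column list. The function name `diff_schemas` and the data shapes are hypothetical, not taken from Redash.

```python
from typing import Dict, List, Tuple


def diff_schemas(
    stored: Dict[str, List[str]],
    current: Dict[str, List[str]],
) -> Tuple[List[str], List[str], List[str]]:
    """Return (removed, added, changed) table names.

    `stored` is what the schema browser already has; `current` is what the
    data source reports now. Both map table name -> list of column names.
    Hypothetical helper for illustration only.
    """
    # Tables that no longer exist in the data source: prune these.
    removed = sorted(stored.keys() - current.keys())
    # Tables that are new in the data source: add these.
    added = sorted(current.keys() - stored.keys())
    # Tables present in both whose column lists differ: rewrite these.
    changed = sorted(
        name for name in stored.keys() & current.keys()
        if stored[name] != current[name]
    )
    return removed, added, changed
```

For example, with a made-up `telemetry.old` table that has since been dropped:

```python
stored = {"telemetry.old": ["a"], "telemetry.main": ["x"]}
current = {"telemetry.main": ["x", "y"], "telemetry.voice": ["z"]}
removed, added, changed = diff_schemas(stored, current)
```

Here only `telemetry.old` would be pruned, `telemetry.voice` added, and `telemetry.main` rewritten.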

@sunahsuh
Author

sunahsuh commented Oct 9, 2019

Awesome, thanks for looking into this @emtwo! I'm fine with closing this since the immediate problem is fixed, but if you'd like to keep it open to track the decision on a long-term solution, that's okay with me.
