Use Glue Data Catalog with Athena #285

Closed
jasonthomas opened this issue Nov 28, 2017 · 5 comments

@jasonthomas
Member

jasonthomas commented Nov 28, 2017

We've seen truncated results when querying the information_schema tables for table metadata. AWS recommends querying the Glue Data Catalog instead.

This functionality was added upstream (getredash#2045). Can we pull this in?
/cc @robotblake
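For illustration, here is a minimal sketch of reading table metadata from the Glue Data Catalog with boto3 instead of information_schema. It is not the code from getredash#2045. The function name `get_schema_from_glue` and the "database.table" to column-list output shape are assumptions. It uses the Glue GetDatabases and GetTables paginators, since both APIs return their results in pages.

```python
def get_schema_from_glue(glue_client):
    """Return a dict mapping "database.table" to a list of column names.

    Hypothetical helper. `glue_client` is assumed to be a boto3 Glue client,
    e.g. boto3.client("glue"), with glue:GetDatabases and glue:GetTables access.
    """
    schema = {}
    # Both APIs are paged (NextToken), so iterate with paginators.
    db_pages = glue_client.get_paginator("get_databases").paginate()
    table_paginator = glue_client.get_paginator("get_tables")
    for db_page in db_pages:
        for database in db_page["DatabaseList"]:
            db_name = database["Name"]
            for table_page in table_paginator.paginate(DatabaseName=db_name):
                for table in table_page["TableList"]:
                    storage = table.get("StorageDescriptor", {})
                    columns = [c["Name"] for c in storage.get("Columns", [])]
                    # Partition keys (e.g. date, hour) are listed separately
                    # from the regular columns, so append them explicitly.
                    columns += [p["Name"] for p in table.get("PartitionKeys", [])]
                    schema[f"{db_name}.{table['Name']}"] = columns
    return schema
```

Usage would look like `get_schema_from_glue(boto3.client("glue", region_name="us-west-2"))`; the region is only an example. Passing the client in, rather than creating it inside the function, keeps credentials and region configuration with the caller.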

@washort

washort commented Jan 8, 2018

This code is now in our fork.

@madalincm

I tried running queries against Athena in stage, and several errors are displayed, e.g.:

  • Running a query against ‘stmo.containers_testpilottest’ throws the following error: “Error running query: HIVE_BAD_DATA: Field userContextId's type INT64 in parquet is incompatible with type varchar defined in table schema” (https://pipeline-sql.stage.mozaws.net/queries/244/source)

  • Running a query against ‘normandy_stage.auto_parquet_log_normandy_app_docker_app’ throws the following error: “Error running query: HIVE_PARTITION_SCHEMA_MISMATCH: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'fields' in table 'normandy_stage.auto_parquet_log_normandy_app_docker_app' is declared as type 'struct<agent:string,errno:bigint,lang:string,method:string,path:string,rid:string,t:bigint,user_agent_browser:string,user_agent_os:string,user_agent_version:bigint>', but partition 'date=2017-08-24/hour=18' declared column 'fields' as type 'struct<agent:string,errno:bigint,lang:string,method:string,path:string,rid:string,t:bigint>'.” (https://pipeline-sql.stage.mozaws.net/queries/245/source#table)

@rafrombrc I'm not sure how relevant testing Athena in stage is, and I'm not sure how we should proceed with this issue.
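The HIVE_PARTITION_SCHEMA_MISMATCH error above means a partition (date=2017-08-24/hour=18) declares `fields` with fewer struct members than the table does. A hedged sketch for listing such partitions from the Glue Data Catalog follows: it compares each partition's column types with the table's and reports the differences. The function name is hypothetical, and it assumes a boto3 Glue client, using only GetTable and the GetPartitions paginator. It does not catch the HIVE_BAD_DATA case, where the mismatch is between the Parquet files and the table definition rather than between catalog entries.

```python
def find_partition_type_mismatches(glue_client, database, table):
    """Return {partition_values: {column: (table_type, partition_type)}}.

    Hypothetical diagnostic; `glue_client` is assumed to be a boto3 Glue client.
    """
    table_def = glue_client.get_table(DatabaseName=database, Name=table)["Table"]
    table_types = {
        c["Name"]: c["Type"] for c in table_def["StorageDescriptor"]["Columns"]
    }
    mismatches = {}
    paginator = glue_client.get_paginator("get_partitions")
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        for partition in page["Partitions"]:
            diffs = {}
            for col in partition["StorageDescriptor"]["Columns"]:
                expected = table_types.get(col["Name"])
                # Only report columns the table also declares, with a different type.
                if expected is not None and expected != col["Type"]:
                    diffs[col["Name"]] = (expected, col["Type"])
            if diffs:
                # Partition values, e.g. ("2017-08-24", "18") for date/hour.
                mismatches[tuple(partition["Values"])] = diffs
    return mismatches
```

For the table in the second error, a call would be `find_partition_type_mismatches(boto3.client("glue"), "normandy_stage", "auto_parquet_log_normandy_app_docker_app")`.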

@washort

washort commented Jan 11, 2018

Since this is just a toggle, we can try turning it on in production and switching it back off if it produces these errors.

@madalincm

I am blocked from verifying queries on Athena in prod. I have logged #306.

@madalincm

Athena is now available in prod. I ran some queries against Athena, and they seem to work fine. Marking this bug as verified fixed.
