[ADAP-1063] [Bug] BigQueryException: Error while reading data, error message: Schema mismatch #1047
Comments
Hi there! Thanks for the issue submission. Could you help us out by explaining more of what you mean by:
What commands are you using, or is it just a Likewise, you've provided some column types, but is there a specific model lineage we should be aware of? |
Updating dbt should not result in dataproc job failing
I run |
@gbmarc1 could you provide us with a simple dbt model that works using dbt-core<1.6.9 and dbt-bigquery<1.6.9 but raises that "Schema mismatch" exception with dbt-core>1.7.0 and dbt-bigquery>1.7.0? We'll need to be able to reproduce this on our side in order to determine how to proceed. Could you provide us a detailed set of steps that would allow us to reproduce this? |
@dbeatty10 Sorry for the delay. You can reproduce the error with this model:
|
I've just upgraded to dbt-core 1.7.5 and dbt-bigquery 1.7.3 and this is still an issue.
|
@gbmarc1 We were able to reproduce the error report for 1.7, but we were not able to reproduce it for 1.6. Could you check again if the example you gave us works in 1.6 for you? Alternatively, @dlubawy if you have an example that works on 1.6 but doesn't work on 1.7, would you please share it?

Namely, this didn't work for us:

import pandas as pd

def model(dbt, session):
    dbt.config(submission_method="serverless", materialized="table")
    df = pd.DataFrame(
        [
            {"column_name": [{"name": "hello", "my_list": ["h", "e", "l", "l", "o"]}]},
        ]
    )
    return df

But this did work:

import pandas as pd

def model(dbt, session):
    dbt.config(submission_method="serverless", materialized="table")
    df = pd.DataFrame(
        [
            {"column_name": {"name": "hello", "my_list": ["h", "e", "l", "l", "o"]}},
        ]
    )
    return df
|
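The two models above differ only in whether the value of `column_name` is wrapped in an outer list. A minimal sketch of that difference, using a hypothetical helper `describe_shape` (not part of dbt, pandas, or the BigQuery connector) that prints a BigQuery-style type for a nested Python value, assuming non-empty lists of a single element type and string leaves only:

```python
def describe_shape(value):
    """Return a BigQuery-style type description for a nested Python value.

    Illustration only: assumes non-empty, homogeneous lists and string leaves.
    """
    if isinstance(value, dict):
        fields = ", ".join(f"{key} {describe_shape(val)}" for key, val in value.items())
        return f"STRUCT<{fields}>"
    if isinstance(value, list):
        # Only the first element is inspected, so the list must be non-empty.
        return f"ARRAY<{describe_shape(value[0])}>"
    if isinstance(value, str):
        return "STRING"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# Cell value from the model that failed on 1.7: a list of records.
failing = [{"name": "hello", "my_list": ["h", "e", "l", "l", "o"]}]
# Cell value from the model that worked: a single record.
working = {"name": "hello", "my_list": ["h", "e", "l", "l", "o"]}

print(describe_shape(failing))  # ARRAY<STRUCT<name STRING, my_list ARRAY<STRING>>>
print(describe_shape(working))  # STRUCT<name STRING, my_list ARRAY<STRING>>
```

The failing case is a repeated record (an array of structs), while the working case is a single record that merely contains a repeated string field.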
@dbeatty10 you are right. As specified in the issue description, it does NOT work on 1.7.x but DOES on 1.6.x |
@dbeatty10 the problem we are seeing is from an example like this:
This code will run on v1.6 and materialize the table correctly, but it does not run in v1.7 (a regression): |
I have exactly the same problem with dbt 1.7.x, so I am not able to store repeated records in BigQuery. Is anyone addressing the issue? Otherwise, I have to stay on version 1.6.8. |
Moving up the queue! |
For anyone still stuck on this problem, add this to the model code:

session.conf.set('intermediateFormat', "orc")

Or:

session.conf.set('intermediateFormat', "parquet")
session.conf.set('enableListInference', "true")

For indirect writes, Spark uses Parquet as the default format for storing data in the temporary bucket, and Parquet requires list inference to be enabled in order to store repeated record data in BigQuery. Alternatively, we can simply set the format to 'orc', which is more efficient for data ingestion. I think dbt should add this. |
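As a rough sketch, the workaround from this comment could be wrapped in a small helper that is called at the top of a Python model. The function name `configure_repeated_record_write` is hypothetical; the option names and values are the ones quoted in the comment, and whether they take effect depends on the Spark BigQuery connector version used by the Dataproc job.

```python
def configure_repeated_record_write(session, intermediate_format="orc"):
    """Apply the workaround from this thread to a Spark session.

    Hypothetical helper: it only sets the options quoted in the comment above.
    """
    if intermediate_format not in ("orc", "parquet"):
        raise ValueError("intermediate_format must be 'orc' or 'parquet'")

    # Format used for the temporary files of an indirect write.
    session.conf.set("intermediateFormat", intermediate_format)

    if intermediate_format == "parquet":
        # Per the comment, Parquet needs list inference for repeated records.
        session.conf.set("enableListInference", "true")
```

Inside a model this would be called as `configure_repeated_record_write(session)` right after `dbt.config(...)` and before the DataFrame is returned.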
Is this a new bug in dbt-bigquery?
Current Behavior
The Dataproc job fails unexpectedly with a schema mismatch error using dbt-core>1.7.0 and dbt-bigquery>1.7.0, but works using dbt-core<1.6.9 and dbt-bigquery<1.6.9.
Expected Behavior
Updating dbt should not result in dataproc job failing
Steps To Reproduce
The column type is:
Relevant log output
Environment
Additional Context
This would impact anyone using Python models with BigQuery who moves from version 1.6 to 1.7.
Currently can't reproduce, will pair with Doug to repro.
No response