Change dict to json #1028
Conversation
This doesn't seem to work either, at least with what I tried:

```python
Table([{"dict_value": {"a": 1, "b": 2}}]).to_bigquery("aweisgrau.test", if_exists="drop")
```

yields

```
google_bigquery ERROR {'reason': 'invalid', 'location': 'gs://bkt-tmc-mem-wfp/856f7b4d-b0b1-4e1a-8a56-d8f9bfb1f091.csv', 'message': 'Error while reading data, error message: syntax error while parsing object key - invalid literal; last read: \'{\'\'; expected string literal; line_number: 2 byte_offset_to_start_of_line: 22 column_index: 0 column_name: "dict_value" column_type: JSON value: "{\\\'a\\\': 1, \\\'b\\\': 2}" File: gs://bkt-tmc-mem-wfp/856f7b4d-b0b1-4e1a-8a56-d8f9bfb1f091.csv'}
```
Maybe we could look for any columns containing dicts and convert them to JSON strings before sending them to BigQuery, while keeping the column type defined as JSON?

Or change the way Python dicts are materialized to CSVs so they're JSON compliant, or something.
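For context on why the load fails as-is: Python's default `str()`/`repr()` of a dict uses single quotes, which a strict JSON parser (like BigQuery's) rejects, while `json.dumps()` produces a compliant string. A quick standalone illustration:

```python
import json

value = {"a": 1, "b": 2}

# repr of a Python dict uses single quotes, which is not valid JSON
print(str(value))         # {'a': 1, 'b': 2}
print(json.dumps(value))  # {"a": 1, "b": 2}

# A strict JSON parser rejects the repr form, just like BigQuery does above
try:
    json.loads(str(value))
except json.JSONDecodeError:
    print("not valid JSON")
```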
Thank you so much! I had a feeling I was missing something and couldn't make the time to test it more fully, so I really appreciate this. I like the idea of converting dicts to JSON strings.
This works, but the problem with this implementation is that the Table column is changed in a persistent way after the load. This is probably not great as a side effect.

```python
tbl = Table([{'dict_value': {'a': 1, 'b': 2}}])
type(tbl[0]['dict_value'])
>> dict
tbl.to_bigquery('aweisgrau.test', if_exists='drop')
type(tbl[0]['dict_value'])
>> str
```
I think options include:
- figure out if there's an elegant way to make a separate version of the table temporarily just for the load to bigquery without changing the original table object
- require that folks do this conversion themselves and define the schema with the new string column as a JSON type if they want this behavior
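For the first option, one possible shape is to serialize into a throwaway copy so the caller's table is never mutated. This is a hedged sketch in plain Python, with a list of row dicts standing in for a Parsons Table and `rows_for_load` as a hypothetical helper name:

```python
import copy
import json

def rows_for_load(rows):
    """Return a deep copy of rows with dict/list values JSON-serialized.

    The copy is what gets written to the load file; the original rows
    (and their Python dict values) are left untouched.
    """
    staged = copy.deepcopy(rows)
    for row in staged:
        for key, value in row.items():
            if isinstance(value, (dict, list)):
                row[key] = json.dumps(value)
    return staged

original = [{"dict_value": {"a": 1, "b": 2}}]
staged = rows_for_load(original)

print(type(original[0]["dict_value"]))  # still a dict, unchanged
print(type(staged[0]["dict_value"]))    # a JSON string, ready for the load
```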
Yeah, I knew something kind of funky was going on. I think I lean towards your second idea; it's more flexible and explicit.
Ok, so what I've done is just `json.dumps()` all the dicts and lists, then in dbt I'm using PARSE_JSON. That felt easier in this case than specifying the entire table schema. If that feels like a reasonable expectation, then I think we should just take out the `"dict": "RECORD"` mapping.
Yes, that sounds right to me.
What's the status of this? From this conversation it sounds like you want to update the PR to remove `"dict": "RECORD"` instead of replacing it with `"dict": "JSON"`. Let me know when that's done and I can approve and merge (or feel free to do something else if I've misunderstood, of course!)
I actually think our conclusion was to take `"dict": "RECORD"` out entirely.
Maybe, if any column's best type is a dict, we catch the KeyError and re-raise it with a more verbose description of the situation and how to address it?
Here's my suggestion of what we could do here: #1068 |
LGTM!
Addressed by PR #1068.
Noo @KasiaHinkson sorry - #1068 merged a change into THIS branch, not into main. This PR is now ready to be merged into main, not deleted. |
I'm going to merge, though, since it seems like we're in agreement that it's good to go.
Ohhh yep, my apologies! No power for 2 days turns my brain off, apparently 😆 |
The RECORD type requires the shape of the dictionary to be explicit and fails when the Parsons table has a dict value. I've tested with New/Mode data, and it loads successfully as the JSON type.