
Change dict to json #1028

Merged: 8 commits merged into main from kasiahinkson/list-data-type on Jul 11, 2024

Conversation

@KasiaHinkson
Contributor

The RECORD type requires the shape of the dictionary to be explicit, and it fails when the Parsons table has a dict value. I've tested with New/Mode data, and it loads successfully as the JSON type.
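In sketch form, the change amounts to remapping Python dicts in the Python-to-BigQuery type map (the map name and contents below are illustrative, not necessarily Parsons' internal names):

# Hypothetical type map; this PR's idea is the last entry.
PYTHON_TO_BIGQUERY_TYPES = {
    "str": "STRING",
    "int": "INTEGER",
    "float": "FLOAT",
    "bool": "BOOLEAN",
    "dict": "JSON",  # previously "RECORD", which needs an explicit nested schema
}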

@austinweisgrau
Collaborator

This doesn't seem to work either, at least with what I tried:

Table([{"dict_value": {"a": 1, "b": 2}}]).to_bigquery("aweisgrau.test", if_exists="drop")

yields

google_bigquery ERROR {'reason': 'invalid', 'location': 'gs://bkt-tmc-mem-wfp/856f7b4d-b0b1-4e1a-8a56-d8f9bfb1f091.csv', 'message': 'Error while reading data, error message: syntax error while parsing object key - invalid literal; last read: \'{\'\'; expected string literal; line_number: 2 byte_offset_to_start_of_line: 22 column_index: 0 column_name: "dict_value" column_type: JSON value: "{\\\'a\\\': 1, \\\'b\\\': 2}" File: gs://bkt-tmc-mem-wfp/856f7b4d-b0b1-4e1a-8a56-d8f9bfb1f091.csv'}
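The error message points at the root cause: writing the dict to CSV with Python's default str() produces single-quoted keys, which is not valid JSON. A quick demonstration of the difference:

import json

value = {"a": 1, "b": 2}
print(str(value))         # {'a': 1, 'b': 2}  <- single quotes, not valid JSON
print(json.dumps(value))  # {"a": 1, "b": 2}  <- valid JSON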

@austinweisgrau
Collaborator

Maybe we could look for any columns with dicts and convert them to JSON strings before sending them to BigQuery, while keeping the column type defined as JSON?
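A rough sketch of that suggestion (the helper name is hypothetical, not Parsons API):

import json

def jsonify_dict_columns(tbl):
    # Illustrative helper: rewrite any dict or list cell as a JSON
    # string so BigQuery can parse the CSV cell as JSON.
    for column in tbl.columns:
        tbl.convert_column(
            column,
            lambda v: json.dumps(v) if isinstance(v, (dict, list)) else v,
        )
    # Note: convert_column mutates the table in place, which becomes
    # the concern discussed later in this thread.
    return tbl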

@austinweisgrau
Collaborator

Orrrr, change the way Python dicts are materialized to CSVs to be JSON-compliant, or something.
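A minimal illustration of that idea using the standard library's csv module (not Parsons' actual CSV writer):

import csv
import io
import json

rows = [{"dict_value": {"a": 1, "b": 2}}]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["dict_value"])
writer.writeheader()
for row in rows:
    # json.dumps (instead of str) makes the cell valid JSON for BigQuery.
    writer.writerow(
        {k: json.dumps(v) if isinstance(v, dict) else v for k, v in row.items()}
    )
print(buf.getvalue())
# dict_value
# "{""a"": 1, ""b"": 2}"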

@KasiaHinkson
Contributor Author

Thank you so much! I had a feeling I was missing something and couldn't make the time to test it more fully, so I super appreciate this. I like the idea of converting dicts to JSON strings.

@austinweisgrau
Collaborator

This works, but the problem with this implementation is that the Table column is changed in a persistent way after the load. That's probably not a great side effect.

tbl = Table([{'dict_value': {'a': 1, 'b': 2}}])

type(tbl[0]['dict_value'])
>> dict

tbl.to_bigquery('aweisgrau.test', if_exists='drop')
type(tbl[0]['dict_value'])
>> str

I think options include:

  • figuring out an elegant way to make a separate, temporary copy of the table just for the BigQuery load, without changing the original table object (see the sketch after this list)
  • requiring that folks do this conversion themselves and define the schema with the new string column as a JSON type if they want this behavior
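A rough sketch of the first option (the wrapper name is hypothetical, and it assumes an in-memory table where a deep copy is safe):

import copy
import json

def to_bigquery_as_json(tbl, *args, **kwargs):
    # Hypothetical wrapper: serialize dict/list cells on a copy of the
    # table so the caller's Table object is left unchanged.
    load_tbl = copy.deepcopy(tbl)  # assumes an in-memory table
    for column in load_tbl.columns:
        load_tbl.convert_column(
            column,
            lambda v: json.dumps(v) if isinstance(v, (dict, list)) else v,
        )
    return load_tbl.to_bigquery(*args, **kwargs)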

@KasiaHinkson
Contributor Author

Yeah, I knew something kind of funky was going on. I think I lean towards your second idea; it's more flexible and explicit.

@KasiaHinkson
Contributor Author

OK, so what I've done is just json.dumps() all the dicts and lists, and then in dbt I'm using PARSE_JSON. That felt easier in this case than specifying the entire table schema. If that feels like a reasonable expectation, then I think we should just take out the dict: RECORD mapping, since that doesn't work, and add some documentation wherever it fits best. What do you think?
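End to end, that workflow might look like this (the dataset and table names are placeholders; PARSE_JSON is BigQuery's SQL function for parsing a JSON-formatted string):

import json

from parsons import Table

tbl = Table([{"dict_value": {"a": 1, "b": 2}}])
# Serialize the dict column to JSON strings; it loads as a STRING column.
tbl.convert_column("dict_value", json.dumps)
tbl.to_bigquery("some_dataset.some_table", if_exists="drop")

# Downstream (e.g. in a dbt model), parse the string back into JSON:
# SELECT PARSE_JSON(dict_value) AS dict_value
# FROM some_dataset.some_table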

@austinweisgrau
Collaborator

Yes, that sounds right to me.

@shaunagm
Collaborator

What's the status of this? From this conversation, it sounds like you want to update the PR to remove "dict": "RECORD" instead of replacing it with "dict": "JSON". Let me know when that's done and I can approve and merge (or feel free to do something else if I've misunderstood, of course!)

@KasiaHinkson
Contributor Author

I actually think our conclusion was to take dict out entirely and add documentation saying that, before running a copy function, users should convert dictionaries to JSON strings and then use PARSE_JSON in BigQuery. I'm not 100% sure where to write this documentation.

@austinweisgrau
Collaborator

Maybe if any column's best type is a dict, we could catch the KeyError and re-raise it with a more verbose description of the situation and how to address it?
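That might look something like this sketch (the type-map name and the message wording are illustrative):

# Hypothetical map; dicts are intentionally absent.
PYTHON_TO_BIGQUERY_TYPES = {"str": "STRING", "int": "INTEGER"}

def get_bigquery_type(python_type_name: str) -> str:
    # Re-raise the KeyError for dicts with actionable guidance.
    try:
        return PYTHON_TO_BIGQUERY_TYPES[python_type_name]
    except KeyError as exc:
        if python_type_name == "dict":
            raise KeyError(
                "BigQuery cannot infer a schema for dict columns. "
                "Convert dict values to JSON strings (e.g. json.dumps) "
                "before loading, then use PARSE_JSON in BigQuery."
            ) from exc
        raise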

@austinweisgrau
Collaborator

Here's my suggestion of what we could do here: #1068

@austinweisgrau
Collaborator

LGTM!

@KasiaHinkson
Contributor Author

Addressed by PR #1068

@austinweisgrau
Collaborator

austinweisgrau commented Jul 11, 2024

Noo @KasiaHinkson, sorry: #1068 merged a change into THIS branch, not into main. This PR is now ready to be merged into main, not deleted.

@austinweisgrau
Collaborator

I'm going to merge, though, since it seems like we're in agreement that it's good to go.

@austinweisgrau merged commit 118d744 into main on Jul 11, 2024
34 checks passed
@austinweisgrau deleted the kasiahinkson/list-data-type branch on July 11, 2024 at 20:12
@KasiaHinkson
Contributor Author

Ohhh yep, my apologies! No power for 2 days turns my brain off, apparently 😆
