[SPARK-51152][PYTHON][SQL][DOCS] Add usage examples for the get_json_object function

### What changes were proposed in this pull request?
This PR adds usage examples for the `get_json_object` function, including `get_json_object('[{"a":"b"},{"a":"c"}]', '$[0].a')` and `get_json_object('[{"a":"b"},{"a":"c"}]', '$[*].a')`.

### Why are the changes needed?
When the JSON value is an array, some users may not know how to retrieve its data with `get_json_object`, so this PR adds usage examples for that case.
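For illustration, the path semantics the new examples demonstrate can be approximated in plain Python. This is a simplified sketch, not Spark's actual implementation; `extract` is a hypothetical helper covering only the path shapes shown in this PR (`$.a`, `$[n].a`, `$[*].a`):

```python
import json
import re


def extract(json_str, path):
    """Simplified approximation (illustrative only) of Spark's
    get_json_object path semantics for '$.a', '$[0].a', '$[*].a'."""
    # Tokenize the part after '$' into [n], [*], and .field steps.
    tokens = re.findall(r"\[\*\]|\[\d+\]|\.\w+", path)
    values = [json.loads(json_str)]
    wildcard = False
    for tok in tokens:
        if tok == "[*]":
            # Wildcard: fan out over every element of each array.
            values = [v for arr in values for v in arr]
            wildcard = True
        elif tok.startswith("["):
            # Fixed index: pick one element from each array.
            idx = int(tok[1:-1])
            values = [v[idx] for v in values]
        else:
            # Field access: keep only objects that contain the key.
            key = tok[1:]
            values = [v[key] for v in values
                      if isinstance(v, dict) and key in v]
    if not values:
        return None  # no match -> NULL in Spark
    if wildcard:
        # After a wildcard the result is re-encoded as JSON: a single
        # match stays quoted, multiple matches become a JSON array.
        if len(values) == 1:
            return json.dumps(values[0])
        return json.dumps(values, separators=(",", ":"))
    return values[0]


print(extract('[{"a":"b"},{"a":"c"}]', '$[0].a'))  # b
print(extract('[{"a":"b"},{"a":"c"}]', '$[*].a'))  # ["b","c"]
```

Note how `$[0].a` yields a bare scalar while `$[*].a` yields a JSON-encoded result, matching the outputs in the examples added below.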

### Does this PR introduce _any_ user-facing change?
Yes, from the new examples Spark end-users will learn how to use `get_json_object` to extract data from JSON arrays.

### How was this patch tested?
- Passed GA.
- Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49875 from fusheng9399/add-json-example.

Lead-authored-by: fusheng <fusheng9399@gmail.com>
Co-authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: panbingkun <panbingkun@apache.org>
3 people authored and panbingkun committed Feb 18, 2025
1 parent 2c76dff commit ef0685a
Showing 2 changed files with 45 additions and 3 deletions.
44 changes: 41 additions & 3 deletions python/pyspark/sql/functions/builtin.py
@@ -20146,11 +20146,49 @@ def get_json_object(col: "ColumnOrName", path: str) -> Column:

Examples
--------
Example 1: Extract a JSON object from a JSON string

>>> data = [("1", '''{"f1": "value1", "f2": "value2"}'''), ("2", '''{"f1": "value12"}''')]
>>> df = spark.createDataFrame(data, ("key", "jstring"))
>>> df.select(df.key, get_json_object(df.jstring, '$.f1').alias("c0"), \\
... get_json_object(df.jstring, '$.f2').alias("c1") ).collect()
[Row(key='1', c0='value1', c1='value2'), Row(key='2', c0='value12', c1=None)]
>>> df.select(df.key,
... get_json_object(df.jstring, '$.f1').alias("c0"),
... get_json_object(df.jstring, '$.f2').alias("c1")
... ).show()
+---+-------+------+
|key| c0| c1|
+---+-------+------+
| 1| value1|value2|
| 2|value12| NULL|
+---+-------+------+

Example 2: Extract a JSON object from a JSON array

>>> data = [
... ("1", '''[{"f1": "value1"},{"f1": "value2"}]'''),
... ("2", '''[{"f1": "value12"},{"f2": "value13"}]''')
... ]
>>> df = spark.createDataFrame(data, ("key", "jarray"))
>>> df.select(df.key,
... get_json_object(df.jarray, '$[0].f1').alias("c0"),
... get_json_object(df.jarray, '$[1].f2').alias("c1")
... ).show()
+---+-------+-------+
|key| c0| c1|
+---+-------+-------+
| 1| value1| NULL|
| 2|value12|value13|
+---+-------+-------+

>>> df.select(df.key,
... get_json_object(df.jarray, '$[*].f1').alias("c0"),
... get_json_object(df.jarray, '$[*].f2').alias("c1")
... ).show()
+---+-------------------+---------+
|key| c0| c1|
+---+-------------------+---------+
| 1|["value1","value2"]| NULL|
| 2| "value12"|"value13"|
+---+-------------------+---------+
"""
from pyspark.sql.classic.column import _to_java_column

@@ -42,6 +42,10 @@ import org.apache.spark.unsafe.types.UTF8String
Examples:
> SELECT _FUNC_('{"a":"b"}', '$.a');
b
> SELECT _FUNC_('[{"a":"b"},{"a":"c"}]', '$[0].a');
b
> SELECT _FUNC_('[{"a":"b"},{"a":"c"}]', '$[*].a');
["b","c"]
""",
group = "json_funcs",
since = "1.5.0")
