[SPARK-51152][PYTHON][SQL][DOCS] Add usage examples for the get_json_object function

### What changes were proposed in this pull request?
This PR adds usage examples for the `get_json_object` function, including `get_json_object('[{"a":"b"},{"a":"c"}]', '$[0].a')` and `get_json_object('[{"a":"b"},{"a":"c"}]', '$[*].a')`.

### Why are the changes needed?
When the JSON value is an array, some users may not know how to retrieve its data with `get_json_object`, so this PR adds usage examples for that case.
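For illustration, the path semantics the new examples demonstrate can be approximated in plain Python. This is a simplified sketch, not Spark's actual implementation; `extract` is a hypothetical helper covering only the path shapes shown in this PR (`$.a`, `$[n].a`, `$[*].a`):

```python
import json
import re


def extract(json_str, path):
    """Simplified approximation (illustrative only) of Spark's
    get_json_object path semantics for '$.a', '$[0].a', '$[*].a'."""
    # Tokenize the part after '$' into [n], [*], and .field steps.
    tokens = re.findall(r"\[\*\]|\[\d+\]|\.\w+", path)
    values = [json.loads(json_str)]
    wildcard = False
    for tok in tokens:
        if tok == "[*]":
            # Wildcard: fan out over every element of each array.
            values = [v for arr in values for v in arr]
            wildcard = True
        elif tok.startswith("["):
            # Fixed index: pick one element from each array.
            idx = int(tok[1:-1])
            values = [v[idx] for v in values]
        else:
            # Field access: keep only objects that contain the key.
            key = tok[1:]
            values = [v[key] for v in values
                      if isinstance(v, dict) and key in v]
    if not values:
        return None  # no match -> NULL in Spark
    if wildcard:
        # After a wildcard the result is re-encoded as JSON: a single
        # match stays quoted, multiple matches become a JSON array.
        if len(values) == 1:
            return json.dumps(values[0])
        return json.dumps(values, separators=(",", ":"))
    return values[0]


print(extract('[{"a":"b"},{"a":"c"}]', '$[0].a'))  # b
print(extract('[{"a":"b"},{"a":"c"}]', '$[*].a'))  # ["b","c"]
```

Note how `$[0].a` yields a bare scalar while `$[*].a` yields a JSON-encoded result, matching the outputs in the examples added below.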

### Does this PR introduce _any_ user-facing change?
Yes, from the new examples Spark end-users will learn how to use `get_json_object` to extract data from JSON arrays.

### How was this patch tested?
- Passed GA.
- Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49875 from fusheng9399/add-json-example.

Lead-authored-by: fusheng <fusheng9399@gmail.com>
Co-authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: panbingkun <panbingkun@apache.org>
3 people authored and panbingkun committed Feb 18, 2025
1 parent 2c76dff commit ef0685a
Showing 2 changed files with 45 additions and 3 deletions.
44 changes: 41 additions & 3 deletions python/pyspark/sql/functions/builtin.py
@@ -20146,11 +20146,49 @@ def get_json_object(col: "ColumnOrName", path: str) -> Column:

Examples
--------
Example 1: Extract a JSON object from a JSON string

>>> data = [("1", '''{"f1": "value1", "f2": "value2"}'''), ("2", '''{"f1": "value12"}''')]
>>> df = spark.createDataFrame(data, ("key", "jstring"))
>>> df.select(df.key, get_json_object(df.jstring, '$.f1').alias("c0"), \\
... get_json_object(df.jstring, '$.f2').alias("c1") ).collect()
[Row(key='1', c0='value1', c1='value2'), Row(key='2', c0='value12', c1=None)]
>>> df.select(df.key,
... get_json_object(df.jstring, '$.f1').alias("c0"),
... get_json_object(df.jstring, '$.f2').alias("c1")
... ).show()
+---+-------+------+
|key| c0| c1|
+---+-------+------+
| 1| value1|value2|
| 2|value12| NULL|
+---+-------+------+

Example 2: Extract a JSON object from a JSON array

>>> data = [
... ("1", '''[{"f1": "value1"},{"f1": "value2"}]'''),
... ("2", '''[{"f1": "value12"},{"f2": "value13"}]''')
... ]
>>> df = spark.createDataFrame(data, ("key", "jarray"))
>>> df.select(df.key,
... get_json_object(df.jarray, '$[0].f1').alias("c0"),
... get_json_object(df.jarray, '$[1].f2').alias("c1")
... ).show()
+---+-------+-------+
|key| c0| c1|
+---+-------+-------+
| 1| value1| NULL|
| 2|value12|value13|
+---+-------+-------+

>>> df.select(df.key,
... get_json_object(df.jarray, '$[*].f1').alias("c0"),
... get_json_object(df.jarray, '$[*].f2').alias("c1")
... ).show()
+---+-------------------+---------+
|key| c0| c1|
+---+-------------------+---------+
| 1|["value1","value2"]| NULL|
| 2| "value12"|"value13"|
+---+-------------------+---------+
"""
from pyspark.sql.classic.column import _to_java_column

@@ -42,6 +42,10 @@ import org.apache.spark.unsafe.types.UTF8String
Examples:
> SELECT _FUNC_('{"a":"b"}', '$.a');
b
> SELECT _FUNC_('[{"a":"b"},{"a":"c"}]', '$[0].a');
b
> SELECT _FUNC_('[{"a":"b"},{"a":"c"}]', '$[*].a');
["b","c"]
""",
group = "json_funcs",
since = "1.5.0")
