{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1725335088.0","currentOid":""},"activityList":{"items":[{"before":"8fbeaf5dd7ba91151df6d15d115fb215ef19e545","after":"d8f9d8d22eb6ac55b6505782a6a51d1b201a04a3","ref":"refs/heads/branch-3.5","pushedAt":"2024-09-05T13:02:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-49152][SQL][FOLLOWUP] DelegatingCatalogExtension should also use V1 commands\n\n### What changes were proposed in this pull request?\n\nThis is a followup of https://github.com/apache/spark/pull/47660 . If users override `spark_catalog` with\n`DelegatingCatalogExtension`, we should still use v1 commands as `DelegatingCatalogExtension` forwards requests to HMS and there are still behavior differences between v1 and v2 commands targeting HMS.\n\nThis PR also forces to use v1 commands for certain commands that do not have a v2 version.\n\n### Why are the changes needed?\n\nAvoid introducing behavior changes to Spark plugins that implements `DelegatingCatalogExtension` to override `spark_catalog`.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nnew test case\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #47995 from amaliujia/fix_catalog_v2.\n\nLead-authored-by: Wenchen Fan \nCo-authored-by: Rui Wang \nCo-authored-by: Wenchen Fan \nSigned-off-by: Wenchen Fan \n(cherry picked from commit f7cfeb534d9285df381d147e01de47ec439c082e)\nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-49152][SQL][FOLLOWUP] DelegatingCatalogExtension should also u…"}},{"before":"9676b1c48cba47825ff3dd48e609fa3f0b046c02","after":"f7cfeb534d9285df381d147e01de47ec439c082e","ref":"refs/heads/master","pushedAt":"2024-09-05T13:02:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-49152][SQL][FOLLOWUP] DelegatingCatalogExtension should also use V1 commands\n\n### What changes were proposed in this pull request?\n\nThis is a followup of https://github.com/apache/spark/pull/47660 . 
## [SPARK-48348][SPARK-48376][SQL] Introduce `LEAVE` and `ITERATE` statements
Pushed to `master` by cloud-fan (Wenchen Fan), 2024-09-05 12:59 UTC.

### What changes were proposed in this pull request?
This PR proposes the introduction of the `LEAVE` and `ITERATE` statement types to the SQL Scripting language:
- `LEAVE` can be used in loops, as well as in `BEGIN ... END` compound blocks.
- `ITERATE` can be used only in loops.

This PR introduces:
- Logical operators for both statement types.
- Execution nodes for both statement types.
- Interpreter changes required to build execution plans that support the new statement types.
- A new error for statements that are not used properly.
- Minor changes required to support the new keywords.

A sketch of how the two statements are meant to be used follows this entry.

### Why are the changes needed?
Adds support for new statement types to the SQL Scripting language.

### Does this PR introduce _any_ user-facing change?
This PR introduces new statement types that will be available to users. However, the script execution logic has not been implemented yet, so the new changes are not accessible to users yet.

### How was this patch tested?
Tests are added to all test suites related to SQL scripting.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47973 from davidm-db/sql_scripting_leave_iterate.

Authored-by: David Milicevic. Signed-off-by: Wenchen Fan.
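The intended usage is sketched below. It is illustrative only: the commit notes that script execution is not yet reachable by users, and the label syntax shown simply follows the SQL/PSM convention the feature is modeled on, so details may differ.

```scala
// Illustrative script shape only (SQL/PSM-style labels assumed; not yet executable per the PR).
val leaveAndIterateSketch =
  """outer: BEGIN
    |  w: WHILE 1 = 1 DO
    |    LEAVE w;       -- LEAVE exits the labeled loop ...
    |  END WHILE;
    |  LEAVE outer;     -- ... and is also allowed in a BEGIN ... END compound block.
    |END;
    |-- ITERATE <label>; is only legal inside a loop body, where it skips to the next iteration.
    |""".stripMargin
```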
## [SPARK-49523][CONNECT] Increase maximum wait time for connect server to come up for testing
Pushed to `master` by asfgit, 2024-09-05 12:44 UTC.

### What changes were proposed in this pull request?
This PR increases the maximum time we wait for a connect server to come up for testing. The current threshold is too low and is causing flakiness.

### Why are the changes needed?
It makes connect tests less flaky.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
It is test infra code.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47994 from WweiL/deflake-will-it-work.

Authored-by: Wei Liu. Signed-off-by: Herman van Hovell.

## [SPARK-49474][SS] Classify Error class for FlatMapGroupsWithState user function error
Pushed to `master` by HeartSaVioR (Jungtaek Lim), 2024-09-05 07:24 UTC.

### What changes were proposed in this pull request?
Add a new error classification for errors occurring in the user function that is used in FlatMapGroupsWithState.

### Why are the changes needed?
The user-provided function can throw any type of error. Using the new error framework gives better error messages and classification.

### Does this PR introduce _any_ user-facing change?
Yes, a better error message with an error class for FlatMapGroupsWithState user function failures.

### How was this patch tested?
Updated existing tests and added a new unit test in FlatMapGroupsWithStateSuite.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47940 from liviazhu-db/liviazhu-db/classify-flatmapgroupswithstate-error.

Authored-by: Livia Zhu. Signed-off-by: Jungtaek Lim.

## [SPARK-49408][SQL] Use IndexedSeq in ProjectingInternalRow (branch-3.4)
Pushed to `branch-3.4` by yaooqinn (Kent Yao), 2024-09-05 06:59 UTC. Cherry-pick of commit 37f2fa9 from `master`; the full message is in the `master` entry below.
## [SPARK-49408][SQL] Use IndexedSeq in ProjectingInternalRow (branch-3.5)
Pushed to `branch-3.5` by yaooqinn (Kent Yao), 2024-09-05 06:58 UTC. Cherry-pick of commit 37f2fa9 from `master`; the full message is in the next entry.

## [SPARK-49408][SQL] Use IndexedSeq in ProjectingInternalRow
Pushed to `master` by yaooqinn (Kent Yao), 2024-09-05 06:58 UTC.

### What changes were proposed in this pull request?
In `ProjectingInternalRow`, accessing `colOrdinals` by position causes poor performance, since a plain `Seq` such as `List` has linear-time indexed access. Replace `colOrdinals` with the `IndexedSeq` type.

### Why are the changes needed?
Replacing `colOrdinals` with `IndexedSeq` avoids the repeated linear lookups.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No need to add a UT.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47890 from wzx140/project-row-fix.

Lead-authored-by: wzx. Co-authored-by: Kent Yao. Signed-off-by: Kent Yao.
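In toy form (not Spark's actual class), this is the collection-type distinction the change relies on: `List#apply(i)` walks the list, while an `IndexedSeq` such as `ArraySeq` or `Vector` answers positional lookups in effectively constant time, which is what a per-row, per-field ordinal lookup needs.

```scala
import scala.collection.immutable.ArraySeq

// Minimal sketch: a projection keeps, for each output field, the ordinal of the input
// column to read. With an IndexedSeq the colOrdinals(i) lookup is constant-time;
// with a List it would be O(i) on every field of every row.
final case class ProjectingRowSketch(colOrdinals: IndexedSeq[Int]) {
  def getInt(row: Array[Int], i: Int): Int = row(colOrdinals(i))
}

// Hypothetical projection: output field 0 reads input column 2.
val proj = ProjectingRowSketch(ArraySeq(2, 0, 1))
assert(proj.getInt(Array(10, 20, 30), 0) == 30)
```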
## Revert "[SPARK-49241][CORE] Add `OpenTelemetryPush` Sink with `opentelemetry` profile"
Pushed to `master` by dongjoon-hyun (Dongjoon Hyun), 2024-09-05 05:04 UTC.

This reverts commit 2eb2e39eda79edfb21d605edf340c8fd3958168e.

## [SPARK-48965][SQL] Use the correct schema in `Dataset#toJSON` (branch-3.4)
Pushed to `branch-3.4` by dongjoon-hyun (Dongjoon Hyun), 2024-09-05 04:59 UTC. Cherry-pick of commit 5375ce2 from `master`; the full message is in the `master` entry below.

## [SPARK-48965][SQL] Use the correct schema in `Dataset#toJSON` (branch-3.5)
Pushed to `branch-3.5` by dongjoon-hyun (Dongjoon Hyun), 2024-09-05 04:51 UTC. Cherry-pick of commit 5375ce2 from `master`; the full message is in the `master` entry below.

## [SPARK-49458][CONNECT][PYTHON] Supply server-side session id via ReattachExecute
Pushed to `master` by HyukjinKwon (Hyukjin Kwon), 2024-09-05 04:48 UTC.

### What changes were proposed in this pull request?
The server-side session id was not supplied via ReattachExecute, resulting in a situation where OPERATION_NOT_FOUND was thrown while SESSION_CHANGED was expected. Now, the PR makes sure that a server-side session id is always supplied through ReattachExecute so that the server can raise the correct error.

### Why are the changes needed?
Correct error handling when the server restarts.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added a test to test_client.py.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47930 from changgyoopark-db/SPARK-49458.

Authored-by: Changgyoo Park. Signed-off-by: Hyukjin Kwon.
## [SPARK-48965][SQL] Use the correct schema in `Dataset#toJSON`
Pushed to `master` by dongjoon-hyun (Dongjoon Hyun), 2024-09-05 04:43 UTC.

### What changes were proposed in this pull request?
In `Dataset#toJSON`, use the schema from `exprEnc`. This schema reflects any changes (e.g., decimal precision, column ordering) that `exprEnc` might make to input rows.

### Why are the changes needed?
`Dataset#toJSON` currently uses the schema from the logical plan, but that schema does not necessarily describe the rows passed to `JacksonGenerator`: the function passed to `mapPartitions` uses `exprEnc` to serialize the input, and this could potentially change the precision on decimals or rearrange columns.

Here's an example that tricks `UnsafeRow#getDecimal` (called from `JacksonGenerator`) into mistakenly assuming the decimal is stored as a Long:
```
scala> case class Data(a: BigDecimal)
class Data

scala> sql("select 123.456bd as a").as[Data].toJSON.collect
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
val res0: Array[String] = Array({"a":68719476.745})

scala>
```
Here's an example that tricks `JacksonGenerator` into asking for a string from an array and an array from a string. This case actually crashes the JVM:
```
scala> case class Data(x: Array[Int], y: String)
class Data

scala> sql("select repeat('Hey there', 17) as y, array_repeat(22, 17) as x").as[Data].toJSON.collect
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.$anonfun$makeWriter$5(JacksonGenerator.scala:129) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.$anonfun$makeWriter$5$adapted(JacksonGenerator.scala:128) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.writeArrayData(JacksonGenerator.scala:258) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.$anonfun$makeWriter$23(JacksonGenerator.scala:201) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.json.JacksonGenerator.writeArray(JacksonGenerator.scala:249) ~[spark-catalyst_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
...
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

bash-3.2$
```
Both these cases work correctly without `toJSON`.

### Does this PR introduce _any_ user-facing change?
Before the PR, converting the dataframe to a dataset of Tuple would preserve the column names in the JSON strings:
```
scala> sql("select 123.456d as a, 12 as b").as[(Double, Int)].toJSON.collect
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
val res0: Array[String] = Array({"a":123.456,"b":12})

scala>
```
After the PR, the JSON strings use the field names from the Tuple class:
```
scala> sql("select 123.456d as a, 12 as b").as[(Double, Int)].toJSON.collect
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
val res1: Array[String] = Array({"_1":123.456,"_2":12})

scala>
```

### How was this patch tested?
New tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47982 from bersprockets/to_json_issue.

Authored-by: Bruce Robbins. Signed-off-by: Dongjoon Hyun.

## [SPARK-49085][SQL] Remove special casing for Protobuf functions in Connect
Pushed to `master` by itholic (Haejoon Lee), 2024-09-05 04:25 UTC.

### What changes were proposed in this pull request?
This PR proposes to remove the special casing for Protobuf functions in Connect. It resolves the following tasks described in the ticket:
- Remove the special casing from the connect planner.
- Add the needed constructors to the protobuf expressions.
- Update the protobuf functions and make them use the unresolved function path.

### Why are the changes needed?
For unifying the SQL Scala interface between Connect and Classic.

### Does this PR introduce _any_ user-facing change?
No API changes from the users' perspective.

### How was this patch tested?
The existing CI should pass.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47885 from itholic/remove_special_casing.

Authored-by: Haejoon Lee. Signed-off-by: Haejoon Lee.
## [SPARK-49427][CONNECT][SQL] Create a shared interface for MergeIntoWriter
Pushed to `master` by asfgit, 2024-09-05 04:20 UTC.

### What changes were proposed in this pull request?
This PR creates a shared interface for MergeIntoWriter.

### Why are the changes needed?
We are creating a shared Scala Spark SQL interface for Classic and Connect.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47963 from hvanhovell/SPARK-49427.

Authored-by: Herman van Hovell. Signed-off-by: Herman van Hovell.

## [SPARK-49414][CONNECT][SQL] Add Shared DataFrameReader interface
Pushed to `master` by asfgit, 2024-09-05 02:11 UTC.

### What changes were proposed in this pull request?
This PR creates a shared interface for DataFrameReader.

### Why are the changes needed?
We are creating a shared Scala Spark SQL interface for Classic and Connect.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47975 from hvanhovell/SPARK-49414.

Authored-by: Herman van Hovell. Signed-off-by: Herman van Hovell.
## [SPARK-49512][K8S][DOCS] Drop K8s v1.27 Support
Pushed to `master` by HyukjinKwon (Hyukjin Kwon), 2024-09-05 00:14 UTC.

### What changes were proposed in this pull request?
This PR aims to update the K8s docs to recommend K8s v1.28+ for Apache Spark 4.0.0. As of now (2024-09-04), v1.28, v1.29, v1.30, and v1.31 are available.

This is a follow-up of the previous PR #46168, because the Apache Spark 4.0.0 schedule has slipped slightly.

### Why are the changes needed?
**1. The K8s community archived v1.27.19 on 2024-07-16 and started releasing v1.31.0 on 2024-08-13**
- https://kubernetes.io/releases/#release-v1-31
- https://kubernetes.io/releases/patch-releases/#non-active-branch-history

**2. Default K8s versions in public cloud environments**

The default K8s versions of public cloud providers are already moving to K8s 1.30:
- EKS: v1.30 (Default)
- GKE: v1.30 (Rapid), v1.29 (Regular), v1.29 (Stable)
- AKS: v1.29 (Default), v1.30 (Support)

**3. End of support**

In addition, K8s 1.27 has reached, or will within two weeks reach, its standard-support EOL, well before the Apache Spark 4.0.0 release.

| K8s | EKS | AKS | GKE |
| ---- | ------- | ------- | ------- |
| 1.27 | 2024-07 | 2024-07 | 2024-09-16 |

- [EKS EOL Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar)
- [AKS EOL Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar)
- [GKE EOL Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule)

### Does this PR introduce _any_ user-facing change?
- No, this is a documentation-only change about K8s versions.
- The Apache Spark K8s integration test already uses K8s **v1.30.0** on Minikube.

### How was this patch tested?
Manual review.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47990 from dongjoon-hyun/SPARK-49512.

Authored-by: Dongjoon Hyun. Signed-off-by: Hyukjin Kwon.
## [SPARK-49478][CONNECT] Handle null metrics in ConnectProgressExecutionListener
Pushed to `master` by HyukjinKwon (Hyukjin Kwon), 2024-09-05 00:11 UTC.

### What changes were proposed in this pull request?
Handle null `TaskMetrics` in `ConnectProgressExecutionListener` by reporting 0 `inputBytesRead` when the metrics are null.

### Why are the changes needed?
On task end, `TaskMetrics` may be `null`, as in the case of task failure (see [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L83)). This can cause NPEs for failed tasks with null metrics.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added a new test for task completion with `null` metrics.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47944 from davintjong-db/connect-progress-listener-null-metrics.

Lead-authored-by: Davin Tjong. Co-authored-by: Davin Tjong <107501978+davintjong-db@users.noreply.github.com>. Signed-off-by: Hyukjin Kwon.
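The shape of the guard being described, as a standalone sketch (assumed code, not the actual listener): `SparkListenerTaskEnd.taskMetrics` is a plain reference that can be `null` for failed tasks, so it has to be checked before reading `inputMetrics`.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Assumed sketch of a progress listener that tolerates null TaskMetrics: failed tasks may
// report no metrics, in which case we count 0 bytes read instead of dereferencing null.
class NullSafeProgressListener extends SparkListener {
  @volatile private var inputBytesRead: Long = 0L

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    val bytes = if (metrics == null) 0L else metrics.inputMetrics.bytesRead
    inputBytesRead += bytes
  }
}
```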
## [SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist
Pushed to `branch-3.4` by dongjoon-hyun (Dongjoon Hyun), 2024-09-04 21:05 UTC (2 commits; head commit message shown).

### What changes were proposed in this pull request?
This is a followup to https://github.com/apache/spark/pull/46254. Instead of using object arrays when nulls are present, continue to use primitive arrays when appropriate. This PR sets the null bits appropriately for the primitive array copy.

Primitive arrays are faster than object arrays and won't create unnecessary objects.

### Why are the changes needed?
This will improve performance and memory usage when nulls are present in the `ColumnarArray`.

### Does this PR introduce _any_ user-facing change?
This is expected to be faster when copying `ColumnarArray`.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46372 from gene-db/primitive-nulls.

Authored-by: Gene Pang. Signed-off-by: Wenchen Fan. Cherry-picked from commit bf2e254; signed off by Wenchen Fan.

## [SPARK-48493][PYTHON] Enhance Python Datasource Reader with direct Arrow Batch support for improved performance
Pushed to `master` by allisonwang-db (Allison Wang), 2024-09-04 20:53 UTC.

### What changes were proposed in this pull request?
This pull request proposes enhancing the Python Datasource Reader by adding an option to yield Arrow batches directly. This change aims to significantly improve performance compared to the existing approach of using tuples or Rows. The implementation takes advantage of the existing work with MapInArrow (referenced in SPARK-46253).

### Why are the changes needed?
The changes are needed to address performance issues in the Python Datasource Reader. The current method of sending data as tuples or Rows is inefficient, leading to slower data processing times. By allowing the Datasource Reader to yield Arrow batches directly, we can use the more efficient Arrow format, significantly speeding up data processing. Tests have shown this approach to be up to 8x faster (in a preliminary test with a High Energy Physics datasource reader for the ROOT data format), particularly benefiting use cases involving large datasets.

### Does this PR introduce _any_ user-facing change?
Yes, this PR introduces a user-facing change by adding an option to the Python Datasource Reader that allows users to yield Arrow batches directly.

### How was this patch tested?
A new test was added to the Python Datasource test suite. Additionally, it was manually tested using a custom Python datasource for performance testing.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46826 from LucaCanali/arrowBatchesInPythonDatasource.

Authored-by: Luca Canali. Signed-off-by: allisonwang-db.

## [SPARK-49509][CORE] Use `Platform.allocateDirectBuffer` instead of `ByteBuffer.allocateDirect` (branch-3.5)
Pushed to `branch-3.5` by dongjoon-hyun (Dongjoon Hyun), 2024-09-04 14:51 UTC. Cherry-pick of commit 2ed6c3e from `master`; the full message is in the next entry.
## [SPARK-49509][CORE] Use `Platform.allocateDirectBuffer` instead of `ByteBuffer.allocateDirect`
Pushed to `master` by dongjoon-hyun (Dongjoon Hyun), 2024-09-04 14:50 UTC.

### What changes were proposed in this pull request?
This PR aims to use `Platform.allocateDirectBuffer` instead of `ByteBuffer.allocateDirect`.

### Why are the changes needed?
See https://github.com/apache/spark/pull/47733#pullrequestreview-2251276385.

Allocating off-heap memory should use the `allocateDirectBuffer` API provided by `Platform`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47987 from cxzl25/SPARK-49509.

Authored-by: sychen. Signed-off-by: Dongjoon Hyun.
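In call-site terms the change looks like the sketch below. `org.apache.spark.unsafe.Platform` is Spark's internal off-heap helper; the point is simply that direct allocations go through it rather than the raw JDK API (the helper's internal behavior is not shown or asserted here).

```scala
import java.nio.ByteBuffer
import org.apache.spark.unsafe.Platform

// Before (assumed shape of the old call sites): raw JDK allocation of off-heap memory.
val before: ByteBuffer = ByteBuffer.allocateDirect(4096)

// After: route the allocation through Spark's Platform helper so that off-heap
// allocations are funneled through a single internal API.
val after: ByteBuffer = Platform.allocateDirectBuffer(4096)
```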
## [SPARK-49426][CONNECT][SQL] Create a shared interface for DataFrameWriterV2
Pushed to `master` by asfgit, 2024-09-04 12:06 UTC.

### What changes were proposed in this pull request?
This PR creates a shared interface for DataFrameWriterV2.

### Why are the changes needed?
We are creating a shared Scala Spark SQL interface for Classic and Connect.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47962 from hvanhovell/SPARK-49426.

Authored-by: Herman van Hovell. Signed-off-by: Herman van Hovell.

## [SPARK-49445][UI] Support show tooltip in the progress bar of UI
Pushed to `master` by yaooqinn (Kent Yao), 2024-09-04 10:29 UTC.

### What changes were proposed in this pull request?
This PR aims to support showing a tooltip in the progress bar of the UI.

### Why are the changes needed?
Currently, too much content has been added to the progress bar in the UI, but the width of the progress bar is limited and it cannot display all of it. (Screenshots are attached to the PR.)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Local test. (Screenshot attached to the PR.)

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47908 from cxzl25/SPARK-49445.

Lead-authored-by: sychen. Co-authored-by: Kent Yao. Signed-off-by: Kent Yao.

## [SPARK-48960][CONNECT] Makes spark-submit works with Spark connect
Pushed to `master` by HyukjinKwon (Hyukjin Kwon), 2024-09-04 08:09 UTC.

### What changes were proposed in this pull request?
This PR proposes to add support for `--remote` to `bin/spark-submit` so it can use Spark Connect easily. This PR includes:
- Making `bin/spark-submit` work with the Scala Spark Connect client.
- Passing `--conf` and loaded configurations to both the Scala and Python Spark Connect clients.

### Why are the changes needed?
`bin/pyspark --remote` already works. We should also make `bin/spark-submit` work so that end users can try Spark Connect out and have a consistent way of doing so.

### Does this PR introduce _any_ user-facing change?
Yes,
- `bin/spark-submit` supports the `--remote` option in Scala.
- `bin/spark-submit` supports `--conf` and loaded Spark configurations and passes them to the clients in Scala and Python.

### How was this patch tested?
Python:

```bash
echo "from pyspark.sql import SparkSession;spark = SparkSession.builder.getOrCreate();assert 'connect' in str(type(spark));assert spark.range(1).first()[0] == 0" > test.py
```

```bash
./bin/spark-submit --name "testApp" --remote "local" test.py
```

Scala (see https://github.com/HyukjinKwon/spark-connect-example):

```bash
git clone https://github.com/HyukjinKwon/spark-connect-example
cd spark-connect-example
build/sbt package
cd ..
git clone https://github.com/apache/spark.git
cd spark
build/sbt package
sbin/start-connect-server.sh
bin/spark-submit --name "testApp" --remote "sc://localhost" --class com.hyukjinkwon.SparkConnectExample ../spark-connect-example/target/scala-2.13/spark-connect-example_2.13-0.0.1.jar
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47434 from HyukjinKwon/SPARK-48960.

Authored-by: Hyukjin Kwon. Signed-off-by: Hyukjin Kwon.

## [MINOR][DOCS] Fix site.SPARK_VERSION pattern in RDD Programming Guide (branch-3.5)
Pushed to `branch-3.5` by HyukjinKwon (Hyukjin Kwon), 2024-09-04 07:37 UTC. Cherry-pick of commit 90a236e from `master`; the full message is in the next entry.
## [MINOR][DOCS] Fix site.SPARK_VERSION pattern in RDD Programming Guide
Pushed to `master` by HyukjinKwon (Hyukjin Kwon), 2024-09-04 07:36 UTC.

### What changes were proposed in this pull request?
Fix the site.SPARK_VERSION pattern in the RDD Programming Guide. I found this while developing #47968.

### Why are the changes needed?
Doc fix.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Doc build.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47985 from yaooqinn/version.

Authored-by: Kent Yao. Signed-off-by: Hyukjin Kwon.

## [SPARK-49275][SQL][3.5] Fix return type nullness of the xpath expression
Pushed to `branch-3.5` by MaxGekk (Maxim Gekk), 2024-09-04 05:14 UTC.

### What changes were proposed in this pull request?
This is a cherry-pick of https://github.com/apache/spark/pull/47796.

The `xpath` expression incorrectly marks its return type as an array of non-null strings. However, it can actually return an array containing nulls. This can cause an NPE in code generation, for example in the query `select coalesce(xpath(repeat('', id), 'a')[0], '') from range(1, 2)`.

### Why are the changes needed?
It avoids potential failures in queries that use the `xpath` expression.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
A new unit test. It would fail without the change in the PR.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47959 from chenhao-db/fix_xpath_nullness_3.5.

Authored-by: Chenhao Li. Signed-off-by: Max Gekk.

## [SPARK-49202][PS] Apply `ArrayBinarySearch` for histogram
Pushed to `master` by zhengruifeng (Ruifeng Zheng), 2024-09-04 01:38 UTC.

### What changes were proposed in this pull request?
Apply `ArrayBinarySearch` for the histogram computation.

### Why are the changes needed?
This expression is dedicated to the histogram computation and supports codegen:

```
(5) Project [codegen id : 1]
Output [2]: [__group_id#37, cast(CASE WHEN ((__value#38 >= 1.0) AND (__value#38 <= 12.0)) THEN CASE WHEN (__value#38 = 12.0) THEN 11 WHEN (static_invoke(ArrayExpressionUtils.binarySearchNullSafe([1.0,1.9166666666666665,2.833333333333333,3.75,4.666666666666666,5.583333333333333,6.5,7.416666666666666,8.333333333333332,9.25,10.166666666666666,11.083333333333332,12.0], __value#38)) > 0) THEN static_invoke(ArrayExpressionUtils.binarySearchNullSafe([1.0,1.9166666666666665,2.833333333333333,3.75,4.666666666666666,5.583333333333333,6.5,7.416666666666666,8.333333333333332,9.25,10.166666666666666,11.083333333333332,12.0], __value#38)) ELSE (-static_invoke(ArrayExpressionUtils.binarySearchNullSafe([1.0,1.9166666666666665,2.833333333333333,3.75,4.666666666666666,5.583333333333333,6.5,7.416666666666666,8.333333333333332,9.25,10.166666666666666,11.083333333333332,12.0], __value#38)) - 2) END WHEN isnan(__value#38) THEN cast(raise_error(USER_RAISED_EXCEPTION, map(keys: [errorMessage], values: [Histogram encountered NaN value.]), NullType) as int) ELSE cast(raise_error(USER_RAISED_EXCEPTION, map(errorMessage, printf(value %s out of the bins bounds: [%s, %s], __value#38, 1.0, 12.0)), NullType) as int) END as double) AS __bucket#46]
Input [2]: [__group_id#37, __value#38]
```

A small model of the bucketing logic in this plan follows this entry.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI and a manual check.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47970 from zhengruifeng/ps_apply_binary_search.

Authored-by: Ruifeng Zheng. Signed-off-by: Ruifeng Zheng.
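As a toy model of what the generated expression above computes, the sketch below uses `java.util.Arrays.binarySearch`; the actual `ArrayExpressionUtils.binarySearchNullSafe` helper may use a different index convention, so treat the mapping as illustrative.

```scala
import java.util.Arrays

// Toy bucketing: `bounds` are the sorted bin edges. A value equal to the top edge goes into
// the last bin; an exact hit on another edge returns that edge's index; a value strictly
// inside a bin is located via binarySearch's -(insertionPoint) - 1 convention.
def bucket(bounds: Array[Double], v: Double): Int = {
  require(v >= bounds.head && v <= bounds.last, s"value $v out of the bins bounds")
  if (v == bounds.last) bounds.length - 2
  else {
    val i = Arrays.binarySearch(bounds, v)
    if (i >= 0) i else -i - 2
  }
}

// The 12 equal-width bins over [1.0, 12.0] used in the plan above:
val edges = (0 to 12).map(k => 1.0 + k * 11.0 / 12.0).toArray
assert(bucket(edges, 1.0) == 0)   // bottom edge falls in the first bin
assert(bucket(edges, 12.0) == 11) // top edge falls in the last bin
assert(bucket(edges, 5.0) == 4)   // 5.0 lies between edges(4) ~ 4.67 and edges(5) ~ 5.58
```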
## [SPARK-49504][BUILD] Add `jjwt` profile
Pushed to `master` by HyukjinKwon (Hyukjin Kwon), 2024-09-04 01:26 UTC.

### What changes were proposed in this pull request?
This PR aims to add a new profile, `jjwt`, to provide the `jjwt-impl` and `jjwt-jackson` jar files in a Spark distribution.

### Why are the changes needed?
To provide an easy way to build a Spark distribution.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually.

**BEFORE**
```
$ mvn dependency:tree --pl assembly --am | grep jjwt-impl
[INFO] +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
```

**AFTER**
```
$ mvn dependency:tree --pl assembly --am -Pjjwt | grep jjwt-impl
[INFO] +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:compile
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:test
[INFO] |  +- io.jsonwebtoken:jjwt-impl:jar:0.12.6:compile
```

Or, build the distribution:

```
$ dev/make-distribution.sh -Pjjwt
$ ls dist/jars/jj*
dist/jars/jjwt-api-0.12.6.jar
dist/jars/jjwt-impl-0.12.6.jar
dist/jars/jjwt-jackson-0.12.6.jar
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47979 from dongjoon-hyun/SPARK-49504.

Authored-by: Dongjoon Hyun. Signed-off-by: Hyukjin Kwon.