Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-32815][ML][3.0] Fix LibSVM data source loading error on file p…
…aths with glob metacharacters ### What changes were proposed in this pull request? In the PR, I propose to fix an issue with LibSVM datasource when both of the following are true: * no user specified schema * some file paths contain escaped glob metacharacters, such as `[``]`, `{``}`, `*` etc. The fix is a backport of #29670, and it is based on another bug fix for CSV/JSON datasources #29659. ### Why are the changes needed? To fix the issue when the follow two queries try to read from paths `[abc]`: ```scala spark.read.format("libsvm").load("""/tmp/\[abc\].csv""").show ``` but would end up hitting an exception: ``` Path does not exist: file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-6ef0ae5e-ff9f-4c4f-9ff4-0db3ee1f6a82/[abc]/part-00000-26406ab9-4e56-45fd-a25a-491c18a05e76-c000.libsvm; org.apache.spark.sql.AnalysisException: Path does not exist: file:/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-6ef0ae5e-ff9f-4c4f-9ff4-0db3ee1f6a82/[abc]/part-00000-26406ab9-4e56-45fd-a25a-491c18a05e76-c000.libsvm; at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:770) at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:373) at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at scala.util.Success.$anonfun$map$1(Try.scala:255) at scala.util.Success.map(Try.scala:213) ``` ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Added UT to `LibSVMRelationSuite`. Closes #29675 from MaxGekk/globbing-paths-when-inferring-schema-ml-3.0. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
- Loading branch information