-
Notifications
You must be signed in to change notification settings - Fork 28.4k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-41586][PYTHON] Introduce
pyspark.errors
and error classes fo…
…r PySpark ### What changes were proposed in this pull request? This PR proposes to introduce `pyspark.errors` and error classes to unifying & improving errors generated by PySpark under a single path. To summarize, this PR includes the changes below: - Add `python/pyspark/errors/error_classes.py` to support error class for PySpark. - Add `ErrorClassesReader` to manage the `error_classes.py`. - Add `PySparkException` to handle the errors generated by PySpark. - Add `check_error` for error class testing. This is an initial PR for introducing error framework for PySpark to facilitate the error management and provide better/consistent error messages to users. While such an active work is being done on the [SQL side to improve error messages](https://issues.apache.org/jira/browse/SPARK-37935), so far there is no work to improve error messages in PySpark. So, I'd expect to also initiate the effort on error message improvement for PySpark side from this PR. Eventually, the errors massage will be shown as below, for example: - PySpark, `PySparkException` (thrown by Python driver): ```python >>> from pyspark.sql.functions import lit >>> lit([df.id, df.id]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File ".../spark/python/pyspark/sql/utils.py", line 334, in wrapped return f(*args, **kwargs) File ".../spark/python/pyspark/sql/functions.py", line 176, in lit raise PySparkException( pyspark.errors.exceptions.PySparkException: [COLUMN_IN_LIST] lit does not allow a column in a list. ``` - PySpark, `AnalysisException` (thrown by JVM side, and capture in PySpark side): ``` >>> df.unpivot("id", [], "var", "val").collect() Traceback (most recent call last): File "<stdin>", line 1, in <module> File ".../spark/python/pyspark/sql/dataframe.py", line 3296, in unpivot jdf = self._jdf.unpivotWithSeq(jids, jvals, variableColumnName, valueColumnName) File ".../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ File ".../spark/python/pyspark/sql/utils.py", line 209, in deco raise converted from None pyspark.sql.utils.AnalysisException: [UNPIVOT_REQUIRES_VALUE_COLUMNS] At least one value column needs to be specified for UNPIVOT, all columns specified as ids; 'Unpivot ArraySeq(id#2L), ArraySeq(), var, [val] +- LogicalRDD [id#2L, int#3L, double#4, str#5], false ``` - Spark, `AnalysisException`: ```scala scala> df.select($"id").unpivot(Array($"id"), Array.empty,variableColumnName = "var", valueColumnName = "val") org.apache.spark.sql.AnalysisException: [UNPIVOT_REQUIRES_VALUE_COLUMNS] At least one value column needs to be specified for UNPIVOT, all columns specified as ids; 'Unpivot ArraySeq(id#0L), ArraySeq(), var, [val] +- Project [id#0L] +- Range (0, 10, step=1, splits=Some(16)) ``` **Next up** for this PR include: - Migrate more errors into `PySparkException` across all modules (e.g, Spark Connect, pandas API on Spark...). - Migrate more error tests into error class tests by using `check_error`. - Define more error classes onto `error_classes.py`. - Add documentation. ### Why are the changes needed? Centralizing error messages & introducing identified error class provides the following benefits: - Errors are searchable via the unique class names and properly classified. - Reduce the cost of future maintenance for PySpark errors. - Provide consistent & actionable error messages to users. - Facilitates translating error messages into different languages. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Adding UTs & running the existing static analysis tools (`dev/lint-python`) Closes #39387 from itholic/SPARK-41586. Authored-by: itholic <haejoon.lee@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
- Loading branch information
1 parent
47068db
commit b8100b5
Showing
16 changed files
with
398 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
.. Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
.. http://www.apache.org/licenses/LICENSE-2.0 | ||
.. Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
====== | ||
Errors | ||
====== | ||
|
||
.. currentmodule:: pyspark.errors | ||
|
||
.. autosummary:: | ||
:toctree: api/ | ||
|
||
PySparkException.getErrorClass | ||
PySparkException.getMessageParameters |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
""" | ||
PySpark exceptions. | ||
""" | ||
from pyspark.errors.exceptions import PySparkException | ||
|
||
|
||
__all__ = [ | ||
"PySparkException", | ||
] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
import json | ||
|
||
|
||
ERROR_CLASSES_JSON = """ | ||
{ | ||
"COLUMN_IN_LIST": { | ||
"message": [ | ||
"<func_name> does not allow a column in a list." | ||
] | ||
} | ||
} | ||
""" | ||
|
||
ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
from typing import Dict, Optional, cast | ||
|
||
from pyspark.errors.utils import ErrorClassesReader | ||
|
||
|
||
class PySparkException(Exception): | ||
""" | ||
Base Exception for handling errors generated from PySpark. | ||
""" | ||
|
||
def __init__( | ||
self, | ||
message: Optional[str] = None, | ||
error_class: Optional[str] = None, | ||
message_parameters: Optional[Dict[str, str]] = None, | ||
): | ||
# `message` vs `error_class` & `message_parameters` are mutually exclusive. | ||
assert (message is not None and (error_class is None and message_parameters is None)) or ( | ||
message is None and (error_class is not None and message_parameters is not None) | ||
) | ||
|
||
self.error_reader = ErrorClassesReader() | ||
|
||
if message is None: | ||
self.message = self.error_reader.get_error_message( | ||
cast(str, error_class), cast(Dict[str, str], message_parameters) | ||
) | ||
else: | ||
self.message = message | ||
|
||
self.error_class = error_class | ||
self.message_parameters = message_parameters | ||
|
||
def getErrorClass(self) -> Optional[str]: | ||
""" | ||
Returns an error class as a string. | ||
.. versionadded:: 3.4.0 | ||
See Also | ||
-------- | ||
:meth:`PySparkException.getMessageParameters` | ||
""" | ||
return self.error_class | ||
|
||
def getMessageParameters(self) -> Optional[Dict[str, str]]: | ||
""" | ||
Returns a message parameters as a dictionary. | ||
.. versionadded:: 3.4.0 | ||
See Also | ||
-------- | ||
:meth:`PySparkException.getErrorClass` | ||
""" | ||
return self.message_parameters | ||
|
||
def __str__(self) -> str: | ||
return f"[{self.getErrorClass()}] {self.message}" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# -*- encoding: utf-8 -*- | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
import unittest | ||
|
||
from pyspark.errors.utils import ErrorClassesReader | ||
|
||
|
||
class ErrorsTest(unittest.TestCase): | ||
def test_error_classes(self): | ||
# Test error classes is sorted alphabetically | ||
error_reader = ErrorClassesReader() | ||
error_class_names = error_reader.error_info_map | ||
for i in range(len(error_class_names) - 1): | ||
self.assertTrue( | ||
error_class_names[i] < error_class_names[i + 1], | ||
f"Error class [{error_class_names[i]}] should place" | ||
f"after [{error_class_names[i + 1]}]", | ||
) | ||
|
||
|
||
if __name__ == "__main__": | ||
import unittest | ||
from pyspark.errors.tests.test_errors import * # noqa: F401 | ||
|
||
try: | ||
import xmlrunner | ||
|
||
testRunner = xmlrunner.XMLTestRunner(output="target/test-reports", verbosity=2) | ||
except ImportError: | ||
testRunner = None | ||
unittest.main(testRunner=testRunner, verbosity=2) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
# | ||
|
||
import re | ||
from typing import Dict | ||
|
||
from pyspark.errors.error_classes import ERROR_CLASSES_MAP | ||
|
||
|
||
class ErrorClassesReader: | ||
""" | ||
A reader to load error information from error_classes.py. | ||
""" | ||
|
||
def __init__(self) -> None: | ||
self.error_info_map = ERROR_CLASSES_MAP | ||
|
||
def get_error_message(self, error_class: str, message_parameters: Dict[str, str]) -> str: | ||
""" | ||
Returns the completed error message by applying message parameters to the message template. | ||
""" | ||
message_template = self.get_message_template(error_class) | ||
# Verify message parameters. | ||
message_parameters_from_template = re.findall("<([a-zA-Z0-9_-]+)>", message_template) | ||
assert set(message_parameters_from_template) == set(message_parameters), ( | ||
f"Undifined error message parameter for error class: {error_class}. " | ||
f"Parameters: {message_parameters}" | ||
) | ||
table = str.maketrans("<>", "{}") | ||
|
||
return message_template.translate(table).format(**message_parameters) | ||
|
||
def get_message_template(self, error_class: str) -> str: | ||
""" | ||
Returns the message template for corresponding error class from error_classes.py. | ||
For example, | ||
when given `error_class` is "EXAMPLE_ERROR_CLASS", | ||
and corresponding error class in error_classes.py looks like the below: | ||
.. code-block:: python | ||
"EXAMPLE_ERROR_CLASS" : { | ||
"message" : [ | ||
"Problem <A> because of <B>." | ||
] | ||
} | ||
In this case, this function returns: | ||
"Problem <A> because of <B>." | ||
For sub error class, when given `error_class` is "EXAMPLE_ERROR_CLASS.SUB_ERROR_CLASS", | ||
and corresponding error class in error_classes.py looks like the below: | ||
.. code-block:: python | ||
"EXAMPLE_ERROR_CLASS" : { | ||
"message" : [ | ||
"Problem <A> because of <B>." | ||
], | ||
"subClass" : { | ||
"SUB_ERROR_CLASS" : { | ||
"message" : [ | ||
"Do <C> to fix the problem." | ||
] | ||
} | ||
} | ||
} | ||
In this case, this function returns: | ||
"Problem <A> because <B>. Do <C> to fix the problem." | ||
""" | ||
error_classes = error_class.split(".") | ||
len_error_classes = len(error_classes) | ||
assert len_error_classes in (1, 2) | ||
|
||
# Generate message template for main error class. | ||
main_error_class = error_classes[0] | ||
if main_error_class in self.error_info_map: | ||
main_error_class_info_map = self.error_info_map[main_error_class] | ||
else: | ||
raise ValueError(f"Cannot find main error class '{main_error_class}'") | ||
|
||
main_message_template = "\n".join(main_error_class_info_map["message"]) | ||
|
||
has_sub_class = len_error_classes == 2 | ||
|
||
if not has_sub_class: | ||
message_template = main_message_template | ||
else: | ||
# Generate message template for sub error class if exists. | ||
sub_error_class = error_classes[1] | ||
main_error_class_subclass_info_map = main_error_class_info_map["subClass"] | ||
if sub_error_class in main_error_class_subclass_info_map: | ||
sub_error_class_info_map = main_error_class_subclass_info_map[sub_error_class] | ||
else: | ||
raise ValueError(f"Cannot find sub error class '{sub_error_class}'") | ||
|
||
sub_message_template = "\n".join(sub_error_class_info_map["message"]) | ||
message_template = main_message_template + " " + sub_message_template | ||
|
||
return message_template |
Oops, something went wrong.