Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48593][PYTHON][CONNECT] Fix the string representation of lambda function #46948

Closed

Conversation

zhengruifeng
Copy link
Contributor

What changes were proposed in this pull request?

Fix the string representation of lambda function

Why are the changes needed?

I happen to hit this bug

Does this PR introduce any user-facing change?

yes

before

In [2]: array_sort("data", lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x)))
Out[2]: ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/core/formatters.py:711, in PlainTextFormatter.__call__(self, obj)
    704 stream = StringIO()
    705 printer = pretty.RepresentationPrinter(stream, self.verbose,
    706     self.max_width, self.newline,
    707     max_seq_length=self.max_seq_length,
    708     singleton_pprinters=self.singleton_printers,
    709     type_pprinters=self.type_printers,
    710     deferred_pprinters=self.deferred_printers)
--> 711 printer.pretty(obj)
    712 printer.flush()
    713 return stream.getvalue()

File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:411, in RepresentationPrinter.pretty(self, obj)
    408                         return meth(obj, self, cycle)
    409                 if cls is not object \
    410                         and callable(cls.__dict__.get('__repr__')):
--> 411                     return _repr_pprint(obj, self, cycle)
    413     return _default_pprint(obj, self, cycle)
    414 finally:

File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:779, in _repr_pprint(obj, p, cycle)
    777 """A pprint that just redirects to the normal repr function."""
    778 # Find newlines and replace them with p.break_()
--> 779 output = repr(obj)
    780 lines = output.splitlines()
    781 with p.group():

File ~/Dev/spark/python/pyspark/sql/connect/column.py:441, in Column.__repr__(self)
    440 def __repr__(self) -> str:
--> 441     return "Column<'%s'>" % self._expr.__repr__()

File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:626, in UnresolvedFunction.__repr__(self)
    624     return f"{self._name}(distinct {', '.join([str(arg) for arg in self._args])})"
    625 else:
--> 626     return f"{self._name}({', '.join([str(arg) for arg in self._args])})"

File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:962, in LambdaFunction.__repr__(self)
    961 def __repr__(self) -> str:
--> 962     return f"(LambdaFunction({str(self._function)}, {', '.join(self._arguments)})"

TypeError: sequence item 0: expected str instance, UnresolvedNamedLambdaVariable found

after

In [2]: array_sort("data", lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x)))
Out[2]: Column<'array_sort(data, LambdaFunction(CASE WHEN or(isNull(x_0), isNull(y_1)) THEN 0 ELSE -(length(y_1), length(x_0)) END, x_0, y_1))'>

How was this patch tested?

CI, added test

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon
Copy link
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the fix_string_rep_lambda branch June 13, 2024 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants