shujingyang-db
Contributor

What changes were proposed in this pull request?

This PR adds the missing context manager support (the __enter__ and __exit__ methods) to the Spark Connect SparkSession class, so that a session can be used in a with statement and is cleaned up automatically when the block exits.

The regular PySpark SparkSession already supports context manager syntax like:

with SparkSession.builder.master("local").getOrCreate() as session:
    session.range(5).show()

However, this functionality was missing from the Spark Connect implementation, causing inconsistency between the two APIs.
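For reference, the context manager protocol mentioned above comes down to two methods. The sketch below uses a hypothetical stand-in class, `FakeSession`, rather than the real Spark Connect class. That `__exit__` calls `stop()` is an assumption based on the behavior this PR describes, not a copy of the actual patch.

```python
from types import TracebackType
from typing import Optional, Type


class FakeSession:
    """Hypothetical stand-in for a session class; not the real SparkSession."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        # A real session would release its resources here.
        self.stopped = True

    def __enter__(self) -> "FakeSession":
        # Return the session itself so `with ... as session` binds it.
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> None:
        # Always stop; returning None lets any exception propagate.
        self.stop()


with FakeSession() as session:
    # Inside the block the session is still usable.
    assert not session.stopped
```

Returning `self` from `__enter__` is what makes the `as session` binding refer to the same object that gets stopped on exit.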

Why are the changes needed?

  1. API Consistency: The Spark Connect SparkSession should have feature parity with the regular PySpark SparkSession
  2. Resource Management: Context manager support enables automatic cleanup of sessions, preventing resource leaks
  3. User Experience: Provides a more Pythonic way to handle session lifecycle management
  4. Documentation Compliance: The existing docstrings in regular SparkSession show examples using context manager syntax, but this didn't work in Spark Connect mode

Does this PR introduce any user-facing change?

Yes, this PR enables new functionality for Spark Connect users:

Before this PR:

# This would fail in Spark Connect mode
with SparkSession.builder.remote("sc://localhost").getOrCreate() as session:
    session.range(5).show()
# Fails because the class does not define __enter__
# (AttributeError on Python <= 3.10, TypeError on Python 3.11+)

After this PR:

# This now works in Spark Connect mode
with SparkSession.builder.remote("sc://localhost").getOrCreate() as session:
    session.range(5).show()
# Session is automatically stopped when exiting the with block
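To illustrate what "automatically stopped" means when the body of the block fails, here is a small sketch that compares the with form against a manual try/finally. `StubSession` is a hypothetical stand-in that only counts `stop()` calls. It assumes, as in the description above, that exiting the block stops the session and does not suppress the exception.

```python
class StubSession:
    """Hypothetical stand-in; only tracks how often stop() was called."""

    def __init__(self):
        self.stop_calls = 0

    def stop(self):
        self.stop_calls += 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Assumed cleanup step; returning None does not suppress the exception.
        self.stop()


# Form 1: with statement. stop() runs even though the body raises.
managed = StubSession()
try:
    with managed as session:
        raise ValueError("query failed")
except ValueError:
    pass

# Form 2: manual try/finally that achieves the same cleanup
# without relying on the context manager methods.
manual = StubSession()
try:
    try:
        raise ValueError("query failed")
    finally:
        manual.stop()
except ValueError:
    pass

print(managed.stop_calls, manual.stop_calls)
```

Both forms stop the session exactly once. The with form simply removes the need to write the finally clause by hand.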

How was this patch tested?

Unit tests (UTs).

Was this patch authored or co-authored using generative AI tooling?

Yes
