Update readme tracker #37

Merged
merged 1 commit into from
May 24, 2024

Conversation

abrassel
Contributor

Description

I went through the PySpark documentation and attempted to:

  1. Map the docs onto Spark Connect.
  2. Go through the source code and determine which APIs are and aren't implemented.

Related Issue(s)

Documentation

https://spark.apache.org/docs/latest/api/python/index.html

@sjrusso8
Owner

Thanks for doing this! Just a few notes for some of the sections.

  1. Let's remove the sparkContext section and its row from the SparkSession table; it's a JVM attribute and isn't supported with Spark Connect.
  2. Update the comment for remote to refer to the Spark Connect connection string, and link it to this page: https://github.com/apache/spark/blob/master/connector/connect/docs/client-connection-string.md
  3. I think enableHiveSupport is not supported with Spark Connect.
  4. These items under StreamingQuery are implemented:
    • id
    • run_id (should be changed to runId)
    • name
    • awaitTermination
    • lastProgress
    • recentProgress
    • isActive
    • status
  5. These items under DataFrameReader are implemented:
    • format
    • load
    • option
    • options
    • table
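
To illustrate what the remote comment in item 2 refers to: a Spark Connect connection string has the general shape `sc://host:port/;key=value;key=value`, with the port defaulting to 15002 when omitted. The sketch below is only an illustration of that shape, not code from this repository; `ConnectTarget` and `parse_connection_string` are hypothetical names, and it assumes a plain hostname (no IPv6 literal) and skips the stricter validation described on the linked page.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper type: the pieces of a Spark Connect connection string.
#[derive(Debug, PartialEq)]
struct ConnectTarget {
    host: String,
    port: u16,
    params: BTreeMap<String, String>,
}

/// Port assumed when the connection string does not specify one.
const DEFAULT_PORT: u16 = 15002;

/// Hypothetical parser for strings shaped like `sc://host:port/;k1=v1;k2=v2`.
fn parse_connection_string(s: &str) -> Result<ConnectTarget, String> {
    let rest = s
        .strip_prefix("sc://")
        .ok_or_else(|| "connection string must start with sc://".to_string())?;

    // Separate "host[:port]" from the optional "/;key=value;..." tail.
    let (authority, tail) = match rest.split_once('/') {
        Some((authority, tail)) => (authority, tail),
        None => (rest, ""),
    };

    // Split off the port if present, otherwise fall back to the default.
    let (host, port) = match authority.rsplit_once(':') {
        Some((host, port)) => {
            let port = port
                .parse::<u16>()
                .map_err(|e| format!("invalid port: {}", e))?;
            (host, port)
        }
        None => (authority, DEFAULT_PORT),
    };
    if host.is_empty() {
        return Err("missing host".to_string());
    }

    // Collect the `;`-separated key=value parameters, skipping empty segments.
    let mut params = BTreeMap::new();
    for pair in tail.split(';').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("parameter without '=': {}", pair))?;
        params.insert(key.to_string(), value.to_string());
    }

    Ok(ConnectTarget {
        host: host.to_string(),
        port,
        params,
    })
}
```

For example, `parse_connection_string("sc://localhost:15002/;user_id=example_rs")` would yield host `localhost`, port `15002`, and a single `user_id` parameter.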

I'm not sure if UdfRegistration and UdtfRegistration would be possible in Rust. I think each of those depends on the JVM or on a specific Python function being serialized and then evaluated on the workers.

@abrassel
Contributor Author

I think we can probably do UDFs if we use pyo3 or an equivalent to generate Python lambdas.

@abrassel
Contributor Author

Thanks for the feedback, @sjrusso8! I think I've implemented all of the changes.

Owner

@sjrusso8 sjrusso8 left a comment

LGTM

@sjrusso8 sjrusso8 merged commit 32c8f3c into sjrusso8:main May 24, 2024
3 checks passed
@abrassel abrassel deleted the readme_tracker branch May 24, 2024 19:56
Development

Successfully merging this pull request may close these issues.

Cleanup Documentation - Spark Core Classes
2 participants