Align DataFrame boilerplate in Python and Scala #366

Closed
lintool opened this issue Oct 20, 2019 · 2 comments

lintool commented Oct 20, 2019

Currently, the Scala DF boilerplate is something like:

RecordLoader.loadArchives("src/test/resources/warc/example.warc.gz", sc).extractValidPagesDF()
  .select($"Url")
  .show(20, false)

And Python:

WebArchive(sc, sqlContext, "src/test/resources/warc/example.warc.gz").pages() \
    .select("url") \
    .show(20, False)

Would it make sense to align the two?

So Python would look like:

RecordLoader.loadArchives("src/test/resources/warc/example.warc.gz", sc, sqlContext).pages() \
    .select("url") \
    .show(20, False)

And on the Scala side, let's just rename extractValidPagesDF() to pages() to match Python? This runs a slight risk of confusion with the RDD operations, but I think that risk is minimal.
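
For the Python half of the proposal, one possible shape is a thin facade that forwards to the existing WebArchive class. This is only a sketch: the RecordLoader class below is hypothetical and does not exist on the Python side today, and the WebArchive shown here is a Spark-free stand-in used purely so the example runs on its own.

```python
class WebArchive:
    """Stand-in for the existing Python entry point (assumption: the real
    class needs a live SparkContext and returns a Spark DataFrame)."""

    def __init__(self, sc, sqlContext, path):
        self.sc = sc
        self.sqlContext = sqlContext
        self.path = path

    def pages(self):
        # Stub: the real method would return a DataFrame of valid pages.
        return "pages({})".format(self.path)


class RecordLoader:
    """Hypothetical facade that mirrors the Scala call shape."""

    @staticmethod
    def loadArchives(path, sc, sqlContext):
        # Path comes first, as in Scala; the contexts follow.
        return WebArchive(sc, sqlContext, path)
```

With something like this in place, the existing WebArchive(sc, sqlContext, path) constructor could keep working while the new RecordLoader.loadArchives(path, sc, sqlContext) spelling is introduced alongside it.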

@ianmilligan1 @ruebot thoughts?

ruebot commented Oct 20, 2019

That was the general consensus in #231

lintool commented Oct 25, 2019
