Discussion: Idiom for loading DataFrames #231
Comments
I'm generally agnostic but this pushes me in the camp of having a DF-specific interface. The second syntax example you gave with the
Fine by me. I can see moving towards strict DataFrames helping out on the AUK side of things.
+1 for strict dataframes and hiding away RDDs
👍 We can close this issue after #350 is merged.
Closed with e32ae17.
In my original implementation I wrote a `DataFrameLoader`, but it seems to have rapidly fallen out of use... We should decide on the idiom we want for loading DataFrames.

Current implementation:
The downside of this is that the user has access to raw RDDs, which is what `loadArchives` returns... this is asking for trouble in mixing RDDs and DFs in unpredictable ways?

Another option would be to introduce a DF interface that does not give access to RDDs. Something like:
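A minimal sketch of that idea follows. It is not the snippet from this post: plain Scala collections stand in for Spark's RDD and DataFrame so it runs without Spark, and `Record`, `Table`, `fromRecords`, and the method bodies are all hypothetical.

```scala
// Stand-in types so the sketch runs without Spark (assumption: in the real
// project these would be an RDD of archive records and a Spark DataFrame).
final case class Record(url: String, mimeType: String, links: Seq[String])
final case class Table(columns: Seq[String], rows: Seq[Seq[String]])

// Hypothetical DF-only entry point. The constructor is private and the
// records are never exposed, so callers only ever get tabular results back.
final class DataFrameLoader private (records: Seq[Record]) {
  // Short names: the loader already signals that these are DataFrames.
  def pages(): Table =
    Table(Seq("url", "mime_type"), records.map(r => Seq(r.url, r.mimeType)))

  def links(): Table =
    Table(Seq("src", "dest"),
      records.flatMap(r => r.links.map(dest => Seq(r.url, dest))))
}

object DataFrameLoader {
  // Hypothetical factory; a real one would read archive files from a path.
  def fromRecords(records: Seq[Record]): DataFrameLoader =
    new DataFrameLoader(records)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val loader = DataFrameLoader.fromRecords(Seq(
      Record("http://a.example/", "text/html", Seq("http://b.example/")),
      Record("http://b.example/", "text/html", Seq.empty)
    ))
    println(loader.pages().rows.length) // number of page rows
    println(loader.links().rows)        // one row per outgoing link
  }
}
```

With this shape there is no method that returns the underlying records, so mixing RDD and DF operations is ruled out by the API rather than by convention.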
The other nice feature is that we can have much shorter DF names like `pages`, `links`, `images`, `image_links`, etc. - don't need the `DF` part to disambiguate because `DataFrameLoader` makes this clear. One more nice feature is the ability to selectively reduce scope down the road and hide RDDs from the user, as we move completely over to DFs.

I'm leaning towards this design, but would be happy to hear opinions from others...
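One way the "reduce scope down the road" part could work in Scala is with qualified access modifiers. The sketch below is only an illustration under that assumption: `ArchiveApi`, `Archive`, `rawRecords`, and `countRecords` are hypothetical names, and a `Seq` stands in for an RDD so the code runs without Spark.

```scala
object ArchiveApi {
  // Stand-in for an RDD of archive records (assumption; no Spark needed).
  type RawRecords = Seq[String]

  final class Archive(urls: RawRecords) {
    // Visible only inside ArchiveApi: library code can still reach the raw
    // records, but user code outside this object cannot.
    private[ArchiveApi] def rawRecords: RawRecords = urls

    // Public DF-style view (a Seq of rows stands in for a DataFrame).
    def pages(): Seq[Map[String, String]] = urls.map(u => Map("url" -> u))
  }

  // Internal helper that is allowed to touch the raw records.
  def countRecords(a: Archive): Int = a.rawRecords.size
}

object ScopeDemo {
  def main(args: Array[String]): Unit = {
    val archive =
      new ArchiveApi.Archive(Seq("http://a.example/", "http://b.example/"))
    println(archive.pages())
    println(ArchiveApi.countRecords(archive))
    // archive.rawRecords // would not compile here: private[ArchiveApi]
  }
}
```

Narrowing the modifier on the raw accessor is a one-line change, so the RDD-facing surface can shrink release by release while the short DF method names stay stable.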