
ClipData #3

Closed
pdeffebach opened this issue Apr 17, 2021 · 9 comments

Comments

@pdeffebach

Hello! Reposted from ActuaryUtilities here.

I really like the functions xlcopy() and xlcopy(data). I think it's really awesome to have such easy interoperability with Excel.

I'm glad this is in its own package now, and I'm really sorry I missed the Discourse thread announcing this package. That's my bad; I should have done more due diligence.

Yesterday I created ClipboardCSV, which takes some of the core features from xlcopy and puts them in a standalone package. Here are some examples of how it works.

Looking at this package, I think my implementation has a few features that would be really nice to merge into this one.

  1. It interacts better with tables. ClipboardCSV's tabletoclip is designed to work with Tables-compatible objects, while xlcopy is designed to work with arrays. ClipboardCSV provides arraytoclip for arrays. I think being explicit about the import source is good.
  2. It provides some MWE (minimal working example) functions. I don't work with Excel a lot, but I imagine that if you are porting a complicated workbook to Julia, an easy first step would be to have the tables be objects that you create in the code. This would also make it easy for people to share MWEs on Discourse.
  3. I provide some nice printing methods via PrettyTables. I could imagine a user wanting to copy and paste formatted tables easily, in particular as a LaTeX table. I am happy to drop this dependency if you think it isn't needed.
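To make point 1 concrete, here is a minimal sketch (not ClipboardCSV's actual code) of why the source type matters: a table carries column names that become a header row, while a plain matrix has only values. The names `table_to_tsv` and `array_to_tsv` are hypothetical, and a NamedTuple of equal-length vectors stands in for a Tables.jl column table so the example needs no packages.

```julia
# Hypothetical helpers, not part of ClipboardCSV or ClipData.

# Table source: a NamedTuple of equal-length vectors (a Tables.jl-style
# column table). Its column names become the header row.
function table_to_tsv(cols::NamedTuple)
    header = join(string.(keys(cols)), '\t')
    # zip the columns to walk the table row by row
    body = [join(row, '\t') for row in zip(values(cols)...)]
    return join([header; body], '\n')
end

# Array source: there are no names, so only the values are written.
array_to_tsv(m::AbstractMatrix) = join([join(r, '\t') for r in eachrow(m)], '\n')
```

Passing either string to `clipboard` from the InteractiveUtils standard library would put it on the system clipboard as tab-separated text that Excel can paste into cells.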

One of the benefits of xlcopy only working with arrays is that you don't have to decide on a table type for the output. But CSV.File is a perfectly fine table type that has all the features one needs. Then again, given that this is designed for interactive use, having DataFrames as a dependency might make sense, so we can give people a more fully featured table type that they are probably using anyway.

Let me know your thoughts!

Here is the same example as before, but now with tabs as delimiters for working with Excel, and with the new arraytoclip function.

julia> using DataFrames, ClipboardCSV

julia> df = DataFrame(a = [1, 2], b = [3, 4], c = [5, 6], d = [7, 8])
2×4 DataFrame
 Row │ a      b      c      d     
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      3      5      7
   2 │     2      4      6      8

julia> tabletoclip(df)
a	b	c	d
1	3	5	7
2	4	6	8

julia> cliptotable()
2-element CSV.File{false}:
 CSV.Row: (a = 1, b = 3, c = 5, d = 7)
 CSV.Row: (a = 2, b = 4, c = 6, d = 8)

julia> @tabletomwe df
df = """
a,b,c,d
1,3,5,7
2,4,6,8
""" |> IOBuffer |> CSV.File |> DataFrame



julia> eval(first(Meta.parse(clipboard(), 1)))
2×4 DataFrame
 Row │ a      b      c      d     
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      3      5      7
   2 │     2      4      6      8

julia> m = Tables.matrix(df)
2×4 Matrix{Int64}:
 1  3  5  7
 2  4  6  8

julia> arraytoclip(m)
1	3	5	7
2	4	6	8
@alecloudenback
Owner

I like a lot of the ideas here; one of the things I wanted to add was something like what you've done with the table I/O (i.e., the header is recognizable).

One thought: because the presence or absence of an argument disambiguates what the user is doing, I could see simplifying the API:

cliptable() # copies clipboard into a Tables.jl-compatible object
cliptable(dataframe) # copies a Tables.jl-compatible object to the clipboard

# without headers
cliparray() # copies clipboard into an array
cliparray(arr) # copies arr to the clipboard

Alternatively (not sure if this syntax would work):

clipdata() # copies clipboard into a Tables.jl-compatible object
clipdata(dataframe) # copies a Tables.jl-compatible object to the clipboard

# without headers
clipdata(;header=false) # copies clipboard into an array
clipdata(;asarray=true) # alternative to the line above
clipdata(arr) # copies arr to the clipboard
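The single-name design mirrors `Base.clipboard`, where the number of arguments decides the direction. The sketch below uses a hypothetical function `democlip` and a `Ref{String}` in place of the operating system clipboard so it runs without one; a real implementation would call `clipboard(s)` and `clipboard()` instead. It handles only numeric matrices.

```julia
# Stand-in for the OS clipboard so the sketch runs anywhere (assumption).
const FAKE_CLIPBOARD = Ref("")

# One argument: "copy" a matrix as tab-separated lines.
function democlip(m::AbstractMatrix)
    io = IOBuffer()
    for row in eachrow(m)
        println(io, join(row, '\t'))
    end
    FAKE_CLIPBOARD[] = String(take!(io))
    return nothing
end

# Zero arguments: "paste" the buffer back as a Float64 matrix.
function democlip()
    rows = split(chomp(FAKE_CLIPBOARD[]), '\n')
    cells = [parse.(Float64, split(r, '\t')) for r in rows]
    # each parsed row became a column of the hcat result, so transpose back
    return permutedims(reduce(hcat, cells))
end
```

Because the two methods differ in arity, Julia never has to guess which direction is meant; the harder question, raised next in the thread, is choosing between table and array behavior for the same one-argument call.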

I haven't dug through the code, but I'm not sure what the metaprogramming used in the example above does. Is it integral to what you are doing with the package, or just a demo?

I'd be happy to work towards a combined package (maybe a name like DataClip.jl would be more descriptive and agnostic of the data source). My goals are to have really simple input and output to and from different tools (especially Excel), while keeping the data very portable (like what you did by using Tables.jl instead of assuming users would want a DataFrame) and not requiring data headers all the time.

@pdeffebach
Author

Wonderful! I'm glad you want to work collaboratively on this.

I like your idea w.r.t. dispatch, but I think it's non-trivial. Something can be both a Table and <: AbstractArray. See this lengthy issue in DataFrames. So it's hard to find a rule that "just works". But keyword arguments are a good idea to disambiguate things.
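A small illustration of that ambiguity, using a hypothetical `describe_source` function: a vector of NamedTuples is an `AbstractArray`, yet Tables.jl also accepts it as a row table, so dispatching on `AbstractArray` silently treats it as an array.

```julia
# Hypothetical function: which method runs depends only on the argument's type.
describe_source(x::AbstractArray) = "array"
describe_source(x) = "table"

# A vector of NamedTuples: an AbstractArray, and also a valid Tables.jl row table.
rows = [(a = 1, b = 3), (a = 2, b = 4)]
describe_source(rows)            # the AbstractArray method wins

# A NamedTuple of vectors (a column table) is not an AbstractArray.
describe_source((a = [1, 2],))
```

A keyword argument or a separately named function lets the caller state the intent instead of relying on which method happens to be most specific.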

I'm probably partial to different functions for copying something to the clipboard and taking something from the clipboard. They do sufficiently different things that having them share a name seems a bit odd.

The metaprogramming bit was so that the MWE in the code would have the same name as the object that's being created.
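Why a macro is needed here: a function only receives the value of `df`, while a macro receives the expression `df` itself and can turn the symbol into a string. The sketch below assumes a hypothetical `@mwesketch` macro and `tsv` helper; unlike `@tabletomwe`, it only builds a string of the form `name = """..."""` from a matrix rather than emitting CSV/DataFrames code.

```julia
# Hypothetical helper: tab-separated text for a matrix.
tsv(m::AbstractMatrix) = join([join(r, '\t') for r in eachrow(m)], '\n')

# Hypothetical macro: `var` arrives as the caller's expression (e.g. the
# symbol `m`), so its name can be captured before it is evaluated.
macro mwesketch(var)
    name = string(var)
    # esc(var) makes the value lookup happen in the caller's scope
    return :(string($name, " = \"\"\"\n", tsv($(esc(var))), "\n\"\"\""))
end
```

For `m = [1 2; 3 4]`, `@mwesketch(m)` produces a snippet starting with `m = """`, so pasting it back into a script recreates a variable with the original name.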

w.r.t. making it agnostic about the output, we can take the same approach as CSV.read and have

clipdata(DataFrame)

which would return a DataFrame. Called with no argument, it would return a CSV.File.
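A sketch of the `CSV.read(source, sink)` style, where the caller passes the desired table constructor and the default returns a plain table. `parse_tsv` is a hypothetical name; it keeps every cell as a `String` (no type inference, unlike CSV.jl) and returns a NamedTuple of vectors, which Tables.jl treats as a column table, when no sink is given.

```julia
# Hypothetical parser: tab-separated text with a header row -> sink(table).
function parse_tsv(text::AbstractString, sink = identity)
    lines = split(chomp(text), '\n')
    colnames = Tuple(Symbol.(split(lines[1], '\t')))
    rows = [split(l, '\t') for l in lines[2:end]]
    # build one String vector per column
    cols = Tuple([String(r[j]) for r in rows] for j in eachindex(colnames))
    return sink(NamedTuple{colnames}(cols))
end
```

With DataFrames loaded, `parse_tsv(text, DataFrame)` and `DataFrame(parse_tsv(text))` should give the same result, so the sink style and the pipe-into-a-constructor style compose equally well.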

@alecloudenback
Owner

Couple of questions/thoughts:

clipdata(DataFrame)

It feels a little backwards to me; normally I would expect to take the results of clipdata and pass them to DataFrame, like CSV does: DataFrame(CSV.File("...")).

I'm probably partial to different functions for copying something to the clipboard and taking something from the clipboard. They do sufficiently different things that having them share a name seems a bit odd.

The standard library function clipboard does use the same name for both directions:

help?> clipboard

  clipboard(x)

  Send a printed form of x to the operating system clipboard ("copy").

  ──────────────────────────────────────────────────────────────────────────────────────

  clipboard() -> AbstractString

  Return a string with the contents of the operating system clipboard ("paste").

I think one of the really nice things about parts of the Julia data ecosystem is their consistency with Julia Base. I guess if disambiguation is impossible given the more advanced feature set, then it could make sense to have different names for the two directions. I haven't worked through the cases, though; it seems like you might have?

@pdeffebach
Author

pdeffebach commented Apr 20, 2021

Great. I think your points are all correct. I have started a new package called ClipData, which incorporates the issues raised in this conversation into the API. The new API lives in ClipData.jl and has the following functions:

  • tableclip()
  • arrayclip()
  • tablemwe()
  • arraymwe()
  • tableclip(t)
  • arrayclip(t)
  • tablemwe(t)
  • arraymwe(t)

That is to say, we follow the clipboard API as you suggested, using dispatch appropriately.

There is also no dependency on DataFrames. The user gets a CSV.File object and can call DataFrame on that as needed.

However, arrays get their own functions rather than relying on dispatch. This gets rid of ambiguities from AbstractArrays that are also tables. If we were going to require keyword arguments for all of those anyway, they might as well be separate functions.

Does this API seem good to you?

@alecloudenback
Owner

I think what you have makes sense. Looking at the code, I see that passing kwargs through to CSV is clever and could be useful. I will test it out and report back.
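Forwarding keyword arguments is presumably what lets options such as `normalizenames` reach CSV.jl without each one being redeclared. A minimal sketch of the pattern, with a hypothetical `split_cells` that forwards to Base's `split` rather than to CSV.jl:

```julia
# Hypothetical wrapper: any keywords the caller passes go straight to `split`.
split_cells(line::AbstractString; kwargs...) = split(line, '\t'; kwargs...)
```

Calling `split_cells("a\t\tb"; keepempty = false)` drops the empty cell, even though `split_cells` itself never mentions `keepempty`.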

What is mwe in the code/API? The source wasn't very clear to me.

@alecloudenback
Owner

alecloudenback commented Apr 20, 2021

I like it! I just tested it with some basic to/from Excel and it worked well. I tested a few kwargs, and it was nice to be able to use, e.g., normalizenames.

Minor suggestion: make the package name consistent with the API. E.g. either:

  • DataClip.jl with functions arrayclip and tableclip
  • ClipData.jl with functions cliparray and cliptable

I can totally envision myself getting it backwards in the future if the order is different between the package names and functions.

Another thing I noticed: when pasting into a spreadsheet with tableclip(t), an extra row gets pasted at the bottom.
[Screenshot: spreadsheet showing the pasted table with an extra row at the bottom]
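The thread does not say what caused the extra row. One plausible cause, assumed here, is a trailing newline after the last line of output, which a spreadsheet may read as one more (empty) row. This sketch contrasts terminating every line with only separating lines:

```julia
# Assumption: the spreadsheet treats every line break as a row separator.
rows = ["a\tb", "1\t3", "2\t4"]

with_trailing = join(rows, '\n') * '\n'   # every line ends with '\n'
without_trailing = join(rows, '\n')       # newline only between lines

# Splitting on line breaks shows the difference: the first version
# yields an empty fourth "row".
split(with_trailing, '\n')      # ["a\tb", "1\t3", "2\t4", ""]
split(without_trailing, '\n')   # ["a\tb", "1\t3", "2\t4"]
```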

Are the mwe functions just testing functionality?

@pdeffebach
Author

Ah, good call, about both the naming and the extra row at the bottom.

I've made you a collaborator on ClipData.jl. If you have time, feel free to take a stab at either of those issues.

The mwe functions seem like nice things to have, maybe to make a script more workbook-like, so a user can take something from Excel and put it in the script directly rather than needing a second file.

@pdeffebach
Author

pdeffebach commented Apr 26, 2021

Update: I fixed the problem with the extra newline.

All the functions are re-named.

I also added tests.

Now the next step is to add documentation.

@alecloudenback
Owner

Great, thanks for making those updates!

@alecloudenback alecloudenback changed the title from ClipboardCSV to ClipData Apr 29, 2021

2 participants