Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Juno Display #1121

Merged
merged 13 commits into from
Nov 13, 2016
Merged

Juno Display #1121

merged 13 commits into from
Nov 13, 2016

Conversation

MikeInnes
Copy link
Contributor

@MikeInnes MikeInnes commented Nov 2, 2016

This patch adds an extremely lightweight dependency on Juno.jl, and implements rich display inside of Juno, like so:

screenshot 2016-11-02 19 11 33

In future I'd like to improve on this by having access to the entire dataframe rather than truncating it, but this is something for now.

const SIZE = 25

@render Inline frame::AbstractDataFrame begin
width = min(size(frame, 2), SIZE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use 4 spaces for indentation for consistency with the rest of the repo

@nalimilan
Copy link
Member

This is nice, but why does DataFrames need to depend on Juno for that? I'd rather put this into a separate package that Juno would use. It doesn't sound right to add dependencies on each GUI which will be developed in the future, especially since DataFrames is a dependency of a lot of packages.

width = min(size(frame, 2), SIZE)
height = min(size(frame, 1), SIZE)
header = map(x->strong(string(x)), names(frame)[1:width]')
body = hcat([frame[1:height,i] for i = 1:width]...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splatting vectors of arbitrary length is generally not a good idea as it will trigger compilation of specialized methods without a clear gain. I think you can just use Array(frame[1:height, 1:width]) here.

@MikeInnes
Copy link
Contributor Author

MikeInnes commented Nov 2, 2016

See for example the discussion over at Gadfly: GiovineItalia/Gadfly.jl#902.

To a large extent this is a stop gap; we'll be able to remove these kinds of dependencies once conditional modules are working. Until then, the only realistic way for two large modules A and B to communicate, without one depending on the other, is to implement a small interface package that both depend on.

As long as the package is a small pure-Julia dependency there's very little downside. We're also working on making these interfaces more general (see e.g. WebDisplay.jl which Juno will move towards) so that a single dependency can provide an interface to multiple frontends. It'll just take a couple of iterations to get to the ideal situation.

@MikeInnes
Copy link
Contributor Author

MikeInnes commented Nov 3, 2016

Actually, there is an alternative, which is for DataFrames to provide the minimal interface rather than Juno. If there was a DataFramesCore.jl package which exported a minimal set of definitions (AbstractDataFrame alone may be sufficient in this case) without the functionality and dependencies, then Juno could depend on that. This could also be useful for packages like Gadfly and reduce the number of packages with hard dependencies on DataFrames generally.

@ViralBShah
Copy link
Contributor

I love the rich display!

@nalimilan
Copy link
Member

Actually that interface is precisely being developed right now at https://github.com/davidagold/AbstractTables.jl So I'd rather wait for it to be ready; anyway we won't make a new release without it (except maybe for bug fixes).

@MikeInnes
Copy link
Contributor Author

That package seems pretty substantial to me. If it provides a bulk of functionality (i.e. NullableArrays and StructuredQueries), as opposed to just the minimal interface, it's not what I mean, though it may be the right direction.

Aside from that, it doesn't seem like the DataFrames overhaul will be ready for some time. Can we figure something out for the existing release? If you have concrete concerns about the effects of this patch I can try to address those.

@davidagold
Copy link

If the dependency is lightweight and Julia only, then I support this PR.

I know that in principle we should be looking to remove dependencies from DF rather than add them. However, I consider the following mitigating points in this case: (i) integration of DataFrame and the Julia IDE is something I think we should encourage; (ii) there’s a clear path towards removing the dependency eventually (via AbstractTables or conditional module requirements); and (iii) there is evidently a plan to prevent the proliferation of such dependencies via WebDisplays.jl.

Re: AbstractTables.jl. This package could be a solution for this and related interfacing issues, but I've been questioning lately whether or not AbstractTable is the right abstraction. It serves functionality by means of inheritance (AbstractTable being an abstract type), but arguably an interface would serve better. For instance, ideally CSV.Sources would be able to rendered in Juno using the same rich display, but it's unnecessarily restrictive to require that CSV.Source <: AbstractTable. This will take time to think about and resolve. So for now, I think the present proposal is appropriate.

@@ -579,3 +579,18 @@ function showcols(io::IO, df::AbstractDataFrame) # -> Void
end

showcols(df::AbstractDataFrame) = showcols(STDOUT, df) # -> Void

using Juno
import Juno: Inline, LazyTree, Table, Row, strong
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need import rather than using here?


const SIZE = 25

@render Inline frame::AbstractDataFrame begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frame is more commonly called df.

using Juno
import Juno: Inline, LazyTree, Table, Row, strong

const SIZE = 25
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to allow users to tweak this somewhere in Juno?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but we'd have to implement a Juno.jl API for that. I think a better approach would be to extend this view to be capable of showing the entire dataframe at once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Though showing the full data frame could be quite problematic if it contains one million rows and one hundred columns. Better take a subset first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if we lazy-load it as needed

@nalimilan
Copy link
Member

OK, let's go with that then until we have optional dependencies. Though please add tests for the new feature.

@quinnj
Copy link
Member

quinnj commented Nov 7, 2016

FWIW, I'm not jazzed about this, purely for the dependency reasons. I'd much rather create a separate package for now that can ship w/ a Juno bundle or Atom bundle by default that provides the functionality and requires DataFrames and Juno.

That and I think we need to at least play the part of "squeaky wheel" in Base for conditional modules and/or help in any way we can to push that forward. Bloating one of the most popular Julia packages with a "requirement" on optional stuff just seems like a cop out.

@MikeInnes
Copy link
Contributor Author

It's inherently somewhat tricky to test conditional dependencies (with language support or not), since by definition the functionality is not available unless both packages are loaded. For now, I've added a simple test that Juno will be able to call the method; if you want I can add a test dependency on Atom.jl and add more comprehensive testing.

@nalimilan
Copy link
Member

If we add this feature, we should bite the bullet and test it thoroughly even if that means adding more requirements. Test dependencies are not so much a concern as runtime dependencies anyway.

@MikeInnes
Copy link
Contributor Author

MikeInnes commented Nov 7, 2016

That and I think we need to at least play the part of "squeaky wheel" in Base for conditional modules and/or help in any way we can to push that forward.

FWIW, supporting the use cases for this feature is a great way to do that. Having packages in existence where the feature provides a clear immediate benefit (fewer deps) is far more compelling than a few people saying "wouldn't it be nice if...". It also helps for figuring out the implementation issues and design (explanation by extrication and all that).

@MikeInnes
Copy link
Contributor Author

Ok, better test added.

@nalimilan
Copy link
Member

Thanks. The test failure is real, though.


const SIZE = 25

function tomat(df::AbstractDataFrame)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is essentially convert(Matrix, ...), except that it doesn't throw when undefined entries are present. Better improve the existing method then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert(Matrix, ...) also throws on NAs so this would involve a change of semantics – are you sure that's what you want?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed. Let's keep it that way, then, but please remove tomat to something more explicit (to_matrix?), as this one reads too much like 🍅.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, will go with a less fruity name :)

@MikeInnes
Copy link
Contributor Author

Tests will require JuliaLang/METADATA.jl#6973 to pass

@@ -6,3 +6,4 @@ SortingAlgorithms
Reexport
Compat 0.8.4
FileIO 0.1.2
Juno
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests require a specific version of Juno to be able to pass, that should be set as the minimum required Juno version.

function tomat(df::AbstractDataFrame)
res = Array{Any}(size(df))
for (j, col) in enumerate(columns(df)), i = 1:length(col)
isassigned(col, i) && (res[i, j] = col[i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong indent

@@ -579,3 +579,26 @@ function showcols(io::IO, df::AbstractDataFrame) # -> Void
end

showcols(df::AbstractDataFrame) = showcols(STDOUT, df) # -> Void

using Juno
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed, redundant with following line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using will import all of the symbols exported from Juno, but using Juno: stuff gets specific, potentially unexported symbols, so it's not entirely redundant, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right. only other thing that should be needed that isn't listed yet is @render, so adding that could replace the first line.

Mike, please communicate in sentences, single word responses (and commit messages) are not an effective way of justifying anything.

@@ -35,4 +38,7 @@ module TestShow
show(io, A)
A = DataFrames.RepeatedVector([1, 2, 3], 1, 5)
show(io, A)

render(Juno.Editor(), df)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should try to verify the result is correct instead of just that the method doesn't error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikeInnes Is there any way of doing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value of this is a JSON representation designed to be consumed by the frontend; in other words it's an implementation detail that's not part of the API. I'm assuming you don't want the DataFrames tests to break if/when Juno's implementation changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't have to compare the exact output, but anything that the code in this package should be adding to the representation should be tested for as a sanity check. otherwise it could easily be broken by accident and regress here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this sounds like another indication that this code shouldn't live in DataFrames.jl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see why. The important thing on the DataFrames side is that it's using Juno.jl's API correctly, and calling the function tests that. All of the actual functionality and logic that could regress is on the Juno side and can be tested there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've added a test for the implementation. Given that there's no logic it's equivalent to checking the source code hasn't changed, but whatever, it's a test.

@nalimilan
Copy link
Member

Any idea why tests failed only on 0.4?

@MikeInnes
Copy link
Contributor Author

Ok, added a test guard for 0.4 – Juno only calls this method on 0.5 anyway.

render(Juno.Editor(), df)
if VERSION > v"0.5-"
using Atom, Juno
render(Juno.Editor(), df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indent here

@ararslan
Copy link
Member

ararslan commented Nov 7, 2016

Tony had requested--and I agree--that more descriptive commit messages be used, but the two most recent messages are "four word commit message" and just ".". The commits should be squashed on merge, but in case someone mistakenly merges without squashing, it would be nice if the commit messages at least vaguely reflected the changes implemented. I've been guilty of this as well, but I've been trying to be better about it. Anyway, just something to consider. 😉

@MikeInnes
Copy link
Contributor Author

The commit log in its current form is not a good way to convey intent regardless of the messages, because there's little ordering and it's mixed with lots of noise from small fixups. If the DataFrames maintainers particularly value having a pretty commit log (perhaps they want to hang it on their wall or something?) I'm more than happy to rebase once the patch is ready to go.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum
ornare, purus vel consequat consectetur, urna enim imperdiet sem, ut
dapibus nibh nisi quis dolor. Pellentesque erat libero, ornare vitae
volutpat vitae, viverra eget ante. Ut tortor nulla, suscipit eget
facilisis vitae, fermentum non metus. Praesent sed porttitor enim, et
commodo justo. Suspendisse vitae turpis in ipsum tempor placerat. Donec
varius, lectus quis ultricies lobortis, purus orci rhoncus justo, id
gravida mauris lacus porta sem. Pellentesque tincidunt felis porttitor
volutpat luctus. Sed lorem enim, varius eget iaculis non, ultrices
efficitur libero. Maecenas cursus risus metus, ut cursus purus iaculis
vel. Praesent maximus viverra nunc vel tempor. Phasellus finibus cursus
elementum.

Vivamus non erat mattis lorem lacinia convallis. Nullam commodo, tortor
et porttitor interdum, velit dolor feugiat magna, eu egestas magna
lacus et lorem. Curabitur rhoncus tellus nec pharetra euismod. Maecenas
ut lorem nisl. Phasellus elementum massa mi, id ullamcorper leo
fringilla dapibus. Nullam sit amet massa at diam porta porttitor. Cras
lacinia pellentesque mauris, non vestibulum quam tempus tincidunt.
Integer convallis lectus non nisl tincidunt, et posuere tortor aliquet.
Mauris sodales lacus at malesuada eleifend. Phasellus ut sodales massa.
Vivamus aliquet ex eu risus pellentesque, nec sodales massa dictum.
Mauris tincidunt ligula nulla, ut pharetra augue consequat ut. Sed
faucibus venenatis sem vel pretium. Ut bibendum eleifend neque.

Nunc risus risus, vestibulum ac consectetur sit amet, congue eu massa.
Morbi volutpat eros arcu, id sodales erat maximus vitae. Nunc gravida
et felis sit amet ultrices. Etiam congue tortor sapien, eu cursus
tellus pretium quis. Donec finibus lorem mi, sed venenatis metus porta
vel. Nulla non mi cursus, condimentum enim accumsan, dignissim ligula.
In facilisis tortor eu pharetra tincidunt. Nam bibendum massa elementum
enim rhoncus mollis. Nunc volutpat consequat eros, in mollis lorem
vestibulum at. Praesent feugiat, sem in efficitur pharetra, dui massa
luctus purus, id luctus nulla odio ut tellus. Pellentesque porttitor
dictum consectetur. Aenean turpis nunc, bibendum nec enim eu, lobortis
laoreet est. Duis felis mi, molestie ut malesuada in, gravida vel
purus. Nullam vitae placerat enim. Nunc vitae tortor at massa finibus
dapibus in sit amet elit. Ut mollis tellus vel metus laoreet, vitae
lobortis ante tempor.

Nam eget molestie mauris. Nunc sollicitudin tortor ut pellentesque
dictum. Proin faucibus leo eu ex volutpat egestas. Phasellus convallis
vitae nulla non interdum. Donec enim diam, auctor quis nunc et, gravida
molestie felis. Ut facilisis ex in consectetur blandit. Mauris placerat
bibendum diam, a tempor felis tempus in. In elementum, leo quis
scelerisque tempus, tellus augue suscipit dui, vitae hendrerit ante est
sit amet erat.

Fusce sodales magna id elit sollicitudin, sit amet mattis quam
tincidunt. Aliquam erat volutpat. Morbi nec leo imperdiet, tempor nulla
non, rutrum turpis. Phasellus fermentum tristique nibh, sagittis
iaculis sapien iaculis in. Ut tincidunt tristique ligula nec rutrum.
Donec sem ex, sollicitudin non turpis sit amet, ullamcorper euismod
justo. Nunc leo erat, consectetur tincidunt mi a, sodales tristique
elit. Cras purus nulla, ultrices nec molestie sit amet, ultrices eget
purus. Quisque elementum purus tortor, nec egestas quam aliquet vel.
Sed efficitur at nulla vitae scelerisque. Mauris interdum hendrerit
libero blandit dictum. Curabitur lacinia finibus dictum.
@ViralBShah
Copy link
Contributor

I would like to hang the julia commit log on my wall. :-)

@MikeInnes
Copy link
Contributor Author

On reflection – if this integration is barely worth a few extra characters in require, then it's certainly not worth the hours of time I'm now sinking into it. I would have liked to use this to start prototyping things like spreadsheet views, but that's not even a feature I'd use much personally, so if it's all the same to you I'm happy to focus on more productive things and stick with plain text output here indefinitely.

@MikeInnes MikeInnes closed this Nov 9, 2016
@nalimilan
Copy link
Member

Hey, keep cool, looks like the PR is ready now. :-) Sorry if the lack of support for optional dependencies made it more painful than it should.

@quinnj Are you OK with this living in DataFrames until we have a better solution?

@nalimilan nalimilan reopened this Nov 9, 2016
@quinnj
Copy link
Member

quinnj commented Nov 9, 2016

I already "cast my vote", so to speak in saying I'd rather not, but I also realize that it's just one vote. Happy to go along if more people want this in.

@nalimilan
Copy link
Member

Not many people have voted, and I'm myself torn about this. But well, I think I'd rather have this feature leave here for now rather than not exist at all.

@StefanKarpinski
Copy link
Member

I vote for having it here since having it merged will keep it from bitrotting and its presence will prod people to figure out a better way to do it. The perfect is the enemy of the good and all that.

@nalimilan nalimilan merged commit 49da3cc into JuliaData:release-0.8.4 Nov 13, 2016
@nalimilan
Copy link
Member

Thanks @MikeInnes. Could you make the same PR against master so that we don't lose support in the next release?

@MikeInnes
Copy link
Contributor Author

Will do, thanks @nalimilan and others for reviewing and pushing this through.

function _render(df::AbstractDataFrame)
width = min(size(df, 2), SIZE)
height = min(size(df, 1), SIZE)
header = map(x->strong(string(x)), names(df)[1:width]')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

julia> [:foo, :bar]'
WARNING: the no-op `transpose` fallback is deprecated, and no more specific `transpose` method for Symbol exists. Consider `permutedims(x, [2, 1])` or writing a specific `transpose(x::Symbol)` method if appropriate.
1×2 Array{Symbol,2}:
 :foo  :bar

This seems like broken behaviour on Julia's part – I can only transpose arrays of transposable objects?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transpose is recursive.

JuliaLang/julia#17834

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transpose is a linear algebraic operation, and recursive on elements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where the error is coming from, but the ultimate behaviour – arrays of Xs are transposable but arrays of Ys are not – seems pretty arbitrary to me. What makes a number transposable, other than as a hack to get array transpose to work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reshape(names(df)[1:width], 1, 2) works, btw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants