Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Html repr #3425

Merged
merged 42 commits into from
Oct 24, 2019
Merged

Html repr #3425

merged 42 commits into from
Oct 24, 2019

Conversation

jsignell
Copy link
Contributor

@jsignell jsignell commented Oct 21, 2019

This PR supersedes #1820 - see that PR for original discussion. See this gist to try out the new MultiIndex and options functionality.

TODO:

  • Add support for Multi-indexes
  • Probably good to have some opt-in or fail back system in case where we (or users) know that the rendering will not work
  • Add some tests

@pep8speaks
Copy link

pep8speaks commented Oct 21, 2019

Hello @jsignell! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-10-24 15:59:14 UTC

@jsignell
Copy link
Contributor Author

I don't really like how the attrs look without the alignment of keys and values. I will think about how to make that more legible

@dcherian
Copy link
Contributor

Honestly this is looking super good and ready to merge already!

Thanks @jsignell

The only thing that looks funny to me is the alternating black-gray text coloring below:
image

@jsignell
Copy link
Contributor Author

Yeah I agree that is a strange part of the original PR. The version in the fiddle doesn't do that, so I'd really like to hear back from @benbovy about whether the fiddle css is more uptodate.

@rabernat
Copy link
Contributor

Julia, it's fantastic to see you working on this! Thanks so much!

@shoyer
Copy link
Member

shoyer commented Oct 22, 2019

This looks fantastic! Really great to have you working on this!

Some specific feedback:

  1. Yes, I agree it would be nice to align the Attributes text, maybe this could make use of an HTML table?
  2. Maybe bold would look better than underline for coordinates that are dimensions? I find the underline makes text a little hard to read. Happy to leave this up to you, though.
  3. On my Ubuntu desktop, the "+" icon is bright yellow, which I'm guessing is not what was intended. Maybe it would better to stick with the three circle "data" icon?
    image
  4. What happens if there is a really long variable name? I had a version of the jsfiddle that expanded names when you hovered over them.

xarray/core/formatting_html.py Outdated Show resolved Hide resolved
d['id'] = 'section-' + str(uuid.uuid4())

# TODO: no value preview if not in memory
d['preview'] = format_values_preview(obj.values, max_char=70)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same concern as above

xarray/core/formatting_html.py Outdated Show resolved Hide resolved
xarray/core/formatting_html.py Outdated Show resolved Hide resolved
@benbovy
Copy link
Member

benbovy commented Oct 22, 2019

This is really nice @jsignell !

I'd really like to hear back from @benbovy about whether the fiddle css is more uptodate.

We did a lot of experiments, even some after starting the implementation in #1820, so I admit this is quite messy. I also did some clean-up and tweaks directly in that PR, so unfortunately there is no fiddle of reference that exactly corresponds to what's in here. Sorry for that!

The html/css in this PR is the cleanest one, I think.

The latest fiddles (for Dataset) are these two ones:

the only thing that looks funny to me is the alternating black-gray text coloring

In the HTML repr, the width of the space character is less large that for the monospace font used in the text repr, so I originally used this trick to better distinguish between the values in the inline data repr. Now that each value is encapsulated in it's own span, we could tweak the padding instead (or both... I don't have strong opinion on this).

@benbovy
Copy link
Member

benbovy commented Oct 22, 2019

Just one small issue in the gist with the vertical alignment within a variable row, and the height of the icon container (I'm using Firefox):

Screenshot 2019-10-22 at 09 33 04

@jsignell
Copy link
Contributor Author

Yeah the data preview is also shifted vertically.

@jsignell
Copy link
Contributor Author

I just updated the gist with these changes:

  • got rid of alternating data colors in preview
  • fixed icon alignment in row, and rest of data in row
  • got rid of some not-useful spans
  • changed from underline to bold for index coords

@jsignell
Copy link
Contributor Author

Is there ever a situation in which dimensions would be expandable? It seems like it is hard-coded to collapsed.

I am trying to generalize the css to reduce the length.

@shoyer
Copy link
Member

shoyer commented Oct 22, 2019

Is there ever a situation in which dimensions would be expandable? It seems like it is hard-coded to collapsed.

I don't think it would ever be expanded. The collapsed icon is just there for visual consistency.

@crusaderky crusaderky mentioned this pull request Oct 22, 2019
12 tasks
@jsignell
Copy link
Contributor Author

Ok I think I've addressed all the comments. I just updated the gist with the latest output. Tomorrow I'll start adding tests. The css is now ~300 lines and it gets injected everytime the dataset is rendered. I'm not sure if that is considered too long and if it is I don't quite know what to do about it.

@shoyer
Copy link
Member

shoyer commented Oct 22, 2019

The css is now ~300 lines and it gets injected everytime the dataset is rendered. I'm not sure if that is considered too long and if it is I don't quite know what to do about it.

You could consider running the CSS through some sort of preprocessing tool to minify it, e.g., to remove all the spaces. But my guess is that it's probably fine.

It would be informative to measure the size of the output HTML encoded as UTF-8 bytes for some example datasets. For context, a simple line chart from matplotlib is ~20 KB when saved as a PNG image. Anything in that ball-park would be OK, up to perhaps a few hundred KB for really big datasets with lots of variables.

@@ -134,6 +134,12 @@ def __array__(self: Any, dtype: DTypeLike = None) -> np.ndarray:
def __repr__(self) -> str:
return formatting.array_repr(self)

def _repr_html_(self):
if OPTIONS["display_style"] == "classic":
classic = repr(self).replace("<", "&lt;").replace(">", "&gt;")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def _repr_html_(self):
if OPTIONS["display_style"] == "classic":
classic = repr(self).replace("<", "&lt;").replace(">", "&gt;")
return "<pre>{repr}</pre>".format(repr=classic)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use an f-string here:

Suggested change
return "<pre>{repr}</pre>".format(repr=classic)
return f"<pre>{classic}</pre>"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize that xarray was already supporting >=3.6 👍

setup.py Outdated Show resolved Hide resolved
@shoyer
Copy link
Member

shoyer commented Oct 22, 2019

I just gave this a try in Chrome / Google Colab. It's looking great!

Here's the rasm dataset. It weighs in at exactly 20 KB!
image

The alignment of "Dimensions" line does looks a little off -- would that be easy to fix?

There's also a strange alignment thing with the three circle icon in array repr, which maybe could be moved to the right side?
image

@shoyer
Copy link
Member

shoyer commented Oct 23, 2019

We probably should be careful to use html.escape() pervasively to protect against malformed data. We can trust user code, but probably should not trust all data -- it should be safe to load and view arbitrary netCDF files in xarray.

@codecov
Copy link

codecov bot commented Oct 23, 2019

Codecov Report

Merging #3425 into master will decrease coverage by 1.09%.
The diff coverage is 23.12%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #3425     +/-   ##
=========================================
- Coverage   96.16%   95.06%   -1.1%     
=========================================
  Files          65       66      +1     
  Lines       13436    13587    +151     
=========================================
- Hits        12921    12917      -4     
- Misses        515      670    +155
Impacted Files Coverage Δ
xarray/core/options.py 97.91% <100%> (+0.09%) ⬆️
xarray/core/dataset.py 95.89% <20%> (-0.23%) ⬇️
xarray/core/formatting_html.py 21.05% <21.05%> (ø)
xarray/core/common.py 91.92% <42.85%> (-1.29%) ⬇️
xarray/convert.py 100% <0%> (ø) ⬆️
xarray/plot/facetgrid.py 96.73% <0%> (ø) ⬆️
xarray/plot/plot.py 96.38% <0%> (ø) ⬆️
xarray/backends/rasterio_.py 96.85% <0%> (ø) ⬆️
xarray/plot/utils.py 97.74% <0%> (ø) ⬆️
xarray/core/resample_cftime.py 91.48% <0%> (ø) ⬆️
... and 37 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b0c336f...3a9edae. Read the comment docs.

@shoyer
Copy link
Member

shoyer commented Oct 23, 2019

Here's what we're currently looking at!
image

@jsignell
Copy link
Contributor Author

I'm still worried about size, but minifying the css and even minifying the html did not make a big difference. It looks like the html_repr for rasm is 38kb .

@shoyer
Copy link
Member

shoyer commented Oct 23, 2019

It looks like the html_repr for rasm is 38kb .

How are you measuring this?

I see:

>>> len(rasm._repr_html_().encode('utf8'))
19624

@jsignell
Copy link
Contributor Author

jsignell commented Oct 23, 2019

Oh I was looking at bytes not len. I didn't have .encode('utf8')

@jsignell
Copy link
Contributor Author

Ok I am going to write some tests now. I updated the gist. Do people like the names "classic" and "html" for the display_style options? Another option would be "classic" and "rich".

@benbovy
Copy link
Member

benbovy commented Oct 23, 2019

Or maybe "text" and "html"?

Hopefully "html" will eventually become the classic display :)

@jsignell
Copy link
Contributor Author

I added some tests. I can add more if anyone has a test case that they want to be sure is covered.

@jsignell
Copy link
Contributor Author

Ok I think it's fixed now

@benbovy
Copy link
Member

benbovy commented Oct 24, 2019

Yes it looks nice!

@jsignell
Copy link
Contributor Author

It isn't too hard to get it working with dark theme, but requires slight tweaks to the colors to align with those provided as vars by jlab.

@dcherian
Copy link
Contributor

Awesome. Thanks @jsignell Let's leave dark theme compatibility for a future PR.

@dcherian dcherian merged commit ba48fbc into pydata:master Oct 24, 2019
@shoyer
Copy link
Member

shoyer commented Oct 24, 2019

Should we expand the .data section of the DataArray HTML repr by default? That would look a little more consistent with the text repr.

@benbovy
Copy link
Member

benbovy commented Oct 24, 2019

Thanks a lot @jsignell ! It is great to see this merged finally!

@jsignell jsignell deleted the html_repr branch October 24, 2019 16:58
@jsignell jsignell mentioned this pull request Oct 24, 2019
4 tasks
@mrocklin
Copy link
Contributor

Thanks @jsignell (and all). I'm really jazzed about this.

@choldgraf
Copy link

choldgraf commented Oct 24, 2019

this is awesome! I was so excited about it that I updated my little blog post with the latest xarray master https://predictablynoisy.com/xarray-explore-ieeg

I was a little sad at the gigantic repr that came out of my DataArray, and now it is nice and tidy html!

@benbovy
Copy link
Member

benbovy commented Oct 24, 2019

Cool!

I guess this post about how you use notebooks in your blog is still up to date?

I looks like we'll need to fix a couple of issues (the channel coordinate looks weird), but maybe it is not related to the front-end.

Besides variable names, we probably need to somehow handle overflow for variables that have a long list of dimension labels.

@choldgraf
Copy link

choldgraf commented Oct 24, 2019

Actually, no that's totally out of date now haha. Thanks for reminding me I should update that

edit: realized I should have said what I'm actually using - I'm using this:https://jupyterbook.org/features/page.html

@jsignell
Copy link
Contributor Author

I looks like we'll need to fix a couple of issues (the channel coordinate looks weird), but maybe it is not related to the front-end.

I am opening a PR. I didn't escape dtypes and this dtype is <U5 which html does terribly with

@shoyer
Copy link
Member

shoyer commented Oct 24, 2019

To verify proper escaping, we could validate the generated HTML as XML:
https://stackoverflow.com/questions/13742538/how-to-validate-xml-using-python-without-third-party-libs

This could be a good use case for Hypothesis. Or you could manually build a dataset with variables of every dtype.

@jsignell jsignell mentioned this pull request Oct 24, 2019
4 tasks
@jsignell
Copy link
Contributor Author

I opened a PR and included a small test. It can probably go in while we work on a more complete testing infrastructure.

@jsignell
Copy link
Contributor Author

Besides variable names, we probably need to somehow handle overflow for variables that have a long list of dimension labels.

These are handled in the same way as long variable names, they truncate with an ellipsis and then expand on hover.

@benbovy
Copy link
Member

benbovy commented Oct 25, 2019

These are handled in the same way as long variable names, they truncate with an ellipsis and then expand on hover.

Oh yes I missed it.

dcherian added a commit to dcherian/xarray that referenced this pull request Oct 25, 2019
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 25, 2019
…e-multiple-dims

* upstream/master:
  change ALL_DIMS to equal ellipsis (pydata#3418)
  Escaping dtypes (pydata#3444)
  Html repr (pydata#3425)
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 28, 2019
* upstream/master:
  Another groupby.reduce bugfix. (pydata#3403)
  add icomoon license (pydata#3448)
  change ALL_DIMS to equal ellipsis (pydata#3418)
  Escaping dtypes (pydata#3444)
  Html repr (pydata#3425)
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 29, 2019
* upstream/master:
  upgrade black verison to 19.10b0 (pydata#3456)
  Remove outdated code related to compatibility with netcdftime (pydata#3450)
  Remove deprecated behavior from dataset.drop docstring (pydata#3451)
  jupyterlab dark theme (pydata#3443)
  Drop groups associated with nans in group variable (pydata#3406)
  Allow ellipsis (...) in transpose (pydata#3421)
  Another groupby.reduce bugfix. (pydata#3403)
  add icomoon license (pydata#3448)
  change ALL_DIMS to equal ellipsis (pydata#3418)
  Escaping dtypes (pydata#3444)
  Html repr (pydata#3425)
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 30, 2019
commit bc39877
Merge: 507b1f6 278d2e6
Author: dcherian <deepak@cherian.net>
Date:   Tue Oct 29 09:36:30 2019 -0600

    Merge remote-tracking branch 'upstream/master' into dask-tokenize

    * upstream/master:
      upgrade black verison to 19.10b0 (pydata#3456)
      Remove outdated code related to compatibility with netcdftime (pydata#3450)
      Remove deprecated behavior from dataset.drop docstring (pydata#3451)
      jupyterlab dark theme (pydata#3443)
      Drop groups associated with nans in group variable (pydata#3406)
      Allow ellipsis (...) in transpose (pydata#3421)
      Another groupby.reduce bugfix. (pydata#3403)
      add icomoon license (pydata#3448)
      change ALL_DIMS to equal ellipsis (pydata#3418)
      Escaping dtypes (pydata#3444)
      Html repr (pydata#3425)

commit 507b1f6
Author: dcherian <deepak@cherian.net>
Date:   Tue Oct 29 09:34:47 2019 -0600

    Fix window test

commit 4ab6a66
Author: dcherian <deepak@cherian.net>
Date:   Thu Oct 24 14:30:57 2019 -0600

    Implement __dask_tokenize__
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

html repr of xarray object (for the notebook)
10 participants