Dashboard fixes #100

Merged: 23 commits, merged Mar 25, 2016

@jbednar (Member) commented Mar 4, 2016

This PR adds:

  • Support for Castra-format files (via a dynamic import of dask, so dask does not become a required dependency)
  • Support for plotting pure counts (rows in the data, e.g. taxi trips, not associated with any particular field)
  • Support for plotting counts for the census data
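The dynamic import in the first bullet follows a common pattern: the optional dependency is imported only when the feature that needs it is used. A minimal, stdlib-only sketch is below; the helper name `lazy_import` and the install hints are illustrative assumptions, not the code in this PR. For Castra files, the dashboard would request something like `dask.dataframe`.

```python
import importlib


def lazy_import(module_name, install_hint):
    """Import ``module_name`` on first use instead of at module load time.

    Hypothetical helper: an optional dependency (e.g. dask for Castra files)
    is only required when the feature that needs it is actually exercised.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "Optional dependency '{}' is needed for this feature; "
            "try: {}".format(module_name, install_hint)
        ) from exc


# A module that is available imports normally...
json_mod = lazy_import("json", "it ships with Python")

# ...while a missing one raises an ImportError with an actionable message.
try:
    lazy_import("surely_not_installed_pkg", "pip install surely_not_installed_pkg")
except ImportError as err:
    message = str(err)
```

Users who never open a Castra file never trigger the import, so they never need dask installed.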

It's also close to adding support for colorization by census race category, but I'm not sure how to add that in a general way. Below are some trivial diffs that are sufficient to show racial color categories, though in a hardcoded way that removes support for non-categorical data. With this PR and those diffs applied, there will initially be an error on startup because the default Field in census.yml is counts; if the Field is then changed to Race, to match the new Count Categories aggregate declared below, it should work.

So, @brendancol, can you take it from here? Can you make dashboard.py support categorical information where appropriate? I've already added the race colors to census.yml, but it would take me a while to figure out how to look that information up when needed, based on the "cat_colors" pointer I added to census.yml (which could be changed if needed). There will, of course, also need to be some logic to switch between tf.interpolate and tf.colorize.

If we want to avoid errors for nonsensical combinations, just as for the earliest client.py file from ages ago, we'll presumably need to start declaring datatypes for these objects so that the buttons generally work rather than generally fail...
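Declaring datatypes could look roughly like the sketch below. Each field is tagged with a hypothetical kind, and only the aggregates compatible with that kind are offered, so the buttons can be limited to combinations that make sense. The field kinds and the compatibility table are illustrative assumptions. The aggregate labels match the ones registered in `aggregate_functions` in dashboard.py.

```python
# Which aggregate labels make sense for each (hypothetical) field kind.
COMPATIBLE = {
    "categorical": ["Count Categories", "Count"],
    "numeric": ["Count", "Mean", "Sum"],
    "rows": ["Count"],  # pure counts: rows not tied to any particular field
}

# Illustrative field declarations, as they might appear in a config file.
FIELD_KINDS = {
    "counts": "rows",
    "Race": "categorical",
    "Fare": "numeric",
}


def valid_aggregates(field_name, field_kinds=FIELD_KINDS):
    """Return the aggregate labels a UI should enable for this field."""
    return list(COMPATIBLE[field_kinds[field_name]])


def check_combination(field_name, aggregate_name, field_kinds=FIELD_KINDS):
    """Raise a readable error instead of failing deep inside the pipeline."""
    allowed = valid_aggregates(field_name, field_kinds)
    if aggregate_name not in allowed:
        raise ValueError("{!r} cannot be aggregated with {!r}; choose one of {}".format(
            field_name, aggregate_name, allowed))
```

A UI would call `valid_aggregates` to enable or disable the aggregate buttons whenever the Field changes. `check_combination` serves as a last guard before running the pipeline.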

0045-jbednar:~/datashader/examples> git diff
diff --git a/examples/dashboard/dashboard.py b/examples/dashboard/dashboard.py
index 5d304de..0c32239 100644
--- a/examples/dashboard/dashboard.py
+++ b/examples/dashboard/dashboard.py
@@ -55,8 +55,8 @@ class GetDataset(RequestHandler):
                          self.model.active_axes[1],
                          self.model.active_axes[2],
                          self.model.aggregate_function(self.model.field))
-        pix = tf.interpolate(agg, (255, 204, 204), 'red',
-                             how=self.model.transfer_function)
+        color_field = 'race_colors'
+        pix = tf.colorize(agg, self.model.config[color_field])

         # serialize to image
         img_io = pix.to_bytesio()
@@ -72,6 +72,7 @@ class AppState(object):
         self.load_config_file(config_file)

         self.aggregate_functions = OrderedDict()
+        self.aggregate_functions['Count Categories'] = ds.count_cat
         self.aggregate_functions['Count'] = ds.count
         self.aggregate_functions['Mean'] = ds.mean
         self.aggregate_functions['Sum'] = ds.sum
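One way to generalize the hardcoded `color_field = 'race_colors'` in the diff above is a small dispatch that decides between the tf.colorize and tf.interpolate paths. The sketch assumes, hypothetically, that census.yml loads into a dict in which each field entry may carry a "cat_colors" key naming a top-level key (such as "race_colors") that holds the category-to-color mapping. It only selects the path and the colors; the datashader calls stay with the caller.

```python
def choose_shading(aggregate_name, field_name, config):
    """Pick the shading path for an aggregate.

    Hypothetical config layout: config["fields"][field_name] may hold a
    "cat_colors" entry naming a top-level key (e.g. "race_colors") whose
    value maps each category to a color. Returns ("colorize", colors) for
    categorical aggregates and ("interpolate", None) for everything else.
    """
    field_cfg = config.get("fields", {}).get(field_name, {})
    colors_key = field_cfg.get("cat_colors")
    if aggregate_name == "Count Categories":
        if colors_key is None or colors_key not in config:
            raise ValueError(
                "field {!r} has no category colors configured".format(field_name))
        return "colorize", config[colors_key]
    return "interpolate", None


# Illustrative config with made-up categories and colors.
config = {
    "race_colors": {"a": "red", "b": "blue"},
    "fields": {"Race": {"cat_colors": "race_colors"}, "counts": {}},
}
mode, colors = choose_shading("Count Categories", "Race", config)
```

In the first case the caller would pass `colors` to `tf.colorize(agg, colors)`. In the second it would call `tf.interpolate` as before. These are the two branches touched in the diff above.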

@jbednar (Member Author) commented Mar 8, 2016

It would also be good if the dashboard didn't re-run the entire pipeline when only the transfer function changes, so that people can quickly try out different transfer functions.
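Avoiding a full re-run can be sketched as a memoized aggregation step. The expensive aggregate is cached on the parameters that actually affect it (here a hypothetical tuple of field, aggregate name, and viewport), and only the cheap shading step runs again when the transfer function changes. `aggregate` and `shade` are stand-in callables, not datashader APIs.

```python
class ShadingPipeline(object):
    """Cache the last aggregate so transfer-function changes only re-shade.

    ``aggregate`` and ``shade`` are hypothetical callables standing in for
    the canvas aggregation and the tf.interpolate/tf.colorize steps.
    """

    def __init__(self, aggregate, shade):
        self._aggregate = aggregate
        self._shade = shade
        self._key = None
        self._agg = None

    def render(self, field, agg_name, viewport, how):
        # Only these parameters change the aggregate; ``how`` does not.
        key = (field, agg_name, viewport)
        if key != self._key:
            self._agg = self._aggregate(field, agg_name, viewport)
            self._key = key
        return self._shade(self._agg, how)


calls = []


def fake_aggregate(field, agg_name, viewport):
    calls.append((field, agg_name, viewport))
    return [1, 4, 9]


def fake_shade(agg, how):
    return (how, sum(agg))


pipe = ShadingPipeline(fake_aggregate, fake_shade)
pipe.render("Fare", "Mean", (0, 1, 0, 1), "linear")
pipe.render("Fare", "Mean", (0, 1, 0, 1), "log")  # re-shades without re-aggregating
```

A single-entry cache fits an interactive dashboard, where only the current view matters. Panning or zooming changes the viewport and naturally invalidates it.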

@jbednar (Member Author) commented Mar 10, 2016

@brendancol, while this is all fresh in your mind, there are some other small dashboard-related fixes in the issues list that could be addressed before you move on to other things...

@@ -28,7 +31,6 @@
from webargs import fields
Review comment (Member Author):

How should we handle the dashboard's dependence on webargs? Should we just state that in the README? I don't think we'd necessarily want to make webargs a dependency of datashader, at least not until conda supports optional dependencies.

Review comment (Collaborator):

It's just an example, so my opinion is that the dependencies don't matter as long as they're listed explicitly somewhere.

Review comment (Member Author):

We do need to make it easy for people to run the example, though. After getting an error message about it, I briefly looked for webargs on conda, tried some bogus versions from non-main channels that didn't work, and eventually pip-installed webargs (which worked). Most people probably aren't that dedicated.
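To make the example easier to run, the dashboard could check its extra dependencies at startup and print install instructions, rather than failing on the first import. Below is a minimal sketch using `importlib.util.find_spec`. The dependency list and the install command are assumptions, to be adjusted to whatever the README ends up listing (pip is shown for webargs because that is what worked above).

```python
import importlib.util


def missing_packages(packages):
    """Return (module, install_hint) pairs for top-level modules not found.

    ``packages`` maps an importable module name to an install hint.
    """
    return [(name, hint) for name, hint in packages.items()
            if importlib.util.find_spec(name) is None]


def dependency_report(packages):
    """Build a human-readable message, or an empty string if all is well."""
    missing = missing_packages(packages)
    if not missing:
        return ""
    lines = ["The dashboard example needs these extra packages:"]
    lines += ["  {}: {}".format(name, hint) for name, hint in missing]
    return "\n".join(lines)


# Hypothetical dependency list for the example.
EXAMPLE_DEPS = {"webargs": "pip install webargs"}
```

At startup the script could print `dependency_report(EXAMPLE_DEPS)` and exit when the report is non-empty.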

@jbednar (Member Author) commented Mar 17, 2016

Is the area highlighted by the hover tool correct? It doesn't seem to be. E.g. there's a hotspot just to the left of the blue area:

[screenshot]

but if I hover directly over that, seemingly enclosing the hotspot in a blue box, the counts aren't particularly high:

[screenshot]

Yet if I move the mouse down to the cell below and to the left, I get high counts indicative of a hotspot:

[screenshot]

Does the displayed blue box need to be moved to accurately reflect the area in the hover information?
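One possible source of such a mismatch (an assumption, since the hover code isn't shown here) is computing the hover values from a grid-aligned cell while drawing the box centered on the cursor. The sketch below derives both the reported cell index and the drawn box edges from the same bins. It uses a hypothetical grid of `n` equal-width bins over `[lo, hi)` along one axis; apply it to x and y separately.

```python
import math


def hover_cell(pos, lo, hi, n):
    """Map a cursor coordinate to its bin index and that bin's edges.

    Both the hover readout and the highlighted box should come from this one
    result, so the box always encloses exactly the bin being reported.
    """
    width = (hi - lo) / float(n)
    index = int(math.floor((pos - lo) / width))
    index = min(max(index, 0), n - 1)  # clamp positions on or past the edges
    left = lo + index * width
    return index, (left, left + width)
```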

@jbednar (Member Author) commented Mar 17, 2016

The behavior in the corners with a large hover-box size also looks suspicious: shouldn't the box be the same size throughout the array (give or take a pixel of rounding), rather than cropped to a quarter of its size in the corners?
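A box that keeps the same size everywhere can be obtained by shifting the window inward at the edges instead of cropping it. Below is a minimal one-axis sketch. It assumes the box spans `size` cells, centered (as closely as whole cells allow) on cell `center` of an `n`-cell axis, with `size <= n`. These names are illustrative, not the dashboard's.

```python
def hover_window(center, size, n):
    """Return [start, stop) cell indices for a box of ``size`` cells.

    The box is centered on ``center`` where possible and shifted (not cropped)
    when it would extend past either edge, so it always covers ``size`` cells.
    """
    if not 0 < size <= n:
        raise ValueError("size must be between 1 and n")
    start = center - size // 2
    start = max(0, min(start, n - size))
    return start, start + size
```

The trade-off is that near an edge the cursor is no longer at the center of the box. However, the number of cells summarized by the hover readout stays constant across the whole array.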

@jbednar (Member Author) commented Mar 17, 2016

I guess this may be addressed by the proposed switch to averaging pixel values for re-aggregation, but I can't quite make sense of the values for some combinations of Field and Aggregate, e.g.:

[screenshot]

What does counting the Fare field mean? Counting how many non-empty Fare values there are? If so, presumably it's not in $; not sure what to do about that.
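The difference between re-aggregating by summing and by averaging can be illustrated with NumPy. The function below is a hypothetical sketch, not the dashboard's code. Summing a grid of mean fares yields a number that is no longer in $, whereas averaging keeps the units. An unweighted mean of per-pixel means only approximates the true mean over the underlying rows, because pixels backed by more rows should carry more weight.

```python
import numpy as np


def reaggregate(agg, factor, how="mean"):
    """Combine ``factor`` x ``factor`` blocks of a 2D aggregate array.

    Summing suits additive aggregates such as counts; averaging keeps values
    in the original units (e.g. a mean fare stays in $). Trailing rows and
    columns that don't fill a whole block are dropped.
    """
    h, w = agg.shape
    h2, w2 = h // factor, w // factor
    blocks = agg[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor)
    if how == "sum":
        return blocks.sum(axis=(1, 3))
    if how == "mean":
        return blocks.mean(axis=(1, 3))
    raise ValueError("how must be 'sum' or 'mean'")


# Made-up per-pixel mean fares for a 2 x 4 patch.
fares = np.array([[10., 12., 30., 30.],
                  [14., 12., 30., 30.]])
```

With `factor=2`, averaging gives 12 and 30 (still in $). Summing gives 48 and 120, which are meaningful only for additive aggregates such as counts.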

@jbednar (Member Author) commented Mar 17, 2016

Maybe report "Avg Fare ($) Count: xxxx", once it's averaging the values instead of re-aggregation? I.e., show the aggregate explicitly, not just the field?
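A readout that names the aggregate explicitly could be produced by a small formatter like the hypothetical one below. It shows one possible format, not a settled design. Units are dropped for count aggregates, since counting Fare values yields a number of rows rather than dollars.

```python
def hover_label(aggregate_name, field_name, units, value):
    """Format a hover readout that names the aggregate, not just the field.

    Counts are unitless regardless of the field, so units are only shown for
    aggregates that stay in the field's units (e.g. Mean, Sum).
    """
    unitless = aggregate_name in ("Count", "Count Categories")
    suffix = "" if unitless or not units else " ({})".format(units)
    return "{} {}{}: {}".format(aggregate_name, field_name, suffix, value)
```

For example, `hover_label("Mean", "Fare", "$", 12.5)` gives `Mean Fare ($): 12.5`, while `hover_label("Count", "Fare", "$", 42)` gives `Count Fare: 42`.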

@jbednar (Member Author) commented Mar 19, 2016

I've added some commits with a few useful pieces, including out-of-core operation based on code from jcrist, but the PR isn't quite ready to merge because the link to the census.castra file is blank (I haven't yet heard back about options for hosting that file). It's also strange that castra has to be downloaded from a special channel; is there any way to get castra from a more public place (@jcrist?)

x_end=max_val,
y_range=(0,18))

self.model.legend_vbox.children = [legend_fig]
Review comment (Member Author):

Once it settles down a bit, it would be good to move the legend support into a function or a class, with appropriate parameters, with an eye to eventually moving it out of the dashboard.py file and into a datashader library file.
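As a first step toward that refactoring, the legend could be described by a library-agnostic value built by a parameterized function, so that only a thin rendering layer depends on Bokeh. The structure below is a hypothetical sketch. The default height of 18 and the 0..max_val range mirror the `y_range=(0,18)` and `x_end=max_val` arguments in the snippet above.

```python
from collections import namedtuple

# Library-agnostic description of a legend (hypothetical structure).
Legend = namedtuple("Legend", ["kind", "items", "x_range", "height"])


def make_legend(max_val=None, cat_colors=None, height=18):
    """Build a legend description that the dashboard (or a library) can render.

    Categorical data gets one (label, color) item per category, sorted by
    label; continuous data gets a color ramp spanning 0..max_val.
    """
    if cat_colors:
        return Legend("categorical", sorted(cat_colors.items()), None, height)
    if max_val is None:
        raise ValueError("continuous legends need max_val")
    return Legend("continuous", [], (0, max_val), height)
```

Keeping the description separate from the Bokeh figure would make it straightforward to move `make_legend` into a datashader library module later. The figure construction can then stay in dashboard.py.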

@brendancol brendancol merged commit 2bcda51 into master Mar 25, 2016
@brendancol brendancol deleted the dashboard-fixes branch March 25, 2016 21:09
@jbednar jbednar mentioned this pull request Apr 6, 2016