ufunc 'over' not supported for the input types in 7_Networks.ipynb #792

Closed
jbednar opened this issue Sep 27, 2019 · 8 comments

Comments

@jbednar
Member

jbednar commented Sep 27, 2019

With current master and xarray 0.11.3 or 0.13 and pandas 0.25.1, the http://datashader.org/user_guide/7_Networks.html#Graphs-with-categories section of the user guide is failing with the error:

TypeError: ufunc 'over' not supported for the input types, and the inputs could not 
be safely coerced to any supported types according to the casting rule ''safe''

Presumably that code used to work, since the page looks fine on the website. Python's complaining about overlaying the node and edge plots in this case, even though overlaying those same types of plots in other cells above and below works fine using the same code. Plus if I hack graphplot to overlay just the nodes on themselves or just the edges on themselves, the overlay works, suggesting that the edges and nodes in this case are individually fine but end up as incompatible types of object that can't be overlaid with over. Yet the two plots appear to have the same type (<class 'datashader.transfer_functions.Image'>) and dtype (uint32).

I'll try to see when this problem started, but it has the same behavior on xarray 0.11.3 and 0.13, so if it's due to a change in xarray it would presumably be from an older version.
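For readers unfamiliar with this message: NumPy raises it whenever a ufunc has no inner loop matching the dtypes it actually receives. The sketch below reproduces the same wording with np.bitwise_or, which is only a stand-in chosen here because it accepts integers but not floats; it is not Datashader's over, and the thread does not confirm which dtypes over ends up receiving.

```python
import numpy as np

# np.bitwise_or stands in for datashader's `over` (an assumption made for
# illustration): it has integer loops but no loop for float64 inputs.
ints = np.array([1, 2, 3], dtype=np.uint32)
floats = ints.astype(np.float64)

ok = np.bitwise_or(ints, ints)  # both inputs match an integer loop

try:
    np.bitwise_or(ints, floats)  # float64 cannot be safely cast to an integer
    message = None
except TypeError as err:
    message = str(err)

print(ok.dtype)
print(message)
```

If something similar is happening here, the arrays reaching over would have to differ from the uint32 arrays seen before the call, which is worth checking directly.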

@jbednar
Member Author

jbednar commented Sep 27, 2019

I had the same over problem with xarray 0.9.6 and pandas 0.23, which are quite old by now, so it's a mystery.

@jbednar
Member Author

jbednar commented Sep 27, 2019

Reproducer:

import math
import numpy as np
import pandas as pd

import datashader as ds
import datashader.transfer_functions as tf
from datashader.layout import random_layout, circular_layout, forceatlas2_layout
from datashader.bundling import connect_edges, hammer_bundle

from itertools import chain

np.random.seed(1)
cats,n,m = 4,10,20

cnodes = pd.concat([
           pd.DataFrame.from_records([("node"+str(i+100*c),"c"+str(c)) for i in range(n)], 
                        columns=['name','cat']) 
             for c in range(cats)], ignore_index=True)
cnodes.cat=cnodes.cat.astype('category')

cedges = pd.concat([
           pd.DataFrame(np.random.randint(n*c,n*(c+1), size=(m, 2)), 
                        columns=['source', 'target'])
         for c in range(cats)], ignore_index=True)

cvsopts = dict(plot_height=400, plot_width=400)

def nodesplot(nodes, name=None, canvas=None, cat=None):
    canvas = ds.Canvas(**cvsopts) if canvas is None else canvas
    aggregator=None if cat is None else ds.count_cat(cat)
    agg=canvas.points(nodes,'x','y',aggregator)
    return tf.spread(tf.shade(agg, cmap=["#FF3333"]), px=3, name=name)

def edgesplot(edges, name=None, canvas=None):
    canvas = ds.Canvas(**cvsopts) if canvas is None else canvas
    return tf.shade(canvas.line(edges, 'x','y', agg=ds.count()), name=name)

def graphplot(nodes, edges, name="", canvas=None, cat=None):
    if canvas is None:
        xr = nodes.x.min(), nodes.x.max()
        yr = nodes.y.min(), nodes.y.max()
        canvas = ds.Canvas(x_range=xr, y_range=yr, **cvsopts)
        
    np = nodesplot(nodes, name + " nodes", canvas, cat)
    ep = edgesplot(edges, name + " edges", canvas)
    print(type(np),np.dtype,type(ep),ep.dtype)
    return tf.stack(ep, np, how="over", name=name)

rd = random_layout(     cnodes, cedges)
rd_d = graphplot(rd, connect_edges(rd,cedges), "Random layout",          cat="cat")

$ python over_types.py
<class 'datashader.transfer_functions.Image'> uint32 <class 'datashader.transfer_functions.Image'> uint32
Traceback (most recent call last):
  File "over_types.py", line 52, in <module>
    rd_d = graphplot(rd, connect_edges(rd,cedges), "Random layout",          cat="cat")
  File "over_types.py", line 47, in graphplot
    return tf.stack(ep, np, how="over", name=name)
  File "/Users/jbednar/datashader/datashader/transfer_functions.py", line 120, in stack
    out = tz.reduce(tz.flip(op), [i.data for i in imgs])
  File "/Users/jbednar/miniconda3/envs/pyviz/lib/python3.7/site-packages/toolz/functoolz.py", line 303, in __call__
    return self._partial(*args, **kwargs)
  File "/Users/jbednar/miniconda3/envs/pyviz/lib/python3.7/site-packages/toolz/functoolz.py", line 737, in flip
    return func(b, a)
TypeError: ufunc 'over' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

@jbednar
Member Author

jbednar commented Sep 27, 2019

(but tf.stack(np, np, how="over", name=name) works, as does tf.stack(ep, ep, how="over", name=name); it's only tf.stack(ep, np, how="over", name=name) that doesn't.)
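Since type and dtype match, one further thing to compare is the layout of the two images: their dims and coordinate values. The helper below is a hypothetical sketch (layout_differences is not part of Datashader or xarray) that works on any pair of xarray DataArray objects, which tf.Image subclasses. The two small arrays are made-up stand-ins for the node and edge images.

```python
import numpy as np
import xarray as xr

def layout_differences(a, b):
    """List differences in dims and coordinates between two DataArrays."""
    diffs = []
    if a.dims != b.dims:
        diffs.append(f"dims: {a.dims} vs {b.dims}")
    for name in sorted(set(a.coords) | set(b.coords)):
        if name not in a.coords or name not in b.coords:
            diffs.append(f"coord {name!r} present in only one input")
        elif not a.coords[name].equals(b.coords[name]):
            diffs.append(f"coord {name!r} values differ")
    return diffs

# Made-up stand-ins: same dims and dtype, but different x coordinate values.
a = xr.DataArray(np.ones((2, 2), dtype=np.uint32), dims=["y", "x"],
                 coords={"y": [0.0, 1.0], "x": [0.0, 1.0]})
b = xr.DataArray(np.ones((2, 2), dtype=np.uint32), dims=["y", "x"],
                 coords={"y": [0.0, 1.0], "x": [0.0, 2.0]})

same = layout_differences(a, a)
different = layout_differences(a, b)
print(same)
print(different)
```

Calling it on the nodes image and the edges image inside graphplot would show whether they disagree on anything that type() and dtype do not reveal.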

@jbednar
Member Author

jbednar commented Sep 27, 2019

It also works if you remove the ", cat" argument, which otherwise causes it to use an aggregator of count_cat rather than count. Somehow a categorical aggregator is returning a different tf.Image type than other aggregators? I can't see any difference in the return value; eventually it becomes an Image either way. The starting point (the aggregate) is different for categoricals, with an extra dimension, but that dimension is collapsed in the categorical case so that it seems like everything should be the same types in the two cases. Somehow not, though!
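One possible mechanism, offered only as a hypothesis: if tf.stack aligns its inputs with an outer join before combining them (an assumption; check transfer_functions.py), then two uint32 images whose coordinates disagree would be padded with NaN and silently promoted to float64. The sketch below shows that xarray behaviour on two made-up arrays. It does not show that this is what happens in the reproducer.

```python
import numpy as np
import xarray as xr

# Two made-up uint32 "images" on a 2x2 grid; the second one carries
# different x coordinate values.
a = xr.DataArray(np.ones((2, 2), dtype=np.uint32), dims=["y", "x"],
                 coords={"y": [0.0, 1.0], "x": [0.0, 1.0]})
b = xr.DataArray(np.ones((2, 2), dtype=np.uint32), dims=["y", "x"],
                 coords={"y": [0.0, 1.0], "x": [0.0, 2.0]})

# An outer join keeps the union of the coordinate labels and fills the
# gaps with NaN, which forces both integer arrays to a float dtype.
a2, b2 = xr.align(a, b, join="outer")

print(a.dtype, b.dtype)    # dtypes before alignment
print(a2.dtype, b2.dtype)  # dtypes after alignment
print(a2.shape)            # grid grows to the union of x labels
```

This would be consistent with the dtypes printing as uint32 just before the call and with stacking an image on itself working, since identical coordinates need no padding.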

@jsignell
Member

The only thing I can think of is maybe the issue is with versions of numba.

@jbednar
Member Author

jbednar commented Sep 30, 2019

That's a good suggestion, thanks. numba is 0.45.1 in my main environment, but I see the same symptoms with 0.39.0. Datashader's setup.py suggests that the original requirement was 0.37.0, but I wasn't able to test 0.37 or 0.38 due to conda failing to solve with my existing environment in that case. I can probably get it to install in a separate environment, but it doesn't look promising as an explanation...

@jbednar
Member Author

jbednar commented Sep 30, 2019

It doesn't appear to be an issue with an external library changing, but with the Datashader code itself. The notebook and the reproducer work fine with the last release (Datashader 0.7.0, revision 1b9f300), and as recently as commit ff89603 on August 23. The first commit where the reproducer breaks is e42694b, which corresponds to merging pull request #779. #779 does include changes to the construction of categorical xarray objects, and specifically appears to change from having coordinates as lists:

class count_cat(Reduction):
        def finalize(bases, **kwargs):
            dims = kwargs['dims'] + [self.column]
            
            coords = kwargs['coords'] + [cats]

            return xr.DataArray(bases[0], dims=dims, coords=coords)

to coordinates as ordered dictionaries, which presumably provides names to the coordinates:

class count_cat(Reduction):
        def finalize(bases, **kwargs):
            dims = kwargs['dims'] + [self.column]

            coords = kwargs['coords']
            coords[self.column] = cats

            return xr.DataArray(bases[0], dims=dims, coords=coords)

Maybe xarray is balking at merging the categorical and non-categorical Image objects because of some declared difference in the names of the coordinates, even though the coordinates themselves are the same? @jonmmease, can you explain what this change was for, or whether you can think of something else that changed in this PR that could be causing this problem?
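To make the difference between the two forms concrete, here is a minimal sketch using plain xarray. The shapes and coordinate values are made up, and none of this is taken from the PR. A list of coordinates is matched to dims by position, while a mapping is matched by name, so the mapping can be built in a different order from dims.

```python
from collections import OrderedDict

import numpy as np
import xarray as xr

data = np.zeros((2, 3, 4), dtype=np.uint32)
dims = ["y", "x", "cat"]
ys = np.array([0.0, 1.0])
xs = np.array([0.0, 1.0, 2.0])
cats = np.array(["c0", "c1", "c2", "c3"])

# List form: coordinates are matched to dims by position.
by_position = xr.DataArray(data, dims=dims, coords=[ys, xs, cats])

# Mapping form: coordinates are matched to dims by name, so the mapping
# can be ordered x-then-y while dims stay y-then-x.
coords = OrderedDict([("x", xs), ("y", ys)])
coords["cat"] = cats
by_name = xr.DataArray(data, dims=dims, coords=coords)

print(by_position.dims == by_name.dims)
print(by_position.equals(by_name))
```

When both forms describe the same labels they produce equal arrays, so a mismatch would more likely come from code that still assumes positional ordering after the switch than from the names themselves.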

@jonmmease
Collaborator

This change (from list to OrderedDict) was to separately control the ordering of the coordinates and dimensions to match what we (@philippjfr and I) thought was a better convention (dimensions ordered y then x, coordinates ordered x then y). I think what's missing is that this logic wasn't added to the categorical branch of the shade function. I'll take a closer look.
