Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix algorithms "newton", "gradient_descen" and "proximal_grad" with cupy inputs #87

Merged
merged 6 commits into from
Oct 9, 2020
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions dask_glm/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ def gradient_descent(X, y, max_iter=100, tol=1e-14, family=Logistic, **kwargs):
stepSize = 1.0
recalcRate = 10
backtrackMult = firstBacktrackMult
beta = np.zeros(p)
beta = np.zeros_like(X._meta, shape=(p))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
beta = np.zeros_like(X._meta, shape=(p))
beta = np.zeros_like(X._meta, shape=p)

I think that we probably don't need the extra parens around p here and below


for k in range(max_iter):
# how necessary is this recalculation?
Expand Down Expand Up @@ -161,7 +161,7 @@ def newton(X, y, max_iter=50, tol=1e-8, family=Logistic, **kwargs):
"""
gradient, hessian = family.gradient, family.hessian
n, p = X.shape
beta = np.zeros(p) # always init to zeros?
beta = np.zeros_like(X._meta, shape=(p))
Xbeta = dot(X, beta)

iter_count = 0
Expand Down Expand Up @@ -387,7 +387,7 @@ def proximal_grad(X, y, regularizer='l1', lamduh=0.1, family=Logistic,
stepSize = 1.0
recalcRate = 10
backtrackMult = firstBacktrackMult
beta = np.zeros(p)
beta = np.zeros_like(X._meta, shape=(p))
regularizer = Regularizer.get(regularizer)

for k in range(max_iter):
Expand All @@ -406,8 +406,8 @@ def proximal_grad(X, y, regularizer='l1', lamduh=0.1, family=Logistic,
# Compute the step size
lf = func
for ii in range(100):
beta = regularizer.proximal_operator(obeta - stepSize * gradient, stepSize * lamduh)
step = obeta - beta
beta = regularizer.proximal_operator(- stepSize * gradient + obeta, stepSize * lamduh)
step = - beta + obeta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, why this change?

Copy link
Contributor Author

@daxiongshu daxiongshu Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When obeta is a cupy array, cupy's operators +/- can't handle a dask array as the second operands in those cases. After the changes, dask's operators of +/- will be used and it works with cupy array as the second operands.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could do obeta = da.from_array(obeta) before the computation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter either way. Either way is fine. Can we report this upstream to cupy? Presumably they need to return NotImplemented somwhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter either way. Either way is fine.

I know, just providing a more Dask-like solution.

Can we report this upstream to cupy? Presumably they need to return NotImplemented somwhere?

I don't know exactly what error we get, but would reporting it change anything? Would returning NotImplemented really make things much clearer to bother?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a Python object doesn't know what to do with another operand in a binary operator, it can return NotIimplemented (or maybe it's raise?) and then Python will try the other order (calling other.__rsub__(self))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mrocklin , I didn't know of that. I filed cupy/cupy#4118 to start the discussion on the CuPy side, I'm not sure but maybe there's a technical reason not to do that, so let's wait to hear from maintainers.

Xbeta = X.dot(beta)

Xbeta, beta = persist(Xbeta, beta)
Expand Down
4 changes: 2 additions & 2 deletions dask_glm/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ def normalize_inputs(X, y, *args, **kwargs):
raise ValueError('Multiple constant columns detected!')
mean[intercept_idx] = 0
std[intercept_idx] = 1
mean = mean if len(intercept_idx[0]) else np.zeros(mean.shape)
mean = mean if len(intercept_idx[0]) else np.zeros_like(X._meta, shape=mean.shape)
Xn = (X - mean) / std
out = algo(Xn, y, *args, **kwargs).copy()
i_adj = np.sum(out * mean / std)
Expand Down Expand Up @@ -140,7 +140,7 @@ def add_intercept(X):
raise NotImplementedError("Can not add intercept to array with "
"unknown chunk shape")
j, k = X.chunks
o = da.ones((X.shape[0], 1), chunks=(j, 1))
o = da.ones_like(X, shape=(X.shape[0], 1), chunks=(j, 1))
if is_dask_array_sparse(X):
o = o.map_blocks(sparse.COO)
# TODO: Needed this `.rechunk` for the solver to work
Expand Down