
Why make "push" an "accumulator" rather than a "replacer"? #61

Open
ljzzju opened this issue Oct 11, 2016 · 6 comments


ljzzju commented Oct 11, 2016

Hi, all,
It is so exciting to see this project!

Recently I have been working on very high-dimensional machine learning, and I have found that a parameter server is indispensable.

From briefly reading parts of the code, I wonder why the APIs "matrix.push" and "vector.push" add the given value to the corresponding element on the PS server rather than replacing it?

Generally, users may have several "update-like" needs, such as replace, accumulate, or other UDFs to refresh the parameter values on the PS server. What should I do to satisfy these needs, or do you have plans for them?

Thanks so much!


codlife commented Oct 11, 2016

Hi @ljzzju, I think your point is a valid one!

@rjagerman (Owner) commented

Thank you for your question; you raise a good point!
When we initially started this project, the goal was to develop more scalable inference for LDA in Spark via collapsed Gibbs sampling. In this approach, the update equations are of the form n_{wk} <- n_{wk} + 1 and n_{wk} <- n_{wk} - 1. Additionally, (stochastic) gradient descent uses a similar-looking update equation, w <- w + gradient. These are all very simple addition operators.
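To make the accumulator semantics concrete, here is a minimal sketch of what an additive "push" looks like on the server side. This is purely illustrative and not Glint's actual implementation; the object name, the in-memory map, and the method signatures are all assumptions for the sake of the example.

```scala
import scala.collection.mutable

// Illustrative sketch (NOT Glint's real code) of accumulator-style "push":
// the server merges each pushed delta with addition, so concurrent clients
// never silently overwrite each other's contributions.
object AccumulatorPush {
  // Hypothetical server-side state: one partition of a parameter vector,
  // with unseen keys defaulting to 0.0.
  private val values = mutable.Map[Long, Double]().withDefaultValue(0.0)

  // push adds each delta to the stored value (accumulate, never replace).
  def push(keys: Array[Long], deltas: Array[Double]): Unit =
    keys.zip(deltas).foreach { case (k, d) => values(k) = values(k) + d }

  def pull(keys: Array[Long]): Array[Double] = keys.map(values)
}
```

Under these semantics a collapsed Gibbs sampling step is just two pushes: `push(Array(42L), Array(1.0))` followed by `push(Array(42L), Array(-1.0))` leaves the count n_{wk} unchanged, and an SGD step pushes the gradient directly.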

Implementing the parameter server using addition was a conscious design decision because it has several advantages:

  1. It is appropriate for the problem at hand (collapsed Gibbs sampling)
  2. It is efficient to implement
  3. It prevents the need for complex locking mechanisms that are typical in most key-value stores

We have had some discussion in #28 about regularized gradient descent, which requires more sophisticated updates than plain addition. Additionally, I have proposed supporting a limited set of aggregators in #39 (this is easy to implement in the current code base and could be done in a few days). Supporting arbitrary UDFs is rather complicated, because it requires serializing a function and sending it over the network to different JVMs, as described in #55.

I hope this helps clarify why our current design only supports addition. Nevertheless, I understand the need for other operators and I'm very interested in improving Glint. At the very least, I think supporting several aggregation types (addition, maximum, minimum, replacement) is easy to implement and definitely worth the effort.
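The limited set of aggregation types mentioned above could be sketched as follows. The trait and object names here are hypothetical and not part of Glint's current API; they just show that a small, fixed set of merge functions avoids the serialization problems of arbitrary UDFs.

```scala
// Hypothetical sketch of a limited set of server-side aggregators
// (illustrative only, not an existing Glint API). The server would apply
// agg.merge(stored, pushed) for each pushed key.
sealed trait Aggregator {
  def merge(current: Double, pushed: Double): Double
}
case object Add     extends Aggregator { def merge(c: Double, p: Double): Double = c + p }
case object Max     extends Aggregator { def merge(c: Double, p: Double): Double = math.max(c, p) }
case object Min     extends Aggregator { def merge(c: Double, p: Double): Double = math.min(c, p) }
case object Replace extends Aggregator { def merge(c: Double, p: Double): Double = p }
```

Because the set is fixed and known to both client and server, choosing an aggregator only requires sending a small identifier over the wire rather than serialized code: `Add` reproduces today's accumulator behavior, while `Replace` gives the overwrite semantics asked about in this issue.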


batizty commented Mar 2, 2017

Firstly, thanks to @rjagerman for implementing Glint. I am very interested in it and will use it in my environment.

Then, about the operators: @rjagerman, could you please publish some documentation, such as a roadmap? I would like to know when it is coming, thanks. 👍

@rjagerman (Owner) commented

Hey, unfortunately I have had no time to work on Glint due to my everyday obligations as a PhD student. I will be working on parameter-server technology and Glint full-time for 4 months starting June 1st. I will publish a roadmap detailing the plans once we get closer to that date.


ljzzju commented Mar 16, 2017

That's great to hear! If needed, I would be glad to make some contributions to this project.


batizty commented Mar 17, 2017

Glad to get a response, @rjagerman.

I am also doing some performance testing using Glint with Spark; if it works well, I will share some performance data. Thanks.
