Use a more general framework for primal SVM #8
Regularized empirical risk minimization describes a lot of supervised learning, including many models outside of the linear predictor-based models you mentioned. I'm a little unsure what the abstract solver would do. What did you have in mind?
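For reference, the regularized ERM objective under discussion is conventionally written as

$$\min_{w}\; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, w^\top x_i\bigr) + \lambda\, \Omega(w),$$

where $L$ is the loss and $\Omega$ the regularizer: the hinge loss with $\Omega(w) = \tfrac{1}{2}\lVert w \rVert_2^2$ gives the linear SVM, the logistic loss gives logistic regression, and the squared loss gives (ridge) regression.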
Well, I mean Pegasos is only an instance of a stochastic gradient descent method (aka the Robbins-Monro algorithm). This is applicable to many objectives that can be seen as an (empirical) expectation, meaning a sum over loss functions (such as those mentioned above). So, I would expect an implementation of the Robbins-Monro algorithm (in JuliaOpt?) that optimises the objective using sampled partial gradients. Based on that, a regularised empirical risk minimisation (ERM) wrapper could be initialised with the data, the loss function, the regulariser (and their gradients), and a solver, which could be, e.g., stochastic gradient descent, gradient descent with line search, or another solver (perhaps restricted to convex and differentiable functions). Then, an SVM simply instantiates ERM with an l2 regularisation and the hinge loss. To leverage the specific form further, one could also think of a general dual formulation. What do you think? I assume that such a framework would make it easier to implement further supervised learning algorithms in a consistent way (LogReg, Lasso, linear regression, ... for two or multiple classes).
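A minimal sketch of what such a design could look like (all names here are hypothetical, not from any existing package): a generic SGD solver, an ERM problem type bundling data, loss, and regularizer, and an SVM defined as hinge loss plus l2 regularization.

```julia
# Hypothetical sketch of the design described above; none of these
# names come from an existing package.
using LinearAlgebra

# A regularized ERM problem: data plus gradients of loss and regularizer.
struct RegERM
    X::Matrix{Float64}     # n x d design matrix
    y::Vector{Float64}     # labels in {-1, +1}
    lossgrad::Function     # (w, x, y) -> gradient of the loss at one example
    reggrad::Function      # w -> gradient of the regularizer
    lambda::Float64        # regularization strength
end

# Generic Robbins-Monro / SGD solver: follow sampled partial gradients
# with a decaying step size.
function sgd(p::RegERM, w0::Vector{Float64}; iters::Int=10_000, eta0::Float64=1.0)
    w = copy(w0)
    n = size(p.X, 1)
    for t in 1:iters
        i = rand(1:n)
        g = p.lossgrad(w, p.X[i, :], p.y[i]) .+ p.lambda .* p.reggrad(w)
        w .-= (eta0 / t) .* g
    end
    return w
end

# An SVM is then just hinge loss plus l2 regularization.
hingegrad(w, x, y) = y * dot(w, x) < 1 ? -y .* x : zero(w)
l2grad(w) = w
svm(X, y, lambda) = RegERM(X, y, hingegrad, l2grad, lambda)
```

A LogReg or Lasso instance would differ only in the `lossgrad` and `reggrad` fields, which is the consistency argument made above.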
If you're still interested in a general SGD tool, I think it could be quite useful. I've done a little work on this in the past, but it's very difficult to get right in general: you need a lot of indirection to handle all possible cases. At some point, I'll put my most recent attempt online.
Yes. But I have to think about it a bit more carefully - hopefully next weekend.
Hi, I put my attempt online. You can find it here: https://github.com/BigCrunsh/RegERMs.jl.
I would suggest using some other techniques for learning primal/dual SVMs as well, like the modified Frank-Wolfe algorithm, etc. In our lab we are working on this as well, so it would be interesting to contribute!
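For readers unfamiliar with it, vanilla Frank-Wolfe only needs a linear minimization oracle over the feasible set; the "modified" variants add away or pairwise steps. A minimal sketch over the unit simplex (a feasible set that appears in some SVM dual formulations), assuming a user-supplied gradient `grad`:

```julia
# Minimal sketch of vanilla Frank-Wolfe over the unit simplex; `grad`
# is an assumed callback returning the gradient of the smooth objective.
function frank_wolfe(grad::Function, d::Int; iters::Int=1_000)
    x = fill(1.0 / d, d)          # start at the simplex barycenter
    for t in 1:iters
        g = grad(x)
        s = zeros(d)
        s[argmin(g)] = 1.0        # linear minimization oracle: best vertex
        gamma = 2.0 / (t + 2.0)   # classic step-size schedule
        x = (1 - gamma) .* x .+ gamma .* s
    end
    return x
end
```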
Having other solvers would be great. Please do submit PRs.
@jumutc: Does it make sense to consider this in an even more general framework, too? The work on the regularized empirical risk framework (https://github.com/JuliaStats/RegERMs.jl) has been dormant for some time. I am wondering if we should recommence the discussion and use the SVM as a kind of kick-off. @johnmyleswhite: What do you think? Too early? Not worth it?
Dear @BigCrunsh and @johnmyleswhite, I have noticed several very important issues with
In our lab we have implemented most of the above-mentioned "features" in
@jumutc: Yes, definitely. Here are some comments:
@jumutc: Does that sound reasonable to you?
@BigCrunsh: Yes, for sure. I was mainly working on different SGD- and RDA-based approaches, and there were several issues I encountered while implementing all of them in Julia. I would like to suggest the following:
@BigCrunsh and @johnmyleswhite: what do you think?
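As one concrete point of reference for the RDA-based approaches mentioned above, the l1-regularized dual averaging update (Xiao, 2010) has a simple closed form. A sketch, with all names hypothetical and `sample_grad` an assumed callback:

```julia
# Sketch of l1-regularized dual averaging (RDA, Xiao 2010).
# `sample_grad` is an assumed callback returning a stochastic
# gradient of the loss at w; gamma scales the step sizes.
function rda_l1(sample_grad::Function, d::Int, lambda::Float64;
                iters::Int=1_000, gamma::Float64=1.0)
    w = zeros(d)
    gbar = zeros(d)                          # running average of gradients
    for t in 1:iters
        g = sample_grad(w)
        gbar .= ((t - 1) .* gbar .+ g) ./ t
        # Closed-form proximal step with beta_t = gamma * sqrt(t):
        # coordinate-wise soft thresholding of the averaged gradient.
        scale = sqrt(t) / gamma
        @. w = -scale * sign(gbar) * max(abs(gbar) - lambda, 0.0)
    end
    return w
end
```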
I'm still trying to take a typing break, so this won't be as thorough as I'd like.

(1) I think all of this material needs to be unified over time. My preference has been to pick small problems and get them right, because it's really hard to get the big problems right from the start.

(2) I think the main challenge is to design interfaces that apply very generically and use multiple dispatch for specialization. For instance, you want something like a generic fallback with specialized methods selected by dispatch (sketched after this comment).

(3) There's potentially a high cost to abstraction if inlining fails, so jumping to abstractions at the start is very hard to get right while keeping up high performance.

FWIW, I think the Distributions package is our best current example of how to do things -- it does everything in terms of a standardized interface plus fallbacks that handle problems when there's no custom solution. I guess I tend to think of generic SGD + optimization problem as this fallback in machine learning: in most problems, you have additional structure you can exploit that a generic interface can't know about. Sorry for not having anything deeper to say.
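A hypothetical illustration of the interface pattern described in point (2), not the snippet originally posted: a generic fallback plus a dispatch-selected specialization.

```julia
# Hypothetical illustration of a generic fallback plus dispatch-based
# specialization; none of these types exist in SVM.jl.
abstract type Model end
struct SomeNewModel <: Model end
struct LinearSVM    <: Model end

# Generic fallback: any Model can be fit by a generic SGD-style solver.
fit(m::Model, X, y) = "generic solver for $(typeof(m))"

# Specialization: a model with exploitable structure gets its own method.
fit(m::LinearSVM, X, y) = "specialized primal SVM solver"

fit(SomeNewModel(), nothing, nothing)  # -> "generic solver for SomeNewModel"
fit(LinearSVM(), nothing, nothing)     # -> "specialized primal SVM solver"
```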
I agree with @johnmyleswhite: unification is an ongoing and expensive process. However, that shouldn't stop you/us from tackling the points you (@jumutc) mentioned. Specifically,
Looking forward to your contributions!
Hey guys,
I only had a quick look at your code. It looks like you are implementing a specific algorithm for the primal solution of the SVM. In general, the primal solution for SVMs is an instance of what I would call "regularized empirical risk minimization". Depending on the loss function and the regularizer, it also captures logistic regression and linear regression. There is also a generalization for the dual solution. Does it make sense to build a solver for this general type and then inherit from this structure?
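To make that concrete, under the hypothetical names from the sketch earlier in this thread, only the loss gradient changes between models:

```julia
# Swapping the loss gradient recovers different models under the same
# regularized ERM structure (assumes `RegERM`, `l2grad`, and
# `using LinearAlgebra` from the earlier hypothetical sketch).
logisticgrad(w, x, y) = (-y / (1 + exp(y * dot(w, x)))) .* x  # logistic regression
squaredgrad(w, x, y)  = (dot(w, x) - y) .* x                  # linear regression

logreg(X, y, lambda) = RegERM(X, y, logisticgrad, l2grad, lambda)
linreg(X, y, lambda) = RegERM(X, y, squaredgrad, l2grad, lambda)
```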