I've used the linear programming bit of COIN-OR - Clp and Cbc. They may not be as good as the commercial solvers, but they're still way better than the other open-source options. That is particularly impressive given that, AIUI, they are largely the work of one man, John Forrest.
For general optimisation, i've used NLopt. It was very easy to use, and has a pretty sensible set of algorithms.
One thing i've learnt is that comparing optimisation algorithms is really hard. One might be better at one problem, one at another. One pretty general axis of variation is problem size: some algorithms scale to orders of magnitude more variables (or constraints etc) than others.
In particular, i get the impression that a lot of the cutting-edge research is about being able to solve huge problems at all, or in hours instead of days. Perhaps those algorithms will be more useful for ML hyperparameter optimisation, but unfortunately, my needs are about solving much simpler problems in a fraction of a second. I get a lot more excited about Michael Powell's algorithms, which are cunning but simple, and work well on my problems.
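To make that concrete, here's a minimal sketch of the kind of NLopt call i mean, using Powell's BOBYQA (one of the derivative-free algorithms NLopt ships); the objective, bounds and tolerances are just toy placeholders:

```python
# Minimal sketch (toy objective): NLopt's LN_BOBYQA, one of Powell's
# derivative-free algorithms, minimising a simple quadratic.
import nlopt
import numpy as np

def objective(x, grad):
    # grad is unused: BOBYQA is derivative-free, so NLopt passes an
    # empty gradient array for LN_* algorithms.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

opt = nlopt.opt(nlopt.LN_BOBYQA, 2)      # 2 decision variables
opt.set_min_objective(objective)
opt.set_lower_bounds([-10.0, -10.0])
opt.set_upper_bounds([10.0, 10.0])
opt.set_xtol_rel(1e-6)

x_opt = opt.optimize(np.array([0.0, 0.0]))
print(x_opt, opt.last_optimum_value())
```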
> They may not be as good as the commercial solvers, but they're still way better than the other open-source options.
Agreed. CBC/Clp are the best open-source LP/MIP solvers out there. SCIP is better, but it's not fully free.
> For general optimisation, i've used NLopt. It was very easy to use, and has a pretty sensible set of algorithms.
I noticed the omission of IPOPT support in NLopt. IPOPT is one of the best large-scale nonconvex NLP solvers out there (if you have exact 2nd-order derivative information, typically from AD).
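For anyone who hasn't tried it, a minimal sketch of one way to feed IPOPT exact derivatives is to go through CasADi (my choice here, not something NLopt offers), which does the AD for you; the Rosenbrock objective and the constraint are just placeholders:

```python
# Sketch only: IPOPT driven through CasADi, which uses AD to supply the
# exact gradients and Hessians IPOPT wants. Requires casadi with an
# IPOPT-enabled build (the pip wheel bundles one).
import casadi as ca

x = ca.SX.sym('x', 2)
f = (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2   # Rosenbrock, as a stand-in
g = x[0] ** 2 + x[1] ** 2                              # one inequality constraint

nlp = {'x': x, 'f': f, 'g': g}
solver = ca.nlpsol('solver', 'ipopt', nlp)

sol = solver(x0=[0.0, 0.0], lbg=-ca.inf, ubg=1.5)      # constrain g <= 1.5
print(sol['x'])
```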
> One thing i've learnt is that comparing optimisation algorithms is really hard. One might be better at one problem, one at another. One pretty general axis of variation is problem size: some algorithms scale to orders of magnitude more variables (or constraints etc) than others.
Very true. The academic world has standardized on Dolan-Moré performance profiles for comparing optimizers on a standard corpus of problems, and those are the closest we have to quantifying general performance. But the no-free-lunch theorem applies. There's also a lot of randomness and chance involved, e.g. initialization points can drastically affect nonconvex solver performance on the same problem. So can little things like the ordering of constraints, or the addition of redundant non-binding constraints (!) for MIPs (this was demonstrated by Fischetti). That's why modern MIP solvers include a random perturbation parameter.
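The profile itself is simple to compute: for each solver, take the ratio of its time to the best time on each problem, then plot the fraction of problems it solves within a factor tau of the best. A rough sketch with made-up timings:

```python
# Rough sketch of a Dolan-More performance profile from a matrix of
# solve times: times[p][s] = time of solver s on problem p (made-up data).
import numpy as np
import matplotlib.pyplot as plt

times = np.array([[1.0, 1.2, 3.0],
                  [0.5, 2.0, 0.6],
                  [4.0, 3.5, 3.6]])                  # 3 problems x 3 solvers

ratios = times / times.min(axis=1, keepdims=True)    # r[p,s] = t[p,s] / best t on p

taus = np.linspace(1.0, ratios.max(), 200)
for s, name in enumerate(['solver A', 'solver B', 'solver C']):
    # rho_s(tau) = fraction of problems solved within a factor tau of the best
    rho = [(ratios[:, s] <= t).mean() for t in taus]
    plt.step(taus, rho, where='post', label=name)

plt.xlabel('tau (performance ratio)')
plt.ylabel('fraction of problems')
plt.legend()
plt.show()
```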
> ML hyperparameter optimisation
Most types of hyperparameter optimization are black-box optimization problems with no analytic derivative information, which much of the research in deterministic optimization does not address. Black-box problems typically rely on DFO (Derivative-Free Optimization) methods. DFO methods are even harder to quantify in terms of performance. The Nelder-Mead Simplex method is as good a method as any.
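As a baseline, a toy Nelder-Mead run looks like this (SciPy used purely for illustration, and the "black box" here is obviously fake):

```python
# Toy example: Nelder-Mead on a black-box objective via SciPy
# (no derivative information used anywhere).
import numpy as np
from scipy.optimize import minimize

def black_box(x):
    # stand-in for an expensive simulation with no analytic gradient
    return np.sin(3 * x[0]) + (x[0] - 0.3) ** 2 + (x[1] + 0.1) ** 2

res = minimize(black_box, x0=[1.0, 1.0], method='Nelder-Mead',
               options={'xatol': 1e-6, 'fatol': 1e-6})
print(res.x, res.fun)
```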
Interesting! My problems are derivative-free, and my gut feeling is that a lot of real-world engineering-type problems are, because at some point they involve some gnarly nonlinear maths. In my case, my variables are the positions of some knots from which i construct a monotonic cubic spline, which i then use as a parameter to another model; my objective function is how well that model reproduces some real measurements. I get a headache just writing it all down.
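To give a flavour, a heavily simplified sketch of that structure (everything here is made up: the knot grid, the "model" and the measurements are placeholders, and the monotonicity handling is cruder than what i actually do):

```python
# Heavily simplified sketch, with made-up model and data. Variables are
# knot heights; a monotone cubic (PCHIP) spline built from them is fed
# to a "model", and the objective is the misfit to measurements.
import numpy as np
from scipy.interpolate import PchipInterpolator
from scipy.optimize import minimize

knot_x = np.linspace(0.0, 1.0, 6)            # fixed knot positions (made up)
t_meas = np.linspace(0.0, 1.0, 50)
y_meas = np.sqrt(t_meas) + 0.01 * np.random.default_rng(0).normal(size=50)

def run_model(spline, t):
    # stand-in for the real (gnarly) model that takes the spline as a parameter
    return spline(t)

def objective(knot_y):
    knot_y = np.maximum.accumulate(knot_y)      # crude way to keep knot heights monotonic
    spline = PchipInterpolator(knot_x, knot_y)  # monotone cubic interpolant
    return np.sum((run_model(spline, t_meas) - y_meas) ** 2)

res = minimize(objective, x0=np.linspace(0.0, 1.0, 6), method='Nelder-Mead')
print(res.fun)
```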
I've had good results from Nelder-Mead simplex; i'm a bit sad to hear there isn't some magical algorithm that will give me a massive speedup over it.
I haven't tried automatic differentiation. That seems extremely daunting given the volume of code in my problem. Someone has done some groundwork on AD of the main library i use in my model, but it would still be a serious bit of work.
Optimal knot placement, though seemingly simple, can be a gnarly problem due to degeneracy in many cases. Anecdotally I've found quadratic splines to still be somewhat tractable with exact methods, but polynomial splines with cubic or higher orders can be tricky, especially when they are large-scale.
For derivative-free optimization, you can sometimes get speedups from just adopting an embarrassingly parallel algorithm like Particle Swarm. These types of stochastic algorithms aren't necessarily algorithmically better than Nelder-Mead Simplex, but they can explore a larger terrain in parallel, which may help. If you can find some parallel stochastic code, you should be able to spin up a VM on the cloud with lots of cores to help you search the space.
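A bare-bones illustration of the embarrassingly parallel bit (toy objective, untuned parameters, just to show where the process pool slots in):

```python
# Bare-bones particle swarm sketch: the per-particle objective evaluations
# within each iteration are independent, so they are farmed out to a
# process pool. Toy objective and untuned PSO parameters.
import numpy as np
from multiprocessing import Pool

def objective(x):
    return np.sum(x ** 2) + 10 * np.sum(np.sin(3 * x) ** 2)

def pso(dim=10, n_particles=64, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.full(n_particles, np.inf)
    gbest, gbest_f = None, np.inf

    with Pool() as pool:
        for _ in range(iters):
            f = np.array(pool.map(objective, x))       # parallel evaluations
            improved = f < pbest_f
            pbest[improved], pbest_f[improved] = x[improved], f[improved]
            if pbest_f.min() < gbest_f:
                gbest_f = pbest_f.min()
                gbest = pbest[pbest_f.argmin()].copy()
            r1, r2 = rng.random(x.shape), rng.random(x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = x + v
    return gbest, gbest_f

if __name__ == '__main__':
    best_x, best_f = pso()
    print(best_f)
```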
No, but I was a big-time user and really pushed it to its limits with my large-scale nonconvex problems. I never actually tinkered with its internals but I did feed it some very unorthodox formulations. :)
John Forrest is a hell of a character too if you ever talk to him in person. He's retired to a villa in Italy but still works on Clp, Cbc, and various consulting projects for fun: http://fastercoin.com/f2_files/about.html
I'm not working on linear programming stuff at the moment, but i am seriously thinking that if i ever get back to it, i'll ask my boss if we can rustle up some budget to hire him!
>One thing i've learnt is that comparing optimisation algorithms is really hard
It still boggles my mind how simplex, something that theoretically runs worse than interior point methods, is competitive (at least based on what I've read online). I guess with these problems, the particular instance has a huge effect on how long each approach takes.
>In particular, i get the impression that a lot of the cutting-edge research is about being able to solve huge problems at all
This is also the impression I get. When I look at benchmarks (btw, does anyone know of an up-to-date publication for these? A lot of what I find seems to be from years ago. I'm not sure how fast development of these solvers is, but it would be nice to be updated once in a while), they always include some measure of time along with the number of instances solved.
>Perhaps those algorithms will be more useful for ML hyperparameter optimisation
I thought about this, and maybe the reason they largely don't use these has to do with getting results that are good enough (generalize well). A global optimum might not be worth the effort, so they stick to some variant of gradient descent. The measure they look at, after all, is performance on the test set.
Aside from that, there may be something specific about a well-defined problem that they can use to speed computations up, which a more general approach probably can't assume for other instances of NLP, for example.
> It still boggles my mind how simplex, something that theoretically runs worse that interior point methods is competitive (at least based on what I've read online). I guess with these problems, the instances of the problem has a huge effect on how long approaches take.
Well, theoretical worst cases are just that: worst-case bounds. People think NP-hard problems are intractable, but this is a misunderstanding. Many NP-hard problems can actually be solved with fairly good average-case performance.
Case in point: the Simplex method is worst-case exponential time, but its average case performance is actually pretty good.
When Khachiyan first came up with his ellipsoid method, it was polynomial time but in practice it was too slow. Karmarkar's interior point algorithm was polynomial time too, but performed much more efficiently in practice.
These days though, solve times are predominantly affected by solver heuristics and randomness. The choice of Simplex vs Interior Point does not make a significant difference in many cases.
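An easy way to see this for yourself: HiGHS (wrapped by SciPy these days) exposes both a dual simplex and an interior point method, so you can run the same LP through both. Random data, so treat this as a sketch rather than a benchmark:

```python
# Sketch, not a benchmark: the same random LP solved with HiGHS dual
# simplex and HiGHS interior point through SciPy's linprog wrapper.
import time
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 200, 400
A_ub = rng.uniform(0, 1, size=(m, n))
b_ub = rng.uniform(n / 4, n / 2, size=m)
c = -rng.uniform(0, 1, size=n)             # maximise a random objective

for method in ('highs-ds', 'highs-ipm'):    # dual simplex vs interior point
    t0 = time.perf_counter()
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1), method=method)
    print(method, res.fun, f'{time.perf_counter() - t0:.3f}s')
```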