I love OCaml but have ditched it for Erlang/Elixir, losing a lot of elegance along the way but gaining massively in horizontal scalability and pragmatism. I would love to stay with OCaml, but there just isn't a clear enough roadmap for multicore (which seems fraught with controversy within the community), or even for multi-node distributed computing that does not require me to reverse-engineer someone's ten-year-old, badly documented thesis project. I almost feel the OCaml project is stuck optimizing for a single-core past, and that it might be better to devise a new OCaml-based language altogether, designed from the ground up for distributed computing. I'd be there immediately.
Is there any kind of timeline for when multicore will actually land, though? The last thing I saw was some posts from last year saying "if everything goes well, it will be in 4.03". As of now, I haven't found anything about it being in 4.03, or anything else about a timeline to implementation. It would be great if it were coming soon, but I'm pretty sure there have been rumblings about it for years now, with nothing more solid than "Coming Soon(tm)".
I've been wanting to ask one of you about certified compilation for OCaml. SML has FLINT and CakeML. I know Leroy et al. were working on a Mini-ML compiler. Are there any results yet on a certifying compiler for OCaml, though? Has anyone made a lot of progress? Or can it be converted to equivalent ML that can go through something like CakeML?
The last high-assurance work I saw done with OCaml was Esterel's SCADE generator, whose object code was certified by hand. They praised the compiler for how much work they avoided with minimal modifications. They could conceivably have gotten more done if that had been automated, though.
The runtime aspects of the multicore work have been pretty stable. Most of the current effort is going into the algebraic-effects extension, which is used to map direct-style concurrency onto multiple (parallel) cores: https://github.com/ocamllabs/ocaml-effects
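To give a flavor of what "direct-style concurrency via effects" means, here is a toy cooperative scheduler written with effect handlers. This is a sketch using the `Effect`/`Effect.Deep` API that eventually shipped in OCaml 5; the ocaml-effects branch linked above used an earlier `effect` keyword syntax, so treat the exact spelling as an assumption:

```ocaml
open Effect
open Effect.Deep

(* A fiber gives up the CPU by performing the Yield effect. *)
type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* Run a list of fibers round-robin, switching at each [yield]. *)
let run fibers =
  let q = Queue.create () in
  let rec dequeue () =
    match Queue.take_opt q with
    | None -> ()          (* nothing runnable: done *)
    | Some f -> f ()
  and spawn f =
    match_with f ()
      { retc = (fun () -> dequeue ());   (* fiber finished: run the next one *)
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
              Some (fun (k : (a, unit) continuation) ->
                (* park the suspended fiber, resume another *)
                Queue.push (fun () -> continue k ()) q;
                dequeue ())
          | _ -> None) }
  in
  List.iter (fun f -> Queue.push (fun () -> spawn f) q) fibers;
  dequeue ()

let () =
  run
    [ (fun () -> print_endline "a1"; yield (); print_endline "a2");
      (fun () -> print_endline "b1"; yield (); print_endline "b2") ]
```

The appeal is that the scheduler is ordinary library code, so the same direct-style programs can later be spread across parallel domains without rewriting them in monadic style.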
I understand you are writing a server. To be fair, not all compute loads are embarrassingly parallel. For those tasks, single-thread performance, especially from the point of view of memory bandwidth, is critical.
Single core is not 'the past'. 1. CPUs are not getting much faster each generation (one could argue this makes parallelization critical, but...). 2. The average desktop does not have that many cores. One reason, I think, is that in a typical desktop program the individual tasks that must be performed sequentially are so tiny, or so hard to split, that the platform overhead and complexity of multiple threads make the benefits of actually utilizing them harder to reach than just using a language with a nice parallelization story. By 'benefits' I mean making the program actually faster for the user.
Yep - my use case analyzes incoming streaming data using compute-intensive algorithms. I need very high single-core performance, for which Erlang is unsuitable, so I'm using calls to numpy. But then I need to distribute data widely in heterogeneous, changeable ways, and for that, Erlang wins hands down.
Nothing fancy - I message-pass via zeromq to a cluster running python/numpy instances. It's pretty coarse granularity - the (BFGS curve-fitting) optimizations I need from numpy usually take between 0.5 and 3 seconds, so I'm not suffering too much on serialization time by comparison. The optimizations involved do not need to run in real time against the very latest streaming data; they just need to be as recent as possible, which makes this solution quite workable.
I'm not an expert in either language, but have you also given F# a thought? The compiler is open source and should run under Mono. I do know it's heavily influenced by OCaml.
I really need Linux to be a first-class citizen (yes, I see MS has bought Xamarin, but still). Also, last I looked, F# wasn't looking any better than OCaml on multi-node distributed computing (please do correct me if I'm wrong).
Xamarin is a C# shop; Mono as a platform for F# is independent of this, and if I'm not mistaken the F# compiler is open source[0] under the Apache 2.0 License. There are a few shops who have praised F# for its multicore capabilities[1], but again I am not a full expert, so it may require independent research. MonoDevelop supports F# as well. There are Linux build instructions in the GitHub repository.
A lot of the things I loved in Ocaml, I've found in Rust. Additionally, Rust has great multicore support. You won't find very mature multi-node libraries in Rust yet, though.
This is not clear. Multicore and distributed computing are unrelated at best.
To make it clearer, can you explain what you are doing in Erlang that you couldn't do in OCaml?
I am message-passing thousands of financial ticks per second to hundreds of terminal-user consumers, each of whom may preselect filters to be applied to each series, or to combinations of series. This is trivially easy to organize in Erlang/OTP, it is trivially easy to scale, and I have no clue how I would organize it in OCaml without going back to message-passing first principles, in which case I would have to build the entire load-balancing, PID, and security infrastructure from scratch. Moreover, I do not see how you do not see that multicore and multi-node are related. On Erlang, having two 8-core boxes is almost the same as having one 16-core box. Again, if there is something I am missing about the OCaml environment, I would be happy to know about it.
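To make the gap concrete: the per-consumer filtering and fan-out itself is trivial to express in any language. A toy in-process sketch in plain OCaml (hypothetical types, no networking, no supervision) might look like this:

```ocaml
(* One incoming market-data tick. *)
type tick = { symbol : string; price : float }

type consumer = {
  id : int;
  wants : tick -> bool;        (* the consumer's preselected filter *)
  mutable inbox : tick list;   (* stand-in for a real delivery channel *)
}

(* Deliver a tick to every consumer whose filter matches it. *)
let publish consumers (t : tick) =
  List.iter (fun c -> if c.wants t then c.inbox <- t :: c.inbox) consumers

let () =
  let a = { id = 1; wants = (fun t -> t.symbol = "AAPL"); inbox = [] } in
  let b = { id = 2; wants = (fun t -> t.price > 100.0); inbox = [] } in
  publish [ a; b ] { symbol = "AAPL"; price = 95.0 };
  Printf.printf "consumer %d got %d ticks\n" a.id (List.length a.inbox)
```

Everything this sketch leaves out (process registry, node discovery, load balancing, restarts on failure, distribution across boxes) is exactly what OTP supplies out of the box and what would have to be rebuilt from first principles in OCaml.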
It sounds like the real benefit for you is in OTP rather than anything intrinsic to Erlang's multicore implementation.
> not see that multicore and multi-node are related
There is a huge distinction in implementation, which Erlang/OTP abstracts over for you somewhat (but not entirely...). By optimizing for the distributed (multi-node) scenario, you decouple that work from the language implementation, and multiple processes per CPU benefit just as well as a cluster does. A distributed cluster does not benefit from a multicore implementation. Given the baggage most languages carry from a global lock (OCaml included), optimizing for single-threaded performance seems like the correct decision here.
With Erlang, distribution is accomplished with the epmd registry; something similar could be done with OCaml as well. As you noted, there is no OTP-like system for OCaml, but the absence of one does not hinge on any multicore implementation. Single-threaded OCaml processes handle concurrency just fine, and usually faster than Erlang, to boot.
Personally, if the OCaml multicore work ends up slowing down the core runtime appreciably, I'm going to be upset.
> Personally, if the OCaml multicore work ends up slowing down the core runtime appreciably, I'm going to be upset
That's what I meant in my original post about multicore seeming controversial. That's part of the reason why, for me, a super-speedy single-core OCaml coupled with an industrial-strength, OTP-style distributed computing framework would be awesome. I'm actually not that excited about multicore. My use case is lots of streaming data coming in, needing to be parsed, with lots of heavy-lifting stats work done in real time, then distributed to clients. So it needs both very high performance on a single thread (currently using GPU-accelerated calls to numpy), and then the ability to distribute heterogeneous data very efficiently to lots of endpoints. I really would love to be able to do it all in OCaml.
You may be interested in Cloud Haskell. It's basically Erlang networking nicely re-implemented in Haskell, with the strong static typing you expect from Haskell. It gets you some of the advantages of both Erlang and OCaml.
For what it's worth, "Cloud Haskell", the project, corresponds to the distributed-process library, which may be easier to search for than "Cloud Haskell". The API is described in chapter 14 of Parallel and Concurrent Programming in Haskell, by Simon Marlow. (I haven't been paying close attention to the distributed-process library, so I don't know whether that chapter is up to date with recent versions of the library.)
Similarly, Scala with Akka gives you an ML-like language with first-class Linux support, excellent multicore performance, and Erlang-style distribution.
Regarding the multicore thing, it has been answered already: if you have to run a distributed service on an 8-core machine, just run 8 binaries. And if you are using threads to write more elegant synchronous-looking code as a replacement for event-driven asynchronous I/O, you do not need a concurrent runtime for that.
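The "just run 8 binaries" approach can be sketched in a few lines: fork one worker process per core and let the OS scheduler spread them out. This assumes the `unix` library is linked in, and the core count here is a hard-coded assumption rather than detected:

```ocaml
(* One OS process per core instead of a multicore runtime. *)
let ncores = 8  (* assumption: an 8-core box, as in the comment above *)

let worker i =
  (* each forked process runs independently on whatever core the OS picks *)
  Printf.printf "worker %d running in pid %d\n%!" i (Unix.getpid ())

let () =
  List.init ncores (fun i ->
      match Unix.fork () with
      | 0 -> worker i; exit 0   (* child: do the work, then quit *)
      | pid -> pid)             (* parent: remember the child pid *)
  |> List.iter (fun pid -> ignore (Unix.waitpid [] pid))
```

In a real service each worker would listen on its own port (or share a socket via `SO_REUSEPORT`), which is exactly the "treat each core like a separate machine" model discussed below in the thread.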
Now, regarding the availability of libraries for building distributed systems in OCaml, you are totally right. For years it seems everybody was happy with MPI, then we had some sort of map-reduce framework, and that's it as far as I'm aware.
I would expect the growing popularity of MirageOS to improve this situation, though.
No, the point is that if you're going to do distributed computing then multicore is a waste of time - you might as well just run one process on each core and treat each core like a separate machine (where some machines happen to have a particularly low-latency/high-bandwidth network connection). You still need a good distributed runtime, but if you believe distributed is the future then multicore is a dead end.
I wonder what kind of impact this will have on the performance of Mirage. It seems to me that whole-program optimization is potentially a great benefit for a unikernel written almost entirely in a high-level language like OCaml.
To be clear, I don't believe flambda is providing anything on the scale of whole program optimization; it inlines more aggressively, but not the entire program.
That is not exactly correct: it also does cross-module inlining. Now, suppose you tweak your parameters a bit to inline ... liberally. It's going to inline all the Mirage functors directly into your main, which enables various things.
Of course, that's going to take a bit of time to compile.
Also, IIRC, all the ingredients are in place for link-time optimizations, but they were not developed in time for OCaml 4.03.
The size of Core executables is mostly addressed by module aliases. Unfortunately the public release of Core still uses packing instead of module aliases because oasis/ocamlbuild don't easily support them.
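For readers unfamiliar with the technique, the module-alias scheme replaces `-pack` with a thin "namespace" module containing only aliases. This is a layout fragment, not a runnable program, and the file names are hypothetical:

```ocaml
(* file: mylib.ml -- compiled with -no-alias-deps, contains only aliases *)
module List_util = Mylib_list_util  (* alias, not a copy: no code pulled in *)
module Map_util  = Mylib_map_util

(* Client code writes Mylib.List_util.frobnicate. Because these are
   aliases, the linker only drags in the submodules a client actually
   references, whereas -pack links the entire pack into every
   executable, which is why Core binaries balloon without it. *)
```

The catch, as noted above, is tooling: the scheme needs build-system support for the `-no-alias-deps` flag and the prefixed file names, which oasis/ocamlbuild did not make easy at the time.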
Will we also get dead-code elimination in the compiler generally speaking? I remember a mailing-list post where one of the flambda devs announced he had managed to generate a standalone hello world of 43k, but that was just a PoC.