I love OCaml but have ditched it for Erlang/Elixir, losing a lot of elegance along the way but gaining massively in horizontal scalability and pragmatism. I would love to stay with OCaml, but there just isn't a clear enough roadmap for multicore (which seems fraught with controversy within the community), or even for multi-node distributed computing that does not require me to reverse-engineer someone's ten-year-old, badly documented thesis project. I almost feel the OCaml project is stuck optimizing for a single-core past, and that it might be better to devise a new OCaml-based language altogether, designed from the ground up for distributed computing. I'd be there immediately.
Is there any kind of timeline for when multicore will actually land, though? The last thing I saw was some posts from last year saying "if everything goes well, it will be in 4.03". As of now, I haven't found anything about it being in 4.03, or anything else about a timeline to implementation. It would be great if it were coming soon, but I'm pretty sure there have been rumblings about it for years now, with nothing more solid than "Coming Soon(tm)".
I've been wanting to ask one of you about certified compilation for OCaml. SML has FLINT and CakeML. I know Leroy et al. were working on a Mini-ML compiler. Are there any results yet on a certifying compiler for OCaml, though? Has anyone made a lot of progress? Or can it be converted to equivalent ML that can go through something like CakeML?
The last high-assurance work I saw done with OCaml was Esterel's SCADE generator, whose object code was certified by hand. They praised the compiler for how much work they avoided with minimal modifications. They could conceivably have gotten more done if that had been automated, though.
The runtime aspects of the multicore work have been pretty stable. Most of the current effort is going into the algebraic-effects extension, which is used to map direct-style concurrency onto multiple (parallel) cores: https://github.com/ocamllabs/ocaml-effects
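To give a flavor of what "direct-style concurrency via effects" means, here is a toy cooperative scheduler written with effect handlers. This is a sketch using the `Effect`/`Effect.Deep` API that eventually shipped in OCaml 5; the ocaml-effects branch linked above used an earlier `effect` keyword syntax, so treat the exact spelling as an assumption:

```ocaml
open Effect
open Effect.Deep

(* A fiber gives up the CPU by performing the Yield effect. *)
type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* Run a list of fibers round-robin, switching at each [yield]. *)
let run fibers =
  let q = Queue.create () in
  let rec dequeue () =
    match Queue.take_opt q with
    | None -> ()          (* nothing runnable: done *)
    | Some f -> f ()
  and spawn f =
    match_with f ()
      { retc = (fun () -> dequeue ());   (* fiber finished: run the next one *)
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
              Some (fun (k : (a, unit) continuation) ->
                (* park the suspended fiber, resume another *)
                Queue.push (fun () -> continue k ()) q;
                dequeue ())
          | _ -> None) }
  in
  List.iter (fun f -> Queue.push (fun () -> spawn f) q) fibers;
  dequeue ()

let () =
  run
    [ (fun () -> print_endline "a1"; yield (); print_endline "a2");
      (fun () -> print_endline "b1"; yield (); print_endline "b2") ]
```

The appeal is that the scheduler is ordinary library code, so the same direct-style programs can later be spread across parallel domains without rewriting them in monadic style.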
I understand you are writing a server. To be fair, not all compute loads are embarrassingly parallel. For those tasks, single-thread performance, especially from the point of view of memory bandwidth, is critical.
Single core is not 'the past'. 1. CPUs are not getting much faster each generation (one could argue this makes parallelization critical, but...). 2. The average desktop does not have that many cores. One reason, I think, is that in a typical desktop program the individual tasks that must be performed sequentially are so tiny, or so hard to split, that the platform overhead and complexity of multiple threads make the benefits of actually utilizing them harder to reach than just using a language with a nice parallelization story. By 'benefits' I mean making the program actually faster for the user.
Yep - my use case analyzes incoming streaming data using compute-intensive algorithms. I need very high single-core performance, for which Erlang is unsuitable, so I'm using calls to numpy. But then I need to distribute data widely in heterogeneous, changeable ways, and for that, Erlang wins hands down.
Nothing fancy - I message-pass via zeromq to a cluster running python/numpy instances. It's pretty coarse granularity - the (BFGS curve-fitting) optimizations I need from numpy usually take between 0.5 and 3 seconds, so I'm not suffering too much on serialization time by comparison. The optimizations involved do not need to run in real time against the very latest streaming data; they just need to be as recent as possible, which makes this solution quite workable.
I'm not an expert in either language, but have you also given F# a thought? The compiler is open source and should run under Mono. I do know it's heavily influenced by OCaml.
I really need Linux to be a first-class citizen (yes, I see MS has bought Xamarin, but still). Also, last I looked, F# wasn't looking any better than OCaml on multi-node distributed computing (please do correct me if I'm wrong).
Xamarin is a C# shop; Mono as a platform for F# is independent of this, and if I'm not mistaken the F# compiler is open source[0] under the Apache 2.0 License. There are a few shops who have praised F# for its multicore capabilities[1], but again I am not a full expert, so it may require independent research. MonoDevelop supports F# as well. There are Linux build instructions in the GitHub repository.
A lot of the things I loved in Ocaml, I've found in Rust. Additionally, Rust has great multicore support. You won't find very mature multi-node libraries in Rust yet, though.
This is not clear. Multicore and distributed computing are unrelated at best.
To make it clearer, can you explain what you are doing in Erlang that you couldn't do in OCaml?
I am message-passing thousands of financial ticks per second to hundreds of terminal-user consumers, each of whom may preselect filters to be applied to each series, or to combinations of series. This is trivially easy to organize in Erlang/OTP, it is trivially easy to scale, and I have no clue how I would organize it in OCaml without going back to message-passing first principles, in which case I would have to build the entire load-balancing, PID, and security infrastructure from scratch. Moreover, I do not see how you do not see that multicore and multi-node are related. On Erlang, having two 8-core boxes is almost the same as having one 16-core box. Again, if there is something I am missing about the OCaml environment, I would be happy to know about it.
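To make the gap concrete: the per-consumer filtering and fan-out itself is trivial to express in any language. A toy in-process sketch in plain OCaml (hypothetical types, no networking, no supervision) might look like this:

```ocaml
(* One incoming market-data tick. *)
type tick = { symbol : string; price : float }

type consumer = {
  id : int;
  wants : tick -> bool;        (* the consumer's preselected filter *)
  mutable inbox : tick list;   (* stand-in for a real delivery channel *)
}

(* Deliver a tick to every consumer whose filter matches it. *)
let publish consumers (t : tick) =
  List.iter (fun c -> if c.wants t then c.inbox <- t :: c.inbox) consumers

let () =
  let a = { id = 1; wants = (fun t -> t.symbol = "AAPL"); inbox = [] } in
  let b = { id = 2; wants = (fun t -> t.price > 100.0); inbox = [] } in
  publish [ a; b ] { symbol = "AAPL"; price = 95.0 };
  Printf.printf "consumer %d got %d ticks\n" a.id (List.length a.inbox)
```

Everything this sketch leaves out (process registry, node discovery, load balancing, restarts on failure, distribution across boxes) is exactly what OTP supplies out of the box and what would have to be rebuilt from first principles in OCaml.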
It sounds like the real benefit for you is in OTP rather than anything intrinsic to Erlang's multicore implementation.
> not see that multicore and multi-node are related
There is a huge distinction in implementation, which Erlang/OTP abstracts over for you somewhat (but not entirely...). By optimizing for the distributed (multi-node) scenario, you decouple that work from the language implementation, and multiple processes per CPU benefit just as well as a cluster does. A distributed cluster does not benefit from a multicore implementation. Given the baggage most languages carry from a global lock (OCaml included), optimizing for single-threaded performance seems like the correct decision here.
With Erlang, distribution is accomplished with the epmd registry; something similar could be done with OCaml as well. As you noted, there is no OTP-like system for OCaml, but the absence of one does not hinge on any multicore implementation. Single-threaded OCaml processes handle concurrency just fine, and usually faster than Erlang, to boot.
Personally, if the OCaml multicore work ends up slowing down the core runtime appreciably, I'm going to be upset.
> Personally, if the OCaml multicore work ends up slowing down the core runtime appreciably, I'm going to be upset
That's what I meant in my original post about multicore seeming controversial. That's part of the reason why, for me, a super-speedy single-core OCaml coupled with an industrial-strength, OTP-style distributed computing framework would be awesome. I'm actually not that excited about multicore. My use case is lots of streaming data coming in, needing to be parsed, with lots of heavy-lifting stats work done in real time, then distributed to clients. So it needs both very high performance on a single thread (currently using GPU-accelerated calls to numpy), and then the ability to distribute heterogeneous data very efficiently to lots of endpoints. I really would love to be able to do it all in OCaml.
You may be interested in Cloud Haskell. It's basically Erlang networking nicely re-implemented in Haskell, with the strong static typing you expect from Haskell. It gets you some of the advantages of both Erlang and OCaml.
For what it's worth, "Cloud Haskell", the project, corresponds to the distributed-process library, which may be easier to search for than "Cloud Haskell". The API is described in chapter 14 of Parallel and Concurrent Programming in Haskell, by Simon Marlow. (I haven't been paying close attention to the distributed-process library, so I don't know whether that chapter is up to date with recent versions of the library.)
Similarly, Scala with Akka gives you an ML-like language with first-class Linux support, excellent multicore performance, and Erlang-style distribution.
Regarding the multicore thing, it has been answered already: if you have to run a distributed service on an 8-core machine, just run 8 binaries. And if you are using threads to write more elegant synchronous-looking code as a replacement for event-driven asynchronous I/O, you do not need a concurrent runtime for that.
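The "just run 8 binaries" approach can be sketched in a few lines: fork one worker process per core and let the OS scheduler spread them out. This assumes the `unix` library is linked in, and the core count here is a hard-coded assumption rather than detected:

```ocaml
(* One OS process per core instead of a multicore runtime. *)
let ncores = 8  (* assumption: an 8-core box, as in the comment above *)

let worker i =
  (* each forked process runs independently on whatever core the OS picks *)
  Printf.printf "worker %d running in pid %d\n%!" i (Unix.getpid ())

let () =
  List.init ncores (fun i ->
      match Unix.fork () with
      | 0 -> worker i; exit 0   (* child: do the work, then quit *)
      | pid -> pid)             (* parent: remember the child pid *)
  |> List.iter (fun pid -> ignore (Unix.waitpid [] pid))
```

In a real service each worker would listen on its own port (or share a socket via `SO_REUSEPORT`), which is exactly the "treat each core like a separate machine" model discussed below in the thread.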
Now, regarding the availability of libraries for building distributed systems in OCaml, you are totally right. For years it seems everybody was happy with MPI, then we had some sort of map-reduce framework, and that's it as far as I'm aware.
I would expect the growing popularity of MirageOS to improve this situation, though.
No, the point is that if you're going to do distributed computing then multicore is a waste of time - you might as well just run one process on each core and treat each core like a separate machine (where some machines happen to have a particularly low-latency/high-bandwidth network connection). You still need a good distributed runtime, but if you believe distributed is the future then multicore is a dead end.
I wonder what kind of impact this will have on the performance of Mirage. It seems to me that whole-program optimization is potentially a great benefit for a unikernel written almost entirely in a high-level language like OCaml.
To be clear, I don't believe flambda is providing anything on the scale of whole program optimization; it inlines more aggressively, but not the entire program.
That is not exactly correct: it also does cross-module inlining. Now, suppose you tweak your parameters a bit to inline ... liberally. It's going to inline all the Mirage functors directly into your main, which enables various things.
Of course, that's going to take a bit of time to compile.
Also, IIRC, all the ingredients are in place for link-time optimizations, but they were not developed in time for OCaml 4.03.
The size of Core executables is mostly addressed by module aliases. Unfortunately the public release of Core still uses packing instead of module aliases because oasis/ocamlbuild don't easily support them.
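For readers unfamiliar with the technique, the module-alias scheme replaces `-pack` with a thin "namespace" module containing only aliases. This is a layout fragment, not a runnable program, and the file names are hypothetical:

```ocaml
(* file: mylib.ml -- compiled with -no-alias-deps, contains only aliases *)
module List_util = Mylib_list_util  (* alias, not a copy: no code pulled in *)
module Map_util  = Mylib_map_util

(* Client code writes Mylib.List_util.frobnicate. Because these are
   aliases, the linker only drags in the submodules a client actually
   references, whereas -pack links the entire pack into every
   executable, which is why Core binaries balloon without it. *)
```

The catch, as noted above, is tooling: the scheme needs build-system support for the `-no-alias-deps` flag and the prefixed file names, which oasis/ocamlbuild did not make easy at the time.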
Will we also get dead-code elimination in the compiler generally speaking? I remember a mailing-list post where one of the flambda devs announced he had managed to generate a standalone hello world of 43k, but that was just a PoC.