In my experience, kernel threads have a cost that is sufficiently far from that ...

pcwalton · on Aug 6, 2015

Since you're talking about scaling to millions of threads, I think what you actually want is stackless coroutines rather than M:N threading with separate user-level stacks. If you have 1M threads, even a single 4 KB page of stack for each adds up to about 4 GB of memory, and that's assuming no fragmentation or delayed reclamation from GC. Stacks, even relocatable ones, are too heavyweight for that kind of extreme concurrent load. With a stackless coroutine model, it's easier to reason about how much memory you're using per request; with a stack model, usage is extremely dynamic, and compilers will readily sacrifice stack space for optimization behind your back (consider e.g. LICM, loop-invariant code motion).
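To put rough numbers on that, here is a minimal Rust sketch of the arithmetic. The 4096-byte page size is an assumption (it is typical but platform-dependent), and `RequestState` and `total_bytes` are hypothetical names made up for illustration: the enum stands in for the explicit per-request state a stackless coroutine would keep instead of a stack.

```rust
use std::mem::size_of;

/// Assumed page size in bytes; typical on x86-64, but platform-dependent.
const PAGE_SIZE: u64 = 4096;

/// Hypothetical per-request state for a stackless coroutine: only the
/// values that must survive across a suspension point are stored.
#[allow(dead_code)]
enum RequestState {
    ReadingHeader { bytes_read: usize },
    ReadingBody { content_length: usize, bytes_read: usize },
    Writing { bytes_written: usize },
    Done,
}

/// Total memory for `tasks` concurrent tasks at `bytes_per_task` each.
fn total_bytes(tasks: u64, bytes_per_task: u64) -> u64 {
    tasks * bytes_per_task
}

fn main() {
    let tasks: u64 = 1_000_000;
    let stack_total = total_bytes(tasks, PAGE_SIZE);
    let state_total = total_bytes(tasks, size_of::<RequestState>() as u64);
    println!("one page of stack each: {} bytes", stack_total);
    println!("one RequestState each:  {} bytes", state_total);
}
```

The exact size of `RequestState` depends on the target, but it is a few machine words rather than a page, and it is fixed at compile time, which is what makes the per-request cost easy to reason about.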

Stackless coroutines are great--you can get to nginx levels of performance with them--but they aren't M:N threading as seen in Golang. Once you have a stack, as Erlang and Go do, you've already paid a large portion of the cost of 1:1 threading.

Scramblejams · on Aug 6, 2015

Thanks for the tip; stackless coroutines are new to me. Is any of that on the Rust roadmap?

Scramblejams · on Aug 6, 2015

Am I correct that coroutines are intrinsically non-preemptive? If so, I'll need to keep looking.

ansible · on Aug 6, 2015

Am I correct that coroutines are intrinsically non-preemptive? If so, I'll need to keep looking.

That's usually the case. Coroutines have their uses, but having used goroutines, that is my current preference.

pcwalton · on Aug 6, 2015

Coroutines are preemptible at I/O boundaries or manual synchronization points. Those synchronization points could be inserted by the compiler, but if you do that you're back in goroutine land, which typically isn't better than 1:1. In particular, it seems quite difficult to achieve scalability to millions of threads with "true" preemption, which requires either stacks or an aggressive CPS (continuation-passing style) transformation.
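As a rough illustration of "preemptible only at synchronization points", here is a minimal hand-written sketch in Rust. Everything in it (`Step`, `Counter`, `run_round_robin`) is a hypothetical name invented for this example, not an API from Rust or any library. Each coroutine is a plain struct, and the scheduler can only switch tasks when `resume` returns.

```rust
/// What a coroutine reports each time it is resumed.
#[derive(PartialEq)]
enum Step {
    Yielded,
    Finished,
}

/// Hypothetical stackless coroutine: counts up to `limit`, yielding after
/// each step. Its entire state lives in these fields; there is no stack.
struct Counter {
    id: u32,
    next: u32,
    limit: u32,
}

impl Counter {
    /// Runs until the next manual yield point. Nothing can interrupt the
    /// coroutine between entry to and return from this call.
    fn resume(&mut self, log: &mut Vec<(u32, u32)>) -> Step {
        if self.next >= self.limit {
            return Step::Finished;
        }
        log.push((self.id, self.next));
        self.next += 1;
        Step::Yielded
    }
}

/// Cooperative round-robin scheduler: resumes each task in turn and drops
/// the ones that have finished.
fn run_round_robin(mut tasks: Vec<Counter>) -> Vec<(u32, u32)> {
    let mut log = Vec::new();
    while !tasks.is_empty() {
        tasks.retain_mut(|task| task.resume(&mut log) == Step::Yielded);
    }
    log
}

fn main() {
    let tasks = vec![
        Counter { id: 0, next: 0, limit: 2 },
        Counter { id: 1, next: 0, limit: 1 },
    ];
    for (id, value) in run_round_robin(tasks) {
        println!("task {} ran step {}", id, value);
    }
}
```

If `resume` looped forever, no other task would ever run, which is the non-preemptive behavior asked about above. Having the compiler insert extra yield points is the step that moves this toward the goroutine model.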