That's not true. Well it's true in a very narrow technical sense, but it's not really true.
For example, the amount of housekeeping Python does in order to execute a function call is staggering. That housekeeping buys all sorts of nice functionality, but it is far from free (C++ and D get by almost entirely without it: either no housekeeping at all, or a single level of indirection).
Python goes through so many indirections for a single function call that it hardly even makes sense to talk about it in terms of a number of indirections.
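To see some of that housekeeping for yourself, CPython's dis module will show the bytecode behind even a trivial call. This is just a sketch (the add function is a made-up example, and the exact opcodes differ between CPython versions):

    import dis

    def add(a, b):
        return a + b

    # The call site: a name lookup plus a generic CALL opcode.
    dis.dis(compile("add(1, 2)", "<example>", "eval"))

    # The body: even the '+' is a dynamically dispatched add opcode.
    dis.dis(add)

And the bytecode is only the visible layer; underneath it the interpreter loop, name lookups and reference counting add more work still.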
Assembly hello world on my machine: 86,607 CPU cycles (of which < 20 are actually in the program)
Syscalls used by the assembly version: 2 (write and exit)
Python hello world on my machine (.pyc already available): 59,099,731 instructions (including half a million branch misses)
Syscalls used by Python to execute 'print "hello, world"': 1139 (each of which causes a program reschedule)
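For anyone who wants to reproduce numbers in this ballpark, here's a rough sketch that shells out to strace and perf (assumes Linux with both tools installed, and uses a Python 3 print rather than the Python 2 statement above; the exact counts will differ from machine to machine):

    import subprocess

    hello = ["python3", "-c", "print('hello, world')"]

    # Summarise the syscalls made while running a Python hello world.
    subprocess.run(["strace", "-c", "-f"] + hello)

    # Count instructions and branch misses for the same program.
    subprocess.run(["perf", "stat", "-e", "instructions,branch-misses"] + hello)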
These programs do the same thing. Programmers often forget that the things they take for granted are not in fact free; they may not even be O(1). Memory allocation. Subprocess execution. Function calls in scripting languages. Syscalls. Writing to files. Allocating bytes on disk. All of these come at a really, really high cost, and most are not even O(1) (e.g. memory allocation behaves more like O(N^2) on a busy server as long as things still fit in main memory, and O(N^4) or even worse once virtual memory gets involved).
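To put a quick number on the syscall point, here's a small timeit sketch comparing a do-nothing Python function call with a one-byte write(2) to /dev/null (illustrative only; absolute timings depend entirely on the machine):

    import os
    import timeit

    def noop():
        pass

    fd = os.open("/dev/null", os.O_WRONLY)

    # Time one million plain function calls vs. one million write syscalls.
    print("function call:", timeit.timeit(noop, number=1_000_000))
    print("write syscall:", timeit.timeit(lambda: os.write(fd, b"x"), number=1_000_000))

    os.close(fd)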
Sadly, using memory does not even have bounded complexity. At some point, merely touching virtual memory can cause more virtual memory to be allocated just to service the lookup. This is generally referred to as "thrashing", and you're very likely to have rebooted your machine before it completes, because the machine will be frozen for minutes, sometimes hours, when this happens.
Likewise, Python's memory model is useful, but huge. Strings in C++ take one byte plus the actual contents of the string. Strings in Python take 60 bytes plus twice the length of the string, and that's assuming you just assign a variable to the string; if you construct the string, the difference is going to be much bigger.
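The Python side of that is easy to check with sys.getsizeof (a sketch; the exact overhead depends on the CPython version and on how wide the string's characters are):

    import sys

    # Per-object overhead of a CPython str, before any characters are counted.
    print(sys.getsizeof(""))        # dozens of bytes for an empty string
    print(sys.getsizeof("hello"))   # overhead + roughly 1 byte per ASCII char
    print(sys.getsizeof("€uro"))    # non-Latin-1 text costs 2 (or 4) bytes per char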
The point here is that things that are io-bound (esp. memory bound) in python may be cpu-bound in C++ or D, simply because you avoid doing all the indirections that higher level languages do.
> The point here is that things that are io-bound (esp. memory bound) in python may be cpu-bound in C++ or D, simply because you avoid doing all the indirections that higher level languages do.
I think you mean the reverse: "things that are cpu-bound (esp. memory bound) in python may be io-bound in C++ or D".
How many instructions were executed after the interpreter was loaded into memory (a much more realistic analog to the twisted server model)?
Once loaded into memory, any program which is bound by IO to memory (i.e. moving the stack between memory and the caches/registers) will show up in tools not as IO-bound, but as CPU-bound.
And yes, CPU bound programs will benefit greatly from moving the hotspots into a linked module written in C or Cython.
I have no problem with moving away from Python (I'm in the process of doing this myself), but the costs associated with rewriting an entire program (especially one complicated enough to only handle 50 requests per second) are non-trivial, and if there was simply a small CPU hotspot, it could have been smoothed away in a number of ways that don't involve learning a new language.
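For the record, the usual first step in that "smoothed away" route is just profiling to find the hotspot before touching anything else. A minimal sketch with cProfile (handle_request and its workload are hypothetical stand-ins for the real handler):

    import cProfile
    import pstats

    def handle_request(n=1000):
        # Hypothetical stand-in for the real per-request work.
        return sum(i * i for i in range(n))

    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(10_000):
        handle_request()
    profiler.disable()

    # The functions with the most cumulative time are the candidates for
    # a C/Cython extension, caching, or an algorithmic fix.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)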
In short, everything points towards OP moving to D because of a personal desire instead of a real business case.
I used to write translators for computer languages. The biggest was PL/M to C. That was easy because PL/M had fewer constructs than C. I managed to recognize constant declarations and map them to #defines or consts, which actually made the code more readable.
But these days, languages have features that may be completely orthogonal to those of other languages, so automatic translation may not be possible. Still, it would be by far the cheapest solution.
I thought system calls were packaged into the binary itself and didn't necessarily cause the job to be rescheduled, but just caused a context switch to take place, after which execution continues.
I thought rescheduling only happened on an interrupt, or when a thread reached a blocked state.
About 10 years ago I remember prototyping some code on Linux with a perl script running a java program as a "coroutine" (er, service) via request/response pairs over a socket (not http). Then we moved it to AIX, where it was essentially unusable due to the lost time slice each time an IO sys call was made. On Linux, the remaining time slices were recovered and immediately used. On AIX, the time slice was simply lost until the next process scheduler tick. Ouch.
Technically they cause a context switch and a scheduler run upon return (I believe, not 100% sure), but you're right that it does not necessarily result in getting put on the back of the work queue.
If their system was truly just IO bound, then moving to D wouldn't help them.