Essentially, without generics, any kind of polymorphic code needs to use dynamic dispatch. For example, if you add a generic sort() function for collections, you would need to look up the comparison function for the arguments at runtime. Generic programming - be it through type parameters, ad hoc polymorphism, or whatever - allows a language implementation to know what that comparison function is at compile time and inline it accordingly. The same is true for all kinds of things you may want to do generically: sorting, layout of objects, iteration, etc.
Ultimately, generic typing is a way of expressing the constraints of a program to the compiler more precisely, which means the compiler can do a better job of generating code for specific instances. The downside is that the compiler needs to generate code explicitly for each concrete instance of a generic function or type, which means longer compile times and larger code size.
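To make the point above concrete, here's a minimal Rust sketch (the function name `smallest` is illustrative, not from the thread). Rust monomorphizes generics: each concrete `T` gets its own copy of the function, so the `<` below compiles to a direct comparison instruction rather than a runtime lookup of a comparison function.

```rust
// Generic function: the compiler emits a separate, specialized copy
// for each concrete T used at a call site, so the `<` comparison is
// known at compile time and can be inlined.
fn smallest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut it = items.iter().copied();
    let mut best = it.next()?;
    for x in it {
        if x < best {
            best = x;
        }
    }
    Some(best)
}

fn main() {
    // Each call instantiates its own monomorphized copy:
    // one for i32, one for f64.
    println!("{:?}", smallest(&[3, 1, 2]));     // Some(1)
    println!("{:?}", smallest(&[2.5f64, 0.5])); // Some(0.5)
}
```

The i32 and f64 instantiations share no code at runtime; that duplication is exactly the compile-time/code-size cost mentioned above.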
Edit: I'd also like to point out that JIT compilers and sufficiently-smart AOT compilers can do this sort of inlining too, if they can prove that every combination of runtime types passed into a sort() uses the same comparison function, and inline it as needed. That said, it's the same (or higher) complexity as a generics system, with more pitfalls (for example, if they can't prove it but guess that the comparison function never changes, and the guess turns out wrong, they have to de-optimize).
Monomorphization generally opens the door for faster execution on a modern processor.
If you have a layer of indirection (i.e., no generics, so you need to dispatch at runtime) then you wind up with an additional jump instruction.
While modern processors and branch prediction make jumps relatively cheap, they can’t avoid instruction cache misses if the dispatch is happening frequently.
By removing the layer of indirection, the compiler can choose to inline the implementation instead of using a jmp; this keeps our icache clean, and can lead to faster execution.
However, there’s a real cost to this! Inlining is, in broad strokes, often faster; but that’s not always true if you’re e.g. inlining a large number of instructions into a tight loop. As with all performance, profile to know the truth.
Perhaps what matters more is not the missing jump itself, but the additional optimization passes that run after the inline. Say the compare function takes boxed objects (as in Java's case): the call site would create two new objects, pass them to the compare function, and jump there. After inlining and optimization, the call site might avoid the object creation entirely if the objects would only be destructured inside the compare.
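A rough Rust analogue of that point (function names here are illustrative): with a trait object, every comparison is an indirect call the optimizer usually can't see through; with a generic parameter, the closure's concrete type is known, so its body can be inlined and any temporaries it would have built can be optimized away.

```rust
use std::cmp::Ordering;

// Dynamic version: `cmp` is behind a trait object, so each element
// comparison is an indirect call through a vtable pointer.
fn min_index_dyn(items: &[i32], cmp: &dyn Fn(i32, i32) -> Ordering) -> usize {
    let mut best = 0;
    for i in 1..items.len() {
        if cmp(items[i], items[best]) == Ordering::Less {
            best = i;
        }
    }
    best
}

// Generic version: F is a distinct concrete type per call site, so the
// comparator body can be inlined straight into the loop.
fn min_index<F: Fn(i32, i32) -> Ordering>(items: &[i32], cmp: F) -> usize {
    let mut best = 0;
    for i in 1..items.len() {
        if cmp(items[i], items[best]) == Ordering::Less {
            best = i;
        }
    }
    best
}

fn main() {
    let data = [3, 1, 2];
    // Same answer either way; only the generated code differs.
    println!("{}", min_index_dyn(&data, &|a, b| a.cmp(&b))); // 1
    println!("{}", min_index(&data, |a, b| a.cmp(&b)));      // 1
}
```

This is the Rust spelling of the Java situation in the comment: the indirect version is what a boxed-comparator call looks like, the generic version is what the JIT hopes to recover by speculation.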
The ultra short version of what he is saying is that generics allow you to provide information to the compiler that enables inlining. So instead of doing something like this (pseudo assembly)
PUSH x
PUSH y
CALL compare   ; jump into compare...
CMP x y        ; ...which does the actual comparison
POP            ; ...then clean up and return
You end up with
CMP x y
Of course, this will vary based on the type you're optimizing for, but in all cases you remove the need to push to and pop from the stack for every comparison. I'm not smart enough to talk about other optimizations enabled by generics though.
Anytime you can move a choice from runtime to compile time, you have a chance to speed up the program. If a lot of what your sort routine does at runtime is figuring out the type of the objects being sorted, then moving that choice to compile time can recover all of that time.
Note, this is not just generics. Java could use overloading for years before generics to get similar benefits, and Common Lisp has type annotations that can give similar benefits. Generics are just a much easier and stronger construct for this in many cases.
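Here's a small Rust sketch of that runtime-vs-compile-time choice (the `Value` enum and function names are made up for illustration). The first version checks the element type on every single comparison; the second fixes the type at the call site, so the branch vanishes from the generated code entirely.

```rust
// Runtime choice: a dynamically-typed value whose type must be
// inspected on every comparison, like a sort over untyped objects.
enum Value {
    Int(i64),
    Float(f64),
}

fn less(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => x < y,
        (Value::Float(x), Value::Float(y)) => x < y,
        _ => false, // mixed types: arbitrary choice for this sketch
    }
}

// Compile-time choice: T is fixed at each call site, so the type
// check above simply does not exist in the compiled code.
fn less_static<T: PartialOrd>(a: &T, b: &T) -> bool {
    a < b
}

fn main() {
    println!("{}", less(&Value::Int(1), &Value::Int(2))); // true
    println!("{}", less_static(&1, &2));                  // true
}
```

This is also why overloading and Lisp-style type declarations get similar wins: all three are ways of telling the compiler, before the program runs, which branch of that match it will need.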
This author is making the case in terms of improved performance, which I find interesting.