There was a C library - posted here but I cannot remember the name - which used the preprocessor to avoid invoking a function call per comparison. You'd use it something like this:
#define SORT_ELEMENT_TYPE int
#define SORT_COMPARE(a, b) sign(a - b)
#include "specialized_sort.h"
And the header would define a function with the following signature:
I've written code just like this. Another option is to make the entire sort function a C macro. This would be feasible for something simple like quicksort.
But generics provide another benefit: in theory with them, the compiler should be able to determine that two high-level types are actually the same machine type, so in the above example, SORT_ELEMENT_TYPE int and long would not generate duplicate code on many machines.
It's not necessary to use macros. Just make both the sort function which the comparison function is passed to, as well as the comparison function itself, both "static inline". The compiler is smart enough to figure it out.
Why is this limited to the preprocessor? I imagine the compiler proper could inline the compare function parameter into the qsort implementation body in the typical case, at the qsort() call site. (Assuming qsort's implementation was in a header, as the original comment posited).
You're right. I tried it here https://godbolt.org/z/457dYWfq5 and if the generic function is visible to the call site and inlineable, then the compare function can be inlined too. I didn't know this, I had always thought that the semantics of function pointer prevented this from being achievable!
If both the comparison function as well as the sort function which the comparison function is passed to are declared 'static inline' and available in a header, the compiler is smart enough to figure it out and inline the code, provided optimizations are turned on.
Does the necessary pointer casting in the implementations also make it slower than it should be (the specialized versions generated by compilers that support generics wouldn't have any casting)?
Its not the pointer casting, its the indirect function call that probably can be cached but it will always be another memory location that the CPU will need to jump to.
If the compare function is inlined into the qsort() body, there wouldn't need to be any function call. I think that was the point of the question above.
The way to answer this question is to look at the assembly code generated by the compiler with various optimization levels and see whether the compiler ever inlines the compare function.