
Not mentioned: the type-specialized code is larger, which can cause its own performance problems due to icache pressure.

A virtue of C, in my opinion, is that it's very difficult to accidentally bloat the code. C tends to produce smaller binaries, while leaving you the option of expanding performance-critical sections with macros where it's actually beneficial. It's much harder to take a heavily templated C++ binary and shrink the code size of its cold parts (which are probably most of the binary).
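
A rough sketch of what I mean by that, with made-up names (insertion sort just to keep it short): keep a single generic path, and only stamp out a type-specialized copy where a profile says the indirection actually costs you.

    #include <stddef.h>

    /* Generic path: one copy of the code, comparisons go through a
       function pointer (the qsort shape). */
    extern void sort_generic(void *base, size_t n, size_t size,
                             int (*cmp)(const void *, const void *));

    /* Hot path: a macro stamps out a specialized copy with the
       comparison inlined, only for the types where it pays off. */
    #define DEFINE_SORT(T, NAME, LESS)                      \
        static void NAME(T *a, size_t n) {                  \
            for (size_t i = 1; i < n; i++) {                \
                T key = a[i];                               \
                size_t j = i;                               \
                while (j > 0 && LESS(key, a[j - 1])) {      \
                    a[j] = a[j - 1];                        \
                    j--;                                    \
                }                                           \
                a[j] = key;                                 \
            }                                               \
        }

    #define INT_LESS(a, b) ((a) < (b))
    DEFINE_SORT(int, sort_int, INT_LESS)  /* expands to a static sort_int() */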



Same thing in e.g. Rust: you can either monomorphize (expand the code for each concrete type it's used with), use dynamic dispatch via a vtable, or (with help from a crate) dispatch via an enum. Each solution has its tradeoffs; you are empowered to choose.


Granted, "you" does mean you: in practice, if you're using dependencies, everyone else has likely already chosen static dispatch.


Though C macros are so bad they give the whole term a bad name. You might as well write a sed script for “macro expansion” instead of what the preprocessor does.
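
For whatever it's worth, the standard exhibits against it are all consequences of blind token substitution, nothing exotic, just the classics:

    #include <stdio.h>

    #define SQUARE(x) x * x                    /* unparenthesized: broken */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))  /* parenthesized, but...   */

    int main(void) {
        printf("%d\n", SQUARE(1 + 2)); /* 1 + 2 * 1 + 2 == 5, not 9        */
        int i = 0;
        printf("%d\n", MIN(i++, 10));  /* i++ evaluated twice; i ends at 2 */
        return 0;
    }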


I think the case against the C preprocessor is overblown. What's the worst problem C macros have caused you? I've been programming C for 27 years and I can't really think of them causing any bugs in my programs. The worst thing for me is that #defines don't show up in the debug info.


C programmers tend to reach for the preprocessor in cases where other tools would have been more appropriate. In particular, #ifdef. If you haven't experienced this then maybe you've worked with people that have uncommonly good taste.

I agree there's nothing particularly terrible about the preprocessor as such, if used tastefully. Especially if you compare it to other macro engines like maybe TeX, CPP strikes me as quite simple and clean.

Even newer languages like Haskell choose to adopt it after all.


Good point about #ifdef. I'm not sure what other tools are more appropriate though. I went through a phase of moving target-specific code into separate files instead of using #ifdef. But that makes the build system more complex and makes it harder to eyeball the difference between two targets' implementations of the same function. This is just an intrinsically complex problem and not one I blame the preprocessor for. As always, aiming for minimum total complexity is the goal. #ifdef is a useful tool in that pursuit.

That said, I'd be interested to hear about other solutions that are better. I guess Zig's comptime is a candidate, but I see that as only a superficial improvement (not a criticism - I can't think of anything it could do better).


I agree, #ifdef is sometimes the best tool for the job. I didn't mean that it's always wrong, but it does tend to be overused in my experience. For instance, hardcoding assumptions about the target platform when a dynamic lookup could have been used instead. E.g. compare old skool hardcoded addresses in Linux drivers to modern devicetrees.
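
Roughly the contrast I have in mind, with made-up addresses and a hypothetical lookup helper standing in for the devicetree query:

    #include <stdint.h>

    /* Build-time assumption: one binary per board, address baked in. */
    #ifdef BOARD_FOO
    #  define UART0_BASE 0x10000000u
    #else
    #  define UART0_BASE 0x20000000u
    #endif

    /* Run-time lookup: one binary, the platform description
       (devicetree, ACPI, ...) supplies the address at boot. */
    extern uintptr_t platform_lookup(const char *name);  /* hypothetical */

    static uintptr_t uart0_base(void) {
        return platform_lookup("uart0");
    }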


I don’t think Zig’s comptime is a superficial improvement, especially not compared to C “macros”. It's such a simple, single concept, yet it replaces both C++’s templates and constexpr functions.

But a macro system that doesn't understand the AST is just shitty.


Except I do not think you could (not easily, anyway); the C preprocessor is a rather sophisticated piece of code.


Is it really though? I feel like I could implement a C macro preprocessor in Python in like two or three days.


People usually overestimate their capabilities when it comes to such things. In this case I think you are underestimating yourself: I do not see how anyone could write something as bad as the C preprocessor if they had 2-3 days to work with.


Writing a conforming C preprocessor is in fact quite hard. The spec is very hairy.
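
One small taste of it: stringizing and token pasting interact with the argument-expansion rules in ways that regularly surprise people (this behaviour is mandated by the spec, not a compiler quirk):

    #define FOO       42
    #define STR(x)    #x        /* stringizes the argument as written     */
    #define XSTR(x)   STR(x)    /* extra level so the argument expands    */
    #define CAT(a, b) a ## b    /* token pasting; same non-expansion rule */

    static const char *a = STR(FOO);   /* "FOO" */
    static const char *b = XSTR(FOO);  /* "42"  */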


curious what you mean by

> the type-specialized code is larger in code size, which can cause its own performance problems due to icache.

since IIUC you mean the alternative would be essentially non-reified types, so the code would have more frequent branches, thus blowing the cache in general, even with branch prediction.

anyway i'm pretty sure i'm missing your point, so figured may as well ask.


When you ("you" in this comment is probably most usefully read as "a compiler" :) ) monomorphize a generic function or data structure, you make a copy of the code for every distinct set of type parameters it's been instantiated with.

Each of these copies requires space for the instructions...this means a larger binary on disk, more memory required when it runs, and more pressure on the CPU's caches, especially precious L1I.

The cache pressure issue is typically the big one worth thinking about, although binary size itself can be an issue for some cases - I've primarily got embedded use cases in mind, but I'm sure there are others I'm not thinking of.

Anyway, back to cache pressure. If there are a whole bunch of different monomorphizations of a given function, and they're all called frequently, that could mean that the CPU will frequently need to refer to slower L2$, or much slower L3$ (or much much slower RAM) to load instructions. That's no bueno from a performance standpoint.

Because of this, there are cases when dynamic dispatch can outperform monomorphization - it's definitely not as simple as "vtable slower, monomorphization faster" across the board, even if that is an OK rule of thumb.
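
In C terms, the dynamic-dispatch end of that spectrum is the qsort shape: one copy of the machine code in the icache no matter how many element types call it, with an indirect call per comparison as the price. A minimal sketch:

    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    static int cmp_double(const void *a, const void *b) {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* Both wrappers share the single qsort body; only the tiny
       comparators differ. */
    void sort_ints(int *a, size_t n)       { qsort(a, n, sizeof *a, cmp_int); }
    void sort_doubles(double *a, size_t n) { qsort(a, n, sizeof *a, cmp_double); }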


The linker can usually merge identical functions (identical code folding), and the compiler deduplicates too. You'll only end up with distinct copies where the generated code actually differs, and in those cases the branches and indirect calls you eliminate usually more than compensate.

Another aspect of this is that the cache holds the _hot_ stuff. In most systems, even if you have three copies of a function, only one of them is likely to be hot.



