
The author cites concern about binary-size bloat as a valid worry his colleagues have about generics. He explains that they would normally write distinct implementations. Am I wrong for thinking that generics shouldn't result in more bloat than the equivalent hand-duplicated code? And conceivably the compiler could recognize that a List<&Foo> and a List<&Bar> are parameterized by equivalently-sized types and so can share a single copy of the generated code (although this might break inlining optimizations)?


Yes, if you were to duplicate the code, you'd be doing the same thing as monomorphization. But it's possible that they wouldn't actually duplicate it, and would instead do something that produces less code, like casting to a void pointer.

The compiler can do some optimizations like this, yes, but there's a lot more work to do in this area.

As an example of something folks still do by hand sometimes, take the starts_with method on Path: https://doc.rust-lang.org/stable/std/path/struct.Path.html#m...

It looks like this:

    pub fn starts_with<P: AsRef<Path>>(&self, base: P) -> bool {
        self._starts_with(base.as_ref())
    }

    fn _starts_with(&self, base: &Path) -> bool {
        iter_after(self.components(), base.components()).is_some()
    }
Why split this into two functions? You can get smaller code size this way, because you're "hand-de-duplicating" the parts that aren't generic. Once we call .as_ref, everything else is identical, but the compiler isn't yet good enough to do this itself, so we do it by hand.
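
For instance, you could apply the same trick to a hypothetical function of your own: the generic shim is tiny and gets monomorphized once per caller type, while the real body exists only once in the binary.

    use std::path::Path;

    // Generic shim: one tiny copy per concrete P that callers use.
    pub fn print_file_name<P: AsRef<Path>>(path: P) {
        print_file_name_inner(path.as_ref())
    }

    // Non-generic body: a single copy, shared by every instantiation of
    // the shim above.
    fn print_file_name_inner(path: &Path) {
        if let Some(name) = path.file_name() {
            println!("{}", name.to_string_lossy());
        }
    }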

I wouldn't say this technique is super common or well-known; it mostly shows up in the standard library (because the standard library is used everywhere) and among folks who are sensitive to code size, like embedded or wasm people.


Additionally, when you make something easy, people do it more than they would if it were annoying and manual. You might imagine that if you had to hand-roll everything, you'd be more cognizant of how many copy/pastes you did.


This is exactly what I meant. Thanks Steve for the clear explanation!

In the next part of the article I'll hopefully explore some drawbacks of choosing generics all the way down. Either driver would likely have performed a little better if developed fully independently, but by a smaller margin than I would've expected.


There's already a crate for that: https://lib.rs/crates/momo


Wouldn't "dyn" get you the equivalent in Rust? Runtime generics as opposed to compile time.


Nope; dyn is dynamic dispatch, this is statically dispatched.

It is true that dyn means you don't get monomorphization, and can help with binary sizes.

EDIT: thinking about this some more, I wanted to say that it does feel similar, but one big difference that's easy to explain is that dyn changes the way the value is represented in memory, and this does not. Conceptually, both do "cast and then call a single function", but in the dyn case the cast is to a trait object, whereas this casts directly to &Path.
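
A rough sketch of that difference, with a hypothetical pair of functions modeled on the starts_with example above:

    use std::path::Path;

    // Static version: the generic shim converts P to &Path via AsRef and
    // hands it to one non-generic function; nothing about the value's
    // in-memory representation changes.
    fn starts_with_static<P: AsRef<Path>>(path: &Path, base: P) -> bool {
        starts_with_inner(path, base.as_ref())
    }

    fn starts_with_inner(path: &Path, base: &Path) -> bool {
        path.starts_with(base)
    }

    // dyn version: the cast is to a trait object (data pointer + vtable
    // pointer), and as_ref() becomes a virtual call through the vtable.
    fn starts_with_dyn(path: &Path, base: &dyn AsRef<Path>) -> bool {
        path.starts_with(base.as_ref())
    }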


Generics can increase code bloat because they give the compiler less information, or the information is available too late.

For example, see how switching from a concrete to a generic type increases the size of clone() by 4x: https://rust.godbolt.org/z/qbYr3v

This is because at the point the compiler synthesizes clone() it has less information about the type.


Very interesting. Does this mean certain optimizations run before the monomorphization step? Do you know why? Compilation performance is the obvious thing that comes to mind.


Yes, there are optimizations that run before monomorphization; this one happens during macro expansion.

It's a bit of a chicken-and-egg problem. To monomorphize clone(), someone must emit an implementation first. But an optimal implementation requires analyses that aren't available until later in the pipeline. Here the optimization kicks in for types deriving Copy, but a generic parameter is enough to defeat it.
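
A hypothetical illustration of that (not the code behind the godbolt link): for a concrete type, `#[derive(Clone)]` sees that Copy is also being derived and emits a Clone impl that is just `*self`; with a generic parameter, the derive can't assume every instantiation is Copy, so it emits a field-by-field clone and leaves any cleanup to later passes.

    #[derive(Copy, Clone)]
    pub enum Concrete {
        A,
        B(u8),
    }

    #[derive(Copy, Clone)]
    pub enum Generic<T> {
        A,
        B(T),
    }

    pub fn clone_concrete(x: &Concrete) -> Concrete {
        // The derived Clone here is literally `*self`.
        x.clone()
    }

    pub fn clone_generic(x: &Generic<u8>) -> Generic<u8> {
        // The derived Clone here is a match over the variants that clones
        // each field; LLVM may or may not collapse it back into a plain copy.
        x.clone()
    }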


IIRC, that "optimization" mostly avoids wasting time compiling a complex `Clone` implementation when simply returning `*self` suffices (there are some crates with a lot of `#[derive(Copy, Clone)]` types). We try to avoid having a lot of logic like that too early, for precisely the reasons you mention.

I'd be interested in an example where LLVM can't optimize the general version, as it means we might want to do this through MIR shims instead (which can be generated when collecting the monomorphic instances to codegen - this is what happens when you clone a tuple or closure, for example).


Did you mean to link to a different example, or different compiler flags?

The link you provide shows only one function, because LLVM has optimized both to be identical, and deduplicated them.

(If you disable the "Directives" filter, you can see a `.set example::clone_concrete, example::clone_abstract`, which aliases one to the other)


The behavior differs between (the present) nightly and rustc 1.45.2; the nightly available when this link was posted matched the 1.45.2 behavior.

The output with 1.45.2 is as follows:

  example::clone_concrete:
        mov     eax, edi
        ret

  example::clone_abstract:
        mov     ecx, edi
        and     ecx, -256
        xor     eax, eax
        xor     edx, edx
        cmp     dil, 1
        sete    dl
        cmove   eax, ecx
        or      eax, edx
        ret


Fascinating coincidence! It was probably the LLVM upgrade (https://github.com/rust-lang/rust/pull/73526) landing, likely before the comment was even posted (but a nightly with the upgraded LLVM would only show up the next day).


In C, because you often use void * for containers, there may be less code bloat than with a generic container whose accessors are monomorphized for each type it's used with. In practice this is a pretty trivial difference, and it can be bounded and understood well enough.



