This might be a stupid question, but can't we mix the best of both worlds in a programming language? To have GC where it makes sense and doesn't add too much overhead, but allow non-GC objects that require a bit more care, where you know you need to free memory yourself or use some basic smart_ptr-style tracking object?
Just like .net allows you to use "unsafe". It is a clear marker that you are stepping outside of the safe zone and need to be careful, but at least you can.
You can already do that in any GC language (to some extent) by drastically reducing the amount and creation/destruction-frequency of garbage collected objects (for instance by reducing "object granularity" and using object pools).
But this means that the programmer suddenly needs to actively think about memory management (and already in the design/planning phase, because later is too late), while the whole selling point of automatic memory management solutions was that this wouldn't be necessary anymore (which was a deception all along, but that's a different topic).
> You can already do that in any GC language (to some extent) by drastically reducing the amount and creation/destruction-frequency of garbage collected objects (for instance by reducing "object granularity" and using object pools).
But it's still a better trade-off: the programmer doesn't have to think about memory management most of the time, but if they want to, they can optimize hot paths in the code. Optimization (in all programming languages) often involves complicated and non-intuitive code backed by careful measurement.
Yes, this approach works best with the "premature optimization is the root of all evil" axiom. When I worked on high-performance servers in C#, we would occasionally find ways to 1) reduce allocations 2) make allocations either very short-lived so they stay in the short-term GC timeframe or very long-lived (object pooling/reuse or creating object-aggregates for pseudo-slab allocation).
> To have GC where it makes sense and doesn't add too much overhead but allowing non-GC objects that require a bit more care and where you know you need to free memory yourself or using some basic smart_ptr-style tracking object?
D is designed exactly like that. In fact, using value types and smart pointers (via `std.typecons`) is kind of required if you want to keep GC pauses tolerable. On the flip side, a simple GC means no additional overheads like write barriers or remembered sets. A borrow checker for D has also been in development (I have not personally checked it out, so I can't comment on it).
Some languages have separate “value types” and “reference types” (https://en.wikipedia.org/wiki/Value_type_and_reference_type) exactly for that reason, with value types being copied when passed into or returned from a function, so they never get shared. That makes it easy to determine whether they can be stack allocated.
The intended use is value types for small, immutable values (small because copying large objects is more costly, and immutable so that the copying doesn't make a difference semantically).
That can significantly decrease the number of garbage-collected objects and with it memory usage (the smaller the object, the larger the relative overhead of the few bytes of data reserved for use by the garbage collector).
Have a look at D, Active Oberon, Modula-3, C#, F#, VB (6 and .NET), many BASIC dialects, Eiffel, Haskell (now with experimental lifetimes), Swift (RC is a GC algorithm), C++/CLI, C++/CX, Unreal C++, Common Lisp, among many others.
The big problem is a lack of quality in how CS subjects are taught, so many people learn about GC and think all GC languages are like JavaScript.
I think what you essentially want is a language with a powerful effect system and substructural typing, along with regions for memory management. This is like what Rust does with ownership semantics and the borrow checker, where normal types are linear/affine, that is, they can only be used (exactly, at most) once. What would be more powerful is having some effect for GC'd functions and some effect for non-GC'd functions, where you can only pass the barrier from GC to non-GC if it's a linear/affine type. But when you're in a GC function you can use whatever types you want. So instead of being stuck with the borrow checker, you can decide at the drop of a hat to enter GC mode to write your linked list.
A strategy that works for some GC'd languages is to force a minor collection at some safe point and then run your "non-GC'd" code taking care to limit allocations to what fits in the minor heap. I wrote a small game using this strategy where the safe point happened before waiting for a vertical sync to refresh the screen (where otherwise we'd have been sitting around doing nothing).
To some degree you could use a GC-only language to create a large pool (e.g. a numpy array) and "allocate" items in there with indices as pointers. That would cover some of the use cases for manual memory management.
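A minimal C++ sketch of that idea (all names here are mine, purely for illustration): one big up-front allocation of slots, with plain integer indices standing in for pointers and a free list for reuse.

    #include <cassert>
    #include <cstdint>
    #include <vector>

    struct Particle { float x, y, vx, vy; };

    // One big allocation up front; "pointers" are indices into it.
    class ParticlePool {
    public:
        explicit ParticlePool(std::uint32_t capacity) : slots_(capacity) {
            for (std::uint32_t i = capacity; i > 0; --i)
                free_.push_back(i - 1);            // every slot starts out free
        }

        std::uint32_t acquire() {                  // "allocate": pop an index
            assert(!free_.empty());
            std::uint32_t idx = free_.back();
            free_.pop_back();
            return idx;
        }

        void release(std::uint32_t idx) {          // "free": push the index back
            free_.push_back(idx);
        }

        Particle& operator[](std::uint32_t idx) { return slots_[idx]; }

    private:
        std::vector<Particle> slots_;              // the whole pool, allocated once
        std::vector<std::uint32_t> free_;          // indices of unused slots
    };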
Is that truly a good idea in the case of the JVM? Of course there will be problems where this is the correct approach, but in certain cases simply letting the very advanced GC do its job is actually faster.
It’s not about speed (throughput) but about latency. If the GC takes (say) several minutes during a period of downtime so that it doesn’t have to run at all during uptime, then it’s worth it.
Java suffers from a Python 2-like syndrome with Java 8.
Those shops aren't going to use ZGC.
Nor will they profit from the several performance improvements, including the free JIT cache across runs with PGO data on HotSpot and J9, which used to be a commercial feature on third-party JDKs.
I've been impressed by nim's GC. My understanding is it's not super complicated. Not having a big header on each object and having each thread have its own GC goes a long way.
It depends on the GC you're using with Nim. Guessing the OP means the newer ARC GC, then you can use `move` semantics to move some data to another thread, or you can do a deepCopy. You can do shared data structures but that requires a bit more work.
It becomes more difficult. There are a few supported memory managers by Nim (including Go's GC!), but with ARC you explicitly move or deep copy the structure.
Obviously this is a bit more difficult than, say, Java, but then you never have to stop the world.
Similar to ObjC reference counting? (Not sure, I don't use ObjC and am not an expert in it, it just reminds me of what I've heard of it, so honest question).
Objective-C (and I guess Swift too?) has ARC (see link in grandparent), which attempts to statically track the lifetime of objects at compile time. This essentially results in a "move operation" if the compiler detects that the original reference isn't used anymore after the assignment. It's basically halfway between C++'s shared_ptr-vs-unique_ptr approach and Rust's lifetime and ownership tracking.
ARC was a marketing gimmick, after the failure of getting a conservative GC to properly work with the underlying C semantics alongside the mix of frameworks compiled in different modes.
The only thing it does is to automate the retain/release calls of Cocoa classes, or classes that offer similar API.
It doesn't apply to any other C-like types in Objective-C, and the generated machine code is basically hardly different from when devs keep doing those calls by hand.
Yes, there is some potential for the compiler to remove needless retain/release pairs for short-lived objects, but it doesn't happen all the time.
Swift too, despite the ARC marketing, made the sensible choice to go with ARC, because one of the key design goals was interoperability with the Objective-C runtime.
As shown by the RCW/CCW runtime used by .NET for interoperability with COM's AddRef/Release, there is a lot more machinery required to have a tracing GC cooperate with an RC scheme, so it is understandable this route wasn't chosen for Swift.
As some papers show, current performance isn't that great, which is also why Swift 5.5 is bringing a more aggressive optimization, which is turned off by default.
Check WWDC 2021, "ARC in Swift: Basics and beyond".
> Of course this approach has its own downside: a program may panic at when dropping an owned value, if it still has one or more references pointing to it.
I wonder if this could also be handled by having a custom error handler available on each created reference, to decide what to do with the "hanging" value: reassign ownership, invalidate the reference, panic, or do whatever else.
Your solution to the supposed overhead of garbage collection is to make every single object fat and have an extra test every time a pointer to an object is dropped?
How do you know there are still references without counting them somewhere or inspecting all other objects? And what would the handler really do? In what common code are there other owners willing to take over ownership (and you'd have to handle this problem recursively, too)?
In the article, they already assume reference counting of references, see the previous paragraph:
"When the reference goes out of scope, the count is reduced. When an owned value goes out of scope and its reference count is not zero, the program terminates with an error (which I'll refer to as a "panic")."
Single ownership (std::unique_ptr) works pretty well in C++ a lot of the time, but not always. Reference counting, including unique_ptr, also has to be locked for updates, slowing down multi-threaded programs, as exemplified by the notorious CPython GIL (global interpreter lock). And the idea of transferring ownership between lightweight processes leads to memory fragmentation within the processes. Erlang deep-copies message contents except for large binaries, so each lightweight process has its own semispace GC and maintains good cache locality. There are also styles of programming (such as using RB trees as persistent data structures, with multiple versions sharing parts of the structure) that are not conducive to single ownership.
At the end of the day, in a general purpose language, it's great to have machinery to avoid needing GC when possible, but you really do want to have GC to fall back on.
> Reference counting including unique_ptr also have to be locked for updates
This is not true. The count itself can be updated atomically with a compare & exchange. I think this is what actually happens in most C++ std libs, but I haven't checked in a while. (This doesn't mean the object they point to is thread-safe, though!)
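Concretely, the shape of a shared_ptr-style control block is roughly this (a sketch, not the actual standard library code; the memory orders shown are the commonly cited ones):

    #include <atomic>

    // The count is a single atomic integer, bumped and dropped with lock-free
    // read-modify-write operations; no lock is involved.
    struct ControlBlock {
        std::atomic<long> count{1};

        void add_ref() {
            count.fetch_add(1, std::memory_order_relaxed);
        }

        // Returns true when this was the last reference and the object can be freed.
        bool release() {
            return count.fetch_sub(1, std::memory_order_acq_rel) == 1;
        }
    };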
But here's my question: why do C++ programmers always refer to the overhead of "reference counting", when the overhead is not in the counting, but in the memory management: the use of Doug Lea's malloc or its derivatives such as ptmalloc3? This is what "new" wraps, and the STL containers all use it to resize themselves too. These routines are written by geniuses, but if you actually read up on how they work, they are doing a lot of (completely necessary) stuff, and they're among the longest-running std lib calls you can make. (Way longer than an exp(), for example.)
>Why do C++ programmers always refer to the overhead of "reference counting", when the overhead is not in the counting, but in the memory management
Presumably they're looking at reference counting as an alternative to unique_ptr, and in that context the memory management overhead is already being paid.
Unique_ptr is basically a pointer to a refcounted object whose refcount is always 0 or 1, and therefore doesn't have to be stored in the object itself. If the unique_ptr is a non-null address then the object is alive (refcount = 1); if all the unique_ptrs are null then the object is dead (so it gets freed), and C++'s fancy scope rules and move semantics statically ensure that there is at most one non-null unique_ptr to the object at any time.
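To illustrate the analogy (just a sketch):

    #include <cassert>
    #include <memory>

    int main() {
        auto a = std::make_unique<int>(42);     // "refcount == 1": a owns the object
        std::unique_ptr<int> b = std::move(a);  // ownership moves; a becomes null
        assert(!a && *b == 42);                 // at most one non-null owner at a time

        // No count is stored anywhere; with the default deleter a unique_ptr is
        // typically the size of a raw pointer on mainstream implementations.
        static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*), "");
    }                                           // b dies here: the int is freed exactly once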
I have been under the impression that glibc implements both unique_ptr and shared_ptr (that's the one with arbitrary refcounts) using locks, but ok, if it's done with something like CMPXCHG that's still a heck of a lot slower than ordinary copying.
As for overhead, well, if an object lives for a long time and lots of references are updated or moved around, then that itself can be expensive, besides the malloc/free costs. Copying garbage collectors by comparison are simple and fast, at the expense of making your program gobble a lot more memory, limiting the language so that the gc can know where all the pointers are, and having to stop the program during gc.
> I have been under the impression that glibc implements both unique_ptr and shared_ptr (that's the one with arbitrary refcounts) using locks
Why would unique_ptr ever need a lock?
> As for overhead, well, if an object lives for a long time and lots of references are updated or moved around
If performance is important you only pass a shared_ptr if the callee needs to keep it around, otherwise just pass a plain (const) reference. You can also pass a (const) reference to the shared_ptr and then the callee can decide if they actually need to keep it. Moving shared_ptr is free, because the refcount stays the same.
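A sketch of those calling conventions (the Widget/Registry names are made up for illustration):

    #include <iostream>
    #include <memory>
    #include <string>
    #include <utility>

    struct Widget { std::string name; };

    // Callee only reads the object: take a plain const reference, no refcount traffic.
    void print(const Widget& w) { std::cout << w.name << '\n'; }

    // Callee keeps the object: take the shared_ptr by value and move it into place;
    // the move itself never touches the count.
    class Registry {
    public:
        void add(std::shared_ptr<Widget> w) { stored_ = std::move(w); }
    private:
        std::shared_ptr<Widget> stored_;
    };

    // Callee decides at runtime: pass a const reference to the shared_ptr itself and
    // only copy (i.e. pay one atomic increment) when it actually needs to keep it.
    void maybe_keep(Registry& r, const std::shared_ptr<Widget>& w, bool keep) {
        if (keep) r.add(w);     // copy -> increment
        else      print(*w);    // no refcount change at all
    }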
With std::shared_ptr each copy has a pointer to the ref count & there is only one ref count, atomically updated with a cmpxchg instruction. Not sure what you mean by "references are updated or moved around"?
But I agree that GC has higher throughput than malloc/free, as most GC allocations are a single op, at the cost of higher initial memory usage and worst-case latency. That's why "scripting" languages like Python / Perl use refcounting, as they need to start quickly and might not run for very long, whereas Java / C# programs tend to be longer running.
I'd be very surprised if GCC actually used locks for unique_ptr. A unique_ptr owns the object it points to, so we know for a fact that it doesn't need to check anything before deleting it.
You might be right about that, i.e. I probably derped up. It's been a long while since I looked at it, and the issue might only be with shared_ptr. I'm starting to remember finding out that shared_ptr was surprisingly expensive even in single threaded programs. It looks like some finer grained versions of shared_ptr (like atomic_shared_ptr) arrived recently. I haven't checked into them yet.
Typically unique_ptr is represented as a pointer (check sizeof). For shared_ptr, there’s an atomic refcount but no locks. Operations on the shared_ptr object itself aren’t threadsafe (you can’t have the same shared_ptr assigned by two threads at the same time without locking), just like int, but the reference count is atomic, so copies of shared_ptr can safely be shared among threads. Atomic_shared_ptr adds atomicity to the pointer itself. I believe that usually uses locks.
Not sure what you mean here: std::unique_ptr is for when you want a pointer auto-destructed when it goes out of scope (ref counting limited to one), and std::shared_ptr is normal reference counting. They aren't alternatives to each other but specific tools for specific situations?
    #include <iostream>
    #include <chrono>
    void run();
    int main()
    {
        auto t0 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 10000000; i++) {
            run();
        }
        auto t1 = std::chrono::high_resolution_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << std::endl;
    }
alloc.cpp:
    #include <cstdlib>
    int rand_int();
    void use(void*);
    void run()
    {
        auto res = malloc(rand_int());
        use(res);
        free(res);
    }
exp.cpp:
    #include <cmath>
    double rand_double();
    void use(void*);
    void run()
    {
        auto res = exp(rand_double());
        use(&res);
    }
I get ~100ms for the exp, ~130ms for the malloc/free at -O3 / -Ofast with both clang and gcc (and checking with perf ensures that the time is indeed spent in these functions). So it's a bit faster, but not exactly fast (or rather, the math functions are really fucking slow)
Excuse me if I have misread your sample program, but it looks like you are allocating 16 bytes, then immediately freeing them, again and again in a loop. Since you're not allocating anything else, the allocate and free will happen almost immediately. Malloc() has a run time that can vary a lot depending on what has already been allocated & freed in the past, so this test is not meaningful.
Not only that, but Doug Lea's original malloc has a bin specifically for 16-byte allocations: http://gee.cs.oswego.edu/dl/html/malloc.html (the smallest size), so you've specifically chosen a very fast example.
In order to properly test malloc() you have to pre-generate some pseudo-random numbers for your allocation sizes, and allocate and free unpredictably. But it's more complicated than that, as in the very occasional case when malloc() has to ask the OS for more heap, that system call may not "finish" until your program first writes to that region of memory... (i.e. malloc() returns but hasn't finished!)
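Something along these lines, for instance (just a sketch of the kind of test being described, not a rigorous benchmark):

    #include <cstdlib>
    #include <random>
    #include <vector>

    // Pre-generate random sizes, keep a window of live allocations, and free them
    // in a different order than they were allocated, so the allocator actually has
    // to search and coalesce instead of recycling one hot 16-byte bin.
    int main() {
        std::mt19937 rng(12345);
        std::uniform_int_distribution<std::size_t> size_dist(16, 4096);
        std::uniform_int_distribution<std::size_t> slot_dist(0, 1023);

        std::vector<void*> live(1024, nullptr);
        for (int i = 0; i < 10000000; i++) {
            std::size_t slot = slot_dist(rng);
            std::free(live[slot]);                     // free whatever lived there before
            live[slot] = std::malloc(size_dist(rng));  // replace it with a new random-size block
            if (live[slot])
                static_cast<char*>(live[slot])[0] = 1; // touch it so the OS really maps the page
        }
        for (void* p : live) std::free(p);
    }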
I actually tried with up to 256 bytes and got the exact same timings. But sure, that's the absolute possible best case considering glibc thread pool, but it's (allocating and freeing in a loop) still a pretty common case
But when you enter the loop you have no other "objects" in your program allocated at all, so it doesn't have to do a search for a suitably sized block. Near-identical timings for changing the size to 256 is what I would expect.
That's definitely very interesting. I do wonder if malloc and free would take longer if there were multiple threads in the program, which is more relevant to this discussion; surely exp would not, assuming sufficient free cores. Even more interesting would be if you malloced in one thread and freed in another, given that allocators usually make use of thread-local pools.
> Why do C++ programmers always refer to the overhead of "reference counting", when the overhead is not in the counting, but in the memory management:
You’re mistaken.
In a multi core or multi processor system, atomic operations necessary for shared RC are relatively expensive.
Heap allocation cost is paid for any dynamic object no matter the type of GC/RC used, so that is not what is being compared.
The traditional alternative is what I might call "static reference counting", where the programmer knows by inspection/convention (C) or through the type system (Rust) how many references exist at any point statically, so they can skip the dynamic reference count entirely.
I agree that malloc/new is comparatively expensive, but that ends up as a cost borne by all heap-allocated variables in every language. Unless people use special techniques like pool or slab allocators.
There's a huge difference between C/C++'s malloc() and a precise garbage collector: with the former all objects are fixed. With the latter you can move them around. This means malloc() is more constrained than a GC, and that makes it slower.
The real gains of managing memory manually is not in the speed of malloc(), but in the fact that in practice you can perform far fewer heap allocations.
Rust also allows for non-thread safe reference-counted objects (Rc<>) that don't have the overhead of atomic instructions for updating the reference count. And it's getting support for custom local allocators, addressing the overhead of malloc/free. Along with a bunch of further advantages compared to C++ shared_ptr (such as being able to dispense with a pointer indirection in some cases, and allowing for sub-borrows that don't even need a refcount update because they can't possibly affect the 'ownership' of the object).
In Inko, the reference counts are regular integers, not atomics. They don't need to be atomic because the same object can't be used by different processes.
Moving objects between processes itself doesn't lead to fragmentation, as processes don't have their own heap in the data structure sense. That is, the structures representing a process don't have some sort of "heap" field. Instead, the OS threads running processes maintain the heaps (each thread has its own heap). This brings several benefits:
1. Since the heap is physically detached from processes, we can move without copying
2. Since the heap is still thread-local, allocations don't need synchronisation
3. Since the heap is detached from the processes, each process is smaller, allowing for more processes to run concurrently
4. Since objects (and their children) can only be owned by one process, we also don't have to worry about multiple processes trying to write into the same object
This is only possible because we statically guarantee that a value `T` can't be _shared_ between processes, and because we'll disallow sending references between processes.
I don't understand what you're getting at, but you can implement CoW at the page level with MMU hardware. That's how Redis dumping works, for example (the process forks into two, with one dumping to disk and the other continuing to take updates, CoW'ing pages that get modified and sharing the rest). What would CoW even mean inside a single address space? The idea of persistent data structures (as used in functional programming and other places) is that you never visibly modify anything, but instead make new objects that share structure with the old objects, relying on the GC to clean up data that stops being reachable.
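For example, a persistent singly linked list where shared_ptr reference counts stand in for the GC (a sketch, not how any particular functional runtime actually does it):

    #include <memory>
    #include <utility>

    // "Modifying" the list means building a new head that shares the old tail.
    template <typename T>
    struct Node {
        T value;
        std::shared_ptr<Node> next;
    };

    template <typename T>
    std::shared_ptr<Node<T>> cons(T value, std::shared_ptr<Node<T>> tail) {
        auto n = std::make_shared<Node<T>>();
        n->value = std::move(value);
        n->next = std::move(tail);
        return n;
    }

    int main() {
        auto one = cons(1, std::shared_ptr<Node<int>>{});
        auto a   = cons(3, cons(2, one));   // a = [3, 2, 1]
        auto b   = cons(4, a);              // b = [4, 3, 2, 1], sharing every node of a
    }   // nodes are freed only once the last list referring to them is gone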
You can implement a linked list in safe Rust (even a doubly linked list if you prefer); there are a number of examples available if you google for it - for instance:
I like this approach, and exploration of alternatives to popular solutions is good. In particular, the simple approach to circular structures looks good - including the reverse lexical-order deallocation.
GC can be a good thing when you want to quickly sketch a new program, or simply want the language to act as a scripting language and do a task as quickly as possible without too much thinking.
That said, I'd still prefer simple reference counting to a full GC, as a GC is quite memory hungry and collections introduce pauses.. which is not a desirable behavior at all. Even though pauses can be solved, I'm not a fan of the idea of a language managing my memory in unpredictable ways.
The reason I use D is because I can use it just like C/C++ (with modern niceties such as modules, slices, metaprogramming and fast compile times) with my own allocators, completely ignoring the GC.
But whenever I need a quick and dirty "script" (it almost never happens), instead of using bash or python, well, I simply use D with its GC. But again, as I said above, I'd prefer a simple RC instead.. but yeah.. it's no big deal since I prefer to manage memory my way anyways.
A perfect language understands your intent without impacting your workflow with slow compile times due to heavy, restrictive static analysis (a borrow checker, for example).
I recently watched an interesting talk about a new language, and the author described quite well how the best memory management strategy varies by use case:
- web servers benefit from arena allocation
- GUIs benefit from ARC
- short-running programs can benefit a lot from just leaking memory.
Concurrency and multiprocessing terminology is the worst. Trying to learn about it, it seems like every language is doing their own thing with their own set of terms and definitions.
My bad for not making it more clear: it refers to a lightweight process here (e.g. like Erlang), not an OS process. Merely spawning these is basically just an allocation (Inko doesn't use PIDs either), so about as fast as it gets.
It was reference counting in the 80s and 90s. And then the tracing-GC people stepped in: "Reference counting is the worst of both worlds. It's slower than GC because all this memory has to be touched!"
But real life experience continues to show ref counting doing better. And GC users pre-allocating object pools to avoid GC pauses. Proponents are still in denial: "What if a large object is deleted? That will also create a GC-pause-like delay."
And now here we stand. Back to reference counting.
> But real life experience continues to show ref counting doing better.
Hmm... not in my experience at least, for instance Apple's ARC (which should be much better than "dumb" shared_ptr refcounting because the compiler does some static analysis and drops redundant retain/release operations) still can have a shockingly high runtime overhead and requires careful manual tweaking and general handholding to the point where the traditional manual memory management results in simpler code, at least in hot code paths.
Now GCs may well be worse, but performance-wise they really can't be much worse than ARC. ARC may sometimes prevent unpredictable GC spikes, but only by spreading out the same (or worse) cost along the timeline (which is at least something, but many GCs simply haven't been designed for the "prevent spikes" requirement).
There simply is no silver bullet for dynamic memory management, at least if both memory usage and performance matters.
Which real life experience are you referring to? Go has an extremely fast GC that has minimal pauses. Python still doesn't have "proper" multithreading because RC is too slow.
The problem with rc is not memory access but synchronization.
>Python still doesn't have "proper" multithreading because RC is too slow.
RC speed is not even close to the reason Python doesn't have proper multithreading.
(The GIL, and API/ABI guarantees to third party C-based extensions making it difficult to remove it, is more like it. The GIL was added to aid in RC atomicity - but it's not the speed of RC that's the issue).
> Go has an extremely fast GC that has minimal pauses.
Actually, I've heard that Go sacrificed speed big time to minimise its pauses. Garbage collectors generally face a latency/throughput tradeoff, and I believe Go is no exception.
That said, fast GC with reasonable pauses existed long before Go. I personally know of OCaml and its generational, incremental GC. I expect Go built on that knowledge to find its own sweet spot.
Precisely. That’s why I was suspicious of this claim that there is a GC out there that is both "extremely fast" and has "minimal pauses". Something’s got to give.
There are versions of Python without RC: https://www.python.org/download/alternatives/. There is no doubt that Python would be better off without RC; the problem is that Python extensions rely on RC, so CPython (the main Python implementation) can't just make the switch.
Thanks. It seems that the JyNI project is trying to make a bridge between CPython and Jython, so from that I take that it is somehow possible to have extensions which more or less rely on RC (or at least the C API) while you can use a GC (Java's GC in this case) at the same time.
Unfortunately, the real test of "is this language proper Python" is "does it behave exactly like the horrible bytecode interpreter in the PyEval_EvalFrameEx C function".
So it's hard to make alternative implementations that don't die.
We have not gone full circle. We are just expanding into various scenarios where a GC language is preferable to a non-GC language and vice versa. There will never be a perfect compiler, nor do we always want to do everything manually.
Which is why, as a GC algorithm, reference counting is only for toy implementations.
Any reference counting implementation that cares about performance in multicore environments is almost indistinguishable from a tracing GC implementation.
In fact, C++/WinRT uses background threads to diminish the performance impact of calling destructors and cascading deletions.
Ever since Java became immensely popular GC has been just as popular, but that didn't mean reference counting ever went away. They're branching paths and different tools and more people than ever before pick and choose what to use for the job. No circle was ever made.
Right, Java made GC popular. And right now Rust is making reference counting and borrow/ownership popular. This is what I am observing.
That languages get popular is no surprise. But what was a little bit surprising for me was that the tech a language uses spreads and gets popular all over -- outside of that language -- as long as that language stays popular.
Right now I think automatic reference counting is the way to go. In a sense C++ RAII is also ARC, but compiler-level support is better.
Borrower-checking is a significant development over traditional manual memory management, rendering various categories of error impossible. Before borrower-checking, we could really only do this by using GC.
RC is GC and whether a cycle detection algorithm uses tracing depends upon the algorithm itself. Unlike traditional tracing, a cycle detection algorithm can leverage knowledge of the reference counter which opens new possibilities. In fact, cycle detection doesn't necessarily need to be run at all - only in the ambiguous cases where RC alone isn't enough. When a code path does necessitate cycle detection, the algorithm could run immediately and, depending upon the nature of the cycle/how localized it is, the runtime could cache the knowledge for next time. Even where it must always be run, the pauses are smaller and deterministic which means it avoids the long/unpredictable/sporadic pause times tracing collectors are infamous for.
"Garbage collecting", so, you mean, you're leaving garbage around then? Why?
With RC you're acting on the deallocation as soon as you are allowed to. Sure, you don't need to actually deallocate at that time, but you're acting upon it.
"Oh but what about loops" well, you can work around those (and they are rare).
My point is not that GCs are not useful. My point is that there is a better way of dealing with allocations if you know your object is not used anymore and not just treating memory as infinite.
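The usual workaround for those loops, sketched with the standard smart pointers: the back edge is weak, so it doesn't contribute to the strong count and the cycle never forms.

    #include <memory>

    struct Child;

    struct Parent {
        std::shared_ptr<Child> child;   // owning edge
    };

    struct Child {
        std::weak_ptr<Parent> parent;   // non-owning back edge: no cycle of strong counts
    };

    int main() {
        auto p = std::make_shared<Parent>();
        auto c = std::make_shared<Child>();
        p->child = c;
        c->parent = p;
    }   // both refcounts reach zero here; with two shared_ptr edges they wouldn't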
And then you get some pretty hilarious pauses and stuttering, because such a naive refcount system sucks on all metrics other than simplicity of code (and even that is arguable).
And despite what some people think, malloc/free are not O(1) free lunch.
You can use arenas for the cases where you really can't do without delayed/batched deallocation. But deterministic deallocation as provided by RC is a very good default for many cases, including where low-latency is a goal.
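For instance, with the standard library's polymorphic allocators (a sketch of the arena idea, not of any particular language's runtime; the function name is made up):

    #include <memory_resource>
    #include <vector>

    // Everything allocated during one request comes out of a single buffer and is
    // released in one shot when the arena goes out of scope.
    void handle_request() {
        std::pmr::monotonic_buffer_resource arena(64 * 1024);  // 64 KiB up-front block

        std::pmr::vector<int> ids(&arena);     // all growth comes from the arena
        for (int i = 0; i < 1000; ++i)
            ids.push_back(i);

        // ... build the response ...
    }   // no per-object frees: the whole arena is released here at once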
RC is not deterministic. That's the whole thing - when you free a refcounted object, you do not deterministically know how long it is going to take, and on systems that try to avoid RC-induced pauses, you don't get to know when the memory is going to be released.
If you want to talk determinism, you need to actually invest into deterministic behaviour - something that either requires dropping dynamic memory allocation altogether, or tends to go for GC with known determinism guarantees (for example, IBM Metronome).
> And despite what some people think, malloc/free are not O(1) free lunch
Of course. Be it through a GC allocator or not. If you're at the point where the allocator is giving you trouble then GC/RC is the least of your problems ;)
> not being able to implement certain patterns in safe code (e.g. linked lists)
This may be correct for doubly linked lists in the classical implementation, but I've never needed one outside of programming exercises. Trees can be safely implemented in at least 2 ways. Rust is limiting, but usually more in terms of approach, not possibilities. As long as the program isn't interacting with the "unsafe" outside environment it's usually no problem avoiding unsafe blocks completely.
Ehh, depends on what you're actually coding: at one job intrusive doubly-linked lists were everywhere and very appropriate, while at my current job I never touch them at all.
Edit: Not that I know much about the Linux kernel, but I do remember it being used as a counter example when someone stated before that nobody uses doubly-linked lists.
Rust developers are exploring new safe ownership patterns (such as 'QCell' and 'GhostCell') that should make it feasible to implement double linked lists without unsafety. Not very simple, though: these solutions do come with an inherent tradeoff wrt. lack of modularity.
I'm amused when I see all the hoops folks jump through to solve 'memory reference bugs'. Garbage collectors 'leak' if you forget to null a reference in scope after you're done with it. Those null pointers can fault if used afterward.
It's as much work to use a garbage collector efficiently as it is to allocate and delete your own memory. And with about the same bugs. It's been problematic from the start.
Not a fan of automatic systems to 'manage' memory. Overhead for automatic systems create lag, latency and bloat.
I'm old-fashioned, and create code that has strict ownership of allocated memory by subsystem. If a module that allocates also frees, leaks become negligible. If references are not shared but accessed formally, nulls become returned errors etc.
Discipline is hard, but leaky buggy garbage-collecting behemoths are hard to live with too.
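That style can be made fairly mechanical. A rough C++ sketch of the "the module that allocates also frees" rule (the names are made up):

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Callers get non-owning raw pointers; every Buffer the subsystem hands out is
    // released either explicitly through the subsystem or when the subsystem dies.
    class BufferSubsystem {
    public:
        struct Buffer { std::vector<unsigned char> bytes; };

        Buffer* allocate(std::size_t size) {
            owned_.push_back(std::make_unique<Buffer>());
            owned_.back()->bytes.resize(size);
            return owned_.back().get();        // callers borrow, they never free
        }

        void release(Buffer* b) {              // explicit release for long-lived subsystems
            for (auto& p : owned_)
                if (p.get() == b) { p.reset(); return; }  // slot left empty; a real
        }                                                 // implementation would reuse it

    private:
        std::vector<std::unique_ptr<Buffer>> owned_;   // everything dies with the subsystem
    };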
You might be interested in Zig[1]. Allocation is manual. Standard library containers and functions require an allocator to be passed to them, so nothing allocates on the heap unless it's explicitly told to. Cleanup can be done at end of scope via defer / errdefer.
Seamless interop with C, even when cross compiling which Zig handles natively: the standard install of Zig comes with a "zig cc" wrapper around clang that allows you to cross compile[2][3] C and zig code for any number of target architectures (it includes C standard library headers and code for them).
Zig is the only language I've come across which functions perfectly as a better C.
Most of the time (in my practice, like 99% of the time) you don't have to care about nulling references. The objects that need collection are created in local scopes of functions, and get detached and ready for GC right when the function returns. When you have many small functions, this happens often, so unused objects don't spend much time being referenced and ineligible for GC.
The problematic part is long-living mutable structures, of which you hopefully have few, and you know well where they are in your code.
Also, as an exercise, I suggest that you try to imagine a Lisp with explicit memory management.
That's all fine for locals. How about members of objects? The object may be long-lived, the buffer it uses may be short-lived, and it can hang around unless you remember to null it.
Remembering to null references is not much easier than remembering to delete things. It's trading one issue for another.
And now there's two ways to remember to manage memory.
I'm laughing, because Rust solves all of this and is a simple and easy language to learn. No GC, ref counts when you want or need them, and manual memory management that is a breeze and proven at compile time. I hate even saying this because I get the feeling it scares away newbies that think these concepts are hard.
I already write web services in Rust. It's no more difficult than Java or Python.
I can't wait for the ecosystem to heat up more, because it'll make it compelling to switch to Rust at work.
Rust is certainly easier to teach and learn than some other languages (think C++) but learning it effectively still requires quite a bit of discipline. It's not really made for the bug prone cowboy-coding style that seems to be the norm in more 'dynamic' languages (Python, JavaScript, Ruby etc.) and to a lesser extent in more mainstream, 'enterprise' friendly languages like Go, Java/Scala/Kotlin, C#/VB.NET, Erlang etc.
Fair point. If you want to hack something together and don't care too much for the details, Rust will slow you down.
But if you care enough to be writing tests or put lots of thought into API and schema, then maybe you're in the disciplined camp that would get a ton of benefit from switching.
If you're using C++, you should switch for any greenfield project that doesn't need to use legacy libraries. (Writing wrappers for C libraries isn't hard at all.)
As someone who has spent most of his career writing high-performance server-based software, I've never really understood why people like GC. Even back in the day when I was writing BASIC programs (a long time ago now) I thought GC was pants, and spent some considerable time trying to avoid it happening.
How to avoid it? Think. Be careful. Write good test suites. Run them a lot. It really isn't so hard.
> Think. Be careful. Write good test suites. Run them a lot. It really isn't so hard.
This mindset lead to countless security problems and other bugs in all kinds of software. Even now we still find exploits in sudo and openSSL. And the problem wasn't developers leaning back and saying "today I'm going to program carelessly and do a bad job because why the hell not".
Do we have good test suites for those programs? If so, could you direct me to the code for them? And of course these are run-once programs (not that that excuses them) - I am talking about 24/7 stuff.
You get the computer to do slightly more work in exchange for significantly easier development, plus you've obliterated entire categories of bugs that can be hard to trace (use-after-free etc).
GC was also originally developed for LISP; I'm not sure it's even conceptually possible to have a manually-managed Lisp, although I think there's a refcount implementation somewhere.
It would be possible to have manually-managed Lisp. I think there is actually nothing in the standard which explicitly asks for a GC being present - just that there is no "free" function in the standard either. The GC of SBCL is to my knowledge written in Lisp itself. Carefully so, that no heap allocations are caused by the code.
This is not accurate. The SBCL runtime and GC are written in C because there are better tools in C for debugging the kinds of errors that one encounters at that level of the system. I can't find the note in the SBCL source repo but if you dig around a bit you will find it.
I’ve been writing high performance C++ for a while now and this is true. We rarely have memory leaks or use-after-free and when we do, they’re pretty mild. It is one of the easier problems to avoid, especially if you enforce standards on developers to avoid the most common mistakes (eg the new keyword should be avoided if possible unless providing an rvalue to initialize something safer). I can imagine new or bad developers without strong guidance, or under extreme pressure to ship, would have a harder time with it though.
Concurrency is a much harder problem and something we deal with more often. I would consider memory management close to not a concern.
> Concurrency is a much harder problem and something we deal with more often. I would consider memory management close to not a concern.
Absolutely. The longest lasting bug I've ever had was in some multi-threaded code in a trading server (and nothing to do with memory - shared resources getting corrupted). When I tracked it down (after about 6 months!) I repeatedly banged my head on my desk, yelling "You idiot, you idiot, you idiot!" - referring to me.
Compared to stuff like this memory leaks are really easy to test for, detect, and fix. Anyone can write a test harness to see if there is a leak.
>Even back in the day when I was writing BASIC programs (a long time ago now) I thought GC was pants, and spent some considerable time trying to avoid it happening.
Could it be that you never shed the bias from your young 80s BASIC experience, despite the world around you changing?
70s actually. And the world hasn't changed - if you want efficient, performant code that manages resources (not just memory) correctly, you don't want GC.
Thanks! That's really interesting. There's a lot of material out there about BASIC for early microcomputers but most of what I've read about minis has been more focused on languages that were considered interesting to "hackers" (i.e. Lisp and C).