This might be a stupid question, but can't we mix the best of both worlds in a programming language? To have GC where it makes sense and doesn't add too much overhead, but allow non-GC objects that require a bit more care, where you know you need to free memory yourself or use some basic smart_ptr-style tracking object?
Just like .net allows you to use "unsafe". It is a clear marker that you are stepping outside of the safe zone and need to be careful, but at least you can.
You can already do that in any GC language (to some extent) by drastically reducing the amount and creation/destruction-frequency of garbage collected objects (for instance by reducing "object granularity" and using object pools).
But this means that the programmer suddenly needs to actively think about memory management (and already in the design/planning phase, because later is too late), while the whole selling point of automatic memory management solutions was that this wouldn't be necessary anymore (which was a deception all along, but that's a different topic).
> You can already do that in any GC language (to some extent) by drastically reducing the amount and creation/destruction-frequency of garbage collected objects (for instance by reducing "object granularity" and using object pools).
But it's still a better trade-off: the programmer doesn't have to think about memory management most of the time, but if they want to, they can optimize hot paths in the code. Optimization (in all programming languages) often involves complicated and non-intuitive code backed by careful measurement.
Yes, this approach works best with the "premature optimization is the root of all evil" axiom. When I worked on high-performance servers in C#, we would occasionally find ways to 1) reduce allocations 2) make allocations either very short-lived so they stay in the short-term GC timeframe or very long-lived (object pooling/reuse or creating object-aggregates for pseudo-slab allocation).
> To have GC where it makes sense and doesn't add too much overhead but allowing non-GC objects that require a bit more care and where you know you need to free memory yourself or using some basic smart_ptr-style tracking object?
D is designed exactly like that. In fact, using value types and smart pointers (via `std.typecons`) is kind of required if you want to keep GC pauses tolerable. On the flip side, a simple GC means no additional overheads like write barriers or remembered sets. A borrow checker for D has also been in development (I have not personally checked it out, so I can't comment on it).
Some languages have separate “value types” and “reference types” (https://en.wikipedia.org/wiki/Value_type_and_reference_type) exactly for that reason, with value types being copied when passed into or returned from a function, so they never get shared. That makes it easy to determine whether they can be stack allocated.
The intended use is value types for small, immutable values (small because copying large objects is more costly, and immutable so that the copying doesn't make a difference semantically).
That can significantly decrease the number of garbage-collected objects and with it memory usage (the smaller the object, the larger the relative overhead of the few bytes of data reserved for use by the garbage collector).
Have a look at D, Active Oberon, Modula-3, C#, F#, VB (6 and .NET), many BASIC dialects, Eiffel, Haskell (now with experimental lifetimes), Swift (RC is a GC algorithm), C++/CLI, C++/CX, Unreal C++, Common Lisp, among many others.
The big problem is a lack of quality in how CS subjects are taught, so many people learn about GC and think all GC languages are like JavaScript.
I think what you essentially want is a language with a powerful effect system and substructural typing, along with regions for memory management. This is like what Rust does with ownership semantics and the borrow checker, where normal types are linear/affine, that is, they can only be used (exactly, at most) once. What would be more powerful is having some effect for GC'd functions and some effect for non-GC'd functions, where you can only pass the barrier from GC to non-GC if it's a linear/affine type. But when you're in a GC function you can use whatever types you want. So instead of being stuck with the borrow checker, you can decide at the drop of a hat to enter GC mode to write your linked list.
A strategy that works for some GC'd languages is to force a minor collection at some safe point and then run your "non-GC'd" code taking care to limit allocations to what fits in the minor heap. I wrote a small game using this strategy where the safe point happened before waiting for a vertical sync to refresh the screen (where otherwise we'd have been sitting around doing nothing).
To some degree you could use a GC-only language to create a large pool (e.g. a numpy array) and "allocate" items in there with indices as pointers. That would cover some of the use cases for manual memory management.
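A minimal C++ sketch of that idea (all names here are mine, purely for illustration): one big up-front allocation of slots, with plain integer indices standing in for pointers and a free list for reuse.

    #include <cassert>
    #include <cstdint>
    #include <vector>

    struct Particle { float x, y, vx, vy; };

    // One big allocation up front; "pointers" are indices into it.
    class ParticlePool {
    public:
        explicit ParticlePool(std::uint32_t capacity) : slots_(capacity) {
            for (std::uint32_t i = capacity; i > 0; --i)
                free_.push_back(i - 1);            // every slot starts out free
        }

        std::uint32_t acquire() {                  // "allocate": pop an index
            assert(!free_.empty());
            std::uint32_t idx = free_.back();
            free_.pop_back();
            return idx;
        }

        void release(std::uint32_t idx) {          // "free": push the index back
            free_.push_back(idx);
        }

        Particle& operator[](std::uint32_t idx) { return slots_[idx]; }

    private:
        std::vector<Particle> slots_;              // the whole pool, allocated once
        std::vector<std::uint32_t> free_;          // indices of unused slots
    };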
Is that truly a good idea in the case of the JVM? Of course there will be problems where this is the correct approach, but in certain cases simply letting the very advanced GC do its job is actually faster.
It’s not about speed (throughput) but about latency. If the GC takes (say) several minutes during a period of downtime so that it doesn’t have to run at all during uptime, then it’s worth it.
Java suffers from a Python 2-like syndrome with Java 8.
Those shops aren't going to use ZGC.
Nor will they profit from the several performance improvements, including the free JIT cache across runs with PGO data on HotSpot and J9, which used to be a commercial feature on third-party JDKs.
I've been impressed by nim's GC. My understanding is it's not super complicated. Not having a big header on each object and having each thread have its own GC goes a long way.
It depends on the GC you're using with Nim. Guessing the OP means the newer ARC GC, then you can use `move` semantics to move some data to another thread, or you can do a deepCopy. You can do shared data structures but that requires a bit more work.
It becomes more difficult. There are a few supported memory managers by Nim (including Go's GC!), but with ARC you explicitly move or deep copy the structure.
Obviously this is a bit more difficult than, say, Java, but then you never have to stop the world.
Similar to ObjC reference counting? (Not sure, I don't use ObjC and am not an expert in it, it just reminds me of what I've heard of it, so honest question).
Objective-C (and I guess Swift too?) has ARC (see link in grandparent), which attempts to statically track the lifetime of objects at compile time. This essentially results in a "move operation" if the compiler detects that the original reference isn't used anymore after the assignment. It's basically halfway between C++'s shared_ptr-vs-unique_ptr approach and Rust's lifetime and ownership tracking.
ARC was a marketing gimmick, after the failure of getting a conservative GC to properly work with the underlying C semantics alongside the mix of frameworks compiled in different modes.
The only thing it does is to automate the retain/release calls of Cocoa classes, or classes that offer similar API.
It doesn't apply to any other C-like types in Objective-C, and the generated machine code is basically hardly different from when devs keep doing those calls by hand.
Yes, there is some potential for the compiler to remove needless retain/release pairs for short-lived objects, but it doesn't happen all the time.
Swift too, despite the ARC marketing, made the sensible choice to go with ARC, because one of the key design goals was interoperability with the Objective-C runtime.
As shown by the RCW/CCW runtime used by .NET for interoperability with COM's AddRef/Release, there is a lot more machinery required to have a tracing GC cooperate with an RC scheme, so it is understandable this route wasn't chosen for Swift.
As some papers show, current performance isn't that great, which is also why Swift 5.5 is bringing a more aggressive optimization, which is turned off by default.
Check WWDC 2021, "ARC in Swift: Basics and beyond".
> Of course this approach has its own downside: a program may panic at when dropping an owned value, if it still has one or more references pointing to it.
I wonder if this could also be handled by having a custom error handler available on each created reference, to decide what to do with the "hanging" value: reassign ownership, invalidate the reference, panic, or do whatever else.
Your solution to the supposed overhead of garbage collection is to make every single object fat and have an extra test every time a pointer to an object is dropped?
How do you know there are still references without counting them somewhere or inspecting all other objects? And what would the handler really do? In what common code are there other owners willing to take over ownership (and you'd have to handle this problem recursively, too)?
In the article, they already assume reference counting of references, see the previous paragraph:
"When the reference goes out of scope, the count is reduced. When an owned value goes out of scope and its reference count is not zero, the program terminates with an error (which I'll refer to as a "panic")."
Single ownership (std::unique_ptr) works pretty well in C++ a lot of the time, but not always. Reference counting, including unique_ptr, also has to be locked for updates, slowing down multi-threaded programs, as exemplified by the notorious CPython GIL (global interpreter lock). And the idea of transferring ownership between lightweight processes leads to memory fragmentation within the processes. Erlang deep-copies message contents except for large binaries, so each lightweight process has its own semispace GC and maintains good cache locality. There are also styles of programming (such as using RB trees as persistent data structures, with multiple versions sharing parts of the structure) that are not conducive to single ownership.
At the end of the day, in a general purpose language, it's great to have machinery to avoid needing GC when possible, but you really do want to have GC to fall back on.
> Reference counting including unique_ptr also have to be locked for updates
This is not true. The count itself can be updated atomically with a compare & exchange. I think this is what actually happens in most C++ std libs, but I haven't checked in a while. (This doesn't mean the object they point to is thread-safe, though!)
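Concretely, the shape of a shared_ptr-style control block is roughly this (a sketch, not the actual standard library code; the memory orders shown are the commonly cited ones):

    #include <atomic>

    // The count is a single atomic integer, bumped and dropped with lock-free
    // read-modify-write operations; no lock is involved.
    struct ControlBlock {
        std::atomic<long> count{1};

        void add_ref() {
            count.fetch_add(1, std::memory_order_relaxed);
        }

        // Returns true when this was the last reference and the object can be freed.
        bool release() {
            return count.fetch_sub(1, std::memory_order_acq_rel) == 1;
        }
    };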
But here's my question: why do C++ programmers always refer to the overhead of "reference counting", when the overhead is not in the counting, but in the memory management: the use of Doug Lea's malloc or its derivatives such as ptmalloc3? This is what "new" wraps, and the STL containers all use it to resize themselves too. These routines are written by geniuses, but if you actually read up on how they work, they are doing a lot of (completely necessary) stuff, and they're among the longest-running std lib calls you can make. (Way longer than an exp(), for example.)
>Why do C++ programmers always refer to the overhead of "reference counting", when the overhead is not in the counting, but in the memory management
Presumably they're looking at reference counting as an alternative to unique_ptr, and in that context the memory management overhead is already being paid.
Unique_ptr is basically a pointer to a refcounted object whose refcount is always 0 or 1, and therefore doesn't have to be stored in the object itself. If the unique_ptr is a non-null address then the object is alive (refcount = 1); if all the unique_ptrs are null then the object is dead (so it gets freed), and C++'s fancy scope rules and move semantics statically ensure that there is at most one non-null unique_ptr to the object at any time.
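To illustrate the analogy (just a sketch):

    #include <cassert>
    #include <memory>

    int main() {
        auto a = std::make_unique<int>(42);     // "refcount == 1": a owns the object
        std::unique_ptr<int> b = std::move(a);  // ownership moves; a becomes null
        assert(!a && *b == 42);                 // at most one non-null owner at a time

        // No count is stored anywhere; with the default deleter a unique_ptr is
        // typically the size of a raw pointer on mainstream implementations.
        static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*), "");
    }                                           // b dies here: the int is freed exactly once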
I have been under the impression that glibc implements both unique_ptr and shared_ptr (that's the one with arbitrary refcounts) using locks, but ok, if it's done with something like CMPXCHG that's still a heck of a lot slower than ordinary copying.
As for overhead, well, if an object lives for a long time and lots of references are updated or moved around, then that itself can be expensive, besides the malloc/free costs. Copying garbage collectors by comparison are simple and fast, at the expense of making your program gobble a lot more memory, limiting the language so that the gc can know where all the pointers are, and having to stop the program during gc.
> I have been under the impression that glibc implements both unique_ptr and shared_ptr (that's the one with arbitrary refcounts) using locks
Why would unique_ptr ever need a lock?
> As for overhead, well, if an object lives for a long time and lots of references are updated or moved around
If performance is important you only pass a shared_ptr if the callee needs to keep it around, otherwise just pass a plain (const) reference. You can also pass a (const) reference to the shared_ptr and then the callee can decide if they actually need to keep it. Moving shared_ptr is free, because the refcount stays the same.
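A sketch of those calling conventions (the Widget/Registry names are made up for illustration):

    #include <iostream>
    #include <memory>
    #include <string>
    #include <utility>

    struct Widget { std::string name; };

    // Callee only reads the object: take a plain const reference, no refcount traffic.
    void print(const Widget& w) { std::cout << w.name << '\n'; }

    // Callee keeps the object: take the shared_ptr by value and move it into place;
    // the move itself never touches the count.
    class Registry {
    public:
        void add(std::shared_ptr<Widget> w) { stored_ = std::move(w); }
    private:
        std::shared_ptr<Widget> stored_;
    };

    // Callee decides at runtime: pass a const reference to the shared_ptr itself and
    // only copy (i.e. pay one atomic increment) when it actually needs to keep it.
    void maybe_keep(Registry& r, const std::shared_ptr<Widget>& w, bool keep) {
        if (keep) r.add(w);     // copy -> increment
        else      print(*w);    // no refcount change at all
    }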
With std::shared_ptr each copy has a pointer to the ref count & there is only one ref count, atomically updated with a cmpxchg instruction. Not sure what you mean by "references are updated or moved around"?
But I agree that GC has higher throughput than malloc/free, as most GC allocations are a single op, at the cost of higher initial memory usage and worst-case latency. That's why "scripting" languages like Python / Perl use refcounting, as they need to start quickly and might not run for very long, whereas Java / C# programs tend to be longer running.
I'd be very surprised if GCC actually used locks for unique_ptr. A unique_ptr owns the object it points to, so we know for a fact that it doesn't need to check anything before deleting it.
You might be right about that, i.e. I probably derped up. It's been a long while since I looked at it, and the issue might only be with shared_ptr. I'm starting to remember finding out that shared_ptr was surprisingly expensive even in single threaded programs. It looks like some finer grained versions of shared_ptr (like atomic_shared_ptr) arrived recently. I haven't checked into them yet.
Typically unique_ptr is represented as a pointer (check sizeof). For shared_ptr, there’s an atomic refcount but no locks. Operations on the shared_ptr object itself aren’t threadsafe (you can’t have the same shared_ptr assigned by two threads at the same time without locking), just like int, but the reference count is atomic, so copies of shared_ptr can safely be shared among threads. Atomic_shared_ptr adds atomicity to the pointer itself. I believe that usually uses locks.
Not sure what you mean here: std::unique_ptr is for when you want a pointer auto-destructed when it goes out of scope (ref counting limited to one), and std::shared_ptr is normal reference counting. They aren't alternatives to each other but specific tools for specific situations?
    #include <iostream>
    #include <chrono>
    void run();
    int main()
    {
        auto t0 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < 10000000; i++) {
            run();
        }
        auto t1 = std::chrono::high_resolution_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << std::endl;
    }
alloc.cpp:
    #include <cstdlib>
    int rand_int();
    void use(void*);
    void run()
    {
        auto res = malloc(rand_int());
        use(res);
        free(res);
    }
exp.cpp:
    #include <cmath>
    double rand_double();
    void use(void*);
    void run()
    {
        auto res = exp(rand_double());
        use(&res);
    }
I get ~100ms for the exp, ~130ms for the malloc/free at -O3 / -Ofast with both clang and gcc (and checking with perf ensures that the time is indeed spent in these functions). So it's a bit faster, but not exactly fast (or rather, the math functions are really fucking slow)
Excuse me if I have misread your sample program, but it looks like you are allocating 16 bytes, then immediately freeing them, again and again in a loop. Since you're not allocating anything else, the allocate and free will happen almost immediately. Malloc() has a run time that can vary a lot depending on what has already been allocated & freed in the past, so this test is not meaningful.
Not only that, but Doug Lea's original malloc has a bin specifically for 16-byte allocations: http://gee.cs.oswego.edu/dl/html/malloc.html (the smallest size), so you've specifically chosen a very fast example.
In order to properly test malloc() you have to pre-generate some pseudo-random numbers for your allocation sizes, and allocate and free unpredictably. But it's more complicated than that, as in the very occasional case when malloc() has to ask the OS for more heap, that system call may not "finish" until your program first writes to that region of memory... (i.e. malloc() returns but hasn't finished!)
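Something along these lines, for instance (just a sketch of the kind of test being described, not a rigorous benchmark):

    #include <cstdlib>
    #include <random>
    #include <vector>

    // Pre-generate random sizes, keep a window of live allocations, and free them
    // in a different order than they were allocated, so the allocator actually has
    // to search and coalesce instead of recycling one hot 16-byte bin.
    int main() {
        std::mt19937 rng(12345);
        std::uniform_int_distribution<std::size_t> size_dist(16, 4096);
        std::uniform_int_distribution<std::size_t> slot_dist(0, 1023);

        std::vector<void*> live(1024, nullptr);
        for (int i = 0; i < 10000000; i++) {
            std::size_t slot = slot_dist(rng);
            std::free(live[slot]);                     // free whatever lived there before
            live[slot] = std::malloc(size_dist(rng));  // replace it with a new random-size block
            if (live[slot])
                static_cast<char*>(live[slot])[0] = 1; // touch it so the OS really maps the page
        }
        for (void* p : live) std::free(p);
    }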
I actually tried with up to 256 bytes and got the exact same timings. But sure, that's the absolute possible best case considering glibc thread pool, but it's (allocating and freeing in a loop) still a pretty common case
But when you enter the loop you have no other "objects" in your program allocated at all, so it doesn't have to do a search for a suitably sized block. Near-identical timings for changing the size to 256 is what I would expect.
That's definitely very interesting. I do wonder if malloc and free would take longer if there were multiple threads in the program, which is more relevant to this discussion; surely exp would not, assuming sufficient free cores. Even more interesting would be if you malloced in one thread and freed in another, given that allocators usually make use of thread-local pools.
> Why do C++ programmers always refer to the overhead of "reference counting", when the overhead is not in the counting, but in the memory management:
You’re mistaken.
In a multi core or multi processor system, atomic operations necessary for shared RC are relatively expensive.
Heap allocation cost is paid for any dynamic object no matter the type of GC/RC used, so that is not what is being compared.
The traditional alternative is what I might call "static reference counting", where the programmer knows by inspection/convention (C) or through the type system (Rust) how many references exist at any point statically, so they can skip the dynamic reference count entirely.
I agree that malloc/new is comparatively expensive, but that ends up as a cost borne by all heap-allocated variables in every language. Unless people use special techniques like pool or slab allocators.
There's a huge difference between C/C++'s malloc() and a precise garbage collector: with the former all objects are fixed. With the latter you can move them around. This means malloc() is more constrained than a GC, and that makes it slower.
The real gains of managing memory manually is not in the speed of malloc(), but in the fact that in practice you can perform far fewer heap allocations.
Rust also allows for non-thread safe reference-counted objects (Rc<>) that don't have the overhead of atomic instructions for updating the reference count. And it's getting support for custom local allocators, addressing the overhead of malloc/free. Along with a bunch of further advantages compared to C++ shared_ptr (such as being able to dispense with a pointer indirection in some cases, and allowing for sub-borrows that don't even need a refcount update because they can't possibly affect the 'ownership' of the object).
In Inko, the reference counts are regular integers, not atomics. They don't need to be atomic because the same object can't be used by different processes.
Moving objects between processes itself doesn't lead to fragmentation, as processes don't have their own heap in the data structure sense. That is, the structures representing a process don't have some sort of "heap" field. Instead, the OS threads running processes maintain the heaps (each thread has its own heap). This brings several benefits:
1. Since the heap is physically detached from processes, we can move without copying
2. Since the heap is still thread-local, allocations don't need synchronisation
3. Since the heap is detached from the processes, each process is smaller, allowing for more processes to run concurrently
4. Since objects (and their children) can only be owned by one process, we also don't have to worry about multiple processes trying to write into the same object
This is only possible because we statically guarantee that a value `T` can't be _shared_ between processes, and because we'll disallow sending references between processes.
I don't understand what you're getting at, but you can implement CoW at the page level with MMU hardware. That's how Redis dumping works, for example (the process forks into two, with one dumping to disk and the other continuing to take updates, CoW'ing pages that get modified and sharing the rest). What would CoW even mean inside a single address space? The idea of persistent data structures (as used in functional programming and other places) is that you never visibly modify anything, but instead make new objects that share structure with the old objects, relying on the GC to clean up data that stops being reachable.
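For example, a persistent singly linked list where shared_ptr reference counts stand in for the GC (a sketch, not how any particular functional runtime actually does it):

    #include <memory>
    #include <utility>

    // "Modifying" the list means building a new head that shares the old tail.
    template <typename T>
    struct Node {
        T value;
        std::shared_ptr<Node> next;
    };

    template <typename T>
    std::shared_ptr<Node<T>> cons(T value, std::shared_ptr<Node<T>> tail) {
        auto n = std::make_shared<Node<T>>();
        n->value = std::move(value);
        n->next = std::move(tail);
        return n;
    }

    int main() {
        auto one = cons(1, std::shared_ptr<Node<int>>{});
        auto a   = cons(3, cons(2, one));   // a = [3, 2, 1]
        auto b   = cons(4, a);              // b = [4, 3, 2, 1], sharing every node of a
    }   // nodes are freed only once the last list referring to them is gone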
You can implement a linked list in safe Rust (even a doubly linked list if you prefer); there are a number of examples available if you google for it - for instance:
I like this approach, and exploration of alternatives to popular solutions is good. In particular, the simple approach to circular structures looks good - including the reverse lexical-order deallocation.
GC can be a good thing when you want to quickly sketch a new program, or simply want the language to act as a scripting language and do a task as quickly as possible without too much thinking.
That said, I'd still prefer simple reference counting to a full GC, as a GC is quite memory hungry and collections introduce pauses.. which is not a desirable behavior at all. Even though pauses can be solved, I'm not a fan of the idea of a language managing my memory in unpredictable ways.
The reason I use D is because I can use it just like C/C++ (with modern niceties such as modules, slices, metaprogramming and fast compile times) with my own allocators, completely ignoring the GC.
But whenever I need a quick and dirty "script" (it almost never happens), instead of using bash or python, well, I simply use D with its GC. But again, as I said above, I'd prefer a simple RC instead.. but yeah.. it's no big deal since I prefer to manage memory my way anyways.
A perfect language understands your intent without impacting your workflow with slow compile times due to heavy, restrictive static analysis (a borrow checker, for example).
I recently watched an interesting talk about a new language, and the author described quite well how the best memory management strategy varies by use case:
- web servers benefit from arena allocation
- GUIs benefit from ARC
- short-running programs can benefit a lot from just leaking memory.
Concurrency and multiprocessing terminology is the worst. Trying to learn about it, it seems like every language is doing their own thing with their own set of terms and definitions.
My bad for not making it more clear: it refers to a lightweight process here (e.g. like Erlang), not an OS process. Merely spawning these is basically just an allocation (Inko doesn't use PIDs either), so about as fast as it gets.
It was reference counting in the 80s and 90s. And then the tracing-GC people stepped in: "Reference counting is the worst of both worlds. It's slower than GC because all this memory has to be touched!"
But real life experience continues to show ref counting doing better. And GC users pre-allocating object pools to avoid GC pauses. Proponents are still in denial: "What if a large object is deleted? That will also create a GC-pause-like delay."
And now here we stand. Back to reference counting.
> But real life experience continues to show ref counting doing better.
Hmm... not in my experience at least, for instance Apple's ARC (which should be much better than "dumb" shared_ptr refcounting because the compiler does some static analysis and drops redundant retain/release operations) still can have a shockingly high runtime overhead and requires careful manual tweaking and general handholding to the point where the traditional manual memory management results in simpler code, at least in hot code paths.
Now GCs may well be worse, but performance-wise they really can't be much worse than ARC. ARC may sometimes prevent unpredictable GC spikes, but only by spreading out the same (or worse) cost along the timeline (which is at least something, but many GCs simply haven't been designed for the "prevent spikes" requirement).
There simply is no silver bullet for dynamic memory management, at least if both memory usage and performance matters.
Which real life experience are you referring to? Go has an extremely fast GC that has minimal pauses. Python still doesn't have "proper" multithreading because RC is too slow.
The problem with rc is not memory access but synchronization.
>Python still doesn't have "proper" multithreading because RC is too slow.
RC speed is not even close to the reason Python doesn't have proper multithreading.
(The GIL, and API/ABI guarantees to third party C-based extensions making it difficult to remove it, is more like it. The GIL was added to aid in RC atomicity - but it's not the speed of RC that's the issue).
> Go has an extremely fast GC that has minimal pauses.
Actually, I've heard that Go sacrificed speed big time to minimise its pauses. Garbage collectors generally face a latency/throughput tradeoff, and I believe Go is no exception.
That said, fast GC with reasonable pauses existed long before Go. I personally know of OCaml and its generational, incremental GC. I expect Go built on that knowledge to find its own sweet spot.
Precisely. That’s why I was suspicious of this claim that there is a GC out there that is both "extremely fast" and has "minimal pauses". Something’s got to give.
There are versions of Python without RC: https://www.python.org/download/alternatives/. There is no doubt that Python would be better off without RC; the problem is that Python extensions rely on RC, so CPython (the main Python implementation) can't just make the switch.
Thanks. It seems that the JyNI project is trying to make a bridge between CPython and Jython, so from that I take that it is somehow possible to have extensions which more or less rely on RC (or at least the C API) while you can use a GC (Java's GC in this case) at the same time.
Unfortunately, the real test of "is this language proper Python" is "does it behave exactly like the horrible bytecode interpreter in the PyEval_EvalFrameEx C function".
So it's hard to make alternative implementations that don't die.
We have not gone full circle. We are just expanding into various scenarios where a GC language is preferable to a non-GC language and vice versa. There will never be a perfect compiler, nor do we always want to do everything manually.
Which is why, as a GC algorithm, reference counting is only for toy implementations.
Any reference counting implementation that cares about performance in multicore environments is almost indistinguishable from a tracing GC implementation.
In fact, C++/WinRT uses background threads to diminish the performance impact of calling destructors and cascading deletions.
Ever since Java became immensely popular GC has been just as popular, but that didn't mean reference counting ever went away. They're branching paths and different tools and more people than ever before pick and choose what to use for the job. No circle was ever made.
Right, Java made GC popular. And right now Rust is making reference counting and borrow/ownership popular. This is what I am observing.
That languages get popular is no surprise. But what was a little bit surprising for me was that the tech a language uses spreads and gets popular all over -- outside of that language -- as long as that language stays popular.
Right now I think automatic reference counting is the way to go. In a sense C++ RAII is also ARC, but compiler-level support is better.
Borrower-checking is a significant development over traditional manual memory management, rendering various categories of error impossible. Before borrower-checking, we could really only do this by using GC.
RC is GC and whether a cycle detection algorithm uses tracing depends upon the algorithm itself. Unlike traditional tracing, a cycle detection algorithm can leverage knowledge of the reference counter which opens new possibilities. In fact, cycle detection doesn't necessarily need to be run at all - only in the ambiguous cases where RC alone isn't enough. When a code path does necessitate cycle detection, the algorithm could run immediately and, depending upon the nature of the cycle/how localized it is, the runtime could cache the knowledge for next time. Even where it must always be run, the pauses are smaller and deterministic which means it avoids the long/unpredictable/sporadic pause times tracing collectors are infamous for.
"Garbage collecting", so, you mean, you're leaving garbage around then? Why?
With RC you're acting on the deallocation as soon as you are allowed to. Sure, you don't need to actually deallocate at that time, but you're acting upon it.
"Oh but what about loops" well, you can work around those (and they are rare).
My point is not that GCs are not useful. My point is that there is a better way of dealing with allocations if you know your object is not used anymore and not just treating memory as infinite.
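The usual workaround for those loops, sketched with the standard smart pointers: the back edge is weak, so it doesn't contribute to the strong count and the cycle never forms.

    #include <memory>

    struct Child;

    struct Parent {
        std::shared_ptr<Child> child;   // owning edge
    };

    struct Child {
        std::weak_ptr<Parent> parent;   // non-owning back edge: no cycle of strong counts
    };

    int main() {
        auto p = std::make_shared<Parent>();
        auto c = std::make_shared<Child>();
        p->child = c;
        c->parent = p;
    }   // both refcounts reach zero here; with two shared_ptr edges they wouldn't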
And then you get some pretty hilarious pauses and stuttering, because such a naive refcount system sucks on all metrics other than simplicity of code (and even that is arguable).
And despite what some people think, malloc/free are not O(1) free lunch.
You can use arenas for the cases where you really can't do without delayed/batched deallocation. But deterministic deallocation as provided by RC is a very good default for many cases, including where low-latency is a goal.
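For instance, with the standard library's polymorphic allocators (a sketch of the arena idea, not of any particular language's runtime; the function name is made up):

    #include <memory_resource>
    #include <vector>

    // Everything allocated during one request comes out of a single buffer and is
    // released in one shot when the arena goes out of scope.
    void handle_request() {
        std::pmr::monotonic_buffer_resource arena(64 * 1024);  // 64 KiB up-front block

        std::pmr::vector<int> ids(&arena);     // all growth comes from the arena
        for (int i = 0; i < 1000; ++i)
            ids.push_back(i);

        // ... build the response ...
    }   // no per-object frees: the whole arena is released here at once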
RC is not deterministic. That's the whole thing - when you free a refcounted object, you do not deterministically know how long it is going to take, and on systems that try to avoid RC-induced pauses, you don't get to know when the memory is going to be released.
If you want to talk determinism, you need to actually invest into deterministic behaviour - something that either requires dropping dynamic memory allocation altogether, or tends to go for GC with known determinism guarantees (for example, IBM Metronome).
> And despite what some people think, malloc/free are not O(1) free lunch
Of course. Be it through a GC allocator or not. If you're at the point where the allocator is giving you trouble then GC/RC is the least of your problems ;)
> not being able to implement certain patterns in safe code (e.g. linked lists)
This may be correct for doubly linked lists in the classical implementation, but I've never needed one outside of programming exercises. Trees can be safely implemented in at least 2 ways. Rust is limiting, but usually more in terms of approach, not possibilities. As long as the program isn't interacting with the "unsafe" outside environment it's usually no problem avoiding unsafe blocks completely.
Ehh, depends on what you're actually coding: at one job intrusive doubly-linked lists were everywhere and very appropriate, while at my current job I never touch them at all.
Edit: Not that I know much about the Linux kernel, but I do remember it being used as a counter example when someone stated before that nobody uses doubly-linked lists.
Rust developers are exploring new safe ownership patterns (such as 'QCell' and 'GhostCell') that should make it feasible to implement double linked lists without unsafety. Not very simple, though: these solutions do come with an inherent tradeoff wrt. lack of modularity.
I'm amused when I see all the hoops folks jump through to solve 'memory reference bugs'. Garbage collectors 'leak' if you forget to null a reference in scope after you're done with it. Those null pointers can fault if used afterward.
It's as much work to use a garbage collector efficiently as it is to allocate and delete your own memory. And with about the same bugs. It's been problematic from the start.
Not a fan of automatic systems to 'manage' memory. Overhead for automatic systems create lag, latency and bloat.
I'm old-fashioned, and create code that has strict ownership of allocated memory by subsystem. If a module that allocates also frees, leaks become negligible. If references are not shared but accessed formally, nulls become returned errors etc.
Discipline is hard, but leaky buggy garbage-collecting behemoths are hard to live with too.
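That style can be made fairly mechanical. A rough C++ sketch of the "the module that allocates also frees" rule (the names are made up):

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Callers get non-owning raw pointers; every Buffer the subsystem hands out is
    // released either explicitly through the subsystem or when the subsystem dies.
    class BufferSubsystem {
    public:
        struct Buffer { std::vector<unsigned char> bytes; };

        Buffer* allocate(std::size_t size) {
            owned_.push_back(std::make_unique<Buffer>());
            owned_.back()->bytes.resize(size);
            return owned_.back().get();        // callers borrow, they never free
        }

        void release(Buffer* b) {              // explicit release for long-lived subsystems
            for (auto& p : owned_)
                if (p.get() == b) { p.reset(); return; }  // slot left empty; a real
        }                                                 // implementation would reuse it

    private:
        std::vector<std::unique_ptr<Buffer>> owned_;   // everything dies with the subsystem
    };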
You might be interested in Zig[1]. Allocation is manual. Standard library containers and functions require an allocator to be passed to them, so nothing allocates on the heap unless it's explicitly told to. Cleanup can be done at end of scope via defer / errdefer.
Seamless interop with C, even when cross compiling which Zig handles natively: the standard install of Zig comes with a "zig cc" wrapper around clang that allows you to cross compile[2][3] C and zig code for any number of target architectures (it includes C standard library headers and code for them).
Zig is the only language I've come across which functions perfectly as a better C.
Most of the time (in my practice, like 99% of the time) you don't have to care about nulling references. The objects that need collection are created in local scopes of functions, and get detached and ready for GC right when the function returns. When you have many small functions, this happens often, so unused objects don't spend much time being referenced and ineligible for GC.
The problematic part is long-living mutable structures, of which you hopefully have few, and you know well where they are in your code.
Also, as an exercise, I suggest that you try to imagine a Lisp with explicit memory management.
That's all fine for locals. How about members of objects? The object may be long-lived, the buffer it uses may be short-lived, and it can hang around unless you remember to null it.
Remembering to null references is not much easier than remembering to delete things. It's trading one issue for another.
And now there's two ways to remember to manage memory.
I'm laughing, because Rust solves all of this and is a simple and easy language to learn. No GC, ref counts when you want or need them, and manual memory management that is a breeze and proven at compile time. I hate even saying this because I get the feeling it scares away newbies that think these concepts are hard.
I already write web services in Rust. It's no more difficult than Java or Python.
I can't wait for the ecosystem to heat up more, because it'll make it compelling to switch to Rust at work.
Rust is certainly easier to teach and learn than some other languages (think C++) but learning it effectively still requires quite a bit of discipline. It's not really made for the bug prone cowboy-coding style that seems to be the norm in more 'dynamic' languages (Python, JavaScript, Ruby etc.) and to a lesser extent in more mainstream, 'enterprise' friendly languages like Go, Java/Scala/Kotlin, C#/VB.NET, Erlang etc.
Fair point. If you want to hack something together and don't care too much for the details, Rust will slow you down.
But if you care enough to be writing tests or put lots of thought into API and schema, then maybe you're in the disciplined camp that would get a ton of benefit from switching.
If you're using C++, you should switch for any greenfield project that doesn't need to use legacy libraries. (Writing wrappers for C libraries isn't hard at all.)
As someone who has spent most of his career writing high-performance server-based software, I've never really understood why people like GC. Even back in the day when I was writing BASIC programs (a long time ago now) I thought GC was pants, and spent some considerable time trying to avoid it happening.
How to avoid it? Think. Be careful. Write good test suites. Run them a lot. It really isn't so hard.
> Think. Be careful. Write good test suites. Run them a lot. It really isn't so hard.
This mindset lead to countless security problems and other bugs in all kinds of software. Even now we still find exploits in sudo and openSSL. And the problem wasn't developers leaning back and saying "today I'm going to program carelessly and do a bad job because why the hell not".
Do we have good test suites for those programs? If so, could you direct me to the code for them? And of course these are run-once programs (not that that excuses them) - I am talking about 24/7 stuff.
You get the computer to do slightly more work in exchange for significantly easier development, plus you've obliterated entire categories of bugs that can be hard to trace (use-after-free etc).
GC was also originally developed for LISP; I'm not sure it's even conceptually possible to have a manually-managed Lisp, although I think there's a refcount implementation somewhere.
It would be possible to have manually-managed Lisp. I think there is actually nothing in the standard which explicitly asks for a GC being present - just that there is no "free" function in the standard either. The GC of SBCL is to my knowledge written in Lisp itself. Carefully so, that no heap allocations are caused by the code.
This is not accurate. The SBCL runtime and GC are written in C because there are better tools in C for debugging the kinds of errors that one encounters at that level of the system. I can't find the note in the SBCL source repo but if you dig around a bit you will find it.
I’ve been writing high performance C++ for a while now and this is true. We rarely have memory leaks or use-after-free and when we do, they’re pretty mild. It is one of the easier problems to avoid, especially if you enforce standards on developers to avoid the most common mistakes (eg the new keyword should be avoided if possible unless providing an rvalue to initialize something safer). I can imagine new or bad developers without strong guidance, or under extreme pressure to ship, would have a harder time with it though.
Concurrency is a much harder problem and something we deal with more often. I would consider memory management close to not a concern.
> Concurrency is a much harder problem and something we deal with more often. I would consider memory management close to not a concern.
Absolutely. The longest lasting bug I've ever had was in some multi-threaded code in a trading server (and nothing to do with memory - shared resources getting corrupted). When I tracked it down (after about 6 months!) I repeatedly banged my head on my desk, yelling "You idiot, you idiot, you idiot!" - referring to me.
Compared to stuff like this memory leaks are really easy to test for, detect, and fix. Anyone can write a test harness to see if there is a leak.
>Even back in the day when I was writing BASIC programs (a long time ago now) I thought GC was pants, and spent some considerable time trying to avoid it happening.
Could it be that you never shed the bias from your young 80s BASIC experience, despite the world around you changing?
70s actually. And the world hasn't changed - if you want efficient, performant code that manages resources (not just memory) correctly, you don't want GC.
Thanks! That's really interesting. There's a lot of material out there about BASIC for early microcomputers but most of what I've read about minis has been more focused on languages that were considered interesting to "hackers" (i.e. Lisp and C).