I've been trying to fit big-enough long-running stuff into JVMs for a few years, and have found that minimizing the amount of garbage is paramount. It's a bit like games programming or C programming.
Recent JVM features like compact (8-bit) strings and the interned string pool no longer having a hard size limit etc. have been really helpful.
But, for my workloads, the big wastes are still things like java.time.Instant and the overhead of temporary strings (which, these days, copy the underlying data; my code worked better back when split strings were just views).
There are collections for much more memory-efficient (and faster) maps and things, and also efficient (and fast) JSON parsing etc. I have evaluated and benchmarked and adopted a few of these kinds of things.
Now, when I examine heap-dumps and try to work out where else I can save bytes to keep GC at bay, I mostly see fragments of Instant and String, which are heavily used in my code.
If only there were a library that did date manipulation and arithmetic with longs instead of Instant :(
> If only there were a library that did date manipulation and arithmetic with longs instead of Instant :(
You can always pass around long timestamps and just convert to Instant whenever you need to do any date/time processing. Provided the Instant doesn't escape the method it's allocated in, it should be optimized via inlining and Scalar Replacement so that it doesn't generate garbage. Of course, you'd be adding in the overhead of dividing up your long into seconds/nanos each time.
Note: if this doesn't work on OpenJDK, try GraalVM: its Partial Escape Analysis should do a better job at finding ways of eliding heap allocations.
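A minimal sketch of that pattern (the helper name is made up; whether scalar replacement actually kicks in depends on the JIT and on the method staying small enough to inline):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class LongTimestamps {

    // Hot data structures keep timestamps as primitive epoch-millis longs;
    // an Instant only exists briefly inside the helper below.
    static long plusDays(long epochMillis, int days) {
        // If this Instant never escapes the method, escape analysis /
        // scalar replacement may elide the allocation entirely.
        Instant t = Instant.ofEpochMilli(epochMillis);
        return t.plus(days, ChronoUnit.DAYS).toEpochMilli();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(plusDays(now, 7));
    }
}
```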
The worst things happen when value objects are stuck on the heap for a while and then turn into garbage when the value is updated. Escape Analysis doesn't help there, only a good GC can help.
There's a saying that "only the good die young" which applies to Java GC. If your Instants and Strings are really short lived then the GC for those is nearly free. For your workload are these objects living on the heap for long enough to be promoted beyond the young generation?
If you are looking to do low latency, then G1 isn't the best choice these days, though. Shenandoah or ZGC are both more advanced algorithms that can greatly reduce the pauses caused by GC activity.
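For reference, switching collectors is just a startup flag; whether you also need the experimental unlock depends on your JDK version and build:

```
# G1 is the default collector on recent JDKs
java -XX:+UseG1GC -jar app.jar

# ZGC (JDK 11+) and Shenandoah (JDK 12+, or backported builds);
# older releases need the experimental unlock
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -jar app.jar
java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -jar app.jar
```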
> There are collections for much more memory-efficient (and faster) maps and things, and also efficient (and fast) JSON parsing etc. I have evaluated and benchmarked and adopted a few of these kinds of things.
That sounds very interesting. Can you provide links to the benchmarks for fast JSON parsing (libraries)? And the fast maps?
I can quickly list our findings (from testing on our specific actual workloads; ymmv etc):
For collections, we used to use trove but migrated to fastutil a few years ago.
For JSON parsing, we are processing lots of very small messages, so use LazyJson. The biggest downside to LazyJson is it doesn't have cheap iteration of keys; the framework could easily provide it. For larger documents, say over a few MB, libraries like Jackson are faster.
Yeah, perhaps Java isn't the right tool for our job. And yeah, more recent benchmarking and testing might suggest newer, better libraries than those I have just listed.
It's horrific, the lengths you have to go to to get good performance out of Java for the workloads we have; Python prototypes run much faster under PyPy, and I think that is really about heap management more than code generation.
For those of us who know C/C++, it's kinda uncomfortable when staring at code and thinking "that temporary string there? 40+ bytes of object headers and overhead!" and things. But, of course, there are advantages to working in memory-safe languages.
Wow. I just replaced Java HashMap<Long,Object> with a fastutil map for a critical cache on a very compute-heavy project of mine, and it instantly got 25% faster. Thank you.
There is another benefit from using the fastutil library: decreased memory usage, which in some cases is very significant, depending of course on your data structures.
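For anyone curious what the swap looks like, roughly this (a sketch; the win comes from storing primitive long keys instead of boxed Longs and per-entry node objects):

```java
import it.unimi.dsi.fastutil.longs.Long2ObjectOpenHashMap;

import java.util.HashMap;
import java.util.Map;

public class CacheExample {
    public static void main(String[] args) {
        // Before: every key is a boxed Long, every entry is a separate node object.
        Map<Long, String> boxed = new HashMap<>();
        boxed.put(42L, "value");

        // After: keys stored as primitive longs in open-addressed arrays.
        Long2ObjectOpenHashMap<String> primitive = new Long2ObjectOpenHashMap<>();
        primitive.put(42L, "value");
        String v = primitive.get(42L); // no boxing on lookup
        System.out.println(v);
    }
}
```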
C# really hits a weird sweet spot that people on the outside don't really recognize, due to C#'s heritage as a Java clone, previous Windows dependencies and the anti-MS bias of the early 00s.
While I still really can't stand C#'s VB-based naming conventions and property-happiness, the language is extremely practical in many situations.
Writing LINQ you still think mostly in SQL but don't have to worry about string escaping, while on the other side of the spectrum you have structs, spans, stackallocs and in/out/ref vars that give code written in the language a chance to keep up with C/C++ on modern CPUs where cache-friendliness is paramount.
Java on the other hand has opted to avoid much of this and thus has to depend on GCs magically becoming better, while C# developers can dig down and optimize things on their own instead.
> For those of us who know C/C++, it's kinda uncomfortable when staring at code and thinking "that temporary string there? 40+ bytes of object headers and overhead!" and things. But, of course, there are advantages to working in memory-safe languages.
And of course, there are memory-safe languages that don't spend 40+ bytes on object headers and overhead, and are in fact highly competitive with C/C++. For one thing, Go can be made memory safe if you avoid concurrent mutability.
Not if you compare on the same hardware, but the memory you end up saving can be re-invested into compute. I've seen massive perf/$ gains for log-parsing and memory-mapped data-profiling use cases.
Not even that is required - the popular synthetic benchmarks have Golang (1.14) beating out Java, which is echoed by a lot of users and their use cases.
A lot of Java claims seem to come from future/unreleased efforts which muddy a lot of comparisons.
Where is it beating Java by any significant amount? The regex benchmark uses Java's native regex matcher, whereas the golang implementation uses a PCRE regex matcher, so not an apples to apples comparison. The mandelbrot benchmark results are minutely different on my machine (2.06s vs 2.13s). Note that the test machines these websites use are quite old (certain optimizations or APIs the JVM could use are not supported by those older processors).
Now if you look at the bottom of the list, you'll see where Java far outperforms golang, especially in the binary-trees benchmark (12.67s vs 9.14s and 25.19s vs 8.28s in favor of Java both times).
My refutation followed with updated synthetic benchmark results.
Your serve I believe. If you have data to prove that, for the workloads in the article, Java would outperform Go I would be very interested in seeing it.
Synthetic benchmarks are not representative of real world performance. People who have dealt with large programs can testify to this. That said, the binary-trees benchmark is allocation heavy and typical of workloads similar to those described in the article, and it shows how far ahead Java is able to pull compared to golang.
It's a fact that the JVM's JIT and GCs are way more advanced than what golang has to offer.
You're absolutely right. It's one reason I struggle with the modern fashion for immutable classes and FP, they are always making copies of everything, seems crazy.
Ideally, a good compiler that understands FP will, behind the scenes, detect when it's safe to mutate the old data rather than creating a copy. That's a big part of why Haskell manages to be neck-and-neck with C despite being functionally pure.
Where it gets tricky is in an environment like the JVM where programming in that style was not anticipated, and introducing any optimizations along these lines for the benefit of the proverbial Scala fans needs to be balanced against the obligation not to adversely impact idiomatic Java code.
That said, even without that, it's not necessarily crazy. It's just a value call: Do you believe that more functional code is easier to maintain, and perhaps value that above raw performance? I'm old enough to remember similar debates about how object-oriented C++ code should be, and to have at least encountered Usenet posts from similar debates about how structured C code should be. I don't bring this up by way of trying to weasel in some "historical inevitability" argument - these are legitimate debates, and there are still problem domains where coding guidelines may discourage, or even prohibit, certain structured programming practices. For very good reasons.
Rust gives you FP style without loads of copies, and C-like performance. Immutable data structures aren't idiomatic, but its ownership model gives you most of the same benefits.
The nice thing about Rust is that it gives you the main benefits of FP even when you're not programming in "FP style". The Rust "sharing xor mutability" default model provides underlying semantics and ease of analysis that's quite comparable to what you get with a pure-functional language. Of course extended mutability as with Cell<>, RefCell<> etc. undermines this, but these are only used when necessary.
Would you say idiomatic Haskell is faster or slower than idiomatic use of Java and the JVM? I'm interested in actual experience and preferably benchmarks of real cases, no thought experiments please :)
(If this sounds harsh, it's not my intention. In another HN thread I had someone "explain" to me how Java and Java's OOP is "not suitable for business software development". If this seems like a bizarre statement which disregards more than a decade of business software development -- this is why I ask for actual experience and not opinions or "I think this can't be right").
Lots of people hate Java. It's unsuitable for those people.
There are certain things you might have trouble getting Java to perform, such as strict latency requirements. Another language might be more suitable for that. But that generally doesn't describe business software.
Desktop software is somewhere you probably don't want to use Java. Although if it's business desktop software, it might be a good fit since it can be cross platform (but ugly - but if it's business, it might not matter). We've built several desktop apps for warehouse computers in Java.
Getting the JVM on a machine may or may not be a hurdle. This is one of many reasons why Go is getting popular - you can just build a binary.
The positive of Java is that if you want to do something in it, someone else has probably tried. It has several large organizations backing professional quality libraries and frameworks that have had a ton of resources poured into them. It's easy to build on the shoulders of giants while relying on 3rd party libraries that don't have a bus factor of 1 - this is rare in many other languages.
If you want stability and well trodden paths, it's hard to go wrong with Java. We've had projects that continued to just work from Java 1.4 to Java 8 - a span of almost 15 years without ever having to touch the code. Java 9 was a bit of a hurdle because of project Jigsaw.
> Lots of people hate Java. It's unsuitable for those people.
Understood, but I'm specifically excluding those opinions because they tell me nothing and are unrelated to suitability. Some Smalltalk folk will tell you nothing that is not Smalltalk is suitable for anything; how much would you value their opinion when determining whether a language is suitable for development?
"I hate $LANGUAGE, therefore it's unsuitable for $DOMAIN" is the lowest, less useful form of opinion. It belongs in the realm of flamewars, not of informed decisions.
Suitability to me is not related to whether I hate or like a language. I hate COBOL. I've worked with it. I'd never in a million years argue it's not suitable for banking systems, because that would run contrary to established history.
As for other applications: I agree Java is not suitable for everything. I specifically argued about business software. That said, what about Minecraft, a hugely successful desktop game? :)
> Getting the JVM on a machine may or may not be a hurdle.
This was/is one of Java's mistakes. It's always been oddly hostile to JVM bundling, although some projects do so anyway (e.g. Jira). More generally, Java makes the mistake of making itself known to the user. The user is expected to install a JVM, rather than one being bundled with the application, and they're then expected to ensure it updates itself appropriately, complete with an annoying taskbar icon and always-resident auto-updater (on Windows, that is).
The user shouldn't even know the word 'Java'. Applications written in Pascal, for instance, are just applications. The user isn't made aware of the technology used.
Especially unfortunate considering that, as far as I can tell, JavaFX is really a pretty good GUI toolkit (I've only dabbled). Perhaps things will change as ahead-of-time compilation for Java becomes more mainstream.
> Perhaps things will change as ahead-of-time compilation for Java becomes more mainstream
I understand JIT compilation is pretty advanced these days. Wouldn't this go against it? Or maybe the approach can be mixed, but if so, you'd still need the runtime environment ("the JVM").
A mixed approach can be done, yes. If dynamic classloading is needed, you need to bundle a JIT as well (or at least a traditional interpreter). Excelsior JET has used this hybrid approach for years. [0] I imagine it should be possible to omit the JIT if it can be determined that it's not needed.
If you ever try to write a bash script that calls a Java program in a tight loop, you'll know the limitations of a fat interpreting runtime with JIT compilation. Another relevant use case is the Stateless Lambda server-side architecture.
I believe Java (OpenJDK at least) is slow to start even if you disable the JIT and go with pure interpretation, and even if you disable runtime bytecode verification. It's just generally heavyweight and slow to get off the ground.
I imagine AOT compilation should help with this, as you suggest.
Since Java 8, there are two ways to build a binary: the javapackager tool, and the Ant JavaFX tasks (with the OpenJDK, install the openjfx package). It'll include the JRE [~50MB], and there are pros and cons with that obviously.
"One can, with sufficient effort, essentially write C code in Haskell using various unsafe primitives. We would argue that this is not true to the spirit and goals of Haskell, and we have attempted in this paper to remain within the space of reasonably idiomatic Haskell. However, we have made abundant use of strictness annotations, explicit strictness, and unboxed vectors. We have, more controversially perhaps, used unsafe array subscripting in places. Are our choices reasonable?"
“Unsuitable” doesn’t mean it can’t be done. Having written a lot of business software in Java, I actually agree it’s unsuitable. The only way I’ve been able to make it bearable is by using Lombok and pcollections.
How so? Performance? Reliability? Verbosity? Which language is suitable in your opinion?
To me this kind of opinion-based... um, opinions... fly in the face of evidence. Java has been used for more than a decade to deploy business software to great success. What more evidence does one need? It'd be like arguing "COBOL is unsuitable for banking".
In the meantime, opinions have shifted on best engineering practices, and Java has naturally run the gamut of all these opinions. And because there are tons of Java systems, there's no shortage of examples of failed/bad projects one could pick on. I wonder how many languages/platforms would have fared better...
"Unsuitable" meaning it's not a great choice. Yes, it's been done a lot (whether or not you can really say "to great success" is another matter, I think).
What I mean is that I think using Java meant the company was required to spend more resources than should have been necessary to accomplish their goal.
BTW, when I say "Java", I mean the language, not the platform. Java-the-platform is very suitable to business software. Java-the-language, less so. Java OOP brings with it a ridiculous amount of incidental complexity, boilerplate, and impedance mismatches. Of course you can make it work, I do it every day. I just think it's a poor choice. It's certainly not the worst choice, and the platform + available libraries is definitely a plus.
And as I mentioned in my original post, there are ways to make it more suitable through various hacks (like Lombok) and libraries, but other languages are more suitable out of the box (including other JVM languages).
> Ideally, a good compiler that understands FP will, behind the scenes, detect when it's safe to mutate the old data rather than creating a copy. That's a big part of why Haskell manages to be neck-and-neck with C despite being functionally pure.
Not talking against Haskell or FP; the disadvantage is that it's hard to reason about performance. Making sure the compiler will optimize something takes more cognitive load than simply writing straightforward code.
Imagine having a map() method on an array that produces another array. In a long chained pipeline of such things, the compiler may be able to elide the extra allocations and generate assembly very close to handwritten for loops. But the abstraction breaks when a method does I/O or has side effects - you can't reorder anything in order to elide allocations. (Well, this example may not make sense in Haskell, because Haskell is lazily evaluated.)
However, there is a limit to what the compiler can do, and it can manifest in edge cases like performance varying with the order of imports. I'd rather have straightforward code than rely on compiler optimizations.
I am not arguing "against abstractions" the way some Go fanboys do when they tell you map and filter are less efficient. It is always possible to apply the same map/filter to iterators/lazy streams instead of arrays and get the same performance with straightforward code. But that's not the same as keeping in mind which heuristics the compiler uses to optimize the code.
I suppose in FP/Haskell you would structure your code differently, so that it wouldn't have side effects when mapping arrays. That way you could even parallelize it trivially.
High-level languages need a lot less code than lower-level, hand-optimized implementations. Difficulty reasoning about execution/performance is the penalty you potentially have to pay.
In my experience it can indeed be hard to understand performance with Haskell. That said, it comes with excellent tooling to overcome that in most cases.
Everything is a trade-off. Immutable objects enable easier safety when having multiple threads do work concurrently. Shared mutable state is still very difficult to do correctly, and at the point where you're introducing locks, you've crippled performance.
We have so many cores now that it tends to be a positive trade-off to have many threads doing some wasteful work (copies, extra GC pressure, potentially multiple threads duplicating the same work) than trying to have a perfectly optimized single thread.
>It's one reason I struggle with the modern fashion for immutable classes and FP, they are always making copies of everything, seems crazy
It depends how it's implemented. It's possible to get very nice performance with immutability and copying through use of an arena allocator, as your stuff will essentially always be in cache (due to reusing the arena), and allocation/deallocation is just bumping a pointer. Of course, not everything easily fits into this approach, but a surprisingly large amount of code can, if designed with it in mind (and using a language that supports it without too much pain, like C/C++).
The language Zig is particularly interesting in this regard because everything that allocates takes the allocator as a param, and it has built-in arena allocators in the standard lib.
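Just to illustrate the idea, a toy bump-pointer arena can be sketched even in Java over a ByteBuffer (purely illustrative, no bounds checking; real arena allocators live in languages like Zig/C/C++ as noted above):

```java
import java.nio.ByteBuffer;

// Toy bump-pointer arena: allocation just advances an offset, and
// "freeing" everything is just resetting the offset back to zero.
final class Arena {
    private final ByteBuffer buffer;

    Arena(int capacityBytes) {
        // Off-heap, so the GC never sees these bytes at all.
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    ByteBuffer alloc(int size) {
        ByteBuffer slice = buffer.slice();          // view starting at current offset
        slice.limit(size);                          // restrict the view to the requested size
        buffer.position(buffer.position() + size);  // bump the pointer
        return slice;
    }

    void reset() {
        buffer.clear(); // reuse the whole arena for the next batch/frame
    }
}
```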
Disclaimer: I'm not a languages expert, but I think there's a case to be made for performance vs clarity/readability. FP is great for parallelisation, multi-core work, things like web servers and other internet-facing services, but probably not the best for number crunching. Horizontal vs vertical scaling, I think.
I believe you can express your problem (and solution) better using FP, once you have it solved you can zoom in and replace the most demanding segments with iterative programming, or go down lower to the bare metal.
If an object is truly immutable, its object identity shouldn't matter in the vast majority of cases, and so it should be okay to just copy the whole thing, instead of passing around references to it.
Unfortunately, the legacy Java semantics of == means that they can't do this proactively. But didn't Java get opt-in value types recently?
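Not full value types yet, as far as I know - but records (a fairly recent addition) show the gap: equals/hashCode are value-based, while == is still reference identity, which is exactly the legacy semantics that keeps the runtime from copying freely:

```java
public class ValueIdentity {
    record Point(int x, int y) { } // immutable; equals/hashCode generated from the components

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b)); // true  - same value
        System.out.println(a == b);      // false - two distinct heap objects, identity is still observable
    }
}
```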
I wonder how things would have stacked up with OpenJ9 - the AdoptOpenJDK project makes OpenJ9 builds available for Java 8/11/13/14 - so it should be trivial to include it in the benchmarks.
We have been experimenting with it in light of the Oracle licensing situation, and it does provide an interesting set of options - AOT, various GCs (metronome, gencon, balanced), along with many other differentiators from OpenJDK such as JITServer, which offloads JIT compilation to remote nodes.
It doesn't get as much coverage as it should - it's production hardened - IBM has used it and still uses it for all their products - and it's fully open source.
You mean the licensing situation where Oracle completed open-sourcing the entire JDK and made Java free of field-of-use restrictions for the first time in its history?
If you're talking about the JDK builds you download from Oracle, then there are two (each linking to the other): one paid, for support customers, and one 100% free and open-source: http://jdk.java.net/
Many organizations need to have supported 1.7 and 1.8 releases, and it's a lot of money to spend on per-core licensing, which is a new thing after Oracle took over. The link you posted does not have free updated binaries for JDK 7 or 8. For those you have to pay. A lot.
So it makes sense to look for a non-Oracle JDK, and along with OpenJDK, OpenJ9 is a great choice.
First, what do you mean by "after Oracle took over"? JDK 7 and 8 were released by Oracle (6 was the last one under Sun). Second, you always had to pay for the JDK after ~5 years of free public updates, and the price now is lower than before. Finally, OpenJDK is the name of Oracle's one and only Java implementation project, with contributions from other companies; OpenJ9 is, indeed, a completely separate project by IBM, but ~80% of that JDK is copied from OpenJDK. And those paid support subscriptions? They fund most of OpenJDK's development.
Not sure what you're getting defensive about - the Oracle JDK licensing change is much more recent than the JDK 7 or 8 release dates. And the per-core licensing isn't cheap. So that gets companies to seek cheaper alternatives. What is there to argue about?
It's a fact that Oracle changed the licensing on JDK 7 and 8; they admit that themselves.
The support license was changed to a subscription model that lowered prices after 7 and 8 were out of the ~5-year free public updates period. Companies wishing to stay on old versions and buy support for them pay less than they did or expected to before the change.
Well, with per-core licensing combined with monthly subscriptions, it's not that cut and dried that you will pay less. I can't remember what we paid for Sun Java licensing, but it wasn't per core.
Java 7 required payment for support for four years before the license change [1]:
> The price is $25 per month per processor for servers and cloud instances, with volume discounts available. ... The previous pricing for the Java SE Advanced program cost $5,000 for a license for each server processor plus a $1,100 annual support fee per server processor, as well as $110 one-time license fee per named user and a $22 annual support fee per named user (each processor has a ten-user minimum).
So from $1,100 + 10 × $22 = $1,320 per processor per year in support fees, plus a one-time $5,000 + 10 × $110 = $6,100 in licenses, the cost went down to $300 per processor per year (no one-time fee, no per-user license). That's a >4x drop in the recurring price (of course, there were, and are, various bulk discounts, but it's a big price reduction nonetheless).
Was old Java license per Processor or Per Core? Because the new one is Per Core. So if previously I had 4 sockets with 4 cores per socket - I paid 4x$LICENSE - now I am paying 16x$LICENSE. No?
Java 6 was released in 2006. The first dual-core Opterons and Xeons came out in 2005. The licensing for Java 6 was written in a reality where multi-core wasn't really a thing yet.
By the time Java 7 came out, you could easily use a single-socket multi-core processor for workloads where you'd have used a dual or quad socket before. It makes sense that the pricing was adjusted to match the new reality.
> When licensing Oracle programs with Standard Edition One, Standard Edition 2 or Standard Edition in the product name (with the exception of WebCenter Enterprise Capture Standard Edition, Java SE Subscription, Java SE Support, Java SE Advanced, and Java SE Suite), a processor is counted equivalent to an occupied socket; however, in the case of multi-chip modules, each chip in the multi-chip module is counted as one occupied socket.
It doesn't get any clearer than that - it is per core. So that's significantly increased cost for pretty much most enterprise users with VMWare and high core count Xeons.
I think it's actually a bit more complicated than that because we're talking about Java SE, which is in the excepted list. On Intel/AMD, every core is counted as 0.5 "processor units" (http://www.oracle.com/us/corporate/contracts/processor-core-...) so the unit "processor" price is per 2 cores.
But whatever the definition of a processor unit is, it didn't change. The prices both before and after the change are for the same unit.
> But whatever the definition of a processor unit is, it didn't change. The prices both before and after the change are for the same unit.
Umm yeah but what did change is lots of multi-core processors with increasing core count appeared and not changing the definition of processor unit to count each core as a processor means more money to Oracle. None of what you keep saying contradicts what I said - it's still a lot of money and people are not wanting to pay that much.
My point is just this: however much companies that wish to buy support pay Oracle now, it is significantly less than what they paid or expected to pay before 2019. Whatever the situation was, it was made better by the license change.
It really does get clearer than that, because I don't think it's talking about cores. None of the common Intel processors are MCM. They did make a massive 56 core Xeon by gluing two dies together and that would count.
Right, it's pretty clear elsewhere that it's per core. But the section you quoted doesn't clarify anything, it's actually contradictory and bizarre that it's present.
Only if Oracle is allowed to redefine the industry standard meaning of MCM. Read literally it does not imply per core pricing, in fact quite the opposite.
An 8 core Zen would count as 1. An 8 core Zen 2 would count as 2. A 2 core Pentium D would count as 2, but Core 2 Duo would count as 1 and a Core 2 Quad as 2. A 64 core EPYC would count as 9.
That's nice, but they also completely changed the license for Java 8 in a minor security update late in the game, making it super easy to accidentally click through and put yourself or your organization at risk of massive license violations. A Trojan horse if I ever saw one.
That's not true. After ~5 years, and a notice given a year in advance, every JDK moves out of free public updates and into extended support, which is now cheaper, too.
I remember wanting to connect to a jvm with a profiler and getting a license agreement as that is now an enterprise feature and costs. It’s a slippery slope.
There are no more paid features since JDK 11. That low-overhead profiler is now free and open. For the first time in Java's history, the JDK is 100% free.
And I love you guys for it, I've used it quite often and appreciate what Oracle have done... their bad reputation somewhat precedes the good that they have done in that team.
Well, you need to pay for updates, but even that is inaccurate. Before JDK 9, Java used to have major releases, now it doesn't, and the version name is incremented every six months even though the changes are small. So instead of a major release, that's supported for some years, there's now a steady stream of small, gradual change with free perpetual support. There's also an alternative new, non-gradual upgrade model, and if companies want to use it, they need to pay.
Not true. You can use the OpenJDK for free until the end of time. If you want ongoing updates beyond six months, there are a bunch of free distributions: Azul Zulu Community (7/8/11/13/14), AdoptOpenJDK (8/11/14), etc.
Specific workloads matter a lot. I had a good experience with the Shenandoah collector on an application that generates very few intermediate objects, but where, once an object is created, it stays in the heap for a while (a custom-made key/value store for a very specific use case). Shenandoah was the best in terms of throughput and memory utilization. Most collectors are generational, so surviving objects have to be moved from Eden to Survivor to Old. Shenandoah is not generational, and I suspect it has less work to do for objects that survive compared to other collectors. When most objects live long, generational collectors hinder performance.
In the case of Hazelcast Jet and similar products, loads of young garbage are unavoidable because they come from the data streaming through the pipeline. A generational GC should in principle get a great head start with this kind of workload, and our benchmarks have confirmed it.
Yep, workload matters. Generational garbage collectors are fundamentally at odds with caching/pooling of objects. They are based on the assumption that objects die young. Typically that is not the case for internal caches, though. Caches usually consist of long-living/tenured objects.
It is a stretch to claim caching is fundamentally at odds with GC. It is more correct to say that LRU breaks the generational hypothesis, because it prioritizes new entries which take a long time to be evicted. However many workloads are frequency biased and these one-hit wonders degrade the hit rate. That is why you'll see more aggressive eviction in a modern policy, so you'll have better GC behavior and higher hit rates using something like Java's Caffeine library.
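For reference, the Caffeine usage is roughly this (a sketch, parameters are arbitrary; the aggressive eviction mentioned above is what limits how many entries survive long enough to be tenured):

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

import java.time.Duration;

public class CaffeineExample {
    public static void main(String[] args) {
        // Bounded cache: entries that don't prove themselves get evicted,
        // so fewer cache entries are promoted into the old generation.
        Cache<String, byte[]> cache = Caffeine.newBuilder()
                .maximumSize(100_000)
                .expireAfterAccess(Duration.ofMinutes(10))
                .build();

        cache.put("key", new byte[]{1, 2, 3});
        byte[] value = cache.getIfPresent("key");
        System.out.println(value != null);
    }
}
```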
Keep in mind that it's not fundamental. Generational GCs just make a bet that you can save a lot of effort by segregating the objects by age. In almost all Java workloads there's plenty of short-lived objects, and a generational GC takes care of them at an especially low cost. The price to pay for that is pretty low, basically it's the overhead of card marking (a write barrier is needed) and subsequent partial scanning of the Old Generation if there are many references from old to new objects.
Only very specialized workloads won't create many short-lived objects, and for those cases there are alternative non-generational GCs on the JVM (ZGC, Shenandoah).
Converting Java code to Kotlin and then compiling it with Kotlin Native [1] is more promising from the performance point of view. Native code is always faster (assuming the compiler is good enough).
An ahead-of-time compiler doesn't have the advantage of the call profile of polymorphic call sites. The JIT compiler has many more inlining opportunities, and in some cases this results in better performance.
Also, there are cases where manual memory management, which usually boils down to reference counting, has great overheads where a GC-managed runtime has no overhead at all. They involve repeatedly building up and then discarding large data structures. GC algorithms simply don't see the dead objects, whereas refcount-based management must explicitly free the memory of each object.
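To make the devirtualization point concrete, here's the kind of call site involved (a sketch; the JIT can speculatively inline the single observed implementation behind a cheap type-check guard, while an AOT compiler without profile data generally keeps the virtual dispatch):

```java
public class Devirtualization {
    interface Codec { int encode(int v); }

    static final class Identity implements Codec {
        public int encode(int v) { return v; }
    }

    static final class Shifted implements Codec {
        public int encode(int v) { return v << 1; }
    }

    // The call below is virtual in the bytecode. If runtime profiling shows
    // that 'codec' is always an Identity here, the JIT can inline
    // Identity.encode and guard it with a type check, deoptimizing if the
    // speculation ever turns out to be wrong.
    static long sum(Codec codec, int[] data) {
        long total = 0;
        for (int v : data) {
            total += codec.encode(v);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        System.out.println(sum(new Identity(), data));
    }
}
```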
> The JIT compiler has many more inlining opportunities
That's largely only true for devirtualization, which tends not to be as much of an issue in AOT-compiled languages because they have features that make reliance on virtual calls less prevalent (think C++ templates as an example at the extreme).
The only other case where JITs can inline more than AOTs is across shared library boundaries, which can be useful but if it is useful in a particular place it's also typically easy to "fix" by just making that function statically linked (or implemented in the header, even) instead.
Otherwise the time constraints of JITs near universally mean they cannot optimize as well as AOTs, even though they do have more runtime information available. Unless you do a multi-tiered JIT approach like WebKit does ( https://webkit.org/blog/3362/introducing-the-webkit-ftl-jit/ ), with the last tier being the one that finally lets a full "AOT quality" optimization pass happen because you can finally justify the time spent on the optimizer. But then you also have ridiculous warmup latencies.
> Also, there are cases where manual memory management, which usually boils down to reference counting, has great overheads where a GC-managed runtime has no overhead at all. They involve repeatedly building up and then discarding large data structures. GC algorithms simply don't see the dead objects, whereas refcount-based management must explicitly free the memory of each object.
There's a lot more to this than such a simple claim. GC'd languages also almost always need to pay a zeroing cost in conjunction with freeing memory, which makes the actual free a lot slower, and GC'd languages get slower as the object count grows, while manually memory-managed languages stay ~constant. There are also more strategies in play for manually managed languages than just ref counting - such as single ownership (std::unique_ptr, Rust's Box<>, etc.).
If you are doing something that involves repeatedly building up and then discarding a data structure, though, that's where manually managed memory runs circles around a GC'd approach. A simple arena allocator is a superb match for that and cannot be beaten on performance: bump-pointer allocation speed, zero GC pause, zero collection latency, etc. This is what games do for per-frame allocations, for example - essentially a single-frame GC without a collection pass being needed. Not a lot of things actually do build up and then discard a structure repeatedly, so you don't get to use this trick very often, but when you can, it's stupid fast.