Note that most of the performance improvements come from PGO, which is enabled with the following environment variables. PGO is not enabled in .NET 6 by default, but will be in .NET 7 IIRC.
set DOTNET_ReadyToRun=0
set DOTNET_TieredPGO=1
set DOTNET_TC_QuickJitForLoops=1
Here are my own benchmarks from a CPU-intensive application without any IO that is already optimized for allocations. The application runs a task graph either serially or in parallel.
.NET 5
--------------------------
| Method | Mean |
|------------ |---------:|
| RunParallel | 473.4 us |
| Run | 513.5 us |
.NET 6
--------------------------
| Method | Mean |
|------------ |---------:|
| RunParallel | 452.5 us |
| Run | 499.8 us |
.NET 6 PGO
--------------------------
| Method | Mean |
|------------ |---------:|
| RunParallel | 381.8 us |
| Run | 412.2 us |
.NET 5 -> .NET 6: ~5%
.NET 5 -> .NET 6 PGO: ~20%
Here is what I learned from micro-optimizing a .NET application:
- Use BenchmarkDotNet[0] for general measurements and Visual Studio profiler tools for detailed inspection. They help a lot.
- Memory allocations matter. Using capturing lambdas, LINQ, or even foreach over interfaces introduces allocations and slows down the application. You can use ClrHeapAllocationAnalyzer[1] to find these hidden allocations.
- Using abstractions with interfaces and casting back to concrete types cause some overhead, though PGO will probably eliminate most of these.
- Use LINQ cautiously as its variants are mostly slower than explicit coding. E.g. .Any() vs .Count == 0
- Checking Logger.IsEnabled() before calling Logger.Debug() etc. helps a lot (see the sketch after this list). You can even automate this with Fody [2], but it breaks Edit&Continue and possibly .NET Hot Reload too, so it may hinder your productivity.
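A minimal sketch of the Count check and the IsEnabled guard from the list above, assuming a Microsoft.Extensions.Logging ILogger and an Order type that exist purely for illustration:

    using System.Collections.Generic;
    using Microsoft.Extensions.Logging;

    public record Order(int Id);

    public class OrderProcessor
    {
        private readonly ILogger<OrderProcessor> _logger;
        public OrderProcessor(ILogger<OrderProcessor> logger) => _logger = logger;

        public void Process(List<Order> orders)
        {
            // On a concrete collection, Count is a plain property read, while
            // Enumerable.Any() goes through the interface/enumerator machinery.
            if (orders.Count == 0) return;

            foreach (var order in orders)
            {
                // The IsEnabled guard avoids formatting the message and boxing
                // the arguments when Debug logging is switched off.
                if (_logger.IsEnabled(LogLevel.Debug))
                    _logger.LogDebug("Processing order {OrderId}", order.Id);

                // ... actual work ...
            }
        }
    }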
The Any() method does an extra null check and a cast to ICollection<T>, which incurs unnecessary overhead. Of course, this is at micro-optimization scale. If you do not call Any() on a hot path, it does not matter which one you use.
> - Use LINQ cautiously as its variants are mostly slower than explicit coding. E.g. .Any() vs .Count == 0
When using LINQ also be aware that .First(predicate) is significantly slower than .Where(predicate).First() when called on List<T> and T[]. This is true for essentially all methods like Last, Single, Count etc. Don't trust Visual Studio when it's telling you to "optimize" this.
But if you want the last bit of performance, you shouldn't use LINQ anyways.
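For anyone who wants to check the First(pred) vs Where(pred).First() claim on their own runtime, here is a minimal BenchmarkDotNet sketch (the list size and the searched value are arbitrary choices, not taken from the thread):

    using System.Collections.Generic;
    using System.Linq;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class FirstVsWhereFirst
    {
        // Worst case: the match is at the end, so the whole list is scanned.
        private readonly List<int> _items = Enumerable.Range(0, 100_000).ToList();

        [Benchmark(Baseline = true)]
        public int FirstWithPredicate() => _items.First(x => x == 99_999);

        [Benchmark]
        public int WhereThenFirst() => _items.Where(x => x == 99_999).First();
    }

    public static class Program
    {
        public static void Main() => BenchmarkRunner.Run<FirstVsWhereFirst>();
    }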
Not sure if it is still the case, but it used to be that First did a fairly naive foreach over the IEnumerable, while Where had several collection-specific type checks that allowed it to traverse the collection in more efficient ways.
LINQ is parsing a tree of System.Linq.Expression here and the cases of First(pred) etc. are just not optimized because of the added complexity with little benefit. It only recently became a problem when Visual Studio got a new built-in analyzer that tells people to "optimize" this.
As for BenchmarkDotNet, I totally agree with you in general - it's the best option available for micro-benchmarks. But if you want to run a benchmark involving a fairly complex interaction, multithreading, etc. (caching benchmark that I used is of this kind - it runs on client + server process, uses SQL Server hosted in Docker, etc.), it's rarely the best fit.
I use it for structured logging, which makes filtering and searching very convenient. E.g. I can filter by an object’s id and a property to see which tasks change the property of that specific object and in what order. Serilog[0] and Seq[1] are the best tools for this in my opinion.
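For illustration, a minimal Serilog + Seq setup of the kind described above (it assumes the Serilog and Serilog.Sinks.Seq packages and a local Seq instance on its default port; the task/property names are made up):

    using Serilog;

    Log.Logger = new LoggerConfiguration()
        .Enrich.FromLogContext()
        .WriteTo.Seq("http://localhost:5341")
        .CreateLogger();

    // Named placeholders become structured properties, so in Seq you can later
    // filter on ObjectId and Property to see which tasks changed them and in
    // what order.
    Log.Information("Task {TaskName} set {Property} on object {ObjectId} to {Value}",
        "RecalculatePrices", "UnitPrice", 42, 19.99);

    Log.CloseAndFlush();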
Hi there, the author of the original post is here. PGO is disabled in .NET 6 by default mainly because of trade-offs associated w/ the startup time - and IMO it's totally reasonable assuming .NET 6 brings decent speed benefits even w/o PGO.
I turned it on mostly to show what you can expect from a service that runs for a while (more than a few minutes?) in a typical server-side scenario after migration to .NET 6 - IMO it's totally reasonable to turn PGO on for nearly any service of this kind.
In the same vein as what Linus is trying to do now at LTT, it would be useful if there was a "programming journalism" lab which builds "test benches" to create canonical benchmarks and verify the claims made on tech blogs (etc.).
E.g., highly standardised Docker builds, on highly standardised hardware, running popular tasks using popular libraries for each "programming tech" (e.g., website, stat modelling, event system, ...).
It is hard to do relevant tests of which language is the fastest.
Really, writing fast code is mostly down to the programmer. For example, C is widely recognized as the fastest non-assembly language simply because it leaves a lot to the programmer; C won't magically make your terrible code fast, unless you are using time-to-segfault as a metric. Assembly is the fastest if you know what you are doing; very few know what they are doing.
So, what kind of code are you going to use for your benchmark? Highly optimized code written by experts spending way too much time, the "most idiomatic" code, code written by an average skilled programmer picked at random, code extracted from a big open source project? This can drastically change the ranking, so which one is the most relevant? If you go with the "most idiomatic" for instance, you miss out on the idea that parts can be optimized if needed, and that in real life, programmers aren't perfect and can write suboptimal code by mistake.
There is also a cultural aspect to languages that may not be caught in benchmarks. For example, C programmers tend to have a culture of performance: they tend to know their hardware, will try to save memory, make data structures efficient, etc. Python programmers, not so much; instead they tend to value readability and development time.
You can't test languages like you test CPUs for instance. With CPUs, you just run the same code on them and time them. You can't do that for obvious reasons: your C compiler won't accept your Python code, it is necessarily an apples to oranges comparison.
> Assembly is the fastest if you know what you are doing, very few know what they are doing.
Just a nitpick but for any reasonably sized code, no. While some people can indeed do impressive optimizations on small segments of assembly, they are humans and they will fail to do trivial optimizations that are reliably done by compilers.
If garbage collection happens on a separate thread, and makes allocation much faster, is it really “slower”? You have to call malloc, which will try to defragment your memory, and then later you will have to call free. Those block the calling thread; if anything, for certain problems they are slower than GC.
Garbage collection itself isn't really slow, per se. But allocating a lot of short-lived objects on the heap still means that a lot of objects have to be reclaimed fairly frequently. And at least .NET's garbage collector isn't able to do all that without pauses. And those add up.
But even if the GC is really concurrent: If there's a way of not doing that work it's still better, IMHO.
One interesting profile I've seen at work recently spent about 30% of its time creating objects, and another 35% in garbage collection (of pretty much the same objects that were being created all the time). So if there were a way of not allocating that much, or not doing it on the heap, the algorithm could be about twice as fast.
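One common way to cut that kind of allocation pressure is to reuse buffers instead of allocating per call. A minimal sketch, assuming the hot path needs a temporary byte buffer (all names here are illustrative):

    using System;
    using System.Buffers;

    public static class Transformer
    {
        public static int Checksum(ReadOnlySpan<byte> input)
        {
            // Rent from the shared pool instead of new byte[...]: the same
            // arrays get handed out again, so the GC sees far fewer
            // short-lived objects.
            byte[] buffer = ArrayPool<byte>.Shared.Rent(input.Length);
            try
            {
                input.CopyTo(buffer);
                int sum = 0;
                for (int i = 0; i < input.Length; i++) sum += buffer[i];
                return sum;
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(buffer);
            }
        }
    }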
But comparing it to not doing that work is somewhat dishonest — for that you'd compare a malloc for each call and a destructor at the end of the scope — and surprisingly, malloc will often do much worse than a good GC implementation, what with trying to defragment a bit, etc.
Also, Java can often accumulate big heaps because it only runs the GC when it absolutely must — as you mentioned, it would be unnecessary work otherwise. It might be interesting to mention that OpenJDK is the “greenest” out of the managed languages due to that.
This is so true. The fastest implementation I’ve ever seen of a priority queue in PHP looks nothing like a priority queue by taking advantage of PHP’s sparse hash maps (aka, arrays). If you use any “standard” implementation it will be slower. I imagine this is true of most optimized algorithms in most other languages.
Usually the setups and architectural choices are so different that it would make more sense for everyone to get a good stat on which functions they are calling and how much and to make a prediction on the individual stats of each function. This does not take into account caching and multithreaded scenarios, but the map can never be the territory.
What would be representative of real-world scenarios? I think their test suite covers a wide enough spectrum of operations to be able to draw conclusions from it.
Unfortunately, after looking at the .NET Core implementation of these benchmarks, I wouldn't trust them at all. The code is just overengineered to perform best at the benchmark - everything hardcoded, custom routing to cover 2-3 routes with minimal overhead, etc. It has nothing in common with real-world code.
The Techempower entries are heavily gaming the system.
They often strip out framework functionality and hyper-optimise for the specific benchmark, including things like pre-allocating the exact amount of memory needed to serve the request, not doing route matching at all, etc.
They are basically an exercise in "how clever can we be to win the benchmark" rather than a realistic portrait of real world performance.
At least for .NET, the versions that strip out framework functionality are marked separately, though this part is not that easy to understand if you don't know about it. There are several .NET entries, from very low-level without MVC and without an ORM up to the full stack.
But still, these benchmarks have their uses but there are a lot of caveats you need to consider when looking at the results.
I've seen the posts about all the speedups each new version of .NET gets and I'm just wondering, was .NET just alright in performance before all this? Is that the reason they can get all these speedups? :P
I'd be interested in some JVM vs .NET 6 benchmarks too, and in which platform to choose when.
EDIT: I know about the benchmarks and I also feel like sometimes these benchmarks are really optimized in a non-idiomatic way. I would love to know how performant idiomatic Java/.NET code is and, if one is to start a new project today, why one would choose the JVM over .NET or when someone would choose .NET over the JVM.
C# was never really slow in the same way as python, etc. And anyway 99% of the time you're gonna be slow because the sack of meat writing the program screwed up some aspect of the system design or the code. Unless you're using something really shit-tier for perf.
The gap closed significantly with .NET core, which is why everyone was quite surprised when .NET 5 (the next iteration) had a fairly significant speedup in many scenarios.
For reference, Stack Overflow was running on .NET MVC on like 2 servers pretty recently (with some auxiliary infrastructure for CDN and search) and using MS SQL. I think it might still be running on this setup but I'm not 100% sure. Honestly I have no idea how they do it on a .NET monolith but there you go.
Stack Overflow runs on 9 web servers with (iirc) 48 logical cores (2 x 12-core Xeons) and 64GB RAM. Those servers are shared by a few apps (Talent/Job, Ads, Chat, Stack Exchange/Overflow itself) but the main app uses, on average, ~5% CPU. Those machines handle roughly 5000 requests/sec and were running .NET 5 as of September 2021 (when I moved on). That’s backed by 2 v. large SQL clusters (each consisting of a primary read/write, a secondary read-only in the primary DC and a secondary in the failover DC). Most traffic to a question page directly hits SQL - cache hit ratio tends to be low so caching in Redis for those hits tends to be not useful. As somebody mentioned below, being just a single network hop away yields really low latency (~0.016ms in this case) - that certainly helps being able to scale on little hardware - typically only 10-20 concurrent requests would be running in any instance at any one time because the overall request end-to-end would take < 10ms to run.
Back in full framework days we had to do a fair bit of optimisation to get great performance out of .NET, but as of .NET Core 3.1 the framework _just gets out of the way_ - most memory dumps and profiling subsequent to that clearly pinpoint problem areas in your own app rather than being muddied by framework shenanigans.
Source: I used to work on the Platform Engineering team at Stack Overflow :)
That's surprising to read. Is that because of the sheer volume of question pages? I don't think I've ever been on an SO page that couldn't have been served straight from cache.
Is it? Most people come to SO from Googling their random tech problems/questions. Not sure how much value there is in caching my random Rails questions, etc
I would expect SO usage to follow a distribution like Zipf's — most visits hit a small subset of common Q/A, and there’s a ridiculously long tail of random questions getting a few visits where caching would do next to nothing. I’m fairly positive I’ve seen some post showing this was true for at least answer-point distributions.
Though I guess it’s possible for a power distribution for page-likely-to-be-hit to still be useless for caching, because I think you could still get that distribution if 99% of hits are on nearly-unique pages; with a long enough tail, you’d still have only relatively few pages worth bothering to cache, but by far most visits are in the tail
A poster above claimed the servers were “48 logical cores (2 x 12-core Xeons) and 64GB RAM”, which really isn’t what I would consider such a “beast” of a machine when the RAM is in laptop territory, and a modest number of cores for a server.
Nowadays you can purchase that machine on the second-hand market for something like $200-$300.
They are measurably faster than even contemporary laptops, though, plus you often get ECC RAM and RAID disk setups, and the good old Xeons didn't use to ramp up and down in speed; they just ran fast all the time. I'd still characterise that as a beast, especially on $/performance terms (although the power consumption is a worry).
Xeon covers a pretty wide variety of chips. Of course, the Pentium II Xeons didn't have speedstep either. 12-core tells us either fairly high end but older or kind of medium-low but newer. Dual socket tells us not the really low end Xeon that shares a socket (and probably a lot more) with the high end enthusiast desktop chips.
> And anyway 99% of the time you're gonna be slow because the sack of meat writing the program screwed up some aspect of the system design or the code. Unless
Or in my experience, most of the time a service's latency is dominated by out-of-process calls, e.g. the time taken to talk to other services over http, or to retrieve data from a data store. Speeding up the runtime is welcome, but even a massive 40% speedup of something that constitutes 10% of your total latency is ... closer to a 4% reduction in latency. Design matters more.
If your program is slow because it's waiting for external service response, you're doing programming wrong. Your program should do other work in the meantime. I guess you're already doing that, but if so, doesn't that invalidate your reply here?
> you're doing programming wrong. Your program should do other work in the meantime.
It's not always true that there is other work to do in the meantime. In fact in my experience, it seldom is. "you're doing programming wrong" is a very strong statement, and not one that I take seriously in this context.
Typically you "await" the external service response, so that it is not using a thread to do that, and "other work" in the form of starting to deal with other requests can happen in the meantime, thereby increasing service throughput.
But that won't speed up a given request - you can wait for an external service more efficiently, but you can't wait faster.
Services that do not depend on any other http services or any data store do happen, but they are rare in my experience (calculation engines, I suppose). So for almost every service, when thinking about response time, you have to, first and foremost, think about the latency of the data stores or upstream services.
Time taken by processing done by the application itself. Faster processing is good even if your backend is mostly waiting for other services - now it can wait for more services at the same time and deal with their output faster, allowing your app to use fewer resources or handle more traffic. That is especially pronounced at scale - if you're spending X million a month on computing time, reducing it by a few percent is very interesting to you.
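As an illustration of "wait for more services at the same time", a minimal sketch that overlaps two independent upstream calls (the URLs and types are made up):

    using System.Net.Http;
    using System.Text.Json;
    using System.Threading.Tasks;

    public class ProfileService
    {
        private readonly HttpClient _http = new();

        public async Task<(JsonDocument User, JsonDocument Orders)> LoadAsync(int userId)
        {
            // Start both requests before awaiting either, so the total wait is
            // roughly the slower of the two instead of their sum.
            Task<string> userTask = _http.GetStringAsync($"https://users.internal/{userId}");
            Task<string> ordersTask = _http.GetStringAsync($"https://orders.internal/by-user/{userId}");

            await Task.WhenAll(userTask, ordersTask);

            return (JsonDocument.Parse(userTask.Result), JsonDocument.Parse(ordersTask.Result));
        }
    }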
Both can be relevant. In the real world, designing how to deal with dependencies and data stores is clearly relevant, as it can be the largest large part of the time taken to respond to a given request. It would be a design error to ignore it.
If your job is to make performant services, in .NET (or in any similar language) external services are 100% relevant to your job.
If you want to be pedantic - and you most certainly do - then only .NET performance itself is relevant to the performance of .NET itself; that's a truism, by definition.
But this narrow focus is not useful - if you want to do the job, then you have to think a bit more widely, and understand what the real problem is.
Sorry, but this is not pedantic at all. You're saying the performance speedup is irrelevant because most time is spent waiting for external services... But that is simply not true if you're beyond just a few servers. The output of external services needs to be processed, and having faster software means fewer resources are spent per request, so more requests can be served with the same resources. That is very important.
I am not saying that; you are oversimplifying into a straw man. great-grand-parent comment is where I literally put a non-zero number to how relevant language perf improvement was: https://news.ycombinator.com/item?id=29295950 and later on "Both can be relevant"; which all contradict your characterisation.
The only person who said above "it's not relevant" is you, and you also said "external services are irrelevant" - You're projecting the "it's irrelevant" statement onto me here.
But the design considerations of how and when to use external services very definitely are relevant to service latency, contrary to what you say, for reasons given multiple times above. Your current odd comments are not fact-based or interesting, so I don't think that you have anything more to add to this discussion at all.
>If your program is slow because it's waiting for external service response, you're doing programming wrong. Your program should do other work in the meantime.
What work? Mine bitcoins while you wait for the result of an API call?
I think any sane person here assumes any IO, especially in .NET is already async and allows other requests to use that time… Most web apps are still IO constrained.
If you're asking "how is waiting for external requests relevant to my service's response time" the answer is "because it's usually the largest part". I really don't know how to explain it more simply than that.
If you need to improve this, and that code is .NET, then the solution is a different design, also in .NET code.
> Honestly I have no idea how they do it on a .NET monolith but there you go.
Low latency in a single rack can work wonders for performance. All those cloud services talk to each other over miles of cables and if you can slash latency to submillisecond regions, you get less wait time and free resources quicker. If you don't distribute your state across multiple Microservices, you can also save quite a bit of overhead.
Plus hardware is just wicked fast nowadays and SO has a model of millions of reads for a single write.
.NET has been faster than Java on most of the benchmarkgame benchmarks for a while, since .net core 3 or so.
More specifically though the JVM has tended to be better about optimizing naive code than .net while .net has tended to offer more tools to do your own optimizing (unsafe, simd, value types, etc). So it would be interesting to see if the performance of naive code has improved relative to Java lately
> .NET has been faster than Java on most of the benchmarkgame benchmarks for a while, since .net core 3 or so.
And which benchmark games are those? If I go to the TechEmpower benchmark and select only C# + Java, Java comes out on top in every individual category of all the benchmarks.
I'm not claiming that Java is faster than .NET. Just that I don't believe one platform is significantly faster than the other.
Such programs are often specially and painstakingly constructed to avoid all the commonly used language features that are inefficient. For example, in Java, user-defined data types are heap-allocated and generic code boxes everything, even primitive types (an ArrayList of ints unfortunately becomes an array of pointers).
Are these programs benchmarking typical idiomatic Java, or just some subset of the language?
I agree. That's really not a useful comparison. They should create categories for each benchmark, like:
- very naive code (shortest, most readable & easy to write code)
- idiomatic code
- optimized code without other-language-libs wrappers and without SIMD, single threaded
- optimized code without other-language-libs wrappers and without SIMD, multi-threaded
- optimized code without other-language-libs wrappers and with SIMD and/or multi-threaded
- optimized code with other-language-libs wrappers allowed and any other optimization technique
I agree it's not a useful comparison. That's why I don't give much weight to statements such as "Java comes on top in every individual category of all the benchmarks".
If you're also separately benchmarking the same C library running on its own, then it's quite interesting to benchmark a .NET wrapper around the exact same library, as it allows you to estimate the overhead from the .NET runtime itself as separated from user code (ideally you'd try this with a bunch of different C libraries).
Of course, the program should be very very clearly labelled accordingly. Since it was just labeled as "csharpcore", then I am inclined to think the submitter was treating the benchmark as a competition.
Please show the objective rules that could be used to identify "typical idiomatic Java" and "typical idiomatic C#".
Please show the objective rules to direct how the comparison should be done when one language's "typical idiomatic" is not the same as another language's "typical idiomatic" — to avoid "you can write Java in any language".
There's some low-hanging fruit, like not permitting specialized collections in Java for a set of integers. Because of type erasure, those integers are all heap-allocated in Java but not in .NET.
I think there's value in benchmarks showing both the fastest you can go if you need to (specializing everything to eke out max performance), and benchmarks showing how fast you will typically go if optimizing for productivity.
You could probably constrain it sufficiently for some set of problems. Maybe something like: Solve problem X using the standard library associative map by elaborating the following pseudo code.
Yes any benchmark will be invalid for some people, such is life. If you want to claim or know something specific you will have to so your own painstaking investigation or find someone who has done that work.
benchmarkgame does not attempt to compare idiomatic solutions for languages, it is closer to a “what is the best you can do” benchmark
> It is closer to a “what is the best you can do” benchmark
As I suspected. So of course this tells us very little about how fast idiomatic code is relative to other languages. "The best I can do" is to invoke hand optimised assembly language, but rarely is that the right choice.
A much more useful test would involve benchmarking some similar real world apps that solve the same problem.
Please show the objective rules that could be used to identify "idiomatic Java" and "idiomatic C#".
Please show the objective rules to direct how the comparison should be done when one language's "idiomatic" is not the same as another language's "idiomatic" — to avoid "you can write Java in any language".
You're looking at pretty old results, round 18 was in 2019. I also don't think that boutique web frameworks say much about the strength of the underlying language or runtime (e.g. look at just.js).
What in the Java world is in the same maturity tier as ASP.NET is open to opinion, but at least local Java devs seem to consider Spring or Micronaut as sane defaults, and of course modern ASP.NET runs circles around those.
Java's performance hasn't really mattered since Oracle took it over. There are things MUCH worse than poor performance, and being owned by Oracle is one of them.
OpenJDK has the same goddamn license as the Linux kernel. It is (yes, the open-source codebase) developed 98+% by Oracle alone, and the other vendors are just forks of this code base (including Oracle JDK, which contains only trivial changes AND a paid support option, for those that need it).
You can hate Oracle as much as you want, but their Java division is a surprisingly adept and capable team, doing a very good job of stewarding the language.
IMO that link makes .NET look very good. ASP.NET Core, the straight-off-the-shelf, obvious choice, is the best-performing .NET server? It beats Jetty and Spring but loses to a long tail of less popular frameworks.
That seems a bit misleading of a comparison IMO and only one case (JSON serialisation) when I look at their data. You are also linking to a round from two years ago which is out of date. It's also showing a lot of frameworks that are not that mature and not well used in the Java camp vs ASP.NET that is widely used, full featured, has a lot of bells and whistles and a lot of plugins available for most technologies and standards. All of which could have negatively influenced performance, even the hooks to allow them to be injected in can do so even if not enabled. The fact that a full featured web framework makes it close to the top (sometimes the top) over several rounds over many of their categories of use cases I can't discount as pretty good.
I.e. it's hard to read benchmarks without the context of each framework shown: the compromises it has made, how usable it actually is for building software vs just a benchmark, what shortcuts are taken in the benchmark, how idiomatic the code is, etc.
My personal experience having worked on both platforms for several years is that Java is easier to get to an acceptable performance, but the .NET runtime when you have to put the effort in has a higher upper bound of performance. It just has more tools in the CLR to work with than the JVM (e.g. value types, proper generics, spans, and more) so you can express something with a little more mechanical sympathy. Java is left with some decisions from legacy IMO that by default hurt its performance (i.e. lots of default boxing has hurt me before especially with generics). With .NET Core and future versions I think .NET is also taking up Java's default perf area as well. YMMV but if I'm worried about performance being a risk in my project .NET gives me more tools to optimize it IMO should that risk eventuate.
Well, the same goes for the TechEmpower benchmarks.
Sorry, but some of that stuff is as shady as the benchmarks game. I really don't get why people don't create 100% benchmarks instead of specialized ones.
>> Other readers may have a broader range of skills
Relax.
When comparing anything, the comparison has to be fair. This is obviously not the case here. It's like Java is a Ford Mustang from the dealer and C# is a Camaro with a F1 engine installed.
While I'm not super familiar with the Java world, none of the frameworks that have a significant advantage sound familiar to me - I'm not sure how mature they are, whereas ASP.NET is the solution for writing servers under .NET.
Lol look at the code. N-body for example, the C# is horrific C-in-C# code with a million optimizations (just read the comments lol), the Java code is idiomatic and not optimized at all.
Well the fastest C# entry for n-body looks like a translation of the C/C++ versions. It is a meaningful result that the primitives of the language allow for it to hang in that company. The absence of a Java version using numerics is unfortunate, that'd be a nice addition.
A lot of the coding style seems optimized around copy-pasting the C code, e.g. trying to alias the Vector methods (Vector256.Create) to their instruction name (_mm256_set1_pd). That makes the code non-idiomatic, but it also doesn't really help performance, just makes the porting easier.
The F# example is on the same runtime and a better view of using the numerics directly. As a trade-off of performance, memory, and code complexity it is actually a pretty solid balance, which I wouldn't have expected.
I can inline C code in Ruby, does that mean Ruby is as fast as C now?
I'd much rather see a comparison of idiomatic code in different languages. When I choose a language to build something in I'm not thinking "How can I write C in this language?"...
Inline C/ASM is literally a different language. System.Runtime.Intrinsics is a library. Just because this one benchmark used the library in a C-like style doesn't mean that's required.
There are many simpler implementations in all languages. I actually like that there are multiple implementations, as this lets us estimate the benefit and complexity of adopting various optimizations. Limiting the benchmark to naive implementations would penalize languages with more broad capabilities for optimization.
I'd personally prefer a benchmark limited to memory-safe implementations, though.
And they're slower than Java... Albeit super close. The parent I responded to was referencing how C# is supposedly faster than Java... But idiomatic C# is about the same (slightly slower).
You can do the same thing in Java, but apparently no one wrote it yet because it's just a silly micro-benchmark. Either way I'd rather see idiomatic code. Or just write C/C++ directly.
Well, .NET is going in the same direction as Java - as the top comment here observes, the speedups come mostly from profile-guided optimization, which has been Java's forte since Java 1.1 or so. And .NET PGO is still off by default, so it's got a long way to go yet.
Just to add, Java now has SIMD in the form of the incubating Vector API. I found that it is one of the best high-level low-level SIMD APIs, with automatic fallback to for loops on ineligible hardware, as well as having an option for preferred lane width.
.NET has the ability to handle your memory layout explicitly with structs.
They expanded on that functionality with Span<T> and made sure that the common libraries are implemented using it.
If you do textbook OOP development for everything, you will end up with a lot of allocations and what not, which was the case, so they went through the entire base class library and rewrote all often used methods to be faster.
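A minimal sketch of what that looks like in practice (the Sample type is made up): the structs live inline in one array, and the consuming code works over a span of them without any per-element heap objects:

    using System;

    public readonly struct Sample   // value type: stored inline, no heap object per element
    {
        public Sample(int id, double value) { Id = id; Value = value; }
        public int Id { get; }
        public double Value { get; }
    }

    public static class Stats
    {
        public static double Sum(ReadOnlySpan<Sample> samples)
        {
            double total = 0;
            foreach (ref readonly Sample s in samples)   // iterate by reference, no copies
                total += s.Value;
            return total;
        }

        // A Sample[] stores the structs contiguously; AsSpan gives a view over
        // that same memory rather than a new allocation.
        public static double Sum(Sample[] samples) => Sum(samples.AsSpan());
    }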
>If you do textbook OOP development for everything, you will end up with a lot of allocations and what not, which was the case, so they went through the entire base class library and rewrote all often used methods to be faster.
They forgot to tell the others that there is life beyond OOP and GoF design patterns.
C# is filled to the brim with functional programming features. Much of the base class library is somewhat functional (although obviously not all or even most, given the age of the BCL and stability of the API).
> was .NET just alright in performance before all this
For some niche applications (i.e. financial exchanges), .NET 5 [was/is] arguably the fastest way to implement certain ring buffer abstractions because of its interesting blend of performance and safety. There is a variant of the LMAX Disruptor developed for .NET which leverages the value semantics of the C# struct to push things beyond what the Java implementations are capable of [0].
Certainly, with enough resources and manual memory management, you could best the C# implementations using a C/C++/ASM codebase, but this is a tenuous tradeoff with practical risks that must be accounted for.
Mark and sweep garbage collection is optimal for some kinds of multi-threaded algorithms. If further minimised with judicious use of value types, it can be surprisingly difficult to outperform it even with carefully tuned C++ or Rust code.
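Not the actual Disruptor.NET API, but a toy sketch of why struct value semantics help here: the ring's slots are structs in one contiguous array, claimed and written by reference, so the publish path allocates nothing. (Real implementations add the producer/consumer sequencing and coordination that is omitted here; all names are illustrative.)

    public struct TradeEvent
    {
        public long Id;
        public double Price;
        public int Quantity;
    }

    public sealed class StructRingBuffer
    {
        private readonly TradeEvent[] _slots;
        private long _next;

        // Capacity must be a power of two so the index can be masked.
        public StructRingBuffer(int capacity) => _slots = new TradeEvent[capacity];

        // Hand out the next slot by reference: the caller writes straight into
        // the array, with no boxing and no allocation on the hot path.
        public ref TradeEvent Claim()
        {
            long sequence = _next++;
            return ref _slots[sequence & (_slots.Length - 1)];
        }
    }

Usage would look like `ref var e = ref buffer.Claim(); e.Id = 1; e.Price = 99.5;`.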
It’s not the same, but there is this well-known framework benchmark [0], it always had the .net frameworks close to the top.
I’m guessing a lot of the speedups come from getting rid of legacy cruft. With .net core/.NET 5/6 they got rid of a lot of things compared to .NET Framework 4.8 and could play with optimizations that simply weren’t doable before. That’s just me guessing, though ;)
It's that in part. Here are some of my additional observations and or guesses.
They invested a lot of time adding language features with compiler and runtime support to avoid e.g. heap allocations/copying, like Span<> and friends, (readonly) ref structs, in/ref/out parameters (ref and out parameters existed before but were used a lot less in the runtime), or ValueTasks to some degree. This in turn enabled a lot of potential for optimizations in the compiler (aside from essentially writing an entirely new bytecode compiler with Roslyn and entirely new JIT with RyuJIT, throwing out the crufty old compilers), in the general runtime, and in the specific runtimes/frameworks e.g. ASP.NET. Those optimizations have to be implemented first however, and more and more get implemented with each new version.
I have a project I maintain that sees an almost 50% speedup from net48 to net5, and another 10-15% speedup from net5 to net6 (based on the time it takes to run the extensive test suite). It isn't even that compute heavy. From profiling it appears that a lot of these speedups are due to internal copies of data being avoided, and a lot of additional fast-paths in the runtime (e.g. fast-paths for byte-arrays or character-arrays as opposed to taking the generic array slow paths).
Another thing of note is that they added a lot of `bool Try*(..., out result)`-style APIs meant to avoid exceptions and the associated handling, and switched a lot of internal code to use these functions. E.g. in the reference source of the net48 runtime I think there are still a lot of instances of
    try {
        var number = int.Parse(value);
    }
    catch {
        // slow path/error path
    }
instead of the new-idiomatic .netcore and later style of
    if (!int.TryParse(value, out var number)) {
        // slow path/error path
    }
try-catch was/is slow-ish, and throwing exceptions is too, aside from it preventing inlining by the JIT a lot of times.
And while #nullable (source annotations for what is nullable or not) and associated annotations such as MaybeNullWhen() had no direct influence on how the compiler could optimize, they probably helped people a lot in writing correct code, and as a side effect a lot more code became compile-time provably non-nullable, which enabled further optimizations, e.g. generating code that skips redundant null checks.
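A minimal sketch of those annotations, assuming a trivial cache type (the names are illustrative): with #nullable enable the caller gets flow-aware warnings, and [MaybeNullWhen(false)] tells the compiler the out value is only guaranteed non-null when the method returns true.

    #nullable enable
    using System.Collections.Generic;
    using System.Diagnostics.CodeAnalysis;

    public class UserCache
    {
        private readonly Dictionary<int, string> _names = new();

        public bool TryGetName(int id, [MaybeNullWhen(false)] out string name)
            => _names.TryGetValue(id, out name);
    }

    public static class Demo
    {
        public static string Describe(UserCache cache, int id)
            => cache.TryGetName(id, out var name)
                ? name              // no warning: name is known non-null on the true branch
                : "unknown user";
    }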
Right, this one has, a lot of other public or runtime-internal Try* methods have not.
And even tho this particular one has existed for a long time, that doesn't mean it was used consistently in the runtime or in the popular first and third party frameworks.
I'd argue the Try*-style, while artifacts of it were present before already, only really became widely idiomatic with dotnetcore.
Well, the Java approach is philosophically somewhat different. C#/.NET is closer to C++ where they are very willing to complicate the language and APIs to make the job of the runtime or compiler easier. Java just philosophically refuses to do that, more or less (perhaps you could argue that's changing a bit now with value types).
So in Java they just made exceptions really fast. There are lots of runtime optimizations around exceptions, for example, if you regularly parse strings that aren't numbers then the resulting exception will automatically stop having its stack trace filled out, which makes throwing drastically cheaper. The JVM can also inline the code and then convert try/catches to ordinary gotos.
Honestly I really like C#'s out var and return-success idiom. It's soooo ugly and yet slick at the same time. C does the same thing, but I think the inline out var declarations make a huge difference to using them. Of course you miss out on the error context an Exception or Result<T, Err> gives you, but for many of the Try* functions it really doesn't matter.
In a lot of cases Try* is the outright right approach, too. `dictionary.TryGetValue(key, out var value)` is better than `try { var value = dictionary[key]; } catch (KeyNotFoundException) {}` and doesn't have the double lookup/race of `if (dictionary.ContainsKey(key)) { var value = dictionary[key]; }`.
Try* functions are still free to throw in actually exceptional cases, just not on generic not-so-exceptional errors.
If you really need context, there is nothing stopping you from implementing Try* functions in your own APIs that either have another out param for the error information, return the error information instead of a bool, or use a Result<T, Err> kind of type (or a tuple).
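A sketch of that last option with made-up names: a Try* method that reports failure through its return value and surfaces error details via an extra out parameter instead of throwing.

    public readonly record struct ParseError(string Message);

    public static class OrderId
    {
        public static bool TryParse(string? text, out int id, out ParseError error)
        {
            if (int.TryParse(text, out id) && id > 0)
            {
                error = default;
                return true;
            }

            id = 0;
            error = new ParseError($"'{text}' is not a valid order id.");
            return false;
        }
    }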
.NET Core also introduced some new CLR features that are incompatible with the CLR used in .NET Framework. Span<T> for example.
This has happened before with .NET Framework 2, 3, 4 etc., but instead of making a .NET Framework 5 they rather made .NET Core cross-platform and threw backwards-compatibility-at-all-costs out of the window. While all .NET Framework applications (except the ones that do naughty stuff with reflection) that were compiled for .NET Framework 4.5 behave the same way on .NET 4.8, .NET (Core) got rid of this and lets developers bundle the CLR directly, giving them more leeway for incompatible changes.
(1) .NET Framework was slow and had some bad habits (e.g. heap allocations, reflection, few optimizations, etc.)... especially the web stack. .NET Core/.NET fixes that, issue by issue. And since .NET is historically very close to the underlying platforms, we now see competitive outcomes (vs. e.g. Go, C++, etc.).
(2) Performance = lower CPU/Memory Allocation = more throughput = lower Cloud costs. At scale, that makes a huge difference.
Software benchmarks are super subjective. Michael Larabel at Phoronix and Isaac Gouy of the benchmarks game have done a lot in this area. But everyone says you need to take it with a grain of salt (which is often true).
There's also TPC-C benchmark suites where people benchmark their own software and claim results. Not really independent journalists there.
No, they are not, but they are just a measurement tool, not a source of absolute truth. When I studied engineering at ETH we learned "Who measures measures rubbish!" ("Wer misst misst Mist!" in German). Every measurement has errors and being aware of these errors and coping with it is part of the engineering profession. The problem with programming language benchmarks is often that the goal is to win by all means; to compare as fairly and objectively as possible instead, there must be a set of suitable rules adhered to by all benchmark implementations. Such a set of rules is e.g. given for the Are-we-fast-yet suite (https://github.com/smarr/are-we-fast-yet).
It's subjective because it can't be used as a source of truth. Of course "I measured X and my results were Y using methodology Z" can be a statement of fact, but X and Z are where the subjectivity lies.
For example, benchmark game allows for warmups and so does awfy. This favors jits because it allows them to warm up when they would otherwise be slower. This might give the mistaken impression that java is a great choice for command line tools due to the performance characteristics.
In contrast, most benchmarks I've seen don't use profile-guided optimizations for C or C++. Hence the subjectivity.
And the claim of only wanting idiomatic code in awfy. This is, of course, subjective as well.
You could be correct. The point doesn't rely on what a specific benchmark collection in particular does but that there is an open discussion on what is appropriate in the context of what people using the results find important.
One SQL query going over the network will so dominate any micro-optimizations in the framework that it's a little silly for most of us to listen too closely when the ASP.NET team says they've sped up request processing another 40%. If JSON parsing request bodies and reading headers are significant, an API generally isn't doing very much.
I assume temp solutions and low hanging fruit was added in the move to .net core and the new compiler. Now that it's more stable, things can get tightened up.
Most of the performance gains are really in the middleware (for example Entity Framework) and getting rid of pre-.NET Core legacy cruft rather than the VM, AFAIK.
The speed-up very much depends on the (micro)benchmark in use. I did some measurements using the Are-we-fast-yet benchmark suite, which includes both micro and larger benchmarks, and got an overall speed-up (geometric mean of factors) of only 2% on x86 and even a slight slowdown on x64.
You're cross-compiling from the Oberon+ language to CLR IL bytecode using your own compiler... This isn't exactly something a lot of people would do. Most people would write more or less idiomatic C# and have the "official" compiler (Roslyn) produce the byte code.
What I am saying, I guess, is that I am not quite sure how much of your benchmark results come down to the quality of IL your custom compiler spits out.
As an F# developer, I find it a little frustrating when people assume .NET = C#. If the blog post is about the speed of IL generated by the latest C# compiler, it should say so in the title instead of claiming to measure the performance of .NET in general.
The runtime team certainly looks at discrepancies where C# and F# generate different IL that should still run about the same when JIT-compiled. So while C# is the main focus (also since the runtime libraries are written in C#), F# is not forgotten and benefits from a lot of those improvements as well.
> You're cross-compiling from the Oberon+ language to CLR IL bytecode ...
It's just "compiling", not "cross-compiling"; using CLR/CIL as a language backend is an intended feature, that's why the CLR and IL are standardized in ECMA-335 and ISO 23271, and that's why it is called "common language infrastructure".
> Most people would write more or less idiomatic C#
You are welcome to write a C# version of the benchmark.
> have the "official" compiler (Roslyn)
It's not the "official" compiler, but just the C# compiler implemented by MS and community; there are a lot of other compilers too.
You're arguing semantics. GP's point is that the compiler shipped with the platform may produce better byte code, which could have an effect on the benchmark results. This seems like a reasonable point to make.
Don't forget that IL is not executed, but is just an intermediate representation, and optimizations are done by the CLR; e.g. Mono does the following optimizations (according to e.g. https://man.archlinux.org/man/mono.1.en), regardless which compiler generated the IL:
branch Branch optimizations
cfold Constant folding
cmov Conditional moves [arch-dependency]
deadce Dead code elimination
consprop Constant propagation
copyprop Copy propagation
fcmov Fast x86 FP compares [arch-dependency]
float32 Perform 32-bit float arithmetic using 32-bit operations
gshared Enable generic code sharing.
inline Inline method calls
intrins Intrinsic method implementations
linears Linear scan global reg allocation
leaf Leaf procedures optimizations
loop Loop related optimizations
peephole Peephole postpass
precomp Precompile all methods before executing Main
sched Instruction scheduling
shared Emit per-domain code
sse2 SSE2 instructions on x86 [arch-dependency]
tailc Tail recursion and tail calls
You made a claim. Someone disputed the validity of your evidence. And your response is “well you can rewrite/replicate my entire project if you like”.
I think most people are going to assume your claim is bullshit and move on with their lives. You made the unconventional claim so the burden of proof is on you.
> You made the unconventional claim so the burden of proof is on you.
My assertion is supported by sufficient evidence. The criteria of scientificity are fulfilled. You can repeat the experiment on your system yourself if you wish. Under the referenced links you will find everything necessary to do so.
You're arguing a very specific subset, which is a completely different thing than what essentially every article on .NET 6 performance claims. The performance claims are almost always about the whole thing, including various parts of the framework, the standard library and lots of low-level optimizations.
Microsoft published an enormously long article detailing many of the optimizations that were done (https://devblogs.microsoft.com/dotnet/performance-improvemen...). And it is not very suprising that pure number-crunching benchmarks only using the .NET IL would not gain very much. As much as I hate to discuss what "real world" applications are, the claims Microsoft and others are focusing on are much more relevant for typical applications where .NET is used than your examples.
Nobody is arguing against the results that you got. The question is if the results are applicable to the wider ecosystem or if there is another confounding variable that explains the outcome. Your experiment hints in this direction, and maybe someone should create another one that teases this apart, but definitive arguments either way are premature.
No one is disputing the results of your test. The question is will those results be replicated under conditions that are relevant to people writing code in a mainstream language under a much more prevalent compiler?
The answer might be yes! Everyone should always be suspicious of microbenchmarks. However people are also wise to be suspicious of benchmarks in obscure languages.
Your results introduce too many new variables for anyone to be comfortable to use it as a data point to inform their decision making.
If you want to compare different .NET versions running natively on M1 such versions must be available as a precondition. If so, just download e.g. http://software.rochus-keller.ch/Are-we-fast-yet_CLI_2021-08..., update the included runtimeconfig.json file to the .NET version in use and run it (dotnet Main.exe).
We are testing .NET 6 now with a large (in LoC) monolithic ASP.NET system, and the results have indeed improved again. We already rewrote a lot to be more idiomatic when we moved to .NET 5, so I guess those things were optimized further. It is not a huge jump, but definitely nice work!
Edit: will try to post some numbers when all tests succeed; it is closed source, but for a large (millions of LoC) codebase I think it is nice to see how it performs under the same conditions compared to our current prod.
No. Reflection is a program accessing or modifying its own program structure. There's no need for it to be unstable; languages like Lisp, Java, and I assume C# have clearly defined semantics for it.
In practice there is though, it just depends on what you choose to take a dependency on.
For example, a few years ago the C# compiler did some lambda function optimization work. This broke someone's code because they were using reflection, and ultimately depended on how lambdas were getting optimized prior to the performance improvement in the compiler. The team by-designed that regression, since they make no guarantees that you can depend on a particular implementation detail of how the compiler optimizes things.
That said, when people use reflection in .NET, they're almost always programming against something that is stable and has likely worked the same way for a decade.
Also, I can't believe I didn't mention this already:
Reflection in .NET lets you dynamically invoke anything declared internal or private as well. I think it goes without saying that your code can be broken in the future if you do this.
You're confusing safety with stability. Reflection is necessarily unsafe, but not necessarily unstable. You just need to take care about keeping within bounds of guaranteed behavior.
E.g. in Java calling a method through reflection is guaranteed by the language to work. 100% stable forever.
Reflection also allows you to call internal JVM methods. This might or might not work depending on the JVM, making reflection an unsafe feature. It's still stable on the JVMs it works on, though.
Perhaps parent comment meant "unstable" in the sense that it turns compile time failures into runtime ones:
e.g. without reflection, if you type "customer.GetOrders()" then it either compiles or does not, whereas reflection code that finds a method called "GetOrders" can compile just fine but you won't know if it finds a method of that name or returns null, until runtime.
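A tiny sketch of that failure mode (Customer and GetOrders are hypothetical names): the string-based lookup compiles whether or not the method exists, so a rename only shows up when this code actually runs.

    using System.Reflection;

    var customer = new Customer();
    MethodInfo? method = typeof(Customer).GetMethod("GetOrders");
    var orders = method?.Invoke(customer, null);   // method is null if GetOrders was renamed

    public class Customer
    {
        public string[] GetOrders() => new[] { "order-1" };
    }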
How safe, audited and non-invasive is .net core by now? There's a .net program I have a VM for and that's kind of a pain. Since .net had telemetry by default, running bare on my machines was never an option, and Mono wouldn't even work.
.NET doesn't have telemetry. The .NET SDK does by default, but that's for developing, not running, .NET apps. You don't need to (and shouldn't) install the SDK on a production machine.
In other words, if you just downloaded the .NET or .NET Core runtime to host an app, there's no .NET telemetry.
As far as the .NET SDK, you can disable telemetry by setting the environment variable `DOTNET_CLI_TELEMETRY_OPTOUT` to `1` or `true`.
In which kind of world do you live that setting a single environment variable is too much technical work?[0] I have the feeling your post is more about shitting on .NET with a low effort excuse than genuine interest.
[0] set DOTNET_CLI_TELEMETRY_OPTOUT environment variable to 1 or true
Anyway, OP was worried about installing .NET because it has telemetry by default, meanwhile you can disable telemetry before running your war-app or just ship standalone? idk.
COM+ was basically Distributed COM, and was available for years before .NET. .NET Framework was implemented built on existing Win32 and COM/COM+ calls though, which is why you might see that.
You're both sort of right. There was something that was released under the name COM+, which was a bunch of services on top of DCOM. But what became the .NET CLR was also internally called COM+ (or part of COM+?) under development.
I think what happened might have been similar to what ended up happening with the .NET name later - there was a name associated with an umbrella strategy, a bunch of different technical components were under development associated with that strategy, but only some of them were released before the strategy changed again, while others were repurposed/repositioned to be part of the new strategy.
This is a common pattern with Microsoft product/feature naming and I think it's one of the reasons everyone including Microsoft developer relations people routinely comments that Microsoft is "not good at naming things". It's continuing now with UWP and WinRT, where those names are actually used to refer to a bunch of different things that were once part of a now-defunct Windows strategy - some of these things are now deprecated, while others (like the WinRT core language interop model) are still the basis of most new Windows API development, but this is very confusing to developers because of their association with the abandoned overall UWP strategy.
Unfortunately, the only thing that didn't die with the strategy was the deeply ingrained love for everything COM that the Windows team has, and they keep going at it without realising the rest of the world is done with COM, and we only endure it due to the lack of alternatives in Windows APIs.
If only they had kept the way .NET Native and C++/CX exposed COM, but that would be too easy for their ways, and those tools are now gone.
It was the year 2000; web services were the hype of the .com bubble, so Microsoft pushed web services hard. Hence all the ".NET" products: Windows Server .NET, Visual Studio .NET, .NET Framework, etc.
The library benchmarked in the article is Stl.Fusion: https://github.com/servicetitan/Stl.Fusion. I only learned about it today, and the documentation is a bit messy, but it seems to be a really interesting project. The author describes it as a .NET library to quickly develop efficient, distributed, real-time web applications.
Quite interesting, but I would like to also see other benchmarks as the author said that the speed is constrained by the DB, ORM and other stack choices.
I was so happy upgrading one of my apps to .NET 6: I saw perf gains from 10 minutes of execution time on .NET 4.8 down to 1 minute 30 seconds on .NET 6.
Then my boss reminded me that we had new hypervisors with SSDs (the old ones still had spinning platters), so now I'm not so sure the .NET 6 upgrade really made my app faster.
I've never had to use it in my code, but I've had plenty of problems with its use in applications. It always seemed to me that it was Microsoft's attempt to lock people into Windows forever.
> It always seemed to me that it was Microsoft's attempt to lock people into Windows forever.
It has had first-class support for Linux and MacOS for the past five years so that certainly isn't the case these days. I actually develop C# applications on Mac and run them in production on Ubuntu, no Windows involved in the toolchain anymore.
I have a Windows .exe program that requires .NET and I have yet to figure out how to run this in Wine. I tried installing various versions of .NET in Wine but it was a nightmare.
I also tried installing Mono (the .NET for Linux?), but it didn't help get the .exe running.
I am only locked into Windows by this one .exe that I need to run.
Is .NET relevant even on Windows these days? With MSFT's Linux embrace ever tightening, why would anyone without sunk costs invest in .NET when every alternative seems to be better (or quickly getting there)?
As a .NET veteran, if anything .NET is 10x more relevant now that it's cross-platform... you don't need Windows to develop, you can containerise your apps and run it anywhere, it's significantly better now than in the old days (anyone remember debugging GAC issues?)
C# is still one of the best languages I've used, which is the reason why I've kept at it for so long - e.g. it got async/await semantics in like 2012 (just after F# did). I'm about to switch jobs to a company that uses Typescript/Node after years in .NET and I feel like I'm going to miss quite a lot of the development experience. I'm not sure which alternatives are necessarily better but again I haven't spent a significant amount of time with, for example, Golang or Scala. Swift was kind of equivalent but (AFAIK) missing some features and the vastness of the nuget package ecosystem.
Yes it is, even with all the COM love, there were only three C++ talks on the VS2022 release party, .NET stole the show for everything else.
Most desktop applications targeted at Windows are written in .NET and C++ only comes into the picture via COM/DLLs, hardly anyone writes pure Windows applications in straight C or C++, unless we are talking about games.
Running on Linux doesn't tell us much in itself. Can it compete with long-established alternatives on that platform? Does it have adoption? I think the answer to both these questions is a strong no.
At Namely our payroll and benefits systems run on .NET in Linux containers on k8s talking to Postgres, SQL Server, Redis, and Kafka, and we integrate between these services and others via gRPC. Some of these services’ endpoints are exposed via GraphQL using a Node service running the Apollo server and some are called directly via gRPC by Go, Ruby, and Python services. All of this is on Linux in k8s. The .NET services integrate with the same ELK stack and things like Jaeger. It has adoption, it is competitive, and it integrates really nicely.
Check the TechEmpower benchmarks mentioned all over any .NET performance article. On Linux they beat nearly any other tech stack in the performance game. NFRs like debugging, monitoring, etc. are all there.
From my personal perspective, roughly 75% of all applicable (non-Windows-UI) .NET greenfield projects go straight to Linux. The brownfield/maintenance situation is surely different. Companies are not married to the Windows stack when you get the alternative for free. And R&D just follows that.
Those benchmarks are a joke. Did you look at the source code? Nobody is writing real applications like that. Most of them do a single select and push the results out to the client.
(FWIW, I spend about 40% of my working time dealing with dotnet).
I had to raise two kids so I couldn't find time to master both the Linux and Windows ecosystems so I chose the one they use at work. These days Microsoft has made me think more carefully about choosing platform and with their new proposals, Win11 and Edge The E-commerce Browser, I'm ready to jump onto Linux but how would I program that environment, when I have almost no experience in C? For me, the answer is .net.
Eh? You don't have to know C to develop for Linux. Most of us don't write C at all. I've written maybe a few thousand lines in all my life, the vast majority of them for the ESP32.
I think you misunderstood the parent commenter. He said that he hasn't programmed much in C, but that doesn't mean he hasn't done much programming altogether. He's wondering why you ever felt that programming in Linux meant you had to use C. Many of the Linux APIs are written in C but there are wrappers for countless other languages, no?
Many Linux people like to use Python, for instance. Unfortunately, it's a slow language for all sorts of incidental but hard-to-fix reasons. This has led people to look for alternatives. I think Go is increasingly used as an alternative, but it lacks features like operator overloading which are a necessity in certain areas. There's JVM languages like Java, Kotlin and Scala as well. Nim is up-and-coming. Some people use functional languages like Haskell and Common Lisp. C++ is widely used but not much loved.
Thx, yes I might have. I know enough about all of those languages you mentioned to realize that, as a C#er, I have it good. Java is a bit too weird for me, Python a bit too slow, C++ too hard, Haskell too functional ;)
I realize of course that not all Linux users are C programmers.
Given the opportunity to transfer my C# knowledge over to Linux, I'll take it.
Yes, I run it in production on Linux and develop on Mac and the experience is great. You would be correct making this statement 5 years ago before they re-wrote everything to be cross platform (that is what .NET Core is vs the old .NET)
Since we are on the topic of Linux: something exciting was merged for .NET 6 that wasn't talked about much; the new file API with support for symbolic links. That sounds a bit absurd, but it's been a long-standing issue with challenges you wouldn't expect.
If you're doing Windows development, you'd be foolish not to use dotnet. It's pretty good off Windows now too, with all the massive effort they've put into making it run on Linux better than Mono did over the past years.
I would say that you see it from different perspective.
It is Windows/Linux that are becoming irrelevant.
The future is about browser applications, mobile applications, and cloud workloads.
Yes, of course there are uses for desktop computing, but those are specialists, while the general public will be using phones and tablets, not even owning a laptop. On the phone/tablet, people don't even care what the OS is.
I already know people who don't have computers at home, only tablets/phones/gaming consoles. Normal people want to play games and message each other; no one cares about the OS, and Microsoft knows that.
Better how? There are people convinced that C is the best because it's the fastest, and if that's the only thing that matters to them, they aren't wrong.
But not even that is true. C is not any closer to the metal than C++ or a bunch of other languages that compile to native code; hell, it can't even do proper threading natively, nor SIMD, which is just a bunch of compiler-specific pragmas.
That’s exactly the issue I’m facing now. Have a team of .NET devs who feel like the walls are closing in because the corporate strategy is clearly elsewhere. My goal for now is to focus on moving off of Windows because that’s actually where all of our pain comes from.
I'm leaving my current workplace because we are making the switch the other way around. Having spent 3 years retooling everything for Docker and containers we just got a new boss that's slowly moving more in the Windows direction. Sometimes it's also just a case of being too entrenched. And then there are all the stakeholders that don't understand the tech but have way more say in what gets chosen than us techies.
[0] https://github.com/dotnet/BenchmarkDotNet
[1] https://github.com/microsoft/RoslynClrHeapAllocationAnalyzer
[2] https://github.com/jorisdebock/LoggerIsEnabled.Fody