Note that most of the performance improvements come from PGO, which is enabled with the following environment variables. PGO is not enabled in .NET 6 by default, but will be in .NET 7 IIRC.
set DOTNET_ReadyToRun=0
set DOTNET_TieredPGO=1
set DOTNET_TC_QuickJitForLoops=1
Here are my own benchmarks from a CPU-intensive application without any IO that is already optimized for allocations. The application runs a task graph either serially or in parallel.
.NET 5
--------------------------
| Method | Mean |
|------------ |---------:|
| RunParallel | 473.4 us |
| Run | 513.5 us |
.NET 6
--------------------------
| Method | Mean |
|------------ |---------:|
| RunParallel | 452.5 us |
| Run | 499.8 us |
.NET 6 PGO
--------------------------
| Method | Mean |
|------------ |---------:|
| RunParallel | 381.8 us |
| Run | 412.2 us |
.NET 5 -> .NET 6: ~5%
.NET 5 -> .NET 6 PGO: ~20%
Here is what I learned from micro-optimizing a .NET application:
- Use BenchmarkDotNet[0] for general measurements and Visual Studio profiler tools for detailed inspection. They help a lot.
- Memory allocations matter. Using capturing lambdas, LINQ, or even foreach over interfaces introduces allocations and slows down the application. You can use ClrHeapAllocationAnalyzer[1] to find these hidden allocations.
- Using abstractions with interfaces and casting back to concrete types cause some overhead, though PGO will probably eliminate most of these.
- Use LINQ cautiously as its variants are mostly slower than explicit coding. E.g. .Any() vs .Count == 0
- Checking Logger.IsEnabled() before calling Logger.Debug() etc. helps a lot (see the sketch after this list). You can even automate this with Fody [2], but it breaks Edit&Continue and possibly .NET Hot Reload too, so it may hinder your productivity.
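A minimal sketch of the Count check and the IsEnabled guard from the list above, assuming a Microsoft.Extensions.Logging ILogger and an Order type that exist purely for illustration:

    using System.Collections.Generic;
    using Microsoft.Extensions.Logging;

    public record Order(int Id);

    public class OrderProcessor
    {
        private readonly ILogger<OrderProcessor> _logger;
        public OrderProcessor(ILogger<OrderProcessor> logger) => _logger = logger;

        public void Process(List<Order> orders)
        {
            // On a concrete collection, Count is a plain property read, while
            // Enumerable.Any() goes through the interface/enumerator machinery.
            if (orders.Count == 0) return;

            foreach (var order in orders)
            {
                // The IsEnabled guard avoids formatting the message and boxing
                // the arguments when Debug logging is switched off.
                if (_logger.IsEnabled(LogLevel.Debug))
                    _logger.LogDebug("Processing order {OrderId}", order.Id);

                // ... actual work ...
            }
        }
    }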
The Any() method does an extra null check and a cast to ICollection<T>, which incurs unnecessary overhead. Of course, this is at micro-optimization scale. If you do not call Any() on a hot path, it does not matter which one you use.
> - Use LINQ cautiously as its variants are mostly slower than explicit coding. E.g. .Any() vs .Count == 0
When using LINQ also be aware that .First(predicate) is significantly slower than .Where(predicate).First() when called on List<T> and T[]. This is true for essentially all methods like Last, Single, Count etc. Don't trust Visual Studio when it's telling you to "optimize" this.
But if you want the last bit of performance, you shouldn't use LINQ anyways.
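For anyone who wants to check the First(pred) vs Where(pred).First() claim on their own runtime, here is a minimal BenchmarkDotNet sketch (the list size and the searched value are arbitrary choices, not taken from the thread):

    using System.Collections.Generic;
    using System.Linq;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class FirstVsWhereFirst
    {
        // Worst case: the match is at the end, so the whole list is scanned.
        private readonly List<int> _items = Enumerable.Range(0, 100_000).ToList();

        [Benchmark(Baseline = true)]
        public int FirstWithPredicate() => _items.First(x => x == 99_999);

        [Benchmark]
        public int WhereThenFirst() => _items.Where(x => x == 99_999).First();
    }

    public static class Program
    {
        public static void Main() => BenchmarkRunner.Run<FirstVsWhereFirst>();
    }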
Not sure if it is still the case, but it used to be that First did a fairly naive foreach over the IEnumerable, while Where had several collection-specific type checks that allowed it to traverse the collection in more efficient ways.
LINQ is parsing a tree of System.Linq.Expression here and the cases of First(pred) etc. are just not optimized because of the added complexity with little benefit. It only recently became a problem when Visual Studio got a new built-in analyzer that tells people to "optimize" this.
As for BenchmarkDotNet, I totally agree with you in general - it's the best option available for micro-benchmarks. But if you want to run a benchmark involving a fairly complex interaction, multithreading, etc. (caching benchmark that I used is of this kind - it runs on client + server process, uses SQL Server hosted in Docker, etc.), it's rarely the best fit.
I use it for structured logging, which makes filtering and searching very convenient. E.g. I can filter by an object’s id and a property to see which tasks change the property of that specific object and in what order. Serilog[0] and Seq[1] are the best tools for this in my opinion.
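For illustration, a minimal Serilog + Seq setup of the kind described above (it assumes the Serilog and Serilog.Sinks.Seq packages and a local Seq instance on its default port; the task/property names are made up):

    using Serilog;

    Log.Logger = new LoggerConfiguration()
        .Enrich.FromLogContext()
        .WriteTo.Seq("http://localhost:5341")
        .CreateLogger();

    // Named placeholders become structured properties, so in Seq you can later
    // filter on ObjectId and Property to see which tasks changed them and in
    // what order.
    Log.Information("Task {TaskName} set {Property} on object {ObjectId} to {Value}",
        "RecalculatePrices", "UnitPrice", 42, 19.99);

    Log.CloseAndFlush();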
Hi there, the author of the original post is here. PGO is disabled in .NET 6 by default mainly because of trade-offs associated w/ the startup time - and IMO it's totally reasonable assuming .NET 6 brings decent speed benefits even w/o PGO.
I turned it on mostly to show what you can expect from a service that runs for a while (more than a few minutes?) in a typical server-side scenario after migration to .NET 6 - IMO it's totally reasonable to turn PGO on for nearly any service of this kind.
In the same vein as what Linus is trying to do now at LTT, it would be useful if there was a "programming journalism" lab which builds "test benches" to create canonical benchmarks and verify the claims made on tech blogs (etc.).
E.g., highly standardised Docker builds, on highly standardised hardware, running popular tasks using popular libraries for each "programming tech" (e.g., website, stat modelling, event system, ...).
It is hard to do relevant tests of which language is the fastest.
Really, writing fast code is mostly down to the programmer. For example, C is widely recognized as the fastest non-assembly language simply because it leaves a lot to the programmer; C won't magically make your terrible code fast, unless you are using time-to-segfault as a metric. Assembly is the fastest if you know what you are doing; very few know what they are doing.
So, what kind of code are you going to use for your benchmark? Highly optimized code written by experts spending way too much time, the "most idiomatic" code, code written by an average skilled programmer picked at random, code extracted from a big open source project? This can drastically change the ranking, so which one is the most relevant? If you go with the "most idiomatic" for instance, you miss out on the idea that parts can be optimized if needed, and that in real life, programmers aren't perfect and can write suboptimal code by mistake.
There is also a cultural aspect to languages that may not be caught in benchmarks. For example, C programmers tend to have a culture of performance: they tend to know their hardware, will try to save memory, make data structures efficient, etc. Python programmers, not so much; instead they tend to value readability and development time.
You can't test languages like you test CPUs for instance. With CPUs, you just run the same code on them and time them. You can't do that for obvious reasons: your C compiler won't accept your Python code, it is necessarily an apples to oranges comparison.
> Assembly is the fastest if you know what you are doing, very few know what they are doing.
Just a nitpick but for any reasonably sized code, no. While some people can indeed do impressive optimizations on small segments of assembly, they are humans and they will fail to do trivial optimizations that are reliably done by compilers.
If garbage collection happens on a separate thread, and makes allocation much faster, is it really “slower”? You have to call malloc, which will try to defragment your memory, and then later you will have to call free. Those block the calling thread; if anything, for certain problems they are slower than GC.
Garbage collection itself isn't really slow, per se. But allocating a lot of short-lived objects on the heap still means that a lot of objects have to be reclaimed fairly frequently. And at least .NET's garbage collector isn't able to do all that without pauses. And those add up.
But even if the GC is really concurrent: If there's a way of not doing that work it's still better, IMHO.
One interesting profile I've seen at work recently spent about 30% of its time creating objects, and another 35% in garbage collection (of pretty much the same objects that were being created all the time). So if there were a way of not allocating that much, or not doing it on the heap, the algorithm could be about twice as fast.
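One common way to cut that kind of allocation pressure is to reuse buffers instead of allocating per call. A minimal sketch, assuming the hot path needs a temporary byte buffer (all names here are illustrative):

    using System;
    using System.Buffers;

    public static class Transformer
    {
        public static int Checksum(ReadOnlySpan<byte> input)
        {
            // Rent from the shared pool instead of new byte[...]: the same
            // arrays get handed out again, so the GC sees far fewer
            // short-lived objects.
            byte[] buffer = ArrayPool<byte>.Shared.Rent(input.Length);
            try
            {
                input.CopyTo(buffer);
                int sum = 0;
                for (int i = 0; i < input.Length; i++) sum += buffer[i];
                return sum;
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(buffer);
            }
        }
    }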
But comparing it to not doing that work is somewhat dishonest — for that you'd compare a malloc for each call and a destructor at the end of the scope — and surprisingly, malloc will often do much worse than a good GC implementation, what with trying to defragment a bit, etc.
Also, Java can often accumulate big heaps because it only runs the GC when it absolutely must — as you mentioned, it would be unnecessary work otherwise. It might be interesting to mention that OpenJDK is the “greenest” out of the managed languages due to that.
This is so true. The fastest implementation I’ve ever seen of a priority queue in PHP looks nothing like a priority queue by taking advantage of PHP’s sparse hash maps (aka, arrays). If you use any “standard” implementation it will be slower. I imagine this is true of most optimized algorithms in most other languages.
Usually the setups and architectural choices are so different that it would make more sense for everyone to get a good stat on which functions they are calling and how much and to make a prediction on the individual stats of each function. This does not take into account caching and multithreaded scenarios, but the map can never be the territory.
What would be representative of real-world scenarios? I think their test suite covers a wide enough spectrum of operations to be able to draw conclusions from it.
Unfortunately, after looking at the .NET Core implementation of these benchmarks, I wouldn't trust them at all. The code is just overengineered to perform best at the benchmark - everything hardcoded, custom routing to cover 2-3 routes with minimal overhead, etc. It has nothing in common with real-world code.
The Techempower entries are heavily gaming the system.
They often strip out framework functionality and hyper-optimise for the specific benchmark, including things like pre-allocating the exact amount of memory needed to serve the request, not doing route matching at all, etc.
They are basically an exercise in "how clever can we be to win the benchmark" rather than a realistic portrait of real world performance.
At least for .NET, the versions that strip out framework functionality are marked separately, though this part is not that easy to understand if you don't know about it. There are several .NET entries, from very low-level without MVC and without an ORM up to the full stack.
But still, these benchmarks have their uses but there are a lot of caveats you need to consider when looking at the results.
I've seen the posts about all the speedups each new version of .NET gets and I'm just wondering, was .NET just alright in performance before all this? Is that the reason they can get all these speedups? :P
I'd be interested in some JVM vs .NET 6 benchmarks too, and in which platform to choose when.
EDIT: I know about the benchmarks and I also feel like sometimes these benchmarks are really optimized in a non-idiomatic way. I would love to know how performant idiomatic Java/.NET code is and, if one is to start a new project today, why one would choose the JVM over .NET or when someone would choose .NET over the JVM.
C# was never really slow in the same way as python, etc. And anyway 99% of the time you're gonna be slow because the sack of meat writing the program screwed up some aspect of the system design or the code. Unless you're using something really shit-tier for perf.
The gap closed significantly with .NET core, which is why everyone was quite surprised when .NET 5 (the next iteration) had a fairly significant speedup in many scenarios.
For reference, Stack Overflow was running on .NET MVC on like 2 servers pretty recently (with some auxiliary infrastructure for CDN and search) and using MS SQL. I think it might still be running on this setup but I'm not 100% sure. Honestly I have no idea how they do it on a .NET monolith but there you go.
Stack Overflow runs on 9 web servers with (iirc) 48 logical cores (2 x 12-core Xeons) and 64GB RAM. Those servers are shared by a few apps (Talent/Job, Ads, Chat, Stack Exchange/Overflow itself) but the main app uses, on average, ~5% CPU. Those machines handle roughly 5000 requests/sec and were running .NET 5 as of September 2021 (when I moved on). That’s backed by 2 v. large SQL clusters (each consisting of a primary read/write, a secondary read-only in the primary DC and a secondary in the failover DC). Most traffic to a question page directly hits SQL - cache hit ratio tends to be low so caching in Redis for those hits tends to be not useful. As somebody mentioned below, being just a single network hop away yields really low latency (~0.016ms in this case) - that certainly helps being able to scale on little hardware - typically only 10-20 concurrent requests would be running in any instance at any one time because the overall request end-to-end would take < 10ms to run.
Back in full framework days we had to do a fair bit of optimisation to get great performance out of .NET, but as of .NET Core 3.1 the framework _just gets out of the way_ - most memory dumps and profiling subsequent to that clearly pinpoint problem areas in your own app rather than being muddied by framework shenanigans.
Source: I used to work on the Platform Engineering team at Stack Overflow :)
That's surprising to read. Is that because of the sheer volume of question pages? I don't think I've ever been on an SO page that couldn't have been served straight from cache.
Is it? Most people come to SO from Googling their random tech problems/questions. Not sure how much value there is in caching my random Rails questions, etc
I would expect SO usage to follow a distribution like Zipf's — most visits hit a small subset of common Q/A, and there’s a ridiculously long tail of random questions getting a few visits where caching would do next to nothing. I’m fairly positive I’ve seen some post showing this was true for at least answer-point distributions.
Though I guess it’s possible for a power distribution for page-likely-to-be-hit to still be useless for caching, because I think you could still get that distribution if 99% of hits are on nearly-unique pages; with a long enough tail, you’d still have only relatively few pages worth bothering to cache, but by far most visits are in the tail
A poster above claimed the servers were “48 logical cores (2 x 12-core Xeons) and 64GB RAM”, which really isn’t what I would consider such a “beast” of a machine when the RAM is in laptop territory, and a modest number of cores for a server.
Nowadays you can purchase that machine on the second-hand market for something like $200-$300.
They are measurably faster than even contemporary laptops, though, plus you often get ECC RAM and RAID disk setups, and the good old Xeons didn't use to ramp up and down in speed; they just ran fast all the time. I'd still characterise that as a beast, especially on $/performance terms (although the power consumption is a worry).
Xeon covers a pretty wide variety of chips. Of course, the Pentium II Xeons didn't have speedstep either. 12-core tells us either fairly high end but older or kind of medium-low but newer. Dual socket tells us not the really low end Xeon that shares a socket (and probably a lot more) with the high end enthusiast desktop chips.
> And anyway 99% of the time you're gonna be slow because the sack of meat writing the program screwed up some aspect of the system design or the code. Unless
Or in my experience, most of the time a service's latency is dominated by out-of-process calls, e.g. the time taken to talk to other services over http, or to retrieve data from a data store. Speeding up the runtime is welcome, but even a massive 40% speedup of something that constitutes 10% of your total latency is ... closer to a 4% reduction in latency. Design matters more.
If your program is slow because it's waiting for external service response, you're doing programming wrong. Your program should do other work in the meantime. I guess you're already doing that, but if so, doesn't that invalidate your reply here?
> you're doing programming wrong. Your program should do other work in the meantime.
It's not always true that there is other work to do in the meantime. In fact in my experience, it seldom is. "you're doing programming wrong" is a very strong statement, and not one that I take seriously in this context.
Typically you "await" the external service response, so that it is not using a thread to do that, and "other work" in the form of starting to deal with other requests can happen in the meantime, thereby increasing service throughput.
But that won't speed up a given request - you can wait for an external service more efficiently, but you can't wait faster.
Services that do not depend on any other http services or any data store do happen, but they are rare in my experience (calculation engines, I suppose). So for almost every service, when thinking about response time, you have to, first and foremost, think about the latency of the data stores or upstream services.
Time taken by processing done by the application itself. Faster processing is good even if your backend is mostly waiting for other services - now it can wait for more services at the same time and deal with their output faster, allowing your app to use fewer resources or handle more traffic. That is especially pronounced at scale - if you're spending X million a month on computing time, reducing it by a few percent is very interesting to you.
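As an illustration of "wait for more services at the same time", a minimal sketch that overlaps two independent upstream calls (the URLs and types are made up):

    using System.Net.Http;
    using System.Text.Json;
    using System.Threading.Tasks;

    public class ProfileService
    {
        private readonly HttpClient _http = new();

        public async Task<(JsonDocument User, JsonDocument Orders)> LoadAsync(int userId)
        {
            // Start both requests before awaiting either, so the total wait is
            // roughly the slower of the two instead of their sum.
            Task<string> userTask = _http.GetStringAsync($"https://users.internal/{userId}");
            Task<string> ordersTask = _http.GetStringAsync($"https://orders.internal/by-user/{userId}");

            await Task.WhenAll(userTask, ordersTask);

            return (JsonDocument.Parse(userTask.Result), JsonDocument.Parse(ordersTask.Result));
        }
    }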
Both can be relevant. In the real world, designing how to deal with dependencies and data stores is clearly relevant, as it can be the largest large part of the time taken to respond to a given request. It would be a design error to ignore it.
If your job is to make performant services, in .NET (or in any similar language) external services are 100% relevant to your job.
If you want to be pedantic - and you most certainly do - then only .NET performance itself is relevant to the performance of .NET itself; that's a truism, by definition.
But this narrow focus is not useful - if you want to do the job, then you have to think a bit more widely, and understand what the real problem is.
Sorry, but this is not pedantic at all. You're saying the performance speedup is irrelevant because most time is spent waiting for external services... But that is simply not true if you're beyond just a few servers. The output of external services needs to be processed, and having faster software means fewer resources are spent per request, so more requests can be served with the same resources. That is very important.
I am not saying that; you are oversimplifying into a straw man. great-grand-parent comment is where I literally put a non-zero number to how relevant language perf improvement was: https://news.ycombinator.com/item?id=29295950 and later on "Both can be relevant"; which all contradict your characterisation.
The only person who said above "it's not relevant" is you, and you also said "external services are irrelevant" - You're projecting the "it's irrelevant" statement onto me here.
But the design considerations of how and when to use external services very definitely are relevant to service latency, contrary to what you say, for reasons given multiple times above. Your current odd comments are not fact-based or interesting, so I don't think that you have anything more to add to this discussion at all.
>If your program is slow because it's waiting for external service response, you're doing programming wrong. Your program should do other work in the meantime.
What work? Mine bitcoins while you wait for the result of an API call?
I think any sane person here assumes any IO, especially in .NET is already async and allows other requests to use that time… Most web apps are still IO constrained.
If you're asking "how is waiting for external requests relevant to my service's response time" the answer is "because it's usually the largest part". I really don't know how to explain it more simply than that.
If you need to improve this, and that code is .NET, then the solution is a different design, also in .NET code.
> Honestly I have no idea how they do it on a .NET monolith but there you go.
Low latency in a single rack can work wonders for performance. All those cloud services talk to each other over miles of cables and if you can slash latency to submillisecond regions, you get less wait time and free resources quicker. If you don't distribute your state across multiple Microservices, you can also save quite a bit of overhead.
Plus hardware is just wicked fast nowadays and SO has a model of millions of reads for a single write.
.NET has been faster than Java on most of the benchmarkgame benchmarks for a while, since .net core 3 or so.
More specifically though the JVM has tended to be better about optimizing naive code than .net while .net has tended to offer more tools to do your own optimizing (unsafe, simd, value types, etc). So it would be interesting to see if the performance of naive code has improved relative to Java lately
> .NET has been faster than Java on most of the benchmarkgame benchmarks for a while, since .net core 3 or so.
And which benchmark games are those? If I go to the TechEmpower benchmark and select only C# + Java, Java comes out on top in every individual category of all the benchmarks.
I'm not claiming that Java is faster than .NET. Just that I don't believe one platform is significantly faster than the other.
Such programs are often specially and painstakingly constructed to avoid all the commonly used language features that are inefficient. For example, in Java, user-defined data types are heap-allocated and generic code boxes everything, even primitive types (an ArrayList of ints unfortunately becomes an array of pointers).
Are these programs benchmarking typical idiomatic Java, or just some subset of the language?
I agree. That's really not a useful comparison. They should create categories for each benchmark, like:
- very naive code (shortest, most readable & easy to write code)
- idiomatic code
- optimized code without other-language-libs wrappers and without SIMD, single threaded
- optimized code without other-language-libs wrappers and without SIMD, multi-threaded
- optimized code without other-language-libs wrappers and with SIMD and/or multi-threaded
- optimized code with other-language-libs wrappers allowed and any other optimization technique
I agree it's not a useful comparison. That's why I don't give much weight to statements such as "Java comes on top in every individual category of all the benchmarks".
If you're also separately benchmarking the same C library running on its own, then it's quite interesting to benchmark a .NET wrapper around the exact same library, as it allows you to estimate the overhead from the .NET runtime itself as separated from user code (ideally you'd try this with a bunch of different C libraries).
Of course, the program should be very very clearly labelled accordingly. Since it was just labeled as "csharpcore", then I am inclined to think the submitter was treating the benchmark as a competition.
Please show the objective rules that could be used to identify "typical idiomatic Java" and "typical idiomatic C#".
Please show the objective rules to direct how the comparison should be done when one language's "typical idiomatic" is not the same as another language's "typical idiomatic" — to avoid "you can write Java in any language".
There's some low-hanging fruit, like not permitting specialized collections in Java for a set of integers. Because of type erasure, those integers are all heap-allocated in Java but not in .NET.
I think there's value in benchmarks showing both the fastest you can go if you need to (specializing everything to eke out max performance), and benchmarks showing how fast you will typically go if optimizing for productivity.
You could probably constrain it sufficiently for some set of problems. Maybe something like: Solve problem X using the standard library associative map by elaborating the following pseudo code.
Yes any benchmark will be invalid for some people, such is life. If you want to claim or know something specific you will have to so your own painstaking investigation or find someone who has done that work.
benchmarkgame does not attempt to compare idiomatic solutions for languages, it is closer to a “what is the best you can do” benchmark
> It is closer to a “what is the best you can do” benchmark
As I suspected. So of course this tells us very little about how fast idiomatic code is relative to other languages. "The best I can do" is to invoke hand optimised assembly language, but rarely is that the right choice.
A much more useful test would involve benchmarking some similar real world apps that solve the same problem.
Please show the objective rules that could be used to identify "idiomatic Java" and "idiomatic C#".
Please show the objective rules to direct how the comparison should be done when one language's "idiomatic" is not the same as another language's "idiomatic" — to avoid "you can write Java in any language".
You're looking at pretty old results, round 18 was in 2019. I also don't think that boutique web frameworks say much about the strength of the underlying language or runtime (e.g. look at just.js).
What in the Java world is in the same maturity tier as ASP.NET is open to opinion, but at least local Java devs seem to consider Spring or Micronaut as sane defaults, and of course modern ASP.NET runs circles around those.
Java's performance hasn't really mattered since Oracle took it over. There are things MUCH worse than poor performance, and being owned by Oracle is one of them.
OpenJDK has the same goddamn license as the Linux kernel. It is (yes, the open-source codebase) developed 98+% by Oracle alone, and the other vendors are just forks of this code base (including Oracle JDK, which contains only trivial changes AND a paid support option, for those that need it).
You can hate Oracle as much as you want, but their Java division is a surprisingly adept and capable team, doing a very good job of stewarding the language.
IMO that link makes .NET look very good. ASP.NET Core, the straight-off-the-shelf, obvious choice, is the best-performing .NET server? It beats Jetty and Spring but loses to a long tail of less popular frameworks.
That seems a bit misleading of a comparison IMO and only one case (JSON serialisation) when I look at their data. You are also linking to a round from two years ago which is out of date. It's also showing a lot of frameworks that are not that mature and not well used in the Java camp vs ASP.NET that is widely used, full featured, has a lot of bells and whistles and a lot of plugins available for most technologies and standards. All of which could have negatively influenced performance, even the hooks to allow them to be injected in can do so even if not enabled. The fact that a full featured web framework makes it close to the top (sometimes the top) over several rounds over many of their categories of use cases I can't discount as pretty good.
I.e. it's hard to read benchmarks without the context of each framework shown: the compromises it has made, how usable it actually is for building software vs just a benchmark, what shortcuts are taken in the benchmark, how idiomatic the code is, etc.
My personal experience having worked on both platforms for several years is that Java is easier to get to an acceptable performance, but the .NET runtime when you have to put the effort in has a higher upper bound of performance. It just has more tools in the CLR to work with than the JVM (e.g. value types, proper generics, spans, and more) so you can express something with a little more mechanical sympathy. Java is left with some decisions from legacy IMO that by default hurt its performance (i.e. lots of default boxing has hurt me before especially with generics). With .NET Core and future versions I think .NET is also taking up Java's default perf area as well. YMMV but if I'm worried about performance being a risk in my project .NET gives me more tools to optimize it IMO should that risk eventuate.
Well, the same goes for the TechEmpower benchmarks.
Sorry, but some of that stuff is as shady as the benchmarks game. I really don't get why people don't create 100% benchmarks instead of specialized ones.
>> Other readers may have a broader range of skills
Relax.
When comparing anything, the comparison has to be fair. This is obviously not the case here. It's like Java is a Ford Mustang from the dealer and C# is a Camaro with a F1 engine installed.
While I'm not super familiar with the Java world, none of the frameworks that have a significant advantage sound familiar to me - I'm not sure how mature they are, whereas ASP.NET is the solution for writing servers under .NET.
Lol look at the code. N-body for example, the C# is horrific C-in-C# code with a million optimizations (just read the comments lol), the Java code is idiomatic and not optimized at all.
Well the fastest C# entry for n-body looks like a translation of the C/C++ versions. It is a meaningful result that the primitives of the language allow for it to hang in that company. The absence of a Java version using numerics is unfortunate, that'd be a nice addition.
A lot of the coding style seems optimized around copy-pasting the C code, e.g. trying to alias the Vector methods (Vector256.Create) to their instruction name (_mm256_set1_pd). That makes the code non-idiomatic, but it also doesn't really help performance, just makes the porting easier.
The F# example is on the same runtime and a better view of using the numerics directly. As a trade-off of performance, memory, and code complexity it is actually a pretty solid balance, which I wouldn't have expected.
I can inline C code in Ruby, does that mean Ruby is as fast as C now?
I'd much rather see a comparison of idiomatic code in different languages. When I choose a language to build something in I'm not thinking "How can I write C in this language?"...
Inline C/ASM is literally a different language. System.Runtime.Intrinsics is a library. Just because this one benchmark used the library in a C-like style doesn't mean that's required.
There are many simpler implementations in all languages. I actually like that there are multiple implementations, as this lets us estimate the benefit and complexity of adopting various optimizations. Limiting the benchmark to naive implementations would penalize languages with more broad capabilities for optimization.
I'd personally prefer a benchmark limited to memory-safe implementations, though.
And they're slower than Java... Albeit super close. The parent I responded to was referencing how C# is supposedly faster than Java... But idiomatic C# is about the same (slightly slower).
You can do the same thing in Java, but apparently no one wrote it yet because it's just a silly micro-benchmark. Either way I'd rather see idiomatic code. Or just write C/C++ directly.
Well, .NET is going in the same direction as Java - as the top comment here observes, the speedups come mostly from profile-guided optimization, which has been Java's forte since Java 1.1 or so. And .NET PGO is still off by default, so it's got a long way to go yet.
Just to add, Java now has SIMD in the form of the incubating Vector API. I found that it is one of the best high-level low-level SIMD APIs, with automatic fallback to for loops on ineligible hardware, as well as having an option for preferred lane width.
.NET has the ability to handle your memory layout explicitly with structs.
They expanded on that functionality with Span<T> and made sure that the common libraries are implemented using it.
If you do textbook OOP development for everything, you will end up with a lot of allocations and what not, which was the case, so they went through the entire base class library and rewrote all often used methods to be faster.
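A minimal sketch of what that looks like in practice (the Sample type is made up): the structs live inline in one array, and the consuming code works over a span of them without any per-element heap objects:

    using System;

    public readonly struct Sample   // value type: stored inline, no heap object per element
    {
        public Sample(int id, double value) { Id = id; Value = value; }
        public int Id { get; }
        public double Value { get; }
    }

    public static class Stats
    {
        public static double Sum(ReadOnlySpan<Sample> samples)
        {
            double total = 0;
            foreach (ref readonly Sample s in samples)   // iterate by reference, no copies
                total += s.Value;
            return total;
        }

        // A Sample[] stores the structs contiguously; AsSpan gives a view over
        // that same memory rather than a new allocation.
        public static double Sum(Sample[] samples) => Sum(samples.AsSpan());
    }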
>If you do textbook OOP development for everything, you will end up with a lot of allocations and what not, which was the case, so they went through the entire base class library and rewrote all often used methods to be faster.
They forgot to tell the others that there is life beyond OOP and GoF design patterns.
C# is filled to the brim with functional programming features. Much of the base class library is somewhat functional (although obviously not all or even most, given the age of the BCL and stability of the API).
> was .NET just alright in performance before all this
For some niche applications (i.e. financial exchanges), .NET 5 [was/is] arguably the fastest way to implement certain ring buffer abstractions because of its interesting blend of performance and safety. There is a variant of the LMAX Disruptor developed for .NET which leverages the value semantics of the C# struct to push things beyond what the Java implementations are capable of [0].
Certainly, with enough resources and manual memory management, you could best the C# implementations using a C/C++/ASM codebase, but this is a tenuous tradeoff with practical risks that must be accounted for.
Mark and sweep garbage collection is optimal for some kinds of multi-threaded algorithms. If further minimised with judicious use of value types, it can be surprisingly difficult to outperform it even with carefully tuned C++ or Rust code.
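Not the actual Disruptor.NET API, but a toy sketch of why struct value semantics help here: the ring's slots are structs in one contiguous array, claimed and written by reference, so the publish path allocates nothing. (Real implementations add the producer/consumer sequencing and coordination that is omitted here; all names are illustrative.)

    public struct TradeEvent
    {
        public long Id;
        public double Price;
        public int Quantity;
    }

    public sealed class StructRingBuffer
    {
        private readonly TradeEvent[] _slots;
        private long _next;

        // Capacity must be a power of two so the index can be masked.
        public StructRingBuffer(int capacity) => _slots = new TradeEvent[capacity];

        // Hand out the next slot by reference: the caller writes straight into
        // the array, with no boxing and no allocation on the hot path.
        public ref TradeEvent Claim()
        {
            long sequence = _next++;
            return ref _slots[sequence & (_slots.Length - 1)];
        }
    }

Usage would look like `ref var e = ref buffer.Claim(); e.Id = 1; e.Price = 99.5;`.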
It’s not the same, but there is this well-known framework benchmark [0], it always had the .net frameworks close to the top.
I’m guessing a lot of the speedups come from getting rid of legacy cruft. With .net core/.NET 5/6 they got rid of a lot of things compared to .NET Framework 4.8 and could play with optimizations that simply weren’t doable before. That’s just me guessing, though ;)
It's that in part. Here are some of my additional observations and or guesses.
They invested a lot of time adding language features with compiler and runtime support to avoid e.g. heap allocations/copying, like Span<> and friends, (readonly) ref structs, in/ref/out parameters (ref and out parameters existed before but were used a lot less in the runtime), or ValueTasks to some degree. This in turn enabled a lot of potential for optimizations in the compiler (aside from essentially writing an entirely new bytecode compiler with Roslyn and entirely new JIT with RyuJIT, throwing out the crufty old compilers), in the general runtime, and in the specific runtimes/frameworks e.g. ASP.NET. Those optimizations have to be implemented first however, and more and more get implemented with each new version.
I have a project I maintain that sees an almost 50% speedup from net48 to net5, and another 10-15% speedup from net5 to net6 (based on the time it takes to run the extensive test suite). It isn't even that compute heavy. From profiling it appears that a lot of these speedups are due to internal copies of data being avoided, and a lot of additional fast-paths in the runtime (e.g. fast-paths for byte-arrays or character-arrays as opposed to taking the generic array slow paths).
Another thing of note is that they added a lot of `bool Try*(..., out result)`-style APIs meant to avoid exceptions and the associated handling, and switched a lot of internal code to use these functions. E.g. in the reference source of the net48 runtime I think there are still a lot of instances of
    try {
        var number = int.Parse(value);
    }
    catch {
        // slow path/error path
    }
instead of the new-idiomatic .netcore and later style of
    if (!int.TryParse(value, out var number)) {
        // slow path/error path
    }
try-catch was/is slow-ish, and throwing exceptions is too, aside from it preventing inlining by the JIT a lot of times.
And while #nullable (source annotations for what is nullable or not) and associated annotations such as MaybeNullWhen() had no direct influence on how the compiler could optimize, they probably helped people a lot in writing correct code, and as a side effect a lot more code became compile-time provably non-nullable, which enabled further optimizations, e.g. generating code that skips redundant null checks.
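A minimal sketch of those annotations, assuming a trivial cache type (the names are illustrative): with #nullable enable the caller gets flow-aware warnings, and [MaybeNullWhen(false)] tells the compiler the out value is only guaranteed non-null when the method returns true.

    #nullable enable
    using System.Collections.Generic;
    using System.Diagnostics.CodeAnalysis;

    public class UserCache
    {
        private readonly Dictionary<int, string> _names = new();

        public bool TryGetName(int id, [MaybeNullWhen(false)] out string name)
            => _names.TryGetValue(id, out name);
    }

    public static class Demo
    {
        public static string Describe(UserCache cache, int id)
            => cache.TryGetName(id, out var name)
                ? name              // no warning: name is known non-null on the true branch
                : "unknown user";
    }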
Right, this one has, a lot of other public or runtime-internal Try* methods have not.
And even tho this particular one has existed for a long time, that doesn't mean it was used consistently in the runtime or in the popular first and third party frameworks.
I'd argue the Try*-style, while artifacts of it were present before already, only really became widely idiomatic with dotnetcore.
Well, the Java approach is philosophically somewhat different. C#/.NET is closer to C++ where they are very willing to complicate the language and APIs to make the job of the runtime or compiler easier. Java just philosophically refuses to do that, more or less (perhaps you could argue that's changing a bit now with value types).
So in Java they just made exceptions really fast. There are lots of runtime optimizations around exceptions, for example, if you regularly parse strings that aren't numbers then the resulting exception will automatically stop having its stack trace filled out, which makes throwing drastically cheaper. The JVM can also inline the code and then convert try/catches to ordinary gotos.
Honestly I really like C#'s out var and return-success idiom. It's soooo ugly and yet slick at the same time. C does the same thing, but I think the inline out var declarations make a huge difference to using them. Of course you miss out on the error context an Exception or Result<T, Err> gives you, but for many of the Try* functions it really doesn't matter.
In a lot of cases Try* is the outright right approach, too. `dictionary.TryGetValue(key, out var value)` is better than `try { var value = dictionary[key]; } catch (KeyNotFoundException) {}` and doesn't have the double lookup/race of `if (dictionary.ContainsKey(key)) { var value = dictionary[key]; }`.
Try* functions are still free to throw in actually exceptional cases, just not on generic not-so-exceptional errors.
If you really need context, there is nothing stopping you from implementing Try* functions in your own APIs that either have another out param for the error information, return the error information instead of a bool, or use a Result<T, Err> kind of type (or a tuple).
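A sketch of that last option with made-up names: a Try* method that reports failure through its return value and surfaces error details via an extra out parameter instead of throwing.

    public readonly record struct ParseError(string Message);

    public static class OrderId
    {
        public static bool TryParse(string? text, out int id, out ParseError error)
        {
            if (int.TryParse(text, out id) && id > 0)
            {
                error = default;
                return true;
            }

            id = 0;
            error = new ParseError($"'{text}' is not a valid order id.");
            return false;
        }
    }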
.NET Core also introduced some new CLR features that are incompatible with the CLR used in .NET Framework. Span<T> for example.
This has happened before with .NET Framework 2, 3, 4 etc., but instead of making a .NET Framework 5 they rather made .NET Core cross-platform and threw backwards-compatibility-at-all-costs out of the window. While all .NET Framework applications (except the ones that do naughty stuff with reflection) that were compiled for .NET Framework 4.5 behave the same way on .NET 4.8, .NET (Core) got rid of this and lets developers bundle the CLR directly, giving them more leeway for incompatible changes.
(1) .NET Framework was slow and had some bad habits (e.g. heap allocations, reflection, few optimizations, etc.)... especially the web stack. .NET Core/.NET fixes that, issue by issue. And since .NET is historically very close to the underlying platforms, we now see competitive outcomes (vs. e.g. Go, C++, etc.).
(2) Performance = lower CPU/Memory Allocation = more throughput = lower Cloud costs. At scale, that makes a huge difference.
Software benchmarks are super subjective. Michael Larabel at Phoronix and Isaac Gouy of the benchmarks game have done a lot in this area. But everyone says you need to take it with a grain of salt (which is often true).
There's also TPC-C benchmark suites where people benchmark their own software and claim results. Not really independent journalists there.
No, they are not, but they are just a measurement tool, not a source of absolute truth. When I studied engineering at ETH we learned "Who measures measures rubbish!" ("Wer misst misst Mist!" in German). Every measurement has errors and being aware of these errors and coping with it is part of the engineering profession. The problem with programming language benchmarks is often that the goal is to win by all means; to compare as fairly and objectively as possible instead, there must be a set of suitable rules adhered to by all benchmark implementations. Such a set of rules is e.g. given for the Are-we-fast-yet suite (https://github.com/smarr/are-we-fast-yet).
It's subjective because it can't be used as a source of truth. Of course "I measured X and my results were Y using methodology Z" can be a statement of fact, but X and Z are where the subjectivity lies.
For example, benchmark game allows for warmups and so does awfy. This favors jits because it allows them to warm up when they would otherwise be slower. This might give the mistaken impression that java is a great choice for command line tools due to the performance characteristics.
In contrast, most benchmarks I've seen don't use profile-guided optimizations for C or C++. Hence the subjectivity.
And the claim of only wanting idiomatic code in awfy. This is, of course, subjective as well.
You could be correct. The point doesn't rely on what a specific benchmark collection in particular does but that there is an open discussion on what is appropriate in the context of what people using the results find important.
One SQL query going over the network will so dominate any micro-optimizations in the framework that it's a little silly for most of us to listen too closely when the ASP.NET team says they've sped up request processing another 40%. If JSON parsing request bodies and reading headers are significant, an API generally isn't doing very much.
I assume temp solutions and low hanging fruit was added in the move to .net core and the new compiler. Now that it's more stable, things can get tightened up.
Most of the performance gains are really in the middleware (for example Entity Framework) and getting rid of pre-.NET Core legacy cruft rather than the VM, AFAIK.
The speed-up very much depends on the (micro)benchmark in use. I did some measurements using the Are-we-fast-yet benchmark suite, which includes both micro and larger benchmarks, and got an overall speed-up (geometric mean of factors) of only 2% on x86 and even a slight slowdown on x64.
You're cross-compiling from the Oberon+ language to CLR IL bytecode using your own compiler... This isn't exactly something a lot of people would do. Most people would write more or less idiomatic C# and have the "official" compiler (Roslyn) produce the byte code.
What I am saying, I guess, is that I am not quite sure how much of your benchmark results come down to the quality of IL your custom compiler spits out.
As an F# developer, I find it a little frustrating when people assume .NET = C#. If the blog post is about the speed of IL generated by the latest C# compiler, it should say so in the title instead of claiming to measure the performance of .NET in general.
The runtime team certainly looks at discrepancies where C# and F# generate different IL that should still run about the same when JIT-compiled. So while C# is the main focus (also since the runtime libraries are written in C#), F# is not forgotten and benefits from a lot of those improvements as well.
> You're cross-compiling from the Oberon+ language to CLR IL bytecode ...
It's just "compiling", not "cross-compiling"; using CLR/CIL as a language backend is an intended feature, that's why the CLR and IL are standardized in ECMA-335 and ISO 23271, and that's why it is called "common language infrastructure".
> Most people would write more or less idiomatic C#
You are welcome to write a C# version of the benchmark.
> have the "official" compiler (Roslyn)
It's not the "official" compiler, but just the C# compiler implemented by MS and community; there are a lot of other compilers too.
You're arguing semantics. GP's point is that the compiler shipped with the platform may produce better byte code, which could have an effect on the benchmark results. This seems like a reasonable point to make.
Don't forget that IL is not executed, but is just an intermediate representation, and optimizations are done by the CLR; e.g. Mono does the following optimizations (according to e.g. https://man.archlinux.org/man/mono.1.en), regardless which compiler generated the IL:
branch Branch optimizations
cfold Constant folding
cmov Conditional moves [arch-dependency]
deadce Dead code elimination
consprop Constant propagation
copyprop Copy propagation
fcmov Fast x86 FP compares [arch-dependency]
float32 Perform 32-bit float arithmetic using 32-bit operations
gshared Enable generic code sharing.
inline Inline method calls
intrins Intrinsic method implementations
linears Linear scan global reg allocation
leaf Leaf procedures optimizations
loop Loop related optimizations
peephole Peephole postpass
precomp Precompile all methods before executing Main
sched Instruction scheduling
shared Emit per-domain code
sse2 SSE2 instructions on x86 [arch-dependency]
tailc Tail recursion and tail calls
You made a claim. Someone disputed the validity of your evidence. And your response is “well you can rewrite/replicate my entire project if you like”.
I think most people are going to assume your claim is bullshit and move on with their lives. You made the unconventional claim so the burden of proof is on you.
> You made the unconventional claim so the burden of proof is on you.
My assertion is supported by sufficient evidence. The criteria of scientificity are fulfilled. You can repeat the experiment on your system yourself if you wish. Under the referenced links you will find everything necessary to do so.
You're arguing a very specific subset, which is a completely different thing than what essentially every article on .NET 6 performance claims. The performance claims are almost always about the whole thing, including various parts of the framework, the standard library and lots of low-level optimizations.
Microsoft published an enormously long article detailing many of the optimizations that were done (https://devblogs.microsoft.com/dotnet/performance-improvemen...). And it is not very suprising that pure number-crunching benchmarks only using the .NET IL would not gain very much. As much as I hate to discuss what "real world" applications are, the claims Microsoft and others are focusing on are much more relevant for typical applications where .NET is used than your examples.
Nobody is arguing against the results that you got. The question is if the results are applicable to the wider ecosystem or if there is another confounding variable that explains the outcome. Your experiment hints in this direction, and maybe someone should create another one that teases this apart, but definitive arguments either way are premature.
No one is disputing the results of your test. The question is will those results be replicated under conditions that are relevant to people writing code in a mainstream language under a much more prevalent compiler?
The answer might be yes! Everyone should always be suspicious of microbenchmarks. However people are also wise to be suspicious of benchmarks in obscure languages.
Your results introduce too many new variables for anyone to be comfortable to use it as a data point to inform their decision making.
If you want to compare different .NET versions running natively on M1 such versions must be available as a precondition. If so, just download e.g. http://software.rochus-keller.ch/Are-we-fast-yet_CLI_2021-08..., update the included runtimeconfig.json file to the .NET version in use and run it (dotnet Main.exe).
We are testing .NET 6 now with a large (in LoC) monolithic ASP.NET system, and the results have indeed improved again. We already rewrote a lot to be more idiomatic when we moved to .NET 5, so I guess those things were optimized further. It is not a huge jump, but definitely nice work!
Edit: will try to post some numbers when all tests succeed; it is closed source, but for a large (millions of LoC) codebase I think it is nice to see how it performs under the same conditions compared to our current prod.
No. Reflection is a program accessing or modifying its own program structure. There's no need for it to be unstable; languages like Lisp, Java, and I assume C# have clearly defined semantics for it.
In practice there is though, it just depends on what you choose to take a dependency on.
For example, a few years ago the C# compiler did some lambda function optimization work. This broke someone's code because they were using reflection, and ultimately depended on how lambdas were getting optimized prior to the performance improvement in the compiler. The team by-designed that regression, since they make no guarantees that you can depend on a particular implementation detail of how the compiler optimizes things.
That said, when people use reflection in .NET, they're almost always programming against something that is stable and has likely worked the same way for a decade.
Also, I can't believe I didn't mention this already:
Reflection in .NET lets you dynamically invoke anything declared internal or private as well. I think it goes without saying that your code can be broken in the future if you do this.
You're confusing safety with stability. Reflection is necessarily unsafe, but not necessarily unstable. You just need to take care about keeping within bounds of guaranteed behavior.
E.g. in Java calling a method through reflection is guaranteed by the language to work. 100% stable forever.
Reflection also allows you to call internal JVM methods. This might or might not work depending on the JVM, making reflection an unsafe feature. It's still stable on the JVMs it works on, though.
Perhaps parent comment meant "unstable" in the sense that it turns compile time failures into runtime ones:
e.g. without reflection, if you type "customer.GetOrders()" then it either compiles or does not, whereas reflection code that finds a method called "GetOrders" can compile just fine but you won't know if it finds a method of that name or returns null, until runtime.
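A tiny sketch of that failure mode (Customer and GetOrders are hypothetical names): the string-based lookup compiles whether or not the method exists, so a rename only shows up when this code actually runs.

    using System.Reflection;

    var customer = new Customer();
    MethodInfo? method = typeof(Customer).GetMethod("GetOrders");
    var orders = method?.Invoke(customer, null);   // method is null if GetOrders was renamed

    public class Customer
    {
        public string[] GetOrders() => new[] { "order-1" };
    }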
How safe, audited and non-invasive is .net core by now? There's a .net program I have a VM for and that's kind of a pain. Since .net had telemetry by default, running bare on my machines was never an option, and Mono wouldn't even work.
.NET doesn't have telemetry. The .NET SDK does by default, but that's for developing, not running, .NET apps. You don't need to (and shouldn't) install the SDK on a production machine.
In other words, if you just downloaded the .NET or .NET Core runtime to host an app, there's no .NET telemetry.
As far as the .NET SDK, you can disable telemetry by setting the environment variable `DOTNET_CLI_TELEMETRY_OPTOUT` to `1` or `true`.
In which kind of world do you live that setting a single environment variable is too much technical work?[0] I have the feeling your post is more about shitting on .NET with a low effort excuse than genuine interest.
[0] set DOTNET_CLI_TELEMETRY_OPTOUT environment variable to 1 or true
Anyway, OP was worried about installing .NET because it has telemetry by default, meanwhile you can disable telemetry before running your war-app or just ship standalone? idk.
COM+ was basically Distributed COM, and was available for years before .NET. .NET Framework was implemented built on existing Win32 and COM/COM+ calls though, which is why you might see that.
You're both sort of right. There was something that was released under the name COM+, which was a bunch of services on top of DCOM. But what became the .NET CLR was also internally called COM+ (or part of COM+?) under development.
I think what happened might have been similar to what ended up happening with the .NET name later - there was a name associated with an umbrella strategy, a bunch of different technical components were under development associated with that strategy, but only some of them were released before the strategy changed again, while others were repurposed/repositioned to be part of the new strategy.
This is a common pattern with Microsoft product/feature naming and I think it's one of the reasons everyone including Microsoft developer relations people routinely comments that Microsoft is "not good at naming things". It's continuing now with UWP and WinRT, where those names are actually used to refer to a bunch of different things that were once part of a now-defunct Windows strategy - some of these things are now deprecated, while others (like the WinRT core language interop model) are still the basis of most new Windows API development, but this is very confusing to developers because of their association with the abandoned overall UWP strategy.
Unfortunately, the only thing that didn't die with the strategy was the deeply ingrained love for everything COM that the Windows team has, and they keep going at it without realising the rest of the world is done with COM, and we only endure it due to the lack of alternatives in Windows APIs.
If only they had kept the way .NET Native and C++/CX exposed COM, but that would be too easy for their ways, and those tools are now gone.
It was the year 2000; web services were the hype of the .com bubble, so Microsoft pushed web services hard. Hence all the ".NET" products: Windows Server .NET, Visual Studio .NET, .NET Framework, etc.
The library benchmarked in the article is Stl.Fusion: https://github.com/servicetitan/Stl.Fusion. I only learned about it today, and the documentation is a bit messy, but it seems to be a really interesting project. The author describes it as a .NET library to quickly develop efficient, distributed, real-time web applications.
Quite interesting, but I would like to also see other benchmarks as the author said that the speed is constrained by the DB, ORM and other stack choices.
I was so happy upgrading one of my apps to .NET 6: I saw perf gains from 10 minutes of execution time on .NET 4.8 down to 1 minute 30 seconds on .NET 6.
Then my boss reminded me that we had new hypervisors with SSDs (the old ones still had spinning platters), so now I'm not so sure the .NET 6 upgrade really made my app faster.
I've never had to use it in my code, but I've had plenty of problems with its use in applications. It always seemed to me that it was Microsoft's attempt to lock people into Windows forever.
> It always seemed to me that it was Microsoft's attempt to lock people into Windows forever.
It has had first-class support for Linux and MacOS for the past five years so that certainly isn't the case these days. I actually develop C# applications on Mac and run them in production on Ubuntu, no Windows involved in the toolchain anymore.
I have a Windows .exe program that requires .NET and I have yet to figure out how to run this in Wine. I tried installing various versions of .NET in Wine but it was a nightmare.
I also tried installing Mono (the .NET for Linux?), but it didn't help get the .exe running.
I am only locked into Windows by this one .exe that I need to run.
Is .NET relevant even on Windows these days? With MSFT's Linux embrace ever tightening, why would anyone without sunk costs invest in .NET when every alternative seems to be better (or quickly getting there)?
As a .NET veteran, if anything .NET is 10x more relevant now that it's cross-platform... you don't need Windows to develop, you can containerise your apps and run it anywhere, it's significantly better now than in the old days (anyone remember debugging GAC issues?)
C# is still one of the best languages I've used, which is the reason why I've kept at it for so long - e.g. it got async/await semantics in like 2012 (just after F# did). I'm about to switch jobs to a company that uses Typescript/Node after years in .NET and I feel like I'm going to miss quite a lot of the development experience. I'm not sure which alternatives are necessarily better but again I haven't spent a significant amount of time with, for example, Golang or Scala. Swift was kind of equivalent but (AFAIK) missing some features and the vastness of the nuget package ecosystem.
Yes it is, even with all the COM love, there were only three C++ talks on the VS2022 release party, .NET stole the show for everything else.
Most desktop applications targeted at Windows are written in .NET and C++ only comes into the picture via COM/DLLs, hardly anyone writes pure Windows applications in straight C or C++, unless we are talking about games.
Running on Linux doesn't tell us much in itself. Can it compete with long-established alternatives on that platform? Does it have adoption? I think the answer to both these questions is a strong no.
At Namely our payroll and benefits systems run on .NET in Linux containers on k8s talking to Postgres, SQL Server, Redis, and Kafka, and we integrate between these services and others via gRPC. Some of these services’ endpoints are exposed via GraphQL using a Node service running the Apollo server and some are called directly via gRPC by Go, Ruby, and Python services. All of this is on Linux in k8s. The .NET services integrate with the same ELK stack and things like Jaeger. It has adoption, it is competitive, and it integrates really nicely.
Check the TechEmpower benchmarks mentioned all over any .NET performance article. On Linux they beat nearly any other tech stack in the performance game. NFRs like debugging, monitoring, etc. are all there.
From my personal perspective, roughly 75% of all applicable (non-Windows-UI) .NET greenfield projects go straight to Linux. The brownfield/maintenance situation is surely different. Companies are not married to the Windows stack when you get the alternative for free. And R&D just follows that.
Those benchmarks are a joke. Did you look at the source code? Nobody is writing real applications like that. Most of them do a single select and push the results out to the client.
(FWIW, I spend about 40% of my working time dealing with dotnet).
I had to raise two kids so I couldn't find time to master both the Linux and Windows ecosystems so I chose the one they use at work. These days Microsoft has made me think more carefully about choosing platform and with their new proposals, Win11 and Edge The E-commerce Browser, I'm ready to jump onto Linux but how would I program that environment, when I have almost no experience in C? For me, the answer is .net.
Eh? You don't have to know C to develop for Linux. Most of us don't write C at all. I've written maybe a few thousand lines in all my life, the vast majority of them for the ESP32.
I think you misunderstood the parent commenter. He said that he hasn't programmed much in C, but that doesn't mean he hasn't done much programming altogether. He's wondering why you ever felt that programming in Linux meant you had to use C. Many of the Linux APIs are written in C but there are wrappers for countless other languages, no?
Many Linux people like to use Python, for instance. Unfortunately, it's a slow language for all sorts of incidental but hard-to-fix reasons. This has led people to look for alternatives. I think Go is increasingly used as an alternative, but it lacks features like operator overloading which are a necessity in certain areas. There's JVM languages like Java, Kotlin and Scala as well. Nim is up-and-coming. Some people use functional languages like Haskell and Common Lisp. C++ is widely used but not much loved.
Thx, yes I might have. I know enough about all of those languages you mentioned to realize that, as a C#er, I have it good. Java is a bit too weird for me, Python a bit too slow, C++ too hard, Haskell too functional ;)
I realize of course that not all Linux users are C programmers.
Given the opportunity to transfer my C# knowledge over to Linux, I'll take it.
Yes, I run it in production on Linux and develop on Mac and the experience is great. You would be correct making this statement 5 years ago before they re-wrote everything to be cross platform (that is what .NET Core is vs the old .NET)
Since we are on the topic of Linux: something exciting was merged for .NET 6 that wasn't talked about much; the new file API with support for symbolic links. That sounds a bit absurd, but it's been a long-standing issue with challenges you wouldn't expect.
If you're doing Windows development, you'd be foolish not to use dotnet. It's pretty good off Windows now too, with all the massive effort they've put into making it run on Linux better than Mono did over the past years.
I would say that you see it from different perspective.
It is Windows/Linux that are becoming irrelevant.
The future is about browser applications, mobile applications, and cloud workloads.
Yes, of course there are uses for desktop computing, but those are specialists, while the general public will be using phones and tablets, not even owning a laptop. On the phone/tablet, people don't even care what the OS is.
I already know people who don't have computers at home, only tablets/phones/gaming consoles. Normal people want to play games and message each other; no one cares about the OS, and Microsoft knows that.
Better how? There are people convinced that C is the best because it's the fastest, and if that's the only thing that matters to them, they aren't wrong.
But not even that is true. C is not any closer to the metal than C++ or a bunch of other languages that compile to native code; hell, it can't even do proper threading natively, nor SIMD, which is just a bunch of compiler-specific pragmas.
That’s exactly the issue I’m facing now. Have a team of .NET devs who feel like the walls are closing in because the corporate strategy is clearly elsewhere. My goal for now is to focus on moving off of Windows because that’s actually where all of our pain comes from.
I'm leaving my current workplace because we are making the switch the other way around. Having spent 3 years retooling everything for Docker and containers we just got a new boss that's slowly moving more in the Windows direction. Sometimes it's also just a case of being too entrenched. And then there are all the stakeholders that don't understand the tech but have way more say in what gets chosen than us techies.
[0] https://github.com/dotnet/BenchmarkDotNet
[1] https://github.com/microsoft/RoslynClrHeapAllocationAnalyzer
[2] https://github.com/jorisdebock/LoggerIsEnabled.Fody