OK, owner of the repo here. This project was purely to show the macro differences between interpreted Ruby and compiled Crystal to beginner Ruby devs at a meetup.
I decided to add the top languages on GitHub to help give some idea of how Crystal performed against them.
I'm happy to see so much discussion about it and was taken by surprise when my inbox was full this morning. Thanks @anonfunction. ;-)
The breaking benchmark examples were just that, examples of how to break the benchmark. I didn't expect to get a memoized version of every language and really don't think comparing them in a performance benchmark makes much sense. Let me know if I'm wrong about that.
I am fine adding all of your change requests and will try to keep the benchmarks up to date.
Breaking benchmarks... Is that like the time I fiddled with the corresponding C program trying to get it to run at least in the same ballpark as the Rust version? That was until I determined that the Rust optimizer noted that the results of the 'meat' of the test were never used so it just optimized the whole thing away. :) (I thought it was a little disingenuous for the Rust folks to use this as an example of how performant Rust was.)
I used to benchmark using Fibonacci, but the recursive method is just awful because it does a lot of function calls and you're essentially benchmarking that. Then I switched to finding primes using the Sieve of Sundaram. It uses arrays, hashes/maps/dicts and two loops. It also wastes a lot of memory if you don't split your search domain into ranges. The surprise was that Go and D (in that order) turned out to be faster than Rust, mainly due to Rust's HashMap SipHash algorithm. I gave up trying to use other hash libraries (SeaHash specifically) that are not part of the standard lib, because it was quite frustrating compared to D.
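(For reference, a rough Python sketch of the Sieve of Sundaram idea, not my actual benchmark code; `sundaram_primes` is just an illustrative name. The trick is to mark every i + j + 2ij, and every unmarked m then yields the odd prime 2m + 1.)

def sundaram_primes(limit):
    """Sieve of Sundaram sketch: primes up to `limit`."""
    if limit < 3:
        return [2] if limit >= 2 else []
    k = (limit - 1) // 2                       # odd candidates are 2m + 1 for m in 1..k
    marked = [False] * (k + 1)
    for i in range(1, k + 1):
        j = i
        while i + j + 2 * i * j <= k:
            marked[i + j + 2 * i * j] = True   # 2*(i + j + 2ij) + 1 is composite
            j += 1
    return [2] + [2 * m + 1 for m in range(1, k + 1) if not marked[m]]

print(sundaram_primes(50))   # [2, 3, 5, 7, 11, ..., 47]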
So... before looking at the code, I compared the elapsed time between the fib.rs program compiled with `rustc -O` and `rustc -O -C lto`. This yielded, interestingly, a ~500ms difference: ~6.45s vs ~6.95s (with LTO winning).
Then I looked at the code, and it began to make no sense. Then I looked at the assembly, and it made even less sense: the main function, fib::fib, which is where the vast majority of the time is spent, is identical, except for addresses.
I was starting to think, well, this might be related to instruction-cache lines... and it looks like it's not that:
$ perf stat ./fib-lto
2971215073
Performance counter stats for './fib':
6474.410894 task-clock (msec) # 1.000 CPUs utilized
17 context-switches # 0.003 K/sec
0 cpu-migrations # 0.000 K/sec
112 page-faults # 0.017 K/sec
22,607,598,238 cycles # 3.492 GHz
55,325,512,417 instructions # 2.45 insn per cycle
13,021,149,180 branches # 2011.171 M/sec
76,737,179 branch-misses # 0.59% of all branches
6.474991613 seconds time elapsed
6.474816000 seconds user
0.000000000 seconds sys
$ perf stat ./fib
2971215073
Performance counter stats for './fib':
6956.534790 task-clock (msec) # 1.000 CPUs utilized
11 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
113 page-faults # 0.016 K/sec
24,290,924,647 cycles # 3.492 GHz
55,325,864,841 instructions # 2.28 insn per cycle
13,021,213,404 branches # 1871.796 M/sec
84,392,454 branch-misses # 0.65% of all branches
6.957260487 seconds time elapsed
6.956974000 seconds user
0.000000000 seconds sys
I didn't know an address difference of 0x30 could influence the number of branch misses so much.
Interestingly, the C and C++ versions have different performance for the same reason: they produce the exact same machine code, at different addresses. Thus one runs faster than the other. Edit: actually, there isn't a difference in branch-misses for those... only in cycles, which is even more intriguing.
Another interesting fact: valgrind's branch simulator doesn't show a difference in branch mispredictions between the two (it also shows a misprediction rate much larger than reality; it's likely based on an old model. Edit: the valgrind manual says Cachegrind simulates branch predictors intended to be typical of mainstream desktop/server processors of around 2004).
More edit: I hacked a linker script to place the fib function from the C++ implementation at a fixed address, and tried different addresses, with interesting results:
0x10000: 5.117084095 seconds time elapsed, 17,866,364,962 cycles
0x10010: 5.206242916 seconds time elapsed, 18,090,712,907 cycles
0x10020: 6.096635146 seconds time elapsed, 21,285,484,583 cycles
0x10030: 4.955020420 seconds time elapsed, 17,297,162,836 cycles
0x10040: 5.146043048 seconds time elapsed, 17,954,722,919 cycles
0x10050: 5.252477508 seconds time elapsed, 18,335,804,193 cycles
0x10060: 6.100806292 seconds time elapsed, 21,300,089,284 cycles
0x10070: 4.936397948 seconds time elapsed, 17,216,051,020 cycles
Even more edit: I vaguely remember there was someone doing some analysis (with performance counters) of some similar performance difference depending where the function was located, that was on HN a few months ago, but I can't find it anymore.
Essentially, some of the loop heuristics work on a 32-byte window, not a 16-byte window. An interesting statistic would be to find where all the loops are in the code as relative offsets, and find how these shift based on starting alignment.
Edit: It's interesting to note that there's only one recursive call to the function, not two (you get two calls if you compile with -O1). It's also interesting to note that -Os generates an even smaller version with only one recursion, but that ends up slower (~6.5s).
The compiler has determined that the second call `fib(n - 2)` is "almost" tail-recursive (there is probably a better term for this), and so was able to convert that call into the loop you see at 10081 to 10095. This loop repeatedly calls the `fib(n - 1)` part.
In other words, the function was transformed from:
uint64_t fib2(uint64_t n) {
    if (n <= 1) return 1;
    uint64_t sum = 1;
    for (; n > 1; n -= 2) {
        sum += fib2(n - 1);
    }
    return sum;
}
Unfortunately, the result isn't too great as the initial part of the function is pretty heavy, and this bottlenecks computation. It would be better to inline some copies of fib into itself which might even allow some small amount of CSE, but importantly would reduce the number of calls.
The alignment you show is "special" (fast) because it's the point at which the top of the main loop (10081) is near the start of a 32-byte region, which allows multiple uops to be efficiently dispatched after jumping there.
It looks like gcc is going backwards on this one (at least in performance, the gcc-8 code is smaller): gcc-8, which generates the compact code you show, is the slowest of all the versions I tested (at -O3). Here's what I got:
gcc 4.8: 4.1 s
gcc 4.9: 4.3 s
gcc 5.5: 4.3 s
gcc 7.3: 4.8 s
gcc 8.1: 6.6 s
The difference between the compilers isn't just alignment: gcc-8 is generating very different code than the others, which are generally generating a large fib function with several calls of fib inlined into it, so there are many fewer calls (around 300k calls for gcc 4.8 through 7, but around 3 million for gcc 8).
Many versions of GCC do, including the version of 5.5 I tested above, but the mitigations are optional (enabled by specific command-line arguments) and off by default, so they don't impact this test.
Sorry, but this is not how you do numeric benchmarks. Most real programs that deal with numbers also manipulate data structures and call many standard library methods. If those things are slow, your app is slow. This benchmark doesn't demonstrate anything that would translate into real-life performance.
It's definitely not a realistic measure of overall language performance, but I think it certainly has value if taken with the right caveats. It's probably a reasonably good measure of function call overhead in different languages, and I think it's good for getting a better intuition about the order of magnitude difference to expect from, say, C vs JS (~3x slower than C in simple cases) vs Python (~100x slower than C).
> It's probably a reasonably good measure of function call overhead in different languages
It's actually not. Fibonacci has a near-tail call in it that can convert some of the recursion into a loop. Furthermore, the runtime is dominated by useless recomputation that can be handled by memoization. So the distinction between languages is going to be dominated by their ability to do some moderately complex optimizations rather than by any intrinsic performance characteristics of their implementation.
Moreover, because the code is so small, there are likely to be major side effects as a result of effectively random differences--consider the effects of code placement that cause a spurious difference of about 20% between the C and C++ kernels. The JS version is going to get hit with a deoptimization at the very end (it might not impact performance) because the final result does not fit in an int32_t, and suddenly fib is no longer a well-typed function.
Microbenchmarking is _hard_, and it is all too easy for the microbenchmark to cease measuring the things that you want to measure.
But each has to return to the original caller because a summation still has to happen. The actual tail call is to the +/2 function.
F(N) can't complete until F(N-1) and F(N-2) have completed so it can sum up the values. If you pass the earlier computation, F(N-1), down to F(N-2) as a second parameter, you could get a tail call on the last part.
fib(0, Val) -> Val + 1;
fib(1, Val) -> Val + 1;
fib(N, Val) ->
    T = fib(N-1, Val),
    fib(N-2, T).
(I think I wrote that correctly, can't test here.)
But that's a non-obvious transformation. I really would like to see the compiler that recognized that this was a valid (computational) equivalent to the original.
I was intrigued to learn more about the winning language Nim to see how it beats C/C++, so I did a web search and was confounded to see Nim compiles to C/C++! What gives? Starting to doubt the methodology somewhat, though it was an interesting read.
Languages without tail recursion will perform worse. Memoization is borderline cheating, because it's a different implementation.
Fib is the poster boy for tail recursion but the reason for that is that recursion to implement fib is simply a bad choice. It's cute but that's about it. If the point is to measure function call overhead, then measure _that_?
But recursive Fibonacci isn't tail recursive. The final function call is to `+` (addition), which means that the two recursive calls must each be put on the stack and later returned so the sum can be computed. Tail recursion requires that there is no final operation other than exactly a single recursive call.
It can be a gateway for learning interesting things about the mechanics of a language, its compilation, and in turn how to make similar cases in other languages faster/cleaner/more-secure. e.g. why is Nim so fast in this case? Is it tail call optimisation, not doing overflow checks, static inlining, or something else? Nim compiles to C, so it is especially odd.
Why would languages without tail recursion optimisation perform worse when all languages use the naive implementation? It's not tail recursive, so it shouldn't make a difference, right?
The benchmark includes a C++ constexpr version, at https://github.com/drujensen/fib/blob/master/fib-constexpr.c... , with timing numbers under the section "Optimized code that breaks the benchmark" and the comment "all benchmarks will have some caveat."
Somewhat off topic, but I love the fact that the first post regarding compile-time Fibonacci in C++ was submitted in 2000, another user expanded on the original in 2008, and now it's being usefully referenced in 2018.
I stumbled across Everything2 recently, but in many ways it strikes me as a tiny bit of the golden age of the internet, unexpectedly preserved.
It would be interesting to include skip[1] as it has language level memoization. There was a recent HN discussion about the language [2]. I do not think that the naive recursive Fibonacci is a useful benchmark in any way though, it's a way too pessimized implementation.
> It's available in Haskell as a first class citizen.
Not really. Haskell's laziness makes it easier to write functions that are memoized, but it does not automagically memoize functions. And how could it without incurring a non-trivial runtime space/time cost?
Haskell's purity is what makes it easier to write libraries that facilitate building memoized versions of functions in a transparent way (and that are obviously correct). For instance, I usually reach for [data-memocombinators][0], which happens to have a fibonacci example at the top of the docs:
import qualified Data.MemoCombinators as Memo
fib = Memo.integral fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' x = fib (x-1) + fib (x-2)
[0]: http://hackage.haskell.org/package/data-memocombinators-0.5.1/docs/Data-MemoCombinators.html
But is there any reason the Haskell compiler couldn't decide to memoize a function call itself if it determined that it would speed things up? To me that sounds hard for the compiler to do heuristically but a Haskell compiler should still have more scope to do optimizations like that than a C compiler has.
> I do not think that the naive recursive Fibonacci is a useful benchmark in any way though, it's a way too pessimized implementation.
probably, but I would still like to see more benchmarks that use recursion. It's a fundamental technique from functional programming and so few benchmarks bother with it at all.
Why do we need language level memoization again? In Python memoizing a function is just one line, `@lru_cache()`. I can't see why this needs an improvement by adding yet another thing to the core language.
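For reference, that one-liner looks roughly like this (a minimal sketch using Python 3's functools.lru_cache, with the benchmark's fib(0) = fib(1) = 1 convention):

from functools import lru_cache

@lru_cache(maxsize=None)        # cache every result; bare @lru_cache() keeps the last 128
def fib(n):
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(46))   # 2971215073, effectively instant instead of billions of recursive calls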
Just adding a @lru_cache() to a Python function isn't guaranteed to be correct; for example, if you're reading from an API the response could change between calls.
Skip tracks side effects, and will either (a) memoize automatically a pure function or (b) recognize that a function is impure and avoid the memoization.
Some time ago, Facebook introduced a JavaScript compiler which pre-compiled intermediate values. I think the point was to extend the spirit to backend apps. If memoization is first class, implementing pre-compilation should be a blast.
During my last technical interview, they asked me to write a Fibonacci implementation in Go. I wrote the code, then they asked me to test it with something like fib(180), and I was surprised how slow it was. Then I was asked to come up with an optimisation to make it fast.
I came up with exactly the same fib-mem.go provided in this benchmark.
I was hired!
I guess if the point is to optimize fibonacci then wouldn't the smartest move be to use the closed form solution where Fib(n) is the closest integer to phi^n / sqrt(5), where phi = (1 + sqrt(5))/2?
This is one of the examples in the beginning of SICP, if I remember correctly.
The identity involves going through real numbers to get the integer result, and if you're using floating point or fixed point calculations, you need a good amount of accuracy to get the correct integer out, which also won't be the fastest thing. Using https://en.wikipedia.org/wiki/Fibonacci_number#Matrix_form you can get a different way to calculate Fibonacci numbers recursively, which involves far fewer computations.
Fundamentally, the thing to do is to understand the N-th Fibonacci number as a + b, where φ^N = a + b * φ, where a and b are natural numbers and φ^2 = φ + 1.
It's not important to understand phi as a floating point value; we can just work in the abstract arithmetic where we've added to the natural numbers a new entity phi defined to satisfy φ^2 = φ + 1 (just like complex numbers are the abstract arithmetic where we've added to ordinary arithmetic an i defined to satisfy i^2 = -1).
Call these Fomplex numbers; a Fomplex number a + b * φ amounts to just a pair of natural numbers, and it's easy enough to add and multiply them with natural number arithmetic. To calculate the N-th Fibonacci number, just calculate φ^N and add together its coefficients, as noted. As for how to efficiently calculate φ^N, use the usual addition chain approach to exponentiation (e.g., "repeated squaring"), thus getting a result in Θ(log N) many additions and multiplications.
This is incidentally the same as the matrix approach, essentially, but perhaps a cleaner perspective on it; at any rate, it is a way of thinking which will serve as a useful tool in your back pocket for other general problems about linear recurrences.
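To make that concrete, here is a minimal Python sketch of the idea (the pair (a, b) stands for a + b*φ with φ² = φ + 1; the helper names are mine, and it uses the thread's f(0) = f(1) = 1 convention):

def mul(x, y):
    # (a + b*phi)(c + d*phi) = ac + (ad + bc)*phi + bd*phi^2
    #                        = (ac + bd) + (ad + bc + bd)*phi, since phi^2 = phi + 1
    a, b = x
    c, d = y
    return (a * c + b * d, a * d + b * c + b * d)

def phi_pow(n):
    """phi^n by repeated squaring: Theta(log n) multiplications."""
    result, base = (1, 0), (0, 1)        # 1 and phi
    while n:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return result

def fib(n):
    a, b = phi_pow(n)
    return a + b                          # phi^n = a + b*phi  =>  fib(n) = a + b

print(fib(46))      # 2971215073
print(fib(10))      # 89, i.e. the sequence 1, 1, 2, 3, 5, ...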
How neat! Among its properties not the least interesting is that it has finally convinced me that the initial condition f(0) = f(1) = 1 is not arbitrary but perfectly natural.
Ah, but actually that part is still a little arbitrary. This technique is fully general; variants describe ANY linear recurrence with ANY initial values. In particular, the Nth value of the Fibonacci-type sequence with 0th value x and 1th value y is xa + yb where a + bϕ = ϕ^N. I just happen to have chosen the weights x and y to both be 1 in that discussion.
Or you write an iterative version of it that will not only be the simplest solution, it will be fast enough to compute fib(10000). There are constant time ones IIRC, but if someone comes up with such a solution in an interview without knowing it from before, they are probably overqualified for any job that uses the Fibonacci sequence as an interview question.
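For instance, the obvious bottom-up loop is a few lines in Python and handles fib(10000) comfortably (a sketch, using the benchmark's fib(0) = fib(1) = 1 convention):

def fib(n):
    a, b = 1, 1                 # fib(0), fib(1)
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(46))                  # 2971215073
print(len(str(fib(10000))))     # ~2090 digits, still effectively instant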
Any matrix implementation (with fast exponentiation) is going to be much faster, even in a "slow" language. For instance, in Ruby, `Matrix[[1,1],[1,0]] ** 46` is instantaneous (real time: 0.000131s).
We have to go up to the ten-millionth term to get something that is not instantaneous (0.486s), for a result which has 2089876 digits.
Are you sure `Matrix[[1,1],[1,0]] ** 46` isn't optimized into a constant value by the bytecode compiler? Idk Ruby much, but it seems like a possibility.
Nice to see the different implementations side-by-side. I do believe however that this is mainly a benchmark of startup time and initial JIT of the CRT/runtime/virtual machine/execution environment.
If you remove that time from the equation, I expect that the execution time will not differ a lot from each other.
Of all the benchmarks out there this is actually reasonable (though obviously limited). It runs the same thing in different languages and the processing time is long enough for VM startup not to matter.
Also, the performance shows the bytecode languages C# and Java doing fairly well compared to C and Go.
That has mostly to do with the fact that the code isn't manipulating objects at all. As soon as you start pointer chasing in languages where almost everything is an object, like JavaScript, performance will suffer.
I thought so as well, and benchmarked the Julia version inside the REPL (after calling the function once to make sure JIT compilation was done), and the difference was insignificant (maybe 1 to 5 percent - basically same order of magnitude as the measurement error, ie usual run time fluctuations). So Julia runtime was still a bit less than 2x the C/C++ runtime (but faster than C/C++ without the -O3 switch).
This shouldn't be surprising though. The point of Julia is that compilation time stays relatively constant while runtimes can grow enormous very easily. Normally with microbenchmarks you run multiple times to get rid of compilation time, because microbenchmarks run in <1 second so compile time matters. But this shows that, in any case where the user does begin to care about speed, compilation time is really minimal. The only place where it truly matters in practice is in the REPL: it can cause a bit of lag which can get annoying, but it's a tradeoff.
rjmacmini:~$ sbcl
This is SBCL 1.4.2, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (defun fib (n)
    (declare (fixnum n)
             (optimize (speed 3) (debug 0) (safety 0)))
    (if (<= n 1)
        1
        (the fixnum
             (+ (fib (- n 1))
                (fib (- n 2))))))
FIB
* (time (fib 46))
Evaluation took:
14.957 seconds of real time
14.947616 seconds of total run time (14.934818 user, 0.012798 system)
99.94% CPU
38,799,794,836 processor cycles
0 bytes consed
2971215073
* (SAVE-LISP-AND-DIE "/tmp/fiblisp" :toplevel (lambda (&rest args) (print (fib 46))) :executable t)
; in: SAVE-LISP-AND-DIE "/tmp/fiblisp"
; (LAMBDA (&REST ARGS) (PRINT (FIB 46)))
;
; caught STYLE-WARNING:
; The variable ARGS is defined but never used.
;
; compilation unit finished
; caught 1 STYLE-WARNING condition
[undoing binding stack and other enclosing state... done]
[defragmenting immobile space... 643+15263+734+344+25029+16756 objects... done]
[saving current Lisp image into /tmp/fiblisp:
writing 0 bytes from the read-only space at 0x20000000
writing 848 bytes from the static space at 0x20100000
writing 1863680 bytes from the immobile space at 0x20300000
writing 11472480 bytes from the immobile space at 0x21b00000
writing 26542080 bytes from the dynamic space at 0x1000000000
done]
rjmacmini:~$ time /tmp/fiblisp
2971215073
real 0m14.785s
user 0m14.755s
sys 0m0.020s
rjmacmini:~$
I don't think adding debug/safety 0 is really worth it. In this:
(declaim (optimize speed)
         (ftype (function (fixnum) fixnum) fib))

(defun fib (n)
  (if (<= n 1)
      1
      (+ (fib (- n 1))
         (fib (- n 2)))))

(print (fib 46))
adding `(safety 0) (debug 0)` took the time from 13.17s (with just the `speed` declaration) down to 12.94s for me. Is a 2% speed increase really worth the danger of `(safety 0)`?
EDIT: After some disassembly and experimenting, it seems like what we see here is some clever unrolling and memoisation. If I change 46 from a constant to a variable, it becomes twice as slow. Plus there is this in disasm:
Microbenchmarks like this can be difficult to perform in practice, as gccgo can perform optimizations on pure functions that prevent Go's benchmarks from actually fully repeating a test. Also, this is a special case in which gccgo shines, specifically because it is completely cpu-bound. Generally, there isn't nearly as much of a performance difference.
~$ sbcl
This is SBCL 1.4.11, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (defun fibonacci-tail-recursive (n &optional (a 1) (b 1))
    (declare (optimize (speed 3) (safety 0) (debug 0))
             (type fixnum n a b))
    (if (< n 1)
        a
        (fibonacci-tail-recursive (- n 1) b (+ a b))))
FIBONACCI-TAIL-RECURSIVE
* (time (fibonacci-tail-recursive 46))
Evaluation took:
0.000 seconds of real time
0.000001 seconds of total run time (0.000001 user, 0.000000 system)
100.00% CPU
2,513 processor cycles
0 bytes consed
2971215073
*
Because Fortran was listed as x.xxx, I figured I'd try it.
Taking the minimum of 100 runs of gcc/gfortran, 200 runs of g++, and 4 runs of Julia 1.1-dev:
gcc: 3.4866
g++: 3.4428
gfortran: 3.4953
Julia: 8.0378
I was doing other things while the benchmarks ran, which added noise. For the first run of g++, the mean was slightly higher than C's mean, while the standard deviation was much higher than C or Fortran's. So I reran it for C++, and then decided to just report the minimums for everything instead, because the minimum is probably the least biased measure.
So long as there are no allocations that occasionally get cleaned up by a garbage collector. If there were, the GC cost should get amortized over all the runs that contributed to it. We don't have to worry about this here.
While the C and C++ assembly for the fib function was the same, Fortran's was different.
Julia's assembly was extremely brief in comparison, because it doesn't have any recursion optimizations -- it is just the check and then two calls to itself.
Yes, Julia doesn't do tail call optimizations (TCO) which it could do with LLVM. Just not enabled yet. I find it funny when people try to say that Julia's website benchmarks are cherrypicked to look good when the first example shows that it's tracking an optimization which it's missing...
One can improve the Java result by allowing the JIT compiler to inline the recursive calls.
-XX:MaxRecursiveInlineLevel=1 (default): 6.667 s
-XX:MaxRecursiveInlineLevel=2: 6.141 s
-XX:MaxRecursiveInlineLevel=3: 5.768 s
-XX:MaxRecursiveInlineLevel=4: 5.400 s
-XX:MaxRecursiveInlineLevel=5: 5.361 s
-XX:MaxRecursiveInlineLevel=6: 5.072 s
What’s the point of even mentioning the memoized versions? OF COURSE the naive recursive version is slow and it’s very easy to write something much faster in any language whatsoever.
I would submit an entry for my own project, Snabl [0]; but it intentionally doesn't support the algorithm since naive recursion beyond a few levels doesn't make much sense in practice. I added a separate keyword to force tail calls since that's usually what you want.
- I thought that in Julia, startup and compile time could be a factor, but they're pretty negligible (maybe 1 to 5% or so of the runtime, for Julia 0.7).
- Without the -O3 switch, the runtime more or less doubles for C and C++, making these languages slower than Swift, Go, Java, Dart, Julia, etc. That surprised me.
Yeah, startup and compilation time in Julia is not a big deal. One could even make the compiler work a bit harder, but you would have to ask a lawyer if this still counts as recursion ;-)
julia> function fib(::Val{n}) where n
           if n <= 1 return 1 end
           return fib(Val(n - 1)) + fib(Val(n - 2))
       end
julia> fib(n) = fib(Val(n))
julia> @time fib(46) # Compilation and execution
0.095409 seconds
As for me, these benchmarks are strange because there are solutions that are not optimized -- in the Node.js implementation the memoized approach has been used (which outperforms the recursion), but in the Java solution (as an example) the recursive approach has been implemented.
I think the intent is that the main implementations are those in files like `[Ff]ib.*`. There are additional memoized implementations for some languages, but presumably these are only used in the "Optimized code that breaks the benchmark" section.
Agreed. Everybody uses TCO in Elixir. It's the standard way to write servers and store state. So it makes sense to use it in the benchmark, or it will be very un-Elixir.
There's a PR there to do just that. I don't follow the reasoning for why it hasn't been accepted. It's also using Elixir script instead of precompiled Elixir.
The algorithm being benchmarked is the naive double-recursive fibonacci algorithm, which ends up performing over a billion addition operations in this case. The tail-recursive function above is the iterative bottom-up fibonacci algorithm that does it in about 50 addition operations. It's true that both use recursion, but the second is definitely not the tail-recursive version of the first.
This reduces the number of total function calls but not the number of additions (since it's not changing the algorithm, just the function invocation strategy when running the algorithm).
Something isn't quite right. Nim 'compiles' into C code ready for compilation. It should not be faster than the raw C and C++ implementation. Frankly even the C/C++ implementation shouldn't be so different.
I think Rust also does not have tail call optimisation. Nim and Crystal, I'd heard, rely on the underlying compilers for it, and could sometimes fail to get it (perhaps more often than you'd expect when using a compiler more "directly"?).
-- Memoized variant is near instant even after 10000
memoized_fib :: Int -> Integer
memoized_fib = (map fib [0 ..] !!)
  where fib 0 = 0
        fib 1 = 1
        fib n = memoized_fib (n-2) + memoized_fib (n-1)
or
fibM = \n -> values !! n
  where values = [fibAux m | m <- [0..]]
        fibAux n | n <= 1 = n
                 | otherwise = fibM (n-2) + fibM (n-1)
or
fibM2 :: Int -> Integer
fibM2 = \n -> values !! n
  where values = [fibAux m | m <- [0..]]
        fibAux 0 = 0
        fibAux 1 = 1
        fibAux n = fibM2 (n-2) + fibM2 (n-1)
Run the following at the ghci Haskell prompt:
memoized_fib 47
fibM 47
fibM2 47
If you want to wait try:
-- Traditional implementation of fibonacci, hangs after about 30
slow_fib :: Int -> Integer
slow_fib 0 = 0
slow_fib 1 = 1
slow_fib n = slow_fib (n-2) + slow_fib (n-1)
I'd be interested to see how Elixir does with an OTP-optimised redo. Probably still not great, but I feel like the example isn't playing to its strengths.
OTP allows for work to be distributed across cores and get a speedup in some way proportional to that. It wouldn't be a fair comparison though, exactly like memoisation or constexpr.
It's also an option that isn't unique to Elixir. Go is the obvious other candidate where the programming language helps, but all of the languages have support for parallelism.
OTP won't help. The benchmark itself is too synthetic to really matter as an analog for real world use cases where you would deploy Elixir (or Go or Swift for that matter, if we're being honest). Elixir cares about developer ease, high uptime, lots of connections (network I/O). Do you need these things? Then you shouldn't care about this benchmark.
That's obviously a better choice if you actually want Fibonacci numbers, but it defeats the point of a benchmark if you implement a completely different algorithm in one of the languages.
What is OTP? Fibonacci optimised by memoizing already calculated values is linear time, and constant time if you use the analytical formula for it. But then it stops being an interesting micro benchmark (as is pointed out on the website).
Since we’re talking about recursion, it would be neat to add benchmarks for languages that encourage recursion over iteration (not top 10, but could be fun):
You can't really compute it in constant time since the number of bits in the nth Fibonacci number is O(n), so you need to take at least that long just to write the result out.
Computing with Binet's formula is also rather tricky. You just need to round φ^n/√5, but how many bits of √5 do you need to use?
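As a rough illustration of how soon plain double precision gives out (a sketch in Python; the helper names are mine and the exact crossover depends on the platform's floating-point rounding):

from math import sqrt

def fib_exact(n):               # standard F(0) = 0, F(1) = 1 here
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_binet(n):
    phi = (1 + sqrt(5)) / 2
    return round(phi ** n / sqrt(5))

# Find the first n where 64-bit floats no longer give the right integer.
n = 0
while fib_binet(n) == fib_exact(n):
    n += 1
print("Binet's formula in 64-bit floats first disagrees at n =", n)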
This is bullshit.
IMHO, the recursive pattern is one of the worst constructs, this is really bad engineering and should not be used for benchmarking...
So you're saying never use recursion? Or never benchmark it? I don't get it, it's a tool; a pretty good tool when combined with tail call optimization. Yes, the algorithm they use is naive; I believe that's part of the point.
Where did you hear that? 3 hasn’t even been competitive vs 2 until very recently.
Reasons for the switch have always been better Unicode handling and not being left behind as the community marches on (and, soon, the lack of security updates for 2).
I'm just going to put this out there: this benchmark is silly, and only good for making languages with specific types of calling styles look good. It doesn't predict any real world performance. It doesn't speak to real world optimizer outcomes. It really just measures how shallow your call abstraction is. Several languages here perform much worse than equivalent code because the way the language interprets and executes naive recursion is different.
I've seen this game before, so the very first place I looked was the Haskell version. Sure enough, it doesn't even try to force the call graph at all. There's even a section entitled "breaks the benchmark", and it doesn't note that Haskell and other languages with different call semantics are not doing the same thing at all. It just says Haskell doesn't terminate.
Confusingly, there is then a "mem" benchmark which seems to ask "What happens if we do this with even the vaguest damn about algorithmic complexity?" But these are so all-over-the-map they don't even come close to measuring the same thing and have different memory usage.
What's frustrating about this is that the double-uncached is always the wrong way to write this code. It's never good, it really doesn't benchmark anything real world. It's not even a very good compiler benchmark because many optimizers actually go under the hood and rewrite code that is in this obvious style to be something else entirely, just to do better on these benchmarks.
For the Haskell benchmark as an example, the best way to write this lazily is to write a function that consumes and drops a list like so:
fibF n = head $ drop n fibs
  where
    fibs = 0 : 1 : next fibs
    next (first : rest) = (first + head rest) : next rest
-- Or using a library function most Haskell FP devs know.
fibZip :: Int -> Int
fibZip n = head $ drop n fibz
  where fibz = 0 : 1 : zipWith (+) fibz (tail fibz)
This is fast (it gets fused down to essentially a for loop in Haskell, and others like JavaScript & Clojure can use this technique for a memory-efficient approach), it's very straightforward, and it's also completely outside the world of something you can write naturally in C or Java (there is a natural Golang expression, but I don't see people using it).
This approach isn't even particularly fair to other languages that are good at producing performant machine code, like Rust.
These games are not really indicative of anything. They waste energy. Folks should understand what their language runtimes and compilers are capable of, rather than asking, "How well does this fare on the worst possible algorithm for an operation whose semantics are only loosely defined?"
I couldn't find a list of the hardware used in the benchmarks, so comparing is difficult, though before testing, I'd lean towards agreeing with you. Luajit is often on par with C, D or Go.
However, as C was one of the faster, I'll use it as a comparison.
fib.c, compiled with -O3: 10.49user 0.28system 0:11.62elapsed
#include <stdio.h>

long fib(long n) {
    if (n <= 1) return 1;
    return fib(n - 1) + fib(n - 2);
}

int main(void) {
    printf("%li\n", fib(46));
    return 0;
}
Lua:
function fib(n)
  if n <= 1 then
    return 1
  else
    return fib(n - 1) + fib(n - 2)
  end
end
print(fib(46))
Luajit: 48.66user 0.11system 0:52.95elapsed
Lua 5.3: 717.21user 8.40system 14:19.47elapsed
Luajit was so much slower than C for this, which can be somewhat surprising.
Luajit would probably beat Ruby for its interpreted crown, but without optimisation, it won't beat the big boys.
I would say, that Lua isn't a good fit for solving this kind of problem, with these constraints, because every function call requires a hash lookup, which is irritating.
Of course, you could use Luajit's FFI to use C's implementation, which would be somewhat faster. Or expose the C implementation as a Lua library.
However, Lua is probably also a really good fit for memoization, and other techniques like that.
local nums = {}
local fib
fib = function(n)
  if n <= 1 then
    return 1
  else
    if nums[n] then
      return nums[n]
    else
      nums[n] = fib(n - 1) + fib(n - 2)
      return nums[n]
    end
  end
end
print(fib(46))
This is a fairly naive implementation, but it has the same final result as the previous examples... And 'time' is unable to measure how fast it is (both Luajit and Lua 5.3). For all intents and purposes, it's instant.
Using generators in Python you can do this much much faster :D.
#!/usr/bin/env python
"""
Calculate fibonacci numbers using a generator

Usage: fibonacci.py [options] <N>

Arguments:
    <N>  Print all Fibonacci numbers up to <N>-th

Options:
    -h, --help  This help
"""
from docopt import docopt

def fibonacci():
    """generate fibonacci numbers"""
    a, b = 0, 1
    while 1:
        yield a
        a, b = b, a + b

if __name__ == '__main__':
    try:
        args = docopt(__doc__)
        fib = fibonacci()
        for i in range(int(args["<N>"])):
            print fib.next()
    except ValueError:
        print "You must specify valid integer"
    except KeyboardInterrupt:
        print "Good-bye"
I'm perfectly aware it's meant to be recursive, but it's also a completely pointless test. It's not tail recursive, so you are measuring function call overhead in various languages. Various languages have options to perform much faster anyway, but the solution is of course going to be language specific.