Sometimes, rewriting in another language works (espadrine.github.io)
56 points by espadrine on Jan 29, 2022 | hide | past | favorite | 46 comments


I'm amused that the author reached a satisfactory result of "only" 7 hours, only to later realize that they had forgotten to compile their Rust code with optimizations, which brought the runtime down to 30 minutes. I've never used Julia, but unoptimized Rust binaries contain such naive and suboptimal codegen that I assume something must have been wrong with their Julia code; my ballpark for the performance of unoptimized Rust code is that it's roughly on par with interpreted (not JITted) Ruby or Python (parallelism notwithstanding, which even without optimizations bears much more fruit in Rust than in Ruby/Python).

Still, at the same time, we can judge a language based on how easy it is to fall into the pit of success rather than the pit of failure; if their Julia code had a problem, then maybe the Julia developers could tweak some defaults (or change some documentation) so that people in the future do not run into the same problem. The same sort of thing exists in Rust as well, where I/O being unbuffered by default makes it perform poorly in microbenchmarks that involve printing a zillion times in a loop: https://nnethercote.github.io/perf-book/io.html#buffering


The biggest performance trap they had was copying all their strings in a really hot loop to a vector of characters. I'm not sure what we could do to steer people away from that...


Do you have any examples where unoptimized Rust is in the ballpark of interpreted Python? That seems extremely slow, even for carelessly written code.


I've been teaching people to use Rust since 2011. It is quite common for someone coming from an interpreted language background to not realize that they have to manually ask for optimizations when building their code. In such circumstances they often come asking for help saying "I must be doing something wrong, I rewrote this Python code in Rust and it appears to be slower somehow?" For a long time on the /r/rust sidebar we even had a note saying "before asking for help, have you tried compiling with --release?" And indeed, once they turn on optimizations their code goes from as slow as Python to as fast as C. The OP's case is completely representative; a 14x improvement merely by compiling in release mode is entirely in line with what I have observed.

This may be surprising to people coming from C or C++, where unoptimized code is slower, but not that slow. But the point of Rust is that it deliberately provides as many zero-cost abstractions as it can, which, when paired with a modern sufficiently-smart backend like LLVM, boil away into nothing. It's a great feeling the first time you crack open the assembly output for a highly abstract chain of iterators and closures, only to find that the whole shebang has been reduced to a handful of fixed-form arithmetic operations, without a loop or a function call in sight. But the tradeoff is that you do have to perform the step of boiling those abstractions away.


This is not a new problem, actually: when IBM built their PL.8 research compiler for RISC and safe systems programming, they took exactly the same approach as Rust, just in 1982.

It's quite an interesting paper, in case you haven't read it: "An Overview of the PL.8 Compiler"

https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.453...

Unfortunately, despite its internal success, they decided to go with AIX for RISC.


I don’t think there are C compilers out there that do not optimize at all.

AFAIK, all of them will do some register assignment, some constant folding, etc.

For example, if you write

  void f(int a)
  {
    int b = a;
    b = b + b;
    g(b);
  }
are there compilers out there that compile that to

  - load a
  - store into b
  - load b
  - load b into another register
  - add
  - store into b
  - load b
  - call g
  - return
?

Languages that do overflow and range checks such as Julia and Rust in debug mode would have to add lots of them there if they do not do any optimization.


I like Rust a lot but this is a big frustration with it. I have a hard time seriously imagining Rust taking off in game development when debug code is so bad, at least without some style guidelines that avoid the worst problems with it.

I'm glad Zig is providing some competition here that's more in line with what one would expect from C.


There is also a trick to compile your own program without optimizations, but optimize all dependencies.

I too had the "Un-optimized Rust code is slow!" experience when I started out, because my first project required PNG encoding and decoding, and the png crate was incredibly slow without optimizations.

But it's typical for C and C++ to be un-optimized at the level of `main`, but calling into heavily-optimized system libraries like ffmpeg.
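The trick mentioned above is a Cargo profile override; something along these lines in Cargo.toml keeps your own crate unoptimized for fast iteration while compiling every dependency (like the png crate) at full optimization:

```toml
# Debug builds of *your* code stay at the default opt-level,
# but all dependencies are compiled with -O3-equivalent optimizations.
[profile.dev.package."*"]
opt-level = 3
```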


One promising avenue of Rust development is that, while --release implies -O2 and the default debug profile implies -O0, the -O1 case hasn't received a lot of attention. In the future, it seems like the contexts you mention would benefit from such a sweet spot.


One of the early advent of code challenges triggered this situation for many people (including myself). The same program compiled in release mode was more than 4000x faster than debug mode.

https://media.discordapp.net/attachments/386246790565068811/...

Although it's worth mentioning that LLVM understands math and might be replacing an inefficient sum calculation with a closed-form expression, so it might not all be down to codegen.


Debug Rust code is still probably 10 times faster than CPython.


That’s true, I didn’t realize the size of the gap between debug and release builds in Rust. Most of what I do in Rust is not performance-sensitive to the extent that debug performance is an issue.

A tradeoff in the other direction, though, is compilation performance. Currently the master branch takes 15 minutes to compile in release mode. It’s probably due to the hardcoded root choices, and I assume compilation would be faster if they were loaded from a string instead of a structure, but it is a bit of a gotcha.


Surely you don't mean the code in the OP takes you 15 minutes to compile in release mode; it's a 100-line file with no dependencies, and no macros or generics to hide codegen behind, and can't take more than an instant to compile. To get to a 15-minute-long release build you'd either need a very large amount of code or you may be attempting extensive type-level programming that rustc was not designed to support. You may also want to try swapping out your linker; ditching the system linker for gold or lld can have dramatic results if you're bottlenecked on the link stage.


Not the code linked from the article; the goal of that code was to generate the scores of the guesses at the root of the search tree, so that I could hardcode them in.

The commit which hardcodes them in is the one that starts having slow compilation: https://github.com/espadrine/optimal-wordle/blob/2e71cb4ca46...

It is possible that there is a recommended way to do it differently which I missed. I tried lazy_static!, but ended up having to fight the type system, and the related GitHub issues didn’t give me hope that I could overcome it easily.


Interesting, you appear to be hitting some kind of pathological case with the `vec!` macro. Apparently it doesn't like being used with a 15,000-line literal. :P Fortunately you're right, there's a different way to do this, which AFAICT doesn't suffer from the same pathology. I replaced this:

    pub struct Choice {
        pub word: String,
        pub avg_remaining: f64,
    }

    pub fn root_choices() -> Vec<Choice> {
        vec![
            Choice { word: "roate".to_string(), avg_remaining: 60.42462203023758 },
with this:

    use std::borrow::Cow;

    #[derive(Clone)]
    pub struct Choice<'a> {
        pub word: Cow<'a, str>,
        pub avg_remaining: f64,
    }

    pub const ROOT_CHOICES: &[Choice] = &[
        Choice { word: Cow::Borrowed("roate"), avg_remaining: 60.42462203023758 },
and it brought the time of `cargo clean && cargo build --release` down from 345 seconds to 13 seconds. I consider this a compiler bug, I'll file it if I can't find an existing issue.


Oh, thanks! That is exactly what I wanted; a global constant makes more sense for this use-case.


> my ballpark for the performance of unoptimized Rust code is that it's roughly on-par with interpreted (not JITted) Ruby or Python

That's... very surprising, and it's hard to take this claim seriously unless you provide some justification for it.


The problem with the Julia version is just that it collect()s each word into a list of characters every time it calls match_constraint. Removing that one line makes the Julia version 5x faster:

  function match_constraint(word::String, constraint::Constraint)::Bool
    if constraint.type == has_no
      constraint.letter ∉ word
    elseif constraint.type == has_but_not_at
      (constraint.letter ∈ word) && (word[constraint.index] != constraint.letter)
    elseif constraint.type == has_at
      word[constraint.index] == constraint.letter
    end
  end

This is also what the Rust version does.


It does the same on line 22. But I'm not at a computer to see if addressing it there as well will have a comparable performance improvement.


That one doesn't matter nearly as much (I couldn't detect a performance difference). match_constraints isn't called nearly as frequently.


> I felt I could rewrite it in Rust within two hours.

I'm not sure what to conclude given the title of this post. Rewrites often improve things greatly. But a 2 hour time investment is not what anyone is talking about when they repeat mottos like "never rewrite."


Also, this is especially funny since I found the change for the Julia version that makes it 5x faster in approximately the time it took me to read the code.


That’s fair! It still doesn’t get near the ballpark performance I was going after, though.

It’s hard to assess to what extent the performance-tweaking vs. rewriting tradeoff scales up. For a larger project, the rewrite could take a month; but maybe, with diminishing returns, the performance tweaking would plateau short of a month.


https://discourse.julialang.org/t/rust-julia-comparison-post... seems to be about 2x faster than the (optimized) Rust version with some minor tweaks to the Julia code so perhaps that is more in your ballpark?


Absolutely, that is what I was hoping for!

So, vcat() instead of append!(). Darn. Faith in Julia restored. I’ll update the article.


I bet this guy really wants to know how to do it in Julia, and is just taking the "Linux sucks because you can't do X the way you can in Windows" route to getting the answer :)


There is a discussion here:

https://discourse.julialang.org/t/rust-julia-comparison-post...

So far, a bug fix improves it 4700 min -> 150 min, changing the type stored -> 20 min, further optimisations -> 15 min.


Ha, the best way to optimise your code is to post about how another language can do it faster.


Nim is a better fit for scientific computing, the reasons being excellent C interfacing and an easy-to-learn syntax. Fast, lean executables and compile times are a factor too. There is a community page with available projects: https://github.com/SciNim


The comments about V8 don't resonate with me. Sure, V8 is pretty damn fast for a Javascript VM, but that's a really low bar considering how utterly unoptimizable Javascript is. Overall, I have a lot of trouble believing that javascript would be a reasonable contender here compared to Julia. Language semantics matter.


Changing it to operate on &[u8; 5] instead of &str knocks about 60% off the runtime.


I like the ideas behind Julia, but the whole "Julia is performant" mantra without context has to stop.

Non-optimally written Julia code is as slow as non-optimally written code in any other language. Simply expecting code to be super optimized just because you wrote it in Julia instead of language X, while putting zero effort into taking advantage of Julia's advanced features that actually yield that performance, is just plain silly.


Obviously the meaning of "Julia is performant" is that Julia CAN be performant in a way that Python can't. Of course it doesn't mean Julia will write fast code for you; how could it possibly mean that?

Edit: This is like saying people have to stop saying Ferraris are fast, because you can still drive slowly in them


No it is not. It's like saying a Ferrari in third gear will be faster than a Fiat in third gear, because Ferrari. Clearly this is not the same as saying "therefore a Ferrari is as good as a Fiat", because obviously the Fiat only goes up to gear three whereas the Ferrari goes up to seven. But chant the "Ferraris are always faster" mantra without this context, and eventually all but your most technical users will be the kind who stick to the third gear they're used to, and still expect the Ferrari to be fast.

You want your Ferrari to go fast? Drive it like a Ferrari.

Also, Python absolutely can be as performant as Julia, if not more so. The point is that Julia has language-builtin features created with this performance in mind, whereas in Python you have to go explore the relevant package ecosystem and go down a numba/cython/pypy/etc. route.

And, yes, if you bragged how a Fiat could never beat your Ferrari in a race, and you agreed to race me against my modified Fiat with external turbo engines and I beat the crap out of your Ferrari, your "well that wasn't really a Fiat" complaint would be valid, but only so much. Especially if Fiats were super-moddable by nature.


Eagerly awaiting matrix libraries written in pure Python that outperform BLAS.



The fact that there is a competitive candidate replacement native to Julia (although a few man-years of engineering time wouldn't hurt) is telling.


"written in pure python" is just moving the goalposts.

There's no need for them to be "written in pure python". They just need to be seamlessly usable from within it.


Your argument is that “Python can call fast code written in other languages”? That is true, of course, but if that’s your definition of Python being fast, then it makes the notion of one language being faster than another meaningless.


You see, this kind of response is precisely the problem with the Julia community and the performance fallacies they present. What does this post have to do with Python? You claim to be the fastest language, and if someone questions it, you only mention how fast it can be compared to pure-Python loops. It's a Machiavellian promotion strategy that has so far worked for Julia, but it isn't going to work any longer.


Hmm, I would be interested to see where exactly it is that these Julia advocates have claimed it to be "the fastest language"


Start with the false claim of JuliaCon21 banner, "Julia is the fastest high-level programming language", then go to the Julia Discourse and read all the false reports on the internet claiming 3X, 300X, 3000X faster than Fortran/C/Python/... Having a business to run does not justify making false claims.


The JuliaCon 2021 banner can be found here (https://juliacon.org/2021/). It says "JuliaCon 2021 (with JuMP-dev) was online and virtual 28th to 30th of July, 2021. Check out all the recorded talks on Youtube". A google search query for the phrase "Julia is the fastest high-level programming language" finds no hits.

https://www.google.com/search?q=%22Julia+is+the+fastest+high...

Please have some intellectual honesty or share some links as apparently you know where to find some things that Google does not know about.


It turns out the claim is even worse than what I remembered. It claims to be the fastest high-performance language. What a shameful, Machiavellian act by Julia Computing, lacking intellectual honesty. https://www.linkedin.com/events/juliacon2021-freeandvirtual6...


Search the LinkedIn ads in summer 2021.


Well, the Julia version runs faster than the Rust one once the bugs were fixed, and a slightly more optimized Julia version finished in under a second.



