Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Mersenne Twister PRNG for JavaScript (github.com/jawj)
10 points by gmac on May 26, 2024 | hide | past | favorite | 28 comments


I’m always surprised to see the disproportionate levels of attention the Mersenne Twister seems to get.

It’s not a particularly noteworthy PRNG. It’s not all that fast. It’s not simpler to implement. It’s not suitable for cryptography.

About the only interesting property I’m aware of is the comically large period, but 2^19937 is no more useful than 2^64 for almost all non-cryptographic workloads and 2^256 is thermodynamically sufficient until we start exploiting the multiverse. Past that is just wasting memory and CPU cycles to no meaningful end. Plus, having a large period doesn’t necessarily indicate quality.

So what makes this particular PRNG so weirdly sticky? Is it just the cool name?


yes! I'm gonna die on the hill called "it's only popular because of its cool name". Like a few other curious things in tech. Maybe I should make a site collecting them.

MTs state is also comically large, making it very slow to initialize - which has been a recurring source of performance problems in my professional life.


Came here to say that same thing. Why did you steal my thoughts? :D


It had been "good" when the paper had been published in 1997.


Best answer I can find is ‘nobody ever got fired for implementing the Mersenne Twister’.

I made this JS library a few years ago when I needed some reproducible random numbers, and I think I picked the Mersenne Twister over others just on the thinking that if it’s used by so many other languages and applications it can’t be too bad a choice.

Edit: it is also a cool name, but I think maybe ‘unremarkable’ is exactly what many people are looking for in a PRNG.


If this page is correct, the most important JS engines (V8, Firefox and Safari) all use xorshift128+ https://prng.di.unimi.it/#intro, which is faster than MT and doesn't fail the linear test of BigCrush but the last 32bits do fail other tests of BigCrush: https://lemire.me/blog/2017/09/08/the-xorshift128-random-num.... So, depending on your usage MT may (if you're using the last bit as a boolean) or may not be "better" than JS' Math.random, but of course, you can't set the seed by yourself.


> Edit: it is also a cool name, but I think maybe ‘unremarkable’ is exactly what many people are looking for in a PRNG.

Sure, but at that point you can pick a modern option (PCG, xoroshiro, whatever) which are implementable in four or five lines of straightforward C. They’re simple enough to memorize.

From a very reasonable perspective, MT is remarkable in terms of how large, slow, and complex it is. I think the most compact implementation I’ve seen is pushing 50 lines, with the only way to audit it being carefully walking through the spec and reference implementation.


> So what makes this particular PRNG so weirdly sticky?

Reproducibility, on both client and server side. Easy to implement in pretty much all scripting and programming languages.


It is dramatically harder to implement than most modern PRNGs. The PCG and xoroshiro+ families are like four or five lines of code. You could memorize them if you wanted to.

Mersenne Twister is anywhere from 50-250 lines and I doubt anyone could reproduce it accurately without consulting the spec or a reference implementation or both.


Reproducibility does not mean implementability.

What I meant with reproducibility is reproducibility of state, meaning that the seed produces the same result on both the frontend and backend side which doesn't mean that it is easy to do across different programming languages.

> 50-250 lines

Usually you try out a library or existing package first before you start to implement anything.

That's how you develop stuff outside the C++ world :) dismissal intended.

I think that MT19937 has a good balance between being easy to implement across various programming languages and type systems while also not necessarily being cryptographically secure. It's somewhere in the sweet spot of "good enough" for most use cases while also being a seed based randomizer.

I also would not assume that frontend developer jobs come with mandatory cryptography knowledge criteria, which implies that there must be an available implementation across a lot of programming language ecosystems to get the same traction.


> Reproducibility does not mean implementability.

You literally said it was “Easy to implement in pretty much all scripting and programming languages.”

Reproducibility from a seed is a basic feature of essentially every non-cryptographic PRNG. It’s so fundamental to the problem domain it’s hard to even call it a “feature”.

> Think of e.g. an online casino where every client can potentially cheat

The solution to this is a server-side CSPRNG, not hoping your client generates predictably random numbers. If no party can trust any other, there are other solutions, none of which involve synchronizing non-cryptographic PRNGs.

> I think that MT19937 has a good balance between being easy to implement across various programming languages and type systems while also not necessarily being cryptographically secure.

It isn’t, and this simply isn’t a reasonable or informed thought. It is strictly and inarguably more complicated by a factor of ten than other PRNGs which are also faster, use less memory, and generate more uniformly random output. MT has zero advantages over the alternatives. I don’t say this lightly. There is no “balance” it strikes. It is worse along all axes.

If you need a CSPRNG, use a CSPRNG. If you don’t, use something other than MT. Please just let MT die.


Maybe you should start implementing other PRNGs then across various package management systems and programming languages to promote your cause?

If you keep ignoring 90% of my opinion then I don't wanna discuss this with you anymore either. I never said MT is superior, it's what you implied.


Better PRNGs are widespread and available in literally every developer ecosystem. This isn’t the ‘90s.

If PCG isn’t available for yours, it is trivial to reimplement.

    typedef struct { uint64_t state;  uint64_t inc; } pcg32_random_t;
    
    uint32_t pcg32_random_r(pcg32_random_t* rng)
    {
        uint64_t oldstate = rng->state;
        // Advance internal state
        rng->state = oldstate \* 6364136223846793005ULL + (rng->inc|1);
        // Calculate output function (XSH RR), uses old state for max ILP
        uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
        uint32_t rot = oldstate >> 59u;
        return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
    }
That’s it. That’s the whole thing.


You are aware that you don't need to mansplain the implementation to me, right?

My argument, from the beginning, was about maintainability from a company's perspective.

Also you do realize I am not a user of MT, right?


> My argument, from the beginning, was about maintainability from a company's perspective.

Better mansplaining than gaslighting I suppose. Your argument, from the beginning, was (quoted in full): “Reproducibility, on both client and server side. Easy to implement in pretty much all scripting and programming languages.”

None of your actual original argument, nor what you’re now claiming it was, nor the ones inbetween hold water.

It’s reproducible like every non-CS PRNG. It’s not simple, or simpler than alternatives. It doesn’t strike a “balance” between simplicity and being cryptographic (which is frankly a really weird thing to say; being cryptographic isn’t a spectrum). It’s not CS, and so it competes against the modern alternatives which are strictly simpler.

I mean I’m genuinely sorry that that you’re misinformed here, but that’s on you to reconcile. What’s the point in dying on this hill?

> You are aware that you don't need to mansplain the implementation to me, right?

If we’re being blunt and rude, it seems pretty obvious you did need this.

> Also you do realize I am not a user of MT, right?

In what way is this relevant?


Seeing your comment has been edited (again) at least 10 times the last 15 minutes, changing your argument, structure and content a lot that you want to criticize...the only thing I have left for you is this:

If you learn to argue in echo chambers, you won't grow on a personal level. That's not how the real world works, and not in any job profession.

Nobody in their right mind will say "you won that argument" if the opponent posts the implementation code as an argument that is supposed to win over the "lack of maintainability" argument before that. If you don't understand why there are package managers in the first place, then maybe you should ask someone _why_ they are so successful?

I hope you learn to love the embedded development world, because you, right now, seem to be very unhappy in it. Otherwise you wouldn't be so enraged over a very simple argument like this one.

I wish you the best. Maybe at some point in the future you will learn to differ critique on a subject from critique on a personal level. This was never a personal attack, and I thought I made that very clear.

If you interpreted this as a personal attack, then I guess I am sorry? I won't change my previous comment because I think that is unfair behavior.

The argument MT vs PRNG XYZ will never be won with implementation code as "proof that it is really better". I am not sure you understand that.

There is no point in "dying on this hill". Opinions change over time, there is never a right or wrong, ever, which will persist. _Everything_ is a tradeoff.

If you still think that, you haven't been coding long enough to see how much worse your own code was when you were younger.

Implementations change, bugs are found, problems are discovered and fixed, knowledge changes. Just look at the PIKE/SIKE fiasco the last years. The only thing that persists is the constant maintenance burden.

And I don't want to have to maintain a randomizer, because it's the most unnecessary thing to maintain with the most collateral damage if it's ignored.


I’m not enraged, I’m just baffled that someone would come in to a conversation, state things that are wrong, then dig in their heels rather than say “ah yeah I see”. And then keep changing what their “original argument” was in a thread where you can literally scroll up to see what it was.

And sorry for the edits, it’s usually not a problem since the time delta between a reply and and it being read it is usually large. I’m just never satisfied with what I’ve written, so I edit until I am. Maybe day I’ll learn to do that before I post.

> This was never a personal attack, and I thought I made that very clear.

You were the one that brought a personal attack into the conversation. Sucks to be on the receiving end, yeah?

> The argument MT vs PRNG XYZ will never be won with implementation code as "proof that it is really better". I am not sure you understand that.

It sure as hell is a convincing argument that it’s simpler to implement than MT.

> If you still think that, you haven't been coding long enough to see how much worse your own code was when you were younger.

25 years in to my career, thanks. I regularly think this about code I wrote last week. I’m missing the relevance.

> knowledge changes

It does. I used to be a fan of MT in the nineties when I was young. It sounded cool and it wowed with a huge period stamped on the tin. I updated my knowledge as time passed.

> And I don't want to have to maintain a randomizer, because it's the most unnecessary thing to maintain with the most collateral damage if it's ignored.

Nobody is asking you to. You’re the one that jumped in to defend MT as a reasonable choice. It isn’t. Progress has happened and this hasn’t been true for a very long time.


what are the faster / more effective alternatives?


That depends on what your use case is, but because we're considering MT we can assume there's nothing remotely related to cryptography so there are plenty of options.

In practice people seem to have adopted the various xorshift style generators - certainly IIRC from back when I worked on JSC (and so also paid attention to other JS engines) we all adopted xorshift128+ for all unsafe RNGs (Math.random, but also internal entropy used in mitigations and similar), though the general family is "xorshift"[1]. There are a few others, and for me this was a decade+ ago so maybe there have been improvements since - xorshift does fail some of the diehard tests but so few as to be acceptable for the overwhelming majority of cases.

There days I believe that the PCG RNG[2] family might be a better choice? They claim to be both faster and statistically better, but I'm guessing not by enough for JS engines to care (RNG performance was a 10-20 years ago as the increasing perf of the rest of the engines meant the cost of Math.random() in benchmarks started being a problem lol - JSC at the time used two calls to the system arc4random(), which required call overhead and the system implementation required locking, but it was cryptographically secure! at least on Mac+iOS where it is not actually RC4).

Very very big NB: If you're wanting random numbers to feed into _any_ part of _anything_ cryptographic (keys, nonces, padding, ...) use the OS provided secure RNG, do not try and do it yourself - not just because of "it's real easy to get it wrong", but because the system APIs automatically leverage hardware RNGs automatically when possible (the semi-mythical "true random").

[1] https://en.wikipedia.org/wiki/Xorshift [2] https://www.pcg-random.org/index.html


Depends what you need and how/where you implement it. For example, if it should pass BigCrush https://en.m.wikipedia.org/wiki/TestU01 or PractRand https://github.com/MartyMacGyver/PractRand

Just a page with SmallCrush and BigCrush examples, not to be taken as an endorsement for their PRNG: https://www.pcg-random.org/statistical-tests.html


If you don't need cryptography secure rngs, PCG is pretty cool.

It's pretty straightforward and builds off basic primitives, but the results has some neat qualities/features if you find you need them. E.g. People often find that they don't want pure randomness, but rather want some kind of uniformity//equidistribution over some period, which implies state (I.e. Shuffling a music playlist).


This needs "It is high time we let go of the Mersenne Twister" https://arxiv.org/abs/1910.06437

    This paper surveys these results for the non-specialist, providing new, simple, understandable examples, and it is intended as a guide for the final user, or for language implementors, so that they can take an informed decision about whether to use the Mersenne Twister or not.


That’s incredibly helpful. I’ll link that from the README and think about implementing something better.


Maybe just add that the author is also the author of the xoroshiro* PRNGs, which have their own problems. So his endorsement of using these is to be talken with a grain of salt: https://www.sciencedirect.com/science/article/pii/S037704271...


Here is a paper about BigCrush, where its implementors test various known PRNGs: https://www.researchgate.net/publication/220492690_TestU01_A... The interesting part is "7. TESTING WIDELY USED RNGS", Mersenne Twister is "MT19937" in the table.


I've seen a Mersenne twister in the wild, implemented in Excel VBA. The small differences in numeric behaviour caused the author to re-implement basic numeric operations like additions.

It was used in a Mone Carlo simulation implemented on data in excel sheet. The thing was 4K lines of VBA and was extermely slow: It did its simulation over 8 hours for a few 100K iterations.

Someone had given it to a 'High performance excel developer' company (!) and they ran into their limits. I reimplemented it in single threaded C in the evening, and it did its thing in seconds. But that was not acceptable, as the end user could not adapt the VBA anymore.

So I dumped only the Mersenne twister in a DLL, imported that in VBA, and it gained 10% speed, basically all time spent in the RNG. I suggested moving other parts to C, but they didn't want to break ties with the excel programmers and nothing came from it.


Mersenne twister implementations have cost millions in theft, and will continue to.

https://milksad.info


Using any PRNG where you should be using a CSRNG will result in very bad things happening. A bit unfair to pick on the Mersenne Twister.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: