What I am big on is forcing developers to make deliberate choices. That's why I like React's policy of naming functionality "dangerouslySetInnerHTML" or "__SECRET_DOM_DO_NOT_USE_OR_YOU_WILL_BE_FIRED".
If you add usages for these in a PR I'm reviewing without justification, it's not getting merged.
So why not make cryptographically unsafe random unsafeRandom() or shittyRandom() or iCopyPastedThisFromStackOverflowRandom()?
This assumes writing crypto code is the most common use case for random numbers.
How often do you write crypto code?
vs
How often do people use random numbers + threshold for A/B tests? How often do game developers use random numbers for gameplay variety? How often is random used for animation variety? Do these use cases need the overhead of a cryptography RNG?
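For instance, the A/B-test case is typically deterministic per-user bucketing, where nothing adversarial is at stake. A minimal Python sketch (function name and threshold are made up):

```python
import random

def in_experiment(user_id: int, fraction: float = 0.5) -> bool:
    # Seed a cheap PRNG with the user ID so each user lands in a stable
    # bucket across sessions; unpredictability to an attacker is irrelevant.
    return random.Random(user_id).random() < fraction

enrolled = sum(in_experiment(uid, 0.5) for uid in range(10_000))
print(enrolled)  # roughly half the users
```

None of this needs a CSPRNG; it needs speed and reproducibility.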
A former employer had the same issue as in the article: the security team implemented an automated vulnerability scanner in our GitHub Enterprise instance, and it spammed comments and marked a review as requiring changes on any merge request that touched a file using java.util.Random. It lasted a day before the security team was made to turn it off, because on our team (and many others) literally zero uses of random numbers required a secure random.
Making one specific aspect of incompetent crypto coding “safer” doesn’t solve any of the problems.
We can argue about what threshold you need to reach to be considered “capable” of writing crypto code (that’s not just a learning exercise), but not knowing the deficiencies of random() is clearly well below that bar. Even knowing that is barely past the “don’t eat your crayons” level of skill.
If anything, a hook in your CI pipeline that automatically fires anybody who checks in code using random() in a cryptographic context is a better way to “make all of us safer”.
You could be a great mathematician and not realize what kind of random you're picking. It's not as simple as you put it.
This discussion goes for other stuff too. Rust is getting popular because even amazing C and C++ devs make terrible mistakes that cause severe security and privacy problems down the line.
Just because someone is a great mathematician does not mean they would be a great cryptographer. Mathematics and cryptography are about as related as computer science and cryptography (not that much).
Not sure what you have in mind when you say GUIDs, but the word "guid" doesn't convey any info here. Most people are referring to UUIDs when they say GUID, and a v4 UUID has 122 bits of randomness. A random 122-bit number can certainly be sufficient for most applications, like API keys over the network.
Some standard uuid libraries have weird not-very-random fallbacks when the RNG fails to initialize. It’s rare, but you can get strange stuff without realizing it. Better to use a real RNG and check error codes.
I genuinely don't see the reason why non-cryptographic random number generators exist outside of niche applications.
The main arguments I've seen are speed and determinism.
However, a cryptographically secure, deterministic PRNG can be built from hash or block cipher primitives that have hardware acceleration, making them quite fast. Seed (and potentially periodically re-seed) it from a strong source of randomness, and you've got a fast and cryptographically secure non-deterministic PRNG.
I thought that "classic" PRNGs like the widespread Mersenne Twister even had issues that can cause practical problems when used in certain kinds of simulations (Monte Carlo, possibly) that rely on large amounts of random numbers, but I haven't been able to find a clear source for this.
I'm certainly defaulting to secure ones, and I'm surprised modern languages and libraries don't do this by default for their standard randomness functions.
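As a rough sketch of that idea: a deterministic generator built on a hash primitive, seedable from a fixed value (for reproducibility) or from `os.urandom` (for unpredictability). This is stdlib-only illustration; real code should use an audited DRBG construction, not this toy:

```python
import hashlib

class HashDRBG:
    """Toy deterministic generator: SHA-256 in counter mode over a hashed seed."""
    def __init__(self, seed: bytes):
        self.key = hashlib.sha256(seed).digest()
        self.counter = 0
        self.buffer = b""

    def random_bytes(self, n: int) -> bytes:
        while len(self.buffer) < n:
            block = hashlib.sha256(
                self.key + self.counter.to_bytes(8, "big")
            ).digest()
            self.counter += 1
            self.buffer += block
        out, self.buffer = self.buffer[:n], self.buffer[n:]
        return out

# Fixed seed -> reproducible stream; os.urandom(32) as seed -> unpredictable one.
a = HashDRBG(b"fixed-seed")
b = HashDRBG(b"fixed-seed")
assert a.random_bytes(16) == b.random_bytes(16)
```

With hardware-accelerated SHA or AES primitives, this general shape is why a secure deterministic PRNG can still be fast.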
> I genuinely don't see the reason why non-cryptographic random number generators exist outside of niche applications.
Because for well over 99.99% of developers, cryptography is a “niche application”.
I’ve never written crypto code I’ve deployed anywhere. If I need crypto, I use the highest level crypto library I can find that people I trust who _do_ know about crypto recommend.
The only time I recall a non-cryptographic random() function surprising or affecting me was way back when I discovered that if you forgot to seed the random number generator on an Apple II, the games I wrote in BASIC all started out with the same "random" choices.
Cryptography is rare, but generating other data that needs to be unpredictable (e.g. session IDs, password reset tokens, random passwords getting generated, gift card codes, real-money gambling numbers) is quite common, I think.
And the default implementation of random() doesn't seem to be any faster than AES-CTR (which is the core of one form of secure PRNG).
It's true that most people who aren't doing sensitive work don't need cryptographically secure random number generators. But if you use something secure by default, it probably won't cause problems, and to the extent that it does, you can catch them with some profiling. If you use an insecure RNG by default, it probably won't cause problems either, but if it does, you'll find out when a black-hat hacker compromises your system in production.
Very few people set out to roll their own crypto. The issue in my experience is less about someone writing their own hand-optimized password hash function and more about people having overly-narrow views of what counts as security critical code.
The problem is the “banning” things as suggested in the article leads to bullshit like FIPS 140-2 mode.
So MD5 hashes are disabled in the system library even for applications where they present no meaningful risk, for example. The other issue is that bans usually come with lists of things that are and aren't banned. So now you are stuck waiting for some disinterested committee to approve something that delivers a benefit.
Use the minimal amount of resources and scale up. If you need mostly random don't force cryptographic level random. It adds unnecessary processor cycles and reduces speed.
I benchmarked it, and AES-CTR is faster than `random()` on a machine with AES-NI.
That's my main point: It does _not_ seem to be meaningfully more expensive to use cryptographic randomness. Yes, you could build a faster non-cryptographic PRNG, but that's not what is done by the default library.
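A rough way to sanity-check the relative cost on your own machine is to time the stdlib PRNG against the secure one. Note that in Python the interpreter overhead dominates, so treat the numbers as indicative only:

```python
import random
import secrets
import timeit

N = 100_000
# Mersenne Twister (insecure default) vs. the OS-backed CSPRNG.
t_insecure = timeit.timeit(lambda: random.getrandbits(64), number=N)
t_secure = timeit.timeit(lambda: secrets.randbits(64), number=N)
print(f"Mersenne Twister: {t_insecure:.3f}s  CSPRNG: {t_secure:.3f}s")
```

In lower-level languages with AES-NI, a userspace AES-CTR generator avoids the syscall per draw and closes the gap further.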
The most common case for me is when fuzzing, when I want reproducible random numbers, so it must be possible to initialize the RNG from a known fixed seed so that I can re-run and debug a failure.
Hence my suggestion to use a deterministic cryptographic generator: This way, you can either get "proper" randomness by seeding it with a random value, or predictable (and yet strong against anyone who doesn't know the seed) randomness by seeding with a fixed value.
> It lasted a day before the security team was made to turn it off, because on our team (and many others) literally zero uses of random numbers required a secure random.
Can concur: we're currently approaching 400k SLOC of C++ in the repo. A quick-and-dirty grep turns up a few dozen different places where random is needed. Literally 0% of it is for secure stuff. Most of it has to be as fast as possible (and can be very low quality, since it just needs to be noisy enough to look random to human perception).
This just kind of proves GP's point. Random APIs usually tell you what the RNG is, but not the why/how. Most people don't care if it's /dev/(u)random, Mersenne twister, PCG, LFSR, LCG, RDRAND, etc. They care about roughly 4 attributes:
- Is it good for crypto
- Is it fast
- Is it reproducible
- Is it portable
But fundamentally, it's about the use case and interface:
- I need secure random (strong, slow, secure)
- I need Monte Carlo (good enough, fast, reproducible)
- I need chaotic behavior for my game/stress test/back off protocol (usually can be barely random, fast, reproducible)
I think calling the last case InsecureRandom or RandomEnough is reasonable to convey "don't use me for secure purposes".
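In Python terms, the three use cases already map onto different standard-library tools; a sketch of the intent-based split:

```python
import random
import secrets

# 1. Secure random: strong, slower -- tokens, keys, anything adversarial.
session_token = secrets.token_urlsafe(32)

# 2. Monte Carlo: good enough, fast, reproducible from a fixed seed.
mc = random.Random(12345)
samples = [mc.random() for _ in range(1000)]

# 3. Chaotic-but-cheap: gameplay variety, jittered backoff, stress tests.
jitter = random.Random(98765).uniform(0.5, 1.5)
```

The names `secrets` vs. `random` do some of the "don't use me for secure purposes" signaling, though not as loudly as `InsecureRandom` would.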
Interestingly, a major aspect of video game speedrunning is figuring out how the game generates random numbers and then exploiting that knowledge. For example, speedrunners avoid all random battles in an RPG with this tactic. I'm not arguing games need true random, for the record.
What does that have to do with being verbose and letting developers know they are using an insecure method when it would apply to them?
If I'm writing code and using rng for gameplay variety, and then I notice that I have to use a function called "insecureRandom", at the very least I'm going to read up on an interesting aspect of computing and be a little more informed at the end of the day.
Because suitability for use in secure algorithms is just one property of the random number generator. At what point do we then decide it needs to be uniformInsecureBoundedRandom?
Why don't we apply the same logic to string comparisons. Should we replace String.equals with String.shortcuttableEquals()? Since there's plenty of circumstances where that is inappropriate for crypto uses also.
What about other functions with important caveats? Should we have mailGmailMightReject()? fsyncCantFixHardware()? file.existsAtCurrentInstant()?
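The string-comparison analogy is real, for what it's worth: Python already ships a separately named constant-time comparison for exactly this reason:

```python
import hmac

# Ordinary == bails out at the first differing byte, leaking timing info.
# hmac.compare_digest takes time independent of where the inputs differ.
assert hmac.compare_digest("secret-token", "secret-token")
assert not hmac.compare_digest("secret-token", "secret-tokeX")
```

The short, obvious name (`==`) stays fast and general; the security-sensitive variant gets the explicit name.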
The other thing is, true randomness doesn't seem random to humans, which is why Spotify and others had to modify shuffle. So true random might not be appropriate for the use case.
But I bet that the main goal is not "Let's make this change so it sounds more random, and then people will like it because they think it's randomer." Rather, it's "Let's make this change so it sounds better, and also as a side effect people might think it's more random."
The creation of a trueRandom function certainly seems to solve this problem more than taking away a useful tool for cases where pseudo-random is good enough.
It's really not clear cut in either way on the surface.
On one side, you can argue that leaning people towards true random will cause unnecessary performance impact because the majority of cases don't need true random.
On another side, the impact of not using true random could cause a catastrophic result for a large number of people.
So which has more weight? I dunno.
In either case, it would be nice if developers knew the consequences of using either method, so this discussion is really more about education than anything else.
>the impact of not using true random could cause a catastrophic result for a large number of people.
And the impact of using a 1000x slower trueRandom could cause catastrophic results for an even larger number of people, since PRNGs are by far most often used where speed is more important than security.
And once you pick a "true random", how true is it? Will it be secure in 10 years? Will we then need a "truerTrueRandom" to mitigate that true random has failed to pass future mathematical or hardware tests? Will it return random numbers fast enough for future uses?
It's a rabbit hole. Let developers use the one they need, and since the vast majority does not need secure random, don't force it on them at significant cost.
If your crypto developer cannot know which to use you're going to have a lot more holes in your crypto than the RNG.
Why not? I feel the same about naming a method to shame/discourage use as you suggested for outright banning. It shouldn't require a justification to use non-CSPRNG, because most use cases I run into for random are not crypto, because I don't write my own crypto.
It's also a good idea to give safer things shorter names.
So make random() a CSPRNG (and an alias for SecureRandom() for people who want to be explicit) while InsecureFastRandom() is just what it says and has no other name. Then if you really need performance over unpredictability, it's there, but nobody is confused about what they're getting. And lazy people who don't like to type or pay close attention get the safe one.
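A sketch of that naming scheme in Python (all names here are hypothetical, mirroring the suggestion above):

```python
import secrets
import random as _mt  # the stdlib Mersenne Twister, deliberately tucked away

def SecureRandom() -> float:
    """Cryptographically strong float in [0, 1)."""
    return secrets.randbits(53) / (1 << 53)

random = SecureRandom  # the short, lazy-friendly name is the safe one

def InsecureFastRandom() -> float:
    """Fast, predictable PRNG -- the name says exactly what you get."""
    return _mt.random()
```

The lazy path lands on the secure generator, and opting into speed-over-unpredictability requires typing the warning yourself.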
random() should be the most universally applicable random which includes making it as secure as possible. Non-universally applicable randoms should be named accordingly.
Actually, in my experience using the default random implementation in games:
* It's not fast enough.
* It has patterns that can be seen if you are using it to, for instance, generate 2d noise.
So for games you'd typically use, say, the Mersenne Twister [1], which is faster (amortized) and is distributed evenly across 623 dimensions. [2]
It's not cryptographic, but it's far better for games. If you're not going to have a crypto default random, better to at least have a really good and really fast one.
Mersenne Twister is the MD5 of random number generators: it's neither secure nor fast (and, unlike MD5, it's not space-efficient or simple to implement either). You can have something with better randomness that runs several times faster, uses less space, and has a much more compact and simple implementation.
So given that Mersenne Twister is worse in pretty much every way other than having a catchy name (likely the sole source of its continued popularity), it's probably not a good choice when replacing some "default" random number generator with something better.
I don't think better randomness than MT is as high a bar as you make it out to be, both links above also briefly mention statistical shortcomings of MT.
Patterned random numbers become a feature in some games, even if an unintentional one.
I used to be able to play about a dozen Pac-Man levels knowing exactly where every ghost was going to go and every bonus that was going to appear. And I wasn’t a very good player.
Pac-Man would have been a less fun game for many people if it used crypto grade random.
Pac-Man ghosts didn't actually move all that randomly; each ghost had its own pursuit strategy. You likely learned their movement patterns without realizing it.
Look up “password generator” or similar terms on npm and take a look at how the packages you find generate random numbers. I did this ~5 years ago and it took until the second page of results before I found any packages that used a crypto-secure rng.
Even that seems unlikely to be problematic for anything short of literally constant seeds AND a generator becoming extremely popular.
The vast majority of people reuse low-entropy passwords; figuring out which password generator someone used would be a much higher bar than guessing passwords directly, and just knowing the insecure generator wouldn't reduce the entropy by that much.
Actually, a password generator on GitHub whose output is derived from literally just seconds-since-1970 would still be a good generator for almost all use cases.
Can you explain more? I genuinely don't see any plausible threat model that a user running a Math.random()-based custom password generator would be susceptible to but the same algorithm using SecureRandom would not. Both cases are so drastically better than manually thinking up a password that it's not even close.
I think if there's any gap, the advice would be to not roll your own password generator at all and only use ones authored by security experts: just using SecureRandom instead of Random isn't going to magically guarantee you didn't mess up some other way and write a low-entropy password generator.
I was trying to generate random string identifiers to make some element IDs unique and took the most popular library, which was accidentally crypto levels of secure.
The claim that most uses of random() are not in places where cryptographic security is needed is not in conflict with any list of examples where it is needed.
Here are the last several times I saw random() used.
Seeding a neural network for a Coursera course. Not only do you need to call random() a bunch of times, but the ability to set a seed and get deterministic results makes grading of the results massively easier.
Creating simulation data used for integration tests on a piece of software.
Picking a few random numbers that I used in an explanation of an answer.
Of course the plural of anecdote is not data. However in my corner of the world it is very rare to need cryptographically secure anything. And when I do, I know better than to code it myself. But it is common to need a lot of cheap numbers in a hurry.
chacha8 (chacha20 but with fewer rounds) seeded with either a deterministic seed or from /dev/urandom (depending on what you need) is a perfectly fine PRNG, and should run at 4 GB/sec on a single core, which is plenty.
Only 7-round ChaCha is cryptographically broken, so ChaCha20 has an excessive security margin. Using 8 rounds everywhere you use random today (and don't need a CSPRNG) should be a no-brainer, I guess.
On hardware with AES-NI (like all x86 for the last 10 years), a reduced-round AES-CTR should be very fast and also strong enough for the cases when you don't need a CSPRNG. Full AES-128 outputs 4.7 GB/sec on my laptop, and recent chips have better AES throughput.
Well of course an online poker game should be using a CSPRNG.
The parent comment said "Most simulations, games ...". Most. Not all. I think it's pretty obvious that poker would not be included in a statement of "most".
The real litmus test is the question of "What happens if a malicious actor is able to predict the random numbers?"
I personally parsed the claim as "most (simulations, games, everything)" rather than "(most simulations), games, everything", but I can see how that can go either way.
If you're using it to give hands to people, then no. If you suitably hash or whiten the outcome, then yes.
If you're using it to generate Monte Carlo hand playouts to compute % outcomes to assist such a game, then you most certainly want an extremely fast generator. Only slightly more complex than an LCG is the PCG family (it's built on an LCG underneath, and is just as simple and fast in most cases). And you'll need such Monte Carlo simulations to detect cheating, among other things.
So even for an online poker game you need to know what you're doing. Neither type of RNG will solve all the issues you need.
So yes, at the base a simple LCG would suffice if you know how to use it.
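For reference, a textbook LCG really is only a few lines (constants here are the widely published Numerical Recipes values); fine for Monte Carlo playouts, useless for anything adversarial:

```python
class LCG:
    """32-bit linear congruential generator. Fast, tiny, reproducible --
    and trivially predictable, so never use it for security."""
    MULT, INC, MOD = 1664525, 1013904223, 1 << 32

    def __init__(self, seed: int):
        self.state = seed % self.MOD

    def next_u32(self) -> int:
        self.state = (self.MULT * self.state + self.INC) % self.MOD
        return self.state

    def uniform(self) -> float:
        # Map the 32-bit state to a float in [0, 1).
        return self.next_u32() / self.MOD

rng = LCG(42)
values = [rng.uniform() for _ in range(3)]
```

PCG adds an output permutation on top of exactly this kind of state transition, fixing the LCG's weak low bits at almost no cost.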
If your devs for your online casino aren’t smart and experienced enough to know when blanket advice and recommendations about random numbers don’t apply to them - enjoy going broke, it shouldn’t take long...
I’d bet a lot of money that the number of times random() is used in a non-cryptographic context is many orders of magnitude higher than the use of random numbers in crypto code. That makes non-crypto use “the most universally applicable” case.
I would also agree that “non universally applicable random” such as those used in crypto code should be named accordingly. Which they are: secure_random() is the right choice for the vanishingly small number of developers writing crypto code, and random() is the right name for pretty much everybody except those who need to know a lot of other crypto-specific things as well. Those developers can’t use random() the same way they can’t use non-constant-time comparisons or algorithms that leak side channels via power monitoring or cache hit rates. Fixing random() and thereby letting people who don’t know they should have been calling secure_random() write code in niches where they don’t know enough to get everything else right is, without doubt, going to end up with way more code that is not “as secure as possible”, even if it happens to use secure random numbers.
The thing is, if you need random numbers fast, then profiling will tell you "Oops, used random() when I should have used fast_random()." That's an easy change to make, and it'd show up in profiling if performance were an issue.
If random() never comes up in profiling, why care that you are getting the secure version?
The danger of missing "secure_random()" is that it creates a security vulnerability (Potentially leading to loss of money, information, etc). The danger of going the "fast_random()" route is that your application will run slower potentially leading to a dev needing to spend time to swap in "fast_random()" for "random()". That, to me, doesn't seem like a major problem or risk.
Programmers are notoriously bad at predicting when something will end up being a performance issue. So why preoptimize because we assume most people want/need speed?
Then we’ll end up with a csprng getting used in a tight loop iterating over every pixel in a raytracer...
“Lazy people who don’t want to type” are not the sort of people I want writing the code I might use or interact with that requires cryptographically secure random numbers...
> Then we’ll end up with a csprng getting used in a tight loop iterating over every pixel in a raytracer...
Which will then be conspicuous enough for the developer to notice and fix it.
> “Lazy people who don’t want to type” are not the sort of people I want writing the code I might use or interact with that requires cryptographically secure random numbers...
Yeah, and if I remember correctly one neat technique for exploiting security vulnerabilities in Firefox was to use them in order to set turn_off_all_security_so_that_viruses_can_take_over_this_computer to true, with obvious results.
Rust's "unsafe" is a pretty bad name, and the reasoning behind using it is completely different. It doesn't mark something as "dangerously unsafe, don't use"; to a consumer it indicates "exercise caution", and to the compiler it just allows five things:
- Dereference a raw pointer
- Call an unsafe function or method
- Access or modify a mutable static variable
- Implement an unsafe trait
- Access fields of unions
The point of "unsafe" in Rust is to highlight which areas require more human attention, not to discourage its usage.
`dangerouslySetInnerHTML` is literally dangerous and allows XSS if used with outside input.
It is also faster than the other variant. The same is true for `random()`: both can be used, when you know what you are doing, to gain some performance.
Meanwhile, `unsafe` Rust by itself is no different from safe Rust in terms of speed. You have no choice but to use it in the places it's supposed to be used.
The point of `dangerouslySetInnerHTML` is also to highlight an area which requires more human attention. It's perfectly safe if you have otherwise handled escaping or validation of the content. It's just that you want to pay careful attention to that code to ensure that you're doing it correctly, whereas in normal React code you don't have to think about escaping at all because the runtime handles it for you.
Likewise, `unsafe` marks areas where you need to be really careful that you are upholding the safety invariants yourself, whereas in normal Rust code you don't need to think about that at all.
The word 'safe' in Rust has a very specific, technical meaning. 'unsafe' is simply code that is not automatically 'safe' in the Rust sense.
A non-Rust developer sees safe/unsafe and gets worked up, but that just means that he should put his rust-colored glasses [*] on.
This is not unusual in ICT, known for its colorful language. A non-ICTer hears 'black hat' and thinks about how cool and stylish the hat is. An ICTer hears 'hacker' and thinks about how cool and stylish the hack is.
Neither does React's dangerouslySetInnerHTML. What both of these do is mark a potentially dangerous operation with an in-your-face warning message that you have to go out of your way to ignore. Which in practice is very useful, as often the biggest problem with security issues is not knowing what you don't know. It's impractical to review the entire codebase on a regular basis, and it can be hard to know which bits to focus on.
I agree, with the caveat that use of random for cryptography is actually a domain specific use case.
It's probably okay to leave the function as it is and just drill into people that if you're doing cryptography, you either need to know exactly what you're doing all the way down to the hardware or you need to leave it a task for somebody else more specialized than you. I, for one, never assume random() is cryptographically secure, but it might be because I grew up programming during the era where random was computed off of clock cycles since CPU startup because there wasn't much other cheap entropy to lay a hand on ("battery-backed onboard date clock?! Oh, look who has AKERS money!").
Beginner friendliness is something to remember, too. There are half a dozen words you could use to describe pseudoRandom(). Random() is easy for a first year or non-professional to remember.
Most of the time the people who write and name the functions don't know it's not secure or safe. So you would still need to ban random when the new name is implemented.
The "ban" can be evaded by telling semgrep to ignore it for one line. https://semgrep.dev/docs/ignoring-findings/ This doesn't really scale though - if someone bans it with a different tool, you'd have to tell each tool to ignore this line.
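For illustration, semgrep's per-line escape hatch is just a comment on the flagged line (the rule ID here is made up):

```python
import random

# The trailing comment tells semgrep to skip this finding for this line only.
jitter = random.random()  # nosemgrep: insecure-random-rule -- retry backoff, not security
```

As noted, each scanner has its own suppression syntax, so the annotations don't transfer between tools.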
I have required parameter to push our app to production called: YES_I_HAVE_ALREADY_MERGED_THE_LIB_REPOS_AND_WAITED_FOR_THEM_TO_COMPLETE_BEFORE_MERGING_THE_APP_REPOS
Gets the point across and will still work when I'm long gone.
So you’re not big on bans but if you use dangerouslySetInnerHTML then it’s definitely not getting merged? Is that not a ban? Do you just not like when tooling enforces it?
No, as I said, it would raise a red flag. That flag can be lowered by justification, e.g. if you add types or constraints to only allow safe-enough parameters.
The root problem here is the notion that you need to choose between "strong and slow" randomness vs. "weak and fast" randomness. If every language's random() was strong and fast, most developers would never have to think about it.
"Strong" randomness is often too slow because every time you ask for new entropy, you make a syscall. The solution is to use 32 bytes of strong randomness to seed a userspace CSPRNG. You can generate gigabytes of secure entropy per second in userspace. If you need deterministic entropy, just use the same seed.
This isn't a one-size-fits-all solution, of course. If you only need to generate a few keys now and then, it's marginally safer to make a separate syscall for each of them. If you're targeting some tiny SoC, then sure, use xorshift instead. But what we care about is the common case, and right now the common case is a developer choosing the weak, deterministic RNG because it's faster and has a more convenient API and the secure RNG says "for cryptographic purposes" and well this usecase doesn't seem like cryptography, it's just a simple load balancer...
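A stdlib sketch of that pattern: one syscall for 32 seed bytes, then arbitrary amounts of derived output in userspace via an extendable-output hash. A real implementation would use a stream cipher like ChaCha20 or AES-CTR and reseed periodically:

```python
import hashlib
import os

seed = os.urandom(32)  # one syscall: 32 bytes of kernel entropy

# SHAKE-256 is an extendable-output function: one seed, as many bytes as needed,
# all computed in userspace with no further syscalls.
stream = hashlib.shake_256(seed).digest(1 << 20)  # 1 MiB of derived output
print(len(stream))
```

For deterministic entropy (tests, fuzzing), replace `os.urandom(32)` with a fixed seed and you get the same stream every run.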
> The solution is to use 32 bytes of strong randomness to seed a userspace CSPRNG
All cryptographic randomness generation should be performed by the kernel.
You always have to think about security because if you don't think about security you're going to get hacked. By all means, name the insecure randomness generation function ‘insecure_random’. It does help. But secure-by-default helps you only marginally because when building secure software you don't get to just use the defaults; you have to think about what they're doing.
You have to (for example) know and think about timing attacks even if you're using a cryptographic primitives library that's hardened against them, because it's really easy to introduce timing dependence into your own code and none of Daniel Bernstein or Tanja Lange’s careful designs will save you.
That's fair. There's no silver bullet for security. But we should not let the perfect be the enemy of the good. Everyone writing non-trivial systems should have some understanding of security; but the more components we make secure-by-default, the less those developers need to learn.
> You can generate gigabytes of secure entropy per second in userspace.
I haven't thought about this before, so please have patience:
I guess the "secure" qualifier does a lot of work in this sentence? That there's 32 bytes of "true entropy", but "secure entropy" is theoretically weaker but practically just as strong with reasonable assumptions about an attacker's computing resources.
So I'd guess the "secure" qualifier must mean something like "given any quantity of derived pseudorandom information, the seed bytes can't be efficiently deduced? Pretty neat. (I had a knee-jerk disagreement until I re-read your post and saw that you said "32 bytes", not "32 bits". Quite plausible -- and cool -- that we have a good solution with just a small amount more seed randomness though.)
To answer the question as to when you should use cryptographic random(), ask yourself "What is the worst that could happen if someone guesses the result of random()?"
If the answer is "I don't know," go cryptographic. You'll save your butt if you didn't know it was important.
If the answer is along the lines of "someone could impersonate a user, or leak information they shouldn't see," for the love of all that is holy, use cryptographic. This is basically every scenario where you are using random to generate an ID of some kind, and while it's only truly critical if that ID is all you need for validation, it does provide another layer of security even if you also require other information to match before giving out elevated access.
If the answer is "it defeats the algorithm I'm trying to do" (think something like ASLR, where you're randomizing the offsets of addresses so that attackers don't know where things are located), well, the reason why you need to use cryptographic should be blindingly obvious.
If the answer is instead "they can reproduce my results," well, you shouldn't use cryptographic in this case. And that's not a lot of cases: Monte Carlo simulations, testing, fuzzing are the obvious poster children for this category, and indeed reproducibility in these cases tends to be a highly valuable feature rather than an anti-feature.
Cryptographic random is almost never harmful to your application, and almost always provides some benefit in reducing guessability of your system. You should err on the side of using cryptographic random(), and only not use it when you are sure that guessability will not harm security in any way and you know that the cryptographic nature actively harms your application.
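Concretely, the split described above looks like this in Python (examples only):

```python
import random
import secrets

# Guessability would hurt: session IDs, reset links, gift codes -> cryptographic.
reset_token = secrets.token_urlsafe(32)

# Reproducibility is the feature: simulations, tests, fuzzing -> seeded PRNG.
sim = random.Random(2024)
trial = [sim.gauss(0.0, 1.0) for _ in range(100)]
```

The seeded branch can be re-run bit-for-bit to debug a failure; the secure branch cannot and must not be.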
I would argue that if you're asking yourself "What is the worst that could happen if someone guesses the result of random()?" and your answer is "I don't know," then you're doing something you shouldn't be doing.
There are a lot of domains where security is a non-issue and performance is a huge concern (graphics, game logic, many kind of simulations, etc), and the default is the reverse... always just use the installed non-secure random() and if that's too slow consider other options.
Having a flag you can enable to warn about a non-secure random() usage, when it makes sense for your company/usage, sure. But banning it outright makes no sense, and the default behavior you want is very situational.
When it comes to optimization, there's a useful adage: make it work, then make it fast. Secure should really be seen as a necessary component of correct (and that it often isn't is a testament to the failure of our profession).
In that vein, the default random should be cryptographically-secure, with all the logic necessary to actually effect that security (e.g., not reusing seeds after a call to fork). You can also go ahead and provide an insecure random as well, but choosing the insecure random should always be something that the programmer has to go out of their way to do.
Secure should really be seen as a necessary component of correct security. I don’t see random() as part of security, and the problem is that people use it as such (that’s the failure of our profession as I see it). You wouldn’t want the default string equality operator to be constant time to prevent a possible timing attack, and in the same way I don’t think random() should be cryptographically secure by default. If you need secure random values, you are (should be) a domain-expert and should be selecting an appropriate cryptographically secure random generator from a security library, in the same way you would with a constant-time equality function.
I guess it's a matter of perspective over who random() is for. I see random() as for the programmers who don't know what kind of randomness they need, and don't need to know that because they just need something 'random' not something secure. I expect the domain-experts to know that it's not what they need. In my mind it's not that random() is not secure, it's that using it for something it's not intended for is insecure.
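For what it's worth, Python's stdlib draws exactly this line for the equality case: the constant-time comparison lives in a security module rather than replacing the default operator (both names below are real stdlib APIs):

```python
import hmac

# Constant-time comparison for secret values (tokens, MACs);
# the ordinary == operator stays fast for everything else.
ok = hmac.compare_digest(b"expected-token", b"expected-token")
bad = hmac.compare_digest(b"expected-token", b"guess")
```

The domain expert reaches for `hmac.compare_digest` deliberately; everyone else never pays the constant-time cost.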
>When it comes to optimization, there's a useful adage: make it work, then make it fast.
When you need something to be fast, you better design it from the start to be fast. This is terrible advice for everything but some UI/web cases.
Speed is a feature. Not every feature can be just 'added' to existing code without changing most of it.
All of that is especially true for running simulations and the like. Whether these are fast is often determined by the architecture you decided on in the beginning.
Nowadays the world is full of libraries, frameworks, etc. that will never be as fast as the competition, because they can't become fast without changing their APIs completely.
It's true that there's no one-liner you can use to sum up the whole field of software engineering, but that doesn't mean none of them are useful.
I completely agree that things which need to be fast should be designed structurally to be fast.
But which RNG function you use in a given function is about as far as you can get from an architectural decision. You shouldn't need to refactor large swathes of your codebase to accommodate a substitution of one random number generator for a faster one.
Server-side folks generate random identifiers and shared secrets all the time. Yes, it's niche, but not "extremely" and you don't use a crypto library for this (you use secure random!)
There is a difference between generating these kinds of IDs and writing the generator for these kinds of IDs. You shouldn't be rolling your own UUID generator if you don't fully understand the concerns and requirements regarding your source of randomness.
Generally speaking, I'd agree the need for a cryptographically secure random is niche in that it is limited to the implementation of specific libraries/functions that, despite being widely used, should NOT be frequently re-implemented.
That's only 120 bits of entropy, which means that you'll get a collision after generating ~ 2^60 IDs.
Ok, maybe you're not worried about that scale; but I normally recommend 256-bit IDs in order to make sure that you don't need to worry about that possibility.
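The collision estimate above is the birthday bound; a quick sketch of the arithmetic, using the standard approximation p ≈ 1 − e^(−n²/2^(b+1)) for n uniform b-bit IDs:

```python
import math

def collision_probability(bits: int, n: int) -> float:
    # Birthday approximation: probability of at least one collision
    # among n uniformly random `bits`-bit IDs.
    return 1.0 - math.exp(-(n * n) / 2 ** (bits + 1))

# With 120 random bits, ~2**60 IDs gives a very real collision chance...
p120 = collision_probability(120, 2 ** 60)   # roughly 0.39
# ...while 256-bit IDs keep it negligible at the same scale.
p256 = collision_probability(256, 2 ** 60)
```

This is why the rule of thumb is that b-bit IDs start colliding around 2^(b/2) generations.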
Say you're making an online game, and you need an RNG on your server. Above all, this RNG needs to be unpredictable, or someone will easily game it. Most non-cryptographic PRNGs are very predictable, so it's dangerous to use them.
I think this is a scenario that (a) isn't "extremely niche," and (b) warrants CSPRNGs.
It's not that you shouldn't be using it necessarily, it's just that for many cases (games, procedural generation, graphics, many kinds of simulations) it's unnecessary and slow. In my experience, if someone doesn't know whether they need a cryptographically secure random(), or whether a given random() implementation is secure, then they (a) don't need it or (b) are trying to implement something they shouldn't be.
It is expensive to increase entropy of a random source. So for randomised algorithms you might not get the performance that merited the algorithms in the first place.
Cryptographic randomness has practically no downside if you use it for non-cryptographic purposes. Not true the other way round. And I'm inclined to say, given how many misconceptions there are around randomness, that people are not good at knowing whether they need secure randomness.
The only possible justification for insecure randomness would be performance, but you'd need to generate a lot of random numbers to even be able to measure that.
> Cryptographic randomness has practically no downside if you use it for non-cryptographic purposes
Cryptographic randomness is typically slower than other forms of randomness.
In all of the programming I've done in my career, I've only needed cryptographic randomness a few times. For the rest, a fast pseudorandom number generator seeded by the clock was the correct choice.
Inversely, in my career there's only been a handful of times where cryptographic randomness was too slow.
I'd argue it's better to do the safe thing by default and switching to the faster alternative when you have proof you need it. Doing the fast thing by default and fixing security later is how we got Meltdown/Spectre.
It's going to depend on your experience; I have often run into the exact opposite extreme, where the non-secure random is too slow for my use cases (games, graphics, procedural texture gen, etc.) and a much faster but less statistically random generator better suited my needs.
I would argue that the default behavior should favor the novice and non-domain-expert. Should game programmers, graphic programmers, etc, be expected to know that they need to tune the performance of random() or should the domain-experts writing cryptographic algorithms be expected to understand the limitations of random() as it applies to their use-case?
The game programmer who needed better performance can "just" switch to a faster algorithm when the profiling calls for it (and if you don't notice it, well, no harm anyway).
The guy who just needed to generate some cryptographic keys? Rotate everything, and you had some pretty horrible hidden vulnerabilities in the meantime.
> Inversely, in my career there's only been a handful of times where cryptographic randomness was too slow.
Yeah, that's the opposite of my experience. Often the libc random is too slow and limiting performance and I need to substitute something even faster.
But I agree that using something that is securely random by default is a good idea. People can substitute faster thing fit for their purpose if needed.
My counter would be that if someone "doesn't know whether they need secure randomness" then the problem is not that random() is not secure, it's the fact that someone is doing something they really should not be doing in the first place.
Obligatory mention here for the fine folks of systemd, who made a properly seeded CSPRNG a requirement for merely booting a system and then kept bricking people's systems when it turned out that finding that seed at boot time is a non-trivial problem. All for what, avoiding collisions in some hash table implementation?
I don't really care for the browser application, if you made a TLS connection in the first place obviously you better have the randomness and might as well make random() use that, but someone explicitly using a CSPRNG in a native application is a huge code smell on the level of implementing your own crypto.
As much as it's overkill for most people, I'm a fan of safe defaults so I say let random() be slow and good. It's better to find out your code is slow due to a slow random() than to find out it's broken because you didn't know and thought random() was really random.
If you need a fast source of randomness, for some Monte Carlo algorithm for example, then you know this and can pick a deliberate pseudo-random generator that fits your needs.
I worked on a Monte Carlo path tracer. Early on we swapped out the random number generator from the standard random(). Initially not for speed, but due to the poor distribution.
After optimizing other areas it became a bottleneck and we swapped it out again for a faster one.
It is. The question was how often that's the case. If 50% of the uses of random() are bad, then getting those fixed may be worth the cost of annoying the authors of the legitimate 50%.
It turned out to be much less useful than that. So they got rid of it.
Indeed, I use something like it from a vendor supplied C math library for a noise generator on an embedded app, where I really just care about its crude statistical behavior.
But short of saying "banned," any review of security critical code should include an explanation of where the random numbers are coming from and why they're trusted. Or in general for any code review: Why do you believe your numbers?
> Eg. for randomised algorithms you need a fast source of randomness.
Though typical random() implementations are LCGs, which have poor distributions when you look only at the least significant bits or project them into multiple dimensions.
As a result they may make some randomized algorithms perform poorly!
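The low-bit problem is easy to demonstrate: in any LCG with a power-of-two modulus and odd multiplier and increment (the constants below are the common Numerical Recipes ones, used here just as an example), the least significant bit simply alternates:

```python
def lcg(seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2 ** 32):
    # A classic 32-bit linear congruential generator: x <- (a*x + c) mod m.
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=42)
low_bits = [next(gen) & 1 for _ in range(16)]
# The least significant bit has period 2: it strictly alternates 1, 0, 1, 0, ...
```

So a randomized algorithm that keys decisions off `value % 2` gets a perfectly deterministic pattern, not randomness.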
Honestly, the one thing that got me into graphics (from physics and math) was just the incredible amount of: "you can literally do anything so long as you make it pretty in the end."
I took that as a life philosophy and it's been pretty great so far.
random() typically works by storing and updating some internal state, so successive calls return different numbers.
In shaders, the same code is executed in parallel for potentially every pixel. Storing state would mean pixels could only be calculated serially, slowing things down.
Hence you need a random-ish function that depends only on its input. Very low RNG quality is not a blocker, as long as things look good.
So in shaders you see a lot of random generators which simply take the pixel coordinate or something else that distinguishes 2 pixels and do some nonsense operations on them.
It's written in GLSL, a C-like language for shaders designed to be executed on the GPU.
The 2D vector used for input is to represent pixel coordinates mapped from 0 to 1 or -1 to 1 on both axes.
The magic numbers are nothing special, they are just large numbers to make the result unpredictable for the given input.
The top-level function is fract(), which takes the fractional part of a number. So if the result of the inner computation is twisted enough, it will be hard to trace it back to the original values. There are lots of variations on these one-liners, and most of them do a great job of producing noise.
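One of those variations, transliterated to Python purely to show the stateless pattern (this is the widely copied fract(sin(dot(uv, vec2(12.9898, 78.233))) * 43758.5453) one-liner, not any particular engine's function):

```python
import math

def shader_rand(x: float, y: float) -> float:
    # Stateless "random": the same pixel coordinate always yields the same value,
    # so every pixel can be computed in parallel with no shared state.
    v = math.sin(x * 12.9898 + y * 78.233) * 43758.5453
    return v - math.floor(v)  # fract(): keep only the fractional part

r = shader_rand(0.25, 0.75)  # in [0, 1), reproducible per coordinate
```

The quality is poor by RNG standards, but as the comment above says, it only has to look good.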
Just make Math.random() cryptographically secure; now all your apps are fixed, and no existing code breaks. I can't imagine anything relying on Math.random() being "less" random than a CSPRNG.
Why must CSPRNGs always have alternative, obtuse APIs? We're still stuck on C-style srand() + rand().
Cryptography is so ubiquitous now that failure to provide cryptographically secure random numbers should be viewed as a hardware flaw.
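For comparison, Python's stdlib already splits the two concerns into equally simple APIs: `random` (seedable Mersenne Twister, fast) and `secrets` (OS CSPRNG). Both calls below are real stdlib APIs:

```python
import random
import secrets

token = secrets.token_hex(16)  # OS CSPRNG: session tokens, keys, nonces
roll = random.randint(1, 6)    # Mersenne Twister: fine for gameplay, sampling
```

Neither side of the split is obtuse; the question in this thread is only which one the short, obvious name should point at.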
Some folks purposely want random-ish results. When OpenBSD was changing the behaviour of its legacy POSIX random functions it was observed:
This API is used in two patterns:
1. Under the assumption it provides good random numbers.
This is the primary usage case by most developers.
This is their expectation.
2. A 'seed' can be re-provided at a later time, allowing
replay of a previous "random sequence", oh wait, I mean
a deterministic sequence...
They went through the code, especially the third-party packages/ports, to identify uses:
> Differentiating pattern 1 from pattern 2 involved looking at the seed being given to the subsystem. If the software tried to supply a "good seed", and had no framework for re-submitting a seed for reuse, then it was clear it wanted good random numbers. Those ports could be eliminated from consideration, since they indicated they wanted good random numbers.
> This left only 41 ports for consideration. Generally, these are doing reseeding for reproduceable effects during benchmarking. Further analysis may show some of these ports do not need determinism, but if there is any doubt they can be mindlessly modified as described below.
This is exactly what I mean about being stuck in the C mindset. You're looking at the problem though the lens of what this giant pile of ancient C software does.
Why should newer languages take the same approach to APIs that these old code bases did? It's not like we're porting all those programs to JS. Those C APIs were written long before hardware could provide fast good randomness, heck even before cryptography was standard practice instead of a special use case.
Not to mention in JS you can't even seed the random number generator. If you want predictable "random" numbers, you should have to jump through additional hoops. By default random numbers should be cryptographically secure.
EDIT: It's also worth mentioning that from your reported dataset, 41 of 8800 programs analyzed used srand to get a repeatable set of "random" numbers. That's 0.47%. I'm happy to break less than half a percent of software if it helps prevent the far more ubiquitous failures of software using insecure random numbers.
A reproducible pseudorandom sequence is necessary for fuzz testing with randomized inputs. It isn't strictly the domain of "ancient C". Why should everyone be hobbled by security guarantees they won't need?
Also reproducible builds, non-fuzz testing of very complex systems, the list goes on.
Quite often CS papers describe randomized data structures and algorithms. Quite often (at least historically, not sure now) ML models were seeded from random states.
If you want something to be algorithmically reproducible, you need some pretty strong guarantees from component parts. If you're using a hash table, and that hash table makes use of nondeterministic state that you can't control, you need to make sure that you're not using it in a way that lets that nondeterminism leak out -- if it doesn't provide a deterministic iteration order, you shouldn't iterate over it.
Sometimes determinism can be won back (just using lookup methods on your hash table, sorting the hash elements after iterating over the hash table) but in some cases it's not really possible, and in many cases it's at least impractical. Not providing a mechanism for nondeterminism in the first place can be simpler, but it comes with different problems. (Security problems are an obvious example -- an OS that gives deterministic bits when you ask for random bits is a worry.)
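A small illustration of winning determinism back by sorting: Python randomizes string hashing between runs, so set iteration order can vary from process to process, but sorting restores a reproducible order at some cost:

```python
names = {"carol", "alice", "bob"}  # iteration order depends on the hash seed
deterministic = sorted(names)      # reproducible across runs and platforms
```

For lookups nothing changes; the nondeterminism only leaks out when you iterate.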
Why should everyone be compromised because fuzzers want to use the simpler rand() API instead of something different?
I'm not saying we shouldn't have a way to generate predictable "random" sequences. But the default, simple, API for random numbers in any given language should be secure.
The vast majority of software does not need pseudorandom sequences. Developers shouldn't have to think every time "is pseudorandom good enough for this use case?" It should just be strong random every time. If you need a pseudorandom we should have a separate API for that.
Do you have a guarantee that random() will give you the same sequence on different endians, on different libcs, on different 32/64-bit arches, on different OSes or even distros?
The expectation of reproducibility is probably lost if you switch any of the above "details". So the problem with bad random generation isn't only that the people who think its entropy is good are wrong; the ones who think rand() hasn't changed in the last 40(?) years are equally wrong.
I'm looking at the problem through the lens of changing the behaviour of an existing API.
If you want to create 'secure' APIs that 'do the right' thing, I'm not against it. But leave the old stuff around, or mark it deprecated, throwing warnings and errors on compilation even, for possible future removal—don't change it.
This was specifically a discussion of the impact of purposefully breaking a strict interpretation of a POSIX API, though. They looked at where it was used, which is appropriate.
If by "predictable" all that's meant is "you can deterministically recreate every output bit generated by the function, forever, using a single value that represents the starting state of the function", then you can certainly have that and a secure RNG function. Just use ChaCha20's function with the key representing your seed.
That might be a good default, but where would the seed come from? You also need to make sure that the order in which random bits are read is deterministic, which is a lot harder than it sounds.
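Leaving seed provenance aside, the deterministic-but-unpredictable stream itself can be sketched with SHA-256 in counter mode as a stdlib stand-in for ChaCha20 (the construction, not the specific primitive, is the point):

```python
import hashlib

def seeded_stream(seed: bytes):
    # Deterministic byte stream: hash(seed || counter), block by block.
    # Same seed -> the same bits forever; without the seed, the bits
    # are computationally unpredictable.
    counter = 0
    while True:
        block = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        yield from block
        counter += 1

s1 = seeded_stream(b"test-seed")
s2 = seeded_stream(b"test-seed")
first_a = bytes(next(s1) for _ in range(32))
first_b = bytes(next(s2) for _ in range(32))  # identical replay
```

The read-order caveat above still applies: two threads pulling from one stream interleave nondeterministically unless you serialize access.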
> Okay, sure, but I might be running a simulation or something, why should I be punished because some idiot decided to srand(time(0))?
It's about cost/benefit. As far as the compiler/framework knows, there's a 20% (say) chance that you've just introduced a major security bug into your program. Doesn't the benefit of requiring some explicit acknowledgement of that case outweigh the cost? You contrast "I" with "some idiot", but the evidence of the last 20+ years is that most programmers who think they can write secure code can't; if you make those kind of warnings only to people who opt-in to them, the very people who most need them will not get them.
Because you're in the minority. Why should rand() be reserved for your use case while people with other use cases need to use a more obtuse SecureRandom() API?
Defaults should be secure. The simple case should be secure.
Because rand has a specific and well defined meaning. In addition I see no evidence that this use-case is any less popular. Anyway, this subthread is about having to go through extra hoops to have deterministic random numbers, your post is irrelevant to that. I do not think that anyone would be against defining random() to return a secure random number in your language.
> Defaults should be secure
Are you supporting that new computers should come preinstalled with Qubes OS, have a constant time memcmp, constant time font rendering, etc?
> Because you're in the minority
Fuck people with allergies, right? Let's only provide food with nuts and have them go through "additional hoops" to get food that won't kill them.
Unless I'm wrong, the only valid reason to not use a better random number generator is for performance / simplicity, which then demands benchmarks and evaluation.
Seeded random is a glorious thing in the right circumstances. As an example, I've used it for 'random' testing sequences (jumbling up a list of inputs) but in a way I can later re-run EXACTLY the same test.
It's also useful for other data generation tasks where the output can basically be saved as a seed, making it lightweight and easy to store - it could be written it on a scrap of paper in seconds.
Maybe it's a bad name though - it should be called seededRandom() or semiRandom() or deterministicRandom(). Or perhaps it should be truly random if no seed is set. Hard to know. Maybe the true random only needs to be the seed to a deterministic random, reset on a frequent basis in some cases.
Then there's the category of casual random that doesn't matter, like random colours just for the sake of it. It doesn't need to be a secure safe random.
And... assuming that any random function is truly random is a mistake anyway. Base it on hardware, and it may fail. Base it on software, and where's the source of entropy? Add to that the possibility of bugs/defects in the implementation, and it might not be as random as it needs to be. It's better to assume ALL RNGs are PRNGs, with the caveat that some are decidedly better than others.
So no I wouldn't support a ban on it, nor would I support removing it from any language/runtime where it might be useful.
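The re-runnable test jumbling described above is only a few lines with a seeded, isolated generator; Python's `random.Random` shown as one example:

```python
import random

def jumble(seed: int, items: list) -> list:
    # Reproducible shuffle: the same seed yields exactly the same order,
    # so a failing test run can be replayed from just the seed.
    rng = random.Random(seed)  # isolated instance; global RNG state untouched
    out = list(items)
    rng.shuffle(out)
    return out

first_run = jumble(1234, ["a", "b", "c", "d", "e"])
replay = jumble(1234, ["a", "b", "c", "d", "e"])  # identical order
```

Storing the seed is the lightweight "save the output as a scrap of paper" trick from the comment above.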
Came here to say this. I have spent a lot of time in hardware validation. Pseudo-random (explicitly NOT random) sequences are hugely useful.
I once had a lights out server room of 60 servers whose entire purpose was to take skeletonized tests and a seed for a pseudo-random function and generate a test instance. That test instance went to one of a dozen test jigs. What was recorded was: pass/fail, the git sha of the template, and the seed. Any failing test could be reproduced at any time from just the git sha and the seed. True random would have killed that whole methodology.
That sounds awesome, and the use of a repeatable random vital. Another example of random in a non-cryptography context where it's unpredictable under normal operation, but completely predictable when needed. If you wanted you could run the same test on all the test jigs with different seeds, safe in the knowledge that you could re-run all of them exactly again and again if required. Or you could add problematic seeds to a list for repeated retest with future versions. So much power and freedom!
There are many uses of random() that do not require cryptographic security: simulation, simulated annealing, sound synthesis, digital signal processing and the like. It would be a nuisance if developers of those kinds of software have to fight warnings because developers of completely different applications can't get it right.
Further, such users usually want to be able to repeat a test case: start from the same seed, get the same sequence. They don't want true randomness, they want a repeatable sequence with good statistical properties.
It's all about the discipline of the team in the end... You can ban things all day, but it just takes 2 developers deciding they don't give a shit to code, review & merge that use-fast-random-for-session-token PR. There is more than 1 way to get something that is "random", so basic string matching for methods you don't like is certainly not a guarantee.
In our organization the policy is very simple. We have a static method available throughout, CryptographyService.GenerateCsprngBytes(count = 64). All developers are aware that any security-sensitive requirement around entropy must use this method. It wraps the OS-level offering and encourages a minimum reasonable level of entropy with a default count.
I don't see any reason to make it more complicated than this. Communication with your team is more important than writing check-in rules to prevent bad things from happening.
As for other uses of Math.Random, et al., we don't have any official policy. Because we have clearly communicated that security-sensitive applications should always use the secure method, we don't need to add a bunch of additional band-aids on top. Enrich the team before the process.
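A minimal sketch of such a blessed wrapper in Python (the name and default count mirror the comment above; the body is an assumption about what it might wrap, here the OS CSPRNG via `secrets`):

```python
import secrets

def generate_csprng_bytes(count: int = 64) -> bytes:
    # Single well-known entry point for security-sensitive entropy.
    # Wraps the OS CSPRNG and nudges callers toward a sane minimum.
    if count < 16:
        raise ValueError("security-sensitive entropy should be >= 16 bytes")
    return secrets.token_bytes(count)
```

The value of the wrapper is social as much as technical: one name to communicate, one place to audit.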
> Communication with your team is more important than writing check-in rules to prevent bad things from happening.
There's some subtlety here. This is sort of a security vs safety issue.
Some people are just reckless, and that's a human problem that is best dealt with through a stern talking to (or, ultimately, termination) rather than technical measures. You'd require an oppressive amount of check-in rules in place to be even remotely effective at stopping this behaviour, and those would just make life miserable for everybody else.
Some people are new to the team and/or just plain inexperienced, and it takes time for them to absorb all the standard practices so they can innocently cause trouble. Even veterans will make mistakes. Low friction guard rails can help keep those people from getting into too much trouble without being too onerous.
It seems like the "proper" solution to this problem would be to make all random number generators pull from the cryptographically secure randomness pool by default. If your random number needs are within what can be provided with strong guarantees, it doesn't seem like there's any reason to give you anything but strongly random numbers.
People who need more random numbers per second than can be generated securely will simply have to pass explicit parameters indicating they want to stop getting high quality numbers. This would be easy to see in code review and highlight the choices being made.
Most static code analysis tools I've used allow you to write exceptions for rules into your code using comments that follow a particular signature. Isn't it sufficient to just ban the use of random() and require devs to use one of those comments to effectively "sign off" on it if they encounter a good use case?
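For example, Bandit (a Python security linter) flags the `random` module under rule B311 and honors inline waivers, so a deliberate use can be signed off right where it happens (the `# nosec` comment is Bandit's real syntax; the surrounding code is illustrative):

```python
import random

# Gameplay jitter only; predictability has no security impact here.
jitter = random.uniform(0.9, 1.1)  # nosec B311

def pick_variant(user_fraction: float) -> str:
    # A/B bucketing; does not need a CSPRNG.
    return "B" if random.random() < user_fraction else "A"  # nosec B311
```

The waiver comment doubles as the justification reviewers can grep for.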
> Fix Rate is the percentage of merge-blocking findings that are fixed (i.e., not muted*) in CI. We believe this is a proxy for engineering value. As we run our own developer-focused security programs, we’re obsessing over how to increase the Fix Rate for the rules on our projects.
> Observing a bad 0% Fix Rate for random() (with only 7 data points from our projects), we decided to silence the rule for r2c developers
I wouldn't necessarily consider a fix rate of 0% to be bad. If you're doing something that looks bad, you should probably leave a comment that silences the static analysis error and justifies why you're doing it. If that justification is missing, the static analysis tool should flag it IMO.
If using random() in a crypto project is a bad code smell, then I'd say every use of it should come with a brief justification.
For cases where small statistical differences matter, like Monte Carlo simulation, the overhead is non-trivial. A linear congruential generator is so simple it can be inlined by the compiler.
"Fix rate" is an interesting metric, but I don’t think it’s a good proxy for engineering value.
When I see a false positive flagged by a compiler warning or static analyzer, sometimes I’ll fix it just because I’m not sure I want to turn off the rule. For example, I often use -Wunused-parameter with -Werror with Clang or GCC, and then just use (void)arg; to silence the false positives.
No mention of arc4random(3)? Seems like a solved problem in BSD land.
The key takeaways I feel like are:
1. You want as simple of an interface as possible. arc4random(3) returns a single random 32 bit integer, or you can tell it to fill a buffer with them.
2. Just make it cryptographically secure. arc4random does it and it seems to be fine.
It is extremely useful for testing. It keeps this code simple and simpler tests are less buggy. Randomize a bunch of choices in input ranges and run a test. Need to re-run that exact scenario? Just set the seed to the same as the first go.
To quote Theo de Raadt's recent commit to OpenBSD's rand(3)/random(3) man pages, since people seem to be very confused here.
+The deterministic sequence algorithm changed a number of times since
+original development, is underspecified, and should not be relied upon to
+remain consistent between platforms and over time.
This is probably a hot take, but I don't think standard libraries should contain a random number generator.
There is no globally correct choice of PRNG algorithm. If you need randomness, you should ask what kind of randomness - cryptography wants secure, Monte Carlo methods want fast, games can often benefit from N-dimensional equidistribution, etc. - and find a library specialised for that use case. A standard library function that nobody uses because there's always a better alternative elsewhere is a bad standard library function.
I most commonly use random() for generating names (e.g. docker's container names) and generating test inputs. I don't care about cryptographic safety in either case.
But you probably care about collisions in that case. The small state of a language-default insecure PRNG will make collisions much more likely. Especially if seeded by a clock.
I have seen temp file name collisions cause data corruption in a real system because the default language RNG was used. Also infinite loops in a production system because random() was called in the same clock tick by two separate threads generating a handle value. Both wasted weeks of effort to pin down.
random() should default to the system CSPRNG. Provide insecureFastRandom() for those who know they need it and it is safe for their use.
It certainly is in my experience: everything from deterministically shuffling lists to Rogue-like map generation to audio/visual glitch & noise effects.
If anything, I would say it should be the other way around. There are very few use cases outside of cryptography. So flag it if people use a cryptographically secure PRNG directly. In almost all cases they would be better off finding a library that does what they need.
Semi-unrelated question: how secure are the hardware random number generators on, say, a Raspberry Pi? Would these sorts of things be cryptographically secure?
I think those built-in RNGs should be treated like RDRAND on x86, you may mix them into the pool along with all the other sources of entropy your platform supplies, and then generate cryptographic streams seeded out of the pool. Using them alone might turn out to be a bad idea in the long run.
We did essentially that at my employer. I think the rationale is good, biasing toward a secure random function makes sense because the downside (as I understand it) is performance, but defaulting to insecure random has worse downsides. And if there ends up being a hot path where secure random is too inefficient, you can change it in that case. (This is in a context where a secure random function is readily at hand, when that's not the case, it could be trickier.)
Is it possible to have something like random.org but without paying for it?
Say you want to build a lottery application: who can you rely on to get a very good random number generator at low cost?
Should the government provide this for free to developers? Seems like it's in everybody's interest to have a ~true random() function.
Pokerstars uses lasers, someone else uses lava lamps, radio waves, what else?
Also, on a side note: how much have we truly discovered about the nature of "randomness"? Nassim Taleb says it's not random if you run into somebody you know in the supermarket while thinking of them. Some physicists have likened it to a Newtonian picture, others to a more parallel one. Why is it that some natural order emerges out of the "randomness" of a 52-card deck? How is it that a randomly swinging spotlight in a dark room is able to "find" a plant that is also "randomly" placed in the room? Or the randomness of people's birth dates and times emerging in lottery tickets? Or, even more controversially, the fact that RNG output seems to be influenced by the collective consciousness?
I hate how bloated software development has become nowadays and I hate all these tools which keep raising warnings about vulnerabilities which are not relevant to the use case.
It's unbelievable that we live in a society where we're obsessed with achieving 100% test coverage of all our small petty software systems but our monetary system itself (the mother of all systems) is not even integration tested... First time anyone ever heard the word 'test' when discussing the financial system was 'stress test of the banking sector' after 2008 crisis and these tests are so superficial, it's a joke! What's worse about the monetary system is that known vulnerabilities don't even get patched after decades of active exploits! With all its unnecessary bureaucracy, the software industry is a joke. What is the point of all this rigorous software testing infrastructure when the entire environment within which the software exists is unsound and untested? Managers don't trust their developers... But it's the managers who don't deserve to be trusted.
"Banning" language features that are only "banned" when a linter is placed as a roadblock between the developer and the versioning system has to be one of the dumbest things we developers have inflicted upon ourselves.
How is this preventing me from recreating my own shit `random()` when it's entirely too late in the evening, deadlines are looming and the garbage office politics preclude me from disabling this asinine thing?
I used to regularly trip on these damned reified rituals where we're only supposed to use a single type of quotes, or maybe use short vars like `i,j,k` in for loops, and other garbage "rules" that might have sounded great when originally put in place but are horrible on a day-to-day basis.
There's also the behavioral cost: I've noticed more and more people simply don't read the code in code reviews anymore.
Most feedback I get these days is stuff like "that's not supposed to be snake_case" or "a single space should be put between methods". Sometimes a glaring logic bug will crop up after 3 or 4 people who "approved" the review gladly and openly admit _they_ _did_ _not_ _read_ _the_ _code_. I think this is related to us overstressing all these tiny "form" mishaps and disregarding the "function" bits, because it's harder to write a roadblock that checks for those.
> There's also the behavioral cost, I've noticed more and more people simply don't read the code in code reviews anymore. Most feedback I get these days is stuff like "that's not supposed to be snake_case" or "a single space should be put between methods", and sometimes a glaring logic bug will crop up after 3 or 4 people who "approved" the review will gladly and openly admit _they_ _did_ _not_ _read_ _the_ _code_, and I think this is related to us overstressing the importance of all these tiny "form" mishaps and disregarding the "function" bits because it's harder to write a roadblock that checks for them.
The goal of linters should be to ensure this doesn't happen. By turning non-compliant code into a test failure no reviewer attention ever needs to be wasted on formatting issues (which I fully agree is a complete waste of everyone's time).
* If the tests pass, the formatting is correct. Great, reviewers have nothing to complain about.
* If the tests don't pass, the formatting has to be fixed before the code will be merged. Code that doesn't pass the tests should never be merged in the first place. As a result, reviewers don't need to care as it has to be fixed anyway.
Reviewers that still feel the need to point out formatting inconsistencies are likely just disinterested in providing genuine feedback, and would not have provided useful feedback regardless of the presence of linters.