I was once involved in helping diagnose a race condition in a website whereby a user could change another user's password if both changed their passwords at approximately the same time. Looking at the code, it was not at all obvious that there was even potential for the issue until you knew what was going on...
Some on this thread are acting like these are trivial problems to avoid. I think they are wrong...
The problems aren't trivial, and sometimes the solution isn't trivial either, but that doesn't mean there aren't strategies that automatically mitigate some of the issues. Using database transactions, verifying that operations actually succeeded, rolling back when results are unexpected, not making assumptions like "two users won't change passwords in the same millisecond", and so on can protect your product. PHP and MySQL are largely to blame, as so many web developers started out building sites on a database that didn't support transactions if you used the default engine.
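For a concrete sense of what that looks like, here's a minimal sketch of a password change done as a compare-and-set inside a transaction. It assumes node-postgres (`pg`) and a hypothetical `users(id, pw_hash)` table; the names are made up for illustration, not taken from the case described above.

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* env vars

// Only update the row if the stored hash still matches the hash the caller
// verified for this request; otherwise treat it as a conflict and roll back.
async function changePassword(userId: number, oldHash: string, newHash: string): Promise<boolean> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const result = await client.query(
      "UPDATE users SET pw_hash = $1 WHERE id = $2 AND pw_hash = $3",
      [newHash, userId, oldHash]
    );
    if (result.rowCount !== 1) {
      // Someone else changed the row in the meantime (or the old password was wrong).
      await client.query("ROLLBACK");
      return false;
    }
    await client.query("COMMIT");
    return true;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```

The point isn't this exact query; it's that the code checks how many rows were actually affected instead of assuming the update landed on the row it expected.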
Basic software quality assurance is missing from the majority of web development businesses.
Yeah, this is my question exactly. I've dealt with MANY race conditions over the years and the idea that two different user requests could update the same user account password seems... unlikely at best. Not saying it's impossible, but more detail would have been appreciated.
I've seen this before in a Java system. An MVC controller populated the HTTP params into a bean, which it assigned to an instance field, and then subsequently used that field to grab the value to update (I don't think it was a password, it might have been an email address IIRC). This would have worked fine if the controller was request scoped, but it turned out to be a singleton - so the field was shared across all requests!
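A rough TypeScript/Express equivalent of that bug, with invented names, would look something like this: the controller is constructed once, so its instance fields are shared by every in-flight request.

```typescript
import express from "express";

// Stand-in for the real persistence layer.
async function updateEmail(userId: string, email: string): Promise<void> { /* ... */ }

class AccountController {
  // BUG: instance fields on a singleton are shared by all concurrent requests.
  private userId?: string;
  private newEmail?: string;

  async handle(req: express.Request, res: express.Response) {
    this.userId = req.body.userId;
    this.newEmail = req.body.newEmail;

    // Any await gives another request a chance to overwrite the fields above...
    await new Promise((resolve) => setTimeout(resolve, 50));

    // ...so this may now save request B's email against request A's user.
    await updateEmail(this.userId!, this.newEmail!);
    res.sendStatus(204);
  }
}

const controller = new AccountController(); // one instance for the whole app
const app = express();
app.use(express.json());
app.post("/account/email", (req, res) => controller.handle(req, res));
```

The fix is the same in either language: keep per-request data in locals (or a request-scoped object), never on the shared instance.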
It always amazes me when developers building crucial transaction systems don't know about or implement some form of locking. I personally use a cluster of Consul servers for distributed locking ( http://consul.io by HashiCorp ), although I don't use the rest of Consul's features. I'd sure like to learn them someday :)
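For anyone curious, the locking side of Consul is just its session + KV HTTP API. A minimal sketch, assuming a local agent on the default port and a hypothetical key naming scheme (this is not the commenter's actual setup):

```typescript
const CONSUL = "http://127.0.0.1:8500";

// Create a session with a TTL so the lock is released if the holder crashes.
async function createSession(): Promise<string> {
  const res = await fetch(`${CONSUL}/v1/session/create`, {
    method: "PUT",
    body: JSON.stringify({ TTL: "15s", Behavior: "release" }),
  });
  const body = (await res.json()) as { ID: string };
  return body.ID;
}

// Try to take the lock on a key; Consul answers true or false.
async function acquireLock(key: string, sessionId: string): Promise<boolean> {
  const res = await fetch(`${CONSUL}/v1/kv/${key}?acquire=${sessionId}`, {
    method: "PUT",
    body: "locked",
  });
  return (await res.json()) as boolean;
}

async function releaseLock(key: string, sessionId: string): Promise<void> {
  await fetch(`${CONSUL}/v1/kv/${key}?release=${sessionId}`, { method: "PUT" });
}

// Usage: serialize password changes per user across all app servers.
async function withUserLock(userId: string, fn: () => Promise<void>) {
  const sessionId = await createSession();
  const key = `locks/password-change/${userId}`;
  if (!(await acquireLock(key, sessionId))) throw new Error("lock busy, try again");
  try {
    await fn();
  } finally {
    await releaseLock(key, sessionId);
  }
}
```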
You can build what you think is the most robust and secure system in the world, and someone somewhere will figure out how to break it. I don't think it's fair to insinuate that the people who wrote the code were "incompetent", especially given the size of Facebook's codebase. And given their audience, they'll be far more exposed to hackers than whatever you're working on that isn't Facebook and doesn't have the same audience.
Exactly. When I write my code I am aware of a high number of things that could go wrong, and that I deliberately don't check. If it goes wrong I'll let it crash, or I make the deliberate decision to see if that condition ever actually happens in the real world. I'm not talking about security! It can be things like only checking whether function parameters are what I expect for the functions where I think it's important, or being aware that if some functions are called with timing other than I expect, something could go wrong. The problem is I'm not writing the final app; I'm writing some sort of library (sort of), so I have no control over how it is going to be called in the end. I'll just add it to the documentation, but I make few attempts at catching all, or even most, of such errors.
I would blow up my code at least tenfold if I tried to take care of all the possible conditions - creating a lot more of them in the process. The code feels amazingly fragile to me, and yet it works well. Note that it has had several reviews from other developers, so I'm not talking about really bad code.
After having delved deep into medical topics out of curiosity - hundreds of hours of anatomy, physiology, neuroscience, biochemistry, lots of statistics - I'm even less concerned. The ways things go wrong in a biological system are orders of magnitude more numerous, and nature's approach is "fix it when it happens" (or start over by creating a new instance).
I think the more complex our human-made systems become, the more we'll have to use nature's method. We are already doing it everywhere, in electronics and in software.
I see two competing forces:
a) The human attempt to make systems more "provable", for example by formalization/"mathematization",
b) Nature showing us that complex systems can only be done with a relaxed and laissez-faire attitude ("shit happens") after putting in a reasonable effort.
The balance shifts towards b) for systems in rapidly changing environments, and to a) for systems in static conditions.
So discussions about the subject should never be just about the system (piece of software) itself; they must include the environment it is to operate in.
I'm going to assume this is on top of the complexity of programming itself. The reason is that I previously read that Facebook lets new hires work directly on the live site; they tend to just push code out there. They also do it in tools like PHP, where it's harder to automate QA since there are no language-level annotations that make that easier. The combination of tooling that makes problems easy to introduce, plus nothing stopping inexperienced people from getting bad code out there, means Facebook has a higher-than-average risk of problems happening.
The good news is that their people are smart, which makes up for it a bit. Inexperience will still bite them, though.
You'd be surprised. Handling over a million dollars in visible, actual transactions daily makes you a prime target for hackers looking to exploit these kinds of race conditions.
I would love to hear how these were all patched. Wrap everything in a transaction? Use some kind of MQ? Make sure these critical DB calls are not cached in any way? I'm sure it's different for each case.
Staying off topic for a moment... I bet there's an interesting study in this. Something along the lines of the neurochemical or psychological rewards when one does something like "downvote" in a self-regulating forum like this; it seems like it would be narrowly focused enough, as compared to, say, trolling, that you might be able to discern something from that sort of behavior. (I'm clearly no scientist :-) )
For the record, I agree with you. I think when someone says something that is silly, naive, or somehow just wrong in a forum like this, it seems like it would be far more productive for both participants and observers to have someone just point out what's wrong (yes, that takes time, but you cared enough to downvote, right?). Some of the time you'll just uncover a misunderstanding, or actually find out that the person who would otherwise be downvoted may have had a point.
I used to use a camera shopping website that had a race condition in their product listing/search functionality. Their site was quite slow, and if you tried to have more than one tab open to their site then you would get mixed-up results. The tabs might have the same content, mixed content (eg the breadcrumbs from the other tab), or sometimes nothing at all. This would occur even across a significant period of time, not just if you made the requests at the same time.
I struggle to understand how that could even have happened. Were they storing the results of the DB lookup referenced against my IP somehow?
That isn't so much a race condition as session/state corruption. The designer hasn't considered that their chosen state store in this instance is not entirely private so two instances of their code can clobber each other's global state.
A common occurrence of this I've seen in many places (including some of our old code years ago) is using a cookie to store simple client-side choices that you want to persist between requests, for instance the tab that was selected when you were last on a complex page, or some other temporary preference, or as in your example breadcrumbs for navigation.
This falls over as soon as the user has two tabs/windows open, because (unless the window is explicitly opened with a different session, e.g. in private mode, with File/New Session in IE, or such) the cookies (even temporary session-level ones) are a shared resource. It usually isn't a big issue, as more often than not it only concerns superficial UI matters (those that are "nice to have"s rather than something essential breaking), but I can imagine circumstances where it could cause more serious problems.
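To make the failure mode concrete, a tiny sketch of the pattern being described (the cookie name is invented): every tab reads and writes the same cookie, so the last writer wins and quietly overwrites the other tab's state.

```typescript
// Runs in the browser. document.cookie is shared by all tabs for this site,
// so whichever tab the user clicked in last overwrites the other's selection.
function rememberSelectedTab(tabName: string): void {
  document.cookie = `selectedTab=${encodeURIComponent(tabName)}; path=/`;
}

function readSelectedTab(): string | undefined {
  const match = document.cookie.match(/(?:^|;\s*)selectedTab=([^;]*)/);
  return match ? decodeURIComponent(match[1]) : undefined;
}
```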
I'm guessing a lot of them stem from session variables and from caching. I've seen everything from an NHibernate cache being stored in the session, serializing and deserializing a huge chunk of the database on every request, to developers not being aware that the session variable they get is already user specific (they access it via Session[UserId]).
Caching in particular seems to be something that's applied without much analysis, I've seen it slow down implementations more often than it sped them up.
What I'd really like is for the browser to send a tab id on every request, so we can scope variables to a tab as well as to a user.
localStorage is often used for its persistence across closing/reopening the browser, but for times when you don't want storage to bleed across tabs or survive closing the browser, sessionStorage is much better, and unlike a cookie it doesn't automatically get attached to requests, so there aren't XSRF worries.
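A small sketch of that approach: sessionStorage is scoped to the browsing context (tab), so you can mint a per-tab id there and send it with each request so the server can key any per-tab state on it. The header name here is just an example, not a standard.

```typescript
// sessionStorage is per tab, unlike cookies and localStorage,
// so a value minted here never bleeds into another tab.
function getTabId(): string {
  let tabId = sessionStorage.getItem("tabId");
  if (!tabId) {
    tabId = crypto.randomUUID();
    sessionStorage.setItem("tabId", tabId);
  }
  return tabId;
}

// Attach the tab id so the server can scope state per user *and* per tab.
async function fetchWithTab(url: string): Promise<Response> {
  return fetch(url, { headers: { "X-Tab-Id": getTabId() } });
}
```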
Lenovo's website does this: if you try to browse and customize more than one ThinkPad at a time, the site will get completely confused and start replacing the older one with the newer one, losing the older one entirely.
Off topic: FarmVille on a satellite connection could let you plant fields on top of each other even though they weren't supposed to intersect. Unintentional vertical farming.
Bug bounty programs are interesting. I wonder if it wouldn't be a good use of taxpayer dollars to pay people at the NSA, or similar organizations with computer security responsibilities, to churn away at these programs. As in "spend one month a year doing bug bounty programs, collect your normal salary, keep whatever you earn"...