I could be mistaken, but I believe he reported the security issue through our regular support channel (instead of our security channel), which is why it took three days to see. From the time I saw it, I had it fixed, with the patch going live within an hour or two.
When I DID see it, I tried it myself with a quick shell script that curled and backgrounded the same request a bunch of times, and I just kind of chuckled. It was a good bug. Josip is top-notch.
I believe race conditions are on the rise in terms of severity and importance. Developers are aware of the common OWASP bugs, but this type of race condition is often overlooked, and developers are going to NEED to be just as aware of it. Way to go.
That's the problem with OWASP: a developer from a big company sees a race condition for the first time and is surprised.
I worry that a malicious attacker could finger the service for potential victims.
1. The user submits X number of requests within a second.
2. The system puts each request in a command queue that synchronizes the commands by coupon code, for example.
3. The command is popped off the queue, and an event is generated and saved saying the coupon was redeemed.
4. The next command is picked up, and all events are applied before processing. At this point, the command is no longer valid, so you reject it and send an event saying that an attempt was made to redeem an already-redeemed coupon.
5. Do the same for subsequent requests.
To me, this approach is safer and easier to reason about. You have a log of the fact that someone made the attempt, so you can report on it. I'm not sure you get that benefit from a stored procedure and a transaction unless you build it in, and then you increase the running time of the open transaction.
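The queued-command approach above can be sketched in a few lines. This is a toy illustration, not anyone's actual implementation: a single worker thread drains a queue (standing in for "synchronized by coupon code"), and every outcome, including rejected attempts, lands in an append-only event log.

```python
# Toy sketch of the command-queue / event-log approach.
# All names (events, commands, worker) are made up for illustration.
import queue
import threading

events = []               # append-only event log
commands = queue.Queue()  # serializes all commands for one coupon code

def worker():
    redeemed = set()
    while True:
        cmd = commands.get()
        if cmd is None:                 # sentinel: shut down the worker
            break
        code = cmd["code"]
        if code in redeemed:
            # Command is no longer valid: log the failed attempt anyway.
            events.append(("redeem_rejected", code))
        else:
            redeemed.add(code)
            events.append(("redeemed", code))

t = threading.Thread(target=worker)
t.start()

# The user fires several requests "within a second"; all funnel into one queue.
for _ in range(5):
    commands.put({"code": "SAVE20"})
commands.put(None)
t.join()

print(events)
```

Because a single consumer pops commands one at a time, only the first redemption succeeds, and the rejected attempts are still in the log for reporting.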
It's not necessarily different from using a normal RDBMS, right? You could do a check in SQL outside a transaction and end up writing multiple times. But with an RDBMS, you can easily solve the situation by turning on a transaction and leaving no question about it.
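A sketch of that "turn on a transaction" fix, using SQLite as a stand-in for any RDBMS (the table and function names here are made up): BEGIN IMMEDIATE takes the write lock up front, so the check and the insert execute as one atomic unit and a concurrent redeem has to wait.

```python
# Check-then-insert wrapped in an explicit transaction.
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage txns
conn.execute("CREATE TABLE redemptions (code TEXT, user_id INTEGER)")

def redeem(conn, code, user_id):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before the check
    try:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM redemptions WHERE code = ?", (code,)
        ).fetchone()
        if n > 0:
            conn.execute("ROLLBACK")
            return False
        conn.execute("INSERT INTO redemptions VALUES (?, ?)", (code, user_id))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise

print(redeem(conn, "SAVE20", 1))  # True  - first redemption wins
print(redeem(conn, "SAVE20", 2))  # False - the check sees the committed row
```

Doing the same SELECT-then-INSERT without the BEGIN/COMMIT is exactly the race described in this thread.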
This is why things like VoltDB ("NewSQL") are pushing to keep SQL and ACID, and figure out a way to scale, instead of throwing it all aside and making the developer deal with consistency issues.
It's not that you can't end up with the same functionality using eventual consistency, just that it's harder. Just look at Amazon's "apology-based computing" (I think that was the name) and how they structure their systems to be resilient by being able to resolve multiple conflicting commands in a proper way (deciding, without communication, which command wins; figuring out rollbacks; etc.). It's fantastic, and perhaps it's the only feasible way to operate at their scale. But it's also a hell of a lot more complicated than "UseTransaction = true".
(So my predictions/guesses: if developers who'd otherwise use a traditional ACID RDBMS switch to non-ACID (BASE?) systems, they'll end up introducing bugs due to the shifted responsibility of handling data consistency. And seeing how big servers are, and how far sharding can take you with a normal RDBMS, the scale at which people "need" to drop ACID is probably far higher than the point at which people are actually dropping it.)
In most systems the reward is zero, so you can infer that if a person has taken the time to submit a bug report, it is because he/she is invested in seeing it fixed.
Context: I work at a decent sized company in SV on this type of problem.
Properly designed bug bounty programs are a cornerstone for any company that remotely cares about the security of its product, period.
The idea that incentives are misaligned because poor bug reports are free to submit is ignorant, and worse, toxic, because it sounds so true to an executive who has no actual understanding of the issue.
A quality bug report should take no more than one minute for a reviewer to look at and know whether it's really a bug or not. If it can't, it should be rejected with a request for clearer details. For example, a DOM-based XSS attack could be reported with just a target URL, and it would be quite clear what the problem is. That would take 10 seconds to analyze.
Additionally, most bugs reported to most decent-sized companies are reported by someone who has previously reported a bug to the company before. If someone is constantly reporting good bugs, or the opposite, it's quite easy to prioritize which of those individuals gets their emails read first.
1. Code sent.
2. Check if the code is valid.
3. Redeem the code.
4. Mark the code as invalid.
Now if I send 10 requests at the same time with the same code, maybe 4-6 will get past the check in step 2.
And your window of opportunity is the time it takes to go from step 3 to step 4. Sometimes certain tasks are put in an async queue, you have a slight delay to your database server, or you need to wait for DB replication to kick in.
Because normally there is no code that rechecks how often this code has already been used.
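The four steps above can be reproduced in-process with a deliberately widened window. This is a toy sketch: a shared counter stands in for the coupon table, and the sleep stands in for the async-queue or replication delay mentioned above.

```python
# N threads race through check-then-redeem on shared state.
import threading
import time

redeemed_count = 0

def redeem_unsafe():
    global redeemed_count
    if redeemed_count == 0:   # step 2: check if the code is valid
        time.sleep(0.05)      # the window between the check and the redeem
        redeemed_count += 1   # steps 3-4: redeem and mark used

threads = [threading.Thread(target=redeem_unsafe) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Several threads pass the check before any of them redeems,
# so the single-use coupon is "used" more than once.
print(redeemed_count)
```

With the artificial delay, essentially every thread clears the check before the first one writes, which is the same effect the backgrounded curl requests produce against a real endpoint.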
The problem is concurrency. Whenever you have multiple things happening at once, you have concurrency, and programming concurrent systems is always really hard.
Unfortunately the software industry has never really got a grip on this problem and there are lots of developers who have never really studied multi-threading at all. That's a problem, because it's something that takes a lot of practice and you have to just incorporate it into the way you think. After a while you do get a sixth sense for race conditions, but you'll still write racy code from time to time anyway. It's just tricky to get it right 100% of the time.
spdy has already outlined what is happening here, but this problem is something that is covered in literally any introductory course on database systems or multi-threaded programming. If you have two threads (or processes) in flight simultaneously that are reading and writing a shared data store, then you need some kind of mutual exclusion. That can mean a lock:
1. Request for /reviews/add is received.
2. Database lock on the page reviews table is acquired.
3. Check if the user has already posted a review. If so, release the lock and abort (any good framework will release locks for you automatically if you throw an exception).
4. Add review to table.
5. Release lock.
At the point where the lock is acquired, if another web server is in the middle of the operation then, conceptually speaking, the incoming request will stop and wait for the table to become available.
Real implementations don't actually "stop and wait" - that would be too slow. They use database transactions instead, where both web server processes/threads proceed optimistically, and at the end the database will undo one of the changes if it detects that there was a conflict... but you can imagine it as being like stop-and-wait.
Of course once you have concurrency, you have all the joy that comes with it like various kinds of deadlock.
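The five steps above can be sketched with an in-process lock standing in for the database lock (a real app would lock in the database, not in application memory; the names here are made up):

```python
# Lock-guarded check-then-insert: only one request can win.
import threading
import time

reviews = {}                     # (page, user) -> review text
reviews_lock = threading.Lock()  # step 2's "lock on the reviews table"

def add_review(page, user, text):
    with reviews_lock:                 # acquire; blocks while another holds it
        if (page, user) in reviews:    # step 3: already posted a review?
            return False               # 'with' releases the lock for us
        time.sleep(0.01)               # widen what used to be the race window
        reviews[(page, user)] = text   # step 4: add the review
        return True                    # step 5: lock released on exit

results = []
threads = [
    threading.Thread(target=lambda: results.append(add_review("p1", "alice", "great")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True))  # exactly one request wins
```

Even with the artificial delay inside the critical section, the lock serializes the check and the write, so duplicates are impossible - at the cost of requests queuing up behind each other, which is why real databases prefer the optimistic transaction approach described above.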
It's funny, a few days ago I was letting my mind wander and ended up thinking about web apps for some reason. Oh yes, now I remember, I was thinking about concurrency strategies in a software library I maintain and how to explain them to people. And then I was thinking how hard multi-threading is and how many people are selling snake-oil silver bullets for it, and started to wonder how many web apps had race conditions in them. And then I just discarded the thought as one of no consequence and got on with my day, haha :) Perhaps I should have tried to earn a bit of money this way instead.
for i in `seq 1 16`; do
  curl ... &   # curl command copied from Chrome dev tools; & to background it
done

I have considered writing a program that will let me send off a bunch of HTTP requests at once, but wait to close all the connections at the exact same time. That would probably be the most effective way to trigger race conditions.
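That idea can be sketched with a thread barrier: each worker prepares its request, then all of them release in the same instant. This is only an illustration of the synchronization, not a working attack tool: the actual HTTP send is commented out so the sketch runs without a target server, and it synchronizes the moment of sending rather than the final byte of each connection.

```python
# Fire N prepared requests at (nearly) the same instant via a barrier.
import threading

N = 16
barrier = threading.Barrier(N)  # trips only once all N threads arrive
fired = []

def fire(i):
    # ... build the HTTP request for the target endpoint here ...
    barrier.wait()  # every thread blocks until all N are ready
    # urllib.request.urlopen(req)  # would send the real request here
    fired.append(i)

threads = [threading.Thread(target=fire, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(fired))
```

Compared to backgrounded curl processes, this removes the process-startup jitter between requests, which makes landing inside a short race window much more likely.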
There are no guarantees in the SQL standard that queries with subqueries should be atomic.
The only truly safe way to protect yourself is to fix the schema so that you can make use of unique indexes. Those are guaranteed to be unique no matter what.
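As a sketch of that backstop (SQLite standing in for any RDBMS; the schema is made up): even if the application-level check races, the second insert fails at the database because the unique index rejects it.

```python
# Unique index as the last line of defense against duplicate redemptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE redemptions (
        code    TEXT    NOT NULL,
        user_id INTEGER NOT NULL,
        UNIQUE (code)            -- at most one redemption per code, enforced by the DB
    )
""")

def try_redeem(code, user_id):
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO redemptions VALUES (?, ?)", (code, user_id))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate: the unique index rejected it

print(try_redeem("SAVE20", 1))  # True
print(try_redeem("SAVE20", 2))  # False
```

The application can still do a friendly up-front check for the common case; the index is simply what makes the race unwinnable.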
You can then get the exact request by using Chrome developer tools. (Find the POST request in the Network tab, right-click, and select "Copy as cURL".)
Less cynical answer: commonly you already have some means of handling races - locking, transactions, some other variety of extra check - and the fix for a newly discovered race is "oh, I didn't realise that could happen; add a lock."
Adding a random sleep time might work, but some requests would run noticeably slower.
- update
thanks for the downvotes guys. keep up the good work
One time they paid me $5000 for a bug I never could have found, but they did internally based on my low severity report. (http://josipfranjkovic.blogspot.com/2013/11/facebook-bug-bou...)
Stop jumping on the hate wagon, everybody.