Carnegie Mellon’s Mayhem AI Wins DARPA’s Cyber Grand Challenge

(techcrunch.com)

166 points | ebakan | 9y ago | 26 comments

jmgrosen 9y ago

Hey guys, member of the (currently unverified) third place team, Shellphish. If anyone has any questions, I (or another member of my team) would be glad to answer them. We'll also be giving a talk at DEF CON on Sunday after the CTF ends, where we'll be open sourcing our CRS!

yeukhon 9y ago

Can you explain how this particular CTF works and how the system in general works against an adversary? The article said insecure code and code filled with bugs are constantly being fed to the system. I don't really get it.

riffraff 9y ago

I hope someone more knowledgeable can chime in, but AFAIU, each player acts as the manager of a certain set of services, and as an attacker against all the others.

Such services contain bugs, so what each player must do is identify the bugs, fix them or mitigate them, and at the same time exploit them to gain access to the boxes of the other players.

So basically the programs in the competition do

* vulnerability identification

* vulnerability mitigation

* identification of the best target to attack (presumably based on the first thing, not sure if other things factor in)
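
A toy sketch of those three jobs as one round of play. It assumes a made-up model in which each service is just a set of bug names; the `Player` class and every function name here are hypothetical and do not reflect the real CGC framework or its scoring.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    """Hypothetical player: a name plus the bugs still unpatched in its services."""
    name: str
    unpatched: set = field(default_factory=set)


def find_bugs(service_bugs, known_signatures):
    # Vulnerability identification: a set intersection stands in for real
    # program analysis (assumption made for illustration only).
    return service_bugs & known_signatures


def patch(player, bugs):
    # Vulnerability mitigation: drop the found bugs from our own services.
    player.unpatched -= bugs


def pick_target(opponents, exploitable):
    # Best target: the opponent exposing the most bugs we know how to exploit.
    return max(opponents, key=lambda p: len(p.unpatched & exploitable))


def play_round(me, opponents, known_signatures):
    found = find_bugs(me.unpatched, known_signatures)
    patch(me, found)
    target = pick_target(opponents, found)
    return target.name, sorted(target.unpatched & found)
```

In this model the target choice depends only on the bugs found in the first step, which matches the guess above; a real system would presumably weigh other factors too.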

harsh1618 9y ago

First of all, congratulations on the awesome work. Do any of the components of your CRS make use of machine learning techniques? I read somewhere that Mayhem uses deep learning, but I'm not sure how exactly that would work in a program analysis scenario. I am assuming you used some form of symbolic execution (Edit: just realized it's angr, which is often useful in CTFs). How different was it from other general-purpose SE systems (KLEE etc.)? Did you use any formal methods too?

bradneuberg 9y ago

Is this both automated defense and offense via machine learning, or just automated defensive systems? If it includes automated offensive systems, what's to keep these kinds of systems from jumping outside of their sandboxes and compromising the outside world?

sanxiyn 9y ago

For a flavor of automated offensive system, see this Automatic Exploit Generation paper: http://security.ece.cmu.edu/aeg/

David Brumley, PI of the research, went on to found ForAllSecure which is the company covered in the article.

bradneuberg 9y ago

I'd love to learn more about the techniques actually being used in these systems. Any good pointers to some scientific papers or review articles on the subject? I have a background in machine learning, so I am comfortable with technical papers.

sanxiyn 9y ago

Here is a 2015 competition postmortem from Trail of Bits: https://blog.trailofbits.com/2015/07/15/how-we-fared-in-the-...

mkagenius 9y ago

What's your view on complete automation vs. human-assisted automation? Which one is better to focus on building over a 5-year timeline?

crypto5 9y ago

What kind of AI was involved in your system and your competitors' systems?

jmgrosen 9y ago

If you mean AI in the sense of neural networks, Bayesian inference, etc., absolutely none in our CRS :) In retrospect, we could have made some better decisions about when to patch by using some of the simpler "AI" methods, but in terms of the actual core exploiting and defending, there's not much research into using AI in security.
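
To make the "when to patch" idea concrete, here is a minimal sketch of one simple approach: estimate how often a service gets attacked with a Beta-Bernoulli posterior, then patch only if the expected loss beats the cost of patching. This is not what any CRS in the competition did; the function names, the uniform prior, and the loss and cost numbers are all assumptions made for illustration.

```python
def posterior_exploit_rate(rounds_attacked, rounds_observed, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Beta(prior_a, prior_b) prior after observing
    rounds_attacked attacks in rounds_observed rounds (hypothetical model)."""
    return (prior_a + rounds_attacked) / (prior_a + prior_b + rounds_observed)


def should_patch(p_exploited, loss_if_exploited, patch_cost):
    """Patch only when the expected loss from staying vulnerable exceeds
    the assumed fixed cost of deploying the patch."""
    return p_exploited * loss_if_exploited > patch_cost


# Example with made-up numbers: attacked in 3 of 8 observed rounds.
rate = posterior_exploit_rate(3, 8)
decision = should_patch(rate, loss_if_exploited=100, patch_cost=30)
```

The point is only that a few lines of estimation can replace a fixed "always patch" or "never patch" rule.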

yayitscaroline 9y ago

It's funny that Brumley's first-place-winning robot CTF team is going to be competing against his first-place-winning human CTF team at DEFCON.

The DARPA team is headed up by professor David Brumley. He also leads the Carnegie Mellon CTF hacker group PPP (Plaid Parliament of Pwning) that often wins at DEFCON's CTFs. This article mentioned that the Mayhem robot is going to be battling the human CTF players at DEFCON. I wonder who he'll be rooting for.

cschmidt 9y ago

As of this afternoon, when I walked by, Mayhem was in last place and PPP was in second place.

BobombXing 9y ago

Brumley likes to imply that his company's team is the "CMU team." Either way he'll see it as a CMU win.

joeyrideout 9y ago

I just came from a full day of talks at DEF CON, and a highlight for me was how the CGC servers were all lit up on stage behind the speakers of one room of the con [1]. It was incredibly stylish and impressive.

[1] https://twitter.com/joey_rideout/status/761710072237961216?s...

pingec 9y ago

Video here: https://www.youtube.com/watch?v=xek4OcScCh4

deckar01 9y ago

This was a really amazing competition. Imagine running symbolic analysis and fuzzing like integration tests as part of a deploy process, then having fixes proposed algorithmically when a vulnerability is discovered.
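
A minimal sketch of the "fuzzing as an integration test" half of that idea, assuming a toy length-prefixed parser as the code under test. `parse_record` and `fuzz` are hypothetical names, and this is plain random fuzzing with a fixed seed; symbolic analysis and automatic patch proposals are not shown.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy parser (hypothetical): first byte is a length, the rest is the payload."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    if length > len(data) - 1:
        raise ValueError("declared length exceeds payload")
    return data[1:1 + length]


def fuzz(target, runs=1000, seed=0, max_len=8):
    """Feed random byte strings to target and return the inputs that raise
    anything other than ValueError, which the parser uses to reject input."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    crashes = []
    for _ in range(runs):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # clean rejection, not a bug
        except Exception:
            crashes.append(data)
    return crashes


def test_parser_survives_fuzzing():
    # Runs alongside ordinary tests; any crashing input fails the build.
    assert fuzz(parse_record) == []
```

A test runner such as pytest would pick up `test_parser_survives_fuzzing` like any other test, so a crashing input blocks the deploy.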

Eclyps 9y ago

I thought that the production of the competition was extraordinary. Seeing everything lit up on stage was straight out of a movie (in a good way). I thought that the event itself at Defcon was super weird, though. A lot of people, myself included, assumed that the event was going to be more real-time. In reality, the servers had been competing for hours already.

That being said, huge props to these amazing teams. It was so fascinating to see how each system reacted to the same situations and then either hunkered down to protect itself or went on the offensive. Really amazing stuff.

xtacy 9y ago

I tried browsing the DARPA challenge's website to learn more, but I couldn't find any information. Could someone please post a link to a detailed description of the challenge?

cschmidt 9y ago

It is basically computers playing Capture the Flag (CTF) against each other. They are given binary programs with security flaws. They need to identify the flaws automatically and develop a patch for their own system. At the same time, they go out to crash the other teams. Normally humans do this, but the DARPA challenge was to have computer systems do it autonomously.

nl 9y ago

https://www.cybergrandchallenge.com/tech

Includes a link to the GitHub repository for the challenge framework.

ChuckMcM 9y ago

I'm really surprised they didn't call it Black Ice :-)

That is a pretty amazing result all in all. So at what point do we combine it with DeepMind and have something that owns the Internet?

brian_herman 9y ago

Mayhem is also competing in the CTF.

q3k 9y ago

It's not doing that hot - currently last place, but not very far back in terms of points.

However (and impressively), it did patch at least one bug in a task (LEGIT_00007) before any human team did.

mkagenius 9y ago

I am very impressed by the visualisations - super computers churning data for visualisations!

rasz_pl 9y ago

>Not the nicest thing to say about a champion AI that just took first place in an incredibly sophisticated virtual game

What? This was the Special Olympics of CTF. All AI teams played at the same terribad level; score differences were minimal.
