The alleged NSA malware developers are at risk to be identified

(yousry.de)

108 points · yousry · 9y ago · 42 comments

micaksica 9y ago

This post seems lacking in the data required to make such a claim; I do not understand how it has gained so much traction.

Where is the actual research, and where are the probable identified candidates? Did I miss a data analysis part somewhere that explained the methodology, and probable attribution to actual people? This appears to be a basic string search of the code and some simple syntax analysis.

There are learning algorithms for stylometry, and they can probably be adapted to code. This article appears to state that "it might be possible to use these anomalies as clues", but does not elaborate on how or why, or offer any hypothesis beyond this.
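To make "stylometry adapted to code" a bit more concrete, here is a minimal sketch in Python. It is not the article's method, nor the one from any particular paper: the function names and the four features are hypothetical, chosen only to show the idea of turning layout and naming habits into a numeric vector that can be compared across samples.

```python
import re


def style_features(source: str) -> dict:
    """Turn a C-like source snippet into a few crude style ratios.

    The features are illustrative only; real code stylometry uses far
    richer ones (for example syntax-tree based features).
    """
    nonblank = [ln for ln in source.splitlines() if ln.strip()]
    n_lines = max(len(nonblank), 1)
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    n_idents = max(len(idents), 1)
    return {
        # Share of lines indented with a tab rather than spaces.
        "tab_indent_ratio": sum(ln.startswith("\t") for ln in nonblank) / n_lines,
        # Share of lines that consist only of an opening brace.
        "brace_own_line_ratio": sum(ln.strip() == "{" for ln in nonblank) / n_lines,
        # Share of identifiers with an inner underscore (snake_case).
        "snake_case_ratio": sum("_" in i.strip("_") for i in idents) / n_idents,
        # Share of identifiers with a lower-to-upper transition (camelCase).
        "camel_case_ratio": sum(bool(re.search(r"[a-z][A-Z]", i)) for i in idents) / n_idents,
    }


def style_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two feature dictionaries with the same keys."""
    return sum((a[k] - b[k]) ** 2 for k in a) ** 0.5


sample = "int main(void)\n{\n\tint my_count = 0;\n\treturn my_count;\n}\n"
features = style_features(sample)
```

A learning algorithm would then be trained on such vectors from samples with known authors. None of this survives compilation unchanged, which is part of why a string search over binaries is a much weaker signal.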

exo762 9y ago

I haven't analyzed the author's claims, but in general programmer identification is a solved problem:

https://www.youtube.com/watch?v=YMa04HovKfs [De-anonymizing programmers 32c3]

akerro 9y ago

My first thoughts were about the demo you linked and about this one: https://www.youtube.com/watch?v=xipI-0HU010

micaksica 9y ago

Awesome. Thanks for this. I missed this one.

CoryG89 9y ago

Looks to me like the author is posting initial findings (and if I am reading this right, withholding some).

It doesn't look like a crazy amount of time/resources have gone in, but it looks like a basic proof of concept to me. Perhaps it will get the ball rolling and someone else who reads this will figure it out.

zigzigzag 9y ago

> However, in contrast to 3.5 billions Internet users, only a few hundred experts have to be identified.

This is the sentence that lets you know the post can be safely ignored. Anyone who thinks there are only a few hundred people in the world capable of writing Linux exploits doesn't have a grip on the scale of the world at all.

rincebrain 9y ago

The assertion was not that there are only a few hundred people, but that the organization responsible for this software employed at most a few hundred people to write it.

(There are other problems with the article's conclusions absent the data they withhold, but I don't agree that this is one of them.)

e: Actually, on further reflection, neither your interpretation of their statement nor mine is a reasonable conclusion, so I now agree with you that this is a flaw in their argument.

zigzigzag 9y ago

But that isn't the approach the article takes - it tries to narrow down the list of possible authors from public data, not identify employees of organisations that may have a few hundred hackers.

CoryG89 9y ago

Perhaps it's possible to limit the search space by looking only at experts who are likely to have (or could possibly have) worked with the US government or NSA, past or present. Then maybe you could get the list down to a reasonable number? For example, any experts who have never been to the US for extended periods of time can probably be excluded.

zigzigzag 9y ago

How would you know? I've encountered at least one person who was without a doubt ex-GCHQ but didn't identify that anywhere.

micaksica 9y ago

Agreed. There are probably 5-25K people in the Bay Area alone who are capable of writing exploits (yes, a large range, but still an order of magnitude higher).

mseebach 9y ago

Also, there's a huge difference between the number of people capable of secretly building exploits alone in their bedrooms at night (probably committing a crime) and the number capable of building them as a day job, where you can solicit feedback and advice from peers, reference well-organised documentation, study the original source code of previously successful exploits, and freely discuss ideas and approaches with colleagues over lunch.

Which of course partially challenges this assumption in the article:

> The developers of the malware [..] were discovered and not trained.

yousry (OP) 9y ago

I'm currently working on anomaly detection algorithms and used the good opportunity (the Shadow Brokers release) to analyze a number of malware applications at once.

bitxbitxbitcoin 9y ago

I'd love to see your results once you're ready to share them!

matt_wulfeck 9y ago

The author appears to run "strings" on the binaries and then goes on to shoot a few theories in the dark:

> The developers of the malware are leading experts in the area of Linux, Network and Security development.

> They were discovered and not trained.

> Because the archive contains a collection of applications, the calculated result-set is reasonable small for further investigations.
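For anyone who has not used it, what `strings` does can be roughly approximated in a few lines of Python. This is only a sketch of the general technique, not the article author's actual tooling; `extract_strings`, the 4-character minimum and the sample bytes are illustrative assumptions.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long.

    This mimics the basic idea behind the `strings` utility: scan raw
    bytes and keep anything that looks like text.
    """
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Made-up bytes standing in for a binary; not taken from the leaked archive.
blob = b"\x00\x01GCC: (GNU) 4.8\x00ab\x00/home/dev/build\xff"
found = extract_strings(blob)
```

Anything recovered this way (compiler banners, build paths, library messages) is whatever happened to be left in the binary, so by itself it says little about who wrote the code.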

drvdevd 9y ago

Also:

> LinkedIn will show you the professional discipline, GitHub the shared libraries and their publicity.

I would guess that NSA has a firm grasp on this sort of basic OSINT problem and code attribution techniques.

wjnc 9y ago

Retroactively scrubbing a programmer's published work and social media participation is a red flag in itself.

lostboys67 9y ago

Like many NSA or GCHQ developers will have a public account on GitHub.

alfiedotwtf 9y ago

After seeing this post, the malware devs may have unfollowed/unstarred the repos used in order to evade discovery.

It would have been interesting to have GitHub's star/follow history...

andruby 9y ago

GitHub has a comprehensive open dataset [1]. I'm not sure if it keeps historical data, but I'm sure there are people hitting the APIs and keeping the data archived :)

[1] https://www.githubarchive.org/
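As a rough sketch of how a star history could be rebuilt from such an archive: the dataset is distributed as newline-delimited JSON events, and on GitHub a star is recorded as a `WatchEvent`. The field names used below (`type`, `repo.name`, `actor.login`, `created_at`) are my assumption about the event layout and should be checked against the actual files; `star_events` and the sample records are hypothetical.

```python
import json
from typing import Iterable


def star_events(lines: Iterable[str], repo_name: str) -> list[tuple]:
    """Collect (timestamp, login) pairs for stars on one repository.

    `lines` is any iterable of JSON-encoded events, one per line, for
    example the lines of a downloaded and decompressed archive file.
    """
    stars = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumed layout: stars appear as events of type "WatchEvent".
        if event.get("type") != "WatchEvent":
            continue
        if event.get("repo", {}).get("name") != repo_name:
            continue
        stars.append((event.get("created_at"), event.get("actor", {}).get("login")))
    return stars


# Synthetic events for illustration only.
sample_lines = [
    '{"type": "WatchEvent", "repo": {"name": "a/b"}, "actor": {"login": "alice"}, "created_at": "2016-08-01T10:00:00Z"}',
    '{"type": "PushEvent", "repo": {"name": "a/b"}, "actor": {"login": "bob"}, "created_at": "2016-08-01T11:00:00Z"}',
    '{"type": "WatchEvent", "repo": {"name": "c/d"}, "actor": {"login": "carol"}, "created_at": "2016-08-01T12:00:00Z"}',
]
result = star_events(sample_lines, "a/b")
```

Because the archive is an append-only log of events, a star recorded in the past would still be in the old files even if the account later unstarred the repository, provided the files were kept.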

carlsborg 9y ago

Nice forensic analysis and tutorial.

Note that parsing strings out of a binary and looking for names in them gives you mainly false positives, e.g. from glibc:

https://fossies.org/dox/glibc-2.24/C-identification_8c_sourc...
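One simple way to cut down on this kind of false positive, sketched here under the assumption that strings have already been extracted from both the target binary and the libraries it is linked against, is to discard everything that also occurs in a known library. The function name and the sample strings are hypothetical.

```python
def novel_strings(target: list[str], baseline_libs: list[list[str]]) -> list[str]:
    """Keep only strings from `target` that appear in none of the baselines.

    Each baseline is the list of strings extracted from a known library
    build (for example a stock glibc), so names and messages that come
    from statically linked library code are filtered out.
    """
    known = set()
    for lib_strings in baseline_libs:
        known.update(lib_strings)
    # Preserve the original order of the target's strings.
    return [s for s in target if s not in known]


# Made-up data: two entries come from a library, one is specific to the target.
target_strings = ["Contributed by Some Maintainer", "GNU C Library", "connect to %s failed"]
library_strings = [["GNU C Library", "Contributed by Some Maintainer"]]
remaining = novel_strings(target_strings, library_strings)
```

Exact matching like this only works if the baseline is the same library version and build; otherwise strings shift slightly and slip through.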

pulse7 9y ago

TLDR: Assumptions: "The developers of the malware are leading experts in the area of Linux, Network and Security development." and "They were discovered and not trained."

sschueller 9y ago

Why is it a problem if they are identified? It is probably the only case where writing malware doesn't get you in trouble with the government, because they paid you to do it.

avh02 9y ago

A naive question: would sending this code through an obfuscator not mess up this methodology (other than lib identification)?

It clearly hasn't happened here, but wouldn't that be a reasonable step to cover tracks given this kind of analysis?
