Where is the actual research, and where are the candidates it supposedly identified? Did I miss a data analysis section that explains the methodology and the probable attribution to actual people? This appears to be a basic string search of the code plus some simple syntax analysis.
There are learning algorithms for stylometry, and they can probably be adapted to code. The article appears to state that "it might be possible to use these anomalies as clues", but does not elaborate on how or why, or offer any hypothesis beyond that.
https://www.youtube.com/watch?v=YMa04HovKfs [De-anonymizing programmers 32c3]
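To make "stylometry adapted to code" a bit more concrete, here is a minimal sketch, assuming a toy setup: a handful of hand-picked layout and naming features plus a nearest-neighbour comparison. This is not the method from the talk (which, as far as I know, relies on much richer features and a trained classifier), and the function names `style_features`, `distance` and `closest_author` are made up for illustration.

```python
import math
import re


def style_features(source: str) -> list[float]:
    """Turn C-like source text into a small vector of style ratios (toy feature set)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    m = max(len(idents), 1)
    return [
        # share of non-empty lines indented with a tab
        sum(ln.startswith("\t") for ln in lines) / n,
        # share of lines that are just an opening brace
        sum(ln.strip() == "{" for ln in lines) / n,
        # share of identifiers containing an inner underscore (snake_case)
        sum("_" in ident.strip("_") for ident in idents) / m,
        # share of identifiers with a lower-to-upper transition (camelCase)
        sum(bool(re.search(r"[a-z][A-Z]", ident)) for ident in idents) / m,
        # share of lines with a space between if/for/while and "("
        sum(bool(re.search(r"\b(if|for|while) \(", ln)) for ln in lines) / n,
    ]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_author(unknown: str, known: dict[str, str]) -> str:
    """Return the label of the known sample whose style vector is nearest."""
    target = style_features(unknown)
    return min(known, key=lambda name: distance(target, style_features(known[name])))
```

With five crude features and one sample per author this only tells obviously different styles apart. Real attribution would need far more code per candidate and a way to estimate how often the match is wrong.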
It doesn't look like a crazy amount of time or resources went into it, but it looks like a basic proof of concept to me. Perhaps it will get the ball rolling and someone else who reads this will figure it out.
This is the sentence that lets you know the post can be safely ignored. Anyone who thinks there are only a few hundred people in the world capable of writing Linux exploits doesn't have a grip on the scale of the world at all.
(There are other problems with the article's conclusions absent the data they withhold, but I don't agree that this is one of them.)
e: Actually, on further reflection, neither your interpretation of their statement nor mine is a reasonable conclusion, so I now agree with you that this is a flaw in their argument.
Which of course partially challenges this assumption in the article:
> The developers of the malware [..] were discovered and not trained.
> The developers of the malware are leading experts in the area of Linux, Network and Security development.
> They were discovered and not trained.
> Because the archive contains a collection of applications, the calculated result-set is reasonable small for further investigations.
> LinkedIn will show you the professional discipline, GitHub the shared libraries and their publicity.
I would guess that the NSA has a firm grasp on this sort of basic OSINT problem and on code attribution techniques.
It would have been interesting to have GitHub's star/follow history...
Note that parsing strings out of a binary and looking for names in them gives you mainly false positives, e.g. from glibc:
https://fossies.org/dox/glibc-2.24/C-identification_8c_sourc...
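A rough sketch of why this happens, assuming a `strings`-style extraction followed by a naive "two capitalized words" name pattern. The helper names and the sample bytes are made up for illustration and are not taken from any real binary.

```python
import re


def extract_strings(blob: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII of at least min_len bytes, roughly like strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(blob)]


# Naive heuristic: two adjacent capitalized words look like "First Last".
NAME_LIKE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def name_candidates(strings: list[str]) -> list[str]:
    """Collect every substring that matches the naive name pattern."""
    hits: list[str] = []
    for text in strings:
        hits.extend(NAME_LIKE.findall(text))
    return hits


# Hypothetical bytes standing in for a statically linked binary.
SAMPLE_BLOB = (
    b"\x7fELF\x02\x01\x01\x00"
    b"Written by Jane Doe.\x00"
    b"\x01\x02"
    b"Free Software Foundation, Inc.\x00"
    b"No such file or directory\x00"
    b"Operation Not Permitted\x00"
)
```

On this sample the pattern flags "Jane Doe", "Free Software" and "Operation Not". Even the one real-looking name comes from a library credit line, not from whoever wrote the program being analysed, so every hit still needs manual triage.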
It clearly hasn't happened here, but wouldn't that be a reasonable step to cover tracks given this kind of analysis?