Taint analysis is notoriously prone to false positives; beyond the reasons listed in this post, there are many situations where relations between variables mean that tainted data doesn't cause problems. [For example, the size of the memcpy target (bp) is known to be greater than payload; so even though payload is tainted, there isn't a risk of a write overrun.] But even noisy warnings can be very useful: when we first implemented simple taint analysis in PREfix a decade ago, the first run was 99% false positives, but one of the real bugs we found was in a system-level crypto module. With the increased scrutiny these kinds of bugs are getting after Heartbleed, it seems like a great time to give this class of problem more attention.
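A minimal sketch of the bracketed situation, with hypothetical names: the copy length is tainted, but an invariant maintained by the caller relates it to the destination size, so no overrun is possible even though a purely local taint check would still flag the memcpy.

```c
#include <string.h>

enum { BUF_SIZE = 1024, MAX_FIELD = 64 };

/* Hypothetical example: callers guarantee payload <= MAX_FIELD, and
 * MAX_FIELD < BUF_SIZE, but that relation lives outside this function,
 * so a local taint analysis flags the memcpy even though the tainted
 * length can never overrun the destination. */
void handle_record(unsigned char bp[BUF_SIZE],
                   const unsigned char *pl, unsigned payload) {
    memcpy(bp, pl, payload);   /* safe by invariant, but still flagged */
}
```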
Thinking of it as a false positive seems like the wrong perspective. The static analyzer is a tool that flags usages that are not proven to be correct. The fact that the code turned out to be valid isn't the issue; the issue is that your code did not prove it valid to the satisfaction of the analyzer. This isn't necessarily a failing of the analyzer, but an indication that your code should be written in a different way, or should provide more "evidence" that it's correct (i.e. guards/size checks).
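As a sketch of what "providing evidence" can look like (names are hypothetical), the same kind of copy can be rewritten so the size check the analyzer wants is visible right at the call site:

```c
#include <string.h>

/* Hypothetical sketch: the guard makes the length/size relation
 * explicit, so an analyzer can locally prove the memcpy in bounds. */
int copy_checked(unsigned char *dst, size_t dst_len,
                 const unsigned char *src, size_t n) {
    if (n > dst_len)          /* the "evidence" the analyzer can use */
        return -1;
    memcpy(dst, src, n);
    return 0;
}
```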
The goal should be to write code in such a way that whatever tool you're using can prove it correct. Sure, the better the tool the easier this process is. But we really need to fundamentally rethink how we approach this problem.
A static analyzer that will actually be used can't have too many false positives, and this is the big challenge with these things. He said that allowing some false negatives (to cut down on false positives) made the tools more effective in actually solving problems.
That said, with something like openSSL, you do sort of just wish the programmers would deal with it. Language design should include elements to make these sorts of static analyses easier.
Perhaps something similar could be done using typedefs in C?
[1] http://www.gwtproject.org/javadoc/latest/com/google/gwt/safe...
That said, one could use actual wrapper structs around the various types.
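A quick sketch of the wrapper-struct idea (all names hypothetical, in the spirit of the GWT SafeHtml link above): unlike a plain typedef, which C freely inter-converts with the underlying type, two distinct struct types are incompatible, so the compiler rejects passing unvalidated input where validated data is expected.

```c
#include <string.h>

typedef struct { const char *s; } tainted_str;  /* raw, untrusted input */
typedef struct { const char *s; } safe_str;     /* has passed validation */

/* Hypothetical: sanitize() is the only way to obtain a safe_str. */
safe_str sanitize(tainted_str in) {
    /* real escaping/validation would go here */
    safe_str out = { in.s };
    return out;
}

/* Only accepts validated strings; passing a tainted_str directly is a
 * compile-time type error, which a bare typedef would not catch. */
size_t emit(safe_str s) { return strlen(s.s); }
```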
Why do I get the feeling that we're going to see three months of new OpenSSL vulnerabilities, like we saw with Rails last year? I'm sure Heartbleed plus all the bad press about code quality means a lot of people are suddenly looking. Assuming there is more to find, does anyone have any advice for how we might prepare for it?
(Carmack's review: http://www.altdevblogaday.com/2011/12/24/static-code-analysi...)
That said, are there other ways to fix this class of problem? We have choices. We can continue to build ever-more-advanced tools for patching over the problems of C and C++, or we can start using languages that simply do not have those problems.
There will always be a need for C and C++ in device drivers, microcontrollers, etc. But there's no compelling reason why SSL implementations in 2014 should use languages designed to run on mainframes in 1973.
Except that safer systems programming languages, with bounds checking by default, are older than C, and their compilers allowed disabling the checks if really, really required[1]:
Algol (1960)
PL/I (1964)
Modula-2 (1978)
Mesa (1979)
Even VAX, B6500 and 68000 assembly have support for doing bounds checking.
[1] Not the first version of Algol though, as according to Hoare's Turing Award speech, customers didn't want unsafe features:
"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to—they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."
That said, we're of course working on changing that with Rust. But I should note that memory safety without garbage collection is just hard: it requires the entire language design to be balanced on a delicate precipice. It's not surprising that it's taken a long time to get there.
I was thinking about this recently and I think a large part of the problem is that C arrays are too weakly typed. Array should be a different type than pointer and they shouldn't be convertible. In particular, you shouldn't be able to subscript a pointer, and the in-memory representation of an array should begin with its length. At that point the compiler can include a runtime bounds check for every array access that it can't prove is safe at compile time.
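A minimal sketch of what such a length-carrying array could look like today, under the assumption that all access goes through checked accessors rather than raw pointer arithmetic (names are hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical length-prefixed array: the length travels with the
 * data, so every access can be bounds-checked at run time. */
typedef struct {
    size_t len;
    unsigned char data[];   /* C99 flexible array member */
} array_t;

array_t *array_new(size_t len) {
    array_t *a = malloc(sizeof *a + len);
    if (a) a->len = len;
    return a;
}

unsigned char array_get(const array_t *a, size_t i) {
    assert(i < a->len);     /* the runtime bounds check */
    return a->data[i];
}

void array_set(array_t *a, size_t i, unsigned char v) {
    assert(i < a->len);
    a->data[i] = v;
}
```

A compiler doing this natively could of course elide the check wherever it can prove the index in range, as the comment suggests.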
I submitted a link to TLS Lite a few days ago, but, alas, showed poor judgement in timing:
https://news.ycombinator.com/item?id=7564740
Direct link: http://trevp.net/tlslite/
I'm actually rather anxious to hear the knowledgeable crowd discuss this fine project.
IMO, C fails at being sufficiently low level to do this as well as it could. I don't think C will ever be replaced by a higher level language; it will be replaced with a lower level language that is better at incorporating static analysis of particular usage patterns into the language. To put it another way: C makes you do a bunch of "extra" work compared to other languages without really helping you do that work; other popular languages I know of try to not make you do that "extra" work, which is often a good thing but not always. A true replacement for C will need to still make you do all the "extra" work but help you make sure that work is correct.
E.g. memory allocation: it is not that manual memory allocation and deallocation is necessarily unreliable, but that there is no single way to make it reliable that works well for all programs. But there is also no way to do automatic memory management that works well for all programs.
I would never recommend C++, but my sense is that current popularity of C++ might be connected to templates which are flexible and powerful in a different way than C or any other language I know of.
(also C was designed for "minicomputers" not mainframes, so not really all that different from its modern usage)
I googled that phrase, and came up empty handed. Could you give an example?
The modification made by the team is referenced in John's blog post: "Their insight is that we might want to consider byte-swap operations to be sources of tainted data".
As Andy said (and quoted), that's a modification that we need to evaluate overall to look at its impact in terms of false positives (FP). If it doesn't pass our acceptance tests for FP rate, it will probably still be made available under some option... a bit too early to say.
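A small sketch of the heuristic being discussed (hypothetical names): a byte-swap such as ntohs() almost always means "this value just arrived off the wire", so treating its result as tainted catches the Heartbleed-style pattern where a swapped length field later drives a memcpy.

```c
#include <arpa/inet.h>
#include <string.h>

/* Hypothetical sketch: under the byte-swap heuristic, the return
 * value of ntohs() here would be marked tainted, so any unchecked
 * copy sized by it gets flagged. */
unsigned short read_length(const unsigned char *p) {
    unsigned short n;
    memcpy(&n, p, sizeof n);  /* read 16-bit big-endian wire value */
    return ntohs(n);          /* result treated as attacker-controlled */
}
```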