Taint analysis is notoriously prone to false positives; beyond the reasons listed in this post, there are many situations where relations between variables mean that tainted data doesn't cause problems. [For example, the size of the memcpy target (bp) is known to be greater than payload; so even though payload is tainted, there isn't a risk of a write overrun.] But even noisy warnings can be very useful: when we first implemented simple taint analysis in PREfix a decade ago, the first run was 99% false positives, but one of the real bugs we found was in a system-level crypto module. With the increased scrutiny these kinds of bugs are getting after Heartbleed, it seems like a great time to give this class of problem more attention.
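A minimal sketch of the bracketed situation, with hypothetical names: the copy length is tainted, but an invariant maintained by the caller relates it to the destination size, so no overrun is possible even though a purely local taint check would still flag the memcpy.

```c
#include <string.h>

enum { BUF_SIZE = 1024, MAX_FIELD = 64 };

/* Hypothetical example: callers guarantee payload <= MAX_FIELD, and
 * MAX_FIELD < BUF_SIZE, but that relation lives outside this function,
 * so a local taint analysis flags the memcpy even though the tainted
 * length can never overrun the destination. */
void handle_record(unsigned char bp[BUF_SIZE],
                   const unsigned char *pl, unsigned payload) {
    memcpy(bp, pl, payload);   /* safe by invariant, but still flagged */
}
```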
Thinking of it as a false positive seems like the wrong perspective. The static analyzer is a tool that flags usages that are not proven to be correct. The fact that the code turned out to be valid isn't the issue; the issue is that your code did not prove it valid to the satisfaction of the analyzer. This isn't necessarily a failing of the analyzer, but an indication that your code should be written in a different way, or should provide more "evidence" that it's correct (i.e. guards/size checks).
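As a sketch of what "providing evidence" can look like (names are hypothetical), the same kind of copy can be rewritten so the size check the analyzer wants is visible right at the call site:

```c
#include <string.h>

/* Hypothetical sketch: the guard makes the length/size relation
 * explicit, so an analyzer can locally prove the memcpy in bounds. */
int copy_checked(unsigned char *dst, size_t dst_len,
                 const unsigned char *src, size_t n) {
    if (n > dst_len)          /* the "evidence" the analyzer can use */
        return -1;
    memcpy(dst, src, n);
    return 0;
}
```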
The goal should be to write code in such a way that whatever tool you're using can prove it correct. Sure, the better the tool the easier this process is. But we really need to fundamentally rethink how we approach this problem.
A static analyzer that will actually be used can't have too many false positives, and this is the big challenge with these things. He said that allowing some false negatives (to cut down on false positives) made the tools more effective in actually solving problems.
That said, with something like openSSL, you do sort of just wish the programmers would deal with it. Language design should include elements to make these sorts of static analyses easier.
Perhaps something similar could be done using typedefs in C?
[1] http://www.gwtproject.org/javadoc/latest/com/google/gwt/safe...
That said, one could use actual wrapper structs around the various types.
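A quick sketch of the wrapper-struct idea (all names hypothetical, in the spirit of the GWT SafeHtml link above): unlike a plain typedef, which C freely inter-converts with the underlying type, two distinct struct types are incompatible, so the compiler rejects passing unvalidated input where validated data is expected.

```c
#include <string.h>

typedef struct { const char *s; } tainted_str;  /* raw, untrusted input */
typedef struct { const char *s; } safe_str;     /* has passed validation */

/* Hypothetical: sanitize() is the only way to obtain a safe_str. */
safe_str sanitize(tainted_str in) {
    /* real escaping/validation would go here */
    safe_str out = { in.s };
    return out;
}

/* Only accepts validated strings; passing a tainted_str directly is a
 * compile-time type error, which a bare typedef would not catch. */
size_t emit(safe_str s) { return strlen(s.s); }
```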
Why do I get the feeling that we're going to see three months of new OpenSSL vulnerabilities, like we saw with Rails last year? I'm sure Heartbleed plus all the bad press about code quality means a lot of people are suddenly looking. Assuming there is more to find, does anyone have any advice for how we might prepare for it?
(Carmack's review: http://www.altdevblogaday.com/2011/12/24/static-code-analysi...)
That said, are there other ways to fix this class of problem? We have choices. We can continue to build ever-more-advanced tools for patching over the problems of C and C++, or we can start using languages that simply do not have those problems.
There will always be a need for C and C++ in device drivers, microcontrollers, etc. But there's no compelling reason why SSL implementations in 2014 should use languages designed to run on mainframes in 1973.
Except that safer systems programming languages, with bounds checking by default, are older than C, and their compilers allowed disabling the checks if really, really required[1]:
Algol (1960)
PL/I (1964)
Modula-2 (1978)
Mesa (1979)
Even VAX, B6500 and 68000 assembly have support for doing bounds checking.
[1] Not the first version of Algol though, as according to Hoare's Turing Award speech, customers didn't want unsafe features:
"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to—they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."
That said, we're of course working on changing that with Rust. But I should note that memory safety without garbage collection is just hard: it requires the entire language design to be balanced on a delicate precipice. It's not surprising that it's taken a long time to get there.
I was thinking about this recently and I think a large part of the problem is that C arrays are too weakly typed. Array should be a different type than pointer and they shouldn't be convertible. In particular, you shouldn't be able to subscript a pointer, and the in-memory representation of an array should begin with its length. At that point the compiler can include a runtime bounds check for every array access that it can't prove is safe at compile time.
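A minimal sketch of what such a length-carrying array could look like today, under the assumption that all access goes through checked accessors rather than raw pointer arithmetic (names are hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical length-prefixed array: the length travels with the
 * data, so every access can be bounds-checked at run time. */
typedef struct {
    size_t len;
    unsigned char data[];   /* C99 flexible array member */
} array_t;

array_t *array_new(size_t len) {
    array_t *a = malloc(sizeof *a + len);
    if (a) a->len = len;
    return a;
}

unsigned char array_get(const array_t *a, size_t i) {
    assert(i < a->len);     /* the runtime bounds check */
    return a->data[i];
}

void array_set(array_t *a, size_t i, unsigned char v) {
    assert(i < a->len);
    a->data[i] = v;
}
```

A compiler doing this natively could of course elide the check wherever it can prove the index in range, as the comment suggests.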
I submitted a link to TLS Lite a few days ago, but, alas, showed poor judgement in timing:
https://news.ycombinator.com/item?id=7564740
Direct link: http://trevp.net/tlslite/
I'm actually rather anxious to hear the knowledgeable crowd discuss this fine project.
IMO, C fails at being sufficiently low level to do this as well as it could. I don't think C will ever be replaced by a higher level language; it will be replaced with a lower level language that is better at incorporating static analysis of particular usage patterns into the language. To put it another way: C makes you do a bunch of "extra" work compared to other languages without really helping you do that work; other popular languages I know of try to not make you do that "extra" work, which is often a good thing but not always. A true replacement for C will need to still make you do all the "extra" work but help you make sure that work is correct.
E.g. memory allocation: it is not that manual memory allocation and deallocation is necessarily unreliable, but that there is no single way to make it reliable that works well for all programs. But there is also no way to do automatic memory management that works well for all programs.
I would never recommend C++, but my sense is that current popularity of C++ might be connected to templates which are flexible and powerful in a different way than C or any other language I know of.
(also C was designed for "minicomputers" not mainframes, so not really all that different from its modern usage)
I googled that phrase, and came up empty handed. Could you give an example?
The modification made by the team is referenced in John's blog post: "Their insight is that we might want to consider byte-swap operations to be sources of tainted data".
As Andy said (and quoted), that's a modification that we need to evaluate overall to look at its impact in terms of false positives (FP). If it doesn't pass our acceptance tests for FP rate, it will probably still be made available under some option... a bit too early to say.
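A small sketch of the heuristic being discussed (hypothetical names): a byte-swap such as ntohs() almost always means "this value just arrived off the wire", so treating its result as tainted catches the Heartbleed-style pattern where a swapped length field later drives a memcpy.

```c
#include <arpa/inet.h>
#include <string.h>

/* Hypothetical sketch: under the byte-swap heuristic, the return
 * value of ntohs() here would be marked tainted, so any unchecked
 * copy sized by it gets flagged. */
unsigned short read_length(const unsigned char *p) {
    unsigned short n;
    memcpy(&n, p, sizeof n);  /* read 16-bit big-endian wire value */
    return ntohs(n);          /* result treated as attacker-controlled */
}
```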