> I believe the only level of test coverage that's worth it is 100% of the lines you want to be covered.
This statement I vehemently disagree with. It seems like the kind of thing someone would hit you with as an excuse for why testing isn't useful at all because it's not perfect. I have found in my experience that simple end-to-end tests give me the best bang for my buck, and even a test suite that covers 80% of the code is pretty good.
To paraphrase from the Google code reviewer guide: No code is perfect, just try to make it better than it was.
100% test coverage doesn't tell you if you have race conditions, security problems, or networking issues. 100% test coverage doesn't matter much in distributed systems, where you can individually test each component to 100% but still have the system not work as a whole. 100% coverage doesn't mean you're checking that all your returned results are verified and what they should be.
> The problem with having a goal that is not 100% is that it leaves room for interpretation and negotiation.
By handing out edicts like these, you're taking away tools like negotiation and common sense and replacing them with blanket rules. Time is a finite resource and test coverage is only one small part of testing. Going from 95% to 100% might take more time and provide less value than other kinds of testing and other concepts (like UX, or market fit, where you test that what you're building is usable and useful).
Just because you answer all the questions right on the test doesn't mean you have perfect knowledge of the situation. Thinking so is hubris and leads to errors and defects.
By excluding lines and saying only make sure you cover what you're testing you're intentionally making blind spots. Just let the coverage number be what it is. Look at your coverage reports. Check the coverage of key components.
> I think it is a perfectly fine decision to exclude some code from coverage if the tests are too expensive to write (because of the required setup for instance).
No, no, no, no, no x a million. This is the fallacy that if you unit test each component, then when you put it together it's perfect because each part is perfect. You still have to do integration tests and complicated setup tests. If a test takes a long time, try to make it shorter, don't try to avoid testing it. Anything you avoid testing is where bugs will nest and code will churn.
To sum up, most of the important, complicated, and hairy bugs are not found by unit testing; it is the simple bugs that are found by unit testing. These are important to get out of the way, because a thousand small cuts can still kill a system, but in no way does 100% coverage with all tests passing mean anything other than 100% coverage and tests passing.
Thanks for the detailed answer! I have answered some of your points and some of the points made in the comments below in the article.
The article might have been perceived as more dogmatic than it was meant to be. It is actually, I believe, a pretty pragmatic position. I tried to clarify the wording here and there; let me know if you still disagree with the point I'm making.
### 100% test coverage does not mean you're testing everything right
Absolutely - this point is explicitly stated in this article. I even give an example situation showing how just checking the test coverage leads to missing an important test.
100% test coverage does not make your code invulnerable, and it should evidently not be your only metric. This article is only about the test coverage metric.
### A test suite that covers 80% is pretty good
Absolutely. It is a good number and a good goal for any codebase.
However, what about that remaining 20%? Why is it not tested? Will it be clear in 2 months why it was not tested? In 6 months? In a year? While it may make perfect sense not to test it, you should be explicit about that decision and keep the reason in the code.
If you don't keep the test coverage _metric_ at 100%, then you leave it up to the code reviewer to challenge your test coverage assumption.
### 100% is a blanket rule that leaves no room for negotiation
Once again, the goal is not to cover 100% of the lines of code - it would be almost impossible. Thanks to `no cover` markers, you can still decide to exclude code from coverage. It actually makes this negotiation explicit in the code, as opposed to implicit during the code review phase.
Consider the example below:
```python
def toast(bread: str) -> str:
    if toast == "brioche":
        raise ValueError("...")
```
Let's say you're fixing a bug in this code, and you find out that the `if`
branch is not covered. You're left to wonder why. The developer did not have
enough time? They decided it was too trivial to need a test? With an explicit marker, the intent is clear and gives you insight into what is considered appropriate coverage for _this_ codebase:
```python
def toast(bread: str) -> str:
    if toast == "brioche":  # pragma: no cover, trivial error case
        raise ValueError("...")
```
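For reference, a sketch of how this is typically enforced with coverage.py: the `fail_under` option makes any uncovered, unmarked line fail the run, while lines matching `exclude_lines` patterns (such as the `pragma: no cover` marker) are excluded up front:

```
# .coveragerc (coverage.py configuration)
[report]
fail_under = 100          # any unmarked, uncovered line fails the run
exclude_lines =
    pragma: no cover      # re-declare the default exclusion marker
```

With this in place, "keeping the metric at 100%" is a CI gate rather than a reviewer's judgment call.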
### It is not feasible in an old codebase

Right, that's probably the case. However, this is no different from proper testing, monitoring, and logging practices. Decide if it's important, then start with something attainable that brings you closer to the goal, and iterate.
Also, if it is not a desirable goal for the codebase, then for sure don't monitor test coverage!
### Enforcing 100% test coverage leads to bad tests
It bears repeating: this is not about testing 100% of the lines; this is about keeping the code coverage metric at 100%. I am not sure how that would lead to bad tests.
Putting too much focus on one particular metric might lead to gaming behavior. That does not mean that no metric should be enforced. I believe there isn't enough material and training about what it takes to write good tests; working on that is beneficial regardless of whether you enforce test coverage. Also, the more you talk about test coverage, the more you iterate and learn about what it takes to write good tests in your language and codebase.
What is certain is that it is straightforward to write bad tests. It takes a lot of skill, experience, and hygiene to write great, maintainable tests. That's a topic for another article.
### 100% coverage only matters for languages without a type checker
No. Sure, a type checker might catch some classes of bugs that would require a test in a language without type checking, but you still need to test correctness (among others).
### You're creating blind spots
Unless you're reviewing your test coverage report after every single commit, leaving explicit markers in the code and keeping the metric at 100% is actually a much safer practice.
When working in code that has been excluded, you'll immediately see the `no cover` marker, perhaps with a comment explaining why the code is excluded. This lets you reconsider the coverage decision.
Any regression in coverage will break the tests.
### You should not exclude code from coverage because of setup cost
This article is not about end-to-end vs. unit testing. I have provided some examples of code that I sometimes exclude from testing, but your mileage may vary.
Test coverage clearly matters even if it's not 100%, as anyone who has ever worked on an older codebase knows.
As long as you're not causing coverage regressions, it is perfectly acceptable to leave untested code that "should" be tested, but hasn't been changed in years.
Test it when you change it; don't test for the sake of achieving a coverage benchmark.
When writing tests, one should worry about the other registers too ...
This concern has been nagging me in the back of my mind, but I hadn't convinced myself that code-coverage was insufficient (in practice) until you framed it this way.
Now all kinds of supporting examples come to mind:
- divide-by-0 errors
- floating-point exceptions
- numeric underflow / overflow (both silent and signaled)
The alternative code paths exercised by these situations are generally outside the purview of code-coverage tools, because they reside in the silicon / firmware, process' default signal handlers, etc.
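A tiny Python illustration of that blind spot: the function below reaches 100% line coverage from a single happy-path test, yet still has an untested crash path that no line-coverage report will flag (`mean` is a made-up example, not from the thread):

```python
def mean(values):
    # One test such as mean([2, 4]) touches every line here,
    # so the coverage report proudly shows 100%...
    return sum(values) / len(values)

# ...but the "covered" line still blows up on an input nobody tested:
try:
    mean([])
except ZeroDivisionError:
    print("100% line coverage, and still a crash path")
```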
Test coverage matters even if it is not 100%, especially if it covers the most important use cases. Cost and time always need to be considered; 100% coverage may not be worth it. Coverage based on lines of code is also an arbitrary metric in some sense - the author himself has examples of that. From a business POV, use-case coverage is more important.
It resulted in a mug being made in the same style as our company mugs but with the words "Fuck Coverage" instead of the company name.
Eyebrows were raised when it was mistaken for a 'real' mug and taken into a board meeting by a manager, however.
At a previous shop, for a while we had a simple test that ensured a user could log in, until we created more coverage. We didn't test every aspect of the code. This simple sanity check saved us once or twice.
Should we have not written this test just because we couldn't test every conceivable piece of code?
Don't let perfect be the enemy of good...
Are you covering 100% of code interactions? How much of your integration logic are you mocking? Are you covering 100% of HTML and interface logic? How much code is not in your code? What about packages and imports? It's incredibly arbitrary, for no good reason except to slow development of a product that may depend on fast development for survival, in trade for testing bugs that honestly may never happen. To be clear, testing is good, but you write tests to serve the code and product, not the other way around.
Your team ends up spending half their time writing painful, depressing tests that are bad afterthoughts, generally with tons of mocks and little real meaning.
That said, if you have a single test, that's better than no tests. And 90% is better than 80%. Yes, 100% is ideal, but if you let perfect be the enemy of good, then you waste time when low-hanging fruit breaks.
```python
raise ValueError(' '.join('meaningful message with:', parameter)
```
Is a stupid mistake that I still make all too often inside an exception handler.

While that statement is itself somewhat questionable, it's a little better and IMO reflects the purpose of unit tests better. Usually, if I'm finding it hard to write or set up a test, that's a signal that something's off. Also, UTs are absolutely lovely when you're refactoring code, as long as the code tests invariants and publicly visible behavior instead of implementation. So even if the part you're refactoring has, say, 60% coverage, you'd still be much more confident making changes.
Code coverage under anything other than unit tests can be very misleading - for ex I've seen places where system & integration tests are run on instrumented code (and little to no unit tests) and since a huge swathe of code will be hit (mostly happy path), there's a false sense of code quality and safety which is probably more dangerous than no tests.
Overall, "When a measure becomes a target, it ceases to be a good measure." applies most often to test coverage.
E.g. complex math functions: it would be very easy for them to be not quite right, and you would never notice without tests - and they will probably never change.
An area that would be stupid to test (with automated tests) for example, would be a UI menu.
* The test (i.e. replicating a user pressing buttons via code) is probably gonna be more complex and less maintainable than the code it's testing.
* If there is a bug it will be obvious in human acceptance testing.
* UI menus are gonna change drastically during development.
Anyway, I could see a misguided requirement for 100% code coverage massively hampering a project and potentially killing it.
You can even do model based testing.
For example, you're testing a word processor. You know that your initial state is "no document". Then you click the open file button, press some other buttons, and, knowing which button each one is, you can predict what change to expect.
That being said, usually you can "press" the buttons virtually, for example sending window messages or calling the same functions as the buttons wire up to, rather than knowing the pixel layout.
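A stdlib-only sketch of that idea: drive a hypothetical `Editor` class by "pressing" buttons via method calls, while a parallel model tracks what the state should be after each press (the class and its methods are invented for illustration):

```python
import random

class Editor:
    """Hypothetical stand-in for the application under test."""
    def __init__(self):
        self.doc = None                # state: no document open yet
    def press_open(self):
        self.doc = ""                  # "click" the open button virtually
    def press_key(self, ch):
        if self.doc is not None:       # typing needs an open document
            self.doc += ch

app = Editor()
expected = None                        # the model's idea of the state
random.seed(0)                         # deterministic button sequence
for _ in range(200):
    if random.random() < 0.2:
        app.press_open()
        expected = ""
    else:
        ch = random.choice("abc ")
        app.press_key(ch)
        if expected is not None:
            expected += ch
    # After every press, application and model must agree.
    assert app.doc == expected
```

Libraries like hypothesis can generate the button sequences for you, but the core loop is just this: press, update the model, compare.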
People have been doing these things for years.
I know UI stuff breaks a lot, but my point is that when it does break, it's obvious. Bugs should be easily caught in human acceptance testing.
Also an automated test UI is never going to be the same as a human test, a lot of UI bugs can result from the vagaries of various physical devices touchscreens / gamepads / motion controls etc.
Also, anything that tests such high-level functionality is going to be very complicated and touch many parts of your code; it will probably be more complicated and less maintainable than the code it's testing, which defeats the purpose, IMO. Do you want to write tests for your tests?
UI stuff tends to change radically over development, so you'd be more than doubling your work with all the test code you'd have to write.
And it would still never replicate an actual human running the actual game. You absolutely are still going to be doing human acceptance tests, and the closer you try to get automated tests to that, the more complex and less maintainable they get.
Well, with Python, the compiler doesn't do type checking, so 100% coverage (or close) is very important. One of his cases of "what do you not need to check" is error-handling. Yet, having something in production that should raise an exception that gets handled versus having the exception blow up and crash the system is a critical system stability defect. And in fact, in my personal code, since I tend to try to create exception messages that are as specific as possible, I must admit with chagrin that screwing up a ''.join() to create the exception message is one of my most common errors. I at least touch-test my exception code, without exception.
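That `''.join()` slip is worth spelling out, since it's exactly what a touch-test catches: `str.join` takes a single iterable, so passing two arguments raises a `TypeError` at the moment you wanted a helpful `ValueError` (`parameter` is a hypothetical value here):

```python
parameter = "brioche"   # hypothetical value, for illustration only

# The broken pattern: join() called with two arguments instead of one
# iterable. The TypeError fires while *building* the message, so the
# intended ValueError is never even raised.
try:
    raise ValueError(' '.join('meaningful message with:', parameter))
except TypeError:
    pass   # this is the surprise a touch-test would have caught

# Two working alternatives:
msg_join = ' '.join(['meaningful message with:', parameter])
msg_fstr = f"meaningful message with: {parameter}"
assert msg_join == msg_fstr
```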
And then... line coverage? Oh my, that is just the beginning. I pretty much always use the hypothesis package so that everything gets exercised with a large variety of inputs - inputs that I didn't have to think up, and which avoid my pre-existing biases about what might get thrown at the function.
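hypothesis generates those inputs automatically; as a rough stdlib-only sketch of the same property-based idea, the loop below checks invariants over many random inputs instead of one hand-picked example (`clamp` is an invented function, not from the thread):

```python
import random

def clamp(x, lo, hi):
    # Toy function under test: a single example-based test gives it
    # 100% line coverage without probing any edge cases.
    return max(lo, min(x, hi))

random.seed(1)
for _ in range(1000):
    lo = random.randint(-100, 100)
    hi = random.randint(lo, 200)           # guarantee lo <= hi
    x = random.randint(-300, 300)
    result = clamp(x, lo, hi)
    # Invariants that must hold for *every* input:
    assert lo <= result <= hi
    assert result == x or result in (lo, hi)
```

With hypothesis, the loop and the random generation are replaced by `@given(...)` strategies, plus automatic shrinking of failing examples.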
Line coverage doesn't do much to really ensure that you have reached "interesting conditions". I wish Python had a good tool for that kind of instrumentation. Line coverage is really only a "touch-test" and mainly useful as a check-in gate, not as production testing.
Intuitively I agree with you: In my personal experience, the more dynamically typed a language is, the more unpleasant surprises I get at runtime.
Some years ago I did a project in MLTON [0], and I was delighted by how many bugs were caught at compile time that, in other languages, would likely go unnoticed until runtime.
Conversely, perfect or high coverage is not a positive signal per se, but rather just the absence of a negative signal.
Is there a difference between 79.9% and 80%? Maybe not. My team is increasing test coverage for a legacy code base. 60% to 80% DOES matter. And it’s not just the fact that the tests flags regression, but the very process of increasing test coverage uncovers bugs / dead code.
The solution to this dilemma is to make it easy to mark code as "OK not to be tested". It's annoying to have test annotations in code, but I'd expect mostly entire modules will be exempted, so it's not that big a deal. And this way, the coder has to either write tests or make a specific decision not to.
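With coverage.py that decision can indeed live in configuration rather than in annotations: the `omit` option under `[run]` exempts whole files or directories (the paths below are hypothetical):

```
# .coveragerc -- exempting entire modules instead of annotating lines
[run]
omit =
    */migrations/*        # generated code, not worth covering
    myproject/legacy/*    # explicit decision: untested legacy module
```

The exemption is then a reviewable, version-controlled decision rather than a silent gap in the report.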
This is ridiculous. He writes unreachable code and says he doesn't test it because it's "too simple". No, you're not testing it because you can't test it, because it's code that will NEVER run. Instead of deleting the pointless code he marks it as not needing test coverage.
I'm serious. I think we're being trolled here everyone.
Some lines of code should be tested twice to ensure they're correct.
Translation: "100% coverage" is a vanity metric.

The author makes a good point - everything should either be covered by tests or explicitly marked as excluded. Having a gray area of code that is included but not covered by tests is a recipe for missing something. That much seems obvious.
The author argues about the horrors of going from 80% to 79.9%, but I don't see how that is any different from the suggestion to explicitly exclude code. The excluded code is still in the gray area of code that is included but not covered by tests. What happens when someone starts making breaking changes to excluded code?