Failing != flaking. If your tests touch any source of nondeterminism (randomized seed data, time-based constraints, etc.), you're going to find the occasional test that fails and then passes on the rebuild with no code change.
If something is consistently failing, I would assume this tool does not disable it.
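To make the distinction concrete, here's a rough sketch of how a rerun-based check could tell the two apart. This is only my illustration, not how the tool in question works; `classify`, `always_fails`, and `make_fails_once` are made-up names, and I'm assuming a test is just a callable that raises `AssertionError` on failure.

```python
from typing import Callable


def classify(test: Callable[[], None], reruns: int = 3) -> str:
    """Run `test` up to 1 + reruns times and label it (hypothetical helper)."""
    failures = 0
    for _ in range(1 + reruns):
        try:
            test()
        except AssertionError:
            failures += 1
            continue
        # It passed: clean if this was the first attempt,
        # flaky if it only passed after failing earlier.
        return "flaky" if failures else "passing"
    # Never passed in any attempt: a real, consistent failure.
    return "failing"


def always_fails() -> None:
    # Deterministically broken; no number of rebuilds will fix it.
    assert 1 + 1 == 3


def make_fails_once() -> Callable[[], None]:
    # Stand-in for a timing or seed-dependent test:
    # fails on the first call, passes on every call after that.
    calls = {"n": 0}

    def test() -> None:
        calls["n"] += 1
        assert calls["n"] > 1

    return test


print(classify(always_fails))       # prints "failing"
print(classify(make_fails_once()))  # prints "flaky"
```

Only the second kind is a candidate for quarantining; the first should keep breaking the build.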