We apply the Occam's razor here. For a given problem, a smaller program that solves the problem is a preferred solution. It is not a very good metric, and it is fine to disagree with it. But getting a "best" solution, requires one to define what "best" is, and there is no such agreed definition for code.
> The majority of "useful" tests either are stand-ins for what the language should have given you anyways (proper type checking), or they find and fix known bugs, e.g. the intersection of three features.
Find and fix known bugs is one aspect of it. You are not accounting the entire philosophy of test driven development, where one specifies functionality as tests upfront, and later writes code to pass those tests. Anyhow, we have found large projects like Discourse [1], Gitlab [2], Diaspora [3] have tests that specify code behavior well enough so that RbSyn can synthesize them. Very few tests are type-checking, bug fixing and intersection of 3 features.
[1]: https://github.com/discourse/discourse [2]: https://github.com/gitlabhq/gitlabhq [3]: https://github.com/diaspora/diaspora