Malicious compliance is in fact a useful escape hatch here: a maintainer can release a 1.0 where the entire API is "test 1 + 1 == 2". That, too, is useful information.
But the package registry checks all API tests against the last version and rejects the registration if they change at all. That rule can be relaxed for non-semantic parts of a test, like a description of what the test means, but none of the code is allowed to change. It would be better if the comparison were based on the AST, so that whitespace tweaks don't trigger a build failure; that's practical to achieve in most languages.
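As a rough illustration of the AST-based comparison, here is a minimal sketch in Python. It assumes the API tests are themselves Python source, and the function names (`normalized`, `api_test_unchanged`) are made up for the example; a real registry would do the equivalent in whatever language the package uses.

```python
import ast


def normalized(source: str) -> str:
    """Dump the parsed AST as a string.

    ast.dump leaves out line and column positions by default, and the
    parser discards comments and whitespace, so purely cosmetic edits
    produce an identical dump.
    """
    return ast.dump(ast.parse(source))


def api_test_unchanged(old_source: str, new_source: str) -> bool:
    """True if the two versions of an API test have the same AST."""
    return normalized(old_source) == normalized(new_source)


# Hypothetical test bodies, used only to exercise the comparison.
old = "def test_add():\n    assert 1 + 1 == 2\n"
reformatted = "def test_add():\n\n    assert 1+1  ==  2  # still two\n"
changed = "def test_add():\n    assert 1 + 1 == 3\n"

print(api_test_unchanged(old, reformatted))  # cosmetic edit only
print(api_test_unchanged(old, changed))      # the asserted value differs
```

Note that in this sketch a docstring is part of the AST, so editing one would count as a change. Allowing the description to change would mean storing it outside the compared code, or stripping it before the comparison.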
Refactoring an API test isn't worth losing the guarantees a system like this provides, and only the API tests come with any restrictions; maintainers may do as they please with the rest of the test suite. An improved API test has to be provided as a new test. Part of the proposal is that users can refer to API tests in their own code, as a way to determine whether tests they rely on break in a major release. So the tests need unique names, which means they can be rearranged in the file, since a test is identified by its name rather than its position. It also means that if there's a typo in the name, or the name sucks, well, you're stuck with it until the next major release, and even then it lives on in infamy, forever, in the obsolete-API portion of the test suite. Not ideal, but it can't be avoided.
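To make the name-based lookup concrete, here is a small Python sketch of how a user's tooling might check the tests it relies on against a new major release. Everything in it is hypothetical: the test names, the idea that a release exposes its API tests as a plain set of names, and the `broken_api_tests` helper are assumptions for illustration, not part of any existing registry.

```python
def broken_api_tests(relied_on, new_api_tests):
    """Return the relied-on test names the new release no longer guarantees."""
    return sorted(set(relied_on) - set(new_api_tests))


# Hypothetical 2.0 release: one test was retired and replaced under a
# corrected name. The misspelled name survives in the obsolete section.
v2_api_tests = {"parse_roundtrip", "parse_empty_input"}
v2_obsolete_tests = {"parse_emtpy_input"}

# Names a downstream user declared a dependency on while using 1.x.
relied_on = {"parse_roundtrip", "parse_emtpy_input"}

broken = broken_api_tests(relied_on, v2_api_tests)
print(broken)
```

Because the check depends only on names, the order of tests in the file has no effect on the result.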