1. binary-trees; see, for example, the Rust version, which uses a bump allocator.
2. It doesn't matter whether it's exactly one language.
> You are allowed an opinion about what is or is not compelling.
It's not a matter of opinion. The definitional purpose of benchmarks is to indicate something about reality; if you contrive rules that cause the benchmarks to deviate from reality, they lose their utility as benchmarks. I've demonstrated that the rules are contrived (i.e., they prohibit real-world, idiomatic optimizations), so I think we can say as a matter of fact that these benchmarks aren't useful.
Of course, no one can force anyone else to see reason (but I don't have any interest in talking with unreasonable people).