I disagree--the bottleneck here is entirely the allocation. The copying is just a memcpy and it's very fast for small structs like this; like I said, it's not the same as a clone() in Rust, which is a deep copy. If you optimized the allocation away entirely (leaving only the copy cost), there wouldn't have been a significant performance problem and this blog would never have been written.
Actually, you'll find that in Rust, Box::new(stuff) will too often build stuff on the stack first and then copy it into the newly allocated memory. For large enough stuff, that copy can be slower than the allocation itself.
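To make that concrete, here is a rough sketch of the two patterns. The size N and both function names are made up for illustration, and whether the stack temporary in the first version actually survives depends on the compiler version and optimization level, so treat this as an illustration of the concern rather than a benchmark.

```rust
// Only needed on editions before 2021, where TryInto is not in the prelude.
use std::convert::TryInto;

// Hypothetical payload size, kept small so it fits comfortably on a default
// thread stack even in an unoptimized build.
const N: usize = 64 * 1024;

// May build the array on the stack first and then copy it into the heap
// allocation; the optimizer is allowed, but not required, to elide the copy.
fn boxed_via_stack() -> Box<[u8; N]> {
    Box::new([0u8; N])
}

// Builds the zeroed buffer on the heap through Vec, then converts the boxed
// slice into a boxed fixed-size array without an N-byte stack temporary.
fn boxed_via_heap() -> Box<[u8; N]> {
    let slice: Box<[u8]> = vec![0u8; N].into_boxed_slice();
    slice.try_into().expect("length is N by construction")
}

fn main() {
    let a = boxed_via_stack();
    let b = boxed_via_heap();
    // Both approaches end up with the same contents; only the path differs.
    assert_eq!(a.len(), b.len());
    assert!(a.iter().zip(b.iter()).all(|(x, y)| x == y));
    println!("both boxes hold {} bytes", a.len());
}
```

With a much larger N, the first version can overflow the stack in a debug build, which is the same problem in its most visible form.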