Concurrency and the shared-state nature of torrents are definitely what make it all hard and tricky. I recently rewrote this part of the DHT completely, but judging by your experience, it looks like this won't be the last time.
For block requesting, rqbit has a pretty simple algorithm https://github.com/ikatson/rqbit/blob/main/crates/librqbit/s..., and it hasn't shown up as a bottleneck in benchmarks, thanks to Rust being fast by default, I guess. I admit, though, that I've never looked at how other clients do it; maybe the rqbit algorithm is too naive.