In our own workloads, writers are always going after the same pages in their index updates, which inevitably leads to deadlocks in BerkeleyDB. As a result, we get much higher throughput with fully serialized writers than with "concurrent" writers. A microbenchmark might show greater concurrency on simple write tasks, but in a real live system with an elaborate schema, there's no payoff for us.
As always, you have to profile your workload and see where the delays and bottlenecks really are. Taking a single mutex instead of continuously locking/unlocking all over the place was a win for us.