That is, as opposed to using a simple or recursive mutex and forgoing concurrent reads; in our case there was an order-of-magnitude difference in raw lock/unlock performance.
A reader/writer lock can still be useful in read-heavy systems where the lock has to be held for long spans, but for our synchronized collections we switched to std::mutex, which was massively faster.
Have you tried GCD barriers for this? A custom concurrent queue basically gives you a reader/writer lock: dispatch_sync or dispatch_async acts as the reader lock, and dispatch_barrier_sync or dispatch_barrier_async acts as the writer lock. GCD has a heavy emphasis on performance, so I'd be interested to know how the speed of that approach compares.
I recently reimplemented some of the caches within RestKit using an `NSMutableDictionary` guarded by a dispatch queue instead of an `NSCache`/`NSMutableDictionary` + `NSRecursiveLock`. The general idea is that you use a concurrent dispatch queue to provide concurrent read access and then use barriers to obtain an exclusive write lock on the resource. There were significant performance benefits from the change, as I was able to shift the cache updates on a miss into the background. The dispatch queue approach also outperformed `NSCache` significantly under the workloads I was testing -- it appears that some of `NSCache`'s internal bookkeeping on add/remove can be quite costly if you are adding/removing rapidly.
It's also very flexible, as you can use `dispatch_sync` if you need a synchronous fetch of the resource, or use callback blocks to go as async as possible.