I need to aggregate counters quickly across multiple goroutines. `atomic.AddUint64(&central, threadLocal)` works quite well for that: better than pushing to a channel, and better than deferring the aggregation until a sync point.
Any alignment needs I have derive solely from the requirements of the atomic package.
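For concreteness, here is a minimal sketch of the pattern I mean. The `counters` struct, its `name` field, and the `aggregate` function are hypothetical names made up for illustration. Each goroutine counts into its own local variable and publishes once with `atomic.AddUint64`. As far as I understand the `sync/atomic` documentation, on 32-bit platforms the caller has to arrange 64-bit alignment, and only the first word of an allocated struct, array, or slice (or of a variable) can be relied on to be aligned, which is why the `uint64` is the first field here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counters is a hypothetical holder for the shared total.
// The uint64 is deliberately the first field: on 32-bit platforms,
// sync/atomic only guarantees 64-bit alignment for the first word
// of an allocated struct.
type counters struct {
	central uint64 // keep first so it stays 64-bit aligned
	name    string
}

// aggregate starts `workers` goroutines. Each one counts to perWorker in a
// goroutine-local variable, then folds its result into the shared total
// with a single atomic add.
func aggregate(workers, perWorker int) uint64 {
	c := &counters{name: "demo"}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var threadLocal uint64
			for j := 0; j < perWorker; j++ {
				threadLocal++ // no synchronization needed here
			}
			atomic.AddUint64(&c.central, threadLocal)
		}()
	}
	wg.Wait()
	return atomic.LoadUint64(&c.central)
}

func main() {
	fmt.Println(aggregate(8, 1000)) // prints 8000
}
```

If Go 1.19 or later is an option, the `atomic.Uint64` type should sidestep the field-ordering concern, since it is documented as being 64-bit aligned automatically, even on 32-bit platforms.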