ah, that make sense, I think another user point out that spawning n goroutine does not actually spawn n physical threads but rather queue n task to m thread in the pool, so if we exhaust m thread, n-m task will be blocked.
Thanks for the explanation
I wonder what is the point where each trade off make sense (ie. what is consider heavy computation vs light computation, it probably is related to OS thread allocation time)