The latest Go implementation (Go 1.7) has made Go a lot faster. I would argue that it is closer to the speed of executables generated with GCC (gcc/g++) than to that of OpenJDK, the Oracle JVM, Mono, or the .NET compiler for C#.
Go (the language) can be made just as fast as C (the language), for many cases. Go has the advantage of making it much easier to use multiple processors, though.
Go 1.7 performs nowhere near the set of optimizations that GCC and LLVM do. GCC/LLVM have a huge number of algebraic simplifications (InstCombine), aggressive alias analysis, memory dependence analysis, instruction scheduling, an optimized instruction selector, a highly tuned register allocator with stuff like rematerialization, SCCP, etc. etc. It will take years and years for Golang to come close.
> Go (the language) can be made just as fast as C (the language), for many cases.
No, it can't. The M:N scheduling model will always have some overhead relative to 1:1 if you don't need the performance profile of C10K-style servers. The dynamic semantics of "defer" is an unavoidable performance tax over RAII. Unwinding is mandated by the language, inhibiting some optimizations. There is little control over allocation: language constructs allocate in ways that are not immediately obvious. The fact that interfaces result in huge numbers of virtual calls results in a good amount of overhead that (unlike Java) Go can't even eliminate with inline caching, because it's AOT compiled. This is just off the top of my head.
> Go has the advantage of making it much easier to use multiple processors, though.
Not really. Go's parallelism primitives are just as low-level as those of C. The "one size fits all" scheduling algorithm is a poor fit for getting the most performance out of multicore. The lack of generics is a real problem: it prevents you from using optimized concurrent data structures without paying the tax of interface{} or going through code generation hoops.
In any case, the lack of SIMD basically kills Go's applicability in these domains.
Conversely if you're using C/C++ through a ton of heap/virtual pointers then you're losing a lot of the value the language brings and should be using something higher level.
Because it exposes pointers as a first-class concept, you also have good control of how data is laid out in memory (=> locality).
It's not like Python or Java where everything is a pointer and gets spread out all over memory.
Escape analysis, like any such analysis, gets much more difficult in the presence of higher-order control flow. Currently the Go compilers punt on higher order control flow analysis. And Go uses higher-order control flow in spades, due to its heavy reliance on interfaces.
The end result is that lots of stuff is heap allocated.
> Java where everything is a pointer and gets spread out all over memory.
That's not true for Java. Its generational garbage collector performs bump allocation in the nursery, yielding tightly packed objects with excellent cache behavior. Allocation in HotSpot is like 3-5 instructions (really!)
I think the HotSpot approach makes the most sense: instead of trying to carve out special cases that fall down regularly, focus on making heap allocations fast, as you'll need to make them fast anyway. After that, add things like escape analysis (which HotSpot has as well).
I'm also not a huge fan of a compiler "automatically" performing escape analysis. It makes it very easy for a single change to cause cascading perf problems that are hard to catch.
After the previous implementation made Go a lot slower.