A convnet-optimized chip could clearly be much faster and more power efficient than a convnet running on a CPU or GPU. The move from CPU to GPU already brought a 10x speedup, but GPUs are hardly ideal for running convnets. For one thing, they carry a lot of graphics-specific hardware that's useless for convnets and could simply be deleted in a convnet chip. For another, GPUs are much more flexible than convnets require: the main operation you need to perform is convolution, and fixed-function convolution units would be much more power and area efficient than general-purpose GPU shader cores. Finally, there's no reason to believe that 32-bit IEEE 754 floating point is the best power/precision tradeoff for convnets. I'm willing to bet that you could go much lower. You could even experiment with approximate arithmetic; results correctly rounded to within 0.5 ULP are probably not necessary.
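To make the precision point a bit more concrete, here is a minimal sketch in plain Python that runs the same 1-D convolution once in ordinary floating point and once with inputs and weights squeezed into narrow signed integers. The symmetric fixed-point scheme, the helper names (`conv1d`, `quantize`, `conv1d_quantized`, `max_abs_error`), and the toy signal and kernel are all assumptions made up for illustration; the paragraph above does not specify any particular number format, and this says nothing about how a real network's accuracy would hold up.

```python
import math


def conv1d(signal, kernel):
    """'Valid' 1-D sliding dot product (cross-correlation, as in most convnet code)."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]


def quantize(values, bits):
    """Map floats to signed integers of the given width using one shared scale.

    Hypothetical symmetric scheme: the largest magnitude maps to the largest
    representable integer, everything else is rounded to the nearest step.
    """
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = peak / qmax
    return [round(v / scale) for v in values], scale


def conv1d_quantized(signal, kernel, bits):
    """Convolve using integer multiply-accumulate, then rescale to floats."""
    q_signal, s_scale = quantize(signal, bits)
    q_kernel, k_scale = quantize(kernel, bits)
    acc = conv1d(q_signal, q_kernel)  # exact integer arithmetic
    return [a * s_scale * k_scale for a in acc]


def max_abs_error(signal, kernel, bits):
    """Largest deviation of the quantized result from the float reference."""
    reference = conv1d(signal, kernel)
    approx = conv1d_quantized(signal, kernel, bits)
    return max(abs(r - a) for r, a in zip(reference, approx))


if __name__ == "__main__":
    # Toy data, chosen arbitrarily for the demonstration.
    signal = [math.sin(0.3 * i) for i in range(64)]
    kernel = [0.25, 0.5, 0.25]
    for bits in (16, 8, 4):
        print(bits, "bits -> max abs error", max_abs_error(signal, kernel, bits))
```

Running the script prints the worst-case error at each bit width, which gives a feel for how quickly the error grows as bits are removed. Whether a given error level is tolerable is exactly the kind of thing you would have to test on a real convnet.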