The problem is getting data to and from the FPGA, which imposes unavoidable latency. Making that transfer fast can't be done cheaply, because it requires too much silicon. For this reason, aside from simulation-type tasks, the tasks best suited to FPGAs are streaming tasks: once you've started streaming data through the FPGA, the transfer latency is paid once up front rather than on every item, so you don't need to worry about it too much.
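
To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. The model and the figures in it are assumptions for illustration only (a fixed 1 µs transfer latency and 1 ns of processing per item, not measurements of any real board or bus), and the function names are made up for this example. It compares paying the latency on every item against paying it once for a whole stream.

```python
# Toy cost model; all numbers are hypothetical, not measured.

def round_trip_time(n_items, latency_s, per_item_s):
    """Total time if every item is sent and returned separately,
    so the transfer latency is paid once per item."""
    return n_items * (latency_s + per_item_s)


def streamed_time(n_items, latency_s, per_item_s):
    """Total time if items are streamed back to back,
    so the transfer latency is paid only once, at the start."""
    return latency_s + n_items * per_item_s


def latency_fraction(n_items, latency_s, per_item_s):
    """Share of the streamed total that is spent on the initial latency."""
    return latency_s / streamed_time(n_items, latency_s, per_item_s)


if __name__ == "__main__":
    LATENCY_S = 1e-6    # assumed one-off transfer latency
    PER_ITEM_S = 1e-9   # assumed processing time per item once streaming

    for n in (1, 1_000, 1_000_000):
        frac = latency_fraction(n, LATENCY_S, PER_ITEM_S)
        slowdown = (round_trip_time(n, LATENCY_S, PER_ITEM_S)
                    / streamed_time(n, LATENCY_S, PER_ITEM_S))
        print(f"n={n:>9}: latency is {frac:.2%} of streamed time, "
              f"per-item round trips are {slowdown:.0f}x slower")
```

With these assumed numbers the latency dominates a single-item transfer but becomes a negligible share of a long stream, which is the sense in which streaming workloads hide it.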