To approach human capabilities you'd want to be running a huge state-of-the-art network trained on large datasets, and I don't think 2.5 TFLOPS will cut it.
I had a look around and this thing is probably more in the right ballpark: https://www.nvidia.com/en-us/autonomous-machines/embedded-sy...
It uses up to 60 W for 270 TFLOPS at full power, and that level of processing power should be enough to at least do decently with something trained on the best datasets there are.
There's a chance much smaller hardware would do if only our software were advanced enough, but it probably isn't. I'm not sure where we really are, hence my original question. You'd need to somehow work out Watts/HumanPerformance.
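To make the comparison concrete, here's a quick back-of-envelope sketch of performance per watt for the two devices being compared. The 270 TFLOPS / 60 W figures are from above; the 10 W envelope for the smaller 2.5 TFLOPS board is just an assumption for illustration, and of course peak TFLOPS/W says nothing by itself about "HumanPerformance":

```python
# Rough performance-per-watt comparison. Numbers for the Jetson-class
# board are from the comment; the small board's ~10 W envelope is assumed.

def perf_per_watt(tflops: float, watts: float) -> float:
    """Peak TFLOPS delivered per watt of power draw."""
    return tflops / watts

small_board = perf_per_watt(tflops=2.5, watts=10)    # assumed ~10 W envelope
jetson_class = perf_per_watt(tflops=270, watts=60)   # figures from the comment

print(f"small board : {small_board:.2f} TFLOPS/W")   # 0.25 TFLOPS/W
print(f"Jetson-class: {jetson_class:.2f} TFLOPS/W")  # 4.50 TFLOPS/W
```

That's roughly an 18x gap in efficiency before you even get to the 100x gap in absolute throughput, which is the real blocker for running large models.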