A few mm² at 40 nm costs about $20k, and unlike an FPGA you can only "configure" it once. I think the Versal also gives you more useful gates at that size, thanks to its block RAMs and hard multipliers.
The FPGA will have higher static power, since it keeps all of its programmable overhead powered. For the same design, though, it will probably have lower dynamic power, because 40 nm is an old node by now for high-performance chips.
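As a rough first-order sketch, here is the standard CMOS power model. It is not a measurement of either chip, and the symbols are generic: $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{dd}$ the supply voltage, and $f$ the clock frequency.

$$
P_{\text{total}} = P_{\text{static}} + \alpha \, C \, V_{dd}^{2} \, f
$$

The FPGA's programmable routing raises $C$ for a given design. A newer node lowers both $C$ and $V_{dd}$, and because $V_{dd}$ enters squared, the node advantage can outweigh the fabric overhead. $P_{\text{static}}$ is mostly leakage, which grows with the number of powered transistors, and that includes all of the FPGA's unused fabric.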
I'd also expect static power to depend on whether the FPGA is SRAM-based or floating-gate (flash) based. Does Lattice have any parts fabbed on relatively new processes?