An alternate formulation of this would involve the second person above arriving at a new job, observing that they're using Spark on an expensive cluster to regularly perform a computation, and noticing that actually they could do the whole calculation on a single node using simpler tools.
Double-digit-strong data science team: speechless
Team engineers: we need a high-speed design with advanced adders and someone who can do the difficult static timing analysis.
Expert: No, you need someone who actually understands VLSI design and can interleave time and space in an algorithm. At that point, you can basically get away with a single step up from stupid-simple ripple-carry adders. And if you use non-overlapping clock generators, you don't even have to do static timing analysis. The biggest wins are in architecture, not implementation.
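For anyone who hasn't met the "stupid-simple" baseline, here's a minimal bit-level Python model of a ripple-carry adder. It's a behavioral sketch only, not hardware, and the names `full_adder` and `ripple_carry_add` are made up for illustration. It shows why the design is so simple: each full adder just waits on its neighbour's carry, so the worst-case carry path grows linearly with the word width. The "longest carry run" it reports is a rough stand-in for how far a carry actually rippled, not a real timing number.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder built from XOR/AND/OR: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int) -> tuple[int, int, int]:
    """Add two unsigned `width`-bit integers through a chain of full adders.

    Returns (sum mod 2**width, final carry_out, longest_carry_run), where
    longest_carry_run is the most consecutive stages that produced a carry:
    a rough proxy (assumption) for how far a carry had to ripple.
    """
    carry = 0
    total = 0
    run = 0
    longest = 0
    for i in range(width):
        # Each stage depends on the previous stage's carry: this serial
        # dependency is why worst-case delay grows with `width`.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return total, carry, longest


if __name__ == "__main__":
    print(ripple_carry_add(5, 3, 4))     # (8, 0, 3)
    print(ripple_carry_add(0xFF, 1, 8))  # (0, 1, 8): the carry ripples through all 8 stages
```

Fancier adders such as carry-lookahead or carry-select shorten that serial chain by spending extra logic. The expert's point is that a good architecture can relax the timing budget enough that you barely need them.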
Yeah, 33 MHz design in 125 nm means VLSI like it's 1999.