That's a hardware management question. The optimized binary used in my shell script still runs orders of magnitude faster and cheaper if you orchestrate 100 machines for it than any Hadoop, Spark, Beam, Snowflake, Redshift, Bigquery or what have you.
That's not to say I'd do everything in shell. Most stuff fits well into SQl, but when it comes to optimizing processing over TB or PB scale, you won't beat shell+massive hw orchestration.