It just worries me when I sometimes see those same Jupyter notebooks running in production, crunching hundreds of terabytes of data. Maybe I’m wrong, but I don’t get the impression that everyone realizes exactly how wasteful that is. I guess AWS credits are easy to come by.
One thing Google did well back in the day was reporting resource costs in SWE-hours, the idea being that you could see whether it was worth going back and rewriting something. If it cost 100 SWE-hours to run, and it only took you a day to cut that in half, you should do it.
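To make that arithmetic concrete, here is a rough sketch. The function name and the idea of a recurring "billing period" are my own assumptions for illustration, not how Google's reporting actually worked; I'm also assuming one day of work is 8 engineer-hours.

```python
def periods_to_break_even(run_cost_swe_hours: float,
                          savings_fraction: float,
                          rewrite_hours: float) -> float:
    """Return how many billing periods pass before a rewrite pays for itself.

    run_cost_swe_hours: running cost per period, expressed in SWE-hours
                        (the length of a period is an assumption here).
    savings_fraction:   share of that cost the rewrite removes (0.5 = half).
    rewrite_hours:      engineer-hours spent on the rewrite.
    """
    saved_per_period = run_cost_swe_hours * savings_fraction
    if saved_per_period <= 0:
        # No savings means the rewrite never pays off.
        return float("inf")
    return rewrite_hours / saved_per_period


# The example from the comment: 100 SWE-hours to run, halved by one
# 8-hour day of work (the 8 hours is an assumption).
break_even = periods_to_break_even(100, 0.5, 8)
print(break_even)
```

Anything well under 1.0 means the rewrite pays for itself within the first period, which is the "you should do it" case.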
A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team. And I really think if they’d had better intuition about Python’s performance under different scenarios, they could have saved a year of effort. This is why I feel it’s worth having frank discussions about trade-offs when it comes to this language.
It’s incredibly useful, but people in the community aren’t clearly told about its limitations. (Especially wrt performance, but also maintainability.)
Sure, but there's a trade-off, no? Go is typically 3x the code of Python, and C++ is easily 10x the complexity.
Back when I stopped coding C++, it had reached the point where one C++ coder might not understand what another was doing because the standard was so large.
> A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team.
You know, I have horror stories about C++ and Java as well. Usually that kind of blame goes to management for not understanding the issues up front. Pretty soon, I'll have a slew of stories about Go misuse as well.
Can you cite a source/example for that? I cannot imagine an optimized C program that doesn't blow Python with NumPy out of the water. Even a poorly written C program is likely to be 2x faster simply because it doesn't have to round-trip operations from C to Python and back.
I found some metrics after 30 seconds of googling.