To me, data science is more than understanding statistics: knowing how to scale that work up and out has been just as essential.
If you're a domain scientist, you won't necessarily learn how to write reusable tools that are performant on (or even runnable against) data different from your initial model data. I once worked with a group whose model had grown so unwieldy that their config file was in NetCDF.
I found my niche in doing things that were slightly (or completely) outside the comfort zone of most domain scientists. Many were competent coders themselves, but they had neither the funded time nor the inclination to learn the database, visualization, and networking technologies that became necessary either to share their work with other research groups or to operate on larger datasets.
One project had me take a big model that normally ran twice a day on a 4km grid and help write something that could run it hourly, on a 0.5km grid, over a larger area, and visualize the results. Then we devised tooling to help them visually explore the timeseries as it evolved, sometimes over months.
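To give a rough sense of why that scale-up mattered, here is a back-of-envelope sketch of the data growth implied by those numbers alone. This is my own illustrative arithmetic, not a figure from the project, and it ignores the enlarged domain (its size wasn't stated), which would multiply the total further.

```python
# Back-of-envelope estimate of the output-volume increase when moving a model
# from a 4 km grid run twice daily to a 0.5 km grid run hourly.
# Assumes output volume scales with horizontal cell count and runs per day.

old_res_km, new_res_km = 4.0, 0.5
old_runs_per_day, new_runs_per_day = 2, 24  # twice daily -> hourly

spatial_factor = (old_res_km / new_res_km) ** 2        # 8x finer in each direction -> 64x cells
temporal_factor = new_runs_per_day / old_runs_per_day  # 12x more runs per day
total_factor = spatial_factor * temporal_factor

print(f"{spatial_factor:.0f}x cells * {temporal_factor:.0f}x runs = {total_factor:.0f}x data")
```

Even before accounting for the larger area, the pipeline had to absorb roughly three orders of magnitude more output than the original workflow.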
Designing a pipeline that can handle that is outside the scope of most scientists, even the ones who are good coders.