undefined | Better HN

0 pointslaichzeit04y ago0 comments

I would completely concede that R has better libraries. However, getting stuff like online prediction into production is a real pain when the models are developed in R. And R is single threaded. There is no way to hide that detail.

0 comments

civilized4y ago

R isn't the best for production predictions for sure (it can work though). But it's not hard to translate well-designed R processing pipelines and models into other languages if you must. The problem is that R programmers often don't know how to write good code in any language.

Same issue as Excel, really. Easy to use, so you get a lot of users with very thin engineering skills.

The solution is for production engineers to understand just enough R to set standards for data scientist code that enable reliable translation of the models to the production language. As with JS, you can complain about the yucky parts, or you can accept that it's the best tool for some jobs and make an effort to work around the yucky parts, or use the tools of those who are doing that (e.g. tidyverse and Wickham).

If you want data scientists to produce production-ready results, you have to hold them to the standards of production engineering.

mellavora4y ago

"Same issue as Excel, really. Easy to use, so you get a lot of users with very thin engineering skills."

Huh?

While I totally agree with your quote, I'd think it applied a lot more to python than to R. Especially given that python seems to be the dominant "first language for people to learn when they get into programming" because it is "easy".

civilized4y ago

The proportion in R is higher because the community of software engineers working in R is a lot smaller. R coders are overwhelmingly data analysts, while Python coders have more diverse roles. People who use R are also much more likely to have learned R, and only R, from their university courses towards a data science-related degree, especially if that degree is in statistics.

dragonwriter4y ago

R is a language people use when they get into statistics, not even thinking specifically of programming.

deng4y ago

> And R is single threaded. There is no way to hide that detail.

Python isn't much better in this regard, thanks to the GIL.

What I actually found most baffling when I delved into R is the fact that it doesn't support 64bit integers (lack of proper native UTF-8 support coming a close second).

laichzeit0OP4y ago

> Python isn't much better in this regard, thanks to the GIL.

Take some standard ML model built with Caret or LME4 and try serve predictions with Plumber in R. It’s significantly more painful than using sklearn + FastAPI. You either need to use future::promise (which still sucks because it’s forking new R runtimes) or forgo this and go K8s or something similar.

I don’t get the love for RStudio either. It crashes frequently for me, or locks up randomly. The debugging experience is abysmal compared to PyCharm. Getting reproducible R builds are a pain, slightly alleviated by Renv. But not really if you want separate dependencies for dev and production.

Python and R tooling are not comparable. You will have serious issues operationalising R. Skills that most statisticians are simply not equipped to deal with, and serious software engineers will hate about R.

civilized4y ago

FWIW I have many years of full-time RStudio dev experience, and while I've definitely had a few hard-to-explain crashes, I'd characterize it as very reliable overall. When problems arise they tend to be due to community-contributed packages, especially packages that call out to C++. (My name is on the bug fix log for some major packages.)

Unintentional and unnecessary creation of huge, memory-hogging objects is a closely related footgun. Packages are often not built with large data in mind and make choices that scale terribly, such as storing multiple copies of the data in the model object, or creating enormous nonsparse matrices to represent the model term structure. It's a legacy of the academic statistics culture R grew out of. Most researchers test their fancy new method on a tiny dataset, write a paper, and call it a day.

No argument about the debugging experience. I find it very slow, especially with large datasets, and try to avoid it. Not much experience with reproducible R builds but I wouldn't be surprised if it was a pain.

j / k navigate · click thread line to collapse

0 comments

civilized4y ago

Same issue as Excel, really. Easy to use, so you get a lot of users with very thin engineering skills.

If you want data scientists to produce production-ready results, you have to hold them to the standards of production engineering.

mellavora4y ago

"Same issue as Excel, really. Easy to use, so you get a lot of users with very thin engineering skills."

Huh?

civilized4y ago

dragonwriter4y ago

R is a language people use when they get into statistics, not even thinking specifically of programming.

deng4y ago

> And R is single threaded. There is no way to hide that detail.

Python isn't much better in this regard, thanks to the GIL.

What I actually found most baffling when I delved into R is the fact that it doesn't support 64bit integers (lack of proper native UTF-8 support coming a close second).

laichzeit0OP4y ago

> Python isn't much better in this regard, thanks to the GIL.

civilized4y ago

j / k navigate · click thread line to collapse