R, OpenMP, MKL, Disaster (opens in new tab)

(jyotirmoy.net)

117 pointsyomritoyj4y ago42 comments

42 comments

In a previous life, almost a decade ago, I fought very similar fights with OpenMP and MKL using R. It's painful and you need to pay heed to all these small details pointed out in the docs as in OPs case. However, it's worth noting that OpenBLAS is as fast as MKL, at least if you compile it yourself for your system (i would expect that system provided ones with system detection would be as good, but that wasn't always the case back then). I benched this extensively for all my R usecases and for several systems that i cared about back then. So there is usually no need to use MKL in the first place.

microtonal4y ago

OpenBLAS

OpenBLAS is incompatible with application threads. Most Linux distributions provide a multi-threaded OpenBLAS that burns in a fire if you use it in multi-threaded applications. Even though OpenBLAS' performance is great, I'd be careful to give a general recommendation for people to rely on OpenBLAS. Like this MKL example, you have to be aware of its threading issues, read the documentation and compile it with the right flags (in a multi-threaded application: single-threaded, but with locking).

it's worth noting that OpenBLAS is as fast as MKL

This depends highly on the application. E.g. MKL provides batch GEMM, which is used by libraries like PyTorch. So if you use PyTorch for machine learning, performance is still much better with MKL. Of course, that is if you do not have an AMD CPU. If you have an AMD CPU, you have to override Intel CPU detection if you do not want abysmal performance:

https://danieldk.eu/Posts/2020-08-31-MKL-Zen.html

https://www.agner.org/optimize/blog/read.php?i=49

The BLAS/LAPACK ecosystem is a mess. I wish that Intel would just open source MKL and properly support AMD CPUs.

jabl4y ago

> OpenBLAS is incompatible with application threads. Most Linux distributions provide a multi-threaded OpenBLAS that burns in a fire if you use it in multi-threaded applications.

Can you explain what you mean by this? Are you saying there's a correctness issue here? I only recall running into issues with MPI, where you (typically) run one MPI rank (process) per CPU core. Then if you combine that with a multi-threaded BLAS library you'll suddenly have N^2 BLAS threads fighting over the CPU's and performance goes down the drain. The solution to this is, like you say, to use a single-threaded OpenBLAS, or then the OpenMP OpenBLAS and set OMP_NUM_THREADS=1

I guess with threads you'll have the same issue if you launch N cpu-bound threads and all those call BLAS, resulting in the same N^2 issue as you see with MPI.

1 more reply

gnufx4y ago

Debian and Fedora provide serial, OpenMP, and pthreads versions of lilbopenblas. Are you sure OpenBLAS doesn't detect nested OpenMP? I thought it did, though I'd normally use the serial version outside something like R, but if you mix different low-level simple pthreads with high-level OpenMP, you can expect problems. OpenBLAS is fine generally -- competitive with MKL on Intel hardware and infinitely faster on ARM and POWER. For PyTorch, presumably you want libxsmm (which is responsible for MKL's current small matrix performance). On AMD hardware, I don't understand why people avoid AMD's support, which is just a version of BLIS and libflame. (BLIS' OpenMP story seems better than OpenBLAS'.) The linear algebra story on GNU/Linux distributions would be less of a mess without proprietary libraries like MKL. It's fine if you take the Debian approach, in significant experience running heterogeneous HPC systems. Fedora has cocked up policy through not listening to such experience, but you can do the Debian-style thing with the approach of https://loveshack.fedorapeople.org/blas-subversion.html (and see the old R example refuting the MKL story). That's one example of the value of dynamic linking.

1 more reply

gjvc4y ago

> The BLAS/LAPACK ecosystem is a mess. I wish that Intel would just open source MKL and properly support AMD CPUs.

Given that their latest compilers are based on LLVM, that seems like a fair trade between the closed- and open-source worlds.

kergonath4y ago

> OpenBLAS is incompatible with application threads.

I’ve never had any issue when using it in OpenMP codes (either compiling it myself or using the libopenblas_omp.so present in some distros), what do you mean by “burn in a fire”?

vhhn4y ago

> OpenBLAS is incompatible with application threads

R is single-threaded.

jabl4y ago

> i would expect that system provided ones with system detection would be as good, but that wasn't always the case back then

Also in a previous life, I recall running into distro openblas packages that were not compiled with DYNAMIC_ARCH=1 (which enables the openblas runtime cpu target architecture selection, similar to e.g. MKL) but were instead compiled with some lowest common denominator x86_64 arch. I filed some bug(s?), and IIRC this problem has subsequently been fixed.

deng4y ago

> As a good citizen he wanted to file a documentation bug to have this behaviour documented. But R’s bug tracker seems not to be open to the public. So the story has to be recorded here for Google to find.

Huh? The bug tracker is here:

https://bugs.r-project.org/

Yes, for filing a bug you need to request an account because they apparently were overwhelmed with spam, as documented here:

https://www.r-project.org/bugs.html

dash24y ago

The first place to report this would be on the R-devel mailing list.

kzrdude4y ago

Systems that fail without producing an error or warning are worrying and unsettling to me.

lstmemery4y ago

I had a similar problem in a prediction pipeline a few years back. If I remember correctly, someone updated a R package to the next minor version. The package was to read an obscure file format. The fix installed a new C++ library. That C++ library somehow interacted with a second R package (using a specialized type of linear model) when compiled at source and all the results coming out of our package were subtly wrong but only with large files.

It turns out the way the second R package would determine the required precision of floats in sparse arrays was based on the compiled linear algebra libraries available. It took a week for us to debug and ultimately it was easier for us to just rewrite the whole thing in Python.

Renv has made things easier but I don't think packrat/renv allows you to lock C/C++ libraries as well as R ones.

gnufx4y ago

It's perhaps worth saying that if you must mix OpenMP libraries built against the LLVM (Kuck/Intel) runtime, and GNU GOMP on GNU/Linux: Ensure libomp is built with GOMP compatibility, however that's configured, make a shim from the result, like

    gcc -shared -Wl,-soname=libgomp.so.1 -o libgomp.so.1 empty.c -lomp5

where empty.c is an empty file, and put the result on LD_LIBRARY_PATH ahead of the real libgomp. Alternatively, preload the compatible libomp5. On Debian 11 there's already a libgomp in the llvm packaging. Dynamic linking assumed, as is right and fitting.

boulos4y ago

I don't have MKL to try this out, but I'd check that the MKL threading choice actually didn't break the initialization to 1.0 loop.

That is, instead of checking after doing the x[i] *= SCALE bit with cblas, I would check both before and after the scaling.

teleforce4y ago

In this excellent article Patrick Li, the author of a new optionally type language Stanza and co-founder of JITX (YC S18), provided a compelling reason to design a new language [1]. TL;DR, a powerful language like Ruby enabled the creation of powerful RoR library and framework that help spawned unicorn size startups like Github and Twitter, that's otherwise not feasible.

I want to add another dimension to this argument, what if we can maintain an existing language eco-system (library, community, etc) but modernize the engine that's running and compilation of the R language. This new engine can avoid the dreaded global locking limitation, provides native multi-thread applications and seamless interface with non-native R libraries in C/C++. Interestingly someone has tried this, with a sponsorship from Oracle no less, and presented this futile effort in the last year's R flagship conference keynote [2].

IMHO he will be more successful in his endeavour if using D language in his previous endeavors. What so special about D language you may ask? I would refer to the fact most of the languages do not provide Ruby on Rails (RoR) like tool except D but that for another story (see ref [1]). There's also the fact that D has a working alternative library to OpenBLAS and MKL, and it's even faster than both them five years back [3]! D also supports open method as an alternative for multiple dispatches that is much touted by Julia language community. D is also bringing native support for borrow checker feature that's always mentioned in the same sentence as Rust language. In addition D also has second to none FFI support for C and C++ language. Heck the latest D compiler has standard C compiler built-in. I can go on furthermore but I think you've probably already got the pictures.

My not so humble proposal to R and D language community is to compile R on top of D language. Essentially you a have dynamic language of R that is compiled at runtime (CTFE) on top of static D language. This approach is becoming more popular now as posted recently for the new Val and Valet language combination [4]. Just think of CTFE as the new JVM, but provides truly static and native compilation for R.

[1] What makes a programming language productive? “Stop designing languages. Write libraries instead.”:

https://jaxenter.com/stop-designing-languages-write-librarie...

[2] Why R? 2020 Keynote - Jan Vitek - How I Learned to Love Failing at Compiling R:

https://www.youtube.com/watch?v=VdD0nHbcyk4

[3] Numeric age for D: Mir GLAS is faster than OpenBLAS and Eigen:

http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...

[4] Show HN: Val - A powerful static and dynamic programming language(val-lang.org):

https://news.ycombinator.com/item?id=28683171

keewee74y ago

As a software developer forced to work with data scientists who refuse to learn Python there is nothing I hate more than R.

R is good for explorative data analysis but useless for everything else.

civilized4y ago

Of course, if you read the article, you find out that the problem had nothing to do with R. It was a misconfiguration of the underlying linear algebra libraries that R (and Python and everything else) relies on. The author even made a minimal reproducible example in a single C script, no dependencies on R whatsoever.

I hear a lot of "R is bad, Python is Enterprise Production Quality (TM)" blather at my work. It's always because the people involved don't understand computers, don't read documentation, don't debug, don't do root cause analysis, and want to quickly pass off responsibility for their laziness and incompetence. Meanwhile I and my team are happily chugging away, producing millions of dollars of reliable value for my company in R year after year.

Python lags far behind R in wide swaths of data science. Pandas is inferior to both dplyr and data.table, and R's modeling capabilities blow Python's out of the water in breadth and depth. You only use Python when you have to, e.g. for unstructured data and deep learning type stuff.

If your colleagues make you deal with their bad R code, that's too bad, but don't blame the language. It's designed to be easy to use, so a lot of bad coders use it. Go train your bad coders or hire better ones.

laichzeit04y ago

I would completely concede that R has better libraries. However, getting stuff like online prediction into production is a real pain when the models are developed in R. And R is single threaded. There is no way to hide that detail.

2 more replies

WhompingWindows4y ago

Wow, tell us how you really feel. How much have you used R and Python? Maybe those data scientists would prefer if you didn't viscerally hate the main data/statistics language and didn't call it useless for things beyond a narrow use-case. It may lead to better outcomes if people hated things less and tried to understand the valid use-cases, for instance the reams and reams of statistics that can be done on R where Python may lag behind, since R is the lingua franca of statistics and research.

gnufx4y ago

I've never seen anything for Python that allows you to a linear algebra-based code and run it at maybe petascale with trivial modifications. There's an R example somewhere under https://pbdr.org/publications.html

otabdeveloper44y ago

R is way more powerful and flexible for data science stuff. (Going from Python to R is almost like going from Excel to Python.)

mellavora4y ago

with regard to good software engineering, there is a funny thing about python. Simply cut-and-paste code from one env to another can completely destroy the program, if the cut-and-paste messes with the indentation.

Now some people say this can be solved with a good IDE. Which might (or might not) be true if you can reliably identify, by manually reviewing the code, the ends of the functions, loops, etc which got munged in the paste.

But interestingly enough, jupyter notebooks (which seem to be the go-to tool these days) aren't IDEs. Making it incredibly easy to fubar otherwise perfectly working code by pasting it from your local IDE into, let's say an AWS Sagemaker instance, to pick one random example of a current widely used jupyter implementation. So even if the problem could be fixed by a good IDE, there is no guarantee that that IDE is (easily) accessible for production code.

I just have a hard time seeing how such a fundamental flaw in a language can lead to "good software engineering"

goerz4y ago

So don’t mess up the indentation when you paste. Seriously, in my 15 years of using Python on a daily basis this hasn’t been a problem once.

kgwgk4y ago

It could be worse. They could learn python and still prefer to use R!

tharne4y ago

I don't know why you're getting downvoted. I was one of the data guys you mentioned who learned R first and resisted python. There are a lot of things about R that leads users to develop very bad habits. The only reason R caught on in the first place is because python did not have mature libraries for data analysis for a long time.

civilized4y ago

All languages strike a tradeoff between flexibility and enforcing a regular structure. A lot of people seem to think their preferred language hits the perfect point on that tradeoff, and judge any language that makes a different choice. Python lovers judge R, Java users judge Python, C++ users judge Java, Rust users judge C++, Go users judge Rust, and everyone judges JavaScript.

A language that's more flexible than your favorite "encourages bad habits", while a language that's less flexible than yours is "bureaucratic".

1 more reply

project2501a4y ago

it is not much as R, as the (relative) unwillingness to break compatibility or enforce global standards. Why do all string functions do not accept UTF-8?

j / k navigate · click thread line to collapse

42 comments

kettleballroll4y ago

microtonal4y ago

OpenBLAS

it's worth noting that OpenBLAS is as fast as MKL

https://danieldk.eu/Posts/2020-08-31-MKL-Zen.html

https://www.agner.org/optimize/blog/read.php?i=49

The BLAS/LAPACK ecosystem is a mess. I wish that Intel would just open source MKL and properly support AMD CPUs.

jabl4y ago

> OpenBLAS is incompatible with application threads. Most Linux distributions provide a multi-threaded OpenBLAS that burns in a fire if you use it in multi-threaded applications.

I guess with threads you'll have the same issue if you launch N cpu-bound threads and all those call BLAS, resulting in the same N^2 issue as you see with MPI.

1 more reply

gnufx4y ago

1 more reply

gjvc4y ago

> The BLAS/LAPACK ecosystem is a mess. I wish that Intel would just open source MKL and properly support AMD CPUs.

Given that their latest compilers are based on LLVM, that seems like a fair trade between the closed- and open-source worlds.

kergonath4y ago

> OpenBLAS is incompatible with application threads.

I’ve never had any issue when using it in OpenMP codes (either compiling it myself or using the libopenblas_omp.so present in some distros), what do you mean by “burn in a fire”?

vhhn4y ago

> OpenBLAS is incompatible with application threads

R is single-threaded.

jabl4y ago

> i would expect that system provided ones with system detection would be as good, but that wasn't always the case back then

deng4y ago

Huh? The bug tracker is here:

https://bugs.r-project.org/

Yes, for filing a bug you need to request an account because they apparently were overwhelmed with spam, as documented here:

https://www.r-project.org/bugs.html

dash24y ago

The first place to report this would be on the R-devel mailing list.

kzrdude4y ago

Systems that fail without producing an error or warning are worrying and unsettling to me.

lstmemery4y ago

Renv has made things easier but I don't think packrat/renv allows you to lock C/C++ libraries as well as R ones.

gnufx4y ago

    gcc -shared -Wl,-soname=libgomp.so.1 -o libgomp.so.1 empty.c -lomp5

boulos4y ago

I don't have MKL to try this out, but I'd check that the MKL threading choice actually didn't break the initialization to 1.0 loop.

That is, instead of checking after doing the x[i] *= SCALE bit with cblas, I would check both before and after the scaling.

teleforce4y ago

[1] What makes a programming language productive? “Stop designing languages. Write libraries instead.”:

https://jaxenter.com/stop-designing-languages-write-librarie...

[2] Why R? 2020 Keynote - Jan Vitek - How I Learned to Love Failing at Compiling R:

https://www.youtube.com/watch?v=VdD0nHbcyk4

[3] Numeric age for D: Mir GLAS is faster than OpenBLAS and Eigen:

http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...

[4] Show HN: Val - A powerful static and dynamic programming language(val-lang.org):

https://news.ycombinator.com/item?id=28683171

keewee74y ago

As a software developer forced to work with data scientists who refuse to learn Python there is nothing I hate more than R.

R is good for explorative data analysis but useless for everything else.

civilized4y ago

laichzeit04y ago

2 more replies

WhompingWindows4y ago

gnufx4y ago

otabdeveloper44y ago

R is way more powerful and flexible for data science stuff. (Going from Python to R is almost like going from Excel to Python.)

mellavora4y ago

I just have a hard time seeing how such a fundamental flaw in a language can lead to "good software engineering"

goerz4y ago

So don’t mess up the indentation when you paste. Seriously, in my 15 years of using Python on a daily basis this hasn’t been a problem once.

kgwgk4y ago

It could be worse. They could learn python and still prefer to use R!

tharne4y ago

civilized4y ago

A language that's more flexible than your favorite "encourages bad habits", while a language that's less flexible than yours is "bureaucratic".

1 more reply

project2501a4y ago

it is not much as R, as the (relative) unwillingness to break compatibility or enforce global standards. Why do all string functions do not accept UTF-8?

j / k navigate · click thread line to collapse