There is an important "contemporary history of computing" article to write about the evolution of the Spark project from "let's build a distributed filesystem for MapReduce in Java because we read those early Google papers" to "SQL is the right model for working with data so DataFrames" to "meet data scientists where they are: Python (and R)" to "make machine learning easy" and now to "LLVM, but for crunching big numeric arrays".
It has its own runtime, but it's not difficult to call Haskell code from C or ATS or whatever.
In addition to the JVM, Scala has had JS [1] and native (via LLVM) [2] targets for years.
(And that's not even mentioning any second-order compilations; e.g. Scala -> JVM bytecode -> native)
There are a number of reasons not to choose Scala, but portability is far from one of them.
https://github.com/tensorflow/mlir
https://www.youtube.com/watch?v=qzljG6DKgic
Exciting times for the future of parallel computing!
Overall, we think that accelerating the kinds of data science apps Weld and Numba target will involve not only tricks such as compilation that make user-defined code faster, but also systems that can schedule and call code people have already hand-optimized in a more efficient and transparent way (e.g., by pipelining data).
Rust is just an IPO driver of sorts here.
I'm not criticizing Numba btw, I use it regularly, but your comment seems a little off here, considering that Weld has a different goal in mind.
I think Julia is a more interesting language for this space, with its built-in matrix support, easier prototyping, a REPL, etc...
I don't really know why you'd use Rust instead of a GC language from the ML family.
NumPy et al. of course already have N Python acceleration frameworks hammering at their doorstep to integrate more closely...
Weld and XLA seem to have similar optimization steps though.
I also want to mention that this benchmark is from a while back (around 2017, I believe), so it's possible improvements in both XLA and Weld will make the numbers look different today :)
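At their core, both are loop-fusing compilers: they avoid materializing intermediate arrays between operators. A toy Python sketch of what fusion buys (this is the concept only, not Weld's or XLA's actual codegen, and the function names here are made up):

```python
import numpy as np

def unfused(a, b, c):
    # Eager NumPy-style evaluation: (a + b) allocates a full
    # temporary array before the multiply runs over it.
    t = a + b
    return t * c

def fused(a, b, c):
    # What a fusing compiler conceptually generates instead:
    # one pass over the data, no intermediate arrays.
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = (a[i] + b[i]) * c[i]
    return out
```

In a real compiler the fused loop is emitted as native (or accelerator) code, so it also wins on memory traffic, not just allocation.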
>> We chose Rust because:
>> It has a very minimal runtime (essentially just bounds checks on arrays) and is easy to embed into other languages such as Java and Python
>> It contains functional programming paradigms such as pattern matching that make writing code such as pattern matching compiler optimizations easier
>> It has a great community and high quality packages (called “crates” in Rust) that made developing our system easier.
They could've used an ML with GC and it would've been better (for a compiler).
It doesn't really have any functional programming paradigms. Pattern matching is present even in imperative languages like past versions of ATS.
I don't really know why you'd write a project of this sort in C.
With regard to Pandas this gives me slight pause, since, while pandas contains lots of high-quality and high-performance implementations, its API in some places doesn't feel well designed (the most obvious example being indexing of data frames via square brackets versus the various properties like iloc).
I think there might be something interesting for this strategy also in the WebAssembly space :)