There is an important "contemporary history of computing" article to write about the evolution of the Spark project from "let's build a distributed filesystem for MapReduce in Java because we read those early Google papers" to "SQL is the right model for working with data so DataFrames" to "meet data scientists where they are: Python (and R)" to "make machine learning easy" and now to "LLVM, but for crunching big numeric arrays".
It has its own runtime, but it's not difficult to call Haskell code from C or ATS or whatever.
In addition to the JVM, Scala has had JS [1] and native (via LLVM) [2] targets for years.
(And that's not even mentioning any second-order compilations; e.g. Scala -> JVM bytecode -> native)
There are a number of reasons not to choose Scala, but portability is far from one of them.
https://github.com/tensorflow/mlir
https://www.youtube.com/watch?v=qzljG6DKgic
Exciting times for the future of parallel computing!
Overall, we think that accelerating the kinds of data science apps Weld and Numba target will involve not only tricks such as compilation that make user-defined code faster, but also systems that can schedule and call code people have already hand-optimized in a more efficient and transparent way (e.g., by pipelining data).
Rust is just an IPO driver of sorts here.
I'm not criticizing Numba btw, I use it regularly, but your comment seems a little off here, considering that Weld has a different goal in mind.
I think Julia is a more interesting language for this space, with its built-in matrix support, easier prototyping, a REPL, etc...
I don't really know why you'd use Rust instead of a GC language from the ML family.
NumPy et al. of course already have N Python acceleration frameworks hammering at their doorstep to integrate more closely...
Weld and XLA seem to have similar optimization steps though.
I also want to mention that this benchmark is from a while back (around 2017, I believe), so it's possible improvements in both XLA and Weld will make the numbers look different today :)
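At their core, both are loop-fusing compilers: they avoid materializing intermediate arrays between operators. A toy Python sketch of what fusion buys (this is the concept only, not Weld's or XLA's actual codegen, and the function names here are made up):

```python
import numpy as np

def unfused(a, b, c):
    # Eager NumPy-style evaluation: (a + b) allocates a full
    # temporary array before the multiply runs over it.
    t = a + b
    return t * c

def fused(a, b, c):
    # What a fusing compiler conceptually generates instead:
    # one pass over the data, no intermediate arrays.
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = (a[i] + b[i]) * c[i]
    return out
```

In a real compiler the fused loop is emitted as native (or accelerator) code, so it also wins on memory traffic, not just allocation.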
>> We chose Rust because:
>> It has a very minimal runtime (essentially just bounds checks on arrays) and is easy to embed into other languages such as Java and Python
>> It contains functional programming paradigms such as pattern matching that make writing code such as pattern matching compiler optimizations easier
>> It has a great community and high quality packages (called “crates” in Rust) that made developing our system easier.
They could've used an ML with GC and it would've been better (for a compiler).
It doesn't really have any functional programming paradigms. Pattern matching is present even in imperative languages like past versions of ATS.
I don't really know why you'd write a project of this sort in C.
With regard to Pandas this gives me slight pause, since, while pandas contains lots of high-quality and high-performance implementations, its API in some places doesn't feel well designed (the most obvious example being indexing of data frames via square brackets versus the various properties like iloc).
I think there might be something interesting for this strategy also in the WebAssembly space :)