As far as I could tell I would have to bring that data into elixir, do the text processing and put it back into explorer, which to me defeated the whole point of a dataframe library.
I imagine it's good for pre-cleaned data; using it with the built-in datasets has been fine.
Feel free to open up an issue! We have been focused more on high-level features (such as integration with S3, Postgres, Snowflake, SQLite, etc) and therefore we are missing many functions that already exist on Polars. Good news is that it is very quick to add them, so just let us know. :)
My understanding is you can use any Series operations without penalty (i.e. they get passed into some Rust NIF call), https://hexdocs.pm/explorer/Explorer.Series.html#functions-s..., which does include whitespace trimming but not trimming arbitrary strings. I imagine it wouldn't be too much of a jump to add arbitrary constant strings, though. Might just need to expose `str.slice` or `str.replace`.
The docs do imply that `mutate_with` operates lazily, so you only pay the transfer cost once per row, no matter how many mutations you're applying, but whether that's performant enough would be case by case.
https://pola-rs.github.io/polars/py-polars/html/reference/ex...
https://pragprog.com/titles/smelixir/machine-learning-in-eli...
You can always make an Elixir app talk to a Python ML backend and get the best of both worlds if you desire.
I like Elixir for web development otherwise; it is a much more stable domain, so the above doesn't apply (although I've seen some claim otherwise, which suggests how much more of an issue it would be for a niche ML use case).
I'd be very happy to be proven wrong by some case studies of how companies leveraged Elixir in real ML projects and concluded it is superior to Python.
And just as Elixir is (in my opinion) preferable to Python for web development, it's possible that the same may happen with AI.
The flaw in your reasoning can be pointed out trivially: once upon a time, Python was NOT "the language for machine learning". Essentially, NO "X is the solution for Y" started out that way. Which is why appeals to popularity are a fallacy.
The thing is Elixir is really good at an increasing number of things.
If you need to write a HTTP proxy in the middle of your application, since Elixir processes & incoming HTTP workers are cheap, you do not need to go evented: it just works.
If you need reactive web apps with changes automatically pushed to the client, it's the same: there is no need for external tools (e.g. AnyCable) at a certain scale.
If you need to do some scripting, there is `Mix.install/2` for single-file dependencies description & use.
If you start crawling too many web pages or processing too many APIs, the concurrency support kicks in and there is less need to scale (or you can scale later), which turns into fewer machines, fewer ops problems (or delayed ones), etc.
And now you start being able to use machine learning, deploy the same type of code on GPU, embed ML models right in the middle of your web app without much work, etc., which in turn makes it a nice platform for apps / SaaS.
Elixir really is becoming a Swiss-army knife which scales easily :-)
I say this as someone that likes Elixir, but after seeing it fail miserably at my org, I'm very skeptical it can be thrown around like a Spring, Node, or Django project. It needs real support from the org and requires module design skills that are not present in most random devs from a random org.
Devs we hire without direct Elixir experience pick it up really quick (within a couple weeks). The energy needed to "get good" with Elixir is really not much considering it provides veritable super powers on the backend and introduces a whole category of concurrency concepts that are not easy to grasp elsewhere.
How confident are you that the junior you just hired is operating correctly in the `other_code` module they're responsible for?

import other_code as other

def f():
    my_dict = {"foo": 1, "bar": 2}
    other.function(my_dict)
    return my_dict["foo"]  # ==> you might be wrong about what's in my_dict

Most of the Elixir Nx efforts are on inference, especially on how you can embed and scale it using concurrent and distributed patterns (see this post/video [1]). It may not be what you are looking for, but we have more folks deploying than training models, so maybe they will find incentives to give Elixir a try. :)
[1]: https://news.livebook.dev/distributed2-machine-learning-note...
> Why is Python not Sufficient?
It then proceeds to make a case why Python would not have enough speed or support for parallel processing, which is what I'm disputing.
Throwing BEAM or FP acronyms around won't really strike a chord with people working with data and models.
Mojo will (as promised) tap into the wider ecosystem. Other platforms are more than welcome to try, but this ultimately requires a huge community of scientists / developers to become a real alternative.
Elixir & Python are not an apples-to-apples comparison: there are fundamental differences in the programming model (functional, immutability, etc.) and runtime (preemptive scheduling + OTP), which give Elixir distinct advantages that are not available elsewhere without heavy trade-offs.
Either way once Mojo is production ready Elixir will be able to use it as well like it does Rust, Zig, or Python.
Last week I heard a story about an ML dev who would literally rebuild his system every week because Python would break it.
Naming things is hard.
Java does occasionally require that a person might have to implement their own code after reading a research paper, but I've always enjoyed that part of the job.
I've never understood Python's popularity except that I've heard some people say that it's used at Google.
It's definitely not a fast moving language.