As far as I could tell, I would have to bring that data into Elixir, do the text processing, and put it back into Explorer, which to me defeated the whole point of a dataframe library.
I imagine it's good for pre-cleaned data; using it with the built-in datasets has been fine.
Feel free to open up an issue! We have been focused more on high-level features (such as integration with S3, Postgres, Snowflake, SQLite, etc.) and therefore we are missing many functions that already exist in Polars. The good news is that it is very quick to add them, so just let us know. :)
Yeah I noticed that, it looks like the numeric manipulations are well represented, less so for strings.
I'll dig out the code later on and get an issue raised.
My understanding is that you can use any Series operation without penalty (i.e. they get passed into some Rust NIF call): https://hexdocs.pm/explorer/Explorer.Series.html#functions-s... That does include whitespace trimming, but not arbitrary strings, though I imagine it wouldn't be too much of a jump to add arbitrary constant strings. It might just need to expose `str.slice` or `str.replace`.
The docs do imply that `mutate_with` operates lazily, so you only pay the transfer cost once per row, no matter how many mutations you're applying, but whether that's performant enough would depend on the case.
new_column = Series.transform(df["column"], fn arg -> ... end)
DF.put(df, "column", new_column)
which _is annoying_ since you are not supposed to use it. The correct way is to extend the Series API, which we will be very happy to do!
https://pola-rs.github.io/polars/py-polars/html/reference/ex...
https://pragprog.com/titles/smelixir/machine-learning-in-eli...
You can always make Elixir app talk to Python ML backend and get the best of both worlds if you desire.
I like Elixir for web development otherwise; it is a much more stable domain, so the above doesn't apply (although I've seen some claim otherwise, which tells you how much more of an issue it would be for a niche ML use case).
I'd be very happy to be proven wrong by some case studies of how companies leveraged Elixir in real ML projects and concluded it is superior to Python.
And just as Elixir is (in my opinion) preferable to Python for web development, it's possible that the same may happen with AI.
And yes, when it comes to AI, things also could change in favour of Elixir - I would be pretty happy about it.
Your flaw of reasoning can be trivially pointed out simply by explaining that once upon a time, Python was NOT "the language for machine learning". Essentially, NO "X is the solution for Y" started out that way. Which is why appeals to popularity are a fallacy.
The thing is Elixir is really good at an increasing number of things.
If you need to write an HTTP proxy in the middle of your application, since Elixir processes & incoming HTTP workers are cheap, you do not need to go evented: it just works.
If you need to have reactive web apps with automated changes pushed to the client, it's the same: there is no need for external tools (e.g. any cable) at a certain scale.
If you need to do some scripting, there is `Mix.install/2` for single-file dependencies description & use.
If you start crawling too many web pages or processing too many APIs, the concurrency support kicks in and there is less need to scale (or the need comes later), which means fewer machines, fewer (or delayed) ops problems, etc.
And now you start being able to use machine learning, deploy the same type of code on GPU, embed machine learning models right in the middle of your web app without much work, etc., which in turn makes it a nice platform for apps / SaaS.
Elixir really is becoming a Swiss-army knife which scales easily :-)
I say this as someone who likes Elixir, but after seeing it fail miserably at my org, I'm very skeptical it can be thrown around like a Spring or Node or Django project. It needs real support from the org and requires module design skills that are not present in most random devs from a random org.
Module design doesn't seem any harder than class design in JS or Python.
Do you mean the language is generally harder for non-developers? Or that Elixir is harder for JS/Python developers to pick up and write good code? Or something else?
Writing well designed Elixir code does seem to require a fairly different approach from most common OO languages, at least at a surface level. (Although IMO that's more because you can copy OO patterns you've seen before without thinking much about why they're good patterns than because good design in Elixir is much different from OO)
It is too much a "language of experts" at the moment, although it is not caused by the language itself, more by the topics covered in general.
The current workarounds to make this happen in Python are quite ugly imho, e.g. PyTorch spawns multiple Python processes and then pushes data between the processes through shared memory, which incurs quite some overhead. TensorFlow, on the other hand, requires you to stick to their tensor DSL so that it can run within their graph engine. If native concurrency were a thing, data loading would be much more straightforward to implement without such hacks.
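To make the shared-memory handoff a bit more concrete, here is a minimal sketch using only Python's standard `multiprocessing.shared_memory` module. It is not how PyTorch implements its data loader; the helper names (`publish_batch`, `read_batch`, `round_trip`) are made up for illustration, and both ends run in one process to keep it short, whereas in a real loader the reader would be a separate worker process that receives the block name and element count over a queue.

```python
import struct
from multiprocessing import shared_memory


def publish_batch(values):
    """Producer side: copy a batch of floats into a new named shared-memory block."""
    payload = struct.pack(f"{len(values)}d", *values)  # serialize to raw doubles
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload  # first copy: into the shared block
    return shm


def read_batch(name, count):
    """Consumer side: attach to the block by name and copy the floats back out."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        raw = bytes(shm.buf[:count * 8])  # second copy: out of the shared block
        return list(struct.unpack(f"{count}d", raw))
    finally:
        shm.close()


def round_trip(values):
    """Hypothetical helper: publish a batch, read it back, then clean up."""
    shm = publish_batch(values)
    try:
        return read_batch(shm.name, len(values))
    finally:
        shm.close()
        shm.unlink()  # the creator is responsible for freeing the block
```

Even in this toy version the batch gets serialized, copied in, copied out, and explicitly freed, which is the kind of bookkeeping that would not be needed if workers could simply share the same objects.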
1. Loading data
2. Running algorithms that benefit from shared memory
3. Serving the model (if it's not being output to some portable format)
There are also general benefits of using one language across a project. Because Python is weak on these things, we end up using multiple languages.
Go and Elixir provide some parallelism, but the primary focus for both languages is concurrency.
Devs we hire without direct Elixir experience pick it up really quick (within a couple weeks). The energy needed to "get good" with Elixir is really not much considering it provides veritable super powers on the backend and introduces a whole category of concurrency concepts that are not easy to grasp elsewhere.
How confident are you that the junior you just hired is operating correctly in the other_code module they're responsible for?
import other_code as other
def f():
    my_dict = {"foo": 1, "bar": 2}
    other.function(my_dict)  # may mutate my_dict in place
    return my_dict["foo"]  # ==> you might be wrong about what's in my_dict
* Exercism track: https://exercism.org/tracks/elixir
* Sasa Juric's book: https://www.manning.com/books/elixir-in-action-third-edition
* Dave Thomas's Elixir Course: https://codestool.coding-gnome.com/courses/elixir-for-progra...
* Phoenix Guides: https://hexdocs.pm/phoenix/overview.html
* Ecto Guides: https://hexdocs.pm/ecto/getting-started.html
The above covers the language basics/ideas/concepts and the main tooling (Phoenix/Ecto) if you're looking to build apps or get an Elixir job. I definitely recommend the Phoenix Guides or similar - they're very high quality and kept up to date with any new releases or changes while books can sometimes get out of date.
Most of the Elixir Nx efforts are on inference, especially on how you can embed and scale it using concurrent and distributed patterns (see this post/video [1]). It may not be what you are looking for but we have more folks deploying than training models, so maybe they will find incentives to give Elixir a try. :)
[1]: https://news.livebook.dev/distributed2-machine-learning-note...
> Why is Python not Sufficient?
It then proceeds to make a case why Python would not have enough speed or support for parallel processing, which is what I'm disputing.
Throwing BEAM or FP acronyms around won't really strike a chord with people working with data and models.
Mojo will (as per promise) tap into the wider ecosystem. Other platforms are more than welcome to try but this ultimately requires a huge community of scientists / developers to become a real alternative.
Other languages have certain features that make extension and integration feel like first-class concerns which lowers the barrier to contributions from a wider range of people and also helps keep e.g. dependencies and build processes relatively simple.
Elixir & Python are not an apples-to-apples comparison: there are fundamental differences in the programming model (functional, immutability, etc.) and runtime (preemptive scheduling + OTP), and those are the reason Elixir has distinct advantages not available elsewhere without heavy cost trade-offs.
Either way once Mojo is production ready Elixir will be able to use it as well like it does Rust, Zig, or Python.
Last week I heard a story about an ML dev who would literally rebuild his system every week because Python would break it.
Naming things is hard.
Java does occasionally require that a person might have to implement their own code after reading a research paper, but I've always enjoyed that part of the job.
I've never understood Python's popularity except that I've heard some people say that it's used at Google.
It's definitely not a fast moving language.