Polars seems to be the most prominent competitor in the Python DataFrame space, and DuckDB appears to pursue an approach similar to SQLite, but columnar.
I am personally working on a solution to a broader problem, which can also be viewed as an alternative to Pandas [2].
[1] https://wesmckinney.com/blog/apache-arrow-pandas-internals/
That being said, if I were to start a new project requiring that kind of work today, I would probably try Polars first. Their greenfield implementation allowed them to get rid of many of the crusty edges of pandas.
I was thinking about it for quite a while, but not sure if there would be interest. Thanks for your comment!
Would have made my life a lot easier when I was learning Pandas.
Would also be cool to have a Polars version of this too.
One suggestion:
A lot of folks come to Pandas from using SQL. It might be handy to have a couple "The equivalent of this SQL statement but in Pandas"
Pandas was never meant to be a technologist's tool. It was meant to be a researcher's tool and was unfortunately coopted to be a technical solution as well. It has not well escaped it's roots.
Pandas is fantastic for doing iterative and interactive research on semi-structured data. It has a lot of QoL facilities and utility functions for seamlessly dealing with exploratory timeseries analytics for in-core data. Data that fits into memory.
For example, I can take two time series and calculate their product:
ts3 = ts1 * ts2
This one line does a huge amount of heavily lifting by automatically aligning the timestamps and columns between the two inputs so that I'm not accidentally multiplying two entries that have the same ordinal but not the same timestamp or column label.
Can I do the same with Polars? Yes, but it comes with exponentially more cognitive overhead. And this is just one example.
Pandas is ultimately a flawed product as it's origin's go back more than a decade where R's dataframe was cutting edge. A lot of innovation happened since then and the API and internals of Pandas mean that certain choices that were made early on are nontrivial to change.
This doesn't change the fact that Pandas is still immensely useful. Eventually perhaps Polars will come close to it, but so far the focus wasn't on interactive use ergonomics unfortunately.
As it stands, I use pandas for research and polars for production systems.
The data world owes a lot to pandas, but it has plenty of sharp edges and using it can sometimes involve pretty close knowledge of how things like indexing/slicing/etc work under the hood.
If I get stuck in polars, its almost always just a "what's the name of the function to use?" type problem rather than needing lots of knowledge about how things are working under the hood.
A kind understatement imo. For me, the following experiences are highly coupled in my brain: "I'm using Pandas" + "I'm feeling a weird combination of confusion and pain" + "This is a dumpster fire".