Before, I would reach for Apache Spark to run queries on local Parquet datasets (on a single machine), but I've started using DuckDB for that and it's super fast (much faster than Pandas) and unfussy to integrate into Python code (with PySpark you need all kinds of boilerplate).
DuckDB is so lightweight that it's also great for quick interactive work in Jupyter or IPython.
I also use it to do cross-format joins between Parquet (immutable) and CSV files (mutable) -- DuckDB can load both into the same environment -- which makes it easy to solve different kinds of programming problems. The dynamic stuff goes into the CSV files while the Parquet dataset remains static. For instance, when processing a large Parquet dataset, my code keeps track of groups that I've already processed in a CSV file. If the program is interrupted and I need to resume from where I left off, I just do a DuckDB join between the Parquet and the CSV and exclude the already-processed groups (particularly handy when you don't have a single group key and you're grouping by several fields). Yes, you can do all this with Spark too, but the DuckDB code is so much simpler and more compact.
For large Parquet datasets, I currently roll my own code to chunk them so I can process them out-of-core, but it sounds like this latest streaming feature in DuckDB takes care of that detail.
Sure, DuckDB doesn't do distributed compute like Spark, but as a SQL engine for Parquet, I find it's so much more ergonomic than Spark.
DuckDB is often compared to SQLite. But a more apt comparison might be KDB+, a proprietary embedded vector database.
Postgres is a row-store engine rather than a column store, so I believe there would need to be quite a lot of translation for Postgres to be able to process Parquet data (DuckDB and Parquet are both columnar). My hypothesis is that DuckDB would be significantly faster! However, feel free to benchmark things!
We saw speedups (20%+?), but that's orders of magnitude short of the perf we advocate DB vendors aim for when we do visual analytics integrations. Arrow opens the door to saturating networks & PCIe cards for DB<>GPU transfers, so think going for 10-50 GB/s.
So you have a new way to run SQL on Parquet et al. through DuckDB -> Arrow -> Parquet. Of course, you still need to watch out for the memory usage of your SQL query if it contains JOINs or window functions, because the integration is designed for streaming rows.