
cube2222 · 3y ago

That's what I thought about ROAPI as well, until I benchmarked it, and it ended up being very slow[0].

[0]: https://news.ycombinator.com/item?id=32970495


It could be the NDJSON parser (DF source: [0]), or any number of other factors. Looking at the ROAPI release archive [1], it doesn't ship with the `columnq` binary from your comment (EDIT: it does, I was looking in the wrong place! https://github.com/roapi/roapi/releases/tag/columnq-cli-v0.3...), so it could also have something to do with compile-time flags.

FWIW, we use the Parquet format with DataFusion and get very good speeds, similar to DuckDB [2], e.g. 1.5s to run a more complex aggregation query `SELECT date_trunc('month', tpep_pickup_datetime) AS month, COUNT(*) AS total_trips, SUM(total_amount) FROM tripdata GROUP BY 1 ORDER BY 1 ASC` on a 55M row subset of NY Taxi trip data.
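
For anyone unsure what that query actually computes, here is a rough pure-Python sketch of the same aggregation: truncate each pickup timestamp to the start of its month, then count trips and sum `total_amount` per month, ordered by month. The three sample rows and the `monthly_totals` helper are made up for illustration; this is not how DataFusion executes the query and says nothing about its performance.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (tpep_pickup_datetime, total_amount).
trips = [
    (datetime(2022, 1, 3, 8, 15), 12.50),
    (datetime(2022, 1, 17, 19, 40), 7.25),
    (datetime(2022, 2, 1, 0, 5), 30.00),
]


def monthly_totals(rows):
    """Return [(month_start, total_trips, sum_total_amount)] sorted by month."""
    acc = defaultdict(lambda: [0, 0.0])
    for pickup, amount in rows:
        # Equivalent of date_trunc('month', tpep_pickup_datetime).
        month = datetime(pickup.year, pickup.month, 1)
        acc[month][0] += 1        # COUNT(*)
        acc[month][1] += amount   # SUM(total_amount)
    # GROUP BY 1 ORDER BY 1 ASC
    return [(m, count, total) for m, (count, total) in sorted(acc.items())]


for month, total_trips, total in monthly_totals(trips):
    print(month.date(), total_trips, total)
```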

[0]: https://github.com/apache/arrow-datafusion/blob/master/dataf...

[1]: https://github.com/roapi/roapi/releases/tag/roapi-v0.8.0

[2]: https://observablehq.com/@seafowl/benchmarks

cube2222 (OP) · 3y ago

Yes, DataFusion itself is definitely fast, no denying that.
