clickhouse-local - https://news.ycombinator.com/item?id=22457767
q - https://news.ycombinator.com/item?id=27423276
textql - https://news.ycombinator.com/item?id=16781294
simpql - https://news.ycombinator.com/item?id=25791207
We need a benchmark, I think... ;)
dsq CLI - https://github.com/multiprocessio/dsq
DataStation desktop app - https://datastation.multiprocess.io/
Two alternative CLI tools that I looked at:
sq - https://sq.io/
octosql - https://github.com/cube2222/octosql
Honourable mentions to:
Miller - https://miller.readthedocs.io/en/latest/index.html
Dasel - https://daseldocs.tomwright.me/
These don't use SQL.
VisiData is also amazing for data navigation, although it takes some effort to grasp its model.
I’ve spent what feels like too much time looking for a Windows app that I can use to occasionally preview CSV files before moving them into WSL.
I just think $50/year is too much for occasional use.
Since this post has initiated several other shameless plugs, maybe I can swoop in and ask for shameless recommendations for something cheap in this space.
I created this project for distributing Node via PyPI: https://pypi.org/project/nodejs-bin/
2) the author once answered some questions I posted on Datafusion, so they're cool in my book
Here are my anecdotes.
I'd be willing to bet most programmers would instantly understand something like `.users[] | {email: .email, lastLogin: .logins[-1].date}`, even if they've never seen jq.
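To make that concrete, here is a minimal sketch of running that exact filter with jq (the input document and its field names are made up to match the filter):

```shell
# Hypothetical input shaped to match the filter above.
echo '{"users":[{"email":"a@example.com","logins":[{"date":"2021-05-01"},{"date":"2022-01-15"}]}]}' \
  | jq '.users[] | {email: .email, lastLogin: .logins[-1].date}'
# Prints an object with the user's email and the date of the last login entry.
```

Even without knowing jq, the pipeline reads left to right: iterate over `.users[]`, then build a new object per user, with `.logins[-1]` picking the most recent login.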
Now that I'm thinking about it, the kind of structures we often use JSON for are, in a way, a subset of what can be done with tables and pointers (foreign keys), so would it be possible to create a kind of jq to SQL compiler? Has anyone tried that?
It's not a BI tool though; if your queries vary a lot in the WHERE clause and you can't leverage indexes, you're looking at minutes of response time.
If the data is not in structures but plain csv/json, all bets are off.
I've not yet tested it at terabyte scale, although it should happily scale up there.
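The index point is easy to see with plain sqlite3 as a stand-in (the table and column names here are made up): a predicate on an indexed column becomes a SEARCH, while one on an unindexed column degrades to a full table SCAN.

```shell
sqlite3 :memory: <<'SQL'
CREATE TABLE sales(id INTEGER PRIMARY KEY, region TEXT, amount REAL);
CREATE INDEX idx_region ON sales(region);
-- Indexed column: the planner reports a SEARCH using idx_region.
EXPLAIN QUERY PLAN SELECT * FROM sales WHERE region = 'EU';
-- Unindexed column: the planner falls back to a full SCAN of sales.
EXPLAIN QUERY PLAN SELECT * FROM sales WHERE amount > 100;
SQL
```

With a highly variable WHERE clause you can't pre-build an index for every predicate, so most queries land in the SCAN case.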
If you like this, I recommend taking a look at OctoSQL[0], which I'm the author of.
It's plenty fast and easier to add new data sources for as external plugins.
It can also handle endless streams of data natively, so you can do running groupings on, for example, tailed JSON logs.
Additionally, it's able to push down predicates to the database below, so if you're selecting 10 rows from a 1 billion row table, it'll just get those 10 rows instead of getting them all and filtering in memory.
> blazingly fast
I’m going to need to see a citation for that. Last I checked, it was being beaten by Apache Spark in non-memory constrained scenarios [0]. This may be “blazingly fast” compared to Pandas or something, but it’s still leaving a TON of room on the table performance-wise. There’s a reason why Databricks found it necessary to redirect their Spark backend to a custom native query engine [1].
[0] https://andygrove.io/2019/04/datafusion-0.13.0-benchmarks/
[1] https://cs.stanford.edu/~matei/papers/2022/sigmod_photon.pdf
A simple group by
time columnq sql --table books_10m.ndjson "SELECT AVG(overall) FROM books_10m"
takes 66 seconds. The equivalent in OctoSQL takes less than 3 seconds.
I retract my statement about this project being blazingly fast, though I imagine it's just the JSON parser that requires optimization.
e.g.
sqlite3 :memory: -cmd '.mode csv' -cmd '.import royalties.csv Royalty' -cmd '.mode column' \
  'SELECT SUM(Royalty),Currency FROM Royalty GROUP BY Currency'

1. SQL can be thought of as being composed of several smaller languages: DDL, DQL, DML, DCL.
2. columnq-cli is only a CLI to a query engine, not a database. As such, it only supports DQL by design.
3. I have the impression that outside of data engineering/DBA, people are rarely taught the distinction between OLTP and OLAP workloads [1]. The latter often utilizes immutable data structures (e.g. columnar storage with column compression), or provides limited DML support, see e.g. the limitations of the DELETE statement in ClickHouse [2], or the list of supported DML statements in Amazon Athena [3]. My point -- as much as this tool is useless for transactional workloads, it is perfectly capable of some analytical workloads.
[1] Opinion, not a fact.
[2] https://clickhouse.com/docs/en/sql-reference/statements/dele...
[3] https://docs.aws.amazon.com/athena/latest/ug/functions-opera...
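The DQL-only design can be mimicked with sqlite3's read-only mode (a stand-in here, not columnq itself): SELECT goes through, while DML against the same file is rejected.

```shell
db=$(mktemp)
sqlite3 "$db" 'CREATE TABLE t(x); INSERT INTO t VALUES (1);'
# DQL works fine against a read-only handle...
sqlite3 -readonly "$db" 'SELECT x FROM t;'
# ...but DML fails with "attempt to write a readonly database".
sqlite3 -readonly "$db" 'DELETE FROM t;' || true
rm -f "$db"
```

Analytical engines draw the same line by construction: the storage underneath is treated as immutable input, not as mutable state.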
> Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Even the project's name, "ROAPI", has "read only" in it.
What kind of headline would make you want to read/try such a thing?
(I'm planning on announcing it + releasing code on HN but have never done so before)