...seems like quite an odd way to store time series in ClickHouse. If I understood that code correctly (and I am really not sure), they partition their data by some tag value (the first one in a list?) and sort each partition by "tags ID", whereas TimescaleDB partitions by time, as far as I know.
Of course there will be large discrepancies if the data is sorted one way in one database's schema and another way in the other's. At the very least, their "ORDER BY time LIMIT 10" query would seem to benefit greatly from partitioning or sorting the table by time.
Whether that makes sense depends on your use case. But I don't think a benchmark that uses completely different schemas, partitioning and primary keys across databases is fair.
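To make the sort-order point concrete, here is a minimal Python sketch under a deliberately simplified model (an assumption, not how ClickHouse or TimescaleDB actually execute queries): a table is a list of rows stored in sorting-key order, and we count how many rows must be read to answer "ORDER BY time LIMIT 10". The sample data and the names make_rows, top_k_read_in_order and top_k_full_scan are hypothetical.

```python
import heapq

# Simplified model (an assumption, not a real engine's read path): a table is a
# list of rows already stored in sorting-key order, and we count rows read.

def make_rows(n_points=1000, n_tags=100):
    """Hypothetical sample data: one reading per time step, cycling through tags."""
    return [{"tags_id": i % n_tags, "time": i} for i in range(n_points)]

def top_k_read_in_order(rows, k):
    """Rows sorted by time: the first k rows are the answer, so reading stops early."""
    result = []
    rows_read = 0
    for row in rows:
        rows_read += 1
        result.append(row)
        if len(result) == k:
            break
    return result, rows_read

def top_k_full_scan(rows, k):
    """Rows sorted by (tags_id, time): every row is read to find the k earliest."""
    rows_read = 0

    def scan():
        nonlocal rows_read
        for row in rows:
            rows_read += 1
            yield row

    # heapq.nsmallest consumes the whole iterator, keeping only k candidates.
    result = heapq.nsmallest(k, scan(), key=lambda r: r["time"])
    return result, rows_read

rows = make_rows()
by_time = sorted(rows, key=lambda r: (r["time"], r["tags_id"]))  # time-first key
by_tag = sorted(rows, key=lambda r: (r["tags_id"], r["time"]))   # tag-first key

fast, fast_read = top_k_read_in_order(by_time, 10)
slow, slow_read = top_k_full_scan(by_tag, 10)
print([r["time"] for r in fast] == [r["time"] for r in slow])  # True
print(fast_read, slow_read)                                    # 10 1000
```

Both layouts return the same ten rows, but with the time-first key the scan can stop after 10 rows, while with a tag-first key like the benchmark's ORDER BY (tags_id, created_at) every row has to be read before the earliest ten are known.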
Another thing I noticed is that their version of ClickHouse is quite old, at least from around the time the test was written. The CREATE TABLE syntax shown has been deprecated for a few versions and can no longer be found in the recent docs, only on GitHub: https://github.com/ClickHouse/ClickHouse/blob/v18.16/docs/en...
CREATE TABLE tsbs_modern (
created_date Date DEFAULT today(),
created_at DateTime DEFAULT now(),
time String,
tags_id UInt32,
additional_tags String DEFAULT ''
) ENGINE = MergeTree
PARTITION BY toYYYYMM(created_date)
ORDER BY (tags_id, created_at)
We'd love to get your feedback!
The post really focuses on one query, and that one oddly has no time sort. Would similar queries also be fast? What about joins, aggregates, lag() OVER window functions, subqueries, unions and so on?
NB: This demo was launched on HN some time ago: https://news.ycombinator.com/item?id=23616878
Cleanup is semi-manual for now. Time partitions can be removed or detached via SQL. We're working on automating that.
Cool! Will that be continuous queries that can be used for downsampling?
I'm working on load testing and monitoring tools. Since either can produce enough metrics to overflow the available storage, the downsampling story ends up being as important to me as write speed. I imagine that's true of a lot of metrics-database scenarios: what happens if they go on... forever?
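Downsampling here means periodically rolling raw points up into coarser time buckets so that old data takes less space. A minimal Python sketch, assuming (timestamp, value) pairs and a plain average as the rollup; the function name downsample and the sample series are hypothetical, and a database feature such as continuous queries would typically run this kind of rollup itself on a schedule.

```python
from collections import defaultdict
from statistics import mean

def downsample(points, bucket_seconds):
    """Collapse (timestamp, value) points into one averaged point per time bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]

# Hypothetical raw series: one sample per second for ten minutes.
raw = [(t, float(t % 60)) for t in range(600)]
per_minute = downsample(raw, 60)
print(len(raw), len(per_minute))  # 600 10
print(per_minute[0])              # (0, 29.5)
```

Rolling one-second samples up to one-minute averages cuts the point count by 60x in this example; in practice you would often keep min, max or count per bucket as well, since an average alone hides spikes.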
Looks similar to SingleStore; I wonder how it would perform on the same benchmark. They also use JIT compilation and scan parallelization.