undefined | Better HN

0 pointsRedShift15y ago0 comments

Can you elaborate on those issues?

0 comments

19965y ago

It would be too long. To quickly summarize, from the pain of backups (unless you setup a WAL replica, the load may take your database down), the large size of the data on disk (timescale does offer some compression now, but it's still too much), the low performance of large queries, the memory requirements - it's death by a thousand papercuts!

Don't get me wrong, timescale is a great way to get started with time series - just like sqlite is a great way to get started with databases if all you know is nosql.

However, it quickly brings its own challenges - and the new license is the cherry on the cake: it is locking you down to your own infrastructure unless you want to pay for timescale own SAAS offering (and then prey they do not alter the condition of the deal too much later)

It is just not worth it, unless you have a very small problem, or you can afford to have people concentrating on timescale maintenance - and in this case, you would be getting better bang for bucks by having these people work on clickhouse.

I'm speaking only from my own experience. I have relatively large servers dedicated to time series (about 100T of disk space, between 128 and 256 Gb of RAM). They were going to be retired for even bigger servers. Instead, we experimented with clickhouse on one of the recently decommissioned servers. We could not believe the benchmarks! Moving to clickhouse has improved the performance on about every metric. Yes, it required some minor SQL rewrites, about 1 day of work total, but unless your hardware is free and your queries are set in stone, clickhouse makes more sense.

mfreed5y ago

Hi - I just want to clarify some mistake / misrepresentation about our Timescale License (TSL):

1. The TSL is not a new license, have had it in place since late 2018. What we recently announced is that multi-node TimescaleDB will be available for free under the TSL (free, source available), while (for example) clustered InfluxDB is purely proprietary (paid, closed source).

2. Our TSL license prevents offering TimescaleDB-as-a-service, it absolutely does NOT prevent you from running/offering a SaaS service or from utilizing cloud services/infra (you say "it is locking you down to your own infrastructure unless you want to pay for timescale own SAAS offering"). Specifically, Timescale offers a pure Apache-2 version and a "Timescale License" (TSL) Community version. For the TSL version, what it primarily restricts is the cloud providers like AWS and Azure from offering TimescaleDB-as-a-service (e.g., TimescaleDB Community on AWS RDS). Many thousands of companies use our community version for free to build SaaS services running on their own AWS instances.

19965y ago

> InfluxDB is purely proprietary (paid, closed source).

And clickhouse is not. I just suggest skipping the timescaledb step to someone migrating from influx, and going straight to clickhouse.

> For the TSL version, what it primarily restricts is the cloud providers like AWS and Azure from offering TimescaleDB-as-a-service (e.g., TimescaleDB Community on AWS RDS)

If there is some kind of emergency and I need to have the database on the cloud, this is a serious restriction. It limits my choices and constrains my actions.

> Many thousands of companies use our community version for free to build SaaS services running on their own AWS instances.

We have our servers, so it wasn't an issue. It was more of a long term concern, a chilling effect: what else may be restricted in the future?

Again, I think timescaledb has a wonderful place. It will certainly become the entry level database for timeseries.

It is just not suite for our workload.

2 more replies

manigandham5y ago

Timescale has added their own layer of compression and columnar layouts to the Postgres row storage. That will get you to around 70% of the performance of using a dedicated column-oriented data warehouse, with the rest depending on how complex and selective your queries are.

It won't match the pure scan and computation speed of Clickhouse but the continuous aggregation feature is the recommended approach for querying large datasets (similar to Clickhouse table engines like AggregatingMergeTree).

cevian5y ago

(TimescaleDB engineer here) Some of the comments here sound technically off.

We've never seen a backup take down a machine. The backups we use are the same as Postgres which are used by millions of companies without a problem (and can be streaming incremental backups like pgBackrest, WAL-E, etc. or whole-database backups like pg_dump). As with any DB you do have to size and configure your database correctly (which these days isn't hard).

We've never seen anybody claim that ClickHouse offers significantly better compression than we do overall. Obviously compression depends heavily on data distribution and I'm sure you could make up a dataset where clickhouse does better (just as you could where TimescaleDB does better). But on real distributions we don't see this at all, we do pretty advanced columnar compression on a per-datatype basis [1], and see median space reduction of 95% from compression across users.

Large queries is a weird claim to make since Postgres has more different types of indexes than Clickhouse and has support for multiple indexes. If you are processing all of your data for all your queries then yes, click house sequential scans may be better. But that's less common, and also where TimescaleDB continuous aggregates come in.

We've seen customers successfully use our single-node version with 100s of billions of rows so claiming that we are just for small use-cases is simply untrue, and especially with the launch of multi-node TimescaleDB.

I understand people may have different preferences and experiences, but some of these felt a bit off to me.

[1] https://blog.timescale.com/blog/building-columnar-compressio...

19965y ago

On servers with a very high CPU load, backup done without using continuous streaming to a second server and done from this second server, something (I have stopped using timescale so I can't tell you what did) during backup caused a peak in load and IO, impacting read and write performance of the primary server, causing a cascading failure of the processes due to timeouts, eventually taking the server down due to swap issues and OOM triggering a reboot.

So we stopped doing backups. Actually, that's how we started using clickhouse: for cold storage, as the files in /var/lib/clickhouse used far less storage space and issues. Eventually the same data was sent both to timescaledb and clickhouse, in a poor's man backup. Finally, timescaledb was removed.

> As with any DB you do have to size and configure your database correctly (which these days isn't hard).

Thanks for supposing we didn't try. We did not end up with 256Gb of RAM per server for no reason.

All I'm saying is that Timescale totally has a place, but not beyond a certain scale and complexity.

> We've never seen anybody claim that ClickHouse offers significantly better compression than we do overall

Altiny does, so do a few others. mandigandham above says that you are now at 70% of what clickhouse does. I'm not saying you're not improving. It was just one of the too many issues we had to fight.

Also, you have only recently introduced compression - good, but I'm not aware if you already offer something like DateTime Codec(DoubleDelta, LZ4), or the choice of compression algorithms. LZ4 can be slow, so there is a choice between various alternatives.

For example, T64 calculates the max and min values for the encoded range, and then strips the higher bits by transposing a 64-bit matrix. Sometimes it makes sense. zStd is slower than T64 but needs to scan less data, which makes up for it. Sometimes it makes more sense.

Large databases need more flexibility.

> If you are processing all of your data for all your queries then yes, click house sequential scans may be better

I confirm, it is better.

And for some workloads, continuous aggregates make no sense.

> We've seen customers successfully use our single-node version with 100s of billions of rows so claiming that we are just for small use-cases is simply untrue, and especially with the launch of multi-node TimescaleDB

I have about 50Tb of data per server. What is below 1Tb I call "small use cases".

> I understand people may have different preferences and experiences, but some of these felt a bit off to me.

When I was trying to use timescaledb and reported weird issues, I had the same return: my use case and bug report felt "off" to the person I reported them to.

Maybe it is why they weren't addressed - or maybe much later, when reported by more clients?

Personally, I have no horse in the game. If you become better than clickhouse for my workload, and if the license change to allow me to deploy to a cluster of AWS servers (just in case we ditch our own hardawre), I will consider timescale again in the future.

For now, I'm watching it evolve, and slowly address the outstanding issues, like disk usage, and performance. By your own admission and benchmarks, you are now at 70% of what clickhouse does - in my experience, the actual difference is much higher.

But I sincerely hope you succeed and catch up, as more software diversity is always better.

1 more reply

j / k navigate · click thread line to collapse

0 comments

19965y ago

Don't get me wrong, timescale is a great way to get started with time series - just like sqlite is a great way to get started with databases if all you know is nosql.

mfreed5y ago

Hi - I just want to clarify some mistake / misrepresentation about our Timescale License (TSL):

19965y ago

> InfluxDB is purely proprietary (paid, closed source).

And clickhouse is not. I just suggest skipping the timescaledb step to someone migrating from influx, and going straight to clickhouse.

> For the TSL version, what it primarily restricts is the cloud providers like AWS and Azure from offering TimescaleDB-as-a-service (e.g., TimescaleDB Community on AWS RDS)

If there is some kind of emergency and I need to have the database on the cloud, this is a serious restriction. It limits my choices and constrains my actions.

> Many thousands of companies use our community version for free to build SaaS services running on their own AWS instances.

We have our servers, so it wasn't an issue. It was more of a long term concern, a chilling effect: what else may be restricted in the future?

Again, I think timescaledb has a wonderful place. It will certainly become the entry level database for timeseries.

It is just not suite for our workload.

2 more replies

manigandham5y ago

cevian5y ago

(TimescaleDB engineer here) Some of the comments here sound technically off.

I understand people may have different preferences and experiences, but some of these felt a bit off to me.

[1] https://blog.timescale.com/blog/building-columnar-compressio...

19965y ago

> As with any DB you do have to size and configure your database correctly (which these days isn't hard).

Thanks for supposing we didn't try. We did not end up with 256Gb of RAM per server for no reason.

All I'm saying is that Timescale totally has a place, but not beyond a certain scale and complexity.

> We've never seen anybody claim that ClickHouse offers significantly better compression than we do overall

Altiny does, so do a few others. mandigandham above says that you are now at 70% of what clickhouse does. I'm not saying you're not improving. It was just one of the too many issues we had to fight.

Large databases need more flexibility.

> If you are processing all of your data for all your queries then yes, click house sequential scans may be better

I confirm, it is better.

And for some workloads, continuous aggregates make no sense.

I have about 50Tb of data per server. What is below 1Tb I call "small use cases".

> I understand people may have different preferences and experiences, but some of these felt a bit off to me.

When I was trying to use timescaledb and reported weird issues, I had the same return: my use case and bug report felt "off" to the person I reported them to.

Maybe it is why they weren't addressed - or maybe much later, when reported by more clients?

But I sincerely hope you succeed and catch up, as more software diversity is always better.

1 more reply

j / k navigate · click thread line to collapse