Better HN

0 points · waterlx · 1y ago · 0 comments

Hi, in case you haven't found the answer yet, here is my humble opinion:
- Choose Iceberg if you use several compute/query engines besides Spark, such as Presto or Flink. Iceberg has a strong abstraction and design for an engine-independent table format, but its learning curve is relatively steep.
- Choose Delta if you only use Spark and are fine with being tightly bound to Databricks.
- Choose Hudi if you want a data lake that works out of the box; it is quite easy to use.
- If your data is updated frequently (e.g., streaming), check https://paimon.apache.org/, provided you are fine with being tightly bound to Flink.


indoordin0saur · 1y ago

Thank you! Sounds like Iceberg is the best choice, then. I'm very allergic to lock-in. We're currently very Spark-heavy, and our query engine is AWS Redshift Serverless. The recent AWS Glue Catalog support for Iceberg makes this look promising.
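For a Spark-heavy setup like this, wiring Spark to an Iceberg catalog stored in AWS Glue mostly comes down to a handful of Spark settings. The sketch below is based on my understanding of Iceberg's AWS integration, so double-check the keys against the Iceberg version you run. The helper name `iceberg_glue_spark_conf`, the catalog name `glue_catalog`, and the S3 path are hypothetical placeholders. It also assumes the `iceberg-spark-runtime` and Iceberg AWS bundle jars are on the Spark classpath and that AWS credentials are available.

```python
def iceberg_glue_spark_conf(catalog_name: str, warehouse_uri: str) -> dict[str, str]:
    """Hypothetical helper: return Spark config entries that register
    `catalog_name` as an Iceberg catalog backed by the AWS Glue Data Catalog,
    with table files stored under `warehouse_uri`."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg's SQL extensions (e.g. MERGE INTO, CALL procedures).
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the catalog with Iceberg's Spark catalog implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Keeps table metadata pointers in AWS Glue.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Reads and writes data files through Iceberg's S3 client.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Root location for table data and metadata files (assumed S3 path).
        f"{prefix}.warehouse": warehouse_uri,
    }


if __name__ == "__main__":
    conf = iceberg_glue_spark_conf("glue_catalog", "s3://example-bucket/warehouse")
    for key, value in conf.items():
        print(f"{key}={value}")
```

With PySpark installed, each entry would be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`. Tables are then addressed as `glue_catalog.<database>.<table>`. Keeping the settings in a plain dict makes them easy to reuse in `spark-submit --conf` flags or a cluster config.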

ruipds · 1y ago

I heard from an AWS employee that they consider Iceberg to be the future. Many of their services will be glued together with it.
