Dbt to acquire Transform to build out its semantic layer (opens in new tab)

(getdbt.com)

47 pointspinkbeanz3y ago23 comments

23 comments

Airbnb wrote a few great blog posts about why they built a standardized metrics layer - I think this one gives better technical context on the "what" and "why" than this announcement does:

https://medium.com/airbnb-engineering/how-airbnb-achieved-me...

(As the article notes, Transform's founders are Airbnb vets who worked on Minerva)

wantsanagent3y ago

The interesting bit for me about Minerva (predecessor to Transform) was the engineering rigor that the pipeline allowed. It's pretty easy to write a query that brings a DB to its knees and the source controlled and reviewed intermediary steps for producing metrics seemed like crucial features. If you then have a good catalog of what metrics exist you can let BI tools or business types with their own tools mix and match performant metrics to their hearts delight.

pinkbeanzOP3y ago

After 2 years of slow progress, dbt acquires Transform and their open-source MetricFlow to help build out the semantic layer.

CharlesW3y ago

I'm curious what this enables can't be done with a thoughtfully-designed data warehouse. Looking at the "With MetricFlow" and "Without MetricFlow" examples, it seems like it's designed for people who are willing to use a new query language as long as it's not SQL?

d_watt3y ago

On the one hand, the compile targets for MetricFlow and DBT are SQL. So strictly speaking, nothing can be done with these that can't be done with a data warehouse.

SQL doesn't really have a way to define new aggregations well. So taking something like Daily Active Users / Monthly Active Users (DAU/MAU), which is a semi complicated calculation using windowed data, there's no good way to have a repeatable DAU/MAU aggregation concept you could reuse at different levels without having to re-write that sql that defines the aggregation. EG DAU/MAU against the account, or DAU/MAU by user type, or DAU/MAU by region would all be different queries/reports.

Instead if you have a layer that can be given the definitions of how a particular aggregation is structured, and it knows how data is related to each other, you can ask for that metric against different concepts or levels without having to repeat yourself.

Extending that out, if BI tools can understand that layer, then you can get consistent WYSIWYG report building across multiple tools without having to redefine the calculations everywhere.

I don't think I did a great job explaining here, this keynote is a good topic survey IMO: https://www.youtube.com/watch?v=w2weqVjbUTs&feature=emb_imp_...

CharlesW3y ago

Your answer was helpful, and I appreciate the video link as well. Thank you!

rwhaling3y ago

In practice - complicated time series metrics, especially on top of derived temporal-logic attributes like funnels, activation, etc., are phenomenally difficult to write by hand in SQL for most analysts. We are in the process of switching to dbt metrics and it cuts the effort down for this kind of thing 5x-10x, and the SQL code dbt generates runs signficantly faster too.

qeternity3y ago

How much does the dbt enterprise cost?

1 more reply

agent2813y ago

Looking at the DBT semantic layer documentation I think I get it. The goal seems to be describing a metric that could be aggregated at multiple levels once instead of defining all of the aggregations separately.

For example, if you want to be able to aggregate revenue by subscription plan and by country over year, month or day, then you would have 6+ different aggregations. The more dimensions and time periods you add, the more the various combinations blow out. (I think the formula is dimension permutations multiplied by the number of time periods.) Having a single definition of the metrics prevents errors from occurring in rote implementations of the metrics. A bit like generics, but for queries.

At this point, I'm curious how they manage to make the integrations with different platforms work. (Ultimately, I think that's why they acquired Transform.)

johnito3y ago

An alternative to Transform that's done an awesome job to integrate with BI tools is Cube.js Check their website https://cube.dev

pinkbeanzOP3y ago

I think that’s still the outstanding question. Tools like Mode and Hex have already started supporting dbt’s metrics layer, but will big players like Looker and Tableau also move to support it? That’s harder to say

1 more reply

Wonnk133y ago

Seems like a cool place to work, but I'm always a little weary of venture backed open source companies. When the board wants a liquidity event they'll win over users every time- put more features behind a paywall or force everyone onto a SaaS platform.

Anyhoo, all pure speculation. While I've never used it, my friends have great things to say about DBT.

di4563y ago

How does this compare to multidimensional OLAP cubes? Do they solve similar problems of different time windowed views of the data?

pinkbeanzOP3y ago

That’s right, they’re solving the same problem, although metric/semantic layers have traditionally lived within BI tools. Looker’s LookML is a great example of a metric layer.

There you define the relationships between tables, which dimensions to expose to your users, and how measures are to be aggregated along which dimensions and at what roll ups. Filters, formatting, and other semantic information rounds out what’s possible, making it easier to build complex reporting without having to pre-build large joined tables in the warehouse.

In theory a semantic layer decoupled from BI can be useful too, as downstream systems can all query a metric. Maybe an operational workflow needs access to revenue numbers or a spreadsheet wants a feed of some fresh aggregates.

The real question is will BI tools adopt a semantic layer they don’t own, giving up control over a key part of their input into their visualizations? Time will tell.

di4563y ago

Oracle BI has this separation in the RDP layer. So there is some history of BI tools separating out that layer but the separation doesn't have to result in giving up control. It's more of an organizational decision. Although it's probably not advisable to have wide open access to the semantic layer to the same users creating dashboards, as things can quickly get overcomplicated and hard to maintain.

tootie3y ago

Can someone ELI5 why to use DBT instead of CREATE VIEW?

d_watt3y ago

Because when you have a DAG of 100s of CREATEs that need to be run in dependency, many of which are table materializations and not views, you will end up creating a poorly specified, half baked version of DBT.

camgunz3y ago

Here's why I like it:

- Dependency tracking

- Not just views, but tables, and incremental tables

- Testing framework

I've worked on a few data teams that have all been "copy/paste SQL from GitHub into the Query Editor" and it's (obviously) pretty bad. DBT is super low-lift and professionalizes your data pipeline basically instantly.

kpierce3y ago

My top three:

- Dev/stag/prod env check numbers before pushing to production.

- Unions between two sources that are not the same shape can be done without the headache. https://github.com/dbt-labs/dbt-utils#union_relations-source

- Macros for common case when statements.

rm9993y ago

Create view is a great V0.5 for a data warehouse and what I recommend people do if possible so they can concentrate on building the right schema, naming standards, etc.

dbt is the V1. You get a lot of tooling, including a proper dag, logging, parametrization. You also get the ability to easily materialize your tables in a convenient format, which is important if (probably when) you figure out consistency is important. Views can take you far, but most orgs will eventually need more, and dbt is designed to be exactly that.

As a side note, moving from views to dbt is actually quite easy. I've done it several times and it's usually taken a couple of developer days to get started and maybe a couple weeks to fully transition.

j / k navigate · click thread line to collapse

23 comments

rwhaling3y ago

Airbnb wrote a few great blog posts about why they built a standardized metrics layer - I think this one gives better technical context on the "what" and "why" than this announcement does:

https://medium.com/airbnb-engineering/how-airbnb-achieved-me...

(As the article notes, Transform's founders are Airbnb vets who worked on Minerva)

wantsanagent3y ago

pinkbeanzOP3y ago

After 2 years of slow progress, dbt acquires Transform and their open-source MetricFlow to help build out the semantic layer.

CharlesW3y ago

d_watt3y ago

On the one hand, the compile targets for MetricFlow and DBT are SQL. So strictly speaking, nothing can be done with these that can't be done with a data warehouse.

Extending that out, if BI tools can understand that layer, then you can get consistent WYSIWYG report building across multiple tools without having to redefine the calculations everywhere.

I don't think I did a great job explaining here, this keynote is a good topic survey IMO: https://www.youtube.com/watch?v=w2weqVjbUTs&feature=emb_imp_...

CharlesW3y ago

Your answer was helpful, and I appreciate the video link as well. Thank you!

rwhaling3y ago

qeternity3y ago

How much does the dbt enterprise cost?

1 more reply

agent2813y ago

At this point, I'm curious how they manage to make the integrations with different platforms work. (Ultimately, I think that's why they acquired Transform.)

johnito3y ago

An alternative to Transform that's done an awesome job to integrate with BI tools is Cube.js Check their website https://cube.dev

pinkbeanzOP3y ago

1 more reply

Wonnk133y ago

Anyhoo, all pure speculation. While I've never used it, my friends have great things to say about DBT.

di4563y ago

How does this compare to multidimensional OLAP cubes? Do they solve similar problems of different time windowed views of the data?

pinkbeanzOP3y ago

That’s right, they’re solving the same problem, although metric/semantic layers have traditionally lived within BI tools. Looker’s LookML is a great example of a metric layer.

The real question is will BI tools adopt a semantic layer they don’t own, giving up control over a key part of their input into their visualizations? Time will tell.

di4563y ago

tootie3y ago

Can someone ELI5 why to use DBT instead of CREATE VIEW?

d_watt3y ago

camgunz3y ago

Here's why I like it:

- Dependency tracking

- Not just views, but tables, and incremental tables

- Testing framework

kpierce3y ago

My top three:

- Dev/stag/prod env check numbers before pushing to production.

- Unions between two sources that are not the same shape can be done without the headache. https://github.com/dbt-labs/dbt-utils#union_relations-source

- Macros for common case when statements.

rm9993y ago

Create view is a great V0.5 for a data warehouse and what I recommend people do if possible so they can concentrate on building the right schema, naming standards, etc.

j / k navigate · click thread line to collapse