No engineer wakes up in the morning excited to sync data to Marketo, so we started there: `npm install` and you can get back to building the core product. We make data self-serve for your non-technical colleagues and we handle all the exhausting integration stuff you don’t want to think about (API nuances, rate limiting, retrying, batching, etc.).
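To give a feel for the "batching and retrying" part, here is a minimal TypeScript sketch. All names (`chunk`, `backoffDelayMs`, `sendWithRetry`) are hypothetical and not taken from Grouparoo's code, and the defaults (500 ms base, 30 s cap, 3 retries) are assumptions. It is synchronous to keep it short; a real integration would be async and would also respect the vendor's rate limits.

```typescript
// Hypothetical helpers for illustration; not Grouparoo's or any vendor's API.

// Split records into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Capped exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... up to maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Try to send one batch, waiting between failed attempts.
// `send` returns true on success; `wait` is injected so the caller
// decides how to pause (and so the logic is easy to test).
function sendWithRetry<T>(
  batch: T[],
  send: (batch: T[]) => boolean,
  wait: (ms: number) => void,
  maxRetries = 3
): boolean {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (send(batch)) return true;
    if (attempt < maxRetries) wait(backoffDelayMs(attempt));
  }
  return false; // gave up after maxRetries retries
}
```

Usage would look like `chunk(records, 100).forEach((b) => sendWithRetry(b, send, wait))`, where `send` wraps whatever the destination's bulk endpoint is.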
Question: Could we use Grouparoo to replace Mixpanel? Would we need to build the client side to collect events and dump that into Grouparoo?
Stitch looks to be doing ELT in the Fivetran-ish space. Their "sources" are lots of tools and their "destinations" are warehouses. Grouparoo can have sources like Mailchimp (did they read the mail), but Mailchimp is more likely to be a "destination" for us.
This is because what we do for those tools is more like ETL. In our current case, the "T" is property and segmentation definitions, often done by end users like marketers. That means our users also include non-engineers. There's less burden on engineers after setting up Grouparoo because the person in charge of Mailchimp is doing those definitions.
There are many tools that do one-way integration from source to destination. Very few do real two-way sync. Is Grouparoo designed for that?
Grouparoo is set up with sources and destinations. The most common sources are likely to be databases/warehouses and the most common destinations are tools like Marketo/Zendesk/etc. That being said, Zendesk will likely be a source one day to pull user info into Grouparoo (number of tickets created, lastEmailedAt, etc). Databases can already be destinations (write the members of the VIP group back out to Postgres, in addition to sending them to Marketo and Zendesk).
I'm not sure whether all of that counts as two-way sync, but it's certainly round-tripping the data. If you really want a full duplication of a SaaS tool in your data warehouse, I'd look into Stitch or Fivetran at this point. Then, Grouparoo will happily read that :-)
The philosophy I've heard (maybe from HashiCorp?) is that the core should solve the data problem and the enterprise edition should solve organizational problems. So the sources, destinations, data syncing, and all that will stay in the core. At some point, we can do single sign-on, change management, GDPR support, compliance, etc. in the enterprise version.
In the past, you needed a large, focused SaaS vendor to be able to store a million users and their properties/events. AWS and friends have caught up and now it's easy. Because of that, we can take that data into your own environment and use it to increase control, customization, privacy, and compliance. The cost is significantly better, too.
Open source is a good way to do that because you know what you are running and can fit it to your needs. Engineers tend to like open source and we've seen interest in extending it. There are thousands of things to connect with, both inside your infra and outside with vendors, and open source makes that possible.
GDPR is about giving users control and visibility over their personal data and controlling personally identifiable information.
Grouparoo is an open source app that runs in your own infrastructure (AWS, etc) and it does segmentation of users. The effect of that is that less info leaves your world and goes to third parties. For example, you used to send an address and lifetime value to Braze so that you could make a “high value Bay Area customers” group over there. Now you keep that in house.
On top of that, whatever information is leaving now has a chokepoint, so you can stop sending a user to Braze (and everywhere else) if that is the requirement or return all the information about them easily if that is the ask.
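A rough sketch of what that chokepoint buys you, in TypeScript. Everything here (`ExportGate`, `Outbound`, the method names) is hypothetical and only illustrates the idea; it is not how Grouparoo implements it. The assumption is that every outbound record passes through one place, which can both block a user and report what has already left.

```typescript
// Hypothetical illustration of a single outbound chokepoint.
type Outbound = {
  userId: string;
  destination: string; // e.g. "braze", "mailchimp"
  payload: Record<string, string>;
};

class ExportGate {
  private suppressed: Record<string, boolean> = {};
  private sentLog: Outbound[] = [];

  // "Stop sending this user anywhere."
  suppress(userId: string): void {
    this.suppressed[userId] = true;
  }

  // Every export goes through here; returns false if it was blocked.
  send(record: Outbound): boolean {
    if (this.suppressed[record.userId]) return false;
    this.sentLog.push(record); // a real system would call the vendor API here
    return true;
  }

  // "Return all the information about this user that has left the building."
  report(userId: string): Outbound[] {
    return this.sentLog.filter((r) => r.userId === userId);
  }
}
```

Because there is one gate rather than one integration per tool, suppressing a user once covers Braze and everywhere else.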
Unfortunately I can't use it yet as there's no Postgres SSL support - filed a GitHub issue here: https://github.com/grouparoo/grouparoo/issues/734
Let us know on the issue if it's working for you.
Does it have an API to access the user profile and group data ad-hoc?
Can you stream data in?
Can it trigger a destination sync when the underlying data changes?
Can it do profile merging (visitor -> known customer stitching)?
How can you do reporting/analytics?
> Does it have an API to access the user profile and group data ad-hoc?
We’ve seen a few approaches here. 1) Yes, there are APIs 2) The Postgres database this runs on is in your data center, so you can read it directly 3) You can write back to your own product database as a “destination”
> Can you stream data in?
We support events via an API. We’ll store the events and allow you to create profile properties from them. I’m very interested in creating a Kafka or other message bus sort of integration too that brings in data and/or triggers recalculations. No one has needed that yet, though, so it’s just on our eventual list.
> Can it trigger a destination sync when the underlying data changes?
We have schedules, table queries, and events to know profile data has changed. When it changes, it then recalculates groups. When properties or groups that are being sent to a destination change, it automatically exports there. “Hey Mailchimp, the user changed their first name and should now be tagged as VIP.”
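Roughly, the flow is "profile changed, recalculate groups, export the difference". Here is a small TypeScript sketch of that last step. The types and the `buildExport` function are hypothetical names for illustration, not Grouparoo's internals, and the sketch ignores properties that were removed.

```typescript
// Hypothetical types; for illustration only.
type Profile = { id: string; properties: Record<string, string | number> };
type GroupRule = { name: string; matches: (p: Profile) => boolean };
type ExportPayload = {
  profileId: string;
  changedProperties: Record<string, string | number>;
  groupsAdded: string[];
  groupsRemoved: string[];
};

// Recalculate which groups a profile belongs to.
function groupsFor(p: Profile, rules: GroupRule[]): string[] {
  return rules.filter((r) => r.matches(p)).map((r) => r.name);
}

// Compare the old and new versions of a profile and return what a
// destination needs to hear about, or null if nothing changed.
function buildExport(
  before: Profile,
  after: Profile,
  rules: GroupRule[]
): ExportPayload | null {
  const changedProperties: Record<string, string | number> = {};
  for (const key of Object.keys(after.properties)) {
    if (before.properties[key] !== after.properties[key]) {
      changedProperties[key] = after.properties[key];
    }
  }
  const oldGroups = groupsFor(before, rules);
  const newGroups = groupsFor(after, rules);
  const groupsAdded = newGroups.filter((g) => oldGroups.indexOf(g) === -1);
  const groupsRemoved = oldGroups.filter((g) => newGroups.indexOf(g) === -1);

  const nothingChanged =
    Object.keys(changedProperties).length === 0 &&
    groupsAdded.length === 0 &&
    groupsRemoved.length === 0;
  if (nothingChanged) return null;
  return { profileId: after.id, changedProperties, groupsAdded, groupsRemoved };
}
```

In the Mailchimp example, the payload would carry the new first name in `changedProperties` and "VIP" in `groupsAdded`.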
> Can it do profile merging (visitor -> known customer stitching)?
We have the concept of an anonymous id before login. When we realize two profiles are the same (usually after logging in from another device or something), the profiles are merged and everything is recalculated.
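As a sketch of what that merge could look like (TypeScript, hypothetical names, not Grouparoo's actual code): the merge policy below, where the known profile's values win on conflict and anonymous ids are unioned, is my assumption for illustration.

```typescript
// Hypothetical profile shape for illustration.
type VisitorProfile = {
  id: string;
  userId?: string; // set once the visitor has logged in
  anonymousIds: string[];
  properties: Record<string, string | number>;
};

// Fold an anonymous profile into a known one.
function mergeProfiles(
  known: VisitorProfile,
  anonymous: VisitorProfile
): VisitorProfile {
  // Union of anonymous ids, preserving order and skipping duplicates.
  const anonymousIds = known.anonymousIds.slice();
  for (const id of anonymous.anonymousIds) {
    if (anonymousIds.indexOf(id) === -1) anonymousIds.push(id);
  }
  return {
    id: known.id,
    userId: known.userId,
    anonymousIds,
    // Spread order: the known profile's values override the anonymous ones.
    properties: { ...anonymous.properties, ...known.properties },
  };
}
```

After a merge like this, groups and exports would be recalculated for the surviving profile.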
> How can you do reporting/analytics?
This hasn’t been a focus so far outside of our ETL mechanics. You can see who has been imported and exported and with what and when and all of that.
Things get more interesting around properties and their values, but we haven’t gotten there yet. We’ve seen some success at pointing tools like Metabase at the Grouparoo database.
Overall, there's an interesting organizational dynamic that we've seen around data enablement. Marketing and other operational teams need it and it's often locked in the product space. It's usually not a priority for the eng team because they are focused on the core product, but the data is there. The important stuff (ETL copy of the product db) is usually not a huge mess.
We are inspired by warehouse tools like Looker that made that accessible to more people, giving them autonomy to be successful. Grouparoo takes that one step further to add on top of the data and make it actionable in all the other places that people want it.
As a Rudder and Fivetran user, I can see a very complementary use case for Grouparoo: the first two are responsible for unifying events and external data in the DW, and Grouparoo syncs user data to other tools.
Two other tools that I saw in this space (not Open Source): https://www.calixa.io/ https://windsor.io/
You are moving up the stack, and providing value at that layer. As a general rule that formula works in most industries.
I'm a big fan of Looker, and I hope to see you guys grow!
We've built an e2e marketing automation platform on top of your data warehouse. Marketers can interactively explore their customer base, run targeted campaigns in downstream email/ad/etc tools, and analyze results leveraging all the data they have in their warehouse.
RE: "messy data" -- Totally agree with bleonard's point that overall, the trend is towards data enablement. That said, I don't think any of the solutions on the market (even Looker) suffice. I've attended dozens of calls with Looker users who first say that Looker offers self-service exploration but then fail to retrieve fairly basic information via a Zoom screen-share. The truth is it's really hard to do self-service data exploration generically. I think what's lacking from the "BI market" are more verticalized solutions on top of your warehouse (think UIs like Amplitude's funnel analysis, Intercom or Kustomer's segmentation interface, etc.).
To make our product work, we've built UIs that are super focused on particular tasks as well as a pretty nifty graph-based "modeling layer" that sits above your warehouse (which ideally, you use DBT/Dataform on) to abstract over complex JOINs and such.
This whole space is super fascinating to me. Always happy to exchange notes and talk shop RE: marketing, warehouses, customer data, etc. If you have thoughts, hit me up at tejas [at] hightouch.io.
1) Generating a simple report (like how many people came to your website and then opened your emails in the last year) used to take forever. Storing data was costly.
2) You could not generate complicated reports, say combining marketing + product data (e.g. number of people who came through campaign X and then used the product).
3) You were stuck with their not-so-great UI.
4) Analytics is one use case. Segmentation is another. If you want to create a segment of users (e.g. customers using the free tier of your product who have become active in the last 7 days) and sync that segment to multiple destinations like email, Salesforce, etc., there is not a great way to do that from all the marketing systems.
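To make the segmentation example in 4) concrete, here is a small TypeScript sketch. The `Customer` shape, the function names, and the destination names are hypothetical; the point is only that the segment is defined once and then fanned out to every destination.

```typescript
// Hypothetical customer shape for illustration.
type Customer = { email: string; plan: "free" | "paid"; lastActiveAt: Date };

const DAY_MS = 24 * 60 * 60 * 1000;

// "Free tier and active in the last N days", defined in one place.
function activeFreeTier(customers: Customer[], now: Date, days = 7): Customer[] {
  const cutoff = now.getTime() - days * DAY_MS;
  return customers.filter(
    (c) => c.plan === "free" && c.lastActiveAt.getTime() >= cutoff
  );
}

// Fan the same segment out to several destinations (names are placeholders).
function fanOut(
  segment: Customer[],
  destinations: string[]
): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const destination of destinations) {
    out[destination] = segment.map((c) => c.email);
  }
  return out;
}
```

With the data in one place, the same definition feeds email, Salesforce, and anything else, instead of being rebuilt inside each marketing tool.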
On the other hand, if you can get all the data in your warehouse, you can use the best-of-breed tools for storage (Snowflake etc.), visualization (Looker/ChartIO/Tableau), and so on.
But you would need a product like Segment, RudderStack (and now Grouparoo) to get the data into the warehouse and sync it from the warehouse back to different destinations.
We support all the major data warehouses (incl. straight Postgres), connect to a bunch of different applications, and don't store any of your customer data!
Bonus: we recently added native support for dbt :-)
Is it fair to say this is more like Segment's Personas product? We see a bunch of use cases for Personas (which we don't have in RudderStack), so we can point them to you guys.
Congrats again on the launch.
The gap we saw was around understanding the user and segmenting in a way that could be shared across multiple services and the product. And doing so in a way where you controlled the data and the total cost was managed.
It's nice to see there are others open in this space. The trends are certainly in the open direction to provide a lot of value and control in a way that's good for everyone.
If you do have that, we have support for Google Sheets[1]. You share a sheet with a service account and that allows it to be a source in Grouparoo. From there, you can make groups and send them to destinations.
We'd be curious about your use cases. Feel free to make an issue[2] with what you are hoping for and we can discuss there.
[1] https://www.grouparoo.com/blog/google-sheets-source
[2] https://github.com/grouparoo/grouparoo/issues/new?assignees=...