The context is that Airbyte is now (after pivoting 3x during YC https://airbyte.com/blog/how-we-pivoted-3-times-in-the-1st-m...) the largest and fastest-growing open source community (see our GitHub https://github.com/airbytehq/airbyte) of data pipeline connectors[0], so in a sense they have always been free if you self-host. But now using them on Airbyte Cloud is going to be free as well, i.e. "we will do your ELT for free no matter the volume as long as our connectors are not GA yet".
This is a massive commitment to improve the quality of our connectors, which is also something we have been pushing the industry on (https://airbyte.com/blog/connector-release-stages):
Alpha: new, basic docs, works, passes acceptance tests
Beta: Alpha + at least 25 active users + >90% sync success rate + snapshot tests + all streams + severe issues handled + security + supports checkpointing + SLA on cloud
GA: Beta + >99% sync success rate + more than 50 active users + <24 hours downtime + polished docs + performant
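To make the thresholds concrete, here is a rough sketch of how the stage rules above could be checked in code. This is not Airbyte's actual promotion logic; the `ConnectorStats` fields and the `release_stage` function are hypothetical names made up for illustration, and the non-numeric criteria (snapshot tests, security, docs, downtime, and so on) are collapsed into two boolean flags.

```python
from dataclasses import dataclass


@dataclass
class ConnectorStats:
    # All field names here are hypothetical, chosen only for this sketch.
    active_users: int
    sync_success_rate: float          # fraction between 0.0 and 1.0
    passes_acceptance_tests: bool
    beta_checks_done: bool = False    # snapshot tests, all streams, security, checkpointing, ...
    ga_checks_done: bool = False      # <24 hours downtime, polished docs, performant


def release_stage(stats: ConnectorStats) -> str:
    """Return 'none', 'alpha', 'beta', or 'ga' using the thresholds listed above."""
    if not stats.passes_acceptance_tests:
        return "none"

    # Beta: at least 25 active users and >90% sync success rate, plus the other checks.
    is_beta = (
        stats.active_users >= 25
        and stats.sync_success_rate > 0.90
        and stats.beta_checks_done
    )
    if not is_beta:
        return "alpha"

    # GA: more than 50 active users and >99% sync success rate, plus the other checks.
    is_ga = (
        stats.active_users > 50
        and stats.sync_success_rate > 0.99
        and stats.ga_checks_done
    )
    return "ga" if is_ga else "beta"
```

Note the asymmetry in the published wording: Beta says "at least 25" users while GA says "more than 50", so a connector with exactly 50 users would stay in Beta under this reading.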
It's been going very well; you can see how many connectors we promote to GA each month in our Slack (https://slack.airbyte.io/) and changelogs, and our new low-code CDK (https://www.youtube.com/watch?v=i7VSL2bDvmw) is helping new connectors insta-promote to beta.
We hope to set the new standard in data integration and this is still only day 1.
[0]: for the uninitiated, a good explainer on why companies are moving towards ELT in the first place: https://airbyte.com/blog/elt-pipeline
Maybe I am thinking about it wrong and this is more aimed at people who were previously paying for something and now get the same thing for free.
Failures happen, and anyone who promises you otherwise is lying. You'd have them yourself if you went DIY. What helps is having good monitoring, a large open source community with strong first-party support, and a good development/testing framework so most breakages can be fixed the same day they happen.
For me right now, Airbyte is that tool I wish we had at my last startup.
We were pulling data from a lot of weird places (servers in the back of mom-and-pop vet clinics). This meant writing a lot of one-off scripts to populate our databases. We learned the hard way about scheduling, retries, resource monitoring, error reporting, etc.
Would've loved to have someone else take care of all that for us.
Anyway love letter over.
I'm currently wondering if I can use this to power some of my web scraping scripts....
Hoping that more users will bring more maturity and better solutions to those edge cases.
Open source makes the solution future-proof, in the sense that it can address your future long-tail or custom needs. A closed-source solution won't, which means you will have to build and maintain connectors in-house again.
As with anything, there are tradeoffs though: gaining the ability to have a connector in a non-Python language comes with the overhead of running (likely) Docker-in-Docker. Also, connectors not built on our SDK[0] miss out on some nice features like batch message[1] support (for bulk loading) and stream maps[2] for inline data transformations.
[0] https://sdk.meltano.com/ [1] https://sdk.meltano.com/en/latest/batch.html [2] https://sdk.meltano.com/en/latest/stream_maps.html