It also doesn't even categorize the products it competes with correctly[0].
Why not contribute some of your resources to one of the many active open source libraries already trying to solve some of these problems, and focus your engineering efforts on your core product?
[0] Fivetran is only listed under "Orchestrate" but actually competes directly with Alooma in Extract and Load. Also, there are DOZENS of companies in that space. https://gitlab.com/meltano/meltano/blob/master/README.md#dat...
I agree Fivetran also belongs in extract and load and updated it https://gitlab.com/meltano/meltano/commit/1df9813f5ab42c4479... Do you think it should be removed from Orchestrate? Any other suggestions for proprietary products in that category?
Consider how you trust using dbt more than rolling your own transformation tool. Why wouldn't this apply to the rest of your stack? The 10+ companies that offer data extraction and loading are likely a better choice. Again with Analytics - the dozens of companies that offer BI tools are probably going to be the better choice.
Maybe you can build all these tools better than the hundreds of companies with thousands of employees and millions of dollars. It just seems like the odds that you build the best of each are so low.
I would have been more impressed if your team had designed some API that other tools/platforms could plug in to coordinate a lot of the above jobs with your CI system. There is a SERIOUS need for that and I've had a lot of conversations with companies about what that would look like.
To answer your question, no, Fivetran does not currently belong in the orchestration area, IMO. I've heard they are soon to release some sort of orchestration tooling to compete with dbt, but it isn't the type of orchestration you get with Airflow.
I'm not 100% familiar with all the tools you are using, but stringing together random SaaS tools and having to survey a random assortment of open source tools in order to assemble a sensible platform makes way less sense.
At the very least, what we end up with is a group of folks working together in the open to surface some of the limitations and challenges and attempt to work out some of the alternative solutions to the problems that arise in this space.
So, I applaud your effort. Ignore the salesmen and the haters.
A lot of the solutions out there are fantastic but aren't up to the tasks we are looking for. Why shouldn't the whole life cycle be in one tool, be open source, and be version controllable? That's what we are looking for in a tool.
That's by no means a bad thing though. While yes, there are downsides to tightly coupled tools, there are also advantages. If GitLab is trying to do the same thing for data analytics that they've already done for source control, they may very well succeed.
If the CEO is following this, please improve basic user stories like:
* As a user, I want to easily know who has approved my merge request. Note the word "easily". The UI lists the people who did not approve next to the label "Approved" and the people who did approve next to the label "Approved by". That makes absolutely no sense
* As a user, I want to see all the merge requests that I need to review because I am listed as an approver (it boggles my mind that this doesn't exist)
* As a user, I want to be notified only by todos that have pending actions on them
* As a user, I want to disapprove a merge request
There are so many basic areas of the core product that are almost unusable. All of our engineers who regularly switch between GitHub and GitLab prefer the GitHub UI.
And while some integration is good... A lot of recent stuff is just "we try to grab the easy money"
Can you explain further what you mean by "pending actions on them"? We are working to simplify and streamline our notifications and todos in GitLab. In particular, the current thinking is that they are very similar. A "notification" is an email, and a "todo" is something that GitLab calls your attention to in the Web UI to take action on. So mechanically, they are very similar and we would like to harmonize them.
Our latest discussion is in https://gitlab.com/gitlab-org/gitlab-ce/issues/48787.
We've improved a number of confusing approval widget states in GitLab 11.2 (https://gitlab.com/gitlab-org/gitlab-ee/issues/5439), which will ship later this month, and the ability to filter merge requests by approver is in development by a community member (https://gitlab.com/gitlab-org/gitlab-ee/issues/1951).
This is just the beginning though – code reviews and approvals are at the heart of the daily workflows of writing software and we'll be continuing to make them even better. I'm particularly excited about more structured code reviews with batch comments in 11.3 (https://gitlab.com/gitlab-org/gitlab-ee/issues/1984), better navigation between files in merge request diffs with a file tree in 11.4 (https://gitlab.com/gitlab-org/gitlab-ce/issues/14249), and our first iteration of code owners (https://gitlab.com/gitlab-org/gitlab-ee/issues/5382) also in 11.4.
Thanks for the disapprove merge request idea. We're considering this idea in https://gitlab.com/gitlab-org/gitlab-ee/issues/761 where further feedback would be much appreciated, or on any other issue.
This doesn't mean anything; maybe the customers are simply tired of reporting issues. For example, last year we didn't do any updates for 6 months because we were afraid an update would break something, and we were too busy to be willing to spend the time reporting problems.
We also don't report issues that are already open on gitlab.com. Reporting an issue means your customer is willing to spend time reporting, following up on, and testing your bug; this is your job, not the customer's. At the moment we are only reporting issues that are either blocking our work or slowing down our development. The majority of issues we are facing are performance problems.
I just wrote a script to plot the number of issues on gitlab-ce over time, the percentage of open vs. closed issues, and how long they have been open. You are accumulating issues with the `backend`, `UX`, `technical debt`, `performance`, `CI/CD`, ... labels; a lot of them don't have a milestone and have been open for a long time.
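For anyone who wants to reproduce this, the aggregation is small. A minimal sketch (the field names `created_at` and `closed_at` match the shape GitLab's issues API returns; fetching, pagination, and plotting are left out):

```python
def open_counts_by_month(issues):
    """Count issues opened and closed per month.

    `issues` is a list of dicts with ISO-8601 date strings under
    'created_at' and 'closed_at' (None while still open), mirroring
    GitLab's GET /projects/:id/issues payload.
    """
    opened, closed = {}, {}
    for issue in issues:
        opened_month = issue["created_at"][:7]  # "YYYY-MM"
        opened[opened_month] = opened.get(opened_month, 0) + 1
        if issue.get("closed_at"):
            closed_month = issue["closed_at"][:7]
            closed[closed_month] = closed.get(closed_month, 0) + 1
    return opened, closed

def percent_still_open(issues):
    """Share of issues with no closed_at timestamp."""
    if not issues:
        return 0.0
    still_open = sum(1 for i in issues if not i.get("closed_at"))
    return 100.0 * still_open / len(issues)
```

Feeding the paginated API output through these and plotting the two dicts with matplotlib gives the trend lines described above.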
I am not sure how emailing you would help us, it's not like the problems are not reported or you don't already know about them. It just appears that the priority of GitLab, as a company, is not shipping a quality product anymore.
EDIT: I work in the aerospace industry, and one of the stages of our pipelines runs stress tests on our product. I would suggest running a stress test on an instance of GitLab; that would be an amazing place to start looking for performance problems.
Couldn't that just be because you have more silent customers now? Probably from the people who moved their projects over from GitHub.
GitLab is constantly growing, and Meltano is adding to GitLab's capabilities, not subtracting from them. We've hired 2 very awesome Python developers specifically for Meltano. They each have tons of experience in the ELT space.
All this to say that no one at GitLab has turned their eyes away from GitLab; it's the opposite. This business is here to help GitLab as our first customer. Rather than having GitLab struggle to get its data tools together and make business decisions based on that data, we've devoted a whole team to providing a solution while helping the community at the same time.
- Studying each source to figure out the right data model
- Chasing down a million weird corner cases
- Working around dumb bugs in the data sources
This is the kind of problem where paying for software really works better. When people build data pipelines in-house, they tend to hack at it until it works for their use case and then stop. When we build data pipelines, we map out every feature of the data source, implement the whole thing at once, and then put it through a beta period with multiple real users. This is easy to do when you have a tight-knit dev team; much harder for a group of part-time open-source contributors.

Personally, I work as a "lone wolf" (much to my own chagrin) because I'm in a small company that can't afford a huge team. Most of my (ETL) transforms are done in SQL, which happens to be pretty standardized, as opposed to many ETL products I've seen so far.
This solution is probably far from ready, but I find the approach quite interesting, because it looks like a code-based ETL that uses SQL for transforms (so I might be biased). Overall this might result in a more maintainable/versionable data pipeline model than GUI-first ETL tools, which usually generate spaghetti code. Because you are usually forced to regularly adapt data pipelines to unstable external inputs, being able to easily diff the ETL process would be a blessing.
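To make the "easily diff the ETL process" point concrete: if every transform lives as a plain `.sql` file in the repo, the runner can stay trivial and every pipeline change shows up as an ordinary diff in code review. A minimal sketch (not Meltano's actual runner, which delegates to dbt):

```python
import sqlite3
from pathlib import Path

def run_transforms(conn, transform_dir):
    """Apply each .sql transform in filename order (01_, 02_, ...).

    Because the transforms are plain files under version control,
    any change to the pipeline is reviewable as a normal diff.
    """
    for sql_file in sorted(Path(transform_dir).glob("*.sql")):
        conn.executescript(sql_file.read_text())
```

A change to a `02_clean.sql` file is then a one-line diff in a merge request, instead of an opaque edit buried inside a GUI tool's generated artifacts.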
One thing that gets me really excited about it is the way we want to build version control in from the start. To give you an example of where that's really powerful - we have a bunch of dashboards in Looker. Right now, figuring out what Looks/Dashboards rely on a given field is very challenging. If I change a column in my extraction, right now I can fairly easily propagate it to my final transformed table (thanks to dbt!) and even to the LookML. But knowing what in Looker is going to change / break if I change the LookML is way harder.
But if everything was defined in code from extraction, loading, transformation, modeling, _and_ visualization, that'd be really powerful from my perspective.
The Meltano team has several user personas they're focusing on. Data engineers are definitely one of them, but data analysts/BI users are as well, and we want the product to work well for the whole data team.
IMHO, if you want to make a dent in the space, figure out better debugging tools!
In particular: tools that explain how a certain (specific) value was calculated in the system, tools that let you bisect the source data in some way and focus on the records that are likely to have a problem, tools that help you figure out that a certain intermediate value in a calculation is an outlier, and tools that let you test certain assumptions about data over the whole pipeline.
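The last item, testing assumptions about data over the whole pipeline, doesn't need much machinery to prototype. A toy sketch (the check names and fields are made up):

```python
def check_assumptions(rows, checks):
    """Evaluate every named assumption against every row and report
    which rows violate which assumption, so you can bisect straight
    to the suspect source records."""
    failures = []
    for index, row in enumerate(rows):
        for name, predicate in checks.items():
            if not predicate(row):
                failures.append((index, name))
    return failures

# Example assumptions over a hypothetical orders feed
checks = {
    "amount_non_negative": lambda r: r["amount"] >= 0,
    "currency_known": lambda r: r["currency"] in {"USD", "EUR"},
}
```

Running such checks after every pipeline stage, not just at the end, is what makes it a debugging tool rather than a final gate: the first stage whose output fails tells you where the value went wrong.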
I'd love for a more robust way to test data pipelines and the data within them generally. I was at DataEngConf earlier this year and many people were talking about this problem exactly. One way we're trying to address it a bit is by using the Review Apps feature on Merge Requests within GitLab. Right now, when you open an MR on our repo it will create a clone of the data warehouse that's completely isolated from production. This, obviously, can't scale once the DW is beyond a certain size, but I think there are ways to keep this sort of practice going.
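As an illustration of the per-MR clone idea (with SQLite's online backup API standing in for whatever the real warehouse offers, e.g. snapshot or zero-copy clone features):

```python
import sqlite3

def clone_warehouse(source, clone_path):
    """Copy the production warehouse into an isolated database that a
    merge request's review app can query and mutate freely; throwing
    the clone away after the MR merges costs production nothing."""
    clone = sqlite3.connect(clone_path)
    source.backup(clone)  # sqlite's online backup API
    return clone
```

As noted above, a full copy stops scaling past a certain warehouse size; schema-level cloning or copy-on-write snapshots are the usual next step.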
The idea is to give users a set of default extractors (which are the ones we use internally, so they are battle-tested), along with loaders, transformers, etc., with documentation on how to build their own. For our MVP, and possibly into the future, it will work similarly to WordPress plugins: there is an extractor directory where you place your extractor, written following our protocol, and the UI will recognize it and give you choices of extractors to run; same for loaders, and so on.
We do not want to be chasing down every last corner case for extractors (except for our own), because that's just not a good long-term solution; it needs constant maintenance (as we've seen already). With user contributions, I believe it can work.
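A WordPress-style extractor directory can be prototyped with nothing but stdlib module discovery; the `extract()` protocol below is hypothetical, purely to illustrate how a UI could enumerate the choices:

```python
import importlib
import pkgutil

def discover_extractors(package):
    """Scan an extractor directory (a Python package) and collect
    every module that follows the protocol, i.e. exposes a callable
    named `extract`. The returned mapping is what a UI would render
    as the list of available extractors."""
    extractors = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        if callable(getattr(module, "extract", None)):
            extractors[info.name] = module.extract
    return extractors
```

Dropping a new `salesforce.py` with an `extract()` function into the directory would then make it show up automatically, and loaders/transformers could be discovered the same way.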
Once you take VC funding, you gotta go where the money is. Everyone wants/expects "fast, stable, like Github" for free unless you have special needs. So, you do analytics on what people are doing with your free site, you offer enterprisey features, you get into the "platform" business etc.
I think GitLab distracts itself, spreads itself thin, and isn't great at partnering; its ambition to do it all knows no bounds, which is both commendable and a smh moment. It's not likely sustainable or scalable. They're definitely trying to "go big or go home" as a company, which is not how most people originally felt about GitLab (a fast, stable OSS alternative to GitHub).
At the same time, I can't blame them. I think it comes down to: Don't hate the player, hate the game.
We have hired 3 times as many people in our security team for GitLab.com (not our product team for security) as are working on Meltano.
We have hired 3 times as many people in our SRE teams as are working on Meltano.
And we still have a lot of vacancies for both https://about.gitlab.com/jobs/
BTW We don't call it a family https://about.gitlab.com/handbook/leadership/#management-tea...
Thanks for the link - we'll definitely keep an eye on it.
I was very glad to see this is Python! Python has some of the best data tools out there, and a mature ecosystem for solving all the engineering problems that go along with a great data stack.
I fully expect we'll have a use case for the "cool" machine learning stuff, but there's a lot of groundwork to cover with the basics first. Meltano is focusing on those basics for right now.
I think this market is not being served properly; most of these tools still seem to require the heavy lifting to be done by the ML practitioner.
I suppose I would even be okay with a service that just saves all my graphs from tensorboard for later reviewing.
Extraction/Loading: Dell Boomi, SAP, SAS, Pentaho, Domo, Oracle, IBM, Microsoft, Informatica, Talend, JitterBit, SnapLogic, MuleSoft, SyncSort, Information Builders, Actian, Attunity, Datameer, Alteryx, Striim, Treasure Data, Cask, StreamSets, Snowplow, DataTorrent, Astronomer, Panoply, Apache NiFi, Stitch Data, FlyData, Bedrock Data, Alooma, ETLeap, Fivetran, Xplenty, MethodMill, Celigo, TerraSky, DBSync, Youredi, Scribe, Civis Analytics, DataScience, Dataloader.io, Datorama, Astera
Analyze: MicroStrategy, GoodData, Sisense, Looker, Power BI, Wagon, Birst, Tableau, Qlik, Domo, Hue, Mode, Chartio, Periscope, Pentaho
The amount of hype and BS in the Notebook space would require me to spend some time combing through that again.
1. Do we have enough money / budget for a tool like this?
2. Can we derive enough insights from this product fast enough to make a good ROI?
3. Does this tool use a proprietary language that no one wants to learn, or can I code in a language that is relevant?
4. In all honesty, can I get insights faster in a spreadsheet than in these tools?
5. What is the learning curve?
6. Can I answer the business question that was originally asked?
Open to more discussion around the topic, as it is a lot harder to answer than a few philosophical questions, but it certainly resonates with many data & analytics professionals. A nice goal would be to have a project where you can stand up a business, turn your data pipelines on, ingest the data, and view the insights needed to make a business decision, all within a short timeframe of when the business goes live.