It's true that monorepos without proper discipline can tend toward coupling, and we should keep that in mind when discussing mono vs. poly.
I don't know how you maintain that arm's length separation if you don't have compilation units in your language of choice, and that may contribute to some of the muddiness in this kind of discussion. "It depends."
- [x] Namespaces and the like without much security benefit
- [x] Giant Java dependency
- [x] Strange syntax and glyphs

I feel like if you are working completely in the open-source world, contributing one open-source project to a larger array of available projects, then the decision to use a polyrepo makes a lot of sense. You can publish libraries to a package registry like npm or PyPI, or you can use Git references as with Go's package manager.
But what I experienced with polyrepos outside this world is that we ended up with a weird DAG of repos. It was always unclear whether a piece of code duplicated between projects should be moved into one dependency or another, or get its own repo. Transitive dependencies were no fun at all; with git submodules you might end up with two copies of the same dependency. You might have to make a sequence of commits to different repos, remembering to update cross-repo references as you went, and if you got stuck somewhere you had to unwind it all. It felt like a regression, like going from CVS back to RCS.
Again, in the open-source world you might have some of this taken care of by using a package manager like Yarn. But if your transitive dependencies aren't suitable for being published that way, it can be tough. Monorepo + Bazel right now is a bit rough around the edges but overall it's reduced the amount of engineering time spent on build systems.
On the other hand, it's not like Bazel can't handle polyrepos. In fact, they work quite nicely, and Bazel can automatically do partial checkouts of sets of related polyrepos, if that's your thing.
As for VCS scalability problems, I expect that Git is really just the popular VCS du jour, and some white knight will show up any day now with a good story for large, centralized repos with a VFS layer. In the meantime, any company large enough to experience VCS performance problems but not large enough to have its own VCS team (as Google and Facebook do) will suffer, or possibly pay for Perforce.
If your project is mostly something like C++ (which has support built-in to Bazel) then the WORKSPACE rules will be much more manageable and partial checkouts become a lot easier.
I'd be more interested to read about a project or company that failed due to making one choice or the other. And then by switching things to the other way, things were fixed.
Otherwise, as someone who has worked with both, I imagine there are a host of other decisions that will be far more determinative of your success.
Let's not get too wrapped up in what color to paint the shed.
Please don't do this.
Of course the Monorepo is not free of downsides, those mentioned in the original article are real, although a bit exaggerated in my opinion. VCS operations can be slow and scaling a VCS system is challenging, but possible. And the risk of high coupling and a tangled architecture is also very real if you don't use a dependency management system like Bazel/Buck/Pants.
But in my opinion the downsides of the polyrepo are much worse and much, much harder to fix. The main problem is that you need a parallel versioning scheme like SemVer on top of your VCS. SemVer is fine for open-source projects, but for a dynamic organization it is a nightmare, because it is a manual process prone to failure. SemVer dependency hell is really hard to deal with and creates a lot of technical debt.
Additionally, once you go polyrepo you lose true CI/CD. Yes, you still have CI/CD pipelines, but they apply only to a fraction of the code. Once you get used to running `bazel test`, knowing you will run every single test of any piece of code that could depend on what you just changed, you never want to go back. Yes, you could have true CI/CD with polyrepos, but it requires a lot of work and a lot of tooling that does not exist in the wild. It is cheaper to invest in scaling your VCS than in building that multi-repo tooling.
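For readers unfamiliar with how `bazel test` knows what "could depend on" a change: every dependency edge is declared in BUILD files, so the reverse-dependency closure is queryable. A sketch with made-up package and target names:

```starlark
# base/BUILD (hypothetical package)
cc_library(
    name = "strings",
    srcs = ["strings.cc"],
    hdrs = ["strings.h"],
)

# payments/BUILD (hypothetical package)
cc_test(
    name = "payments_test",
    srcs = ["payments_test.cc"],
    deps = ["//base:strings"],  # the edge Bazel follows when selecting tests
)
```

With edges like these, `bazel test //...` covers everything, and a query such as `bazel query "rdeps(//..., //base:strings)"` lists exactly the targets a change to `//base:strings` can affect.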
If we had the tooling to do multirepo atomic commits and reviews then maybe we would have stuck with polyrepos, but it doesn't really exist out in the wild, so monorepo it was.
Maybe you can clear my confusion. If Module B is dependent on Module A, then every version of B should refer to a specific version of A, correct? What is there to break? Development can continue on A without interfering with B, and then you can uptick B once it points to a later A.
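The model described above, sketched as an exact (unranged) pin in a hypothetical `package.json` for module B, looks like:

```json
{
  "name": "module-b",
  "version": "2.3.0",
  "dependencies": {
    "module-a": "1.4.2"
  }
}
```

B keeps building against module-a 1.4.2 no matter what lands on A's main branch; bumping the pin is an explicit commit in B. The pain the thread describes starts when many modules pin different versions of A and a fix must propagate through all of them.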
I'm not sure what this has to do with the mono/poly discussion.
To avoid that, you do 10 migration commits so everyone is on the latest version. If you're going to do that as standard operating procedure anyway, you might as well make it far easier and have a monorepo.
Adding additional PRs across different repos is functionally no different from one PR with scattered changes in a monorepo, except that separating the PRs makes each isolated set of changes more atomic and focused. That has led to fewer bugs and better-quality code review. And the biggest win: each repo is free to use whatever CI and deployment tooling it needs, with no constraints from whatever CI or deployment tool another chunk of code in some other repo uses.
The last point is not trivial. Lots of people glibly assume you can create monorepo setups where arbitrary new projects inside the monorepo are free to use whatever resource provisioning strategy or language or tooling they want, but in reality this is not true: there is implicit pressure to rely on the existing tooling (even if it's not right for the job), and monorepos beget monopolicies, where experimentation that violates some monorepo decision can be wholly prevented by political blockers in the name of the monorepo.
One example that has frustrated me personally is when working on machine learning projects that require complex runtime environments with custom compiled dependencies, GPU settings, etc.
The clear choice for us was to use Docker containers to deliver the built artifacts to the necessary runtime machines, but the whole project was killed when someone from our central IT monorepo tooling team said no. His reasoning was that all the existing model training jobs in our monorepo worked as luigi tasks executed in hadoop.
We tried explaining that our model training was not amenable to a map reduce style calculation, and our plan was for a luigi task to invoke the entrypoint command of the container to initiate a single, non-distributed training process (I have specific expertise in this type of model training, so I know from experience this is an effective solution and that map reduce would not be appropriate).
But it didn’t matter. The monorepo was set up to assume model training compute jobs had to work one way and only one way, and so it set us back months from training a simple model directly relevant to urgent customer product requests.
Had we been able to set this up as a separate repo where there were no global rules over how all compute jobs must be organized, and used our own choice of deployment (containers) with no concern over whatever other projects were using / doing, we could have solved it in a matter of a few days.
In my experience, this type of policy blocker is uniquely common to monorepos, and easily avoided in polyrepo situations. It’s just a whole class of problem that rarely applies in a polyrepo setting, but almost always causes huge issues with monorepo policies and fixed tooling choices that end up being a poor fit for necessary experiments or innovative projects that happen later.
Hear, hear. Let teams choose the processes and tools that work best for them. In previous release engineering positions, I resisted the many attempts to introduce a single standard workflow for all projects. The support burden of letting a thousand flowers bloom was not great, but the benefit was that devs understood their project and were empowered to make changes when business requirements changed faster than standardized tooling could.
We had a few contracts for standard behaviours, but they were low-overhead: must respond to 'make/make test', have a /status endpoint that 500'd when it was unhealthy, register a port in the service conf repo, etc.
It makes it less atomic if you need simultaneous changes in multiple repositories.
> Had we been able to set this up as a separate repo where there were no global rules over how all compute jobs must be organized, and used our own choice of deployment (containers) with no concern over whatever other projects were using / doing, we could have solved it in a matter of a few days.
I think this was an organisational problem, but I accept the argument that monorepos will provide a seed around which such pathologies can crystallise. But I don't believe it's the only such seed and I don't think it's an inevitable outcome from monorepos.
Unless you mean your presubmit test would push to production machines, in which case that's bad and shouldn't be allowed, but again that has nothing to do with a monorepo.
A company could just as easily have draconian policies about testing and deployment with multiple repos. Maybe you could break the rules (hell, you could have broken the rules in monorepo land too), but again, that's a rules issue, not a repository issue.
Both monorepos and polyrepos have advantages and disadvantages. Many factors — scale, overall team quality and experience, level of integration between projects are a few that come to mind — will affect how much those advantages and disadvantages matter to any given company at any given point in time. The right choice for you isn't necessarily the right choice for me.
Much more important than which approach you choose is understanding, and accepting, the consequences of your choice. You'll want to extract value out of the advantages, you'll need to mitigate the disadvantages. You won't be able to adopt tools and processes meant for the other approach without some degree of friction.
All this forcing people to do things the Right Way (my way) is surely part of the pushback against monorepos.
But set that aside for the moment. Let's suppose defaults should force people to do things the Right Way, and that we also know what the Right Way is.
Instead of letting anyone sloppily depend on any code checked into the monorepo, shouldn't we force people to think long and hard about contracts between components -- the default concern in a polyrepo architecture? When and how to make contracts, when and how to break contracts? Isn't this how Amazon moved past their monorepo woes, adopted SOA, built AWS, and became one of the largest companies on earth? Heck, isn't this how the Internet itself was built?
It's not that there's a single right way to do it. There isn't one, and anyone who tells you there is either has something to sell you or is too inexperienced to have seen enough of the problem domain.
What is for certain: teams need to have tooling that causes the conversations and behavior that lead to the outcomes we want. As systems and teams scale large enough, this tooling becomes essential - without it, teams go their own way, and in so doing, may or may not create the culture needed for the outcomes you want.
I have never once in my career, so far, had to tell a team to communicate less. When we're talking about engineering organizations that are large enough to diverge, you must solve these problems somehow, and it needs to be systemic and intentional.
Your post puts a lot of the onus on A for breaking B, C, and D, but I think equal care and consideration needs to come from the other side of the contract. E.g., what are you depending on? Is it a dependency you want to take on, or are you and the shared code likely to diverge in life? These are top-of-mind decisions in a polyrepo architecture, but from my experience they're often not even considered in a monorepo. Anything checked in is fair game for reuse. This is why I suspect you may be "forcing" the wrong thing.
For reference I've worked in companies large and small, both monorepo and polyrepo. When I worked on Windows back in the 00's the monorepo tooling (SourceDepot) was quite amazing for the time, but the costs of that sort of coordination were also painfully apparent to everyone.
The place I currently work has a monorepo for desktop software and polyrepos for everything else. It isn't a straight-up A/B experiment, but anecdotally the pain is higher and shipping velocity lower in the monorepo half of the world. Most of the monorepo pain is related to CI or other costs of global coordination, the kind of things Matt touches on midway (albeit probably too subtly). I'd be interested to see your counterarguments to those points as well. Do you need fancy dependency management tooling to make your global CI builds fast and reproducible? Matt argues those end up being equivalent to the kind of dependency tooling that's intrinsic to polyrepo architectures anyway.
Fighting back against monorepo design is dangerous - embrace experimentation.
What's dangerous about it? Monorepos have a lot of benefits and should absolutely be considered, maybe even by most. But right now the community almost pushes them as the "only true way, with all benefits and no drawbacks," and that's absolutely not true. To the point that the knowledge of why and how to run polyrepos is already starting to get lost.
That's dangerous.
What do you even mean by "dangerous"? To a business? To your health?
What is the deal with people trying to make these sorts of global assertions in a vacuum about what's "good" and "bad"? This doesn't make any engineering sense in any way to me. You have a problem and you figure out the best way for your business to solve that problem given some bounded resources. Nothing in the basic problem solving process (scientific method?) necessitates all the arbitrary "should" axioms. Why don't people just analyze their specific situation and figure out a solution?
It's like people arguing vehemently about the optimal design that every company "should" be using for all windshields for all personal vehicles on the road, without even remotely discussing various vehicle body shapes and sizes.
Have you (or anyone reading this thread) encountered similar issues? How do you solve them in a monorepo?
My feelings here are apart from your tool of choice (Pypi) so read them with that in mind.
Why are you dependent on third-party code that isn't in your repo? I am a huge advocate of the monorepo and of vendoring. Depending on your tooling of choice and your workflow, checks for updates on this third-party code should be frequent (security) and done by someone qualified (not a job for the "new guy").
The question is where should this start and end? The answer (for me) is everything, and I have elected to use less (and reduce complexity) to avoid bloat. Really, though, this is an artifact of my use of Git: https://unix.stackexchange.com/questions/233327/is-it-possib...
Git has had a sparse-checkout feature for a long time, but it only affected the checkout itself; all the blobs would still be synced.
Now Git is gaining good monorepo capabilities with the partial clone feature [1]. The idea is that you clone only the parts of a repository that are interesting to you. This has been brewing for a while, but I'm not sure how ready it is. There doesn't seem to be user-level documentation for it yet, to my knowledge, so I am linking to the technical docs.
[1]: https://github.com/git/git/blob/master/Documentation/technic...
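As a concrete sketch of the sparse-checkout side (git >= 2.25 assumed; directory names here are hypothetical stand-ins for projects inside a big monorepo):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in "monorepo" with two top-level projects.
git init -q origin-repo
cd origin-repo
mkdir -p services/payments services/search
echo pay > services/payments/main.txt
echo find > services/search/main.txt
git add .
git -c user.email=a@b -c user.name=t commit -qm init
cd ..

# Clone it, then narrow the working tree to a single project.
git clone -q origin-repo work
cd work
git sparse-checkout init --cone
git sparse-checkout set services/payments

ls services    # only 'payments' is materialized now
```

Combined with `git clone --filter=blob:none` against a server that supports partial clone, this keeps both the working tree and the fetched blobs limited to the slice you care about.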
You can certainly achieve this with Perforce, SVN, HG, any repo system there too.
Linux: FUSE + ?
Windows: Dokan? CBFS? Or the newfangled https://docs.microsoft.com/en-us/windows/desktop/projfs/proj... which VFSForGit uses
Let me give a concrete example. The Android Open Source Project (AOSP), which builds the system for Android devices, has code close to tens of GB in size (let alone all the history!). It is already a massive monorepo in itself. And typically you would have many of them, from different OEM/SoC vendors and different major releases. In such a scenario, it would turn into 'a monorepo of monorepos,' which is quite unpleasant to imagine.
With 100 engineers a monorepo might seem a good idea. With 500 it becomes nearly impossible to do anything involving a build. Some isolation is needed.
Also, from my experience, many engineers just don't give a shit about architecture. They create an entangled mess that kind of works for the customer, and go home. Without some enforced isolation it is impossible to maintain.
That being said I am more inclined to polyrepos.
Today, not quite. I work for a multi billion dollar tech company and we have several thousand repos (and it's awesome)
Both FB and Google have more than 500 devs and are using a monorepo.
This would help people working on smaller apps, since they don't need to look at other apps unless they're working on shared library code.
Of course, once you are working on library code, you have to build and test all the apps that use it. But even at Google, the people working on the lowest levels of the system can't use the standard tools anyway.
I don't see why you'd need semver. The apps could sync to a particular commit in the library repo.
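That "sync to a particular commit" model can be sketched with a plain git submodule; everything below runs against local stand-in repos (the `protocol.file.allow` override is needed on newer git to clone submodules from local paths):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
G="git -c user.email=a@b -c user.name=t -c protocol.file.allow=always"

# A library repo with two commits.
$G init -q lib
(cd lib && echo v1 > lib.txt && $G add . && $G commit -qm one)
pin=$(git -C lib rev-parse HEAD)                 # the commit the app pins
(cd lib && echo v2 > lib.txt && $G commit -qam two)   # library moves on

# The app vendors the library at that exact commit.
$G init -q app
cd app
$G submodule add -q "$tmp/lib" vendor/lib
git -C vendor/lib checkout -q "$pin"             # pin to the older commit
$G add vendor/lib
$G commit -qm "pin lib at known-good commit"

cat vendor/lib/lib.txt    # still v1: later library work doesn't affect the app
```

No semver needed: the app records a commit hash, and "upticking" is an explicit commit that moves the pin forward.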
More to the point, as the author of TFA allows, once a system reaches a certain size, nobody can understand it all. At some point you have to engage division of labor /specialization, and once you do that, it doesn't make sense to have just anybody randomly making changes in parts of the code-base they don't normally work in.
I'd rather see a poly-repo approach, with a designated owner for discrete modules, but where anybody can clone any repo, make a proposed fix, and submit a PR. Basically "internal open source" or "inner source"[1].
In my experience, this is about as close as you can get to a "best of both worlds" situation. But, as the author of TFA also says, you absolutely can make either approach work.
It was a gigantic pain trying to find owners for half-dead repos for services still running and in use, where the original authors had left years ago & from teams 4 or 5 restructures ago. The one thing I learned was: never make a user the owner of a repo (unless it is in their personal space), always find a team to accept responsibility for it.
This is how it works at my company. The issue we run into is that PRs coming from non-core maintainers tend to either get over-scrutinized (e.g. "this diff may work for you but it's not generic enough for X/Y/Z") or flat out ignored at the code review stage and sometimes don't land in a timely enough manner.
Another challenge with this approach is when you have deeply nested dependencies and need to "propagate" an upgrade in some deep dep up the tree. In the JS/Node world, fixing an issue usually means hacking on transpiled files in a project's node_modules folder to figure out what change needs to be made, then mirroring that change into the actual repo and tweaking things until type checking/linting/CI pass. Not really conducive to collaboration.
One other problem is that security/bug-fix rollouts are a bit more challenging. We had a case a while back where a crash-inducing bug was fixed and published, but people still experienced crashes because they hadn't upgraded the one out of dozens of packages required by their projects.
Here's my rule: You break it, you fix it.
> I'd rather see a poly-repo approach, with a designated owner for discrete modules, but where anybody can clone any repo, make a proposed fix, and submit a PR.
I'd rather see pairing, extensive tests and fast CI. I see PRs as a necessary evil, rather than a good thing in themselves. If I make a change that breaks other teams, I should fix it. If I can make a change to fix code anywhere in the codebase, I should write the test, write the fix and submit it.
Small, frequent commits with extensive testing creates a virtuous cycle. You pull frequently because there are small commits. You are less likely to get out of sync because of frequent pulls. You make small commits frequently because you want to avoid conflicts. Everyone moves a lot faster. I have had this exact experience and it is frankly glorious.
I’ve seen this invoked so many times to shirk responsibility though. Someone piles up all kinds of crap in a tight little closet, complete with a bowling ball on top, and the next unsuspecting dev who comes by and opens it gets an avalanche of crap falling on them while the original author can be heard somewhere in the background saying “it’s not my problem.”
This winds up leading to more crap-stacking just to get the work done ASAP and you wind up with a mountain of tech debt.
I like the zero flaw principle where new feature work stops until all currently known flaws are fixed. Then everyone is forced to pitch in and responsibility is shared whether you want it or not.
Unfortunately, some of the most popular CI/CD services out there (Travis, Circle, etc.) don't even support cross-repo pipelines, much less monorepo builds.
Those both look way more in the weeds than what I would have imagined. I guess for Bazel at least it makes sense, given Google's scale, how fine-grained they get with caching and incremental builds.
For my needs, a simple tool that discovers "workspaces" and constructs a build graph based on what's changed, while handing off the actual building to some entry point in each workspace, would be good enough. We have a weird collection of Gradle projects, node projects, test suites, docs, etc., each with its own build process already in place.
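The core of such a tool can be sketched with git alone; the one-top-level-directory-per-workspace layout here is an assumption for illustration:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
G="git -c user.email=a@b -c user.name=t"
$G init -q .
mkdir -p api/src web/src docs
echo a > api/src/Main.java
echo w > web/src/app.ts
echo d > docs/index.md
$G add . && $G commit -qm base

echo a2 > api/src/Main.java       # touch only the 'api' workspace
$G commit -qam "api change"

# Workspaces affected by the last commit = unique top-level dirs touched.
changed=$(git diff --name-only HEAD~1 HEAD | cut -d/ -f1 | sort -u)
echo "$changed"    # prints: api
```

Each entry in `$changed` would then be handed off to that workspace's own build entry point (Gradle, npm, whatever is already in place).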
Some things are also on a "critical" path while others can run async given the context (branch, tag, etc.).
I'm rambling though.
I find it amusing how plenty of comments, both here and in the other discussion, say "We had a mono/polyrepo and things improved tremendously when we migrated to a poly/monorepo." The issue might be one of growth and complacency: a drastic change like that forces the team to face the technical debt that was being ignored and do a better implementation using what was learned from past mistakes.
Perhaps the fact that since their level was now higher, they wouldn't have to deal with the nitty gritty details and pain of working with a monorepo as a developer?
E.g. I wasn't for it when I was a dev, but now that I can just impose it on others, I love it. Same with how various 'development process' rituals are adopted...
How does the library team know which consumers a commit may break? What tools are recommended?
With a monorepo, the basic effort you have to put in to start scaling is quite high. To properly do a local build, you need Bazel or something like it. But Bazel doesn't stop at just building; it manages dependencies all the way down to individual libraries. Say you're using certain Maven plugins for code coverage, shading, etc. Would Bazel have all the build plugins your project needs? Most likely not. You have to port a bunch of plugins from Maven to Bazel, and so on. Guess how many IDEs support Bazel? Not a lot.
Then you need to run a different kind of build farm. When you check in to a monorepo, you need to split and distribute one single build. Compared to a polyrepo, where one build == one job, a monorepo is one build == a distributed pool of jobs, which again needs very deep integration with the build tool (Bazel again) to fan out and fan in across multiple machines, aggregate artifacts, and so on.
Then the deployment. Same again. There is no "just works" hosted CI or hosted git or anything for monorepos. People still dabble with concourse or so on.
And guess what, for a component in its own repo, you don't need to do anything. Existing industry and OSS tooling is built from ground up for that. Just go and use them.
To provide a developer a basic experience for working on, building, and deploying a single component, the upfront investment a monorepo demands is very high. Most companies cannot spend time on that, because scale means different things to different companies. There is a vast gap between the ops/dev tooling available for independent hosted components and monorepo tools; just search for "monorepo tools" or DAG and see how many you can come up with.

So what really happens with a monorepo is that most companies go with multi-module Maven and Jenkins multi-jobs. The results are easy to predict. I'm not saying Maven/Jenkins are bad, but they are _not_ sophisticated, and they are nowhere close to what Twitter/Facebook/Google or any modern company uses to deal with a monorepo (for good reason). They are just not good at DAGs. If you're relying on Maven+Jenkins as your monorepo solution, all I can say is "good luck".
Instead, if you start by putting one component in one repo, you keep scaling for _much longer_ before you hit a barrier.
In principle, monorepos are better. In practice, they don't have the basic "table stakes" tooling that you need to get going. Maybe monorepo devops tooling is a next developer productivity startup space. But until then, it's not mainstream for very good reasons.
How do the "global build tools" play with language specific build tools?
My primary stack is Rust and Scala. Both have excellent build capabilities in their native tools. How well do pants/bazel integrate with them? I wouldn't want to rewrite complex builds nor would I expect these tools to have 100% functionality of native ones.
I know the Scala rules are used in production by multiple companies. Rust support is improving quickly, but it's not perfect. See the dedicated GitHub repositories for more information.
(I work on Bazel)
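For anyone curious what the Scala integration looks like in practice, a minimal target under rules_scala is roughly this (load path per the rules_scala repository; target names are made up):

```starlark
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library")

scala_library(
    name = "core",
    srcs = glob(["src/main/scala/**/*.scala"]),
    deps = ["//libs/util"],  # hypothetical in-repo dependency
)
```

Complex sbt builds don't translate one-to-one; check the rules_scala repository for what is and isn't supported before committing to a migration.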
I'd say that open-source best practices for shared libraries are appropriate if you're making an open-source shared library. However, these practices are inappropriate for internal libraries, proprietary libraries, and other use cases. In my experience, it's also far from "problem solved". You can point your finger at semantic versioning but in the meantime we go through hell and back with package managers trying to manage transitive library dependencies and it SUCKS. Why, for example, do you think people are fed up with NPM and created Yarn? Or why people constantly complain about Pip / Pipenv and the like? Why was the module system in Go 1.11 such a big deal? The answer is that it's hard to follow best practices for shared libraries, and even when you do follow best practices, you end up with mistakes or problems. These take engineering effort to solve. One of the solutions available is to use a monorepo, which doesn't magically solve all of your problems, it just solves certain problems while creating new problems. You have to weigh the pros and cons of the approaches.
In my experience, the many problems with polyrepos are mostly replaced with the relatively minor problems of VCS scalability and a poor branching story (mostly for long-running branches).
Why do you say so?
I agree shared-library style makes more sense in most cases, though. The main problem with it is forcing everyone onto the latest library versions, but that isn't insurmountable by any means.
Personally, I think there's a place for monorepos and a place for smaller independent repos. If a project is independent and decoupled from the rest of the tightly coupled code base (for instance, things that get open-sourced), it makes no sense to shove it into a huge monorepo.