We worked on a small project where we put together statistical measures for codebases. It was a lot of fun, even if the infrastructure was out of my wheelhouse at the time.
Folks that can manage billion-line codebases are on a whole different level I think. I wonder sometimes how many folks like that there are.
EDIT: Looks like he left for a bit and is now back. Good on him!
That said, there are some open source pieces to help. Facebook open sourced their Mercurial work, so you can get version control at scale (and before that you just used Perforce). Google open sourced Bazel, and also some parts of the infrastructure behind code search, though not enough for it to really work properly. And of course at a lower level there's a plethora of reasonable DB offerings, etc.
It would still require a lot of glue though.
It really is not a small difference.
Kidding aside, my point is Google recognizes obvious boundaries between e.g. their web stuff and android, and organizes their code accordingly.
I feel like these debates are often fueled by false arguments. Either way you go, you're going to want to build support tools and processes to tailor your VCS to your local needs.
First, VCS ACLs will massively reduce the benefits you're supposed to get from a monorepo. How will you do global refactors in that kind of situation? How does a maintainer of a library figure out how the clients are actually using it? (The clients must have visibility into the library, but the opposite is unlikely to be true.)
Second, let's say that I maintain a library with a supported public interface that's implemented in terms of an internal interface that nobody's supposed to use. How will VCS ACLs allow me to hide the implementation but not the interface? When clients kick off a build, the compiler needs to be able to read the implementation parts to actually build the library. The alternative is that clients can read the headers but then link against a pre-built binary blob. At that point you don't have a monorepo, you've got multirepos stored in a monorepo.
The actual solution is build system ACLs. Not ACLs for people, but ACLs for projects. Anyone can read the code, but you can say "only source files in directory X can include this header" or "only build files in directory Y can link against this object file".
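As a rough sketch of what a project-level ACL looks like in practice, here's a hypothetical Bazel BUILD file (the package paths and target names are made up; layering behavior depends on your toolchain configuration):

```starlark
# Internal implementation: only the hypothetical //projects/x tree
# may depend on this target. This is an ACL on projects, not people --
# anyone can still read the source.
cc_library(
    name = "impl",
    srcs = ["impl.cc"],
    hdrs = ["impl.h"],
    visibility = ["//projects/x:__subpackages__"],
)

# Public interface: anyone may depend on (and link against) this.
cc_library(
    name = "api",
    hdrs = ["api.h"],
    deps = [":impl"],
    visibility = ["//visibility:public"],
)
```

Attempting to add `deps = ["//lib:impl"]` from an unlisted package fails at analysis time, before anything compiles.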
> How will VCS ACLs allow me to hide the implementation but not the interface?
If you don't give people access to the code, they can't build it. So what? Publish pre-built binaries from your CI system back to source control.
> At that point you don't have a monorepo, you've got multirepos stored in a monorepo.
I think it's a spectrum. It would be stupid to dogmatically stick to either extreme. You modify things in a pragmatic fashion to solve the problems you're facing. In my experience, starting with a monorepo and making exceptions as needed has worked better than the alternative.
Your post sounds similar to a lot of the multi/mono repo discussions. You've focused on one problem and one way to solve that problem without considering that there are many ways to work around it. Neither approach is going to be pain-free and both require tooling for special scenarios.
I agree
> The actual solution is build system ACLs.
Or, maybe, better languages enforcing better design. In most cases artifacts and libraries are not related to the domain; engineers create them just to establish artificial boundaries between code components, isolate unrelated things, enforce encapsulation, and avoid accidental mixing of metalanguages.
It would be a lot better to have a smart compiler for this.
A tool which can prevent us from mixing different abstraction layers, creating unnecessary horizontal links between our components, etc.
I have a couple of ideas about what such a thing might look like.
It's a blog post, and the author didn't try to build a total and exhaustive formal system. These shortcomings aren't absolute truths, but in practice they do hold.
I've seen this multiple times: a small project evolves over years into a monster. Engineers add new components and reuse any other components they may need, creating horizontal links. At some point they feel like they've lost their productivity, and they blame the monorepo because it's easy to create horizontal links in a typical monorepo. So they try to build a multirepo flow, and they spend a lot of effort, time, and money trying to make it work. At some point they feel that their productivity is even worse than it was before, because now they need to orchestrate things, so they merge everything back.
Same applies not only to VCS flows, but to system design as well.
When we discuss monolith/microservices controversy all the monorepo/multirepo arguments may be isomorphically translated to that domain. What is better, monolithic app or a bunch of microservices? A role-based app of course: https://github.com/7mind/slides/blob/master/02-roles/target/...
Monorepo advocates are typically advocating for microservices, but within a single code base.
The way you provide access control is through code review and build system visibility.
In order to modify another group's code you require their approval on the review for that section of the code base. (Using mechanisms like github/gitlab owners files or rules within upsource.)
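The owners-file mechanism can be sketched as a hypothetical GitHub CODEOWNERS file at the repo root (the paths and team names here are made up):

```
# Changes under a team's directory require that team's review
# before merge -- access control via code review, not VCS ACLs.
/billing/   @acme/billing-team
/infra/     @acme/platform-team
```

With branch protection enabled, a PR touching `/billing/` can't merge until someone from `@acme/billing-team` approves it.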
This still means that if one group needs to make extensive changes to another group's code, the path of least resistance may be to fork it into your own group's section of the repo.
Build tools provide another point of control. If you're using a tool like bazel, the way you link to a component in another portion of the repo is through target names. The only targets your code will have access to are those that the owners have declared as being available for external builds.
Sure you could just use a manyrepo style of dependency tracking in a monorepo but I think that's not exactly what the author is exploring.
From what I read, that is a correct assessment. What the OP is proposing is something of a strawman. No advocate of monorepos I've ever met believes that a monorepo should imply a monolith.
Generally they're advocating monorepos in order to develop microservices faster, and with less effort. Using a monorepo and the associated tooling side steps the pain that comes from complicated CI, the difficulty of sharing code, the difficulties of non-atomic cross-repo reviews, and the difficulties of making multi-app refactorings.
>Surely source control is for source code?
This is just pedantry. Checking in binaries is a pragmatic solution that solves a lot of problems.
Git's design can limit its usefulness in this respect - though perhaps you could solve this to some extent with git LFS? - but not all version control systems have this problem.
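For the git LFS route, the setup is a few commands; a sketch (assumes git-lfs is installed, and the file patterns are just examples):

```shell
# Store large binaries as LFS pointers so clones stay small.
git lfs install
git lfs track "*.so" "*.jar"
git add .gitattributes
git commit -m "Track prebuilt binaries via LFS"
```

After that, matching files are stored as small pointer files in the repo, with the actual blobs fetched from the LFS server on demand.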
Even though it's one project.
Even though they refuse to allow a release of a single component - it must all be released together without forwards/backwards compatibility.
I think most of the time, the mono/multi debate is spoiled by people who feel they can have their cake and eat it too.
It works fairly nicely with meson, as you can simply checkout a worktree of a library into a subprojects directory, and let individual projects move at their own paces even if you don't do releases for the libraries/common code.
It's not really clear why having to update every consumer in sync with library changes is beneficial. Some consumers might have been just experiments, or one off projects, that don't have that much ongoing value to constantly port them to new versions of the common code. But you may still want to get back to them in the future, and want to be able to build them at any time.
It's just easier to manage all this with individual repos.
I think the majority of projects in this world only update everything at once. They haven't invested in the testing and sensible APIs that would allow updating small pieces of their solution.
From my experience, I also think the majority of people who think they have a library and need multi repos to deal with that, don't have a library.
To further clarify, one user of your library means you could stop pretending you have a library and avoid the pain.
I don't mean to insist these problems do not exist, I simply don't think many people have them.
I have typically left mobile iOS/Android in separate repos however - they have a different deployment cadence, so you need to manage breaking changes differently anyway.
I for one find it refreshing that people are willing to think about different workflows, even if they are unconventional.
It feels like what is described is a cross between a good language package manager and git submodules. It's an interesting space to explore, because a lot of nice things come out of submodules, but it's not a proper package manager.
A proper dependency manager that puts code in a workspace and manages it as you work on it, in a non-clunky way, is not something we have right now, and it could be a game changer. Thanks to the authors for sharing.
On the surface, most people seem to think of a monorepo as a source control management system that exposes all source code as if it's a traditional filesystem accessed through a single point of entry. Multirepo, in contrast, seems to be about multiple points of entry.
But that's a superficial and uninteresting distinction. All the hard parts of managing code remain for both and, for a sufficiently large organization, you'll still need multiple dedicated teams to build tooling to make either work at scale. All the pros listed in the article need a team to make them work for either approach, and all the cons are a sign that you need a team to make up for that deficiency for either approach.
Aesthetically a single point of entry appeals to me, in that it allows for a more consistent interface to code. But I'd go for good tooling above that in a heartbeat.
I built my engineering staff to focus on any of the initiatives that my boss hands to me (changes week/week) - so we went monorepo so we could move between those projects/apps/programs quickly.
We knew that we didn't want to pay the maintenance cost just because microservices/multirepo was a buzzword AND we wanted future ventures to get faster (example: we solved identity for authn/authz once and now every app that needs it after can leverage it and we can upgrade identity and all of its consumers in one pull request).
In a monorepo your builds are at the same point in time horizontally across all of your dependencies. You build together or not at all (though not necessarily at HEAD). In a multirepo you have the option to build against any (or some subset of) point-in-time snapshots of your dependencies on a dependency-by-dependency basis.
If you have a single monorepo that all of the code is in, but your build system allows you to specify what commit to build your dependency targets at instead of forcing you to use the same commit as your changes, you actually have a multirepo. If you have a bunch of repos but you build them all together in a CI/CD pipeline that builds each at its most recently released version, then you actually have a monorepo.
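The first case can be sketched with Bazel's git_repository rule, which pins a dependency to a specific commit regardless of what's at HEAD (the repo name, URL, and sha below are all hypothetical):

```starlark
# WORKSPACE sketch: build against a point-in-time snapshot of a
# dependency, giving multirepo semantics inside monorepo tooling.
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "common_lib",
    remote = "https://example.com/common.git",
    # Placeholder sha -- builds use this snapshot, not HEAD.
    commit = "0123456789abcdef0123456789abcdef01234567",
)
```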
I don't see it used very often though. Why not?
It was introduced to counterbalance what many saw as a big mess. The result was a lot of process being introduced, which slowed everything down, but that was probably necessary at that stage. To my knowledge the company keeps switching back and forth, but new projects that need to move fast are typically still done independently.
Can you have two metarepos, each with its own set of checked-out branches of the same original submodules?
So many questions, but they're all about identifying change and only deploying change...
Unless one enforces perfect one-to-one match between repo boundaries and deployments, this is also an issue with multirepos.
In practice, it's straightforward to write a short script that deploys a portion of a repo and have it trigger if its source subtree changes and then run it in your CI/CD environment.
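As one way to do this, here's a hypothetical GitHub Actions workflow that only runs when a service's subtree changes (the service path and deploy script are made up):

```yaml
# Deploy the payments service only when its subtree changes.
name: deploy-payments
on:
  push:
    branches: [main]
    paths:
      - "services/payments/**"
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./services/payments/deploy.sh  # hypothetical deploy script
```

Other CI systems have equivalent path-filter mechanisms; the point is that "deploy only what changed" is a configuration problem, not a repo-boundary problem.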
With respect to hiding, git has sparse checkouts that can give you a limited view of a repository (for performance reasons - not for security reasons)
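A sketch of that limited view, combining partial clone with cone-mode sparse checkout (requires a reasonably recent git, roughly 2.25+; the URL and paths are placeholders):

```shell
# Clone without blobs and with an initially sparse working tree,
# then materialize only the directories you actually work on.
git clone --filter=blob:none --sparse https://example.com/mono.git
cd mono
git sparse-checkout set services/payments libs/common
```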
But that's just today's git. Other VCSs like perforce provide much finer grained access control and hiding.
When last I was there things were finally beginning to burst at the seams. Platform architecture migrations were failing or being abandoned over too many untracked dependencies on specific versions of platform-provided libraries. (RHEL5, anyone?) Third-party had become a jungle of unmaintained libraries with dozens of versions that nobody ever upgraded, that may or may not have security vulnerabilities or known bugs, and many teams hadn't released new versions of their clients/libraries into Live for years in fear of breakage. The Builder Tools team was talking about giving up and abandoning both Brazil and Live as unsalvageable. Framework teams (Coral) were throwing their hands up in the air about how Coral-dependent services would not be able to upgrade to Java 11 without fixing a bunch of breaking changes that they would never agree to fix. The solutions being proposed to these problems by the Builder Tools team looked a lot like moving toward a monorepo, at least conceptually.
It was also a huge day-to-day quality of life improvement for the users (the developers.) There are UX problems with git, but they pale in comparison to the UX problems with perforce which is truly unpleasant software.
For CD, we have scripts that ask what service you want to build, and they specifically package that service using the set of files & processes dedicated to that service. The build generates a versioned artifact. After that, repo doesn't matter at all, we're just moving service artifacts around.
Not that you can’t still make your changes backwards compatible with themselves. But if I’m going to have to deploy everything in two steps anyway, what’s the point?
Some background: at my current place of employment I have 28 services, should be 30 in the next few days, so I think my current use case is very representative of a small to medium monorepo. At my last job right before this one we had sort of a monorepo that was strung together with git submodules, although each project was developed independently with its own git repo+CI.
> Isolation: monorepo does not prevent engineers from using the code they should not use.
Your version control software does not prevent or allow your developers from using code they should not use. It is trivial to check in code that does something like this:

    import "~/company/other-repo/source-file.lang" as not_mine;

Or, even worse, in something like golang:

    import "github.com/company/internal-tool/..."
Because of this, it is my opinion that it is impossible to rely solely on your source control to hide internal packages/source/deps from external consumers. That responsibility, of preventing touching deps, has to be pushed upwards in the stack, either to developers or tooling.
> So, big projects in a monorepo have a tendency to degrade and become unmaintainable over time. It’s possible to enforce a strict code review and artifact layouting preventing such degradation but it’s not easy and it’s time consuming,
I think my above example demonstrates this is something that is not unique to monorepos. The level of abstraction that VCS' operate at is not ideal for code-level dependency concepts.
> Build time
Most build systems support caching. Some even do it transparently. Docker's implementation of build caching has, in my experience, been lovely to work with.
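Docker's layer caching rewards ordering your Dockerfile so the expensive, rarely-changing steps come first; a common sketch (the file names are just the usual Python convention):

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Copy only the dependency manifest first, so this layer -- and the
# expensive install below -- stays cached until requirements.txt changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Source edits only invalidate the layers from this point on.
COPY . .
CMD ["python", "main.py"]
```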
---- Multi repo section ----
> In case your release flow involves several components - it’s always a real pain.
This is doubly or triply true for monorepos, because the barrier to cross-service refactors is so low. Due to a lack of good rollout tooling, most people with monorepos release everything together. I know my CI essentially does `kubectl apply -f`. Unfortunately, due to the nature of distributed compute, you have no guarantee that new versions of your application won't be seen by old versions (especially so with zero-downtime deployments like blue-green/red-black/canary). Because of this you constantly need to be vigilant about backwards compatibility. Version N of your internal protocol must be N-1 compliant to support zero-downtime deployments. This is something that new members of a monorepo team have huge difficulty working with.
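A minimal sketch of what N/N-1 compliance means for a consumer: a field added in version N must be treated as optional, because N-1 pods still produce payloads without it. (The event shape and the "currency" field are hypothetical.)

```python
import json

def parse_payment(raw: str) -> dict:
    """Parse a payment event from any live protocol version."""
    event = json.loads(raw)
    return {
        "amount": event["amount"],
        # "currency" was added in version N; N-1 producers omit it,
        # so default rather than assume the field exists.
        "currency": event.get("currency", "USD"),
    }

old = parse_payment('{"amount": 100}')                     # from an N-1 pod
new = parse_payment('{"amount": 100, "currency": "EUR"}')  # from an N pod
print(old["currency"], new["currency"])  # USD EUR
```

Only once every producer is on version N can the default be dropped and the field made required, in a later release.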
> It allows people to quickly build independent components,
To start building a new component all one must do is `mkdir projects/<product area>/<project name>`. This is far lower overhead than most multi-repo setups. You can even `rm -r projects/<product area>/<thing you are replacing>` to completely kill off legacy components so they don't distract you while you work. The rollout of the new tool went poorly? Just revert to the commit beforehand and redeploy, and your old project's directories, configs, etc. are all in the repo. With separate git repos you're left with unversioned cross-repo state that inherently can never be removed if you want a source tree that is green and deployable at any commit hash.
--- Their solution ---
I accomplish the same tasks with a directory structure. As mentioned before, if you just put your code into a `projects/<product area>/<project>` structure, you can get the same effect they are going for by minimizing the directory layout in your IDE's file view. The performance hit from having the entire code base checked out is very much a non-issue for >99% of us. Very, very few of us have code bases larger than the Linux mainline, and git works fine for their use cases.
Also, any monorepo build tool like Bazel, Buck, Pants, and Please.build will perform adequately for the most common repo sizes and will provide you hermetic, cached, and correct builds. These tools also already exist and have a community around them.