Every time I've seen monorepos compared with polyrepos, it's always "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo"
Why can't we add millions of dollars of tool engineering on top of polyrepos to get some of the benefits of monorepos without a lot of the pain? E.g. it wouldn't be too hard to create "linked" PRs across repos for changes that span projects, with linked testing infrastructure
And I don't see how discovery changes significantly from browsing through loads of repositories instead of loads of directories in a repository
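The "linked PRs" idea doesn't even need forge support to prototype. A convention like a shared change-ID trailer in PR descriptions, plus a small script that groups open PRs across repos by that trailer, gets you most of the way; CI could then refuse to merge any PR in a group until every linked PR is green. A minimal sketch in Python (the trailer name and the PR record shape are made up, not any forge's actual API):

```python
import re

# Hypothetical convention: PRs belonging to one cross-repo change carry a
# "Cross-Change:" trailer with a shared identifier in their description.
TRAILER = re.compile(r"^Cross-Change:\s*(\S+)\s*$", re.MULTILINE)

def link_prs(prs):
    """Group pull requests by their Cross-Change trailer.

    `prs` is a list of dicts with at least "repo", "number" and "body"
    (a simplified stand-in for whatever the forge API returns).
    Returns {change_id: [(repo, number), ...]}.
    """
    groups = {}
    for pr in prs:
        m = TRAILER.search(pr["body"])
        if m:
            groups.setdefault(m.group(1), []).append((pr["repo"], pr["number"]))
    return groups
```

A merge gate or a combined test job then only has to look up its own group to know which sibling PRs to check out and test together.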
> Every time I've seen monorepos compared with polyrepos, it's always "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo"
The costs of the infra/build/CI work are of course more visible when there is a dedicated team doing it. If there is no such central team, the cost is just invisibly split between all the teams. In my experience this is more costly overall, because every team rolls its own thing and has to be a jack-of-all-trades in infra/build/CI.
> And I don't see how discovery changes significantly from browsing through loads of repositories instead of loads of directories in a repository
If repository permissions aren't set centrally but every team gets to micromanage them, they usually end up too restrictive and you often don't even get read-only access.
Great call out. Amazon has an extremely effective polyrepo setup and it’s a shame there’s no open-source analog, probably because it requires infrastructure outside of the repo software itself. I’ve been toying around with building it myself, but it’s a massive project and I don’t have much free time.
The Amazon poly-repo setup is an engineering marvel, and a usability nightmare, and doesn't even solve all the major documented problems of poly-repos. The "version set" idea was probably revolutionary when it was invented, but everyone I know who has ever worked at Amazon has casually mentioned that their team has at least one college hire spending 25%+ of their time keeping their dependency tree building.
This really shouldn't be the case anymore: about 5 years ago a massive effort was made to get all version sets merging from live regularly, and things were much healthier after that. For what it's worth, I suspect the usability of Brazil even before then was still on par with, or better than, the usability of an unkempt monorepo (which is unfortunately all too common).
I think one reason is that there are various big companies (Google, Microsoft, Meta) who have talked about the tech they've deployed to make monorepos work, but I at least have never seen an equivalent big successful company describe their polyrepo setup, how they solved the pain points and what the tech around it looks like.
> equivalent big successful company describe their polyrepo setup, how they solved the pain points and what the tech around it looks like.
I've worked at big, successful, boring F500 companies with polyrepo setups, and it's boring as well. At one company, Jenkins checked out the repo and ran the Jenkinsfile, and the resulting artifact was pushed to JFrog Artifactory. We would update the Puppet file in our repo, and during the approved deploy window in ServiceNow, Puppet would do the deploy. Because of this, repos had a certain fixed structure, which was annoying at times.
The pain point that was never solved: four different teams (Jenkins, Puppet, InfoSec and the dev team) involved in touching everything, and the breakdowns that would happen between them.
The short answer: start with a package management system like Conan or npm (we rolled our own - releasing 1.0 the same month I first heard of Conan, which was then around version 0.6 - don't follow our example). Then you just need processes to ensure that everyone constantly has the latest version of all the repos they depend on - which ends up being a full-time job for someone to manage.
Don't write your own package manager: if you use a common one, your IDE will know how to work with it. Our custom package manager has some nice features, but we have to maintain our own IDE plugin so it can figure out the builds.
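Part of that "keep everyone on the latest versions" process can be mechanized: compare each repo's pinned dependency versions against the registry's latest releases and flag the stale ones. A rough sketch, assuming simple dotted-integer versions (a real tool would use proper semver parsing and talk to the registry's actual API):

```python
def stale_dependencies(manifest, registry):
    """Return the deps in `manifest` that lag behind `registry`.

    `manifest` is a repo's pinned deps {name: version}; `registry` is
    the latest published versions {name: version}. Versions are compared
    as dotted-integer tuples - no pre-release or build-metadata handling.
    Returns {name: (pinned, latest)} for everything needing a bump.
    """
    def key(version):
        return tuple(int(part) for part in version.split("."))
    return {
        name: (pinned, registry[name])
        for name, pinned in manifest.items()
        if name in registry and key(pinned) < key(registry[name])
    }
```

Run nightly across every repo, the output of something like this is exactly the worklist that otherwise lives in one person's head - and it can just as well open the bump PRs automatically.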
> Then you just need processes to ensure that everyone constantly has the latest version of all the repos they depend on - which ends up being a full time job for someone to manage.
One full-time-job equivalent can buy a lot of tooling - tooling that not only replaces this role but also moves the feedback much closer to the developer introducing the breaking change.
I realize this is more than a week ago and nobody will see it, but...
Every large project has this position. On smaller projects it isn't a full-time job, so they spread it among the team members and don't track the cost. On larger projects it is too much for one full-time person, so they are forced to distribute the work and simply cannot track its cost. We happen to be in the sweet spot where one full-time person can do the job, so we can track that cost. Make no mistake, though: everything this person does is done on every project.
I agree that tooling is important. The person we have doing this job is a great engineer who automates everything he can, but there is still a lot of work that needs to be done, and some of it cannot be automated (many of the problems are people problems).
I also think a lot of it is quiet for a reason: there aren't interesting problems to solve. A lot of it is boring. It isn't without pain, but most of the pain consists of lots of little papercuts rather than big, showstopping injuries. A lot of the papercuts are itches just annoying enough to notice but not worth scratching. Or they are solved with ecosystems of normal, boring tools like Jenkins or GitHub Advanced Security or SonarQube or GitHub Actions or… Boring off-the-shelf tools for boring off-the-shelf pain points.
My company has millions of dollars in tooling for our polyrepo. It would not be hard to throw several more million into the problem.
If you have a large project there is no getting around the issues you will have. Just a set of pros and cons.
There are better tools for polyrepos you can start with, but there are a lot of things we have that I wish I could get upstreamed (there is good reason the open-source world would not accept our patches even if I cleaned them up).
a) At least with GitHub Actions it is trivial to support polyrepos. At my company we have thousands of repositories, which we can easily handle because we sync templated CI/CD workflows from a shared repository to any number of downstream ones.
b) When you are browsing through repositories you see a description, tags, technologies used, contributors, number of commits, releases, etc. That's a massive difference in discoverability versus a directory.
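For (a), besides syncing workflow files outright, GitHub Actions also supports reusable workflows, which achieve similar centralization without copying files: the shared repo defines the pipeline once, and each downstream repo's workflow is a few lines delegating to it. A sketch (the org, repo, and input names are hypothetical):

```yaml
# .github/workflows/ci.yml in each downstream repo
name: ci
on: [push, pull_request]
jobs:
  build:
    # Delegate to the centrally maintained workflow; `language` is a
    # hypothetical input the shared workflow would have to declare.
    uses: your-org/shared-workflows/.github/workflows/build.yml@main
    with:
      language: go
    secrets: inherit
```

The trade-off versus file syncing is that changes to the shared workflow take effect everywhere immediately, which is either the point or a hazard depending on how you pin the `@ref`.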
Exactly. Take your monorepo, split it into n repos by directory at a certain depth from the root, write a very rudimentary VCS wrapper script to sync all the repos in tandem, and you have already solved a lot of the pain points.
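The wrapper script in question can start out very small. A sketch of the core of such a tool in Python (not anyone's actual implementation): find every git checkout under a root directory and run the same git command in each, failing fast so the checkouts never silently diverge.

```python
import subprocess
from pathlib import Path

def find_repos(root):
    """Return every directory under `root` that is a git checkout,
    i.e. contains a `.git` entry, in stable sorted order."""
    return [path.parent for path in sorted(Path(root).rglob(".git"))]

def sync(root, args=("pull", "--ff-only")):
    """Run the same git command in every repo under `root`; check=True
    aborts on the first failure so the checkouts stay in lockstep."""
    for repo in find_repos(root):
        subprocess.run(["git", "-C", str(repo), *args], check=True)
```

`sync("/home/me/src")` would then fast-forward every checkout in one go, and the same loop works for `checkout`, `tag`, or `push` to move all repos in tandem.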
> E.g. it wouldn't be too hard to create "linked" PRs across repos for changes that span projects, with linked testing infrastructure
That sort of directory-based splitting almost never works in my experience. The code between those directories is almost always tightly coupled, so splitting arbitrarily like this gives the illusion of a loosely coupled code base while keeping all the disadvantages of tightly coupled dependencies. It’s basically the worst possible way to migrate workflows.
> Take your monorepo, split it into n repos by directory at certain depth from root, write very a rudimentary VCS wrapper script to sync all the repos in tandem and you have already solved a lot of pain points.
Then you lose the ability to atomically make a commit that crosses repos. I'm not sure any forge allows that, except that Gerrit might with its topics feature (I haven't had the opportunity to try it).
You could also use git submodules in an overarching separate repo if you want to lock down a set of versions. It doesn't even have to affect the submodule repos in any way. That would simplify branches in the individual repos and let teams work independently on each repo. Then you only deploy from the overarching repo's main branch, for example, where merging into main requires a PR that gets reviewed and approved.
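Concretely, such an overarching repo is just a normal git repo whose tree records, for each submodule, a URL in `.gitmodules` and an exact pinned commit; bumping a pin (`git submodule update --remote libfoo` plus a commit) is itself an ordinary, reviewable change. A sketch of the `.gitmodules` file (repo names and URLs are hypothetical):

```ini
# .gitmodules in the overarching "super" repo
[submodule "libfoo"]
	path = libfoo
	url = git@example.com:org/libfoo.git
[submodule "foobar"]
	path = foobar
	url = git@example.com:org/foobar.git
```

The pinned commits themselves live in the super repo's tree (as gitlink entries), not in this file, which is what makes each deployable state of the whole system a single reviewable commit.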
That's not a nice workflow from a pipelines/CI point of view.
Let's take, for example, a service 'foobar' that depends on an in-house library 'libfoo'. Now you need to add a feature to foobar that requires some changes to libfoo at the same time (and, for extra fun, let's say those changes will break some other users of libfoo). Of course, during development you want to run pipelines for both libfoo and foobar.
In such a 'super module' system it gets pretty annoying to push changes for testing in CI, because every change to either libfoo or foobar needs to be followed by a commit to the super repo.
> In such a 'super module' system it gets pretty annoying to push changes for testing in CI, because every change to either libfoo or foobar needs to be followed by a commit to the super repo.
Again, a tooling issue. CI can easily pull the required changesets across multiple repos. We are in a subthread under "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo".
> Every time I've seen monorepos compared with polyrepos, it's always "monorepo plus millions of dollars of custom tool engineering" vs "stock polyrepo"
Not quite - it's "vs stock polyrepo with millions of dollars of engineering effort in manually doing what the monorepo tooling does".
> Why can't we add millions of dollars of tool engineering on top of polyrepos
I don't think the "stock polyrepo" characterization is apt. Organizations using polyrepos already do invest that kind of money. Unfortunately, the effort is not visible because it's spread out across repos, and every team does its own thing. So people erroneously conclude that monorepos are much more expensive. Like the GP said:
> Polyrepos are a thousand species living and dying, some thriving, some never to be known, most completely in the dark.
Hey, do you think GitLab should do anything except run after the next trend and develop shitty non-solutions for it?
Why, that could improve Gitlab. We cannot have that!