Hacker News

My advice is that if components need to release together, then they ought to be in the same repo. I'd probably go further and say that if you just think components might need to release together then they should go in the same repo, because you can in fact pretty easily manage projects with different release schedules from the same repo if you really need to.

On the other hand if you've got a whole bunch of components in different repos which need to release together it suddenly becomes a real pain.

If you've got components that will never need to release together, then of course you can stick them in different repositories. But if you do this and you want to share common code between the repositories then you will need to manage that code with some sort of robust versioning system, and robust versioning systems are hard. Only do something like that when the value is high enough to justify the overhead. If you're in a startup, chances are very good that the value is not high enough.

As a final observation, you can split big repositories into smaller ones quite easily (in Git anyway) but sticking small repositories together into a bigger one is a lot harder. So start out with a monorepo and only split smaller repositories out when it's clear that it really makes sense.



Components might need to be released “together”, but if they are worked on by different teams, it means they’ll have a different release process, as in different timeline, different priorities.

First of all this is normal, because otherwise the development doesn’t scale.

In such a case the monorepo starts to suck. And that’s the problem with your philosophy ... it matters less how the components connect, it matters more who is working on it.

Truth of the matter is that the monorepo encourages shortcuts. You’d think that the monorepo saves you from incompatibilities, but it does so at the expense of tight coupling.

In my experience people miss the forest for the trees here. If breaking compatibility between components is the problem, one obvious solution is to no longer break compatibility.

And another issue is one of responsibility. Having different teams working on different components in different repos will lead to an interesting effect ... nobody wants to own more than they have to, so teams will defend their components against unneeded complexity.

And no, you cannot split a monorepo into a polyrepo easily. Been there, done that. The reason is that working in a monorepo versus multiple repos influences the architecture quite a lot and the monorepo leads to very unclear boundaries.


> Components might need to be released “together”, but if they are worked on by different teams, it means they’ll have a different release process, as in different timeline, different priorities.

released "together" == part of the same feature. Timelines, release process and team priorities are all there to help to deliver features. If they stand in the way, they need to be adjusted. Not the other way around.

Multi repos encourage silos. Silos encourage focusing on the goals of the silo and discourage poking around the bigger picture. Couple that with Scrum, which conveniently substitutes meaningless points for real progress metrics, and soon enough you end up with an IT department that is full of processes but light on delivering value.


> And no, you cannot split a monorepo into a polyrepo easily. Been there, done that. The reason is that working in a monorepo versus multiple repos influences the architecture quite a lot and the monorepo leads to very unclear boundaries.

I think you are conflating a monorepo (where boundaries can still be established, e.g. via a module isolation mechanism specific to the stack used) with a "monoproject"/"monomodule", where there is no modularization at all.

Edit: expanded wording


No, there's no such confusion.

> where boundaries can still be established, e.g. via a module isolation mechanism specific to the stack used

Unfortunately this isn't a technical issue and that's the problem.


If the projects within the monorepo are decoupled and have clear boundaries then why not have them in separate repositories?...

In my opinion monorepos make refactoring dependent projects much easier. However it is much harder to establish and enforce clear boundaries...


With monorepos you don't have to manage PRs for 8 different repositories when adding a feature.

In my experience it's hard to establish clear boundaries, regardless of repository kind. It may be more difficult to create features which are tightly coupled across multiple repositories, but people do it regularly. And when they do, you suddenly have to manage and maintain synced features across multiple repositories.

In fact, the repo tool for the Android project makes it quite easy to develop features across repositories, thus lowering the boundaries significantly.


I have a monorepo that contains a few different early-stage frontend web projects that do not interact with each other at all. They do, however, use a shared component library that also lives inside the monorepo. Tools like yarn workspaces make sharing the library easy if the projects are located in the same repo.

When I change something in the library, I can easily run the tests of all the projects that depend on it against the latest library changes and make sure that my change is not breaking things all over the place, which is also pretty nice.

I am not sure yet if using a monorepo is actually the best way to deal with this kind of project, but for now it feels better than having them in separate repos and then having to deal with the complexity of sharing the library across repos by publishing it somewhere, using git submodules, or something similar.


I work on a project structured into microservices and use both. There is one global repo with submodules in subrepositories.

So when someone only wants a submodule they can happily clone only that, but when someone wants everything (which is the default case), they can clone and install it all at once.

The downside is that I have to commit twice.


> Having different teams working on different components in different repos will lead to an interesting effect ... nobody wants to own more than they have to

... and so nobody really understands how all of the components tie together and as a result it takes weeks of manual testing to release.


My rule of thumb is: if you need to do PRs in several repositories to do one feature, you should probably merge the repositories. At work, we have code spread among a bunch of repositories, and having to link to the 2/3 related PRs in other repos is a major PITA, and even more so for the reviewers.


My rule of thumb is: if you need to do PRs in several repositories to do one feature, your projects are either tightly coupled enough that they should be one monolithic piece of software, or your tight coupling is a problem you should work on resolving.

Requiring multiple PRs to multiple repos to roll out one user-facing feature is fine, as long as your independent modules/projects are not actually interdependent (i.e. one of those PRs will not break another independent repo that lacks a corresponding PR).


Sometimes a feature needs to change a shared dependency library.

But in that case you could consider the change to the dependency a single release. And ingesting it into another app a separate release.


At a past job, I had to edit roughly 5 different repositories in order to do some trivial programming task (send an email or some such). It was quite easily the least productive / most demoralizing workflow I've ever experienced.

Context switching really sucks. You should aim to reasonably avoid it


Sending an email can have a few different responsibilities:

Who is the email being sent to?

What is the content of the email?

What data does the email content and recipient depend on?

What are you tracking on the email?

How is the email visually formatted?

All those things might be in different apps as the logic gets more complicated.


Don't mix up downsides of multirepo and bad composition of your microservices


Just because things change in tandem, that does not mean that they're all the same thing. When I add a new function to my backend service, all frontends that consume its API also need to be adjusted. But that doesn't mean that the backend service, its command-line clients and its web GUI client should live in the same repo.


It's probably a matter of taste - but I think they should be in the same repo. I like tying test failures/regressions to a specific commit for documentation and admin purposes. Having a test fail or regression due to an 'unrelated' commit in another repo sounds like a nightmare waiting to happen when you try investigating.

I think the difference of opinion is between developers who work on self-hosted "evergreen" products where the latest version is deployed, and others who work with multiple release branches with fixes/features constantly being cherry-picked.


Why? You are just creating more work for yourself by keeping the components in different repos. Now you need to create N commits when updating something. If your future self wants to investigate how the software has evolved there are N times as many commits to analyze.


I really think it should if possible. Makes life much easier in my experience.


Not always. It makes absolute sense to have a repository for the gui and one for the server. When writing a new feature you usually write some gui code and some server code and create different pull requests. I think monorepos are seriously wrong and I completely agree with this article.


Well... Why does that make sense? I have a repository containing both the GUI and the server, and sometimes I have to make changes to both. Locating those related changes together in the same commit and/or PR makes a lot of sense to me: the changes depend on each other, and thus should be reviewed together. What's the advantage of splitting them up?


Because obviously the changes that you make in the gui are completely isolated from the changes you make on the server. When you are working on the gui the server code is just noise and vice versa. And it gets even worse when you use two different languages for the gui and the server.


> Because obviously the changes that you make in the gui are completely isolated from the changes you make on the server.

In my experience, that is almost never the case. Often, the frontend requires a new endpoint or a modification to an existing endpoint. If you don't coordinate this change, you end up with a non-functional PR that cannot even be tested. Same happens when the backend proposes an endpoint change that affects the frontend.

We have moved the frontend and backend to the same repo to make coordination and testing of such cases simpler.


You make the endpoint first, and test it without the UI. What challenges do you foresee here?


* Changing graphql schemas.

* Any non-backwards compatible change in the interface between the components. Yes this can be solved. But when working in a smaller team on proprietary software why use time solving a problem you don't need to solve?

(This is from experience.)


> why use time solving a problem you don't need to solve?

Unless they're running on the same computer and deploy literally simultaneously, this is already a problem you need to solve.
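To illustrate the point about non-simultaneous deploys, here is a minimal sketch of a server handler that tolerates both the old and the new request shape while clients catch up. The handler name and both payload shapes are hypothetical and chosen only for illustration; they do not come from anyone's actual system in this thread.

```python
from typing import Any


def handle_create_user(payload: dict[str, Any]) -> dict[str, Any]:
    """Accept both request shapes during a rollout (both shapes are hypothetical).

    Old clients send {"name": "Ada Lovelace"}.
    New clients send {"first_name": "Ada", "last_name": "Lovelace"}.
    """
    if "first_name" in payload and "last_name" in payload:
        # New shape: combine the two fields.
        full_name = f'{payload["first_name"]} {payload["last_name"]}'
    elif "name" in payload:
        # Old shape: still accepted until every client has been updated.
        full_name = payload["name"]
    else:
        return {"status": 400, "error": "missing name"}
    # The response keeps the old "name" field so old clients can still parse it.
    return {"status": 200, "name": full_name}
```

Once no old clients remain, the `elif` branch can be deleted. The same transition period exists whether the client and server live in one repo or two.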


A surprising number of companies are prepared to accept an hour of downtime for an internal system if it saves them money. In my experience the best business practice is to offer the product owner/manager the costed options in such a situation and allow them to choose.


That only really matters if your backend developers are a different team to your frontend developers where they'd want to be working concurrently. And even then, they could work in different branches and both teams merge into a development branch when finished.

The idealistic discussions for or against monorepos often overlook the most important detail: who's working on the code and how would you want them to version control it?

If it's separate projects with their own versioning then it makes sense to have them as separate repositories. If it's a single project but with individual components you'd want to version (eg because it's developed by different teams with different release timelines) then there you also have a situation where you'd want to version the code separately so once again there is a strong argument for separate repositories. However if it's one product with a single release schedule then splitting up the frontend from the backend can often be a completely unnecessary step if you're doing it purely for arbitrary reasons such as the languages being different. (I mean Git certainly doesn't care about that. A project might have Bash scripts, systemd service files, Python bootstrapping, code for an AOT compiled language (eg Rust, Go, C++, etc), YAML for Concourse, etc. They're all just text files required for testing and compiling so you wouldn't split all of those into dozens of separate repos).


> That only really matters if your backend developers are a different team to your frontend developers

What if there is one team, but different developers (one working on the frontend, another on the back)? What if QA can test the API while the frontend development is ongoing?

What if the front and backends have different toolchains, and ultimately separate execution environments (server app backend vs JS running on client machines).


I’m not sure what your point is. There’s obviously going to be thousands of different scenarios that I didn’t cover; it would be impossible to cover every imaginable use case.

> What if there is one team, but different developers (one working on the frontend, another on the back)?

Then presumably everyone in that team is full stack? (Otherwise it would be different teams in the same department.) So it still makes sense to have a monorepo, because you could have a situation (holiday, sickness) where someone would be working on both the front end and back end. Thankfully Git is a distributed version control system and supports feature branches, so you can still have multiple people working on the same repo and then merge back into a development branch.

> What if QA can test the API while the frontend development is ongoing?

Testing isn’t the same as released versions. You can (and should) test code at all stages of development regardless of team structures, Git repo structures, or release cycles.

> What if the front and backends have different toolchains, and ultimately separate execution environments (server app backend vs JS running on client machines).

I’d already covered that point when talking about different languages in the same repo. You’re making a distinction about something that version control doesn’t care in the slightest about.

I think it’s fair to say any significant cross-project tooling should be its own repo (you wouldn’t include the web browser or JVM with your frontend and backend repos). But if it’s just bootstrapping code that is used specifically by that project then of course you’d want that included. Eg you wouldn’t have Makefiles separate from C++ code. But you wouldn’t include GCC with it because that’s a separate project in itself.

Ultimately though, there is no right answer. It’s just what works best for the release schedule of a product and teams who have to work upon that project.


This is true for systems where there is a well-defined protocol between GUI and servers and a proper versioning process in place, i.e. most "old-school" client/server systems.

I expect lots of people on HN are working on systems with very tight coupling between client/GUI and server and no proper versioning between them, as is common in web applications. Hence the replies to the contrary: you're probably from quite different worlds :)

(Now, I personally think that maintaining sound versioning practices is a good idea even if you do have tightly coupled control of both the client and the server side. But that may just be me...)


I think what matters in the end is Conway's law. Conway's law is frequently misinterpreted as an observation when it's actually advice: Structure your applications/repos like you structure your teams. You're going to end up with that code structure anyway, so might as well save some time.


Hmm, that's not really obvious to me. Sometimes the server has to deliver new data that is to be used in the GUI, so it's nice to be able to present those together in the same PR. If it then happens that the server-side changes do not match what you need in the GUI, it's relatively painless to add those changes in the same branch that hasn't yet been merged. In other words: although you can make changes in one without breaking the other, that doesn't make them completely isolated.


You should always have a communication layer between the gui and the server. For example using protobuf you would update the proto definition (that can be in a shared repo) and when building the gui and the server the protobuf layer is regenerated. So the only place where you make your changes for the new data contract is the shared repo and the gui and server would automatically have the new changes.


So now we're at three repos, one of which is shared by the other two, and changes will have to be coordinated over them. I fail to see how that is an improvement over having both in the same repo.

In the end, I think the other comments are right that it mostly depends on who's working on something. If it's different teams, then different repos probably make sense. But if I'm responsible for both the back-end and the front-end, they're usually not isolated at all, at least in terms of project requirements, and hence keeping them together makes sense.

(But of course, even then there are nuances. I think the article is mostly arguing against monorepos as in company-wide monorepos. I'm willing to believe Googlers that it works well for Google, and I'm not in a position to claim what it'd be like for other companies. Team-wide monorepos for different parts of the same project, however, make a lot of sense to me.)


> I'm willing to believe Googlers that it works well for Google

It doesn't. In my entire career, that was the only environment in which some random would break us and we couldn't do anything about it other than hope for a rollback and then wait for hours for the retest queue to clear before we could deploy anything at all.

Maybe not all the time, but you need the escape hatch of pinning healthy deps, because HEAD of everything is not guaranteed to work.


Well, I'm willing to believe you that it didn't work well for you as well. My point is that company-wide monorepos are largely irrelevant to my argument, as I'm not arguing in favour of or against those (I'm leaving that to people who've worked with them).


It'll be really typical for a gui/server to want to share some is_valid_payload() function. The client to validate it before sending, and for the server to do its own validation.

If it's a monorepo your PR might be a 2 line patch to that function, then adding the GUI and server code.

If you split it you'll first need to have a PR on the "validation-lib" repo, then once that gets in a PR on the "server" repo, bumping the "validation-lib" version dependency, and finally a PR on the "gui" repo bumping the dependency for both "validation-lib" and "server" (for testing etc.). That's before you need to deal with the circular dependency that "server" also wants "gui" for its own "I changed my server code, does the GUI work?" testing.

Better just to have them in a monorepo if they're logically the same code and want to share various components.


> If it's a monorepo your PR might be a 2 line patch to that function, then adding the GUI and server code.

> If you split it you'll first need to have a PR on the "validation-lib" repo, then once that gets in a PR on the "server" repo, bumping the "validation-lib" version dependency, and finally a PR on the "gui" repo bumping the dependency for both "validation-lib" and "server" (for testing etc.). That's before you need to deal with the circular dependency that "server" also wants "gui" for its own "I changed my server code, does the GUI work?" testing.

The above is exactly why I am so firmly opposed to multirepo[0]-first. And it's really just a throwaway example: a real change would involve multiple different library and executable repos, all having separate PRs. And then there's the relatively high risk of getting a circular incompatibility.

This can be worth the cost, for organisational reasons. But until you need it, don't do it. It's very easy to split a git repo into multiple repos, each retaining its history (using git filter-branch). Don't incur the pain until you need to, because honestly, you're not likely to need to. You're probably not going to grow to the size of Google. Heck, most of Google runs in one monorepo, with a few other repos on the side: if they can make it work at their scale, so can you. And if, as the odds are, you never grow to their size, then you'll never have wasted time engineering a successful multirepo system instead of delivering features to your business & customers.

0: 'polyrepo,' really? https://trends.google.com/trends/explore?date=all&q=multirep... clearly shows that 'multirepo' is the term.
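For reference, the filter-branch split mentioned above can be sketched roughly as follows. This is a minimal sketch, not a complete migration procedure: the helper names `split_commands` and `run_split` and all paths are hypothetical, and it only covers extracting a single subdirectory. Git's own documentation now steers people toward alternatives such as git filter-repo, so treat this as an illustration of the idea.

```python
import subprocess


def split_commands(source_repo: str, subdir: str, target_dir: str) -> list[list[str]]:
    """Build the git commands that extract `subdir` into its own repo at `target_dir`."""
    return [
        # Work on a fresh clone so the original repository is left untouched.
        ["git", "clone", source_repo, target_dir],
        # Rewrite history so that `subdir` becomes the project root,
        # looking only at the history that touches that directory.
        ["git", "-C", target_dir, "filter-branch",
         "--subdirectory-filter", subdir, "--", "--all"],
    ]


def run_split(source_repo: str, subdir: str, target_dir: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in split_commands(source_repo, subdir, target_dir):
        subprocess.run(command, check=True)
```

Calling `run_split("/path/to/monorepo", "gui", "/path/to/gui-only")` would leave a repository at the target path whose root is the old `gui` directory. Whether the resulting boundaries are clean is a separate question, as other comments point out.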


These are two separate functions. Why would you ever want a function that checks both gui and server? The gui validation logic belongs to the gui layer, the server validation logic to the server layer. If you have a function that contains logic from both layers there is something seriously wrong with your design.


The classic reason for any validation is that you want the validation to be done in the frontend (to save a network roundtrip and provide better, immediate feedback), on the backend (so that if the frontend is compromised and maliciously circumvents that validation, it still gets validated), and both of the validations to be the same to prevent inconsistencies.

A good way to fulfil those requirements is to have the exact same function available in both places.
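A minimal sketch of that arrangement, assuming client and server can import the same module (which presumes a shared language). The `is_valid_payload` name comes from the comment upthread; the email rule, `client_submit`, and `server_receive` are hypothetical stand-ins.

```python
def is_valid_payload(payload: dict) -> bool:
    """Shared rule (hypothetical): 'email' is a string containing '@', at most 254 chars."""
    email = payload.get("email")
    return isinstance(email, str) and "@" in email and len(email) <= 254


def client_submit(payload: dict, send) -> str:
    """Client side: validate first to avoid a pointless network round trip."""
    if not is_valid_payload(payload):
        return "rejected locally"
    return send(payload)


def server_receive(payload: dict) -> str:
    """Server side: validate again, since the client cannot be trusted."""
    if not is_valid_payload(payload):
        return "400 Bad Request"
    return "200 OK"
```

Because both sides call the same function, a change to the rule cannot leave the frontend and backend disagreeing about what counts as valid.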


If you have the same functionality that can't be re-used (for no reason), then I'd call that a design flaw.

I'll need a few more validation functions for each client. I don't want to write+maintain multiple functions that do the same thing, even if it's just copy+paste.

It's "data" validation. So let's put that in the "data layer" repo.

We now have, at least:

- Server

- Web (GUI)

- Android

- iOS

- Data

- More clients?

We'll also have branches for each development task. How do we know what branch the other branches should use? One "simple" feature can easily spread over multiple repos. Does each repo refer to the repo+branch it depends upon (don't forget to update the references when we merge!), or do we add a "build" repo which acts as the orchestrator?

Most PRs will need to be daisy chained - who reviews each one? Will they get committed at the same time?

How do we make the builds reproducible? Commit hashes? Tags? OK, we now need to tag each repo and update the references to point to that tag/hash... but that changes the build.

Well, I'm glad our code base is split over multiple repos because "scalability".


Imagine something like "curl" where a client needs to validate a manually provided request before making it.

In any case, if you're nitpicking that example you're missing the point. The same would go for any other shared code you could imagine between a client and a server that logically make up one program talking over a network.


I still can’t see how you would have a shared library for a C# gui and a Java server for example. Your communication layer would obviously live in both repositories. Even in case you are using the same language and you do have shared libraries then what is the problem? The shared libraries would surely be shared with other projects so it makes sense to have them in a separate repository.


In cases where there's a high degree of churn (i.e. early-stage startups) in shared libraries, updating those libraries can cause a large amount of busywork and ceremony.

If you had a `foo()` function shared between the GUI and the server (or two services on your backend, or whatever), in a monorepo your workflow is:


   - Update foo()
   - Merge to master
   - Deploy

In a polyrepo where foo() is defined in a versioned, shared library your workflow is now:

   - Update foo()
   - Merge to shared library master
   - Publish shared library
   - Rev the version of shared library on the client
   - Merge to master
   - Deploy client
   - Rev the version of shared library on the server
   - Merge to master
   - Deploy server

This problem gets even more compounded when your dependencies start to get more than one level deep.

I recently dealt with an incredibly minor bug (1 quick code change), that still required 14 separate PRs to get fully out to production in order to cover all of our dependencies. That's a lot of busywork to contend with.
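To make the "more than one level deep" cost concrete, here is a rough sketch that computes which repos need a version bump after a shared library changes, and in what order. The function name `repos_to_rev` and the example repo names are hypothetical; it assumes the dependency graph has no cycles (`TopologicalSorter` raises `CycleError` otherwise).

```python
from graphlib import TopologicalSorter


def repos_to_rev(depends_on: dict[str, set[str]], changed: str) -> list[str]:
    """Return the repos that must pick up a new version of `changed`,
    ordered so that each repo comes after the dependencies it has to rev."""
    # Collect everything that depends on `changed`, directly or transitively.
    affected: set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for repo, deps in depends_on.items():
            if current in deps and repo not in affected:
                affected.add(repo)
                frontier.append(repo)
    # Keep only edges between affected repos, then sort dependencies first.
    graph = {repo: depends_on[repo] & affected for repo in affected}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical layout: an end-to-end test repo sits on top of client and server.
deps = {
    "shared": set(),
    "client": {"shared"},
    "server": {"shared"},
    "e2e": {"client", "server"},
}
order = repos_to_rev(deps, "shared")  # "client" and "server" in some order, then "e2e"
```

Each entry in the result corresponds to at least one "rev the version, merge, deploy" round from the list above, which is how a one-line fix can turn into a long chain of PRs.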


It seems to me that the real problem is your toolchain. In a previous project the workflow was like this:

   - Update foo()
   - Merge to master
   - Publish shared library
   - Deploy

So as you can see the only step added was to publish the shared library that would automatically update the version in all the projects using it. If you are really doing everything manually I can understand that this is a pain, but this has nothing to do with the monorepo / multiple repo distinction, this is a tooling problem.


But you've just invented a sharded monorepo, and now have all the monorepo problems without the solutions.

What if updating foo() breaks something in one of the clients (say, due to reliance on something not specified)? Then you didn't catch that issue by running the client's tests, the client is now broken, and they don't necessarily know why. They know the most recent version of shared broke them, but then they have to say "you broke me", or one of the teams needs to investigate and possibly bisect across all changes in the version bump under their tests to find the breakage.

How is that handled?

(The broader point here is that monorepo or multirepo is an interface, not an implementation; it's all a tooling problem. There are features you want your repo to support. Do you invest that tooling in scaling a single repo or in coordinating multiple ones? Maybe I should write that blog post.)
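As an illustration of the bisection step mentioned above, here is a minimal sketch. It assumes the client's tests pass before the first change in the version bump, fail after the last one, and stay broken once broken. The names `first_bad_change` and `passes_at` are hypothetical; in practice `passes_at` would check out the shared library at that change and run the client's test suite.

```python
from typing import Callable, Sequence


def first_bad_change(changes: Sequence[str], passes_at: Callable[[str], bool]) -> str:
    """Binary-search the ordered `changes` for the first one where tests fail."""
    low, high = 0, len(changes) - 1
    while low < high:
        mid = (low + high) // 2
        if passes_at(changes[mid]):
            # Still healthy here, so the breakage was introduced later.
            low = mid + 1
        else:
            # Already broken here, so the culprit is this change or an earlier one.
            high = mid
    return changes[low]
```

This needs roughly log2(n) test runs for n changes in the bump, which is the cost someone has to pay when the breakage is only discovered after the shared library has been published.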


Some package managers that support git repos as dependency versions can offset this in development.


> It makes absolute sense to have a repository for the gui and one for the server.

Not really. You can have a single repo with top level directories tigershark-gui and tigershark-server.


What is the point instead of having them in two separate repos?


Any full stack change will be represented by one PR that changes from pre change to post change. Two repos would introduce a new possible state where one has the change applied and the other doesn't.


And later you add the iOS and Android clients too. Will those go into the same repo? Better to keep server and clients apart, especially if release schedules are different.


Sure if the release schedules are different then have them in separate repos so things like tagging makes sense. But often people work with a single release schedule. There's just so many variables that go into these decisions that the thread here is bonkers.

Smart people can work through problems to get the job done. Monorepo vs polyrepo won't stop people from moving forward.


> As a final observation, you can split big repositories into smaller ones quite easily (in Git anyway) but sticking small repositories together into a bigger one is a lot harder. So start out with a monorepo and only split smaller repositories out when it's clear that it really makes sense.

If you only need to do this once, subtree will do the job, even retaining all your history if you want.

I'm not sure what the easier way to split big repos is.
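For the merging direction, the subtree approach mentioned above boils down to one command per small repo. A minimal sketch, assuming `git subtree` is available in your Git installation; the helper name `subtree_add_command`, the URL, and the default branch name are hypothetical.

```python
def subtree_add_command(repo_url: str, prefix: str, ref: str = "master",
                        keep_history: bool = True) -> list[str]:
    """Build a `git subtree add` command that imports `repo_url` under `prefix`."""
    command = ["git", "subtree", "add", f"--prefix={prefix}"]
    if not keep_history:
        # --squash collapses the imported history into a single commit.
        command.append("--squash")
    command += [repo_url, ref]
    return command


# Hypothetical example: pull a standalone GUI repo into the monorepo under gui/.
print(" ".join(subtree_add_command("https://example.com/gui.git", "gui")))
```

Run from inside the target repository, the resulting command brings in the small repo's files under the given directory, with or without its full history.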


To split, you can duplicate the repo and pull trees out of each dupe in normal commits.


In principle: Yes.

In practice, I can tell you from first-hand experience that this isn't all that simple in bigger, organically grown cases (you'll have many other things to consider if you want to keep the history in a useful way). The broken branching model of SVN and co. is especially a problem here: in the wild, it immediately leads to "copy&paste branching" (usually through multiple commits). Migrating that to Git or Hg and splitting it up can be a challenge.


I haven't tried in Git, but with Mercurial merging repos is as simple as pulling from an unrelated repository and merging, that's it. It's a lot simpler than splitting a repo up unless you accept that all of the old history can remain, then you just make a clone and delete what should no longer be a part of the repository.

But a monorepo leads to tight coupling, which is just as much a pain to deal with as versioning; or two teams work on the same shared code simultaneously, and you get not only merge conflicts but also conflicting functionality.


So why is that? Why do we need to couple the software development effort to the release? Based on my experience there is no difference between the monorepo and multirepo approaches from the deployment point of view.


After trying to get the best of both with Subversion Externals and Git Submodules, I'd have to agree. At least until things are so loosely coupled they're begging for a public release.

That said, some packaging solutions can bridge the gap reasonably well. Unless you need instantaneous, atomic releases.


I switched to using submodules about a year ago, and they work very well for a project + a set of 4 dependencies. I handle that zoo from VS Code + Git Lens plugin.

Funnily, I only use Code to handle commits to submodules, because Git Lens is not available for the full VS IDE.


What are you talking about! In my perfect micro services world I just have these enforced bounded contexts that are so perfectly designed they never need to change. Consequently all parts of the system are perfectly independent snowflakes that can be deployed without thinking about any other parts of the system. It’s beautiful really when you think about the mess that things were before we could do this!


I generate Coq proofs of Swagger descriptions that were compiled from a speech to text dump during a 10 person Hangout. Downside is that some of the protobufs aren't laid out as cleanly as one would like.


While I know you are being sarcastic, I really have heard bushy tailed young “architects” say something similar who just read about Domain Driven Design and then decided they were trying to “educate us”.


Oh I worked on a project like this, which still hasn’t launched any software yet 5 months after I left...


I can think of situations where components 'need' to release together because of organizational rules and not because of any actual binding between the components; in that case of course they do not need to be in the same repository.

I agree that you should always start with one repo and split as needed, it's the MVR way (minimum viable repository)




