
I'm a huge Gitlab fan. But I long ago lost faith in their ability to run a production service at scale.

Nothing important of mine is allowed to live exclusively on Gitlab.com.

It seems like they are just growing too fast for their level of investment in their production environment.

One of the only reasons I was comfortable using Gitlab.com in the first place was because I knew I could migrate off it without too much disruption if I needed to (yay open source!). Which I ended up forced to do on short notice when their CI system became unusable for people who use their own runners (overloaded system + an architecture which uses a database as a queue. ouch.).
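The database-as-a-queue failure mode, for anyone curious, is usually lock contention: every idle runner polls for the same "oldest pending job" rows. A rough sketch of the pattern and the usual PostgreSQL mitigation - table and column names here are illustrative, not GitLab's actual schema:

```sql
-- Naive dequeue: concurrent workers all queue up behind the row
-- lock on the same "oldest pending" job, serializing the system.
SELECT id FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE;

-- Mitigation (PostgreSQL 9.5+): each worker skips rows that another
-- worker has already locked instead of waiting on them.
SELECT id FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
```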

That migration put an end to what seemed like constant performance issues. It was overdue, and it made me sleep well about things like backups :).

A while back one of their database clusters went into split-brain mode, which I could tell pretty quickly as an outsider... but it took those on the inside a while to figure it out. My tweet on the subject ended up helping document when the problem had started.

If they are going to continue offering Gitlab.com I think they need to seriously invest in their talent. Even with highly skilled folks doing things efficiently, at some point you just need more people to keep up with all the things that need to be done. I know it's a hard skillset to recruit for - us devopish types are both quite costly and quite rare - but I think operating the service as they do today seriously tarnishes the Gitlab brand.

I don't like writing things like this because I know it can be hard to hear/demoralizing. But it's genuine feedback that, taken in the kind spirit in which it's intended, will hopefully be helpful to the Gitlab team.



Hey Daniel, I want to thank you for your candid feedback. Rest assured that this sort of thing makes it back to the team and is truly appreciated no matter how harsh it is.

You're absolutely right -- we need to do better. We're aware of several issues related to the .com service, mostly focused on reliability and speed, and have prioritized these issues this quarter. The site is down so I can't link directly, but here's a link to a cached version of the issue where we're discussing all of this if you'd like to chime in once things are back up: https://webcache.googleusercontent.com/search?q=cache:YgzBJm...


I'm running a remote-only company and we moved to GitLab.com last summer from a cloud-hosted trac+git/svn combo (xp-dev). The reason we picked GitLab.com was because the stack is awesome and Trac is showing its age. We also wanted a solution that could be run on premises if needed. We spent about a month migrating stuff over to GitLab from Trac. Once we were settled, the reliability issues started to show. We were hoping that these would be quickly sorted out, given that the pace of development on the UI and features was quite speedy.

A sales rep reached out and I told him we would be happy to pay if that's required to be able to use the cloud-hosted version reliably, but I got no response. Certainly we could host GitLab EE or CE on our own, but this is what we wanted to avoid, leaving it to those who know it best. xp-dev, which we actively used for the last 6 years, never had any downtime longer than 10 minutes. I'm still paying them so that I can search older projects, as the response time is instant while gitlab takes more than 10 seconds to search.

Besides the slow response times and the frequent 500 and timeout errors that we got accustomed to, gitlab.com displays the notorious "Deploy in progress" message every other day for 20-30 minutes, preventing us from working. I really hoped that 6-7 months would be enough time to sort these problems out, but it only seems to be worsening, and this incident kinda makes it more apparent that there are more serious architectural issues, e.g. the whole thing running on one single postgresql instance that can't be restored in one day.

We have an open issue on gitlab.com asking for automated backups of all our projects so that we could migrate to our own hosted instance (or perhaps github), but AFAIR gitlab.com does not support exporting the issues. This currently locks us into gitlab.com.

On one hand I'm grateful to you guys for the great service, as we haven't paid a penny; on the other hand I feel that it was a big mistake picking gitlab.com, since we could be paying GitHub and being productive instead of watching your twitter feed for a day waiting for the postgresql database to be restored. If anyone can offer a paid hosted gitlab service that we could escape to, I'd be curious to hear about it.


Meant to mention this earlier: Gitlab self-hosted actually has a built-in importer to import projects from Gitlab.com - including issues.

It's mostly worked reliably in my experience (it's only failed to import one project across the various times I've used it, and I didn't bother debugging because for that import we really only needed the git data).


Ping me and we'd be happy to discuss hosted Gitlab for you.


I'm a bit curious here. Do you think that your issues with scalability and reliability have to do with your tech choice (I think it was Ruby on Rails)? Don't want to bash Rails, I'm just genuinely curious, since I come from a Rails background as well and have seen issues similar to yours in the past.


It's not just the tech stack, but a combination of the technical choices made and the human procedures behind them. We're actively pushing towards getting everybody to focus on scalability, but there's still a lot of debt to take care of.


You can check out their codebase here: https://github.com/gitlabhq/gitlabhq

Just looking at their gemfile is rather telling: a couple hundred gems. I've always felt that if you're going above 100, you should carefully consider how much your codebase is trying to achieve.

They're probably at the point where they really want to think about splitting off of their monolith codebase and into microservices.


Yeah, given how their ops situation is, I don't think that would be a good idea.


Maybe it's because I'm familiar with almost all of the gems, but I don't see anything wrong with their Gemfile. It's a pretty complex project, and they really do have a ton of integrations and features that need those gems.

There are probably a few small libraries that they could have rewritten in a few files (never a few lines), but what's the point? The version is locked, and code can always be forked if they need to make changes (or contribute fixes).
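For what it's worth, Bundler makes the fork-and-pin workflow pretty painless. A hypothetical Gemfile fragment - gem name and fork URL are made up:

```ruby
# Pin a dependency to your own fork at an exact commit, so upstream
# churn can't change what you deploy.
gem 'some_http_client',
    git: 'https://gitlab.com/yourorg/some_http_client.git',
    ref: 'a1b2c3d'
```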


> (never a few lines)

You'd be surprised what you can do by carefully considering what the desired outcome actually needs to be.

Maybe there is justification for all the gems in gitlab's Gemfile - I didn't go through it with a fine-tooth comb - but this reaffirms my experience that complex projects outgrow monolith codebases. Having an infrastructure outage take down your entire business is kind of a symptom of that.


> I've always felt that if you're going above 100, you should carefully consider how much your codebase is trying to achieve.

This is a mindset issue. Some communities reject NIH so strongly that you get the opposite problem: everything depends on hundreds of different developers. Gitlab can start some library forks with more stuff integrated, or change communities. Microservices can't help here, as all the dependencies will stay just where they are (Gitlab is already decoupled to some extent).

But, anyway, most of those are stable¹, and I doubt many of Gitlab problems come from dependencies.

1 - They are unbelievably stable for somebody coming from the Python world. When I first installed Gitlab, I couldn't believe how easy it was to get a compatible set of versions.


I see the opposite of NIH especially in the RoR/Ruby world and I don't think it's always a good thing. Developers reach for a library for one piece of functionality in a discrete area of the codebase when they could have achieved the same functionality with a few lines of code. That's not automatically NIH, that's being pragmatic about the dependencies you're bringing in and are going to need to support moving forward.


It is fairly large, but I still find it more organized than some examples I've seen.

Also, I don't see another issue that's very common with big gemfiles: multiple solutions for the same thing (e.g. multiple REST clients, DB mockers, etc).


I've considered setting up gitlab locally, and have a couple of students who are trying to set it up on a VPS. Customizing their bundled installer is... an interesting learning experience in managing complex *nix servers.

I think it's telling that their standard offering/suggestion for self-hosters is as complex as it is. While on the one hand I applaud the poor soul who maintains the script that tries to orchestrate five(?) services on a general, random unix/linux server without any knowledge of, or assumptions about, what else is running there -- it unsurprisingly falls over in "interesting" ways when you try to do radical stuff like install it on a server that runs another copy of nginx with various vhosts etc.

Now, running services like gitlab at "Internet scale" is far from trivial - but running it at "office scale" should be.

I fully understand how gitlab ended up where they are -- but ideally, the self-host version should just need to be pointed at a postgresql instance and be more or less a "gem install gitlab" away -- popping up with some ruby web-server on a high port on localhost -- and come with a five-line "sites" config for nginx and apache for setting up a proxy.
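Something like this is about all the proxy config that setup would need -- a sketch, assuming the app listens on localhost:8080 and using a made-up hostname:

```nginx
server {
    listen 80;
    server_name gitlab.example.com;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```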

I really don't mean to complain - it's great that they try to provide an install that is "production ready" -- but if the installer reflects the spirit of how they manage nodes on the gitlab.com side, I'm surprised they manage to do any updates at all with little downtime...

For now I'm running gogs - and it seems to be more of a "devops" developed package - where deployment/life-cycle has been part of the design/development from the start. Single binary, single configuration file. Easily slips in behind and plays well with simple http proxy setups.

At some point I'll find a day or two to migrate our small install to gitlab (we could use the end-user usability and features) -- but I know I'll need to have some time for it. Time to migrate, time to test the install, time to test disaster-recovery/reinstall from backup... all those steps are slowed down and become more complex when the stack is complex.

(I'll probably end up letting gitlab have a dedicated lxc container, although I'll probably at least try to figure out how to reliably use an external postgres db -- it pains me to "bundle" a full fledged RDBMS. These things are the original "service daemons", along with network attached storage and auth/authz (LDAP/AD etc)).


LOL. GitHub is also a RoR shop.


It might be. I'm not saying it's impossible to scale Rails. It's just very, very hard. Github can do this, because they probably get the best of the best engineers. They even used to have their own, patched Ruby version.

Not everyone can afford that.


Why do you question Rails when the entire report is about Postgres only?

And as someone working on one of the biggest and oldest Rails codebases out there, I can tell you that in terms of scaling, Rails is the least of our concerns.

Sure, it's not as efficient, so it's gonna cost you more in CPU and RAM, but it's trivial to scale horizontally. The real worry is the databases; they are fundamentally harder to scale without tradeoffs.

As for the patched Ruby, we used to have one too (but our patches landed upstream, so now we run vanilla). It's not about allowing you to scale at all. It's simply that once you reach a certain scale, it's profitable to pay a few engineers to improve Ruby's efficiency. If you have 500 app servers, a 1 or 2% performance gain will save enough to pay those engineers' salaries.


Depending on hundreds of gems means you are depending on the decisions of hundreds of developers with packages which are in constant churn.

Apps like Gitlab and Discourse that depend on hundreds of gems and require end users to have a complex build environment and compile software are, I think, operating a broken, user-hostile model.

The potential for compilation failures, version mismatches and Ruby oddities like RVM is so gigantic, with hundreds of man-hours wasted, that one is left to conclude they may actually want to run a hosting business and not have users deploy themselves.

Compare that to Go or even PHP, where things are so many orders of magnitude simpler that it is not even the same thing. To deal with this complexity you now have containers, but have you solved the complexity or added another layer of it? There are technical, but I think also social, factors at play here.


Regardless of whether I agree or disagree with your critique, it has absolutely no relevance in the context of the current outage.

You don't like Ruby / Rails, we get it. But that's totally off topic.


I don't think it's that. GitLab IS a complex setup, and Rails is not helping to make it simple. There is a ton going on in the stack, and the company only has limited resources.


It's not hard to scale a Rails server compared to other frameworks and languages. It's exactly the same as scaling a server written in Java, Node.js, Python, or any other language: you just spin up more machines and put them behind a load balancer.
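Concretely, "more machines behind a load balancer" is just a few lines of nginx -- hostnames here are made up for illustration:

```nginx
upstream rails_app {
    least_conn;                 # route each request to the least-busy app server
    server app1.internal:3000;
    server app2.internal:3000;
    server app3.internal:3000;
}
server {
    listen 80;
    location / {
        proxy_pass http://rails_app;
        proxy_set_header Host $host;
    }
}
```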

Yes, Ruby is slower than other programming languages, but this usually doesn't matter. If you are charging people to use your software, or even if you are serving ads, you will always be making money before you need a second server. Plus, Rails is super productive, so you'll be able to build your product much faster.

I'm not sure why GitHub used a patched Ruby version, but no, that's not necessary.

Having said all of that, I'm moving towards Elixir and Phoenix. Not just because of the performance, but also because I really like the language and framework.


Nah this is just about having a robust backup system
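Right -- and the part of "robust" that bites everyone is restore testing. A minimal sketch of the discipline, using tar on a scratch directory as a stand-in (for Postgres you'd do the same dance with pg_dump and a restore into a throwaway instance):

```shell
set -eu

SRC=$(mktemp -d)      # stand-in for the data you care about
RESTORE=$(mktemp -d)  # throwaway restore target
BACKUP=$(mktemp)

echo "important data" > "$SRC/data.txt"

# 1. Take the backup.
tar -czf "$BACKUP" -C "$SRC" .

# 2. Actually restore it somewhere else.
tar -xzf "$BACKUP" -C "$RESTORE"

# 3. Verify the restore matches the source -- the step that gets
#    skipped, and the only one that proves you have a backup at all.
diff -r "$SRC" "$RESTORE" && echo "backup verified"
```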


I have searched the gitlab website and repositories looking for processes and procedures addressing change management, release management, incident management, or really anything. I have found work instructions, but no processes or procedures. Until you develop and enforce some appropriate processes and the resulting procedures, I'm afraid you will never be able to deliver and maintain an enterprise-level service.

Hopefully this will be the learning experience which allows you to place an emphasis on these things going forward, so you don't fall into the trap of thinking formal processes and procedures are somehow incongruent with speedy time to market or technological innovation, or in conflict with DevOps.


Like you, I would like to add my 2 cents, which I hope will be taken positively, as I would like to see them provide healthy competition for GitHub for years to come.

Since GitLab is so transparent about everything, from their marketing/sales/feature proposals/technical issues/etc., they make it glaringly obvious, from time to time, that they lack very fundamental core skills needed to do things right/well. In my opinion, they really need to focus on recruiting top talent with domain expertise.

They (GitLab) need to convince those who would work for Microsoft or GitHub to work for GitLab. With their current hiring strategy, they are getting capable employees, but they are not getting employees who can help solidify their place online (gitlab.com) and in the Enterprise. The fact that they were so nonchalant about running bare metal, and talked about implementing features that they have no basic understanding of, clearly shows the need for better technical guidance.

They really should focus on creating jobs that pay $200,000+ a year, regardless of living location, to attract the best talent from around the world. Getting 3-6 top people who can help steer the company in the right direction can make all the difference in the long run.

GitLab right now is building a great company to help address low-hanging-fruit problems, but not a team that can truly compete with GitHub, Atlassian, and Microsoft in the long run. Once the low-hanging-fruit problems have been addressed, people are going to expect more from Git hosting, and this is where Atlassian, GitHub, Microsoft and others that have top talent/domain expertise will have the advantage.

Let this setback be a vicious reminder that you truly get what you pay for and that it's not too late to build a better team for the future.


> They really should focus on creating jobs that pay $200,000+ a year, regardless of living location

For those who haven't been following along, Gitlab's compensation policy is pretty much intentionally designed to not pay people to live in SF. It's a somewhat reasonable strategy for an all remote company. But they seem to have some pretty ambitious plans that may not be compatible with operating a physical plant.


> pretty ambitious plans

I would point you to some very ambitious feature proposals on their issue tracker, but I can't for obvious reasons. I think GitLab is at a crossroads and this setback might be the eye-opener they need. Moving forward, they really need to re-evaluate how they develop and evolve GitLab, for both online and Enterprise.

This idea of releasing early and on the 22nd works very well for low-hanging-fruit problems, but not for the more ambitious plans they have. If they understood the complexity of some of the more ambitious plans, they would know they are looking at at least a year of R&D to create an MVP.

I think it makes sense to keep doing the release on the 22nd, but they also need to start building out teams that can focus on solving more complex problems that can take months, or possibly a year, to see fruition. Git hosting has reached a point where differentiating factors can be easily copied and duplicated, so you are going to need something more substantive to set yourself apart from the rest. And this is where I think Microsoft may have the upper hand in the future.


> I think GitLab is at a crossroads and this setback might be the eye-opener they need. Moving forward, they really need to re-evaluate how they develop and evolve GitLab.

Judging by their about team[1] page, they are currently short an Infrastructure Director. When you read their job listings, even for DBAs and SREs, it's all "scale up and improve performance," with very little "improve uptime, fight outages." One assumes it's upper management approving the job descriptions, so the missing emphasis on uptime and redundancy probably pervades the culture. And again, judging by the team profile, they've hired very few DBA/SRE experts, and instead appear to have assigned Ruby developers to the tasks.

Perhaps they simply have to bet the farm on scaling much larger to sustain the entire firm, which is troubling for enterprise customers and for teams like mine running a private instance of the open source product. I should probably review the changelog podcast interview[2] with the CEO and see if any quotes have new meaning after today.

[1]: https://about.gitlab.com/team/ [2]: https://changelog.com/podcast/103


What is Microsoft doing in this space? I honestly don't know, so not trying to be a jerk.



> Gitlab's compensation policy is pretty much intentionally designed to not pay people to live in SF.

What do you mean? They pay people in SF much more than in other cities because of the high cost of living. I'd consider working for Gitlab if I lived in SF; living in Berlin, it's not an option.


Look, I love GitLab. Gitlab was there for me when both my son and I got cancer, and they were more than fair with me when I needed to get healthy and planned to return to work. I have nothing but high praises for Sid and the Ops team.

With that said, I'll agree that the salary realities for GitLab employees are far below the base salary expected for a senior-level DevOps person. I've got about 10 years of experience in the space, and the salary was around $60K less than what I had been making at my previous job. I took the job at GitLab because I believe in the product, believe in the team, and believe in what Gitlab could become...

With that said, starting from Day 1 we were limited by an Azure infrastructure that didn't want to give us disk IOPS, legacy code and build processes that made automation difficult at times, and a culture that proclaimed openness but didn't really seem to be that open. Some of the moves they've made (OpenShift, rolling their own infrastructure, etc.) have been moves in the right direction, but they still haven't solved the underlying stability issues -- and these issues are a marathon, not a sprint.

They've been saying that the performance, stability, and reliability of gitlab.com is a priority -- and it has been since 2014 -- but adding complexity to the application isn't helping. If I were engineering management, I'd take two or three releases and just focus on .com. Rewrite code. Focus on pages that take longer than 4 seconds to return and rewrite them; when you've got all of that, work on getting it down to three seconds. Make gitlab so that you can run it on a small DO droplet for a team of one or two people. Include LE support out of the box. Work on getting rid of the omnibus insanity. Caching should be a first-class citizen in the Gitlab ecosystem.

I still believe in Gitlab. I still believe in the Leadership team. Hell, if Sid came to me today and said, "Hey, we really need your technical expertise here, could you help us out," I'd do so in a heartbeat -- because I want to see GitLab succeed (because we need to have quality open source alternatives to Jira, SourceForge Enterprise Edition, and others).

Not trying to be combative, but, "You truly get what you pay for" seems a little vindictive here -- the one thing that I wish they would have done was be open with the salary from the beginning -- but, Sid made it very clear that the offer that he would give me was going to be "considerably less" than what I was making.


> They really should focus on creating jobs that pay $200,000+ a year, regardless of living location, to attract the best talent from around the world. Getting 3-6 top people who can help steer the company in the right direction can make all the difference in the long run.

SIGN ME UP! That would be a freaking great opportunity!!


> SIGN ME UP! That would be a freaking great opportunity!!

I think you asking for the job might be a signal that you are not who they are looking for :-)


Yup - top talent is already making more. Gitlab needs to recruit with purpose (this is what we're doing and why), environment (remote-first, transparency, etc), and pay (we can match 70% of what you'd get at XYZ Company). Right now, it feels like they're capped at 30-50% of what someone could make at a big org, which is a drop in salary most people would never take, regardless of the company's values/purpose.

One alternative idea would be to hire consultants on a temporary basis. You may not be able to pay $250k a year, but you could pay a one-time $40k fee to review the architecture and come up with a prioritized strategy for disaster recovery and scalability.


Why would they try to recruit from Microsoft? Most of the software engineers at Microsoft are not focused on developing scalable web services architectures. And the ones that do have built up all of their expertise with Microsoft technologies (.NET running on Windows Server talking to MSSQL).

>Microsoft and others that have top talent/domain expertise will have the advantage.

Again, Microsoft isn't even in this same field (git hosting) or if they are, are effectively irrelevant due to little market/mindshare. Are you an employee there or something?


> Most of the software engineers at Microsoft are not focused on developing scalable web services architectures.

Uh, MS literally runs Azure, which may not be the biggest IaaS offering, but is certainly vastly larger and more complex than Gitlab. There are certainly numerous engineers at MS who would have experience relevant to Gitlab (though perhaps not with their particular tech stack). It may not be most of the engineers there, but in a company with literally tens of thousands of engineers, there are few things that will be true of most of them.

> Microsoft isn't even in this same field (git hosting)

How is what they're hosting at all relevant to the problem at hand? This could have happened regardless of what the end product was - it's a database issue. In fact, the git infrastructure was explicitly not involved in this issue - it was only their DB-backed features that had data loss.

Additionally, Microsoft is in the business of git hosting, if only tangentially. TFS supports git, and has since 2013: https://blogs.msdn.microsoft.com/mvpawardprogram/2013/11/13/... Your objection is both unkind and factually incorrect. The "mindshare" comment is a bit silly - even though they may not be as active on forums like HN, developers working on MS technologies are still one of the largest groups in programming (as a non-MS developer looking for work in the Pacific Northwest, this is something I'm constantly reminded of). I doubt your estimate of Microsoft's real mindshare is anything close to accurate.

> Are you an employee there or something?

This accusation is eminently not in the spirit of HN, and Microsoft was hardly the only company he mentioned. Whatever your personal vendetta against them, it's absurd to think that Microsoft is not one of the top pools of talent in tech - they're a huge company with a vast variety of offerings and divisions.


> Why would they try to recruit from Microsoft?

I'm not sure if you read my post correctly, but I never mentioned poaching from Microsoft. I said compete for programmers that would choose to work for Microsoft. I'm also not sure if you understand what Microsoft does. It's a very diverse company with R&D spending that rivals some small nations.

> Microsoft isn't even in this same field (git hosting)

I guess you haven't heard of https://www.visualstudio.com/team-services/ and their on premise TFS solution that supports Git.

Microsoft understands Enterprise, and it's quite obvious they want to be a major provider of Git hosting. It would be foolish to believe Microsoft is not focused on owning the Git mindshare in Enterprise.

> Are you an employee there or something?

No. Just somebody that understands this problem space.


One of the main drivers of revenue for Microsoft is Office 365, with 23.1 million subscribers[0]. Along with Azure, MS runs some of the largest web services around. Most developers at MS don't necessarily work on these products, but to say that all the devs working on them use a simple .NET stack + SQL Server is discrediting a lot of work that they do.

Disclaimer: I work for Microsoft in the Office division and opinions are my own

[0] https://www.microsoft.com/en-us/Investor/earnings/FY-2016-Q4...


>I work for Microsoft in the Office division

Hey there, honest question incoming. Any chance of you chaps making Word a better documentation tool in the future? Edit history storing formatting and data changes on the same tree makes it impossible to use Word for anything serious. This really comes to light once you start working on documentation at an MS tech company, where it is obvious that you should use MS products for work. Some tech writers I know just end up using separate technology branches for their group efforts, since neither Sharepoint nor Word is a professional tool for this job.


Hotmail, MSN, Skype, msdn.com, microsoft.com, the Windows Update Servers, Azure.

Microsoft has a ton of people with experience in building cloud systems, either in-house people or people from acquisitions.

Microsoft has so many employees and domains of activity that you can probably find an engineer for any domain you're looking for.


>us devopish types are both quite costly and quite rare - but I think operating the service as they do today seriously tarnishes the Gitlab brand.

The sad thing is it doesn't have to be this way. Software stacks and sysadmin skills are out there for the learning, but due to the incentive of moving jobs every two years, nobody wants to invest in making those people; we all know we'll find /someone/ to do it anyway.


I think they are running to catch up on the gitlab system itself, let alone running it as a production service. The bugs in the last few months have been epic: backups not working, merge requests broken, chrome users seeing bugs, chaotic support. Basically their QA and release processes are not remotely enterprise-ready.


If I understand correctly, the public Gitlab is similar to what you can get with a private Gitlab instance. That makes me wonder, instead of trying to scale the one platform up, would it be OK to spin up a second public silo? I mean yeah, it would be a different silo, but for something free I'd say "meh".

I think it's totally fine admitting when you've stopped being able to scale up, and need to start scaling out.


They could, and as a stopgap measure that might work, but..

(1) Some of the collaboration features (e.g. work on Gitlab itself) depend on having everyone on the same instance.

(2) Gitlab.com gives them a nice dogfood-esque environment for what it's like to actually operate Gitlab at scale. If they are having problems scaling it, then potentially so are their customers. Fixing the root cause is usually a good thing and is often an imperative to avoid being drowned in technical debt.

(3) It moves the problem around in some respects. Modern devops techniques mostly make the number of like servers largely irrelevant, but still: the more unique instances of Gitlab, the more overhead there will be managing those instances (and figuring out which projects/people go on which instances).

It's a simple approach which I'm sure would work, but it also means a bunch of new problems are introduced which don't currently exist.


There is a free version and a paid version.

The one they offer for free is the paid version.

You can run your own, but you won't have every feature unless you pay.



