
This is a great write-up and I love all the different ways they collected and analyzed data.

That said, it would have been much easier and more accurate to simply put each laptop side by side and run timed compilations on the exact same scenarios: a full build, an incremental build of a recent change set, an incremental build touching a module that must be rebuilt, and a couple more.

Or write a script that steps through the last 100 git commits, applies them incrementally, and does a timed incremental build to get a representation of incremental build times for actual code. It could be done in a day.
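
A rough sketch of what that script could look like in Python, assuming a plain git checkout and a make-style build (the repo path and build command below are placeholders, not anything from the post):

    #!/usr/bin/env python3
    # Sketch: replay the last 100 commits and time an incremental build
    # after each one. REPO and BUILD_CMD are placeholders for your project.
    import subprocess, time

    REPO = "/path/to/repo"       # placeholder
    BUILD_CMD = ["make", "-j8"]  # placeholder build command

    def git(*args):
        return subprocess.run(["git", "-C", REPO, *args], check=True,
                              capture_output=True, text=True).stdout.strip()

    # The 100 most recent commits, oldest first; start from their parent.
    commits = git("rev-list", "--max-count=100", "--reverse", "HEAD").splitlines()
    git("checkout", "--detach", f"{commits[0]}~1")
    subprocess.run(BUILD_CMD, cwd=REPO, check=True)  # warm build, so the rest are incremental

    for sha in commits:
        git("checkout", "--detach", sha)  # apply the next commit's changes
        start = time.monotonic()
        subprocess.run(BUILD_CMD, cwd=REPO, check=True)
        print(f"{sha[:10]}  {time.monotonic() - start:7.1f}s")

Run the same script on each laptop and you get directly comparable numbers for the same real workloads.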

Collecting company-wide stats leaves the door open to significant biases. The first that comes to mind is that newer employees will have M3 laptops while the oldest employees will be on M1 laptops. While not a strict ordering, newer employees (with their new M3 laptops) are more likely to be working on smaller changes while the more tenured employees might be deeper in the code or working in more complicated areas, doing things that require longer build times.

This is just one example of how the sampling isn’t truly as random and representative as it may seem.

So: cool analysis, and fun to see the way they've used various tools to examine the data. But due to inherent biases in the sample set (older employees have older laptops, notably), I think anyone looking to answer these questions should start with the simpler method of benchmarking recent commits on each laptop before spending a lot of time architecting company-wide data collection.



I totally agree with your suggestion, and we (I am the author of this post) did spot-check the performance for a few common tasks first.

We ended up collecting all this data partly to compare machine-to-machine, but also because we want historical data on developer build times and a continual measure of how the builds are performing so we can catch regressions. We quite frequently tweak the architecture of our codebase to make builds more performant when we see the build times go up.
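
For the regression-catching part, even something as simple as comparing a recent median build time against a trailing baseline goes a long way. Purely as an illustrative sketch (not our exact implementation):

    # Purely illustrative: flag a build-time regression when the recent
    # median exceeds a trailing baseline median by some threshold.
    import statistics

    def regressed(baseline, recent, threshold=1.15):
        """True if the recent median build time is >15% above baseline."""
        return statistics.median(recent) > statistics.median(baseline) * threshold

    # Example with made-up timings in seconds:
    print(regressed([62, 58, 65, 60, 61], [71, 74, 69, 73, 75]))  # True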

Glad you enjoyed the post, though!


I think there's something to be said for the fact that the engineering organization grew through this exercise - experimenting with using telemetry data in new ways that, when presented to other devs in the org, likely helped them all level up and think differently about solving problems.

Sometimes these wandering paths to the solution have multiple knock-on effects in individual contributor growth that are hard to measure but are (subjectively, in my experience) valuable in moving the overall ability of the org forward.


I didn't see any analysis of network builds as an alternative to M3s. For my project (~40 million lines), past a certain minimum threshold it doesn't matter how fast my machine is; it can't compete with the network build our infra team provides.

So sure, an M3 might make my build 30% faster than my M1 build, but the network build is 15x faster. Is it possible instead of giving the developers M3s they should have invested in some kind of network build?


Network full builds might be faster, but would incremental builds be? Would developers still be able to use their favourite IDE and OS? Would developers be able to work without waiting in a queue? Would developers be able to work offline?

If you have a massive, monolithic, single-executable-producing codebase that can't be built on a developer machine, then you need network builds. But if you aren't Google, building on laptops gives developers better experience, even if it's slower.


i hate to say it but working offline is not really a thing at work anymore. it is no one thing, but a result of k8s by and large. i think a lot of places got complacent when you could just deploy a docker image, fuck how long that takes and how slow it is on mac


That depends entirely on the tech stack, and how much you care about enabling offline development. You can definitely run something like minikube on your laptop.


That is a very large company if you have a single 40-million-line codebase, maybe around 1,000 engineers or more? Network builds also take significant investment, usually in adopting something like Bazel plus a dedicated devex team to pull it off. Setting up build metrics to inform a purchasing decision (with the other benefits that come along) is a one-month project at most for one engineer.

It's like telling an indie hacker to adopt a complicated kubernetes setup for his app.


1,000 is a small company.


A company with ~1,000 software engineers is probably among the top 100 software companies by size in the world, especially if it is not a sweatshop consultancy, a bank, or a defense contractor, which are all usually large companies themselves.


I mean, the vast majority of software engineers in the world are not at software engineering companies. If we are purposefully limiting ourselves to Bay Area tech companies, then sure, I guess 1,000 software engineers is big. But the top 100 employers in the world, like you suggested, have 150,000 to 250,000 employees. For a company that size, 1,000 programmers for internal CRUD and systems integration is quite realistic; the IT staff alone is around 5,000 people, and that is not even accounting for the actual business (which these days almost always has a major software component).

This is also not counting large government employers like the military, the intelligence agencies, or large public services like the postal system.

1,000 software engineers is simply not that much.


They said 1000 engineers. Surely a company consists of roles other than software engineers, right?


Maybe, but I feel that's not the point here.


What do you mean by network build?


They probably mean tools like distcc or sccache:

https://github.com/distcc/distcc

https://github.com/mozilla/sccache



Dedicated build machines.


> This is a great write-up and I love all the different ways they collected and analyzed data.

> [..] due to inherent biases in the sample set [..]

But that is an analysis-methodology issue. It serves as a reminder that one cannot depend on AI assistants on a topic one is not knowledgeable enough about oneself. At least for the time being.

For one, as you point out, they conducted a t-test on data that were not independently sampled: multiple data points came from the same people, and there are very valid reasons to believe that different people work on tasks that are more or less compute-demanding, which confounds the data. This violates one of the fundamental assumptions of the t-test, and the code interpreter did not point it out. They could instead have modeled the data with what is called a "linear mixed-effects model", where things like the person a laptop belongs to (and possibly seniority, etc.) enter the model as random effects.
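
A sketch of what that could look like with statsmodels (the data file and column names here are hypothetical):

    # Sketch: a linear mixed-effects model instead of a t-test. Assumes a
    # table of individual build timings with hypothetical columns:
    # build_seconds, machine ("M1"/"M3"), and developer (whose laptop it is).
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("build_times.csv")  # placeholder data file

    # Fixed effect: machine type. Random effect: a per-developer intercept,
    # so repeated measurements from one person aren't treated as independent.
    model = smf.mixedlm("build_seconds ~ machine", df, groups=df["developer"])
    print(model.fit().summary())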

Nevertheless, it is all quite interesting data. What I found most interesting is the RAM-related part: caching can be very powerful, and extra RAM brings more benefit than people usually realise. Any laptop (or at least any MacBook) with more RAM than it strictly needs will, most of the time, have the surplus filled by cache.


I agree. It seems like they came up with the most expensive possible way to answer the question, for some reason. And why was the conclusion to upgrade M1 users to the more expensive M3s when M2s were deemed sufficient?


If employees are purposefully isolated from the company's expenses, they'll waste money left and right.

Also, they don't care, since any incremental savings aren't shared with the employees. Misaligned incentives. With that mentality, it's best to take what you can while you can.


Are M2s meaningfully cheaper? M1s are still being sold at their launch price.


Because M2s are no longer produced.


I would think you would want to capture what was built and how, something like:

* Repo started at this commit

* With this diff applied

* Build was run with this command

Capture that for a week. Now you have a cross-section of real workloads, and you can repeat the builds on each hardware tier (and even on new hardware down the road).
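
A minimal sketch of the capture side in Python, wrapping whatever build command is used (the log file name and record fields are illustrative):

    # Sketch: wrap the build command and log commit, uncommitted diff,
    # command, and timing so the workload can be replayed on other hardware.
    import json, subprocess, sys, time

    def git(*args):
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    build_cmd = sys.argv[1:]  # e.g. python log_build.py make -j8
    record = {
        "commit": git("rev-parse", "HEAD").strip(),
        "diff": git("diff", "HEAD"),  # the uncommitted changes at build time
        "command": build_cmd,
    }
    start = time.monotonic()
    subprocess.run(build_cmd, check=True)
    record["seconds"] = round(time.monotonic() - start, 1)

    with open("build-log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")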


The dev telemetry sounds well-intentioned… but in 5-10 years, will some new manager come in and use it as a productivity metric or a work-habit tracking technique, officially or unofficially?



