Go's default toolchain is fine, everything else is optional. Some questionable advice in the article:
- Vendoring dependencies using "go mod vendor" is not a good default workflow - it bloats the repo, the checked in code is impossible to review, and is generally a pain to keep up to date. Don't, unless you really have to.
- There's no point in stripping a binary or even using UPX on it unless you're targeting extremely low-memory environments (in which case Go is the wrong tool anyway); all it'll do is make it harder to debug.
I'm on the vendor bandwagon; always have been. I don't want a github outage to dictate when I can build/deploy. Yes, that happened. That is why we vendor :).
Now you could set up a proxy server, but I don't want to do that. I'm pretty sure I have a few vendored packages that no longer exist at their original import path. For code reviews, we put off checking in the vendor path until the end if possible.
I have to strongly agree. Third party repos move, code on the internet disappears or silently changes, connectivity goes away at the most awkward time. You always want a point-in-time copy of your code and all dependencies under your control. Sometimes even for legal or security reasons.
Always vendor your dependencies in your private Git repo or a proxy you control. Or heck, even in some long term backup solution if you must. Experience trumps theory.
> I don't want a github outage to dictate when I can build/deploy. ...I'm pretty sure I have a few vendored packages that no longer exist at their original import path.
Go now has an automatic, transparent caching proxy at proxy.golang.org (the module proxy the toolchain uses by default; pkg.go.dev is the companion documentation site). If your build has ever worked, it should continue to work even if the original source goes away. Your build should only break if proxy.golang.org goes down and the upstream source is also unavailable (down or moved).
I do all my vendoring through a "cache-proxy" box (covering lots of upstreams). That box always runs; I only need the upstream the first time I fetch a package. It doesn't bloat my code, guarantees packages stay available, and makes audits of vendored stuff easy.
UPX only means smaller files on disk, and it comes at a cost: it tends to increase memory requirements, because the compressed binary can no longer be mapped directly into memory (unless it's first decompressed somewhere in the filesystem).
Worse, if you run multiple instances of the same binary, none of them can be shared.
A bit simplified: without UPX, 100 processes of a 100 MB binary require only 100 MB of RAM for the code; with UPX, 10 GB.
Edit: In reality, likely only a fraction of that 100 MB actually needs to be mapped into memory, so without UPX the true memory consumption is even less than 100 MB.
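The back-of-the-envelope arithmetic above can be sketched as a tiny Go program (the numbers are the comment's illustrative ones, not measurements):

```go
package main

import "fmt"

// residentCodeMB estimates the RAM needed for the code pages of n copies of
// a binary. A plain binary's file-backed pages are shared by the kernel, so
// the cost is paid once; a UPX'd binary decompresses into private memory in
// every process, so the cost scales linearly.
func residentCodeMB(binaryMB, processes int, upxed bool) int {
	if upxed {
		return binaryMB * processes
	}
	return binaryMB
}

func main() {
	fmt.Println("plain:", residentCodeMB(100, 100, false), "MB") // shared pages
	fmt.Println("upx'd:", residentCodeMB(100, 100, true), "MB")  // ~10 GB
}
```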
All true, but I think a compressed iso9660fs can actually support dynamic paging - the pages are decompressed into memory, obviously, but can be demand paged without staging them to media.
Can you expand on this a bit? I use upx at work to ship binaries. Are you saying these binaries have different memory usage upx’d than they do otherwise?
Normally the operating system simply maps binaries, executables, and loadable libraries (.dylib, .so, .dll, etc.) into memory. The cost is approximately the same whether you do this once or 1000 times, and the code is executed from the mapped area as-is.
However, when a binary is compressed, this cannot work, because the file holds compressed data. The only workaround is to allocate some memory, decompress the binary there, map the region as executable, and run it from there. This results in a non-shareable copy of the data for each running instance.
Also impacts startup time. Really it's only appropriate for situations like games where you're very confident there will be just one instance of it, and it'll be long-running.
And even then, it's of dubious value when game install footprints are overwhelmingly dominated by assets rather than executable code.
I'm curious, was the practice of using upx there before you got there? We generally A/B test changes like this pretty thoroughly by running load tests against our traffic and looking at things like CPU and Memory pressure in our deploys.
While there are valid arguments against vendoring dependencies, I’m not convinced this is one of them in the typical case. It’s exceptionally easy to ignore certain directories when reviewing PRs in GitHub (although I still wish this was available as a repo-level preference), and I’d hope at least this would be the same in Gitlab, BitBucket, etc. I don’t review vendored dependencies, and I wouldn’t expect anyone else to, although the utility of that is admittedly domain-dependent.
Go also has the benefit that its dependencies tend not to be in deep chains, so the level of repo bloat when vendoring is usually not too terrible, at least relatively speaking.
Yeah, if you have a problem with it split it into two separate commits to review separately.
But WTF is this about not reading your dependencies? Read your dependencies! It is the most amazing superpower when someone says "uh, I don't know how Redux handles that" and you can just tell them, because you have read Redux. It's also how you'll know: do they have tests? Are they doing weird things like monkeypatching classes or binaries at runtime? "Oh, the request is lazy: it doesn't get sent unless you attach a listener for the response." What would it look like for the debugger to step through their code, and is that reasonable for me to do, or will I end up 50 layers deep in the call stack before the code actually does the thing?
I get it, this dependency is 100,000 LOC; if you printed it out, that's basically 5 textbooks of code, and you'd need a year to read all of it and truly understand it... Well, don't use that dependency! "But I need it for dependency injection..." If that's all, then use a lightweight one, or roll a quick solution in a day, or explicitly inject your dependencies in 5 pages of code. My point is just that you have so many options. If that thing is coming in at 5 or 50 textbooks or whatever it is, what it actually means is that you are pulling in something with a huge number of bells and whistles and you plan on using 0.1% of them.
In this context, what would be useful is something like linker pruning at the source level.
That is, when your code is compiled, the linker can prune code that is never called. A feedback mechanism could then show which parts of the code are actually used (like looking at the linker's .map file).
Google's Closure compiler was doing this for JavaScript, where it matters because network bandwidth is a limited resource in some places. There it was called “tree shaking” if you want the jargon name for it.
There is a benefit to using "go mod vendor". Some corporate environments lock down their CI/CD pipelines. By vendoring everything, the CI/CD does not need to make external HTTP calls.
So, I don't bother with vendoring my dependencies (usually), but you have it the wrong way round.
Vendoring makes it more likely you're going to review the changes, because you can quickly eyeball whether or not the changes look significant, which is something you often won't get out of a go.sum change.
That's not totally without cost though, as it can break workflows that cherry-pick commits between branches, e.g. a main/master branch vs. stable release branches.
I don't think anyone is saying it's without cost, just that there are certain circumstances where you might want to bear the cost.
There's a general question of how you build confidence that your dependencies aren't compromised, and there are steps you can take to mitigate that without reading code. But if everyone adopted that stance, we'd likely have no mitigations at all.
If the problem is distribution, what's wrong with gzip? All the upside of UPX and none of the downsides. If your distribution method is HTTP, you don't even have to write any code beyond setting a Content-Encoding header.
I don't really believe that; at NIC speeds it makes pretty much zero difference, even on 30k servers. Shaving a couple of ms, at worst a few seconds, versus modifying a binary: definitely not worth it.
The servers are not all on gige. Many are on 100mbit and yes, that saturates the network when they are all updating. I learned through trial and error.
The updates are not pushed, they are pulled. Why? Because the machines might be in some sort of rebooting state at any point. So trying to first communicate with the machine and timeouts from that, would just screw everything up.
So, the machines check for an update on a somewhat random schedule and then update if they need to. This means that a lot of them updating at the same time would also saturate the network.
I’m curious why you’ve got servers on 100Mb. Last time I ran a server on 100Mb was more than 20 years ago. I remember the experience well because we needed AppleTalk support which wasn’t trivial on GbE (for reasons unrelated to GbE — but that’s another topic entirely).
What’s your use case for having machines on 100Mb? Are you using GbE hardware but dropping down to 100Mb, and if not, where are you getting the hardware from?
Sounds like you might work in a really interesting domain :)
Not the GP but edge devices on wifi/m2m are another scenario where you're very sensitive to deployment size.
Which can also be solved with compression at various other stages of the pipeline as mentioned by other commenters, but just to say that that's an easy case where this matters.
For large-ish scale distributed updates like that, maybe some kind of P2P type of approach would work well?
IBM used to use a variant of Bittorrent to internally distribute OS images between machines. That was more than a decade ago though, when I was last working with that stuff.
Another issue with that is that the systems I was running can go offline at any time. P2P, which could work, kind of wants a lot more uptime than what we had. It would just add some complexity to deal with individual downtime.
CI would run and build a binary that was stored as an asset in GitHub. Since the project is private, I had to build a proxy in front of it to pass the auth token, so I used CF Workers. GH also limits the number of downloads, so CF also worked as a proxy to reduce the connections to GH.
I then had another private repo with a json file in it where I could specify CIDR ranges and version numbers. It also went through a similar CF worker path.
Machines regularly/randomly hit a CF worker with their current version and ip address. The worker would grab the json file and then if a new version was needed, in the same response, return the binary (or return a 304 not modified). The binary would download, copy itself into position and then quit. The OS would restart it a minute later.
It worked exceptionally well. With CIDR based ranges, I could release a new version and only update a single machine or every machine. It made testing really easy. The initial install process was just a single line bash/curl to request to get the latest version of the app.
I also had another 'ping' endpoint where I could send commands to the machine, to be executed by my golang app (running as root). The machine would ping, and the pong response would be some json that I could use to do anything on the machine. I had a Postgres database running in GCP and used GCP Cloud Functions. I stored machine metrics and other individual worker data in there that just needed to be updated on every ping. So I could just update a column, and the machine would eventually ping, grab the command out of the column, and then erase it. It was all eventually consistent and idempotent.
At ~30k workers, we had about 60 requests per second 24/7 and cost us at most about $300 a month total. It worked flawlessly. If anything on the backend went down, the machines would just keep doing their thing.
Sounds like an interesting problem to have. Would something peer-to-peer like BitTorrent work to spread the load? Utilize more of the networks' bisectional bandwidth, as opposed to just saturating a smaller number of server uplinks. I recall reading many years ago that Facebook did this (I think it was them?)
> Vendoring dependencies using "go mod vendor" is not a good default workflow - it bloats the repo, the checked in code is impossible to review, and is generally a pain to keep up to date. Don't, unless you really have to.
Vendoring dependencies is a nice way of using private Go repositories as dependencies in CI builds without importing any security keys. Vendor everything from dev machine, and build it in CI. You don't even need an internet connection.
Sure, it makes sense. But that's another moving part in the machinery that you have to configure and maintain. It also makes sense to just keep things simple and vendor dependencies, sacrificing some extra space for simplicity of configuration. It just depends on what tradeoff you're looking for.
A Go vendoring pattern that I've found very useful is to use two repositories, the first for the main "project" repository, then a second "vendoring" repository that imports the first as a module, and also vendors everything.
This may require a few extra tricks to plumb through, for example, to make all cmd's be externally importable (i.e. in the project repository, transform "cmd/foo/%.go" from being an unimportable "package main" into an importable "cmd/foo/cmdfoo/%.go", then have a parallel "cmd/foo/main.go" in the vendoring repository that is just "func main() { cmdfoo.Main() }", same as you have in the project repository in fact).
Vendoring aside, this is also a useful pattern if you're "go:embed"ing a collection of build artefacts coming from another source, like a frontend HTML/JS/CSS project.
At this point, why not do the clean thing and have a forked repo per dependency. Setting up your "monorepo" like construct is as easy as a gitignore and a json file listing your dependencies and the specific hash, then have a script pull them and do a checkout.
This lifecycle is vastly cleaner and easier to update/control than vendoring, and also forces you to actually have explicit copies of everything your build needs in the same way that vendoring does, but in a cleaner, separated, traceable, manageable way.
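A minimal sketch of that manifest-driven setup (the json format, field names, and forge URL are invented): a file pins each fork to an exact commit, and a script turns it into clone/checkout steps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dep pins one forked dependency to an exact commit in your own forge.
type dep struct {
	Name   string `json:"name"`
	URL    string `json:"url"`
	Commit string `json:"commit"`
}

// commands renders the git steps that materialize the pinned deps into a
// gitignored third_party/ directory; a real script would exec these
// instead of printing them.
func commands(manifest []byte) ([]string, error) {
	var deps []dep
	if err := json.Unmarshal(manifest, &deps); err != nil {
		return nil, err
	}
	var out []string
	for _, d := range deps {
		out = append(out,
			fmt.Sprintf("git clone %s third_party/%s", d.URL, d.Name),
			fmt.Sprintf("git -C third_party/%s checkout %s", d.Name, d.Commit))
	}
	return out, nil
}

func main() {
	manifest := []byte(`[
	  {"name": "chi", "url": "git@yourforge:forks/chi.git", "commit": "0c9f4bd"}
	]`)
	cmds, err := commands(manifest)
	if err != nil {
		panic(err)
	}
	for _, c := range cmds {
		fmt.Println(c)
	}
}
```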
> - Vendoring dependencies using "go mod vendor" is not a good default workflow - it bloats the repo, the checked in code is impossible to review, and is generally a pain to keep up to date. Don't, unless you really have to.
Go's setup is that if you don't vendor your dependencies then your build might break at any time, no?
> proxy.golang.org does not save all modules forever. There are a number of reasons for this, but one reason is if proxy.golang.org is not able to detect a suitable license.
If you're vendoring something without an appropriate license, you're skating on thin ice legally.
That's just one possible reason. The disclaimer does not specify all the possible reasons the proxy would drop a saved version. Treating it more like a cache seems appropriate.
If you're doing something stupid like "create a clean virtual environment for every build", then yeah, your build might break when you lose the internet or the packages disappear. Just don't ever do that stupid thing.
You're not expected to review the committed dependencies any more than you're expected to review the external repositories every time you update go.mod/sum. If you don't care, just ignore those parts - if you do care, you were already doing it.
I'd go way farther than "a bit of a project smell." I literally cannot think of a single instance in which vendoring a dependency for any reason other than, say, caching it for CI so you don't have to worry that the maintainer pulls a `left-pad` on you, has gone well.
If the package has bugs, you're far better off either waiting for upstream fixes, working around the bug in your application code, or just switching to a different library. That goes double if the library you're using is missing a feature you need, even if it's scheduled for the next version release.
Unless you're prepared to maintain a full-on fork of the dependency (and, if you do, please make it public), everything about vendoring for these reasons is 100% bad for you for very little incremental benefit. It's like the joke about regular expressions ("You have a problem and think 'I'll use regexes to solve it.' Now you have two problems"), except it's not a joke, and it sucks way more.
TL;DR: Vendoring to cache for CI/build servers, yes. Any other reason, just don't; it's not worth the headaches.
> - Vendoring dependencies using "go mod vendor" is not a good default workflow - it bloats the repo, the checked in code is impossible to review, and is generally a pain to keep up to date. Don't, unless you really have to.
> - There's no point in stripping a binary or even using UPX on it unless you're targeting extremely low memory environments (in which case Go is the wrong tool anyways), all it'll do it make it harder to debug.