Hacker News

Can you elaborate on why it would be simpler to back up terabytes of files instead of just one?


Not GP, but one disadvantage of updating one huge file is that it's harder to do efficient incremental backups. Theoretically it can still be done if your backup software supports something like content-defined chunking (there was a recent HN thread about Google's rsync-with-fastcdc tool). If you store your assets as separate files instead, though, you can trivially do incremental backups with off-the-shelf software like plain old rsync [1].

[1]: https://www.cyberciti.biz/faq/linux-unix-apple-osx-bsd-rsync...


> there was a recent HN thread about Google's rsync-with-fastcdc tool

Was this the tool & thread you meant? https://news.ycombinator.com/item?id=34303497


Yeah that's the one!


Although for SQLite in particular, you can do streaming incremental backups with Litestream: https://litestream.io/
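For reference, a minimal Litestream setup looks something like this (the bucket name and database path are hypothetical):

```shell
# Sketch: continuously replicate a SQLite database to S3 with Litestream.
# Paths and bucket are illustrative, not a real deployment.
cat > /etc/litestream.yml <<'EOF'
dbs:
  - path: /var/lib/app/app.db
    replicas:
      - url: s3://my-backup-bucket/app.db
EOF

# Runs in the foreground, streaming WAL changes to the replica.
litestream replicate
```

Because it ships WAL frames as they're written, you get near-continuous incremental backups without ever re-copying the whole database file.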


Wow, that is actually an amazing performance curiosity, adding parallelism to the mix. I guess this would depend on the M.2 spec?


If you're using 16 PCIe 4.0 lanes you max out at about 32 GB/s, although commercial drives tend to have much lower throughput than that maximum (~7.5 GB/s for a good NVMe drive). Cat6a Ethernet tops out at 10 gigabits per second, and plenty of earlier versions have lower caps, e.g. 1 gigabit. My guess is you'll most likely be limited by either disk or network hardware before needing CPU parallelism, if all you're doing is copying bytes from one to the other.
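A quick back-of-the-envelope check of those numbers (using ~2 GB/s per PCIe 4.0 lane as a rough figure):

```shell
# 10 Gbit/s Ethernet in MB/s: 10000 Mbit/s divided by 8 bits per byte.
echo $(( 10000 / 8 ))    # 1250 MB/s -- well under a ~7500 MB/s NVMe drive

# PCIe 4.0 x16 at roughly 2000 MB/s per lane.
echo $(( 16 * 2000 ))    # ~32000 MB/s theoretical bus maximum
```

So on a 10 GbE link the network is the bottleneck by a factor of roughly six against even a single good NVMe drive.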


The other being a network socket in this case? But that socket might be two servers away? Meh, ideally they've optimized that as well.

So it's absolutely a network problem, which means custom fiber?


Oh, sorry — by "copying bytes from one to the other," I meant copying bytes from the disk to the network interface controller on the same physical computer. It's true that beyond that it'll depend on the network topology connecting you to where you want the data to be, and how fast the machines in between and on the other end are!

I don't know enough about custom fiber to know whether that will help stretch past being network-bottlenecked — most NICs max out at 10 gigabits/second, but I've heard of faster ones. Eventually you might be able to make yourself disk-limited... Either way, backing up one file is probably easier than backing up a zillion files scattered around the filesystem.



