
Takeout is not really a great solution for backup, since it's not incremental. You're going to be wasting a looooot of space.

I use dedicated tools for backing up my most important Google data -- rclone for my Drive, and gmvault for my Gmail. And I'd use gphotos-sync if I used Google Photos (but I don't). And around once a year I use Takeout for the rest -- stuff like Calendar, Contacts, etc.
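For reference, a minimal sketch of those two sync commands, assuming an rclone remote named `gdrive:` has already been set up via `rclone config` and gmvault has been authorized for the account (remote name, paths, and address are placeholders):

```shell
# Incremental copy of Google Drive to a local directory.
# "gdrive:" is an assumed remote name created with `rclone config`.
rclone sync gdrive: /backups/google-drive

# Incremental Gmail backup; gmvault only fetches new/changed messages
# into its local database directory.
gmvault sync -d /backups/gmail user@example.com
```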

Takeout doesn't really fit the author's use case. It's intended for migrating your data to another service, not for regular backup.



Have you pointed borgbackup or similar at it? i.e. extract the archive to a specific directory, let borg create an archive of it, and then a month later do the same thing and see if the incremental size is egregiously large? I would expect the overwhelming bulk of the data to be media, and that will consume (nearly) zero incremental space with borgbackup or some other deduplicating backup system.
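As a sketch of what that would look like (repo path and archive filename are made up; assumes borg is installed):

```shell
# One-time: create a deduplicating borg repository
borg init --encryption=repokey /backups/takeout-repo

# After each Takeout download: extract, then archive it.
# Unchanged media files dedupe against chunks already in the repo,
# so the repo only grows by roughly the changed data.
# (takeout-20240101.zip is a hypothetical filename.)
mkdir -p /tmp/takeout && unzip takeout-20240101.zip -d /tmp/takeout
borg create --stats /backups/takeout-repo::takeout-2024-01 /tmp/takeout

# Check how much unique (incremental) space the archive actually used
borg info /backups/takeout-repo::takeout-2024-01
```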


I don't see what the point would be. You still have to perform the entire Takeout, on a disk that already has a previous Takeout, so you always need double the space, and you always need to spend days (?) downloading terabytes (?) of data.

Once you've downloaded the entire new Takeout, there's no reason to deduplicate -- just delete the old Takeout.


Ah right, it would still take a long time to download.

My use case is: I have a local NAS that I use for backup, but I also want things backed up offsite, so I mirror the backups to B2 (and soon to Glacier).

I would download and extract the takeout archive locally, then run borg with the NAS as the repo; borg deduplicates and only stores the incremental data.

If the takeout data consistently has enough of the same “shape”, the b2/s3 storage would only grow by roughly my incremental takeout archive size, rather than storing 200 more GB every time I export a takeout.

So yeah, it would use a lot of space locally and temporarily, but the idea for me is to minimize cloud storage while still being able to extract files from older takeout archives.
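A sketch of that pipeline, assuming a borg repo on the NAS mount and an rclone remote named `b2:` pointing at the bucket (all names are placeholders):

```shell
# Archive the extracted takeout into the borg repo on the NAS;
# only chunks not already present in the repo get stored.
borg create --stats /mnt/nas/borg-repo::takeout-{now:%Y-%m} /tmp/takeout

# Mirror the (already deduplicated) repo to B2, so offsite storage
# also grows only by the incremental size.
rclone sync /mnt/nas/borg-repo b2:my-bucket/borg-repo
```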


The reason for deduplication and incremental backups is that you can recover accidentally deleted photos.

You don't need to keep the previous backup on the disk, it's enough to have it on the backup destination (at least in the case of borg).


I don't mind that it isn't incremental, because it massively reduces the complexity and risk. I download a zip file and that just contains all my data. I don't need to keep all the past copies to build up the final dataset; that one zip just works. There is also no software managing it. If someone hacks my Google account and deletes my data, nothing is going to automatically sync that deletion to my local copy.


You're right that it's not the best. It would be better to have an incremental thing. For me it's just the easiest thing to work with.

But, as stated in the blogpost, I'll check out gphotos-sync in the near future.

The problem with that kind of approach is that I need to set up my own infrastructure of some kind to run that sync app. With Google Takeout I can just offload that work to an external vendor.



