I have been using this for years and it's great software. One tip: store each email account in its own "database". My crontab looks like this:
I sync it all locally at home, then back it up to Dropbox as well. The reason to store accounts in separate databases is that you cannot filter them out when restoring: if they all go into the same DB, a restore pushes ALL your email from all accounts into one new account.
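To illustrate the one-database-per-account tip, here is a minimal Python sketch that builds one `gmvault sync` command per account, each pointing `-d` at its own directory. The account names and the backup root are hypothetical placeholders, and this is not the crontab referred to above; it only shows the idea, assuming the `gmvault sync -d DB_DIR ACCOUNT` form.

```python
import shlex
from pathlib import Path


def build_sync_commands(accounts, backup_root):
    """Return one `gmvault sync` argv list per account, each with its own db dir."""
    commands = []
    for account in accounts:
        # One directory per account, so a later restore only touches that account's mail.
        db_dir = Path(backup_root) / account
        commands.append(["gmvault", "sync", "-d", str(db_dir), account])
    return commands


# Hypothetical accounts and backup root, printed rather than executed.
for argv in build_sync_commands(["alice@example.com", "bob@example.com"], "/backups/gmvault"):
    print(shlex.join(argv))
```

Each printed line could be scheduled separately, so one account failing does not block the others.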
Chiming in to second this comment for anyone who is skeptical of using gmvault. I too have used it for years with great success. Thanks to the author for creating it!
I believe it is all flat-file. When I say "Database" I am just referencing what they call it:
-d DB_DIR, --db-dir DB_DIR
But from what I see the structure looks like this:
db/
  YYYY-MM/
    1234554543262346.eml.gz - I assume the meat and potatoes of the email along with attachments, not sure
    1234554543262346.meta - JSON file with msg_id, thread_id, flags, labels, subject, etc
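If the layout really is as described above (an assumption based only on this comment, not on gmvault's documentation), reading it back takes just the standard library. This sketch pairs each `.meta` JSON file with its `.eml.gz` sibling; the function name `iter_messages` is made up for illustration.

```python
import gzip
import json
from pathlib import Path


def iter_messages(db_dir):
    """Yield (metadata dict, raw email bytes or None) for each message under db_dir.

    Assumes the layout db/YYYY-MM/<id>.meta plus db/YYYY-MM/<id>.eml.gz.
    """
    for meta_path in sorted(Path(db_dir).glob("*/*.meta")):
        with open(meta_path, encoding="utf-8") as fh:
            meta = json.load(fh)
        eml_path = meta_path.parent / (meta_path.stem + ".eml.gz")
        raw = None
        if eml_path.exists():
            # The .eml.gz file is assumed to be a gzip-compressed RFC 822 message.
            with gzip.open(eml_path, "rb") as fh:
                raw = fh.read()
        yield meta, raw


# Example use (the "subject" key is assumed from the description above):
# for meta, raw in iter_messages("db"):
#     print(meta.get("subject"), len(raw or b""))
```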
if you want to try something that downloads Gmail via imap and indexes it into an sqlite3 db (with FTS5 fulltext of from/to/cc/bcc/subject/body fields) and extracts attachments to filesystem, take a look at a recent project of mine:
It also saves the raw .eml files to disk. No support for labels (yet), but it does properly link up threads in the db using `References` from the parsed headers (setting both MPTT and adjacency-list fields)
FWIW regarding speed, i was able to download, index & extract my entire INBOX + Sent Items (14k emails, 3.5GB total) in < 10min on a fast connection. the limiting factor by far was connection/imap speed.
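For anyone curious what linking threads from `References` can look like, here is a rough sketch of the adjacency-list half only (no MPTT). It is not the code from the project mentioned above; it assumes the last ID in `References` is the direct parent, which is the usual convention, and the function name is hypothetical.

```python
import email


def parent_ids(raw_messages):
    """Map each Message-ID to its parent's Message-ID (or None for thread roots)."""
    parents = {}
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        msg_id = str(msg.get("Message-ID") or "").strip()
        if not msg_id:
            continue  # Nothing to key on, so skip this message.
        # References lists ancestors oldest-first; the last entry is the direct parent.
        refs = str(msg.get("References") or "").split()
        parents[msg_id] = refs[-1] if refs else None
    return parents
```

The resulting dict maps straight onto a `parent_id` column; a parent that was never downloaded simply will not appear as a key.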
I'm interested to know what the pros and cons are of this utility vs using the Google takeout functionality? I like the idea of this project but I don't know what it would gain me over Google's native export? Is it the restoration that's missing from Google's service?
My coworker and I both tried this yesterday, and today we both received an email with this error message: "Sorry, we encountered a problem when creating your Google data archive."
Exactly what you get from Takeout varies from service to service; for email you get an MBOX-format mailbox file (that you can then import into a desktop email client of your choice).
It is actually mbox. I should have provided more detailed numbers: in my Takeout file, for example, there are 91,360 chat messages and only 23,407 email messages.
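A count like that can be reproduced with Python's standard `mailbox` module. The sketch below assumes that Takeout marks chat entries with a `Chat` value in the `X-Gmail-Labels` header; that is an assumption worth checking against your own export, and the function name is made up.

```python
import mailbox


def count_chat_and_mail(mbox_path):
    """Return (chat_count, mail_count) for an mbox file.

    Assumes chat entries carry "Chat" in a comma-separated X-Gmail-Labels header.
    """
    chats = mails = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            raw_labels = str(msg.get("X-Gmail-Labels") or "")
            labels = [label.strip() for label in raw_labels.split(",")]
            if "Chat" in labels:
                chats += 1
            else:
                mails += 1
    finally:
        box.close()
    return chats, mails
```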
I am actively trying to get off of dropbox. I really want a similar native application experience that syncs to-and-from S3. Not a cron job using the s3 cli, not something that only works on osx/windows/etc... So far owncloud enterprise is the only polished-looking solution I've found, but that's a bit overkill...
"The core team of Syncany is on hiatus for an indefinite amount of time. Feel free to do with the code what the license allows and encourages, but please don't expect any maintenance"
Not feasible to use this for critical infrastructure if there is no maintenance happening.
How have you found syncthing perf? When I tried it a while back the perf was horrible. It ate CPU and was pretty slow, even though I was just testing with 2 machines on my local network.
Thanks, Syncthing looks like exactly what I was looking for! I'm guessing it needs some external server for coordination? It seems a bit unnecessary to restart after each config change. Is there some technical reason for it? Nice job otherwise, will try it out.
The community also hosts relay servers, so if your two devices can't communicate with each other directly, it will work anyway.
Relay servers take bandwidth. Anyone can run a relay server, and it will automatically join the relay pool and be available to Syncthing users. This is documented here:
I use amazon cloud drive unlimited $60/yr and arq backup to have client side encrypted backups. I also arq backup to my local NAS.
If you want to use an open source tool you can use borg backup & rsync.net as your external backup site. Borg doesn't have good S3 integration, and using fuse & s3 doesn't work that well either. It works best when the borg daemon is on the receiver box too to help with indexing and such.
There's a big item on one of my whiteboards: "put gmvault into the environment" ... the idea being that you could run 'gmvault', over SSH, on rsync.net:
ssh user@rsync.net gmvault ... blah blah ...
I've been meaning to do this forever ... it would be great if rsync.net customers could not install anything, but just run gmvault as an ssh command.
The only reason it takes time is that we do not have a python interpreter in our environment - we try to keep things as simple and locked down as possible - which means we have to "freeze" gmvault as a binary executable in order to put it into place ...
Please don't use OwnCloud. It eats your files and cost us loads of time and effort, in addition to sowing FUD among my office coworkers, who thought someone was deleting files from the shared/sync drive. Plus it doesn't support delta sync [0], so if you're syncing large files (for example, True/VeraCrypt volumes), you're going to be pushing a lot of data around. This is especially awful since you're not doing this on a LAN but to S3, which means your raw cost in dollars for operating this software will be much, much larger than with another solution like SyncThing or Seafile, which do support delta sync.
ownCloud, or nowadays NextCloud [1], is still the best solution I have, hosted on a private virtual server. It's not a drop-in replacement for sure -- I'm basically prepared to do a full clean reinstall each time I want to upgrade -- but they have desktop and Android clients, and it's working just fine. Of course, I also have a separate backup of all the files.
I did an extensive lookup of dropbox alternatives last year, and ended up self-hosting seafile.
Multiple family members use it and it works great, plus encryption built-in from day one (me not being able to read their stuff is critical to me).
There has been no outage, and upgrades are easy.
You may want to have a look at odrive (https://www.odrive.com). Use it to sync a bunch of different storage accounts via this single app. Works alright for what it does, and they have an S3 option.
I've been using syncthing and I'm reasonably pleased with it. I don't use it heavily and only over my local network, though, so I'm not sure how well it handles archiving changes and file conflicts.
WARNING/fun fact: it doesn't download all emails properly. Last time I tried it, it seemed that when the Gmail server randomly closed a connection (or maybe some other time, but I think it was in these instances), the program would just keep whatever partial results it had and then move on to the next email. Which meant I had a lot of partial emails on my drive (only a small fraction of all the emails, but still), and no way to detect them.
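One partial way to flag such messages after the fact: a multipart email that was cut off usually never reaches its closing MIME boundary, and Python's `email` parser records that as a defect. The sketch below is only a heuristic under that assumption. It will miss truncated non-multipart messages and may flag mail from sloppy senders; `looks_truncated` is a made-up name.

```python
import email
from email import errors

# Defects the stdlib parser records when a multipart body ends early.
TRUNCATION_DEFECTS = (
    errors.StartBoundaryNotFoundDefect,
    errors.CloseBoundaryNotFoundDefect,
)


def looks_truncated(raw):
    """Return True if any MIME part is missing its start or closing boundary."""
    msg = email.message_from_bytes(raw)
    for part in msg.walk():
        if any(isinstance(defect, TRUNCATION_DEFECTS) for defect in part.defects):
            return True
    return False
```

Run over the decompressed bytes of each saved message, this gives a list of candidates to re-download rather than a guarantee that everything else is complete.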
Given how easily and carelessly Google can close your account and ruin your digital life, I guess periodic backups of your cloud accounts will soon be considered good practice.
In this case, it's not as good as a normal backup. To my knowledge, Google Docs/Sheets/etc. is exported in other file formats (MS Word format, PDF format, etc), not their internal file format. So it's not a real backup, just an export.
It uses the same IMAP protocol that your mail clients do. I have been using it for years and not a single issue with google closing down any of the 5-6 accounts I have archived.
Data is only half the problem. The other is that email address has become the "primary key" for everything. Banking websites, random forums, everything. It's sometimes impossible to change it because it's the "primary key" for identity on that website. And email addresses are not portable like phone numbers.
Email addresses are portable to any webmail, mail server or other email infrastructure provider if you have your own domain and then forward to the service of your choice. This way you can use Gmail if you like or maintain your own full email stack, or anything in between, while still addressing your mail to something you control.
but do you really control your domain? if somebody hacks into your registrar or forges your signature or something and transfers it to themselves, they would get your email, and it might be very difficult for you to get it back. Or maybe I am being too paranoid about this and it's less likely that your domain gets taken from you compared to your mail provider deciding to terminate you.
You can also do something like using fastmail, where you don't control your domain, but being a paying customer you do have somebody to call if there are issues
I find the attack itself technically plausible, but where's the motivation? Why would someone go to the effort just to get some random personal domain? Sounds a little far-fetched, and I've never heard it happening to a personal domain of a non-VIP. Meanwhile, we know that Gmail accounts get terminated regularly, and appealing is hard-to-impossible.
Cryptocurrency mostly (based on public disclosures), but sometimes other digital assets like twitter handles or game currency. Sometimes domain names themselves are the asset.
As is now well known, mobile numbers are often also hijacked to subvert weak 2fa.
This is something I have tried to weigh the risks of when it comes to using a Gmail account vs my own domain. What is more likely to happen: Google permanently locks me out of my Gmail account or I forget to renew my domain or have it somehow stolen from me?
> What is more likely to happen: Google permanently locks me out of my Gmail account or I forget to renew my domain or have it somehow stolen from me?
Then use TLDs where that can't happen.
If you forget to renew a .de domain, for example, you get a letter, and have 2 weeks to restore it, or end it - per default, it gets held in TRANSFER mode, for 50€/month.
Also, you can always get the domain back, if you can prove ownership.
This is true if you're starting from nothing. However, Gmail is over a decade old at this point and is the primary email for millions of users. And something like a custom domain is beyond the technical skills of the average user.
> However, Gmail is over a decade old at this point and is the primary email for millions of users.
Switching is annoying, but not impossible. You just forward the old address to the new and progressively replace it on any accounts you might have. I still have access to my Gmail account, it just hasn't received a non-spam email in years.
Oddly enough, it was really easy to set up a custom domain on Gmail: I did it. However, Google stopped offering that service for free, which will stop most ordinary people from using it...
Since Gmail supports imap and pop3, one could simply use any proven email client to back up the emails. I don't think a special tool that may or may not work is needed for this.
Which "proven email client" do you recommend? There's an Import/Export extension for Thunderbird, one of the last standing desktop mail apps, but it's not good at handling huge, multi-thousand message exports.
gmvault is purpose-built and quite simple. I've used it before with success. It backs up your whole mailbox in one command. Why should we fiddle with a desktop mail client?
Thunderbird works fine for me to handle several GB of emails (the mailbox archives that it generates are plain text and standard so there's no need to export them). For the command line there are also tools like `getmail`[0], with the added benefit that they work with any email provider.
+1 on offlineimap. Those of us that may be a little strange in the head and prefer using mail clients like mutt will often use it to handle background imap syncing.
I've successfully used getmail to back-up my gmail accounts for over ten years now. It's a classic case of a functional, if a bit crufty, solution blinding one to a potentially better solution, so i appreciate this (gmvault) heads-up. thank you.
To my knowledge you can't run Thunderbird as a background service so you'll have to routinely run it. This can be a job scheduled to run in the background.
The 'restore' feature looks nice. But would it be of any value if Google decides to close your account? That is, can I take my backup emails from Gmail and 'restore' it to myemail@self-hosted-domain.com in order to migrate all those emails over?
I have opted to have 3 gmail accounts with various names, plus 1 outlook and 1 yahoo account. Every email sent to my gmail is forwarded > outlook > yahoo. I use the 2 other gmail accounts for newsletters etc.
I am sure at least 1 company will remain free to access data! BTW thanks for the thunderbird tip.
My backup of Gmail is to use Mail.app via IMAP + download all attachments.
I then have a backup of my computer ->
1. Time Machine +
2. Arq->S3/Glacier
Given that this keeps mail locally in a constantly readable format (offline, copied in mbox)... is there something missing in my basic solution that this cli utility adds?
Not as full featured (can't restore), but it's just a 77-line Python script. You could audit it yourself to make sure it doesn't upload your creds to another server.
Yep! And much more. But gmvault does also restore emails to an account, interestingly.
I am (slowly) working on a project to pull some statistics from Takeout's mbox file for Mail. Also want to play around with the Location History, Chrome data and Hangouts exports.
Since yesterday I have a few Google Takeout zip files in my backup ( https://takeout.google.com/settings/takeout ). I've used gmvault in the past, but this looks superior from the outside. Haven't delved into the data I'll admit.
I am working on digging around this data right now, actually. Some details on my website (see profile) and I will post on HN about it at some point when I get things further along (also want to look at Location History and Chrome data exports).
I find it incredibly easier to configure and run periodically in a cron job. Plus I've never had even the slightest stability issue with gmvault whereas Thunderbird (which I use anyway to read emails at work) has bad days from time to time.
Additionally, the output format (gzipped plain email + metadata) looks very convenient for indexing / analysis; something I'm dreaming of for a long time.
Off-topic. What are some of the reasons you use desktop clients? I have tried using thunderbird several times but the habit never stuck. Anything I might be missing out on (apart from local IMAP backups)?
(1) Desktop clients used to be faster and more efficient than using a web interface, and probably still are in most cases. However, Gmail is impressively snappy so I don't think it applies in that case.
(2) You can do things with a desktop client that you can't do with a web interface. That includes sort by subject and sort by size.
(3) Desktop clients can have real folders, which Gmail doesn't.
(4) Works off-line. This is still useful, though not as useful as it used to be.
Works offline, has its own OS-level windows (so I don't have to hunt around for the tab, can close the browser but keep e-mail open or vice versa), multiple e-mail accounts under one application while keeping them entirely separated (I don't want to give one provider access to my mailboxes at another provider, and I don't want to have to use multiple different web UIs). Also a matter of habit.
I have used this in the past to back up email accounts of resigned employees so we continue to stay well within the maximum number of active accounts for our free Google Apps.
It generally works fine and it allows you to restore the emails to a different account name. (I sometimes temporarily restore an account to search for old emails). It seems to have some issues with restoring accounts with a lot of large emails (large or multiple attachments) especially those that have reached the 15GB quota.
I have been using Spanning backup for several years now. It backs up our Gmail, documents, calendars, contacts and sites. It is one of those set-it-and-forget-it type services. The company has since been acquired by EMC. The service costs us around $35/year/email account; for us it is a small price for peace of mind. I wrote a review of them a few years back - http://reviewofweb.com/gmail/backupify-vs-spanning/
Even before this I've been looking to get a bit independent from Google, atm I'm trying to get Camlistore to replicate my data over various places. (I wish it had some erasure encoding)
A lazy alternative is to forward all incoming email to a yahoo account. I've been doing this for years and it's come in handy on the rare occasions when Gmail is unavailable.
I was trying to push some people off my Google Apps accounts since google won't allow you to "split" some users off the account. Wish I would have seen this then!
Offtopic: I have Gsuite (free) for my company (<50 users). Is it possible to back up all the emails of my users without knowing their passwords (with XOauth tokens?)
Ironically, using some of these buggy tools can be an effective and swift way to get your account locked out or at least rate limited by anti-abuse systems. Read the code and make sure they are using official APIs and are coded sanely.
Yes, we need to back up our emails before Google deletes our accounts. Google's operations are starting to remind me of the Obama administration, where nobody can actually say anything outside the template.