Gmail is opening and caching URLs within emails without user intervention (2019) (support.google.com)
390 points by _wldu on Aug 19, 2021 | 267 comments


I built a small Go web app to do some security testing. When a user registers for an account, I generate a 128-bit secure token and email it to the address they provided (as a URL). Token URLs look like this:

/validate/email/1d00a5c2648c211befd33f5a8a7cbfab

The token is cryptographically strong and disappears after access. It can't be guessed and no one but the email account holder should click it, but I am seeing the URL accessed multiple times from multiple IPs, so I investigated.
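
For reference, generating the token is only a few lines of Go (a minimal sketch, not the app's exact code):

  package main

  import (
      "crypto/rand"
      "encoding/hex"
      "fmt"
  )

  // newToken returns 128 bits from crypto/rand as 32 hex characters,
  // the same shape as the token above.
  func newToken() (string, error) {
      b := make([]byte, 16)
      if _, err := rand.Read(b); err != nil {
          return "", err
      }
      return hex.EncodeToString(b), nil
  }

  func main() {
      t, _ := newToken()
      fmt.Println("/validate/email/" + t)
  }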

Turns out, if the user provides a Gmail or Gsuite email account during registration, Google clicks the link. I was curious if others on HN had encountered this and how they dealt with it.


That's why it should not be an HTTP GET endpoint. A GET endpoint should only be used when the request is idempotent. Use HTTP POST for your use case.


Is it possible to embed a link that uses POST in an email? I can't think of a way unless form tags work, but then the link wouldn't work in a plain-text email reader.


You'd need to send the user to the verify page and populate a form with their token from the URL. Then submit the form, either automatically or by getting the user to manually hit a button.


You should not submit the form automatically. Tools like Microsoft O365 ATP run any links in an emulated browser with Javascript support. These will, in some cases, happily autosubmit the form for you.


We've done this for over 2 years now with over 200k users, and never ran into this. Not even with government users.


It could very well be that your specific Javascript does not run automatically or does not run correctly. I see the same with one of our auto-submitting forms. I do not know whether or not that is intentional on Microsoft's part. But other users have had different experiences, so be aware that Microsoft may 'fix' their issue some day and all of a sudden all your users will start clicking/unsubscribing/whatevering automatically.

See https://blog.healthchecks.io/2019/12/preventing-office-365-a... for someone who did have this experience.


Turn on O365 “link scanning for malware” and Salesforce one-time links for password resets etc. stop working.


How do you know your users have never run into this?

People don't tend to report problems like "my account was activated sooner than I expected"


We use it to log in users. We quickly get complaints if something doesn't work well.


But your users are bound to believe it works well when Google's servers respond to the activation link instead of requiring them to click it themselves, so no complaints.

The verification process serves only you, the administrator. To everyone else it's a tedious obstacle.

Nobody will reach out to you to say "I made it through the registration process just fine but it was slightly less burdensome than I expected, is everything OK?"

If you want to know if Google is hitting your activation URLs, check your access logs. Your users will almost certainly not realize it happened. Even if they do notice it, there is no impact on them and no motivation to inform you. You would have to be extremely lucky to hear about it from a user.


Half of unsubscribe links seem to auto-submit. Are they all broken in O365?


This is how MailChimp does it, I believe. JavaScript to submit the form automatically.


Depending on how your app works, non-idempotent links in emails can often be an overlooked CSRF vector. Sometimes people also make such links auto-log people in, which can be problematic.


You can use <form method=POST> in the email body (obviously does not work in plaintext mode)


I think this sometimes triggers a warning to the user (something like “Are you sure you want to submit form data to an external site?”), which may not be the best end-user experience.


If you put a header “account verification form” above the button, it's a better experience; the user knows what the computer is calling a form.


That sounds very phishy.


You're assuming the email is HTML. No, not all users use HTML email.


What do you think the second sentence of their comment is talking about?


This endpoint is idempotent - clicking that link multiple times has the same effect as doing it once.


Correct. Idempotency isn't precisely the right concept to appeal to here. The right concept is that GET requests are assumed by convention to be "safe," which implies they aren't tied to user interaction. "...user did not request the side-effects, so therefore cannot be held accountable for them" (https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#:~:te....).


Presumably there is different content for the first response when the token is still valid, otherwise this would be a pointless link.


Not if it "disappears after access"


I was just wondering if a web page that counts visitors is idempotent. Or not?


It changes state on the server, i.e. the counter. So no.


I’d say it depends on what the counter is for. If it’s used to bill the user per pageview, then it’s user-visibly not an idempotent action, so no. If it’s used to estimate site speed, then the user doesn’t care, so yes. (In fact, analytics is the example I see most often under “non-idempotent stuff it’s OK to do on a GET”.) If it’s used to display a counter in the site footer, then you might wish the user cared for your bragging, but they probably don’t, so yes with a disapproving glance in your general direction.


Well, this is why your email provider should not open your links for you. Use a different email provider instead.


I think I'm going to continue using the one that is pre-opening links that should be idempotent so that it can check them against its heuristics for spam or phishing. That's been really nice to have.

And I'll instead refrain from using sites that inappropriately provide bare GET URLs that are really state-mutating booby traps in disguise.


Lol, the nerve of the other reply: "stop using the most widely used email provider in the world". I liked your reply better


I believe this is how they fetch images without meaningfully accessing tracking pixels.

If everything sent to Gmail is opened upon arrival and cached, you know nothing about when or if the recipient actually opened the email.


Last time I looked into this, Gmail was not loading and caching images. Is there any evidence that this has changed?

What is being described here is likely being done for some other purpose.


Google claimed to do that.

> Instead of serving images directly from their original external host servers, Gmail will now serve all images through Google’s own secure proxy servers.

https://gmail.googleblog.com/2013/12/images-now-showing.html


Proxy servers, not caching servers.


When are proxy servers not caching?


I have an email from 2015 where I reported this as a potential vulnerability in the Hartl Rails tutorial, having seen it myself back then. Consider "verified" Ashley Madison accounts in their breach and this scenario.

This is not just a Gmail thing. Most corporate mail filters visit a link and scan for malware as a feature.


Make the user take action after opening the link. Like click a button.


And make sure the action is a POST instead of a GET. GETs should never modify important state.


This is the correct answer. Just because the norm is to embed verification hashes in URLs to be clicked doesn't mean it's the right way for it to be done.

Why not send a short random code by email for the user to then copy into the sign-up form they were in the process of filling in?


It takes more effort and more users will decide to move elsewhere. I don't really believe that someone who can't be bothered to copy a code from an e-mail is worth having as a client, but some companies are obsessed with metrics, and the percentage of successfully registered users is one of those metrics.


I understand your way of thinking, but we ended up having a flow for a government site where users had 2-3 steps for what normally could be done in 1. Also, many were not tech savvy and got confused. So we ended up adding JS to automate the click.


You automated a click on a government website? So tell me: how'd that audit go?


This depends on your willingness to turn away business, and may not even be legal depending on where you work. In the United States, I would not want to defend that copy-and-paste scheme as being compliant with the Americans with Disabilities Act, having seen usability tests of people trying to accomplish that exact workflow using screen readers. Remember that things like cognitive impairments count and, like vision and motor control/range of motion, most of us will be affected at some point in our lives.

What I do think would be reasonable is having a well-labeled link which takes you to a confirmation form: someone can follow it easily and choose to submit it with far less friction and it leaves standard web semantics intact.


Clicking a link (one action) is easier than copying a code and pasting it (two actions). It's possible the user will copy the wrong thing or paste the code into the wrong field, including the browser address bar.

All of that may affect the sign-up rate.


Kinda. I often read my email on my phone while working on my desktop (or vice versa). In these situations, a code is always better. I hate the links personally.


How many times has having to click a link (instead of entering a code) stopped you from finishing a sign-up process?


It’s pretty easy to measure. I had a site with a verification step. And we would see like 20% drop off of people who clicked on the link but never confirmed. Not sure why. We didn’t have them copy and paste anything, just click a confirm button.

Switching to no confirm obviously changed this to 0% drop off of people who clicked the link, but the number of people who clicked was the same.

It was curious to me why people wouldn’t go through with the confirmation step, but never learned why. We just learned that for some reason more people click once instead of twice.


How would you have known if that 20% were real people and not bot activity?


I don’t necessarily. But they have active accounts that do stuff and had the drop off activity consistent with “normal users.”

So it doesn’t matter to me if they were bots or not.

For example, 100 users clicked on the first link, 80 completed, and had normal account activity (clicking on stuff, uploading and downloading things, etc).

100 users clicked on the second link and then had normal account activity.

Maybe they were all bots, but they seemed human based on the “normal activity.”


Nobody remembers the exact moment they stopped thinking about something because it was easier not to.

Ragequitting is one way to exit a process, but just not going to the next step from distraction is surely more common.


I'm asking because often when I talk to people about things they hate, they end up admitting it's not that big of a deal. The annoyance is minor enough they don't look for alternatives or abandon whatever they were doing.

The original discussion was about clicking links vs. reading and entering a code in sign-up confirmations. The former takes fewer steps and is easier to complete. Power users with unusual habits might disagree. But if they complete the sign-up anyway, it makes more sense to focus on regular users.


> user to then copy into the sign-up form

Extra steps are hard and boring and people don’t want to do them.

I consider myself a savvy user and I want to click a link. Not click a link, then look up a code from the email, then paste, then click submit.

I’d live with having to manually click “I’m sure I want to unsubscribe” or something.

This is most annoying when the site wants me to type in my email address to unsubscribe. I have lots and lots of different email addresses that funnel into a single one. When the site doesn't put my address in the “To” field, I don't know which address they sent to.

Services should be respectful of users' time.


>This is most annoying when the site wants me to type in my email address to unsubscribe.

I've become increasingly suspicious of this practice

if my email address is in the URL, why don't you autofill that email box for me? if it's not in the URL, why aren't you fetching it from your database using my unique hash in the URL? do you even keep any records of email subscription preferences? am I just signing up for more shitty spam by giving you my email, again? am I just being marked as 'active' i.e. fresh meat, somewhere in the spammiverse?

these questions become more pointed and the suspicion more fiery when, lo and behold, it turns out you are still subscribed

I don't fill them in any more, I just block the sender

don't get me started on the "it may take up to 28 days for our systems to register your desertion" bollocks -- I'm not working a contractual notice period, or running a lap of dishonour. it's a bitflip to 'false' in the 'is_pesterable' column, else a respectful deletion. it takes microseconds, not weeks!


That too, as I suspect the unsubscribe links may just be data gathering.

I typically never type in any information because I assume the site doesn’t know and wants to know.

I’d rather just set up a kill rule on my end than risk getting my email on one more list.


There were good suggestions in other comments in this HN post.

One of them mentioned that you can keep things as a 1-click solution with the token in the URL, but instead of performing the destructive action upon visiting the link, you get sent to a page with a form where the token is put into a hidden field that gets auto-submitted as a POST request with JavaScript.

This way, from the user's POV it's a 1-click solution. You only waste a second waiting for the redirect, and if the user doesn't have JavaScript enabled you can <noscript> the field as a visible input which is pre-filled based on the value from the URL (this can be done server side).

Now everyone is happy, unless Gmail is going to go as far as auto-following redirects with JS enabled.


Ideally they wouldn't be following redirects _if they're POST requests_, right?


True, that was a bad choice of words, but Gmail could still load the page and execute the JS, which in turn submits the form as a POST request to your back-end. It's technically not following a redirect, but it's doing things beyond just visiting the URL linked in the email due to executing JS.


Ah interesting, so you're saying that GMail is not likely to be avoiding the POST requests if they're in the JS code? I have a passing, non-professional familiarity with Web practices, so this isn't something I have a great intuition for.


The basic flow would be:

    - You GET /reset/abc123
    - Your server responds back with a page that has a form
    - There's a hidden field with the token
    - Javascript kicks in and on page load executes the form as a POST request
    - Your server responds to that POST request and does whatever it needs to do
All of that is kicked off by Gmail visiting /reset/abc123, and now it comes down to whether or not Gmail's pre-visiting code will run the JS on the page. If not, then the above workflow fixes this issue; if it does, then you're in the same position as avoiding all of this and having a GET /reset/abc123 perform the destructive action.
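
A minimal Go sketch of this flow, using the hypothetical /reset paths from the list above (illustrative only, not a real API):

  package main

  import (
      "html/template"
      "net/http"
  )

  var page = template.Must(template.New("confirm").Parse(`
  <form id="f" method="POST" action="/reset">
    <input type="hidden" name="token" value="{{.}}">
    <noscript><button type="submit">Confirm</button></noscript>
  </form>
  <script>document.getElementById("f").submit()</script>`))

  func main() {
      // Steps 1-3: GET only renders the form and mutates nothing.
      http.HandleFunc("/reset/", func(w http.ResponseWriter, r *http.Request) {
          page.Execute(w, r.URL.Path[len("/reset/"):])
      })
      // Steps 4-5: the POST is what actually consumes the token.
      http.HandleFunc("/reset", func(w http.ResponseWriter, r *http.Request) {
          if r.Method != http.MethodPost {
              http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
              return
          }
          token := r.FormValue("token")
          _ = token // look up, validate, and invalidate the token here
      })
      http.ListenAndServe(":8080", nil)
  }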


Right, it's the second part of the fourth bullet that I was asking about. Basically, that it sounds like it isn't feasible for Google to avoid POST requests if they're being submitted from JS.


Are we really going to continue to break the paradigm that GET requests should be idempotent to save people an extra click or Ctrl+C and Ctrl+V? Standards matter. In this case Google are doing something that should be allowed, but being criticised for it because it breaks badly implemented services.

Entering emailed or texted codes is becoming more common with 2FA for banking, PayPal etc. anyway so I think most people are going to broadly manage.


Sorry, GET requests aren’t idempotent. At the minimum they create log entries. So you can DDoS servers by filling their logs with “idempotent” GETs.

UX is important, and I think saying “suck it users, I’m going to use GET the way I think is right” is not a positive way of thinking about it.

I think the problem is just the mechanics of POST not being allowed in an email, so if there’s a way to POST from just clicking on a link I think we should use it. But there’s not, so having a GET that triggers something is the least bad thing. I like it better than javascript and forms in email. And better than autosubmitting, hidden forms on load.


Verification hashes in URLs are fine, as long as merely accessing the URL does not invalidate the hash.


This is how Steam does it.


Thanks. That's good advice.


Btw you could just have JS do a POST request, the user doesn't need to do anything except open the page. This is how unsubscribe pages work.


Thanks. I don't use JS, just Go with HTML templates. I populate the form now with {{ .code }} from the URL so the user does not have to copy/paste the code. But they do have to click 'Submit' to post the form. I think this is a reasonable approach that most users are OK with.


You could add a single line of JS to have it auto-submitted as well


Wouldn't that be equivalent to just doing a GET?


No... Because you have to do a GET, execute JavaScript, and make a POST request. Bots don't execute JavaScript, and no well-behaved bots are going to make POST requests.


No, because if you curl the original URL the POST for the second URL won't be triggered, but if you navigate your browser to the URL, the browser will trigger the POST.


That presumes everyone executes random JS or has a browser that supports it.


Create a "click here" button, and hide it by adding a class or a style with JS.


There is this thing called an HTML form. Every browser supports it.


It should go without saying that you provide a fallback.


Depending on context and implementation details, this can often be a security issue (CSRF or something similar). Probably not in the unsubscribe case though.


Obviously you need to make sure your API is not susceptible to CSRF but that goes without saying... Should I also tell him to password protect his database? :P


If you follow what the parent is suggesting (open a page with a GET request that has JS which does a POST request automatically with no user interaction), it's probably impossible not to be susceptible to CSRF.


I'm really not sure why you think that, but that's just not true at all. For starters, if it is a JSON API then a CORS request will be done for all cross-domain API requests. Even a missing CORS configuration would then block the CORS request from third-party domains.


So the situation is: there is some URL you open (with a normal GET request, typically but not necessarily from an email), then that URL does the non-idempotent POST request without any further user interaction.

A malicious page that knows the URL used in the email could open the URL from the email in a popup. The JS will execute in the popup, and do the POST request. It doesn't matter how much CSRF protection you have on the POST step, if anyone can trigger it with no user interaction just by opening some page with a GET request.


> A malicious page that knows the URL used in the email could open the URL from the email in a popup

Sorry, but this whole scenario is just ridiculous. If somebody can access your email, it is already game over. It doesn't matter what web technology you are using at that point, user interaction or not.

If a "malicious page" knows the URL it doesn't matter at all, because it means the attacker is capable of arbitrary code execution, and at that point they could just exfiltrate the URL to somebody or to a Chromium instance to perform the user interaction. Actually if the page can open a popup I think it could also execute JavaScript within the context of the page and perform the user interaction right there.


So there's two scenarios:

Scenario a) No authentication-y bits in the URL. User goes to the URL, site checks if the user is already logged in via a cookie. If so, does the POST request.

Typically in this case the URLs are easily guessable, so that's an easy CSRF. In principle they could be made per-user (some sort of HMAC on a user+timestamp). In practise, I think it's fairly common for websites not to do that in this sort of situation.

scenario b) The URL contains some sort of nonce, or signed assertion, that automatically logs the user in. I think this is fairly common in email URLs, because web developers want people to be able to take an action just from clicking the link in the email, even if the device they read email on is not the same as the device they normally use to interact with the application. I also think this is the scenario that applies to this discussion, since it started around the issues caused by Google auto-following links, and in scenario a), Google auto-following links would not be an issue.

Of course, in principle, it's possible that the authentication bits in the URL just authenticate that action, and don't generally log the user in. In practise I think it's really common to just generally log the user in, since most sites want people to stay on the site once the user does anything, and not just immediately exit after the email action is completed.

These URLs are typically not guessable. A small percentage of users do tend to smatter these across the internet (e.g. https://urlscan.io/), but ignoring that, this is a login CSRF. That is, an attacker can generate their own such URL, and force their victim to log into the attacker's account.

The impact of a login CSRF tends to be very application specific. Sometimes it's kind of minor, but I have definitely seen cases in major websites where a login CSRF can lead to a full account takeover of the victim's account.

> Actually if the page can open a popup I think it could also execute JavaScript within the context of the page and perform the user interaction right there.

This is only true if the pop-up has the same origin as the site that opened it. Otherwise there is just a very limited API (basically postMessage(); also, both sides can change the current URL of the other side, which is a bit nuts). There is also now a new HTTP header, Cross-Origin-Opener-Policy, that affects this.


What you are describing isn't even CSRF


Yes it is, and it's worthwhile to read bawolff's well-written explanation of how exactly it could be exploited. Downplaying security vulnerabilities of this sort is precisely how database leaks happen.


I don't downplay security issues. I just make sure they are actually understood first, and that isn't what is going on here at all. What he is describing is not related to CSRF.


Make the link password protected, and don't send the password in the same e-mail message.

Kinda hard to pre-scan a URL if you can’t provide the password for it.


Yeah, lots of email clients do this. Besides caching, there are also lots of scanners for malicious content.

We solved it by having a screen with a confirmation button, then later we added JavaScript to show a loader page over the button and click the button automatically.


That is a security risk that Google is causing here. While I agree that URLs shouldn't necessarily be used to store secrets, the usual password reset mail is nothing else and the mechanism has merit.

https://www.w3.org/TR/capability-urls/

It is also a good way to communicate between two parties that don't want to have a user account in any service; we constantly request input from B2B customers by providing forms with capability URLs. And no, we don't want to use an identity provider. Maybe good ones like Auth0. Amazon Cognito is pretty decent in my opinion, but Amazon is also big tech. Industrial espionage is a real concern for that matter.

We have mail providers that respect privacy, just saying... I don't understand the love for Gmail at all, especially when you use a mail client, which I would heavily recommend to everyone.

Ironically, a lot of security scanners also follow links. Understandable, but I just hope they don't plaster the logs too much...


Another idea is to have 3 links, where only one is visible:

  https://example.com/token?forBots
  https://example.com/token
  https://example.com/token?forBots
Hopefully any automated systems will open the first or last link first, so that you can save the request info and filter based on that. In case requests come out of order, you can always add a small delay to the "human" link before responding.
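
A rough Go sketch of what I mean (in-memory map and IP matching purely for illustration; real code would need locking, persistence, and better fingerprinting):

  package main

  import (
      "net"
      "net/http"
      "time"
  )

  // token -> client IP that fetched one of the hidden ?forBots links.
  // In-memory and unlocked purely for illustration.
  var trapHits = map[string]string{}

  func validate(w http.ResponseWriter, r *http.Request) {
      token := r.URL.Path[len("/validate/email/"):]
      ip, _, _ := net.SplitHostPort(r.RemoteAddr)
      if _, ok := r.URL.Query()["forBots"]; ok {
          trapHits[token] = ip // a scanner opened a trap link
          return
      }
      time.Sleep(2 * time.Second) // small delay in case requests arrive out of order
      if trapHits[token] == ip {
          return // same client hit a trap link first: treat as automated
      }
      // probably a human click: proceed with verification here
  }

  func main() {
      http.HandleFunc("/validate/email/", validate)
      http.ListenAndServe(":8080", nil)
  }

One caveat: the OP saw the same URL fetched from multiple IPs, so matching trap hits on IP alone may not hold up.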

I haven't yet gotten to implementing any of the authentication on my current project, so I might be missing something really basic.

The next best thing is to set a cookie when requesting the magic link, but the downside (or upside?) is that it will be valid only for the browser it was requested with.


No link in an email should perform an action on its own. Every link should lead to a confirmation button, at minimum. Too many services automatically open all the links in emails.


Tons of services send a verification link after registration, and when you click the link you are taken to a page that says "You're verified."

But in those cases there may be an automatic POST after you travel to the link, so it wouldn't be triggered by Gmail looking up the URL.


This may be for the purpose of ensuring the email address itself is deliverable. You don't want someone to sign up with random garbage, then try sending notifications, newsletters, etc. to it; I believe doing so can affect domain reputation.

For this use-case, it seems like even an automated link click would be a good signal of a deliverable email address.


Not just deliverable, but also that it's correct. There's a lot of people who think that my {firstname}{lastname}@gmail.com email address is their own. If they try to register it somewhere, a verification email stops them from completing the registration.


You can also check for various headers to determine (with quite good accuracy) if a link was clicked by a human or fetched programmatically. Here's a list I've accumulated over the years for virtually the same feature as yours:

- `sec-fetch-dest` header is present (HUMAN)

- `accept` header is present (HUMAN)

- `from` header is bingbot(at)microsoft.com (AUTOMATED)

- `user-agent` header includes BingPreview (AUTOMATED)
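
Here's how these checks might look as a Go helper (a sketch; the heuristics are hints, not guarantees, since headers are trivially spoofable and lists like this go stale; the handler wiring is illustrative):

  package main

  import (
      "net/http"
      "strings"
  )

  // looksAutomated applies the heuristics listed above.
  func looksAutomated(r *http.Request) bool {
      if strings.Contains(r.UserAgent(), "BingPreview") {
          return true // known previewer
      }
      if strings.Contains(r.Header.Get("From"), "bingbot") {
          return true // bingbot(at)microsoft.com
      }
      // Real browsers send Sec-Fetch-Dest and Accept on navigations.
      return r.Header.Get("Sec-Fetch-Dest") == "" && r.Header.Get("Accept") == ""
  }

  func main() {
      http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
          if looksAutomated(r) {
              return // don't consume the token; wait for a real click
          }
          // proceed with the one-time action
      })
      http.ListenAndServe(":8080", nil)
  }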

HTH


Also, most humans use browsers, so if you don't see any follow-up requests for resources like scripts, images, or just the favicon, you probably got visited by a bot.


Not all humans use browsers that issue requests for additional resources.

I have my browser configured to retrieve the page only and no additional requests for CSS, images, or javascript.


Yay for building systems that rely on extremely non-idempotent behavior on GET.


We have this issue with 0bin.net's burn-after-reading, so we give the link a grace period after creation. It works decently, but I'm thinking of just displaying a page with a decrypt button for those, so that you need a POST request to actually read the content and trigger the delete.


Consider a different approach to mitigate automated URL fetching interference (this can apply to both email ownership verifications and password resets).

Make the emailed verification/reset link (GET request) idempotent (1 and >1 requests have the same effect).

Have the link just present an interface for the user to take the next step. In the next step make a POST request that actually commences your verification/reset process.

In all likelihood you'll want expiry logic (let's say it's 30 minutes): if you store the token with a created_at timestamp on the server, you can have your verification/reset process check that now < (created_at + 30 minutes).

If expired, provide a UI for the user to request a fresh verification/reset email.
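
A sketch of that check in Go (type and function names are illustrative):

  package main

  import (
      "errors"
      "fmt"
      "time"
  )

  // Token is a hypothetical stored record for a verification/reset token.
  type Token struct {
      CreatedAt time.Time
  }

  // checkFresh enforces the 30-minute window. Only the POST handler that
  // commences the process needs to call this; the GET just renders a page.
  func checkFresh(t Token) error {
      if time.Now().After(t.CreatedAt.Add(30 * time.Minute)) {
          return errors.New("token expired")
      }
      return nil
  }

  func main() {
      t := Token{CreatedAt: time.Now().Add(-45 * time.Minute)}
      if err := checkFresh(t); err != nil {
          fmt.Println(err) // show UI to request a fresh verification/reset email
      }
  }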


Yes, and Outlook 365 too. We had to add an extra step to our activation process to handle this (a prompt to click a link to proceed). I would not be surprised if they make their link-follower start clicking around inside opened pages too :-/


Here's a quick PoC:

  <link rel="prefetch" href="/actual_validate/email/1d00a5c2648c211befd33f5a8a7cbfab?prefetch=1">

  <script>
   location.href = "/actual_validate/email/1d00a5c2648c211befd33f5a8a7cbfab?js=1";
  </script>
  <noscript>
   <a href="/actual_validate/email/1d00a5c2648c211befd33f5a8a7cbfab?js=0" rel="nofollow" class="btn btn-primary" role="button">Click to confirm your account</a>
  </noscript>


Why the prefetch though? Reason being bots don't open them? If so, this is a really good idea!


It’s likely faster for the noscript case.


Yeah, I don't know about that. Wouldn't it be that the query strings differentiate the two links?

I assume so, because of an old trick where query strings are used for ad-hoc cache control as in /style.css?1629472765


You’re right, I wasn’t paying close enough attention. That’s what I get for reading HN mostly on my phone!


Query string is to differentiate the links (to understand which case is getting triggered)


OK, but then the prefetch link as it stands is useless, no?


A "GET" request is not supposed to alter state.

On the other hand, it validates the email address more quickly, so you could even refresh/poll for when it's verified automatically.


If there is a cookie session, 99% of GETs do alter the state.


I'm sure Google uses a specific user agent to make a request, so you can filter that out.

A better solution is to assume that some middleman (email server or client) will always try to access links in the email. Instead send the user a code and have them manually enter it on the linked page.


Or link them to a page with a POST form that actually performs the action. That way you only add a single click to the flow, and no remotely sane software will automatically perform POST requests to arbitrary urls.


> no remotely sane software will automatically perform POST requests to arbitrary urls

I'm not a web developer. Out of curiosity, why is that?


Websites are generally designed such that GET requests are side-effect free, and POST requests do have side effects. So for example searching on google is a GET request. It doesn't do anything other than serve the requested page. Logging in on the other hand has the side effect of setting cookies and probably writing some stuff to the database, so that's a POST request.

These assumptions are so baked into web software that while assuming a GET request won't do anything zany or overly stateful is probably fine, assuming the same for a POST request should probably be considered negligent.


For precisely the reason being discussed here: GET requests can be performed automatically for many reasons. For example, if you've ever pasted a URL into a Slack channel (or similar) and seen the link converted into a thumbnail of the page a few moments later, you've seen a piece of software issue a GET request on your behalf. Now imagine that wasn't a link to a page but a link to an something that modified your account - resetting your password, for example.


POST requests typically perform modifications on the server based on user action, like POST'ing this comment. GET requests should be idempotent.


They did not in my case. Here is the UA string. It looks like a normal client a user might have:

74.51.221.37 - - [19/Aug/2021:22:05:16 +0000] "GET /validate/email/1d00a5c2648c211befd33f5a8a7cbfab HTTP/1.1" 404 0 "" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"

$ dig -x 74.51.221.37 +short

cache.google.com.


This is like the reason I quit using Skype 10 years ago. My colleagues and I noticed the same thing: send a link in a chat, and within seconds to minutes a request (or more) for that URL from a Microsoft server would be logged. Done and bye.


Wouldn't be surprised if pretty much every communication service is doing this.

I sent a link to a large file over Viber and immediately some IP connected and started downloading. It stopped at 350MB of around 3.5GB. I get that they want to show thumbnails or whatnot, but they just don't discriminate between content types.


That is a great way to do a cross-link DDoS of unsolicited link-opening services: send 1000 messages with Google links on Skype and 1000 Gmail messages with Microsoft links, all gigabytes in size...


Does Skype generate preview images? Most chat clients do at this point. They all have to access the link to get the relevant metadata to do that.


Interesting "feature": the user perceives a (real) benefit: preview. But, there is also an unspoken benefit for Microsoft, who can feed their marketing analytics, or AI training data, or ... with what it learns via the chats and links.

I'd be fine if the fine print in the EULA provides a guarantee that the feature scans content solely for generating previews, and that M$ keeps no copy of it, etc....But, I'm sure I'd go blind looking for such text in the EULA.


This is interesting because it's something that would likely only happen in production or a staging server.

If you're building your web app in development, chances are your links will have localhost as their hostname which wouldn't trigger a visit from Google. You may also end up having an in memory fake email server to not even send the email in dev too (lots of web frameworks have solutions for this).

Checking the user-agent might work but I'm not a fan of this method because now it sets you up with having to keep a list of all known agents for every email client / service that might pre-visit URLs.


I’ve also seen anti-virus do this, though I don’t remember which brand.


You need to authenticate the user before the activation.


This. You could rely on a cookie during the GET request as well, one that you set on the user's browser during registration. Or re-auth after the click.
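
A rough sketch of the cookie variant (names are illustrative; the token value is the example from the thread):

  package main

  import "net/http"

  // Bind the emailed token to a cookie set during registration so a
  // mail scanner (which has no cookie) can't consume it.
  func register(w http.ResponseWriter, r *http.Request) {
      http.SetCookie(w, &http.Cookie{
          Name:     "pending_verification",
          Value:    "1d00a5c2648c211befd33f5a8a7cbfab", // same token as emailed
          Path:     "/validate/",
          Secure:   true,
          HttpOnly: true,
      })
      // ... create the account and send the email ...
  }

  func validate(w http.ResponseWriter, r *http.Request) {
      token := r.URL.Path[len("/validate/email/"):]
      c, err := r.Cookie("pending_verification")
      if err != nil || c.Value != token {
          // No matching cookie: a scanner, or the user opened the link on
          // another device. Fall back to re-authentication instead.
          http.Error(w, "please sign in to confirm", http.StatusUnauthorized)
          return
      }
      // Same browser that registered: safe to complete verification.
  }

  func main() {
      http.HandleFunc("/register", register)
      http.HandleFunc("/validate/email/", validate)
      http.ListenAndServe(":8080", nil)
  }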


The problem with that is people like myself who tend to register on a laptop but then click the email verification link on their mobile phone.

(Because waiting for Gmail to load on a laptop is painful, whereas on my phone it shows up as a push notification within seconds.)


What about magic links? :)


Does gmail respect robots.txt?


Not for this use case or crawling. They won't even fetch robots.txt. I've caught Google, Discord, Valve, and Slack doing this on my hobby sites over the years, likely checking to see if the target URL contains obvious malware. In my case the solutions were simple: add simple auth and/or block IP ranges associated with their AS numbers. Obviously this isn't the solution for companies, though you could have a unique domain specifically for email URLs and decide what limitations to put in place. Blocking them can flag your site as "malicious", and I am perfectly OK with Google saying my domains are malicious.


Isn't there a way to verify that the click is coming from GMail? Maybe via User Agent or its IP.


The user agent I saw looks like a normal client that a person might use:

74.51.221.37 - - [19/Aug/2021:22:05:16 +0000] "GET /validate/email/1d00a5c2648c211befd33f5a8a7cbfab HTTP/1.1" 404 0 "" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"

I suppose you could somehow block cache.google.com but I suspect Microsoft and others do similar things.


We've seen something similar with one of the email campaign services. The Unsubscribe links were "clicked" within a minute of emails being sent. Other services show you a message and process the unsubscribe on a POST request.


I've had a lot of grief from a few users' Exchange doing it (likely as part of some anti-phishing plugin of sorts), to the point we changed validation links from one-time to short-lived.


Don't know why you are downvoted. Many corporations and institutions employ sandboxes to check mails and the links contained in them. This is a standard security practice by now.

So a link that is only valid once would be affected. Restricting the validity by time is a good way to solve this while still maintaining decent security.


I presume robots.txt is still ignored in this case?


Make the link expire after a certain period of time and not after first click.


Block first attempt to access.


Haha that's one way to frustrate your users.


I had the same issue with Microsoft's email service and Facebook messages. How I dealt with it was to not email private links... I use Element these days or email links to encrypted files in some circumstances. I wish websites would stop using email and phone verifications...


All URLs sent to any major email provider are "clicked" because they are scanning the page to see if it is phishing or otherwise malicious (desktop antivirus and other things will also prescan URLs). It also protects privacy by defeating click tracking on marketing emails.

Google will also pre-load all the images in your email too.

You shouldn't take any write action to your database just based on a URL being visited. Take them to the verification page and ask them to sign in or submit a form with the token pre-filled.


I frequently use depesz[1] for explain analysis, and initially when I'd create a new plan I was sending the delete link alongside in a DM to my coworkers so they could clear the entry when they'd finished with it... until I realized that Slack was pre-fetching the link and deleting explains before my coworkers could take a look. This is an interesting case since having the request be a pure `GET` submission is pretty convenient, but yea... there's a good reason to follow the prescribed behaviors for when to use `POST`.

1. https://explain.depesz.com/ great site - I highly recommend it for getting into postgres performance analysis.


True. Phish-testing campaigns in companies that send fake phishing emails to employees are probably full of inaccurate data due to this.

"Why did you click that link? But, I didn't."


Many phishing test as a service companies will report clicks vs. people who actually interact with the page.


Which is more accurate since clicking a link is not usually an issue while filling out a form on it is the real attack.


Most companies still trigger whatever action (disciplinary, additional training) upon just clicking the link, though.


Would you mind naming some? I kind of have a hard time finding those, and some time ago I purchased phishingly.com for a side project, but it doesn't seem I will be working on this anytime soon, so I may as well pass it on.


> Phish testing campaigns in companies that send fake phishing emails to employees, are probably full of inaccurate data due to this.

Anecdata: just running curl on one of those test URLs will trigger a failure and can result in a long discussion with HR and IT.

> > Many phishing test as a service companies will report clicks vs. people who actually interact with the page.

> Would you mind naming some?

KnowBe4 is one such company. Their emails are also easy to spot because they'll have an X-PHISH-TEST email header.


> protects privacy by defeating click tracking on marketing emails.

I don't think that's true. It should be pretty trivial to know whether a click came from a user or google.


That also includes links sent on all major chat apps. In this day and age, if you're not self-hosting or using E2E, all your links will be mined by companies.


WhatsApp and Signal notably don't do this on their servers; for the link preview feature, they fetch it locally on your device.


> Google will also pre-load all the images in your email too

PLEASE disable automatic loading in Gmail settings. Don't let the idiots use unethical, stalkerish e-mail read receipts.


Doesn't gmail's preloading defeat the read receipts? It makes it so every tracking pixel sent to gmail gets loaded (and not by your IP), thereby making it meaningless.


This was true briefly in 2013. https://arstechnica.com/information-technology/2013/12/gmail... But they got so much pushback they effectively disabled it. https://arstechnica.com/information-technology/2013/12/dear-... (It's still cached, but not in a privacy-preserving way.)


It's still more privacy-preserving than not preloading them at all, right? Whoever is serving the images doesn't get your IP addresses, cookies, etc.

Not saying Google is virtuous here -- it only serves to enforce their advertising monopoly -- but I don't see how the image caching in itself is a bad thing.


Ok, that's probably true. Still works as a read receipt though.


Not if gmail always follows (image) links in emails, regardless of whether the recipient address belongs to anyone.

Then it’s all noise.


But they don't.


> Doesn't gmail's preloading defeat the read receipts? It makes it so every tracking pixel sent to gmail gets loaded (and not by your IP), thereby making it meaningless.

According to some articles I've read, the marketers can still name the images uniquely per user.

So when Google's caches query for them, they still know it's you.

I will keep "always load images" off as usual in Gmail.

https://arstechnica.com/information-technology/2013/12/dear-...


If Google fetches the images when the mail is delivered (not when it is opened), all the sender learns is that the mail arrived (but not whether the user actually looked at it).

I'm not sure if that's the case though.


I thought gmail preloads only when the e-mail is opened. I thought it was basically just a proxy.


Doesn't automatic loading render those read receipts meaningless?


Gmail only loads the images when you load the message. It's been that way since late 2013. https://arstechnica.com/information-technology/2013/12/dear-...


For those confused by this comment, they were referring to automatic loading of images when you open mail.


You think every email provider crawls links in your email and then inspects the destinations to protect you from spam?

That is patently not true, otherwise you would be dealing with utter chaos as you interacted with the internet. If, as the OP claims, Gmail actually _is_ doing this, then that is worrying but it's not the general case.

Google pre-loads and caches images, which many people consider problematic, but they're not pre-fetching URLs.

Refer to these two incredibly recent posts to understand why:

1. https://news.ycombinator.com/item?id=28192269 - How to prevent email spoofing, using an unholy combination of silly standards

2. https://news.ycombinator.com/item?id=28194477 - Email Authenticity 101: DKIM, Dmarc, and SPF


Well, yes I do believe that. How else do you explain the behavior seen in the article? Outlook.com emails have been doing it for years. https://stackoverflow.com/questions/32851044/how-do-i-stop-o...

Microsoft also scans links sent in encrypted Skype messages. https://arstechnica.com/information-technology/2013/05/think...


Office 365 calls this "Safe Links" and it's a feature in Outlook and Teams. They have a page describing it and everything.


HTTP GET requests should not be interpreted by the server as a request to change something. That's what POST, PUT, DELETE and PATCH are for.


I agree, but how do you initiate a POST request via an email message? Embedding a form sometimes raises its own security alert.


The unsubscribe link should lead to a separate HTML form describing the unsubscription and a "confirm" button.


> I agree, but how do you initiate a POST request via an email message? Embedding a form sometimes raises its own security alert.

So, your question is how to evade security alerts for actions with potentially significant side effects?


> So, your question is how to evade security alerts for actions with potentially significant side effects?

Well there's already a big ol' button in an email that says, "click me to register". The end user doesn't really care about the implementation. If one pops up a security alert (the POST form) and one doesn't (the simple link), how do you think everyone implements that big ol' button, 100% of the time, for 100% of everything?

I wish email didn't work this way, but as far as I understand, this is the lay of the land. If there's a better way, I'll be happy to implement it in the system(s) that I have control over.

I'm really asking for engagement within the community with help solving this sticky problem (if it wasn't clear). If link caching is this prevalent, what to do about it, for things like registering via email?


You can do it with JS, but that assumes that Google is not running any JS, which is probably not a safe assumption.


We have had the same problem with Microsoft where our marketing department is effectively DDoSing our service by sending out links to 50k+ users.

We would get spikes of thousands of requests per second from Microsoft IP addresses, which after some googling were linked to their threat detection.


Threat detection is becoming essential because of ransomware, phishing, and the like; protection from click tracking is good too.

Just send email in tranches.

If the marketing department is sending too many e-mails for the web server to handle, then the volume is probably out of proportion to the company and what they are doing is probably just spam.


I agree, I'm not blaming Microsoft here, and it is our problem to solve (the service should handle the traffic, and/or the emails should be staggered over time)


I always wondered when single-click unsubscribe was going to be a problem because of exactly this. I mean, how do you expect to give a URL to Google and have them just never crawl it?


There's also RFC 8058 [0] that proposes to refine the `List-Unsubscribe` header for one-click unsubscriptions. It uses the `List-Unsubscribe-Post` header to indicate that an HTTP POST request can be used to unsubscribe with a single click.

It specifically mentions in section 3.2 that mail receivers are not to crawl this URL without user consent:

> The mail receiver MUST NOT perform a POST on the HTTPS URI without user consent. When and how the user consent is obtained is not part of this specification.

I haven't seen any statistics on how widespread adoption of this RFC is among the major mail providers, though.
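
For reference, a compliant sender includes both headers, roughly like this (example.com and the token are placeholders):

  List-Unsubscribe: <https://example.com/unsubscribe/opaque-token>
  List-Unsubscribe-Post: List-Unsubscribe=One-Click

The receiver then POSTs a body of `List-Unsubscribe=One-Click` to that HTTPS URI, and per the quote above, only on explicit user action.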

[0]: https://datatracker.ietf.org/doc/html/rfc8058


Oh, that's the rake we stepped on.

Bloody obvious in retrospect, but it took us an embarrassingly long time to realize that we were leaking mailing list subscribers because of these one-click unsub links.


> it took us an embarrassingly long time to realize that we were leaking mailing list subscribers because of these one-click unsub links

It didn't take some companies long. They were just a bit more, uhhh, shady about the knowledge.


You make just visiting the URL not everything that needs to be done. So for example, the URL you visit then also runs a small bit of javascript behind the scenes that does the actual unsub action - or the javascript just does a redirect.

Or even simpler, you make it so the user has to click a button to POST the request. You've had to do this for years, now.

I would assume though, that Gmail is smart enough to go, "oh hey, looks like a verification link, maybe I shouldn't touch it"


But to be fair, 1-click unsubscribe is a very user friendly thing to do. As a user, if I have to jump through a bunch of hoops to unsubscribe, I'm just going to mark your message as spam and move on with my life.


I absolutely agree. That's why on the app I wrote, the default is one-click unsubscribe, but I still have to do it using a little bit of Javascript. If JS is disabled, you just have to click a button that POSTs the request. I'm not sure what else to do!

There's a similar but different problem with the reader-supplied "unsubscribe" button. This usually uses information found in the header of the email message ("List-Unsubscribe"), but guess what also gets prefetched sometimes? Enter RFC 8058 and "List-Unsubscribe-Post", another email kludge to throw on the pile:

https://datatracker.ietf.org/doc/html/rfc8058


Exactly. And it's not just one-click unsubscribe. Using a secret link sent to an account's email address as a way to implicitly log in instead of having to remember a password is increasingly common and also an interesting idea in terms of user experience and security.

If it's OK for your mail service to open one secret link, where does it stop? Is it also OK for them to spider the content they can reach from that link? Now they are potentially gaining access to all kinds of possibly sensitive information that they would not have been able to reach except for spying on your email. And if that's not OK, why was it OK for them to open the secret link in the first place?


>>Using a secret link sent to an account's email address as a way to implicitly log in instead of having to remember a password is increasingly common and also an interesting idea in terms of user experience

The user experience with this is terrible if email is not set up on the device you want to log in from


The user experience of lots of things is terrible if the relevant facilities aren't set up on the device you want to use at the time. It's a curse of our modern, highly-connected and always-online world. You get the same problem with logging into sites that require ID and password from a device that doesn't have your password manager on it.

But the fact is, many systems do work like that and many users do prefer it. I'm taking a pragmatic stance here because assuming the messy, unpredictable real world always follows some theoretical standards at a scale of billions of people and millions of organisations is very predictably going to give bad results in a lot of cases.


Gross, never thought of that. Lots of valuable data, so no doubt someone will try it (if they aren't already).


> to open one secret link

"Secret link" is an oxymoronical concept. Resource identifiers are exactly that: identifiers. They're not private names, and any design that relies on keeping them secret is inherently flawed. If it's accessible on the openly resolvable web, then the content needs to be treated as if it's public. If your use calls for authentication or authorization, then actually use an authentication or authorization system.


They are in fact private names, because they're unknown to the public. This is in fact an authentication system.

Yes, the public could guess a 128-bit random value and log in - but that's no different from the ability of the public to guess your password, or your session cookie, or your SSL session state, or whatever. Every authentication mechanism is based on "There is a high-entropy value, and nobody but the authorized user has it." It makes no difference from a theoretical standpoint - i.e., in terms of whether it's "actually" an authentication system - whether the high-entropy value is sent to the server as part of the URL or via a header or via POST data.

(It clearly makes a difference from a practical standpoint, because in order to have a secret link, the link must actually be kept secret. But that's no different from, like, the need to not expose your cookies to third-party requests or whatever.)


I understand they are known to the public? Every MTA at any random site between the sender and receiver gets the mail, including secrets. They can all decide to scan the site, write the links to a log, ...

Then, when you click the link, if you don't have HTTPS, anything between the receiver and the site also gets a copy of the link. And there are proxies, ad-injecting ISPs, etc.


Are you claiming that the contents of emails are public?

Which "random sites" see emails between sender and receiver?

Yes, proxies and ad-injecting ISPs can see the contents of plaintext HTTP. But that's hardly a reason to say that logging into a website with a password or presenting a cookie doesn't count as an authentication system!


I've always treated the contents of emails as public. Things are getting a little better these days, but email is still often forwarded in plaintext through multiple servers owned by disparate parties. There is no reason to believe anything you send in an email will remain private.


> But that's hardly a reason to say that logging into a website with a password or presenting a cookie doesn't count as an authentication system!

That's fine. No one is saying that. They're saying that URLs aren't an authentication (or authorization) system.


I agree no one is saying that. I think they are being unsound in refusing to say that but also saying that URLs don't count as an authentication system.

Is the argument "Information in an email should be treated as public"? Then how do you validate users on signup in the first place? How do you ensure that someone owns an email address that they claim to own?

Is the argument "Information sent over HTTPS should be treated as public"? Then why does the argument not apply to passwords or cookies?

If the argument is something else, what is it?


"Is the argument [...]? Is the argument [...]? If the argument is something else, what is it?"

This not difficult at all. Playing dumb isn't clever, it's just obnoxious.

The argument, stated amply before, is that URLs are not private.

Email being a private medium or not is orthogonal.


They're saying that URLs aren't an authentication (or authorization) system.

I'm curious to know how those advocating a position similar to this think something like a password reset facility on a website should work. We all know security-sensitive systems should rely on alternative methods of authentication anyway, but for those of us living in the real world where billions of people access millions of systems via websites using their email address as ID/fallback, what else would you do that does not rely on trusting emails to be acceptably secret for at least a few minutes?


What relation does your question have to the statement you're quoting?


As a wise person once said, playing dumb isn't clever, it's just obnoxious.


You seem to have difficulty following the logical throughline here. There is no playing dumb in the previous comment.

Let's put it in a statement instead of the form of a question: it does not follow to respond to the quoted part ("URLs aren't an authentication (or authorization) system") with remarks about "those of us living in the real world where billions of people access millions of systems via websites using their email address as ID/fallback[...]".

You (both of you) are confusing the subject here: URLs vs reaching back to drag emails and their privacy into focus. They're different fucking things! Stop responding to comments about one with responses that deal in the other!

Billions of people access websites using their email addresses? Granted! Now say something about URLs if that's what your quibble is and shut up about the emails that the URLs were sent in and whether or not those emails are private. The comments about email are misdirection at worst, and a sign of unclear thinking (and a hazard to confuse others) at best.


You're doing some subtle jiu jitsu and extracting a lot of benefit from responding to the previous message as if it said "if the names are not known to the public, then[...]". It does not.

Resource identifiers, on the web[1], are not private names—not even by virtue of the fact they were communicated over a private channel—and they need to be treated as public, full stop. URLs are not private names, simply because of what they are.

> It makes no difference from a theoretical standpoint [...] whether the high-entropy value is sent to the server as part of the URL or via a header or via POST data.

It makes no difference from an information theoretic standpoint. There is no reason, however, to narrowly consider the information content and its entropy and declare that you are done. From an information architecture standpoint, there is a difference.

> But that's no different from, like, the need to not expose your cookies

It is different, for the reasons above.

(Every entropy-based cryptographic protocol also begins with observations how hard it is to do something in practice, and is then founded on exploiting those side effects. To describe a system and then wave away concerns that it is merely unfit "from a practical standpoint" makes it a failure of a design. It is fundamentally at odds with not just the evaluation criteria that protocols fit for use are measured against, but from which they are born.)


I'm not following your argument. What is this thing that URLs are which makes them public?

For instance, I could argue "Fingerprints are not passwords, and they need to be treated as public, because of what they are" - because I can finish that sentence with "and what they are is a pattern that's left on every single random thing you touch, and is also immutable and impossible to rotate."

What's the analogous thing for URLs?


> I'm not following your argument.

It's almost certainly true that you do; you're just being dishonest. (The alternative is worse.)

> What is this thing that URLs are which makes them public?

You mean other than being identifiers (universal identifiers, at that)? It's as if you've never articulated, or encountered someone else articulating, an argument that incorporates (or could appropriately incorporate) the phrase "by definition" before.

If your security protocols are compromised by the card catalog or the Rolodex being invented—compromised not by knowing the contents of a given resource, but by knowing the correct way to refer to or otherwise describe the identity of that resource—then you don't really have very good security in your protocols (particularly in a world where those things have already been invented).

> What's the analogous thing for URLs?

What? What a bizarre request.

The next time someone asks you to make your case in terms of bad analogies just because they can't get away from using them themselves, you can go ahead and say, "No, thanks. I'll pass."


I understand full well that you are claiming that URLs are public by definition. I am disputing that you are interpreting the definition correctly.

The invention of the card catalog and the Rolodex does not compromise anything, because the card catalog and the Rolodex simply catalogue information that is public, but in a poorly-accessible format. No card catalog can find the name of an unpublished, self-printed book that is sitting in my house. No Rolodex can determine the extension of the direct line at my work. Since I am claiming the URL in question is not public in the first place, I am claiming that it would not end up catalogued.

Can you explain, clearly, how this URL would end up catalogued? "By definition" is not an argument.


You're mixing up "public" with "published" (intentionally, perhaps).

> I am disputing that you are interpreting the definition correctly.

And I question whether you've actually made an attempt to grok the subject as a matter of definition, rather than substituting your own synthesis (an experiential mental model built from firsthand inference) in place of what URLs are actually supposed to be. (Meaning the playing-dumb comment would be apropos here as well.)

> Can you explain, clearly, how this URL would end up catalogued?

The mechanics of how don't have to be explained, because that's how definitions work (whether you accept it or not). Explaining how is not a prerequisite to what.

But if you're really dying for some missing insight, how about pausing to demonstrate some awareness of the catalyst of this tedious exchange: a company founded on cataloguing documents and their public identifiers is (shocker) doing exactly that, right before doing things to/with them. This has led people who built up a model of the world similar to yours to get upset, because the mistaken assumptions that went into that model conflict with what they're now being told is happening.


This is what MailChimp does for its one-click unsubscribes.

You visit a URL, and some JS POSTs to `https://[youraccount].us1.list-manage.com/unsubscribe/post` with a body containing your subscription and list IDs.

I'm not sure what prevents crawlers from executing JavaScript on that page and triggering the unsubscribe action anyway, though, unless it's just that email crawlers don't execute JS.
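
For concreteness, the served page is presumably shaped something like this Go sketch (hypothetical route, field names, and markup; not MailChimp's actual code):

  // Hypothetical sketch of the auto-submit pattern; names are made up.
  package main

  import (
      "fmt"
      "html"
      "net/http"
  )

  // GET serves a form that a script submits immediately. The
  // unsubscribe itself would happen in the handler for the POST,
  // which is omitted here.
  func unsubscribePage(w http.ResponseWriter, r *http.Request) {
      id := html.EscapeString(r.URL.Query().Get("id"))
      fmt.Fprintf(w, `<form id="f" method="POST" action="/unsubscribe/post">
  <input type="hidden" name="id" value="%s">
  <button>Unsubscribe</button>
  </form>
  <script>document.getElementById("f").submit()</script>`, id)
  }

  func main() {
      http.HandleFunc("/unsubscribe", unsubscribePage)
      http.ListenAndServe(":8080", nil)
  }

So the GET itself stays side-effect free, and the only thing standing between a crawler and an unwanted unsubscribe is whether it runs that script.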


It could do some checks on the environment (user agent, screen size, etc.)?


Well, the problem is that at some point (maybe the CAN-SPAM Act) there was a nebulous requirement for "one-click unsubscribe". So rock, meet hard place. I'm on mobile or I'd look up the law. I suppose a link and then a button press counts as two clicks.


The CAN-SPAM Act actually doesn't require one-click unsubscribe, but there needs to be a way to unsubscribe in any email you send out, and your physical address has to be listed in the email message.


No need to downvote me - if there's a better practice, I'm absolutely happy to adopt it. Or I can give you the github details and you can send a PR.


Almost every one I see just prefills your email into a text box (or, for the more asshole ones, doesn't) and provides a button to click. I rarely if ever see one-click unsub.


Or “click this link to verify your e-mail address”…


Well, that only proves the email address exists. If they wanted you to prove you can read the mail account, they should send you a token or something, or at least require authentication when you click the link.


The verification link shouldn't be used to verify the email exists, it should be used to verify that the owner of the email address is the same person that signed up for whatever service.

I have a common name gmail address, and I get verification emails all the time that I never open. If websites keep emailing me after that, then I rightfully mark them as spam.


The verification is usually to prove the address is yours, as well as whether it exists.


Run some JavaScript on the page that sends a POST? Not ideal, but I'm not sure how else to fix it; you can't send a POST with a single link as far as I can tell.

Google could of course send the POST as well, but then at least they're the ones violating the HTTP standard.


Any security scanner product should also run the JS, just as a real browser does. Otherwise, cloaking a phishing page would be completely trivial.


I thought the "one click" was counted from the web site, not from the email. So the click to get to the unsubscribe page doesn't count, but one click after that should do it.


> I always wondered when single-click unsubscribe was going to be a problem

To put what others are saying succinctly: it was never not a problem; it never should have been happening.


One-click unsub is implemented by ignoring scrapers. Google is especially good about using a unique UA string.


The access I saw to the registration URL was from cache.google.com, and it looked like a client browser:

  74.51.221.37 - - [19/Aug/2021:22:05:16 +0000] "GET /validate/email/1d00a5c2648c211befd33f5a8a7cbfab HTTP/1.1" 404 0 "" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"

  $ dig -x 74.51.221.37 +short
  cache.google.com.


I see other GETs like this one:

  216.99.127.196 - - [20/Aug/2021:16:25:42 +0000] "GET /validate/email/2591b346e5b8b435bdde54d797fe23a9 HTTP/1.1" 200 811 "" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"


Also, from my anecdotal experience, it's possible Google is a bit smart about unsubscribe links. My tiny blog has "unsubscribe" as part of its URL; it's truly one-click unsub and doesn't do anything clever like user-agent checking or JS actions. Nevertheless, I see 8 Gmail subscribers that still have email notifications on, and 3 that have unsubscribed at some point.


Wasn't there an issue like this years ago in something called Google Web Accelerator (I think it was a toolbar for IE)?

Theory: it preloads links in the background.

Practice: some old bulletin boards showed a (delete) link after each post and, among others, a (ban) link next to each user if you were logged in as administrator. All of these sent GET requests because some developer hadn't read that part of the standards, and there was no "are you sure?" prompt either.

What I think is happening here is that Gmail is scanning the content of each link in an e-mail for some subset of {malware, fraud, phishing, child abuse, other bad stuff}. This is a feature if you're a non-techy user who clicks on phishing links, I suppose?


Yes, we had this same scenario play out 16 years ago, and it seems that many modern web developers have forgotten its lessons:

https://blog.moertel.com/posts/2005-05-06-google-web-acceler...


Why isn't rel="nofollow" a solution to this? According to the docs [1]:

  Use the nofollow value when other values don't apply, and you'd rather Google not associate your site with, or crawl the linked page from, your site. 
It seems like crawling a nofollow anchor tag in an email breaks this rule. Am I reading it wrong, is there an exception for emails, or is Google being inconsistent?

[1] https://developers.google.com/search/docs/advanced/guideline...


That's for the search engine. This is for malicious-link checking. It wouldn't do much good if every spammer could say "please don't look at my malicious web site".


> It seems like crawling a nofollow anchor tag in an email breaks this rule.

There is no actual rule stated in the quoted material, and it describes a mechanism for specifying a preference for how Google handles the link when it encounters the tag on the creator's site, which a user email on Gmail...isn't, even approximately.


This is a good feature in my opinion. Why should I let the sender know when I click on tracking links or view the email? If you really want to, just filter out clicks from AS15169.
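
Proper AS-level filtering needs routing data from somewhere, but a cheaper (and rougher) heuristic is forward-confirmed reverse DNS. A hypothetical Go sketch, assuming a google.com PTR is a good enough marker:

  // Rough heuristic, not a substitute for a real ASN lookup.
  package main

  import (
      "fmt"
      "net"
      "strings"
  )

  // isGoogleIP reports whether ip reverse-resolves to a google.com
  // host and that host resolves back to the same ip, so a spoofed
  // PTR record alone can't fool us.
  func isGoogleIP(ip string) bool {
      names, err := net.LookupAddr(ip)
      if err != nil {
          return false
      }
      for _, name := range names {
          if !strings.HasSuffix(name, ".google.com.") {
              continue
          }
          addrs, err := net.LookupHost(strings.TrimSuffix(name, "."))
          if err != nil {
              continue
          }
          for _, a := range addrs {
              if a == ip {
                  return true
              }
          }
      }
      return false
  }

  func main() {
      // The cache.google.com hit from the logs upthread.
      fmt.Println(isGoogleIP("74.51.221.37"))
  }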


I disagree. I think the image and resource preloading done by Google is perfectly fine, but clicking actual links will mess up tons of systems (for example, links that are only supposed to be valid once, as in password reset emails).

There are good use cases for links with single-use tokens in them. Companies sad and desperate enough to suck as much data from you as humanly possible (i.e. every single newsletter with tracking links) ruin these use cases for everyone, assuming they are indeed the reason Google is implementing this feature.

Given that many links (like, again, from password reset emails) give the person who clicks the link instant access to your account, I'd say this behaviour goes further than just filtering out trackers. Google has no business opening my Slack account or entering the change password page for the services I use. The stalking prevention they apply to external images and such is fine in my opinion, but links to external web pages should be left alone. You never know when Google accidentally clicks a link that says "confirm order" or "unsubscribe" because its magical AI misinterpreted the contents of an email.


From the very beginning of the web when HTTP was defined, there has been the rule that a GET request should never take an action on its own. Actions should be based on some other method like a POST request.

Google is following the standard. People who take actions based on GET requests are not. Sure, mistakes happen out of ignorance, but they should be fixed.
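
For anyone wondering what following the standard looks like in code, here is a minimal Go sketch (route, token store, and token value are hypothetical): the GET only renders a confirmation form, and the single-use token is consumed only on POST, so a prefetcher that follows the emailed link burns nothing.

  // Hypothetical sketch, not anyone's production code.
  package main

  import (
      "fmt"
      "html"
      "net/http"
  )

  // token -> account; a real app would use a datastore with expiry.
  var tokens = map[string]string{"1d00a5c2648c211befd33f5a8a7cbfab": "someone"}

  func validate(w http.ResponseWriter, r *http.Request) {
      switch r.Method {
      case http.MethodGet:
          // Safe and idempotent: render a form, change nothing.
          t := html.EscapeString(r.URL.Query().Get("token"))
          fmt.Fprintf(w, `<form method="POST" action="/validate/email">
  <input type="hidden" name="token" value="%s">
  <button>Confirm my email</button>
  </form>`, t)
      case http.MethodPost:
          // The state change happens only on an explicit submit.
          token := r.PostFormValue("token")
          if _, ok := tokens[token]; !ok {
              http.Error(w, "invalid or expired token", http.StatusForbidden)
              return
          }
          delete(tokens, token) // single use
          fmt.Fprintln(w, "Email confirmed.")
      }
  }

  func main() {
      http.HandleFunc("/validate/email", validate)
      http.ListenAndServe(":8080", nil)
  }

The form requires an explicit click rather than auto-submitting with JS, because of the scanner behaviour described at the top of the thread.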


> From the very beginning of the web when HTTP was defined, there has been the rule that a GET request should never take an action on its own. Actions should be based on some other method like a POST request.

Maybe, but then several decades passed and now billions of people don't use online systems the same ways any more. It used to be that I could send a legitimate mail to a friend or family member and not worry that whatever mail system they use would refuse to deliver it to them because my system didn't jump through several not-quite-standard hoops that didn't exist when the email protocols were defined. Google seem fine with discarding the historical standards on which most of the Internet is built in that situation.

In any case, if a communications service is going to snoop on your private communications and take actions that would be impossible without spying on you, I think the burden is 100% on them not to screw anything up for anyone, ever.


"Things change as time goes on" is not an excuse to ignore valid standards.

All big email providers know this and know that GETs should not affect their unsubscribe/confirmation links. It's part of their job.


"Things change as time goes on" is not an excuse to ignore valid standards.

You're talking about an idealised, theoretical world. I'm talking about the real one, the same one where big mail providers routinely ignore valid standards themselves in their efforts to fight real world problems like spam and identity theft.

Again, if someone is going to help themselves to private information then the burden should be 100% on them not to screw anything up for anyone, whether or not that anyone was following any particular set of rules.


Yes, people often don't follow standards, but standards are sometimes useful in the real world for deciding who should fix their software, and for making breakage less likely amid whatever changes happen in the future.

Google isn't the only vendor you need to worry about. Other mail agents, browsers, or proxies will sometimes prefetch URLs. On the web you often don't know what software your users are running or what it will do, and there's no way of knowing what they'll do in a year. Users have the right to run whatever mail software they want. But if you follow standards then you have a better chance.

If you don't, it's a risk. Sometimes vendors do go the extra mile to tolerate other people's buggy code, but not always.

You can decide to blame everyone else if you want, but if you use GET requests for user actions, something will likely break eventually.


Yes, I understand the reason for the standards, and other things being equal I am all for following them. I just think it's strange that an organisation that spies on someone's communications, does things it otherwise couldn't, and breaks things as a result should get such a free ride. There is no law that says any web application you write has to follow those standards, and the only reason not doing so is a problem in this case is that the likes of Google didn't mind their own business.


There are other reasons but you're ignoring them.


What, specifically?


Some other software, other than Google, might follow the link automatically. Because the standard allows that.

(Also, consider mistaken clicks, which happen all the time on touch screens.)


> Some other software, other than Google, might follow the link automatically. Because the standard allows that.

That is literally the opposite of what good native email clients have been doing for a long time. They won't even open linked images and the like by default, to prevent tracking.


At least according to the original link, Gmail follows the link when a user views an email. That sounds like tracking gold to me.

As to masking your IP when you actually click on the link: how could that possibly work? Your IP still makes its way to the tracking server when you click. There would be no mechanism for Gmail to prevent that unless it rewrote the links to point at its caching server, which would pretty much break Gmail.

Sure, pre-fetching would create a bit of noise because your IP would be mixed in with the Google IPs, but as you've quite rightly noted, filtering out clicks from AS15169 would eliminate that noise.


So this way Google automatically confirms the validity of the email to spammers by visiting all their links? Doesn't sound great, and people still know when you click on links or view the email. They just have to guess a bit better.


They already do this: if you send to an invalid address, Gmail will respond saying the email cannot be delivered.


> This is a good feature in my opinion.

Some links include automatic login functionality. I definitely don't want Google logging in to my accounts.


> I definitely don't want Google logging in to my accounts.

Then don't use websites that provide links with insecure features.


A GET of a webpage shouldn’t have side effects - if it does, your app is broken.

This might be new for Google, but some virus scanners have been prefetching links for decades.


You can (or used to be able to) embed an HTML form inside an email, and clicking the submit button will perform a POST if that is the method specified. This would probably solve the issue: you avoid an extra click on the resulting page, it's a POST, which matches the standard, and Google (presumably) won't hit it.


Forms are not supported in all major clients.


How will you handle people who use text-only email?


Trying to explain this to some marketing people looking at "click-through rates" was extremely frustrating. This screwed with a lot of A/B-testing analytics everywhere, and I suspect most people were none the wiser.


> This screwed with a lot of a/b testing analytics everywhere

My heart bleeds for the oh-so-poor marketing people whose data, consisting of unwitting human test subjects, has been poisoned.


This sounds like they have built an unsafe system and are running into something that checks for malware. This has been a problem for decades, which is why things like unsubscribe systems usually give you a form that requires you to submit it, since a passive robot won't POST it.

I have little sympathy for the first poster: those kinds of phishing tests are good if your goal is to train your users to think of the security group as an adversary, but not much else. If clicking on one link compromises your security, you need to put the IT house in order first (hint: where's the WebAuthn, which completely prevents credential phishing?) and especially deal with the vendors who are training everyone to think that clicking on obfuscated links is routine.


One of my favorite side effects of this feature is that Firebase Authentication, which is run by an entirely different part of Google, constantly throws errors about it that users don't understand.

I get a support request at least once a week from a user who "doesn't understand why they can't verify their email".

Turns out, their Gmail account already verified it by clicking the link before they opened the email, and they didn't think to try signing in, because as far as they could tell the email was never verified.

¯\_(ツ)_/¯


Wait... isn't that messed up, though? Anyone can register an account with my email.


I noticed this same behavior when sending a validation code link via SMS. Had to insert a "Verify Code" button landing page to prevent whatever link-sniffing was happening.


In Settings | General, there is an option, "Ask before displaying external images - This option also disables dynamic email." AFAIK this pertains only to images, not links.


I don't see anything wrong with this. GET requests should be idempotent, and this protects me from bad actors abusing links to detect when I open an email.


What about the idempotency of HTTP GET is surprising here?


The email should have a link to a page with a CAPTCHA challenge and a form that submits via POST.

A POST with a valid CAPTCHA solution attests that a human clicked on purpose.
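
Server-side, checking the solution is one HTTPS call. A minimal Go sketch against reCAPTCHA's siteverify endpoint (the secret key and the token plumbing are assumptions about your setup):

  // Sketch of server-side CAPTCHA verification; error handling is
  // deliberately minimal.
  package main

  import (
      "encoding/json"
      "fmt"
      "net/http"
      "net/url"
  )

  func captchaOK(secret, token string) bool {
      resp, err := http.PostForm("https://www.google.com/recaptcha/api/siteverify",
          url.Values{"secret": {secret}, "response": {token}})
      if err != nil {
          return false
      }
      defer resp.Body.Close()
      var v struct {
          Success bool `json:"success"`
      }
      if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
          return false
      }
      return v.Success
  }

  func main() {
      // "g-recaptcha-response" is the field the widget adds client-side.
      fmt.Println(captchaOK("your-secret-key", "token-from-g-recaptcha-response"))
  }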


> email should have a link to a page with a CAPTCHA challenge

Hard no. CAPTCHA is a blight on the web to anyone who values privacy or has accessibility issues.


No one seems to be mentioning privacy. This would allow senders to verify email opens, something no email client should do.


I thought this was intentional? I vaguely recall Google did this as a security feature to avoid invisible pixels?


Why share this now? It's a discussion from 2019 that traces back to posts Google themselves made around 2009 about preloading images, sending links through their own servers, anti-phishing, etc. Why share this now / find a new discussion/link about it?


(2019).

Anything new on this?

IIRC it was a known thing and surely was already discussed on here somewhere then.


If this helps reduce spam by caching unsubscribe URLs, I'm all in.


There's no option to disable this in mobile Gmail.


[2019]


What does this mean?


If only they could mind their own business...



