Dealing with the big black alphabet mailbox (Gmail issues)

Gushi
Jun 8, 2021

Note: this article documents a history of interoperability problems between private mail server operators and Gmail. It may be updated from time to time, and updates will be dated in-document. (This note added 2021-06-09)

I’ve been a sysadmin for 25 years now. I’ve been operating networks for most of that time. While, in most of my early career, I focused on the systems side rather than routing protocols, it was always a close part of my job.

I’ve also pretty much always run a small hosting service. It was a place to give my dad a webpage to put up for his books, it was a place to put some of my interests, and it was a “train set” where I could try things before moving them into production at work. (It continues to be this, even today, as I explore RPKI and HTTP security headers, contribute to the Trusted Domain Project, deploy DMARC, experiment with new scripts to do DNSSEC signing, play with Catalog Zones, and run betas I’d be leery of running on the production stack at work.) It’s my toy train set, and I wear my engineer hat when I’m running it.

Mainly, however, it has been a place to give my friends neutral space where they’re not at the whim of having their content locked behind one person’s paywall, or having ads slapped on their content. (From an engineering point of view, this gives me real-world data to play with.)

Running a hosting service, even a small one, also implies running the ancillary services to go with it: knowing the ins and outs of Apache, knowing how to manage mail delivery well, and being at least somewhat clued in on DNS, as well as knowing some hardware, OS upgrades, and more recently, some VMs. It means maintaining backups of both your config and your user data.

Most importantly, it’s an administrative exercise in knowing how to run a network. How to tell: is this thing running, is this thing reachable, are people having issues, and if so, are you handling them right? If you do have an issue, do you know how to reach out to the other network, or to your upstream ISP, and solve the issue? It means dealing with abuse complaints when one of your users does something boneheaded (intentionally or accidentally), and it sometimes means educating your users on how to keep their code updated. For a giant hosting farm, or a single box sitting in a colo, all these things apply.

When you really get to the point of being a network operator, it means managing your own routing, and speaking protocols like BGP out to the world, where every router on the big connected internet can hear your router shouting out “I have these IP blocks”, or echoes of the same. And in being part of that, you’re considered an Autonomous System. Your constellation is visible in the sky. You’re given an AS Number, and you are now part of a global community. Your AS number and address blocks are every bit as much expected to be reachable as Microsoft’s or Comcast’s. You reach out to other systems and peer with them directly, or connect to route servers. You participate in events like NANOG. You’re expected to be reachable to deal with abuse issues and network problems.

I am there. Many of my friends are. After many years of “Eh, I just do systems”, I took the leap. I run routers and switches and VLANs. I’m on the map, even though I’m just doing it with a tiny DBA and getting my kit mainly via eBay. This does not make me rich. Sometimes it buys me dinner. Sometimes it costs me time. It’s largely a labor of love.

This article is about the email side of things.

Gmail did not exist when I started this. Other companies like Hotmail certainly did, but the 800 pound black-box monstrosity that is Google did not have its foothold in place. Gmail decided that it was going to not precisely follow standards for how mail works, and boy howdy did they not follow them.

If I host your domain, I offer a web client that I use called Squirrelmail. It is free and open source and works in pretty much any browser. It does not require shit-tons of javascript. It is friendly with screen readers and renders both basic HTML email as well as plain-text messages. This is what email pretty much needs to be.

I also allow my users to grab their email with IMAP and POP3. These two protocols will allow pretty much any mail client, all the way back to Eudora and even PINE, to pull your email off my server, read it, and keep it for yourself.

Still, my goal is to let my users do whatever it is they want, and for some of them, the answer was “I just want my domain’s mail to go to my Gmail.”

Sure. We can bulk forward everything for your domain to you, that’s fine. But, if your domain starts getting spam, then I start sending all that spam to Gmail, and eventually I look like a spam source to Gmail.

Sometimes this results in Gmail just routing any mail any of my users send to anyone at Gmail, straight to the spam box.

Sometimes it results in Gmail just outright refusing to accept delivery. I don’t have a screenshot for this, but trust me, it happens, and the users complain heavily, because not only can they not *send* to anyone else at Gmail, but it also blocks the inbound flow of *their* mail.

And despite Gmail offering a dashboard where you can see if you have a problem, and manage your reputation, I’m enough in the noise floor that this happens:

My messages pass every test (SPF, DKIM, DMARC) that they’re running.

Here’s a link to their guidelines for forwarders. I’m following all of them.

And despite being a huge network operator, Gmail is completely opaque. You have no path to complain if a Google Group that you were added to without your consent is sending you UAE jobs spam. You have no way of contacting their NOC. You have no 800 number you can call to report a problem.

In a fit of disgust, I did the only thing one can do and told my users: forwarding to Gmail is problematic. Please only pull mail IN to Gmail with POP3; do not forward. Like every horror movie wanting to leave things open for a sequel, this is where we fade to the credits and write:

“The End” [and after a second] “…?”

Spoiler: it’s not the end.

So, recently, my users once again complained “Hey, we’re not getting any email from your server? Is something down?”

Turns out, the service that Gmail uses to grab POP3 mail is a little bit broken.

For starters, if it hits a problem, it hides that problem in the settings dialog. You don’t see the error in your inbox. You have to go into the little “gear” and go to a separate tab, and then you spot this:

Want more info? Sure, let’s see what kind of information this huge, global, thousand-dollar-a-share conglomerate provides:

Note that they cut off the error mid-sentence. Also, “server returned error”? My server didn’t return any error at all. Your system returned that error. They don’t tell you what IP is being used or which protocol (IPv4 or IPv6), and on a busy server, a timestamp with no seconds makes it really hard to narrow things down in the logs.

Now, like most people who believe in moving internet protocols forward, my server has both an IPv6 and an IPv4 address. And in most of the modern world, if you have both, IPv6 is generally tried first. Looking at a tcpdump shows that they reach out to me with a SYN packet, and then when I reply to their connection attempt, they never receive my reply. (Flags [S] means SYN, [S.] means SYN/ACK.)
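To see whether the handshake completes over each address family, a probe run from some outside host can help. Here’s a minimal Python sketch that resolves a name and attempts a plain TCP connect to every address it gets back, IPv6 and IPv4 alike. The function name `probe` and the example hostname are my own placeholders, and this only tests the path from wherever you run it, not from inside Gmail’s fetcher.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try a TCP connect to every address `host` resolves to.

    Returns a list of (family_label, address, connected) tuples.
    """
    results = []
    # getaddrinfo returns both AAAA- and A-derived addresses when available.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        if family == socket.AF_INET6:
            label = "IPv6"
        elif family == socket.AF_INET:
            label = "IPv4"
        else:
            label = str(family)
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                # connect() only returns once the SYN, SYN/ACK, ACK
                # handshake has completed; a lost SYN/ACK shows up
                # here as a timeout.
                s.connect(sockaddr)
            connected = True
        except OSError:
            # Covers timeouts, refusals, and "no route to host".
            connected = False
        results.append((label, sockaddr[0], connected))
    return results
```

Calling something like `probe("mail.example.org", 995)` (a made-up hostname, with 995 being the usual POP3-over-TLS port) from a few different networks shows at a glance whether the IPv6 path is the one timing out while IPv4 works.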

Hilariously, my ISP, Hurricane Electric, the guys who have run tunnelbroker.net for the last two decades, has said “try disabling ipv6”. Yeah, as a troubleshooting measure, maybe. But that’s *really* not the way the world should function. Tweaking *that* requires a bunch of SSL nonsense, but it’s my next step, since I don’t want to remove the AAAA DNS record for my main IP.

Update 2021-06-09: I have had to go ahead and re-issue my SSL cert to include IPv4-only and IPv6-only hostnames as SubjectAltNames on my main certificate. Annoyingly, while my existing one-year GeoTrust cert was only like $29, any cert that included SubjectAltNames was over $100; some products included some number of altnames, some did not, and the fees were non-obvious. I’ve switched to using Let’s Encrypt for this, which is free, but it requires a restart of all services using that cert (currently like 8 or 9 of them) every time the cert rolls, which is way more frequent than with a commercial cert.
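After a re-issue like this, it’s worth confirming that the cert a service actually presents lists every hostname you expect. The sketch below is a rough way to do that in Python. The helper names `fetch_cert` and `missing_sans` and the example hostnames are hypothetical, it assumes a service that speaks TLS immediately on connect (such as POP3S on port 995), and it does a plain string comparison, so wildcard entries are not expanded.

```python
import socket
import ssl


def fetch_cert(host, port=995, timeout=5.0):
    """Connect with TLS and return the peer certificate as a dict."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def missing_sans(cert, wanted):
    """Return the names in `wanted` that are not DNS SubjectAltNames.

    `cert` is a dict in the shape returned by getpeercert(), where
    'subjectAltName' is a tuple of (type, value) pairs.
    """
    have = {
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }
    return [name for name in wanted if name.lower() not in have]


# Hand-written example in the getpeercert() shape (hostnames are made up).
example_cert = {
    "subjectAltName": (
        ("DNS", "mail.example.org"),
        ("DNS", "mail4.example.org"),
    )
}
absent = missing_sans(
    example_cert,
    ["mail.example.org", "mail4.example.org", "mail6.example.org"],
)
```

Here `absent` ends up holding only `mail6.example.org`, the one name the example cert lacks. Against a live server, you would pass `fetch_cert(host)` as the first argument instead, once per service that uses the cert.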

The SSL landscape really sucks. This may be its own article real soon. Perhaps a helpful graphic from one SSL vendor will tell the tale:

This is where we are at now. This problem remains unsolved. This article will be updated if I can solve it. I have reached out to two contacts I have who are inside Google (both of whom are fairly senior), but this is not what a normal network operator should have to do.

Gushi

Gushi/Dan Mahoney is a sysadmin/network operator in Northern Washington, working for a global non-profit, as well as individually.