[ Also sent to the postfix-users@postfix.org list. ]
I sent a "please fix your TLSA records" notice to "postmaster" and "info" at a domain whose primary MX host certificate fails to match its TLSA records:
postfix/pickup[62805]: 7672C1DD39: ...
postfix/cleanup[63835]: 7672C1DD39: message-id=<...>
postfix/qmgr[844]: 7672C1DD39: from=<...>, size=8815, nrcpt=2 (queue active)
postfix/smtp[63837]: 7672C1DD39: host mail.example.nl[192.0.2.1] said: 450 4.2.0 ... Greylisted, ...
postfix/smtp[63837]: 7672C1DD39: to=postmaster@example.nl, relay=mail.example.nl[192.0.2.1]:25, delay=3.7, delays=0.03/0.01/1.5/2.2, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 119F07F861)
postfix/smtp[63837]: 7672C1DD39: to=info@example.nl, relay=vps.example.nl[192.0.2.2]:25, delay=5.7, delays=0.03/0.01/5/0.61, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 2FBDC2064A)
The "postmaster" copy was accepted by the primary MX, but the "info" copy was greylisted, and *only* by the primary MX.
The secondary MX, however, accepts email *without* greylisting! The effect is that all greylisted mail is delivered via the secondary, and the primary never sees the retries that would let the message pass greylisting.
What happened next was interesting and ironic. I got a "delay" notification from the secondary:
Your message could not be delivered for more than 3 hour(s). It will be retried until it is 5 day(s) old.
For further assistance, please send mail to postmaster.
If you do so, please include this problem report. You can delete your own text from the attached returned message.
The mail system
info@example.nl: Server certificate not trusted
The secondary is configured to verify DANE TLSA when relaying mail to the primary, so the problem report to "info", and indeed pretty much all mail to the system in question, is queueing on the secondary MX host, waiting for the primary MX TLSA records to be fixed!
Only "postmaster" email gets through to its destination (if the sending system does not validate TLSA records, which is naturally the case for my outbound "please fix your TLSA records" notices).
The main lesson here is: do not implement greylisting on only a subset of your MX hosts. Don't do that!
1. When greylisting, make sure that all MX hosts with equal or worse (higher) MX preference also greylist.
2. It is harmless (though less effective) to greylist only on (all) backup MX hosts, and skip greylisting on the primary. It is not a good idea to greylist on the primary, and not on the backups.
3. Monitor your DANE TLSA records, in such a way that notices of problems get through even when the TLSA records are stale.
4. Handle certificate rotation correctly.
https://dane.sys4.de/common_mistakes#3
http://tools.ietf.org/html/rfc7671#section-8.1
http://tools.ietf.org/html/rfc7671#section-8.4
https://community.letsencrypt.org/t/please-avoid-3-0-1-and-3-0-2-dane-tlsa-r...
https://www.internetsociety.org/deploy360/blog/2016/03/lets-encrypt-certific...
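Rules 1 and 2 above are easy to check mechanically. What follows is only a rough sketch, not a tool: the MX preference values (10 and 20) and the "greylists" flag are made up for illustration, since the real preferences are not shown in the logs and you would have to fill in the greylisting status from your own knowledge of each host.

```python
from typing import List, NamedTuple, Tuple


class MX(NamedTuple):
    preference: int   # lower number = more preferred (tried first)
    host: str
    greylists: bool   # hypothetical flag: does this host greylist?


def greylist_bypasses(mx_hosts: List[MX]) -> List[Tuple[str, str]]:
    """Return (greylisting_host, bypass_host) pairs that violate rule 1.

    A pair is reported when a host greylists but another host with equal
    or worse (higher) preference does not, so deferred mail simply
    lands on the non-greylisting host.
    """
    problems = []
    for g in mx_hosts:
        if not g.greylists:
            continue
        for other in mx_hosts:
            if (other.host != g.host
                    and not other.greylists
                    and other.preference >= g.preference):
                problems.append((g.host, other.host))
    return problems


# The situation described in this post (preference values are assumed).
broken = [
    MX(10, "mail.example.nl", True),
    MX(20, "vps.example.nl", False),
]

# Rule 2: greylisting only on the backup is harmless.
harmless = [
    MX(10, "mail.example.nl", False),
    MX(20, "vps.example.nl", True),
]

if __name__ == "__main__":
    for name, hosts in (("broken", broken), ("harmless", harmless)):
        print(name, greylist_bypasses(hosts))
```

An empty result means every greylisting host is "covered" by all hosts at equal or worse preference, which is all that rule 1 asks for.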
With "3 1 1" + "2 1 1" TLSA records, the rollover process can be substantially simplified:
https://www.ietf.org/mail-archive/web/uta/current/msg01498.html
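For those unsure what the numbers mean: in both "3 1 1" and "2 1 1" the trailing "1 1" says the record data is the SHA2-256 digest of the DER-encoded SubjectPublicKeyInfo, with usage 3 (DANE-EE) applying to the server's own key and usage 2 (DANE-TA) to a trust-anchor key. The sketch below just formats such records from raw SPKI bytes. The function names are mine, and I am assuming the "2 1 1" record is computed over the issuing CA's public key. Extracting the DER SPKI from a certificate is left to other tools (OpenSSL, for example).

```python
import hashlib
from typing import List


def tlsa_rdata(usage: int, spki_der: bytes) -> str:
    """Format '<usage> 1 1 <hex>' TLSA record data.

    Selector 1 = SubjectPublicKeyInfo, matching type 1 = SHA2-256,
    so the digest is taken over the DER-encoded public key only.
    """
    digest = hashlib.sha256(spki_der).hexdigest()
    return f"{usage} 1 1 {digest}"


def tlsa_rrset(mx_host: str, leaf_spki_der: bytes,
               issuer_spki_der: bytes, port: int = 25) -> List[str]:
    """Build a '3 1 1' + '2 1 1' pair for one MX host (illustrative only)."""
    owner = f"_{port}._tcp.{mx_host}."
    return [
        f"{owner} IN TLSA {tlsa_rdata(3, leaf_spki_der)}",    # server key
        f"{owner} IN TLSA {tlsa_rdata(2, issuer_spki_der)}",  # issuer CA key
    ]


if __name__ == "__main__":
    # Placeholder bytes, NOT real keys; substitute actual DER SPKI data.
    for rr in tlsa_rrset("mail.example.nl", b"leaf-spki", b"issuer-spki"):
        print(rr)
```

Because both digests cover only the public key, a renewed certificate that reuses the same key still matches the existing "3 1 1" record.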
Happy greylisting and TLSA record publishing to you all...