Hi,
On smtp.xs4all.nl we enabled DANE outgoing verification, but currently only with a "soft fail": if DANE fails, we fall back to non-DANE delivery... for now. Except for a few hardcoded domains (currently only our own, and havedane.net). If anyone feels confident about their own DANE setup, feel free to send me your domain (or domains), and I'll add it to the list of hardfails.
... but in a few weeks we'll disable the softfail anyway, if we don't see any problems (other than the ones in the danefail list, which I'm not yet using, but the softfail is hitting some of the domains on that list already).
We don't do smtp-tlsrpt reporting (yet?), but I can make some stats on demand for your domain, if you'd like that.
As a word of caution to other would-be DANE implementers: we also had problems with a domain that was not on the dane-fail list. This domain had DNSSEC and TLSA records for the MX host, but did not offer STARTTLS. That would be a huge red flag, and fortunately we had the softfail fallback so mail kept on being delivered. After investigating, it turned out that this domain had *my* IPs in an exception list of "do not offer TLS", because a few years ago we hit some sort of timeout bug that caused hanging connections. The hanging connection bug has since been solved, but the IP exception was still there...
So the moral of the story is: besides the domains in the dane-fail list, there may be local exceptions that apply to your own IPs, so keep an eye on your logfiles.
On Sep 3, 2018, at 6:57 AM, Jan-Pieter Cornet johnpc@xs4all.net wrote:
On smtp.xs4all.nl we enabled DANE outgoing verification, but currently only with a "soft fail": if DANE fails, we fall back to non-DANE delivery... for now. Except for a few hardcoded domains (currently only our own, and havedane.net).
Great news, welcome to the club, and thanks! When you do enable a default hardfail, you might consider exempting a particular sender address or subject tag, or perhaps a custom header:
https://tools.ietf.org/html/draft-ietf-uta-smtp-require-tls-03#section-3
so that you can still send email to the contacts of any domains that are failing, prior to disabling DANE for the domain for all senders.
You might find that some domains have intermittent outages as a result of poorly executed key/cert rollovers where the TLSA records are updated *after* they first become invalid. For any of those, soft fail may make sense until your logs show no failures for a year or more.
As a word of caution to other would-be DANE implementers: we also had problems with a domain that was not on the dane-fail list. This domain had DNSSEC and TLSA records for the MX host, but did not offer STARTTLS. That would be a huge red flag, and fortunately we had the softfail fallback so mail kept on being delivered. After investigating, it turned out that this domain had *my* IPs in an exception list of "do not offer TLS", because a few years ago we hit some sort of timeout bug that caused hanging connections. The hanging connection bug has since been solved, but the IP exception was still there...
So the moral of the story is: next to the domains in the dane-fail list, there might be local exceptions that might apply, so keep an eye on your logfiles.
Indeed the danefail list is not expected to be "complete". If any of you run into domains for which you need to make an exception, please open an issue or pull request on github if unable to resolve with the remote domain.
For those publishing TLSA records for inbound DANE, please make *sure* that you're offering STARTTLS *unconditionally*, to all SMTP clients with no restrictions by client IP address or reputation. Configurations that restrict STARTTLS to a set of "good" IPs are not compatible with DANE. If STARTTLS was disabled with some client IPs for interoperability reasons, resolve those first.
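One way to verify this from the outside is to look at the EHLO reply your MX gives to a particular client. The following is a minimal Python sketch, not a complete test tool: the function names `offers_starttls` and `probe` are made up for illustration, and `probe` only tells you what the server offers to the machine it runs on, so it would have to be run from each client network you care about.

```python
import smtplib


def offers_starttls(ehlo_response: str) -> bool:
    """Return True if a raw multi-line EHLO reply advertises STARTTLS."""
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-STARTTLS" or "250 STARTTLS".
        if line[:3] != "250" or len(line) <= 4:
            continue
        keyword = line[4:].strip().split(" ")[0]
        if keyword.upper() == "STARTTLS":
            return True
    return False


def probe(mx_host: str, port: int = 25, timeout: float = 30.0) -> bool:
    """Connect to mx_host and report whether it offers STARTTLS to *this* client."""
    with smtplib.SMTP(mx_host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        return smtp.has_extn("starttls")
```

A `False` from `probe` against a host that publishes TLSA records is exactly the situation described above: a DANE-validating sender on that network cannot deliver.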
Viktor Dukhovni ietf-dane@dukhovni.org writes:
For those publishing TLSA records for inbound DANE, please make *sure* that you're offering STARTTLS *unconditionally*, to all SMTP clients with no restrictions by client IP address or reputation. Configurations that restrict STARTTLS to a set of "good" IPs are not compatible with DANE.
This is indeed an important point to consider. Never thought of the possibility that the same client would first fail TLS and then start using DANE at some later point in time.
If STARTTLS was disabled with some client IPs for interoperability reasons, resolve those first.
In a perfect world, yes. But in practice: How do you do that?
I don't think it is realistic to offer STARTTLS without some local exception list. There are just too many buggy clients and ignorant sysadmins.
Bjørn
On Sep 3, 2018, at 8:25 AM, Bjørn Mork bjorn@mork.no wrote:
If STARTTLS was disabled with some client IPs for interoperability reasons, resolve those first.
In a perfect world, yes. But in practice: How do you do that?
I don't think it is realistic to offer STARTTLS without some local exception list. There are just too many buggy clients and ignorant sysadmins.
AFAIK, things aren't as dire as you describe. I'd expect MTA-to-MTA opportunistic TLS to be working reliably with the vast majority of senders, with the need for exceptions very rare. Yes, some Yahoo systems (still!) insist on shooting themselves in the foot and dropping to cleartext when WebPKI certificate verification fails, but they do then succeed in the clear:
http://postfix.1071664.n5.nabble.com/Another-yahoo-problem-td89756.html#a897...
with no exceptions needed in their case.
If a sender has buggy client-side STARTTLS, they'll typically send in the clear when STARTTLS fails, in which case you still don't need to make exceptions, let them retry.
If you do find the need to make an exception, I'd recommend applying any STARTTLS exceptions on just a subset of your primary MX hosts. This has two benefits:
1. They can still get through if/when they fix their software and turn on DANE.
2. Their logs and your logs continue to record intermittent failures while they are still broken, with the mail getting through via the MX hosts with the exception list. That way, you can prune your lists when failures cease, and they may take action sooner if they see ongoing issues.
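The idea of keeping the exception list on only some MX hosts can be sketched roughly as follows. This is an illustration in Python, not configuration for any particular MTA; the host names, the network, and the function `offer_starttls` are all hypothetical.

```python
import ipaddress

# Hypothetical data: only mx2 carries the exception list.
EXCEPTION_MX_HOSTS = {"mx2.example.net"}
# Hypothetical client networks for which STARTTLS is suppressed.
NO_TLS_CLIENTS = [ipaddress.ip_network("192.0.2.0/24")]


def offer_starttls(local_mx: str, client_ip: str) -> bool:
    """Decide whether this MX host should advertise STARTTLS to client_ip."""
    if local_mx not in EXCEPTION_MX_HOSTS:
        # MX hosts without the list always offer STARTTLS, so a client
        # that is still broken keeps producing log entries here.
        return True
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in NO_TLS_CLIENTS)
```

With this split, a sender that has fixed its software and started enforcing DANE fails on `mx2.example.net` but still delivers via the other MX hosts.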
Viktor Dukhovni ietf-dane@dukhovni.org writes:
If you do find the need to make an exception, I'd recommend applying any STARTTLS exceptions on just a subset of your primary MX hosts. This has two benefits:
They can still get through if/when they fix their software and turn on DANE.
Their logs and your logs continue to record intermittent failures while they are still broken, with the mail getting through via the MX hosts with the exception list. That way, you can prune your lists when failures cease, and they may take action sooner if they see ongoing issues.
Smart! That is certainly a good solution. Thanks
Bjørn
On 3-9-18 14:25, Bjørn Mork wrote:
For those publishing TLSA records for inbound DANE, please make *sure* that you're offering STARTTLS *unconditionally*, to all SMTP clients with no restrictions by client IP address or reputation. Configurations that restrict STARTTLS to a set of "good" IPs are not compatible with DANE.
This is indeed an important point to consider. Never thought of the possibility that the same client would first fail TLS and then start using DANE at some later point in time.
In our case, it wasn't that we failed TLS, but a previous software version had a limitation that annoyed some receivers: the outgoing connection would be kept open for as long as the "max command timeout" value, which was set at 5 minutes. So after every mail delivery, the connection would stay open for another 5 minutes (the idea being: existing connections are cheap, making a TLS connection isn't, so if another mail arrives this connection can be reused).
But not everyone agrees that connections are cheap, especially if your software forks for every incoming SMTP connection.
So, most operators would, when looking at their systems at some point in time, just see a few of our hosts keeping open a connection to their system, usually doing nothing. This got combined with flaky NAT-alike loadbalancers that would expire idle TCP connections, and you'd get stuck with half-open TCP/SSL connections that took a while to get rid of (sometimes hours). I know of at least one case where this happened.
For some reason, disabling TLS made these symptoms less painful, so that approach was taken (this was a few years ago, DANE wasn't on anyone's radar).
Our current software no longer has that limitation, and we now close the connection 30 seconds after delivery (unless new mail arrives that can reuse the connection).
If STARTTLS was disabled with some client IPs for interoperability reasons, resolve those first.
In a perfect world, yes. But in practice: How do you do that?
I don't think it is realistic to offer STARTTLS without some local exception list. There are just too many buggy clients and ignorant sysadmins.
I never had a need for STARTTLS exceptions. Just make sure your software has appropriate timeouts and connection limits, and closes idle connections. And make sure that any loadbalancer you have in front of your systems, keeps track of TCP connections at least as long as the maximum timeout on your mail platform (and preferably a lot longer).
If any software misbehaves (e.g. doesn't respect timeouts), complain to the vendor.
When you really do get abusive clients, just block them completely instead of only blocking TLS. That draws their attention a lot sooner and puts much less of a load on your system. If this "client" is too big to be blocked completely, then either they likely have a well functioning abuse department and can resolve the issue, or you are doing something wrong in the first place, see timeouts and connection limits above.
On Sep 4, 2018, at 3:33 AM, Jan-Pieter Cornet johnpc@xs4all.net wrote:
Our current software no longer has that limitation, and we now close the connection 30 seconds after delivery (unless new mail arrives that can reuse the connection).
One off-topic comment on this subject. Postfix also has support for connection caching, but uses the feature much more carefully (on demand):
1. Connection caching to a destination is only enabled when the queue has more messages to that destination than the destination concurrency limit. In other words, only when there are messages waiting for a delivery slot queued behind the current message, and so the cached connection is likely to get used.
2. Cached connections are closed after 2s of idle time, only sustained traffic keeps cached connections open.
3. Cached connections are closed after ~300s of use. This amortizes connection setup latency when some MX hosts are slow. Sadly some receiving sites limit the number of messages per connection (rather than connection duration). That's unfortunate: re-use limits by message count don't overcome slow MX connection "attractors".
See http://www.postfix.org/CONNECTION_CACHE_README.html#safety
These ensure that connection caching is never seen as "aggressive" by receiving systems.
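The three rules above can be modelled in a few lines. This is a toy Python model of the policy as described in this message, not Postfix code; the constants come from the numbers in the list, while the class and function names are invented for illustration.

```python
from dataclasses import dataclass

IDLE_LIMIT = 2.0     # seconds a cached connection may sit idle (rule 2)
REUSE_LIMIT = 300.0  # approximate seconds a connection may be reused (rule 3)


@dataclass
class CachedConnection:
    opened_at: float      # time the connection was established
    last_used_at: float   # time the last delivery on it finished


def should_cache(queued_for_destination: int, concurrency_limit: int) -> bool:
    """Rule 1: cache only when messages are waiting for a delivery slot."""
    return queued_for_destination > concurrency_limit


def should_close(conn: CachedConnection, now: float) -> bool:
    """Rules 2 and 3: close when idle too long or reused for too long."""
    idle = now - conn.last_used_at
    age = now - conn.opened_at
    return idle >= IDLE_LIMIT or age >= REUSE_LIMIT
```

Under this model a connection to a quiet destination is never cached at all, and a cached one disappears two seconds after traffic stops.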
On 3-9-18 13:58, Viktor Dukhovni wrote:
On smtp.xs4all.nl we enabled DANE outgoing verification[...]
Great news, welcome to the club, and thanks! When you do enable a default hardfail, you might consider exempting a particular sender address or subject tag, or perhaps a custom header:
https://tools.ietf.org/html/draft-ietf-uta-smtp-require-tls-03#section-3
so that you can still send email to the contacts of any domains that are failing, prior to disabling DANE for the domain for all senders.
Oh, that's a good idea. Implemented the "RequireTLS: NO" header now :)
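A per-message opt-out of this kind could look roughly like the sketch below. It is written in Python purely for illustration and is not the actual gateway configuration; the header name and value are taken from the sentence above, while the function `dane_required` and its `domain_has_tlsa` parameter are assumptions.

```python
from email import message_from_string
from email.message import Message


def dane_required(msg: Message, domain_has_tlsa: bool) -> bool:
    """Enforce DANE unless the sender explicitly opted out for this message."""
    # Header lookup is case-insensitive; a missing header yields None.
    value = (msg.get("RequireTLS") or "").strip().upper()
    opt_out = value == "NO"
    return domain_has_tlsa and not opt_out


# Example: a message to the postmaster of a domain whose DANE setup is broken.
notice = message_from_string(
    "From: postmaster@example.net\n"
    "RequireTLS: NO\n"
    "\n"
    "Your TLSA records no longer match your certificate.\n"
)
```

Here `dane_required(notice, True)` is `False`, so the notice can still be delivered to a domain that would otherwise hardfail.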
You might find that some domains have intermittent outages as a result of poorly executed key/cert rollovers where the TLSA records are updated *after* they first become invalid. For any of those, soft fail may make sense until your logs show no failures for a year or more.
Hm, that requires quite a bit of state-keeping. Can such domains be added to the dane-fail list, or should those domains be put on another list? (dane-transient-failures?)
Indeed the danefail list is not expected to be "complete". If any of you run into domains for which you need to make an exception, please open an issue or pull request on github if unable to resolve with the remote domain.
I'll go over all the DANE logs in a few days, and see if any domains not on the dane-fail list show any errors. If I find any, I'll contact the domains and if necessary create a github pull request on the danefail list.
For those publishing TLSA records for inbound DANE, please make *sure* that you're offering STARTTLS *unconditionally*, to all SMTP clients with no restrictions by client IP address or reputation. Configurations that restrict STARTTLS to a set of "good" IPs are not compatible with DANE. If STARTTLS was disabled with some client IPs for interoperability reasons, resolve those first.
Thanks, I'll use your message to persuade any other domains that don't send me STARTTLS, if I find any :). If they do not respond, can I add those domains to the dane-fail list too? I understand that selectively offering STARTTLS is a lot harder to test for other people...
On 03.09.2018 at 12:57, Jan-Pieter Cornet wrote:
On smtp.xs4all.nl we enabled DANE outgoing verification, but currently only with a "soft fail": if DANE fails, we fall back to non-DANE delivery... for now. Except for a few hardcoded domains (currently only our own, and havedane.net). If anyone feels confident about their own DANE setup, feel free to send me your domain (or domains), and I'll add it to the list of hardfails.
Hello,
Cool stuff! I assume you use Postfix. Could you be more verbose on how you implemented what you call "soft fail"? I'm currently not aware of how to configure "log DANE failures but deliver anyway".
Our domain (datev.de) could be a candidate for your "hardcoded domains". But I expect there is virtually no traffic between your and my users :-/
Andreas
On Sep 3, 2018, at 9:07 AM, Andreas Schulze andreas.schulze@datev.de wrote:
I assume you use postfix. Could you be more verbose on how you implement what you name as "soft fail"? I'm currently not aware how to configure the "log dane failures but deliver anyway"
Postfix does not presently have a 'soft-fail' mode for DANE. So either xs4all is not using Postfix, or the code has been modified to support soft-fail.
Hi Andreas,
On 3-9-18 15:07, Andreas Schulze wrote:
On 03.09.2018 at 12:57, Jan-Pieter Cornet wrote:
On smtp.xs4all.nl we enabled DANE outgoing verification, but currently only with a "soft fail": if DANE fails, we fall back to non-DANE delivery... for now. Except for a few hardcoded domains (currently only our own, and havedane.net). If anyone feels confident about their own DANE setup, feel free to send me your domain (or domains), and I'll add it to the list of hardfails.
cool stuff! I assume you use postfix. Could you be more verbose on how you implement what you name as "soft fail"?
No, we use Cloudmark Gateway (version 5.5.2 at the moment).
I'm currently not aware how to configure the "log dane failures but deliver anyway"
This MTA is fairly programmable. The way we implemented 'soft fail' is by inspecting the error in the "temp fail" phase. If that indicates a DANE problem (either bad TLSA records, bad certificate, or no TLS at all), then we re-queue the message for delivery within a few seconds, but marked as "no DANE".
So what that effectively does is connect to the remote MX, notice that there's a problem and close the connection again. Then a few seconds later you get another connection, but this time we do not check DANE, and delivery proceeds with only opportunistic TLS.
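In rough Python terms, the decision made in the "temp fail" phase looks something like the sketch below. This is only an illustration of the logic described above, not the gateway's own scripting language; all names (`Failure`, `QueuedMessage`, `on_temp_fail`) are invented, and the hardfail list shown is just an example entry.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Failure(Enum):
    BAD_TLSA = auto()
    BAD_CERTIFICATE = auto()
    NO_STARTTLS = auto()
    OTHER = auto()  # anything unrelated to DANE, e.g. a timeout


DANE_FAILURES = {Failure.BAD_TLSA, Failure.BAD_CERTIFICATE, Failure.NO_STARTTLS}
HARDFAIL_DOMAINS = {"havedane.net"}  # illustrative entry only


@dataclass
class QueuedMessage:
    recipient_domain: str
    check_dane: bool = True


def on_temp_fail(msg: QueuedMessage, failure: Failure) -> str:
    """Return 'retry-without-dane' for a soft fail, otherwise 'defer'."""
    soft_fail = (
        failure in DANE_FAILURES
        and msg.check_dane
        and msg.recipient_domain not in HARDFAIL_DOMAINS
    )
    if soft_fail:
        # Mark the message "no DANE"; the next attempt, a few seconds
        # later, uses opportunistic TLS only.
        msg.check_dane = False
        return "retry-without-dane"
    return "defer"
```

Domains on the hardfail list never take the soft-fail branch, so their mail stays queued until DANE validation succeeds.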
I'll use it to see if there are any more domains that need to be put on the dane-fail list.
Our domain (datev.de) could be a candidate for your "hardcoded domains". But I expect there is virtually no traffic between your and my users :-/
Actually, there is some traffic, a few mails a day it seems :). I've added your domain to the list. (Check out connections from 194.109.24.0/26. smtp.xs4all.nl is a cluster, and the frontend IP address is never going to make outgoing connections, but the cluster members are, which are in that network).
Andreas
Hi Jan-Pieter,
On smtp.xs4all.nl we enabled DANE outgoing verification, but currently only with a "soft fail": if DANE fails, we fall back to non-DANE delivery... for now. Except for a few hardcoded domains (currently only our own, and havedane.net). If anyone feels confident about their own DANE setup, feel free to send me your domain (or domains), and I'll add it to the list of hardfails.
Very nice! Please add internet.nl and forumstandaardisatie.nl to your list of hardfails.
participants (5)
- Andreas Schulze
- Bjørn Mork
- Jan-Pieter Cornet
- Knubben, B.S.J. (Bart) - Forum Standaardisatie
- Viktor Dukhovni