Re: DANE SMTP behavior with inconsistent initial CNAME response
On Mon, Dec 10, 2018 at 12:41:00PM +0100, Jan-Pieter Cornet wrote:
Um, putting an alias (record with CNAME) in an MX record is still frowned upon by the RFCs (specifically, 2181 and 5321).
Yes, but they are used in practice, and I don't know of any MTAs that refuse to follow the CNAMEs, so one has to live with their de facto use, despite the fact that they're undefined in RFC 5321 and legacy 2821, 821.
So do you really want to promote a standard that goes against that, and that can result in very brittle setups?
There are two sides to this issue:
* What the receiving domain should do, to configure interoperable security.
* What the sending MTA should do, to deliver securely when possible.
On the *receiving* side:
1. Avoid CNAMEs if at all possible, use already canonical hostnames in MX records.
2. If you do use a CNAME, and want to enable DANE, publish TLSA records on *both* sides of the CNAME chain:
    ; TLSA record alias under original MX hostname to TLSA RRset
    ; under fully-expanded MX hostname:
    ;
    _25._tcp.mx.example.com. IN CNAME _25._tcp.mx.step<N>.example.com.

    ; Actual TLSA record data for the real MX host.
    ;
    _25._tcp.mx.step<N>.example.com. IN TLSA 3 1 1 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
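For reference, a "3 1 1" TLSA record carries the SHA-256 digest of the certificate's DER-encoded SubjectPublicKeyInfo. Below is a minimal Python sketch of producing that RDATA, assuming the SPKI bytes have already been extracted by some other tool; the function name `tlsa_311_rdata` is made up for illustration. The digest in the zone snippet is the SHA-256 of empty input, so it reads as a placeholder rather than a real key digest.

```python
import hashlib


def tlsa_311_rdata(spki_der: bytes) -> str:
    """Return TLSA RDATA for usage 3 (DANE-EE), selector 1 (SPKI),
    matching type 1 (SHA2-256), given DER-encoded SPKI bytes."""
    return "3 1 1 " + hashlib.sha256(spki_der).hexdigest()


# Hashing empty input reproduces the placeholder digest from the zone snippet.
print(tlsa_311_rdata(b""))
```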
However, on the sending side, we check both places, with the fully-expanded target preferred, because:
1. Customers who publish MX CNAMEs to the MX host of their provider generally don't know to, or neglect to, also publish CNAMEs for the associated TLSA records.
2. Some customers attempt to publish TLSA data (not just an alias to the real TLSA RRset at the provider) on their end of the CNAME chain, but they don't manage key rollover for the target host, and inevitably have out-of-sync TLSA records. If the provider publishes "real" TLSA records, checking there first works better.
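The lookup order described above can be sketched in Python roughly as follows. This is only an illustration: `tlsa_owner` and `tlsa_candidates` are hypothetical helper names, not part of any MTA, and the sketch assumes the whole CNAME chain validated as secure, so that both names are eligible.

```python
from typing import List


def tlsa_owner(host: str, port: int = 25, proto: str = "tcp") -> str:
    """Build the TLSA owner name for a host, e.g. _25._tcp.mx.example.com."""
    return f"_{port}._{proto}.{host.rstrip('.').lower()}."


def tlsa_candidates(mx_host: str, expanded_host: str, port: int = 25) -> List[str]:
    """TLSA query names in order of preference: the fully-expanded
    target first, then the original MX hostname as a fallback."""
    names = [tlsa_owner(expanded_host, port)]
    original = tlsa_owner(mx_host, port)
    if original not in names:  # no CNAME involved: a single candidate
        names.append(original)
    return names


print(tlsa_candidates("mx.example.com.", "mx.example.net."))
```

The sender would query each name in turn and use the first one that yields a usable TLSA RRset.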
Yes, when CNAMEs are used on the receiving side, the setup can be fragile when configured sloppily; some receiver configurations are less secure than others. This is *opportunistic* DANE TLS.
I would propose to make this check a lot simpler: always take the domain found in the MX record as the base domain for corresponding TLSA lookups. And only do the TLSA lookups if the domain name of that MX record is DNSSEC protected. That way, if a CNAME is introduced, it doesn't change the TLSA-protected status of the original domain, or break things in case DNSSEC is suddenly added or dropped from a remote zone.
There's no "break things". If TLSA records are published on the remote end, they're surely more authoritative for the actual certificate chain of the underlying host. If they're not published, then the fallback behaviour is exactly as you describe.
So in your terminology:
a. mx.step<N>.example is secure (all the AD bits are 1):
- use the original domain as the base for TLSA lookups
Often, when CNAMEs are used, TLSA records are not present there, so the domain remains insecure.
The upshot of all this is that nothing should change for those that aren't doing DNSSEC or TLSA records. You're already introducing a possible CNAME query that isn't used in the current mail delivery mechanism, and there's no telling what that would cause in the real world. It has happened in the past that DNS servers return SERVFAIL on any CNAME query due to bugs.
The CNAME query is needed, only and exactly when the A records are found on the far side of a CNAME chain. This is the case even for your proposal, because that's what it takes to determine whether the MX host lies in a secure zone, and skip TLSA records if not.
If one attempts TLSA lookups in unsigned zones one can't deliver any mail to any of the hundreds of thousands of signed domains hosted by Microsoft at outlook.com:
    nist.gov. IN MX 0 nist-gov.mail.protection.outlook.com. ; NoError AD=1
    nist-gov.mail.protection.outlook.com. IN A 23.103.198.42 ; NoError AD=0
    _25._tcp.nist-gov.mail.protection.outlook.com. IN TLSA ? ; ServFail AD=0
... you're effectively approving the "MX points to CNAME" case. Which happens, but is not considered best practice, so I wouldn't recommend making this case the first preference.
The specification is carefully considered, and deals with the world as it is, not how I might like it to be.
On 10-12-18 14:27, Viktor Dukhovni wrote:
Um, putting an alias (record with CNAME) in an MX record is still frowned upon by the RFCs (specifically, 2181 and 5321).
Yes, but they are used in practice, and I don't know of any MTAs that refuse to follow the CNAMEs, so one has to live with their de facto use, despite the fact that they're undefined in RFC 5321 and legacy 2821, 821.
I really meant 2181 (Clarification to DNS specification), which is still current, but...
The specification is carefully considered, and deals with the world as it is, not how I might like it to be.
Fair enough, I realize MX-pointing-to-CNAME exists and we have to live with it.
It would be a good idea for DANE SMTP verifiers to verify both ends of the CNAME (chain) for any TLSA records, and whether they match the real record. Just to avoid any pitfalls as I described.
Yes, when CNAMEs are used on the receiving side, the setup can be fragile when configured sloppily; some receiver configurations are less secure than others. This is *opportunistic* DANE TLS.
It still breaks mail delivery if you get it wrong though. Opportunistic DANE also means blocking the connection if there are TLSA records but they don't match.
[on only checking original MX host TLSA records] Often, when CNAMEs are used, TLSA records are not present there, so the domain remains insecure.
That's indeed the biggest disadvantage of my proposal.
The CNAME query is needed, only and exactly when the A records are found on the far side of a CNAME chain. This is the case even for your proposal, because that's what it takes to determine whether the MX host lies in a secure zone, and skip TLSA records if not.
I'm not suggesting to skip the CNAME query. I'm suggesting to not treat errors from that query as blocking for delivery, if there is no DNSSEC signed domain involved. Since the CNAME query is a new addition to the mail delivery landscape, you risk running into unexpected results from nameservers.
... for which you give an excellent example yourself :)
If one attempts TLSA lookups in unsigned zones one can't deliver any mail to any of the hundreds of thousands of signed domains hosted by Microsoft at outlook.com:
    nist.gov. IN MX 0 nist-gov.mail.protection.outlook.com. ; NoError AD=1
    nist-gov.mail.protection.outlook.com. IN A 23.103.198.42 ; NoError AD=0
    _25._tcp.nist-gov.mail.protection.outlook.com. IN TLSA ? ; ServFail AD=0
As *.mail.protection.outlook.com is unsigned, there is no reason to look up TLSA records, so the ServFail for the TLSA lookup shouldn't harm anyone. I didn't suggest any changes to that part of the protocol. (In fact, I don't think any DANE implementation even does the TLSA lookup on a non-signed result from the A query of the MX host?)
Oh, PS:

    nist-gov.mail.protection.outlook.com. IN CNAME ? ; ServFail AD=0

(The nameserver actually returns NOTIMP, which is converted to ServFail by the local resolver). Now in this case the CNAME query isn't needed since there is no CNAME record involved, but still, it shows that nameservers can fail the CNAME query.
On Dec 10, 2018, at 9:41 AM, Jan-Pieter Cornet <johnpc@xs4all.net> wrote:
I'm not suggesting to skip the CNAME query. I'm suggesting to not treat errors from that query as blocking for delivery, if there is no DNSSEC signed domain involved. Since the CNAME query is a new addition to the mail delivery landscape, you risk running into unexpected results from nameservers.
You should stop. The CNAME query can't be a problem; it is only made *after* discovering that the A/AAAA lookup for the MX host involves a CNAME and the response AD bit is 0:
    Q: mx.example.com. IN A ?
    R: mx.example.com. IN CNAME mx.example.net. ; AD=0
       mx.example.net. IN A 192.0.2.1 ; AD=0
Only then does one make a CNAME query:
    Q: mx.example.com IN CNAME ?
    R: mx.example.com IN CNAME mx.example.net. ; AD=???
to discover whether the original zone is signed.
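As a rough illustration of that sequence, here is a Python sketch of the decision as it is described in this thread. It is not taken from any MTA: `tlsa_base_domains` and its parameters are hypothetical names, the AD bits are assumed to come from a validating resolver, and the specification remains the authority on the details.

```python
from typing import List, Optional


def _norm(name: str) -> str:
    """Normalize a DNS name for comparison (case, trailing dot)."""
    return name.rstrip(".").lower()


def tlsa_base_domains(mx_host: str, final_host: str, address_ad: bool,
                      cname_ad: Optional[bool] = None) -> List[str]:
    """Return the hostnames to try as TLSA base domains, in order.

    mx_host    -- name found in the MX record
    final_host -- owner name of the A/AAAA records after CNAME expansion
    address_ad -- AD bit of the A/AAAA response
    cname_ad   -- AD bit of the extra CNAME query on mx_host, if made
    An empty list means: make no TLSA queries at all.
    """
    mx, final = _norm(mx_host), _norm(final_host)
    if address_ad:
        # Everything validated: prefer the expanded name, then the original.
        return [mx] if mx == final else [final, mx]
    if mx == final:
        # No CNAME and an insecure answer: the MX host's zone is unsigned.
        return []
    if cname_ad is None:
        # Only in this case is the additional CNAME query needed.
        raise ValueError("CNAME query on the MX host is required first")
    # Original zone signed, target insecure: only the original name is usable.
    return [mx] if cname_ad else []


print(tlsa_base_domains("mx.example.com.", "mx.example.net.", False, True))
```

For the outlook.com example there is no CNAME and the address response has AD=0, so the result is an empty list and the failing TLSA query is never sent.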
... for which you give an excellent example yourself :)
My example shows why and how we *avoid* making the problematic TLSA query, by making sure we know the security status of the MX host's zone. The text in the specification is not some hastily constructed ad hoc choice. Please consider the possibility that there are good reasons for the choices made.
I am done with this topic for now.