Um, putting an alias (record with CNAME) in an MX record is still frowned upon by the RFCs (specifically, 2181 and 5321). So do you really want to promote a standard that goes against that, and that can result in very brittle setups?
I would propose to make this check a lot simpler: always take the domain found in the MX record as the base domain for the corresponding TLSA lookups, and only do the TLSA lookups if the domain name of that MX record is DNSSEC-protected. That way, introducing a CNAME doesn't change the TLSA-protected status of the original domain, and nothing breaks if DNSSEC is suddenly added to or dropped from a remote zone.
So in your terminology:
a. mx.step<N>.example is secure (all the AD bits are 1):
   * use the original domain as the base for TLSA lookups
b. some lookups are insecure: check security status of original domain via "mx.original.example CNAME ?" query.
   * if secure, again use "mx.original.example" as base domain for TLSA lookups
   * if not secure, no DANE.
   * on failure (servfail, timeout):
     - if any previous query for this delivery was DNSSEC signed (AD=1), tempfail delivery to this MX record (take next record if any)
     - if no DNSSEC found, no DANE, try to deliver as usual (remote DNS server might fail on CNAME queries).
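To make the rules above concrete, here is a rough Python sketch of the decision only. It is an illustration under assumptions, not a proposed implementation: the names `Sec`, `Action` and `choose_tlsa_base` are made up for this mail, and the DNS lookups themselves are assumed to happen elsewhere (the CNAME probe is passed in as a callable so it is only issued in case b).

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Sec(Enum):
    """Result of the 'mx.original.example CNAME ?' probe (hypothetical type)."""
    SECURE = "secure"      # answer validated, AD=1
    INSECURE = "insecure"  # answer not validated, AD=0
    FAILED = "failed"      # servfail or timeout


class Action(Enum):
    """What the sending MTA does with this MX record (hypothetical type)."""
    DANE = "dane"          # do TLSA lookups at the returned base domain
    NO_DANE = "no-dane"    # deliver as usual, without DANE
    TEMPFAIL = "tempfail"  # skip this MX record, take the next one if any


def choose_tlsa_base(
    mx_name: str,
    chain_all_secure: bool,
    probe_original: Callable[[], Sec],
    seen_ad_earlier: bool,
) -> Tuple[Action, Optional[str]]:
    """Pick the TLSA base domain for one MX record.

    mx_name          -- the name found in the MX record (mx.original.example)
    chain_all_secure -- True if every lookup in the CNAME chain had AD=1
    probe_original   -- issues the CNAME query for mx_name, returns its status
    seen_ad_earlier  -- True if any earlier query for this delivery had AD=1
    """
    # Case a: the whole chain is secure; still use the original name as base.
    if chain_all_secure:
        return Action.DANE, mx_name

    # Case b: some lookups were insecure; check the original name itself.
    status = probe_original()
    if status is Sec.SECURE:
        return Action.DANE, mx_name
    if status is Sec.INSECURE:
        return Action.NO_DANE, None

    # Probe failed: only treat it as fatal if DNSSEC was seen before,
    # because some remote DNS servers may simply fail on CNAME queries.
    if seen_ad_earlier:
        return Action.TEMPFAIL, None
    return Action.NO_DANE, None
```

Note that in every branch the only base domain ever returned is the name from the MX record, which is the point of the simplification.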
The upshot of all this is that nothing should change for those that aren't doing DNSSEC or TLSA records. You're already introducing a possible CNAME query that isn't used in the current mail delivery mechanism, and there's no telling what that would cause in the real world. It has happened in the past that DNS servers return SERVFAIL on any CNAME query due to bugs.
In your original proposal:
On 6-12-18 18:55, Viktor Dukhovni wrote:
a. "mx.step<N>.example" is secure (all the AD bits are 1), and associated secure TLSA RRset found.
* use "mx.step<N>.example" as TLSA base domain with corresponding TLSA records
... you're effectively approving the "MX points to CNAME" case. That happens in practice, but it is not considered best practice, so I wouldn't recommend making it the first preference. If anything, I'd have the standard say: if all other TLSA record lookups failed, an implementation MAY check "mx.step<N>.example" as base domain for the TLSA lookup.
b. expanded CNAME is secure (all AD bits are 1), and no secure TLSA associated with the expanded name.
* check for secure TLSA at original name
As explained above, I'd recommend to make this the default. The way it's specified here, it may cause a brittle setup that can break easily.
Imagine an organisation that had their DANE TLSA setup right, at one point. Now someone changes mx.original.example into a CNAME pointing to mx.stepN.example. Practically all mail delivery these days allows that CNAME, and mail delivery will follow it. Assume the TLSA records are copied to mx.stepN.example.
Now the keys are rolled, and the _25._tcp.mx.stepN.example records are updated (assuming automatic rollover). The original TLSA records are forgotten. Nobody checks them anyway.
Then, due to some fluke, DNSSEC fails for stepN.example (maybe the domain is moved?), and suddenly the TLSA records at the original domain get checked again by DANE. Those still contain the (hashes of the) old keys, and mail delivery breaks. Nothing changed at "original.example" to where the mail is addressed, but DNSSEC status at "stepN.example" changed. That's a very hard to debug situation.
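The scenario can be replayed with a small Python sketch. It models only cases a to c of the quoted proposal as I read them (lookup failures left out), and everything in it is made up for illustration: the function names, the `tlsa` dictionary standing in for DNS, and the "old-key-hash"/"new-key-hash" placeholders.

```python
from typing import Optional


def quoted_proposal_base(
    original: str,
    expanded: str,
    expanded_secure: bool,
    secure_tlsa_at_expanded: bool,
    original_secure: bool,
) -> Optional[str]:
    """TLSA base domain per cases a-c of the quoted proposal (no failures)."""
    if expanded_secure:
        # Case a: secure TLSA found at the expanded name, use it.
        # Case b: none found there, fall back to the original name.
        return expanded if secure_tlsa_at_expanded else original
    # Case c: expanded name insecure, so only the original name can be used,
    # and only if it is itself secure.
    return original if original_secure else None


# State after the key rollover: only the records at stepN were updated.
tlsa = {
    "_25._tcp.mx.original.example": "old-key-hash",  # forgotten, stale
    "_25._tcp.mx.stepN.example": "new-key-hash",     # rolled automatically
}
server_key = "new-key-hash"  # what the mail server actually presents


def delivery_ok(base: str) -> bool:
    """True if the TLSA record at the chosen base matches the server key."""
    return tlsa[f"_25._tcp.{base}"] == server_key


# While stepN.example is signed, the stale records are never consulted.
before = quoted_proposal_base(
    "mx.original.example", "mx.stepN.example",
    expanded_secure=True, secure_tlsa_at_expanded=True, original_secure=True,
)

# DNSSEC at stepN.example fails; nothing changed at original.example.
after = quoted_proposal_base(
    "mx.original.example", "mx.stepN.example",
    expanded_secure=False, secure_tlsa_at_expanded=False, original_secure=True,
)

print(before, delivery_ok(before))  # mx.stepN.example True
print(after, delivery_ok(after))    # mx.original.example False
```

The base domain silently switches from stepN to the original name, and that switch is triggered purely by the DNSSEC status of a third-party zone.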
c. expanded CNAME is insecure (one of the AD bits is 0). Check security status of initial name via an "mx.original.example. IN CNAME ?" query. You only need the validation status (AD bit) not the result.
* If "insecure", done no DANE.
* If "secure", look for TLSA RRset with "mx.original.example" as TLSA base domain.
* If lookup failure (timeout, servfail whether DNSSEC-related or not), try a different MX host or defer.
... except, as mentioned above, that CNAME query is new in the email delivery landscape, so I'd err on the side of caution (ignore any error) if you haven't seen any DNSSEC-signed domain at this point.