DANE SMTP behavior with inconsistent initial CNAME response

Gael Roualland

6 Dec 2018

4:42 p.m.

Hello DANE users,

One of our customers brought to our attention a failure to deliver email to some domains with a very broken DNS setup when Opportunistic DANE is enabled.

The impacted domains have different name servers at the zone and TLD delegation levels, and these name servers serve different zone content for the domain, e.g. for a hypothetical "domain.tld":

- some of these serve valid zone content such as (A):

    domain.tld.       IN MX 1 mail.domain.tld.
    mail.domain.tld.  IN A    192.168.1.1

- others serve a "parking"-like zone (B):

    *.domain.tld.  IN CNAME domain.tld.
    domain.tld.    IN A     192.168.1.2

None of these zones are signed.

When trying to check for DANE support for the domain, the following scenario can happen if data from both zones is intermixed:

1) MX lookup for the domain: returns "1 mail.domain.tld." (from zone A)

2) A/AAAA lookup for "mail.domain.tld": returns "A 192.168.1.1" through "CNAME domain.tld." (from zone B)

3) Since the A reply is insecure and is from a CNAME alias, as per §2.2.2 of the RFC we issue an explicit CNAME request on "mail.domain.tld." to check if this is secure: this returns a NODATA answer (from zone A)

In that case, our implementation considers a NODATA answer for a CNAME that it received in an earlier response to be suspicious, and temporarily fails this delivery attempt as if we had hit a DNS error. (Subsequent retries may see a different reply order as DNS caches refresh, but since only the address in zone A is a valid next-hop, the odds of a successful delivery are low.)

AFAIK, this particular case is not covered by the RFC, and we're trying to assess whether degrading to Opportunistic TLS in that case, instead of temporarily failing, could actually introduce a security flaw in the process.

It seems to me that for an attacker to be able to purposely return such inconsistent answers, he would need to be able to spoof answers if zones are not signed (like here), or have control of either the zone DNS key or the MTA's resolver, both of which would allow other ways to skip DANE.

I'd like to get your opinions on this. Does that sound safe? What do other implementations do in that particular case?

Thanks,

Gaël.

Viktor Dukhovni

6 Dec 2018

5:14 p.m.

On Thu, Dec 06, 2018 at 04:42:44PM +0100, Gael Roualland wrote:

...

some of these serve a valid zone content such as (A):

    domain.tld.       IN MX 1 mail.domain.tld.
    mail.domain.tld.  IN A    192.168.1.1

others serve a "parking"-like zone (B):

    *.domain.tld.  IN CNAME domain.tld.
    domain.tld.    IN A     192.168.1.2

...

None of these zones are signed.

A careful DANE implementation will generally not employ DANE for unsigned domains.

...

MX lookup for the domain: returns "1 mail.domain.tld." (from zone A)

This is unsigned, and so you could stop DANE processing at that point. However, some DANE implementations (e.g. recent Postfix versions) still employ DANE even for domains with "insecure" MX records, when the selected MX host is in a signed zone.

...

A/AAAA lookup for "mail.domain.tld": returns "A 192.168.1.1" through "CNAME domain.tld." (from zone B)

Yes, that can happen, what a mess. The destination domain's DNS is broken, and DANE or not, it is lucky to get any mail at all.

...

Since the A reply is insecure and is from a CNAME alias, as per §2.2.2 of the RFC we issue an explicit CNAME request on "mail.domain.tld." to check if this is secure: this returns a NODATA answer (from zone A)

So long as the NODATA answer is "insecure", the fact that it denies the CNAME's existence is tolerable. The sole purpose of the CNAME lookup is to probe for DNSSEC in the original zone, and an "insecure" NODATA shows the zone to be unsigned.

...

In that case, our implementation considers that having a NODATA answer to a CNAME that it received previously in a former response is suspicious, and temporary fails this delivery attempt as if we hit a DNS error.

I would suggest that it is OK to be less "suspicious". While zone content can vary between nameservers, the DNSSEC status of a domain should be robust.

...

AFAIK, this particular case is not covered by the RFC, and we're trying to assess if degrading to Opportunistic TLS in that case instead of temporary failing could actually introduce some security flaw in the process.

You should be fine.

...

It seems to me that for an attacker to be able to purposely return such inconsistent answers, he would need to be able to spoof answers if zones are not signed (like here) or have either control of the zone DNS key or of the MTA's resolver, both which would allow other ways to skip DANE.

I'd like to get your opinions on this. Does that sound safe ? What do other implementations do in that particular case ?

Your instinct is reasonable. It is OK to continue with this domain.

By the way, it is news to me that Proofpoint supports DANE. Congratulations and welcome to the club! Do you support both DANE-EE(3) and DANE-TA(2)? The most recent open-source Sendmail snapshot I can find still has fairly minimal DANE support. Is there a public description of DANE support in Proofpoint somewhere?

-- Viktor.

Gael Roualland

6:37 p.m.

On 6/12/18 17:16, Viktor Dukhovni wrote:

...

...

MX lookup for the domain: returns "1 mail.domain.tld." (from zone A)

This is unsigned, and so you could stop DANE processing at that point. However, some DANE implementations (e.g. recent Postfix versions) still employ DANE even for domains with "insecure" MX records, when the selected MX host is in a signed zone.

This is what we offer as an option too (default on), and it is enabled in that case. However, we could still see similar behavior if the original domain is signed but the MX hostname is in an unsigned zone or is an alias to an unsigned name.

...

...

A/AAAA lookup for "mail.domain.tld": returns "A 192.168.1.1" through "CNAME domain.tld." (from zone B)

Yes, that can happen, what a mess. The destination domain's DNS is broken, and DANE or not, it is lucky to get any mail at all.

Indeed...

...

...

Since the A reply is insecure and is from a CNAME alias, as per §2.2.2 of the RFC we issue an explicit CNAME request on "mail.domain.tld." to check if this is secure: this returns a NODATA answer (from zone A)

So long as the NODATA answer is "insecure", the fact that it denies the CNAME's existence is tolerable. The sole purpose of the CNAME lookup is to probe for DNSSEC in the original zone, and an "insecure" NODATA shows the zone to be unsigned.

Agreed, we could look only at the validation result here. But assuming we get the same inconsistency in a secure way, would that also be enough to apply DANE using the initial name as the TLSA domain?

...

...
In that case, our implementation considers that having a NODATA answer to a CNAME that it received previously in a former response is suspicious, and temporary fails this delivery attempt as if we hit a DNS error.

I would suggest that it is OK to be less "suspicious". While zone content can vary between nameservers, the DNSSEC status of a domain should be robust.

OK.

...

Your instinct is reasonable. It is OK to continue with this domain.

By the way, it is news to me that Proofpoint supports DANE. Congratulations and welcome to the club! Do you support both DANE-EE(3) and DANE-TA(2)? The most recent open-source Sendmail snapshot I can find still has fairly minimal DANE support. Is there a public description of DANE support in Proofpoint somewhere?

Thanks.

I'm referring to the Cloudmark Security Platform product. Cloudmark was acquired by Proofpoint a year ago. I can't speak for DANE support status in other Proofpoint products.

We do support DANE-EE and DANE-TA, and try to adhere to the specification as much as possible. We have a blog post about it at [1] but that doesn't go into technical implementation details.

Gaël.

[1] https://blog.cloudmark.com/2017/03/27/dane-and-email-security/

Viktor Dukhovni

6:55 p.m.

...

On Dec 6, 2018, at 12:37 PM, Gael Roualland <groualland@proofpoint.com> wrote:

Agreed, we could look only at the validation result here. But assuming we get the same inconsistency in a secure way, would that also be enough to apply DANE using the initial name as the TLSA domain?

If the originally observed CNAME's fully expanded target is insecure, or there are no TLSA records with that name as the base domain, then you can indeed proceed to probe for TLSA records at the original name. If the full CNAME expansion result is secure, then necessarily the origin is also secure, so you don't even need to make the "initial CNAME" query.

    mx.original.example.  IN CNAME mx.step1.example.      ; AD=(0|1)
    ...
    mx.step<i>.example.   IN CNAME mx.step<i+1>.example.  ; AD=(0|1)
    mx.step<N>.example.   IN A     192.2.0.1              ; AD=(0|1)

[ Typically the resolver delivers the whole chain with a single AD bit value, so it is either "1" for all, or 0 for the whole response, but then you don't know where in the chain it stopped being secure. ]
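To make the "single AD bit value" concrete: the AD (Authentic Data) flag is one bit in the 16-bit flags field of the DNS message header (mask 0x0020, see RFC 4035 section 3.2.3), so it applies to the response as a whole and not to individual records in the chain. The following is only a minimal Python sketch of reading that bit from a raw response; the function names and the `make_header` helper are hypothetical and exist purely for illustration.

```python
import struct

# Mask of the AD ("Authentic Data") bit within the 16-bit DNS header flags.
AD_FLAG = 0x0020


def response_is_secure(message: bytes) -> bool:
    """Return True if the AD bit is set in the header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes; message too short")
    # The header starts with the 16-bit message ID followed by the flags.
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & AD_FLAG)


def make_header(flags: int) -> bytes:
    """Build a bare 12-byte header (hypothetical helper for the example)."""
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    return struct.pack("!HHHHHH", 0x1234, flags, 1, 0, 0, 0)


if __name__ == "__main__":
    # 0x81A0 = QR | RD | RA | AD, 0x8180 = QR | RD | RA
    print(response_is_secure(make_header(0x81A0)))
    print(response_is_secure(make_header(0x8180)))
```

Because there is only this one bit per response, a resolver that returns the whole CNAME chain in one answer cannot tell the client where in the chain validation stopped, which is why the separate query on the original name is needed in case (c).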

a. "mx.step<N>.example" is secure (all the AD bits are 1), and associated secure TLSA RRset found.
   * use "mx.step<N>.example" as TLSA base domain with corresponding TLSA records

b. expanded CNAME is secure (all AD bits are 1), and no secure TLSA associated with the expanded name.
   * check for secure TLSA at original name

c. expanded CNAME is insecure (one of the AD bits is 0). Check security status of initial name via an "mx.original.example. IN CNAME ?" query. You only need the validation status (AD bit) not the result.
   * If "insecure", done, no DANE.
   * If "secure", look for TLSA RRset with "mx.original.example" as TLSA base domain.
   * If lookup failure (timeout, servfail whether DNSSEC-related or not), try a different MX host or defer.
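The three cases above can be sketched as a small decision function. This is only an illustrative Python sketch under stated assumptions, not code from any MTA: the names `Status`, `Decision`, `select_tlsa_base`, and the two callbacks (`has_secure_tlsa`, `probe_cname_status`) are hypothetical stand-ins for whatever the resolver layer actually provides.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Status(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    FAILURE = "failure"  # timeout or SERVFAIL, DNSSEC-related or not


class Decision(Enum):
    USE_TLSA = "use the TLSA records found at the base domain"
    CHECK_TLSA = "look for a secure TLSA RRset at the base domain"
    NO_DANE = "no DANE for this host"
    DEFER = "try a different MX host or defer"


def select_tlsa_base(
    original: str,
    expanded: str,
    chain_secure: bool,
    has_secure_tlsa: Callable[[str], bool],
    probe_cname_status: Callable[[str], Status],
) -> Tuple[Decision, Optional[str]]:
    """Return (decision, TLSA base domain) for one MX host."""
    if chain_secure:
        if has_secure_tlsa(expanded):
            return Decision.USE_TLSA, expanded   # case (a)
        return Decision.CHECK_TLSA, original     # case (b)
    # Case (c): only the validation status of the probe matters,
    # not whether it returned a CNAME or NODATA.
    status = probe_cname_status(original)
    if status is Status.INSECURE:
        return Decision.NO_DANE, None
    if status is Status.SECURE:
        return Decision.CHECK_TLSA, original
    return Decision.DEFER, None


if __name__ == "__main__":
    # The broken-DNS scenario from the start of the thread: insecure
    # expansion, and the CNAME probe comes back as an insecure NODATA.
    print(select_tlsa_base("mail.domain.tld", "domain.tld", False,
                           lambda name: False,
                           lambda name: Status.INSECURE))
```

In this sketch the probe callback is only invoked in case (c), matching the remark that the "initial CNAME" query is unnecessary when the full expansion is secure.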

-- Viktor.

Gael Roualland

7 Dec 2018

11:39 a.m.

On 6/12/18 18:57, Viktor Dukhovni wrote:

...

a. "mx.step<N>.example" is secure (all the AD bits are 1), and associated secure TLSA RRset found.
   * use "mx.step<N>.example" as TLSA base domain with corresponding
     TLSA records
b. expanded CNAME is secure (all AD bits are 1), and no secure TLSA associated with the expanded name.
  * check for secure TLSA at original name

Yes, this is what we currently do.

...

c. expanded CNAME is insecure (one of the AD bits is 0). Check security status of initial name via an "mx.original.example. IN CNAME ?" query. You only need the validation status (AD bit) not the result.
  * If "insecure", done no DANE.
  * If "secure", look for TLSA RRset with "mx.original.example"
    as TLSA base domain.
  * If lookup failure (timeout, servfail whether DNSSEC-related or not),
    try a different MX host or defer.

That's what I was referring to. So this confirms that the validation status alone is enough, whether "secure" or "insecure". We'll thus relax our implementation so that it no longer requires a valid CNAME reply and only checks the validation status when the lookup succeeds.
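The change described here can be illustrated by contrasting the two policies side by side. This is a hedged Python sketch only: `Probe`, `strict_policy`, and `relaxed_policy` are hypothetical names, and the "strict" function is a simplified model of the behavior described at the start of the thread, not the actual product code.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Outcome of the explicit CNAME query on the original MX host name."""
    lookup_ok: bool      # False on timeout or SERVFAIL
    secure: bool         # validation status (AD bit) of the answer
    cname_present: bool  # False when the answer is NODATA


def strict_policy(saw_cname_earlier: bool, probe: Probe) -> str:
    """Model of the old behavior: an inconsistent NODATA is a DNS error."""
    if not probe.lookup_ok:
        return "tempfail"
    if saw_cname_earlier and not probe.cname_present:
        return "tempfail"
    return "try-dane" if probe.secure else "no-dane"


def relaxed_policy(probe: Probe) -> str:
    """Relaxed behavior: on lookup success, use the validation status only."""
    if not probe.lookup_ok:
        return "tempfail"
    return "try-dane" if probe.secure else "no-dane"


if __name__ == "__main__":
    # The scenario from the thread: a CNAME was seen during address
    # resolution, then the explicit probe returns an insecure NODATA.
    broken = Probe(lookup_ok=True, secure=False, cname_present=False)
    print(strict_policy(True, broken))
    print(relaxed_policy(broken))
```

With the relaxed policy the broken domain falls through to "no-dane", i.e. delivery continues without DANE for that host instead of being deferred.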

Thanks,

Gaël.

Jan-Pieter Cornet

10 Dec 2018

12:42 p.m.

Um, putting an alias (record with CNAME) in an MX record is still frowned upon by the RFCs (specifically, 2181 and 5321). So do you really want to promote a standard that goes against that, and that can result in very brittle setups?

I would propose making this check a lot simpler: always take the domain found in the MX record as the base domain for the corresponding TLSA lookups, and only do the TLSA lookups if the domain name of that MX record is DNSSEC protected. That way, if a CNAME is introduced, it doesn't change the TLSA-protected status of the original domain, or break things in case DNSSEC is suddenly added or dropped from a remote zone.

So in your terminology:

a. mx.step<N>.example is secure (all the AD bits are 1):
   * use the original domain as the base for TLSA lookups

b. some lookups are insecure: check security status of original domain via "mx.original.example CNAME ?" query.
   * if secure, again use "mx.original.example" as base domain for TLSA lookups
   * if not secure, no DANE.
   * on failure (servfail, timeout):
     - if any previous query for this delivery was DNSSEC signed (AD=1), tempfail delivery to this MX record (take next record if any)
     - if no DNSSEC found, no DANE, try to deliver as usual (remote DNS server might fail on CNAME queries).
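For comparison with the earlier a/b/c procedure, this alternative can be sketched as follows. Again this is a minimal Python sketch under assumptions: `Status`, `select_base_original_only`, the `probe_cname_status` callback, and the returned action strings are all hypothetical names chosen for the example.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Status(Enum):
    SECURE = "secure"
    INSECURE = "insecure"
    FAILURE = "failure"  # servfail or timeout


def select_base_original_only(
    original: str,
    chain_secure: bool,
    saw_secure_answer: bool,
    probe_cname_status: Callable[[str], Status],
) -> Tuple[str, Optional[str]]:
    """Return (action, TLSA base domain); the base is always the MX name."""
    if chain_secure:
        return "lookup-tlsa", original            # case (a)
    status = probe_cname_status(original)         # case (b)
    if status is Status.SECURE:
        return "lookup-tlsa", original
    if status is Status.INSECURE:
        return "no-dane", None
    # Probe failed: only treat it as fatal for this MX host if DNSSEC
    # was already seen (AD=1) somewhere during this delivery.
    if saw_secure_answer:
        return "tempfail-this-mx", None
    return "no-dane", None


if __name__ == "__main__":
    # A server that returns SERVFAIL on CNAME queries, with no DNSSEC seen
    # so far: delivery proceeds as usual, without DANE.
    print(select_base_original_only("mx.original.example", False, False,
                                    lambda name: Status.FAILURE))
```

The visible differences from the earlier procedure are that the expanded CNAME target is never used as the TLSA base domain, and that a failed probe does not defer delivery unless a signed answer was already observed.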

The upshot of all this is that nothing should change for those that aren't doing DNSSEC or TLSA records. You're already introducing a possible CNAME query that isn't used in the current mail delivery mechanism, and there's no telling what that would cause in the real world. It has happened in the past that DNS servers return SERVFAIL on any CNAME query due to bugs.

In your original proposal:

On 6-12-18 18:55, Viktor Dukhovni wrote:

...

a. "mx.step<N>.example" is secure (all the AD bits are 1), and associated secure TLSA RRset found.
   * use "mx.step<N>.example" as TLSA base domain with corresponding
     TLSA records

... you're effectively approving the "MX points to CNAME" case. This happens, but is not considered best practice, so I wouldn't recommend making this case the first preference. If anything, I'd put it in the standard as: if all other TLSA record lookups failed, an implementation MAY check "mx.step<N>.example" as base domain for TLSA lookup.

...

b. expanded CNAME is secure (all AD bits are 1), and no secure TLSA associated with the expanded name.
  * check for secure TLSA at original name

As explained above, I'd recommend making this the default. The way it's specified here, it may cause a brittle setup that can break easily.

Imagine an organisation that had their DANE TLSA setup right, at one point. Now someone changes the mx.original.example into a CNAME to mx.stepN.example. Practically all mail delivery these days allows that CNAME, and mail delivery will follow. Assume the TLSA records are copied to the mx.stepN.example.

Now the keys are rolled, and the _25._tcp.mx.stepN.example records are updated (assuming automatic rollover). The original TLSA records are forgotten. Nobody checks them anyway.

Then, due to some fluke, DNSSEC fails for stepN.example (maybe the domain is moved?), and suddenly the TLSA records at the original domain get checked again by DANE. Those still contain the (hashes of the) old keys, and mail delivery breaks. Nothing changed at "original.example" to where the mail is addressed, but DNSSEC status at "stepN.example" changed. That's a very hard to debug situation.

...

c. expanded CNAME is insecure (one of the AD bits is 0). Check security status of initial name via an "mx.original.example. IN CNAME ?" query. You only need the validation status (AD bit) not the result.
 * If "insecure", done no DANE.
 * If "secure", look for TLSA RRset with "mx.original.example"
   as TLSA base domain.
 * If lookup failure (timeout, servfail whether DNSSEC-related or not),
   try a different MX host or defer.

... except, as mentioned above, that CNAME query is new in the email delivery landscape, so I'd err on the side of caution (ignore any error) if you haven't seen any DNSSEC signed domain at this point.

--
Jan-Pieter Cornet <johnpc@xs4all.nl>
"Any sufficiently advanced incompetence is indistinguishable from malice." - Grey's Law
