Inconsistent renews from F/O peers

Mark Sandrock
Hello,

It sometimes happens that shortly
after obtaining an initial lease of MCLT
(3600 seconds), some Windows clients
send a broadcast renew request that
is answered differently by the two
failover peers.

This premature renew request comes,
say, 3 seconds after the initial lease grant,
and the granting (primary) peer ACKs
the renew request with a 3597-second lease,
whereas the secondary peer extends
the lease to the full 24 hours by ACKing
the renew request with an 86,400-second lease.

While Windows recognizes that it now
has a 24-hour lease, the client's
switch port does not recognize
the lease extension, and the connection
is dropped an hour after the laptop
was first docked.

Although this seems like incorrect behavior
on the switch's part, the pathological
behavior of Windows renewing a lease
only 3 seconds into it also seems wrong.

Have others seen this problematic
behavior, I wonder?

Thank you.
Mark
_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: Inconsistent renews from F/O peers

Simon Hobson
Mark Sandrock <[hidden email]> wrote:

> It sometimes happens that shortly
> after obtaining an initial lease of MCLT
> (3600 seconds), some Windows clients
> send a broadcast renew request that
> is answered differently by the two
> failover peers.

> Although this seems like incorrect behavior
> on the switch's part, the pathological
> behavior of Windows renewing a lease
> only 3 seconds into it also seems wrong.

When this happens, do you have any indication of how long it took to get the reply back to the client?
I'm wondering whether a combination of factors (delay by the server, plus delay by the switch, which I assume handles the DHCP packets on its management CPU) is delaying the reply long enough for the client to retry, hence the second renew after 3 seconds.

You may need to leave a packet capture running until you catch an event. IIRC there is a field (whose name escapes me at the moment) in the packet that indicates how long the client has been trying; if the first packet has 0 and the second has 3, that would support the hypothesis.

Re: Inconsistent renews from F/O peers

Shawn Routhier

> On May 11, 2015, at 9:02 AM, Simon Hobson <[hidden email]> wrote:
> ...
>
> You may need to leave a packet capture running until you capture an event. IIRC there is a field (whose name escapes me at the moment) in the packet which indicates how long the client has been trying - if the first packet has 0 and the second has 3, then this would seem to support the hypothesis.
>

It is the “secs” field.
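To test Simon's hypothesis against a capture, a minimal Python sketch like the following could pull `secs` out of each client request. It assumes you already have the raw DHCP payload (the UDP data, not the Ethernet/IP headers) as bytes. It reads the fixed BOOTP header from RFC 2131, where `secs` is the 16-bit field at byte offset 8 and the broadcast bit is the top bit of the `flags` field right after it. The helper names (`parse_request`, `looks_like_retry`, `make_header`) are hypothetical, and `looks_like_retry` only encodes Simon's 0-then-3 heuristic.

```python
import struct

# Fixed-size start of the RFC 2131 BOOTP/DHCP header, network byte order:
# op, htype, hlen, hops (1 byte each), xid (4 bytes), secs (2), flags (2).
BOOTP_PREFIX = struct.Struct("!BBBBIHH")
BOOTP_FIXED_LEN = 236  # op .. file, i.e. everything before the magic cookie
BROADCAST_FLAG = 0x8000  # top bit of the flags field


def parse_request(payload: bytes) -> dict:
    """Extract op, xid, secs and the broadcast bit from a raw DHCP message."""
    if len(payload) < BOOTP_FIXED_LEN:
        raise ValueError("payload shorter than a BOOTP fixed header")
    op, _htype, _hlen, _hops, xid, secs, flags = BOOTP_PREFIX.unpack_from(payload)
    return {
        "op": op,
        "xid": xid,
        "secs": secs,
        "broadcast": bool(flags & BROADCAST_FLAG),
    }


def looks_like_retry(first: dict, second: dict) -> bool:
    """Simon's heuristic: secs is 0 on the first try and nonzero on a resend."""
    return first["secs"] == 0 and second["secs"] > 0


def make_header(xid: int, secs: int, flags: int) -> bytes:
    """Hypothetical test helper: a BOOTREQUEST header with zeroed addresses."""
    prefix = BOOTP_PREFIX.pack(1, 1, 6, 0, xid, secs, flags)
    return prefix + bytes(BOOTP_FIXED_LEN - BOOTP_PREFIX.size)


if __name__ == "__main__":
    first = parse_request(make_header(0x1234ABCD, 0, BROADCAST_FLAG))
    second = parse_request(make_header(0x1234ABCD, 3, BROADCAST_FLAG))
    print(first["secs"], second["secs"], looks_like_retry(first, second))
```

Running it prints `0 3 True`. The sketch does not match packets to a client (for example by `chaddr`, which it does not parse); in a real capture you would pair requests from the same client before applying the heuristic.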

**

Another item to check is whether dhcp-cache-threshold is enabled on both servers.
This feature limits the number of times the servers touch the lease file.
If a client requests a renew early, instead of handing out a full lease the server
simply hands out the previous lease again, correcting the lease time it sends
as necessary. So when a lease of 3600 seconds is renewed 3 seconds after
it was handed out, the server would hand out a time period of 3597 seconds.
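The arithmetic Shawn describes can be written down as a small Python model. This is a sketch, not ISC dhcpd's code: it assumes the threshold is a percentage of the current lease's length measured from when the lease was handed out (Mark later cites a documented default of 25% from dhcpd.conf(5)), it ignores any failover MCLT cap, and the function and parameter names are hypothetical.

```python
def renewal_lease_time(lease_start: int, lease_length: int, now: int,
                       desired_length: int, cache_threshold_pct: int = 25) -> int:
    """Lease time (seconds) to put in the ACK for a renewal.

    Hypothetical model of dhcp-cache-threshold: if the client renews while
    the current lease is younger than cache_threshold_pct percent of its
    length, hand back the same lease with only its remaining time;
    otherwise grant a fresh lease of desired_length.
    """
    if not 0 <= cache_threshold_pct <= 100:
        raise ValueError("threshold must be between 0 and 100")
    elapsed = now - lease_start
    # Integer comparison avoids floating-point rounding at the boundary.
    if elapsed * 100 < cache_threshold_pct * lease_length:
        return lease_length - elapsed  # reuse the cached lease as-is
    return desired_length  # extend: the server writes a new lease


if __name__ == "__main__":
    # A 3600-second lease renewed 3 seconds in, as in this thread.
    print(renewal_lease_time(0, 3600, 3, 86400))
    # The same lease renewed 1000 seconds in (past 25% of 3600 = 900).
    print(renewal_lease_time(0, 3600, 1000, 86400))
```

In this model the first call prints 3597 and the second prints 86400; a threshold of 0 means no renewal ever falls inside the window, so reuse never happens.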



Re: Inconsistent renews from F/O peers

Mark Sandrock
In reply to this post by Simon Hobson

> On May 11, 2015, at 11:02 AM, Simon Hobson <[hidden email]> wrote:
> ...
>
> You may need to leave a packet capture running until you capture an event. IIRC there is a field (whose name escapes me at the moment) in the packet which indicates how long the client has been trying - if the first packet has 0 and the second has 3, then this would seem to support the hypothesis.

They got a packet capture today, which
is being sent to Cisco and Infoblox,
so I'll take a look at it.

But reportedly Cisco acknowledged
that the premature renew 3 seconds
after the initial lease is coming from
their own NAM client.

Thank you, Simon.

Mark

Re: Inconsistent renews from F/O peers

Mark Sandrock
In reply to this post by Shawn Routhier
Thanks for that info, Shawn.

Mark

> On May 11, 2015, at 11:20 AM, Shawn Routhier <[hidden email]> wrote:
> ...

Re: Inconsistent renews from F/O peers

Mark Sandrock
In reply to this post by Shawn Routhier

> On May 11, 2015, at 11:20 AM, Shawn Routhier <[hidden email]> wrote:
...
>
> Another item to check is whether dhcp-cache-threshold is enabled on both servers.
> This feature limits the number of times the servers touch the lease file.
> If a client requests a renew early, instead of handing out a full lease the server
> simply hands out the previous lease again, correcting the lease time it sends
> as necessary.  ...

So, dhcpd.conf(5) gives a default of
25% for 'dhcp-cache-threshold',
which contradicts what I am seeing.

The issuing F/O (primary) peer indeed
ACKs 3597 seconds in response to the renew,
but at the same time (as the 3-second
renew is broadcast), the secondary
F/O peer ACKs 86,400 seconds,
i.e., it extends the lease beyond the MCLT.

This seems like incorrect behavior
on the part of the F/O peer.

Thank you.
Mark

Re: Inconsistent renews from F/O peers

Bob Harold

On Tue, May 12, 2015 at 9:25 AM, Mark Sandrock <[hidden email]> wrote:

> ...
> This seems like incorrect behavior
> on the part of the F/O peer.

As I understand it, as soon as both servers have a copy of the lease information, they should return the full lease time. The MCLT is only returned while the lease information has not been synchronized. So I think the peer is giving the correct response. The primary might not have gotten confirmation back from its peer, so it might also be giving the 'correct' response from what it knows. But you would need to trace the traffic or watch the debug logs to know whether the primary had gotten a response in those three seconds, in which case it gave the wrong response.

-- 
Bob Harold
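Bob's rule of thumb can be reduced to a two-branch decision. The Python sketch below models only that simplified description (cap the offer at the MCLT until the partner has confirmed the binding, then offer the full lease); the actual failover rules are more involved, and the function and parameter names are hypothetical.

```python
def failover_lease_time(desired_length: int, mclt: int,
                        peer_has_binding: bool) -> int:
    """Lease time a failover peer would offer under Bob's simplified model.

    Until the partner has confirmed the binding, the offer is capped at the
    MCLT, so a server failure cannot leave the client holding a long lease
    the partner knows nothing about. Once both peers have the binding, the
    full desired lease time is offered.
    """
    if peer_has_binding:
        return desired_length
    return min(desired_length, mclt)


if __name__ == "__main__":
    # Numbers from this thread: 24-hour desired lease, 3600-second MCLT.
    print(failover_lease_time(86400, 3600, peer_has_binding=False))
    print(failover_lease_time(86400, 3600, peer_has_binding=True))
```

Under this model the two calls print 3600 and 86400. A secondary that has presumably already received the binding from the primary would answer with the full lease, which is the behavior Bob considers correct; whether the primary had its confirmation yet is what the traces or debug logs would show.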


Re: Inconsistent renews from F/O peers

Mark Sandrock

> On May 12, 2015, at 10:09 AM, Bob Harold <[hidden email]> wrote:
> ...
> As soon as both servers have a copy of the lease information, then they should return the full lease time.  The MCLT time is only returned while the lease information has not been synchronized.  So I think the peer is giving the correct response.  The primary might not have gotten confirmation back from the peer, so it might also be giving the 'correct' response from what it knows.  But you would need to trace the traffic or watch debug logs to know if the primary had gotten a response in those three seconds, in which case it gave the wrong response.

It seems possible that the issuing DHCP
server does not yet know whether its F/O peer
knows about the lease... But your
suggestion requires that the stated
default of 25% for 'dhcp-cache-threshold'
does not apply when that parameter is not
present in the 'dhcpd.conf' file, which
is the case on the F/O peers concerned.

I assume the opposite: that the default applies
in all cases unless it is overridden.

But I don't know that for a fact.

Thank you.
Mark


Re: Inconsistent renews from F/O peers

Mark Sandrock
In reply to this post by Shawn Routhier
Our ISC DHCP version is 4.2.4-P2.

It may be that 'dhcp-cache-threshold'
is not yet supported in that version,
which would help account for the
different behavior of the F/O peers.

Thank you.
Mark

> On May 11, 2015, at 11:20 AM, Shawn Routhier <[hidden email]> wrote:
> ...