ISC DHCP Users

Failback causes lost lease

Classic

List

Threaded

7 messages Options

Sean McMurray

Failback causes lost lease

In testing my dhcp failover, I pulled the ethernet cable on the primary
server.

The secondary server acknowleged renewal requests as expected.

Then I plugged the cable back in. After both the primary and secondary
had moved from communications-interrupted to normal, the secondary logs
multiple dhcp requests from a client whose lease is owned by the primary
server. The primary server does not log any of these but the last
request, reporting that "lease in transition state expired".

Then the secondary server logs a DHCPDISCOVER from that client and
records it load balancing to the primary server.

The primary server also sees the DHCPDISCOVER and offers a new lease
that is not the same number as the previous lease. This despite the old
number not having been reassigned.

The end result is that failback causes my clients to change their ip
address.

Why does this happen and how can I prevent it?

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Gregory Sloop

Re: Failback causes lost lease

Re: Failback causes lost lease

SM> In testing my dhcp failover, I pulled the ethernet cable on the primary
SM> server.

SM> The secondary server acknowleged renewal requests as expected.

SM> Then I plugged the cable back in. After both the primary and secondary
SM> had moved from communications-interrupted to normal, the secondary logs
SM> multiple dhcp requests from a client whose lease is owned by the primary
SM> server. The primary server does not log any of these but the last
SM> request, reporting that "lease in transition state expired".

SM> Then the secondary server logs a DHCPDISCOVER from that client and
SM> records it load balancing to the primary server.

SM> The primary server also sees the DHCPDISCOVER and offers a new lease
SM> that is not the same number as the previous lease. This despite the old
SM> number not having been reassigned.

SM> The end result is that failback causes my clients to change their ip
SM> address.

SM> Why does this happen and how can I prevent it?

SM> _______________________________________________
SM> dhcp-users mailing list
[hidden email]
SM> https://lists.isc.org/mailman/listinfo/dhcp-users

1) Logs would be good.
2) I think something with your config is broken. If I were to [wildly] guess, it's a physical/network layer issue.
3) I have a very small setup with 100+ clients, and it certainly doesn't work this way for me.

There are some issues when a single server is up and in "communications interrupted" mode and you've got a tight IP pool and the leases were fairly evenly balanced against both servers. [I've posted, in the past, about an event that was kinda ugly for this client while running a 4.1 version [IIRC]. *However* those problems should be vastly less of a problem with 4.2+ - and you're not having an issue with communications interrupted anyway.

IIRC, you had a problem where the two servers wouldn't recover from CI to Normal like they should too. How did you solve that problem? Is it possible this is related? [I'm too lazy to go check old threads, but I _think_ it was you...my apologies if I'm wrong.]

-Greg

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Sean McMurray

Re: Failback causes lost lease

Gregory,

Thanks for your reply.

On 06/25/2015 12:47 PM, Gregory Sloop wrote:

Re: Failback causes lost lease SM> In testing my dhcp failover, I pulled the ethernet cable on the primary
SM> server.

SM> The secondary server acknowleged renewal requests as expected.

SM> Then I plugged the cable back in. After both the primary and secondary
SM> had moved from communications-interrupted to normal, the secondary logs
SM> multiple dhcp requests from a client whose lease is owned by the primary
SM> server. The primary server does not log any of these but the last
SM> request, reporting that "lease in transition state expired".

SM> Then the secondary server logs a DHCPDISCOVER from that client and
SM> records it load balancing to the primary server.

SM> The primary server also sees the DHCPDISCOVER and offers a new lease
SM> that is not the same number as the previous lease. This despite the old
SM> number not having been reassigned.

SM> The end result is that failback causes my clients to change their ip
SM> address.

SM> Why does this happen and how can I prevent it?

SM> _______________________________________________
SM> dhcp-users mailing list
[hidden email]
SM> https://lists.isc.org/mailman/listinfo/dhcp-users

1) Logs would be good.
2) I think something with your config is broken. If I were to [wildly] guess, it's a physical/network layer issue.
3) I have a very small setup with 100+ clients, and it certainly doesn't work this way for me.

There are some issues when a single server is up and in "communications interrupted" mode and you've got a tight IP pool and the leases were fairly evenly balanced against both servers. [I've posted, in the past, about an event that was kinda ugly for this client while running a 4.1 version [IIRC]. *However* those problems should be vastly less of a problem with 4.2+ - and you're not having an issue with communications interrupted anyway.

I am having an issue with communications interrupted. When I pull the ethernet cable, both the primary and secondary servers move from normal to communications-interrupted.

As far as "tight IP pool" goes, it's the only ip in use in a /16 pool.

IIRC, you had a problem where the two servers wouldn't recover from CI to Normal like they should too. How did you solve that problem? Is it possible this is related? [I'm too lazy to go check old threads, but I _think_ it was you...my apologies if I'm wrong.]

That was a stupid networking mistake where the failover traffic wasn't making it between peers. That problem was solved when I quit being so stupid. In this case, the peers are communicating failover data correctly when not in "communications-interrupted" stage.

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Gregory Sloop

Re: Failback causes lost lease

Gregory,

Thanks for your reply.

On 06/25/2015 12:47 PM, Gregory Sloop wrote:
Re: Failback causes lost lease SM> In testing my dhcp failover, I pulled the ethernet cable on the primary
SM> server.

SM> The secondary server acknowleged renewal requests as expected.

SM> Then I plugged the cable back in. After both the primary and secondary
SM> had moved from communications-interrupted to normal, the secondary logs
SM> multiple dhcp requests from a client whose lease is owned by the primary
SM> server. The primary server does not log any of these but the last
SM> request, reporting that "lease in transition state expired".

SM> Then the secondary server logs a DHCPDISCOVER from that client and
SM> records it load balancing to the primary server.

SM> The primary server also sees the DHCPDISCOVER and offers a new lease
SM> that is not the same number as the previous lease. This despite the old
SM> number not having been reassigned.

SM> The end result is that failback causes my clients to change their ip
SM> address.

SM> Why does this happen and how can I prevent it?

SM> _______________________________________________
SM> dhcp-users mailing list
[hidden email]
SM> https://lists.isc.org/mailman/listinfo/dhcp-users

1) Logs would be good.
2) I think something with your config is broken. If I were to [wildly] guess, it's a physical/network layer issue.
3) I have a very small setup with 100+ clients, and it certainly doesn't work this way for me.

There are some issues when a single server is up and in "communications interrupted" mode and you've got a tight IP pool and the leases were fairly evenly balanced against both servers. [I've posted, in the past, about an event that was kinda ugly for this client while running a 4.1 version [IIRC]. *However* those problems should be vastly less of a problem with 4.2+ - and you're not having an issue with communications interrupted anyway.
I am having an issue with communications interrupted. When I pull the ethernet cable, both the primary and secondary servers move from normal to communications-interrupted.

But in your initial post on this thread you said:

> "After both the primary and secondary
> had moved from communications-interrupted to normal"

It can't be both ways. Either they are CI, or in a Normal state. It can't be both.
Like I said, logs would probably be helpful. [Unless someone else has a lightening bolt moment and can tell you exactly what's wrong without them - but I doubt that.

As far as "tight IP pool" goes, it's the only ip in use in a /16 pool.

Yes, I expected as much. Further the symptoms you're having don't match, at all, what I'm describing. [No free leases is the result of my situation.]

IIRC, you had a problem where the two servers wouldn't recover from CI to Normal like they should too. How did you solve that problem? Is it possible this is related? [I'm too lazy to go check old threads, but I _think_ it was you...my apologies if I'm wrong.]
That was a stupid networking mistake where the failover traffic wasn't making it between peers. That problem was solved when I quit being so stupid. In this case, the peers are communicating failover data correctly when not in "communications-interrupted" stage.

So, I'd ask for logs that demonstrate that:
1) What real state both fail-over peers are actually in. [CI/Normal/recovery something else]
2) Logs and/or packet caps that show the [primary] peer who initially leased the IP is actually getting the renew requests.
3) Is this a really simple config setup? If not - and it appears to be a test environment - strip the config down to bare minimum. Then build up. It's quite easy to make a mistake in a config that borks everything up, whilst trying to do everything in one go. [Though I can't envision a mistake in a config that would produce these results/symptoms...but I'm far from a total guru on dhcpd.]

My gut feeling is unchanged, in that there's some physical/data/network/transport layer issue that's preventing all the relevant traffic getting from clients to both peers, and perhaps even between the peers themselves.

-Greg
_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Simon Hobson

Re: Failback causes lost lease

In reply to this post by Sean McMurray

Sean McMurray <[hidden email]> wrote:

> I am having an issue with communications interrupted. When I pull the ethernet cable, both the primary and secondary servers move from normal to communications-interrupted.

Isn't that what should happen ? A and B can't talk, so both A and B should consider communications interrupted.
I'd be seriously worried if they went to any other combination of states.

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Sean McMurray

Re: Failback causes lost lease

In reply to this post by Gregory Sloop

On 06/25/2015 01:39 PM, Gregory Sloop wrote:

Re: Failback causes lost lease

Gregory,

Thanks for your reply.

On 06/25/2015 12:47 PM, Gregory Sloop wrote:
Re: Failback causes lost lease SM> In testing my dhcp failover, I pulled the ethernet cable on the primary
SM> server.

SM> The secondary server acknowleged renewal requests as expected.

SM> Then I plugged the cable back in. After both the primary and secondary
SM> had moved from communications-interrupted to normal, the secondary logs
SM> multiple dhcp requests from a client whose lease is owned by the primary
SM> server. The primary server does not log any of these but the last
SM> request, reporting that "lease in transition state expired".

SM> Then the secondary server logs a DHCPDISCOVER from that client and
SM> records it load balancing to the primary server.

SM> The primary server also sees the DHCPDISCOVER and offers a new lease
SM> that is not the same number as the previous lease. This despite the old
SM> number not having been reassigned.

SM> The end result is that failback causes my clients to change their ip
SM> address.

SM> Why does this happen and how can I prevent it?

SM> _______________________________________________
SM> dhcp-users mailing list
[hidden email]
SM> https://lists.isc.org/mailman/listinfo/dhcp-users

1) Logs would be good.

Primary Server:
=========

Jun 25 11:33:00 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:33:00 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:36:23 dhcpd: timeout waiting for failover peer partner
Jun 25 11:36:23 dhcpd: peer partner: disconnected
Jun 25 11:36:23 dhcpd: failover peer partner: I move from normal to communications-interrupted
Jun 25 11:42:38 dhcpd: failover peer partner: peer moves from normal to communications-interrupted
Jun 25 11:42:38 dhcpd: failover peer partner: I move from communications-interrupted to normal
Jun 25 11:42:38 dhcpd: balancing pool 7f4c9a14cb80 10.112.0.0/13 total 1024 free 511 backup 512 lts 0 max-own (+/-)102
Jun 25 11:42:38 dhcpd: balanced pool 7f4c9a14cb80 10.112.0.0/13 total 1024 free 511 backup 512 lts 0 max-misbal 153
Jun 25 11:42:38 dhcpd: balancing pool 7f4c9a143260 10.104.0.0/13 total 1024 free 512 backup 512 lts 0 max-own (+/-)102
Jun 25 11:42:38 dhcpd: balanced pool 7f4c9a143260 10.104.0.0/13 total 1024 free 512 backup 512 lts 0 max-misbal 154
Jun 25 11:42:38 dhcpd: balancing pool 7f4c9a13a930 10.96.0.0/13 total 1024 free 512 backup 512 lts 0 max-own (+/-)102
Jun 25 11:42:38 dhcpd: balanced pool 7f4c9a13a930 10.96.0.0/13 total 1024 free 512 backup 512 lts 0 max-misbal 154
Jun 25 11:42:38 dhcpd: failover peer partner: peer moves from communications-interrupted to normal
Jun 25 11:43:38 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112: lease in transition state expired
Jun 25 11:43:39 dhcpd: DHCPDISCOVER from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:43:40 dhcpd: DHCPOFFER on 10.112.2.201 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:43:40 dhcpd: uid lease 10.112.1.1 for client 00:19:d1:4d:03:f8 is duplicate on 10.112.0.0/13
Jun 25 11:43:40 dhcpd: DHCPREQUEST for 10.112.2.201 (10.112.0.104) from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:43:40 dhcpd: DHCPACK on 10.112.2.201 to 00:19:d1:4d:03:f8 via em2.112

Secondary Server:
===========
Jun 25 11:36:21 dhcpd: timeout waiting for failover peer partner
Jun 25 11:36:21 dhcpd: peer partner: disconnected
Jun 25 11:36:21 dhcpd: failover peer partner: I move from normal to communications-interrupted
Jun 25 11:37:34 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:37:34 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:39:10 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:39:10 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:40:11 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:40:11 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:40:44 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:40:44 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:41:16 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:41:16 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:41:47 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:41:47 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:42:12 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:42:12 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:42:37 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:42:37 dhcpd: DHCPACK on 10.112.1.1 to 00:19:d1:4d:03:f8 via em2.112
Jun 25 11:42:38 dhcpd: failover peer partner: peer moves from normal to communications-interrupted
Jun 25 11:42:38 dhcpd: failover peer partner: I move from communications-interrupted to normal
Jun 25 11:42:38 dhcpd: balancing pool 7f325dbfdb30 10.112.0.0/13 total 1024 free 511 backup 512 lts 0 max-own (+/-)102
Jun 25 11:42:38 dhcpd: balanced pool 7f325dbfdb30 10.112.0.0/13 total 1024 free 511 backup 512 lts 0 max-misbal 153
Jun 25 11:42:38 dhcpd: balancing pool 7f325dbf4210 10.104.0.0/13 total 1024 free 512 backup 512 lts 0 max-own (+/-)102
Jun 25 11:42:38 dhcpd: balanced pool 7f325dbf4210 10.104.0.0/13 total 1024 free 512 backup 512 lts 0 max-misbal 154
Jun 25 11:42:38 dhcpd: balancing pool 7f325dbeb8e0 10.96.0.0/13 total 1024 free 512 backup 512 lts 0 max-own (+/-)102
Jun 25 11:42:38 dhcpd: balanced pool 7f325dbeb8e0 10.96.0.0/13 total 1024 free 512 backup 512 lts 0 max-misbal 154
Jun 25 11:42:38 dhcpd: failover peer partner: peer moves from communications-interrupted to normal
Jun 25 11:43:06 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112: lease owned by peer
Jun 25 11:43:13 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112: lease owned by peer
Jun 25 11:43:24 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112: lease owned by peer
Jun 25 11:43:38 dhcpd: DHCPREQUEST for 10.112.1.1 from 00:19:d1:4d:03:f8 via em2.112: lease owned by peer
Jun 25 11:43:39 dhcpd: DHCPDISCOVER from 00:19:d1:4d:03:f8 via em2.112: load balance to peer partner
Jun 25 11:43:40 dhcpd: DHCPREQUEST for 10.112.2.201 (10.112.0.104) from 00:19:d1:4d:03:f8 via em2.112: lease owned by peer

2) I think something with your config is broken. If I were to [wildly] guess, it's a physical/network layer issue.
3) I have a very small setup with 100+ clients, and it certainly doesn't work this way for me.

There are some issues when a single server is up and in "communications interrupted" mode and you've got a tight IP pool and the leases were fairly evenly balanced against both servers. [I've posted, in the past, about an event that was kinda ugly for this client while running a 4.1 version [IIRC]. *However* those problems should be vastly less of a problem with 4.2+ - and you're not having an issue with communications interrupted anyway.

I am having an issue with communications interrupted. When I pull the ethernet cable, both the primary and secondary servers move from normal to communications-interrupted.

But in your initial post on this thread you said:

> "After both the primary and secondary
> had moved from communications-interrupted to normal"

It can't be both ways. Either they are CI, or in a Normal state. It can't be both.
Like I said, logs would probably be helpful. [Unless someone else has a lightening bolt moment and can tell you exactly what's wrong without them - but I doubt that.

I hope the logs above help.

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Gregory Sloop

Re: Failback causes lost lease

On 06/25/2015 01:39 PM, Gregory Sloop wrote:
Re: Failback causes lost lease

2) I think something with your config is broken. If I were to [wildly] guess, it's a physical/network layer issue.
3) I have a very small setup with 100+ clients, and it certainly doesn't work this way for me.

There are some issues when a single server is up and in "communications interrupted" mode and you've got a tight IP pool and the leases were fairly evenly balanced against both servers. [I've posted, in the past, about an event that was kinda ugly for this client while running a 4.1 version [IIRC]. *However* those problems should be vastly less of a problem with 4.2+ - and you're not having an issue with communications interrupted anyway.

I am having an issue with communications interrupted. When I pull the ethernet cable, both the primary and secondary servers move from normal to communications-interrupted.

---
You didn't answer the question: "It can't be both ways. Either they are CI, or in a Normal state."

But the logs you posted seem to indicate that both servers believe they're in "normal" mode when the DISCOVER requests come in. Yet they only show the primary getting a single DISCOVER [the last one] while the secondary gets (4) over more than 30s.

Does a packet capture show the primary server actually receiving all the DHCPREQUEST packets that are going to the secondary?
It would be odd [typically a sign of lost packets] that the secondary would see (4) DHCPREQUEST's but the primary would only see one. It would be very helpful to know if those packets are actually arriving at the interface, but being ignored by dhcpd, or if they're not ever getting there at all. If it's the latter, it's probably not a dhcpd problem.

Searches on "lease in transition state expired" appear to find incidences of that output, as a result of bugs, but I could only find those kinds of problems recounted in *really* old, non-supported versions. [i.e. 3.0.x]

Are you running a currently supported version, and what version is it?

-Greg

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users