Procedure for failover partner replacement.

Procedure for failover partner replacement.

Bob McDonald
I've got a failing DHCP failover partner. (The partner is an HA cluster and both nodes are being RMAed. Long story.)

My question is this: is the following procedure OK for the replacement? (I've already confirmed that the new version of DHCP is exactly the same as the old one.)

1) Before shutting down the failing partner cluster, stop dhcpd and save the dhcpd.leases and dhcpd.conf files.
2) Shut down the failing partner cluster completely.
3) Bring up the replacement partner cluster while leaving dhcpd turned off.
4) Restore the dhcpd.leases and dhcpd.conf files.
5) Start dhcpd on the replacement partner cluster.

My contention is that this will leave the surviving peer in communications-interrupted state for about 5 or 10 minutes while the HA cluster is replaced, and that the pair should then resume communications as if nothing had happened when the replacement partner comes live. Thoughts?
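
Steps 1 and 4 of the list above boil down to copying two files off the old cluster and back onto the new one. A minimal Python sketch of that, assuming dhcpd is already stopped; the paths and the helper names save_files and restore_files are hypothetical, since the real file locations depend on the distribution and the HA product:

```python
import shutil
from pathlib import Path

# Hypothetical locations; adjust to wherever your build keeps these files.
CONF = Path("/etc/dhcp/dhcpd.conf")
LEASES = Path("/var/lib/dhcp/dhcpd.leases")


def save_files(files, backup_dir):
    """Copy each file into backup_dir, preserving timestamps and mode."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for src in files:
        dst = backup_dir / Path(src).name
        shutil.copy2(src, dst)
        saved.append(dst)
    return saved


def restore_files(backup_dir, targets):
    """Copy each saved file back to its target path on the replacement host."""
    for target in targets:
        target = Path(target)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(backup_dir) / target.name, target)
```

On the old cluster this would be called as save_files([CONF, LEASES], some_backup_dir) after stopping dhcpd, and on the replacement as restore_files(some_backup_dir, [CONF, LEASES]) before starting it.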

Regards,

Bob

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users
Re: Procedure for failover partner replacement.

perl-list
If you are using the same version of DHCP, then it should work fine.

I have done exactly that before and it has worked for me. Both partners returned to normal state after a very brief startup state in which they did not respond. The DHCP servers did not appear to realize in any way that they were on a different cluster.
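
One way to confirm that a peer is back in normal state is to read the last failover state declaration that dhcpd wrote to dhcpd.leases. A rough Python sketch, assuming the block layout commonly seen in lease files (a "my state" line and a "partner state" line inside a failover peer ... state block); the function name last_failover_state is hypothetical:

```python
import re

# Matches one failover state declaration as dhcpd typically writes it to
# dhcpd.leases. The exact layout is an assumption based on common lease files.
STATE_BLOCK = re.compile(
    r'failover peer "(?P<name>[^"]+)" state \{(?P<body>.*?)\}', re.DOTALL
)
STATE_LINE = re.compile(r"(?P<who>my|partner|peer) state (?P<state>[a-z-]+)")


def last_failover_state(leases_text):
    """Return the peer name and both states from the last state block, or None."""
    blocks = list(STATE_BLOCK.finditer(leases_text))
    if not blocks:
        return None
    last = blocks[-1]
    result = {"name": last.group("name"), "my": None, "partner": None}
    for line in STATE_LINE.finditer(last.group("body")):
        # Treat "peer state" and "partner state" as the same field.
        who = "my" if line.group("who") == "my" else "partner"
        result[who] = line.group("state")
    return result
```

dhcpd appends a new state block whenever the state changes, so only the last block reflects the current state.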



Re: Procedure for failover partner replacement.

Chris Buxton
On May 11, 2017, at 6:35 AM, Bob McDonald <[hidden email]> wrote:

> I've got a failing DHCP failover partner. (The partner is an HA cluster and both nodes are being RMAed. Long story.)
>
> My question is this: is the following procedure OK for the replacement? (I've already confirmed that the new version of DHCP is exactly the same as the old one.)
>
> 1) Before shutting down the failing partner cluster, stop dhcpd and save the dhcpd.leases and dhcpd.conf files.
> 2) Shut down the failing partner cluster completely.
> 3) Bring up the replacement partner cluster while leaving dhcpd turned off.
> 4) Restore the dhcpd.leases and dhcpd.conf files.
> 5) Start dhcpd on the replacement partner cluster.
>
> My contention is that this will leave the surviving peer in communications-interrupted state for about 5 or 10 minutes while the HA cluster is replaced, and that the pair should then resume communications as if nothing had happened when the replacement partner comes live. Thoughts?

Here is what I would do:

1. On both failover peers (both clusters), set 'max-unacked-updates 1000;'.
2. Save the old dhcpd.conf and any included files from the failing peer cluster. Do not save the leases file.
3. Shut down the failing cluster completely.
4. Put the remaining failover peer into partner-down state.
5. Bring up the replacement cluster with dhcpd not running.
6. Restore the dhcpd.conf (including the 'max-unacked-updates' statement).
7. Start dhcpd on the replacement cluster.

At step 3, the remaining peer will move to communications-interrupted. But step 4 will change this, so that you don't have to worry about pool exhaustion during steps 5 and 6. At step 7, the new peer will move to recover state, sync with the master, and then move to normal state. At that point, the other peer will automatically move from partner-down to normal state.
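
For step 4, one option besides OMAPI is to stop dhcpd on the surviving peer and edit the last failover state declaration in dhcpd.leases so that "my state" reads partner-down, which is my understanding of what dhcpd.conf(5) describes. A hedged Python sketch of that edit on the file's text; the block layout is assumed from typical lease files and set_partner_down is a hypothetical helper name:

```python
import re

# Assumed layout of a failover state declaration in dhcpd.leases.
STATE_BLOCK = re.compile(r'failover peer "[^"]+" state \{.*?\}', re.DOTALL)
MY_STATE = re.compile(r"my state [^;]*;")


def set_partner_down(leases_text):
    """Rewrite 'my state' in the LAST failover state block to partner-down."""
    blocks = list(STATE_BLOCK.finditer(leases_text))
    if not blocks:
        raise ValueError("no failover state declaration found")
    last = blocks[-1]
    edited, count = MY_STATE.subn("my state partner-down;", last.group(0), count=1)
    if count == 0:
        raise ValueError("last failover state block has no 'my state' line")
    # Splice the edited block back in; earlier blocks are left untouched.
    return leases_text[: last.start()] + edited + leases_text[last.end():]
```

This should only be run against the leases file while dhcpd is stopped, and only once the other peer is known to be really down; work on a copy first.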

Regards,
Chris
Re: Procedure for failover partner replacement.

perl-list
That is an interesting idea, Chris, but in my experience both peers will enter recover mode at step 7 and won't answer DHCP requests until the recover-wait (MCLT) period expires or you manually intervene. As always, YMMV.
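
To put a number on that wait: if a peer really does sit out one full MCLT, the earliest time it answers again is the start of recovery plus the mclt value from the primary's failover peer declaration. A small Python sketch of that arithmetic; earliest_service_resume is a hypothetical helper and the one-MCLT wait is taken at face value from the description above:

```python
import re
from datetime import datetime, timedelta

# 'mclt <seconds>;' appears in the failover peer declaration on the primary.
MCLT = re.compile(r"\bmclt\s+(\d+)\s*;")


def earliest_service_resume(conf_text, recover_started):
    """Return recover_started plus the MCLT read from the config text."""
    match = MCLT.search(conf_text)
    if match is None:
        raise ValueError("no mclt statement found (it is set on the primary)")
    return recover_started + timedelta(seconds=int(match.group(1)))
```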



Re: Procedure for failover partner replacement.

Chris Buxton
As long as you've put the remaining server into partner-down state, it will remain there until its peer has finished recovery.

Regards,
Chris

On May 11, 2017, at 8:17 AM, perl-list <[hidden email]> wrote:

> That is an interesting idea, Chris, but in my experience both peers will enter recover mode at step 7 and won't answer DHCP requests until the recover-wait (MCLT) period expires or you manually intervene. As always, YMMV.




Re: Procedure for failover partner replacement.

Christopher Barry

What is your DHCP environment like? Short leases and very busy, or
enterprise lease times, with separate wireless DHCP? You're probably not an
ISP, I'm guessing. You have a cluster for failover, not performance.

I would double the lease time a day or so beforehand, to help deal with
any contingencies, however unlikely. Then stand up
the new cluster, or just a spare box while the cluster is getting
replaced, and rsync the leases and conf over to the temp or new cluster
primary from cron for a while without dhcpd running there. Then, when
the network is quiet, ideally a Friday night, shut down the old
cluster's dhcpd and start it on the new one. There would only be seconds
of interruption at most, if any. DHCP is broadcast, and the client really
doesn't care where its address comes from. Don't make it more
stressful than it needs to be.
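
Doubling the lease time ahead of the change could be scripted along these lines. This is only a sketch in Python: it assumes the lease times are set with plain default-lease-time and max-lease-time statements in dhcpd.conf, and double_lease_times is a hypothetical helper name:

```python
import re

# Matches 'default-lease-time N;' and 'max-lease-time N;' statements.
LEASE_TIME = re.compile(r"\b(default-lease-time|max-lease-time)\s+(\d+)\s*;")


def double_lease_times(conf_text):
    """Return conf_text with every default-lease-time and max-lease-time doubled."""
    def doubled(match):
        return f"{match.group(1)} {int(match.group(2)) * 2};"
    return LEASE_TIME.sub(doubled, conf_text)
```

The edited config would still need to be checked and dhcpd restarted for the longer leases to start being handed out, which is why this is done a day or so ahead.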


--
Regards,
Christopher
Re: Procedure for failover partner replacement.

Bob McDonald
Thanks for the advice.

I followed the procedure I listed. All went very well.

No need to muck around with lease times or changing status to partner-down.

This was seen by DHCP as a restart.

Regards,

Bob

Re: Procedure for failover partner replacement.

Christopher Barry
On Sat, 13 May 2017 09:29:33 -0500
Bob McDonald <[hidden email]> wrote:

>Thanks for the advice.
>
>I followed the procedure I listed. All went very well.
>
>No need to muck around with lease times or changing status to
>partner-down.
>
>This was seen by DHCP as a restart.
>
>Regards,
>
>Bob

Shoot first, ask questions later... I re-read your post *after* I
posted (doh!) and realized my answer was unlikely to be applicable.
Pardon the noise.

Interested, though: what kind of environment are you managing? It must be
really serious to have a cluster of clusters just for DHCP. Your email
setup must be amazing! :)

--
Regards,
Christopher