General questions about failover, config changes and restarting

General questions about failover, config changes and restarting

James Dore
Hi all,

I’ve had a pair of DHCP servers running in a load balance/failover cluster for about 9 months, but haven’t really got my head round what happens when I make a change to the configuration.

I have a bunch of config files called from the main config file thus:

##########################
#                        #
# Failover configuration #
#                        #
##########################
failover peer "newc-dhcp" {
    primary;
    address 129.67.111.199; # address of this server
    port 519;
    peer address 129.67.111.243; # address of the secondary dhcpd
    peer port 519;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 600;
    split 128;
    load balance max seconds 3;
}

key primaryhost {
    algorithm hmac-md5;
    secret <ssshhh!>;
};

omapi-key primaryhost;
omapi-port 7911;


###########################
#                         #
# Load the global options #
#                         #
###########################

include "/etc/dhcpd.d/master.conf"; # (Rarely!) Edit this file to set global options

########################
#                      #  
# Subnet config files  #
#                      #
########################

include "/etc/dhcpd.d/vlan1.conf"; # 129.67.108.0/22 Main subnet and static assignments
include "/etc/dhcpd.d/vlan3.conf"; # 10.30.0.0/22 Devices subnet config and static assignments
include "/etc/dhcpd.d/vlan4.conf"; # 10.4.0.0/16 NAT Vlan4 Subnet config and static assignments
include "/etc/dhcpd.d/annexe.conf"; # 163.1.173.0/24 Annexe subnet config and static assignments

Both peers have pretty similar config files, the only difference being the secret and the address/peer address settings. Everything else is the same. (Should it be?)

What I’m curious about is what happens when I make a change to one of the subnet config files, for instance to add a new static assignment. My usual method has been to edit the file on one peer, and then scp it over to the other peer. After that, it seems I need to restart each peer a number of times before they both return to Normal status. They seem to get stuck in Partner-down, Recover, or Recover Wait status for a while.

If I can get them both in Recover Wait, then they will synchronise, but it seems to be difficult to get them there.

Is there anything I can do to smooth the process?

I can’t find much info about troubleshooting failover or load balancing; all my googling has turned up is instructions on initial setup. Does anyone have some useful pointers or links?

Cheers,
James


_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

RE: General questions about failover, config changes and restarting

Patrick Trapp
I can't answer the whys or hows so much, but I can tell you what we do here. It was set up by someone who preceded me, and I have worked to make it easier to manage, but I cannot claim credit for the underlying structure.

We have two DHCP servers in a failover configuration that are essentially the same, as you describe. We also have a third DHCP server with the same configuration, except that it has no reference to the failover configuration of the other two; DHCP is never actually started on it. We make our changes on the third server and test the configuration for syntax errors. Then we run a script that uses version control to upload the resulting configuration (just the part the production servers have in common, not the server-specific bits) to the version control server, remotes in to each of the production servers in turn, and has them download the new configuration and restart. I have not seen any issues with restarts in our scenario, but I don't know which details make the difference.
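
A syntax check before deployment, as described above, could look roughly like the following Python sketch. This is not the actual script used here, only an illustration. It assumes ISC dhcpd's `-t` (test the configuration) and `-cf` (configuration file) options; the function names and the injectable `run` parameter are made up for the example.

```python
import subprocess


def syntax_check_command(config_path):
    """Build the dhcpd invocation that parses a config file without serving."""
    # -t: test the configuration for syntax errors; -cf: config file to read.
    return ["dhcpd", "-t", "-cf", config_path]


def config_is_valid(config_path, run=subprocess.run):
    """Return True if dhcpd exits with status 0 for this configuration.

    `run` is injectable (hypothetical convenience) so the check can be
    exercised on a machine that does not have dhcpd installed.
    """
    result = run(syntax_check_command(config_path),
                 capture_output=True, text=True)
    return result.returncode == 0
```

A deployment script would call `config_is_valid()` on the staged file and refuse to push it to the production peers if the check fails.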

Patrick

Re: General questions about failover, config changes and restarting

glenn.satchell
In reply to this post by James Dore
Hi James

The configurations for the subnets and everything except the failover (and
possibly the keys) should be exactly the same, so editing one and scp'ing the
file to the other server is exactly the right thing to do.

It doesn't matter too much which server is restarted first, but you should
not restart the second until the first has finished synchronising lease
information. This may take a little while if there are many thousands of
leases - I see you have a /22 and a /16, so possibly tens of thousands of leases.
Could take a few minutes depending on network speed and latency between
the servers.
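
For a rough upper bound on how many leases might need synchronising, the subnet sizes can be added up. The sketch below uses the four subnets named in the include comments of the original post. It counts addresses per subnet, not leases: the real figure depends on the `range` statements inside the pools, which are not shown in this thread.

```python
import ipaddress

# Subnets taken from the include comments in the original post.
SUBNETS = ["129.67.108.0/22", "10.30.0.0/22", "10.4.0.0/16", "163.1.173.0/24"]


def address_counts(subnets):
    """Map each subnet to the number of addresses it spans."""
    return {s: ipaddress.ip_network(s).num_addresses for s in subnets}


counts = address_counts(SUBNETS)
total = sum(counts.values())
for subnet, n in counts.items():
    print(f"{subnet}: {n}")
print(f"total: {total}")  # 1024 + 1024 + 65536 + 256 = 67840
```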

Once the first server has finished synchronising, then it's ok to restart
the other server, and this should synchronise much quicker.
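
The sequencing described above (restart one peer, wait for it to finish synchronising, then restart the other) can be sketched in Python. This is only an outline under stated assumptions: `restart` and `get_state` are hypothetical callables that the caller would supply (for example wrapping ssh and a log parser), and the timeout values are arbitrary.

```python
import time


def wait_for_normal(get_state, timeout=600.0, poll=5.0, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll get_state() until it returns "normal" or the timeout expires."""
    deadline = clock() + timeout
    while clock() < deadline:
        if get_state() == "normal":
            return True
        sleep(poll)
    return False


def rolling_restart(peers, restart, get_state, **wait_opts):
    """Restart peers one at a time, waiting for each to reach normal.

    restart(peer) and get_state(peer) are hypothetical hooks supplied by
    the caller. Returns the list of peers restarted and resynchronised;
    raises TimeoutError instead of touching the next peer if one sticks.
    """
    done = []
    for peer in peers:
        restart(peer)
        if not wait_for_normal(lambda: get_state(peer), **wait_opts):
            raise TimeoutError(f"{peer} did not return to normal; stopping")
        done.append(peer)
    return done
```

The point of raising on timeout is that the second peer is never restarted while the first is still in a recovery state.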

regards,
-glenn

Re: General questions about failover, config changes and restarting

sthaug
> The configurations for the subnets and everything except the failover (and
> possibly the keys) should be exactly the same, so editing one and scp'ing the
> file to the other server is exactly the right thing to do.

Absolutely. For us it has worked well to separate the config into

dhcpd-server-specific.conf - Server related stuff (e.g. failover)
dhcpd-common.conf - Common config (e.g. pools)

where the server-specific stuff typically changes very rarely, while
subnets etc. are defined in the common configuration (same for both
servers in a failover pair) - which can then safely be scp'ed over as
necessary when it changes. And one of the files then includes the other.
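
Since the common file has to be identical on both servers, one simple safeguard after copying is to compare checksums. The following Python sketch compares two local files (for example the local `dhcpd-common.conf` and a copy fetched back from the peer). The function names are made up for illustration, and fetching the peer's copy is left to the reader.

```python
import hashlib


def config_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large config files are not loaded at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def configs_match(local_path, peer_copy_path):
    """True if both files have byte-identical content."""
    return config_digest(local_path) == config_digest(peer_copy_path)
```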

Steinar Haug, Nethelp consulting, [hidden email]

Re: General questions about failover, config changes and restarting

James Dore
In reply to this post by glenn.satchell
Hi Glenn,

Thanks for that - what am I looking for in the dhcpd.log that tells me synchronisation has finished on the first server?

I ask because we’ve had occasions in the past where I’ve restarted the first server, but left the second for a couple of hours, and we stop getting addresses issued to new clients. This is the kind of log message we get during this situation -

dhcpd.log-20151123:2015-11-19T11:45:33.497093+00:00 garibaldi dhcpd: DHCPDISCOVER from 58:7f:57:17:00:1f (Keiths-iPhone-2) via 163.1.173.254: not responding (recover wait)

and they don’t clear until both the peers have moved back to ‘normal’.

I could see if there’s more log detail I can turn on, I suppose.

Cheers,
James


Re: General questions about failover, config changes and restarting

Steven Carr
On 2 March 2016 at 16:52, James Dore <[hidden email]> wrote:
> Hi Glenn,
> I ask because we’ve had occasions in the past where I’ve restarted the first server, but left the second for a couple of hours, and we stop getting addresses issued to new clients. This is the kind of log message we get during this situation -

Sync is finished when both peers return to NORMAL mode. You need to
restart both servers (just kill dhcpd and restart it) one after
another or you're likely to run into issues with the pools not
matching, and then you'll run into issues with not leasing IPs.

Steve

Re: General questions about failover, config changes and restarting

James Dore

> On 2 Mar 2016, at 17:21, S Ca <[hidden email]> wrote:
>
> On 2 March 2016 at 16:52, James Dore <[hidden email]> wrote:
>> Hi Glenn,
>> I ask because we’ve had occasions in the past where I’ve restarted the first server, but left the second for a couple of hours, and we stop getting addresses issued to new clients. This is the kind of log message we get during this situation -
>
> Sync is finished when both peers return to NORMAL mode. You need to
> restart both servers (just kill dhcpd and restart it) one after
> another or you're likely to run into issues with the pools not
> matching, and then you'll run into issues with not leasing IPs.
>
> Steve

Yes, that has been my normal practice, but whichever order I restart the servers in, they never seem to sort themselves out first time round and require a handful of restarts apiece.

My usual method is

service dhcpd restart

since my SLES12 servers now use systemd for service management, although rcdhcpd restart achieves the same thing.

Any ideas why my servers need so many restarts? I try to leave them to settle for five or six minutes before trying again, but they just seem to stick with things like

dhcpd.log-20151123:2015-11-23T10:17:58.203346+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from normal to shutdown
dhcpd.log-20151123:2015-11-23T10:18:03.848105+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from shutdown to startup
dhcpd.log-20151123:2015-11-23T10:18:03.854208+00:00 garibaldi dhcpd: failover peer newc-dhcp: peer moves from normal to communications-interrupted
dhcpd.log-20151123:2015-11-23T10:18:03.927644+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from startup to shutdown
dhcpd.log-20151123:2015-11-23T10:18:03.940208+00:00 garibaldi dhcpd: failover peer newc-dhcp: peer moves from communications-interrupted to partner-down

in the logs.
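
To follow what the two peers are doing across a restart, the failover transition lines can be pulled out of the log mechanically. Here is a minimal Python sketch, assuming the message wording shown in the log excerpt above ("I move from X to Y" and "peer moves from X to Y"); the function names are made up for illustration.

```python
import re

# Matches lines such as:
#   "... failover peer newc-dhcp: I move from normal to shutdown"
#   "... failover peer newc-dhcp: peer moves from normal to partner-down"
TRANSITION = re.compile(
    r"failover peer (?P<name>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)


def transitions(lines):
    """Yield (who, old_state, new_state) for each failover transition line."""
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            who = "local" if m.group("who") == "I move" else "peer"
            yield who, m.group("old"), m.group("new")


def last_states(lines):
    """Return the most recent state logged for the local server and its peer."""
    state = {"local": None, "peer": None}
    for who, _old, new in transitions(lines):
        state[who] = new
    return state
```

Fed the five lines above, `last_states()` reports the local server in `shutdown` and the peer in `partner-down`, which is the stuck situation described.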

Cheers,
James


Re: General questions about failover, config changes and restarting

Steven Carr
On 3 March 2016 at 15:02, James Dore <[hidden email]> wrote:
> Any ideas why my servers need so many restarts? I try to leave them to settle for five or six minutes before trying again, but they just seem to stick with things like
>
> dhcpd.log-20151123:2015-11-23T10:17:58.203346+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from normal to shutdown
> dhcpd.log-20151123:2015-11-23T10:18:03.848105+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from shutdown to startup
> dhcpd.log-20151123:2015-11-23T10:18:03.854208+00:00 garibaldi dhcpd: failover peer newc-dhcp: peer moves from normal to communications-interrupted
> dhcpd.log-20151123:2015-11-23T10:18:03.927644+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from startup to shutdown
> dhcpd.log-20151123:2015-11-23T10:18:03.940208+00:00 garibaldi dhcpd: failover peer newc-dhcp: peer moves from communications-interrupted to partner-down

So what version of DHCPD are you running? A (stupid) feature was
introduced in a particular version (since reverted) that automatically
enters partner-down on shutdown. That is the root of your problems:
you should not be entering partner-down at all, as the partner is not
down, and the server then has to go through its recovery stage to get
back into normal mode. Also check the init script to make sure it's not
doing anything silly with omshell to initiate a shutdown, as that will
also trigger partner-down.

Others on the list might be able to comment on which version this change
was reverted in.

Steve

Re: General questions about failover, config changes and restarting

sthaug
In reply to this post by James Dore
> > Sync is finished when both peers return to NORMAL mode. You need to
> > restart both servers (just kill dhcpd and restart it) one after
> > another or you're likely to run into issues with the pools not
> > matching, and then you'll run into issues with not leasing IPs.
...
> Any ideas why my servers need so many restarts? I try to leave them to settle for five or six minutes before trying again, but they just seem to stick with things like

I am somewhat mystified about why you would need many minutes for a
restart. Here's what we see on our failover pair, with around 100k
leases and a couple of hundred pools:

(master restarts, log on slave):
Mar  3 08:50:00 slam dhcpd: peer dhcp1-dhcp2: disconnected
Mar  3 08:50:00 slam dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
...
(a few seconds pass, and then)
Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: peer moves from normal to normal
Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: I move from communications-interrupted to normal

So a restart for us takes around 11 seconds.
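
The 11-second figure can be reproduced from the first and last log lines above. The Python sketch below assumes the classic syslog layout ("Mon DD HH:MM:SS host ...") and that both lines fall on the same day; the function name is made up for illustration.

```python
from datetime import datetime


def seconds_between(first_line, last_line):
    """Seconds elapsed between two classic syslog lines on the same day.

    Assumes the 'Mon DD HH:MM:SS host ...' layout, so the third
    whitespace-separated field is the time of day.
    """
    fmt = "%H:%M:%S"
    start = datetime.strptime(first_line.split()[2], fmt)
    end = datetime.strptime(last_line.split()[2], fmt)
    return (end - start).total_seconds()


first = "Mar  3 08:50:00 slam dhcpd: peer dhcp1-dhcp2: disconnected"
last = ("Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: "
        "I move from communications-interrupted to normal")
print(seconds_between(first, last))  # 11.0
```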

It should be noted that
- The servers have plenty of memory, and hardware RAID with battery
backup for the disks.
- We use the "delayed ACK" facility.

Steinar Haug, Nethelp consulting, [hidden email]

Re: General questions about failover, config changes and restarting

James Dore

> On 3 Mar 2016, at 21:15, [hidden email] wrote:
>
>>> Sync is finished when both peers return to NORMAL mode. You need to
>>> restart both servers (just kill dhcpd and restart it) one after
>>> another or you're likely to run into issues with the pools not
>>> matching, and then you'll run into issues with not leasing IPs.
> ...
>> Any ideas why my servers need so many restarts? I try to leave them to settle for five or six minutes before trying again, but they just seem to stick with things like
>
> I am somewhat mystified about why you would need many minutes for a
> restart. Here's what we see on our failover pair, with around 100k
> leases and a couple of hundred pools:
>
> (master restarts, log on slave):
> Mar  3 08:50:00 slam dhcpd: peer dhcp1-dhcp2: disconnected
> Mar  3 08:50:00 slam dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
> ...
> (a few seconds pass, and then)
> Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: peer moves from normal to normal
> Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: I move from communications-interrupted to normal
>
> So a restart for us takes around 11 seconds.
>
> It should be noted that
> - The servers have plenty of memory, and hardware RAID with battery
> backup for the disks.
> - We use the "delayed ACK" facility.
>
> Steinar Haug, Nethelp consulting, [hidden email]


Sorry, I clearly didn’t explain myself well enough: it’s not the restart *itself* that takes minutes (that takes a couple of seconds). It’s the period *after* the restart, during which the peers sit in partner-down/recover/recover-wait state without synchronising, and *before* I do another restart that kicks them back into sync.

Hope that makes sense!

Cheers,
James


Re: General questions about failover, config changes and restarting

James Dore
In reply to this post by Steven Carr

> On 3 Mar 2016, at 16:45, S Carr <[hidden email]> wrote:
>
> On 3 March 2016 at 15:02, James Dore <[hidden email]> wrote:
>> Any ideas why my servers need so many restarts? I try to leave them to settle for five or six minutes before trying again, but they just seem to stick with things like
>>
>> dhcpd.log-20151123:2015-11-23T10:17:58.203346+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from normal to shutdown
>> dhcpd.log-20151123:2015-11-23T10:18:03.848105+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from shutdown to startup
>> dhcpd.log-20151123:2015-11-23T10:18:03.854208+00:00 garibaldi dhcpd: failover peer newc-dhcp: peer moves from normal to communications-interrupted
>> dhcpd.log-20151123:2015-11-23T10:18:03.927644+00:00 garibaldi dhcpd: failover peer newc-dhcp: I move from startup to shutdown
>> dhcpd.log-20151123:2015-11-23T10:18:03.940208+00:00 garibaldi dhcpd: failover peer newc-dhcp: peer moves from communications-interrupted to partner-down
>
> So what version of DHCPD are you running? A (stupid) feature was
> introduced in a particular version (since reverted) that automatically
> enters partner-down on shutdown. That is the root of your problems:
> you should not be entering partner-down at all, as the partner is not
> down, and the server then has to go through its recovery stage to get
> back into normal mode. Also check the init script to make sure it's not
> doing anything silly with omshell to initiate a shutdown, as that will
> also trigger partner-down.
>
> Others on the list might be able to comment on which version this change
> was reverted in.
>
> Steve

Hi Steve,

It’s version 4.2.6 as shipped with SLES12.

Cheers,
James


Re: General questions about failover, config changes and restarting

Steven Carr
On 4 March 2016 at 11:00, James Dore <[hidden email]> wrote:
> It’s version 4.2.6 as shipped with SLES12.

From a quick scan of the archives, I found this in another thread that
had a similar issue...

<snip>
From: Shawn Routhier <[hidden email]>
Subject: Re: Primary server stuck in "recovering" on restarts
Date: Wed, 30 Jul 2014 09:57:16 -0700

This is likely caused by our addition of a "gentle shutdown" feature in
4.2.6 and 4.3.0. In this we added a signal handler to collect some signals
and shut the server down cleanly. Unfortunately one side effect of this
change was to put the peer into partner-down.

We have backed out this change for 4.2.7 and 4.3.1 (both currently in beta;
if people are testing them we'd like to hear about your results).

In the meantime, if this is the problem you should be able to avoid it by
using a hard kill such as "kill -9" to stop the process.
</snip>

So it looks like you need to either upgrade or tweak the init script to use 'kill -9'.
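
As a rough illustration of the workaround, here is a Python sketch that checks whether a version is one of the two named in the quoted message and sends the equivalent of `kill -9` to the running daemon. The set of affected versions comes only from the quote above. The PID file path is an assumption and will differ between distributions, and the function names are made up for the example.

```python
import os
import signal

# Versions named in the quoted message as having the "gentle shutdown" change.
AFFECTED = {"4.2.6", "4.3.0"}


def has_gentle_shutdown_issue(version):
    """True if the version string is one of those named in the thread."""
    return version.strip() in AFFECTED


def hard_kill(pid_file="/var/run/dhcpd.pid"):
    """Send SIGKILL (the 'kill -9' workaround) to the PID in pid_file.

    The default path is an assumption; check where your package writes it.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGKILL)
    return pid
```

After a hard kill the daemon still has to be started again through the usual service command; only the stop step changes.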

Steve