Frustrated DHCP failover not working.. :(

Rob Morin

Hello all… I recently upgraded our 2 dhcp servers, running Ubuntu 14.04 on quad core servers with 8 gigs of ram.

 

Before I get into everything: we DID have a working failover pair before the upgrades were done, just on crappy, failing hardware, for 4 years.

 

What was done is the following….

 

I put both dhcp-1 (primary) and dhcp-2 (secondary) into stand-alone mode (no failover). I know this might not have been the correct way to do it, but at the time it seemed practical.

We then configured half of our client controllers to go to dhcp-1 and half to dhcp-2.

This worked fine.

Over the course of a couple of days we then gradually moved all the client controllers over to dhcp-2, so at that point all controllers were going to dhcp-2 only.

This was working fine.

 

I then swapped out dhcp-1 server for a more updated one, with the above mentioned specs.

Last night I attempted to bring them back into a failover setup; this did not go well.

 

What I did was the following:

With the dhcpd daemon on dhcp-1 stopped, but configured to do failover, I stopped dhcp-2.

During this period, leases were obviously not given out :)

I then proceeded to reconfigure dhcp-2 as a failover peer once again, using the same method that had previously worked: I added the secondary server conf include statement back into dhcpd.conf and made sure everything was as it had been before we changed anything.

I started up dhcp-2, with its dhcpd.leases file the same as it was before I started all this, and then in the syslog I saw thousands of lines like the one below:

 

DHCPDISCOVER from 8c:2d:aa:21:10:91 via 10.37.22.1: peer holds all free leases

 

Now the peer, dhcp-1 was not even up, so I am not sure how it was saying that.

 

I then proceeded to tell dhcp-2 that dhcp-1 was down via an omshell command, and dhcp-2 started giving leases out again.

 

I then went back on to dhcp-1, started it and it went into recover mode..

Feb 10 05:45:45 dhcp-1 dhcpd: Internet Systems Consortium DHCP Server 4.3.3-P1
Feb 10 05:45:45 dhcp-1 dhcpd: Copyright 2004-2016 Internet Systems Consortium.
Feb 10 05:45:45 dhcp-1 dhcpd: All rights reserved.
Feb 10 05:45:45 dhcp-1 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Feb 10 05:45:45 dhcp-1 dhcpd: Wrote 0 leases to leases file.
Feb 10 05:45:45 dhcp-1 dhcpd: Host HW hash:   No table.
Feb 10 05:45:45 dhcp-1 dhcpd: Host UID hash:  No table.
Feb 10 05:45:45 dhcp-1 dhcpd: Lease IP hash:  Contents/Size (%): 1664000/1800017 (92%). Min/max: 0/1
Feb 10 05:45:45 dhcp-1 dhcpd: Lease UID hash: Contents/Size (%): 0/1800017 (0%). Min/max: 0/0
Feb 10 05:45:45 dhcp-1 dhcpd: Lease HW hash:  Contents/Size (%): 0/1800017 (0%). Min/max: 0/0
Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move from recover to startup
Feb 10 05:45:45 dhcp-1 dhcpd: Server starting service.
Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: peer moves from unknown-state to partner-down
Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move from startup to recover
Feb 10 05:45:45 dhcp-1 dhcpd: Sent update request all message to tdl-dhcp-failover
Feb 10 05:46:54 dhcp-1 dhcpd: bind update on 10.54.147.229 from tdl-dhcp-failover rejected: 10.54.147.229: invalid state transition: active to expired
Feb 10 05:46:56 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: peer update completed.
Feb 10 05:46:56 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move from recover to recover-wait


And stayed that way....

So I put dhcp-2 back into stand-alone mode to keep the clients happy....


So what would be the proper procedure to get these two back into failover mode while dhcp-2 is still serving leases?


P.S. I just did a test on some dev servers and had not realized that recover-wait stays like that until the MCLT has elapsed. Is this correct? I set mclt to 30 seconds on the dev servers and, after almost 45 seconds, the two dev servers saw each other.
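
To put numbers on the P.S.: a minimal sketch, assuming (as the dev-server test suggests, and not checked against the dhcpd source) that a server leaves recover-wait roughly one MCLT after it entered that state. The function name and the year 2016 are illustrative assumptions; the timestamp is the recover-wait line from the dhcp-1 log above.

```python
from datetime import datetime, timedelta


def earliest_recover_wait_exit(entered: datetime, mclt_seconds: int) -> datetime:
    """Earliest time the server could leave recover-wait.

    Assumption: the wait lasts about one MCLT, counted from the moment
    the server entered recover-wait.
    """
    if mclt_seconds < 0:
        raise ValueError("mclt_seconds must not be negative")
    return entered + timedelta(seconds=mclt_seconds)


# dhcp-1 logged "I move from recover to recover-wait" at 05:46:56.
# The syslog line carries no year; 2016 is assumed from the thread date.
entered = datetime(2016, 2, 10, 5, 46, 56)

production_exit = earliest_recover_wait_exit(entered, 1800)  # mclt 1800, as in dhcpd_primary.conf
dev_exit = earliest_recover_wait_exit(entered, 30)           # mclt 30, as on the dev servers

print(production_exit.time())  # 06:16:56
print(dev_exit.time())         # 05:47:26
```

On those assumptions, with mclt 1800 the pair could not have left recover-wait before about 06:17, so watching the log for only a few minutes would not have been long enough.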

Please see below for my conf files


Thanks...


--------------
Primary
-------------


    dhcpd.conf


authoritative;
log-facility local7;
option domain-name "tmp";
option domain-name-servers 172.30.64.210, 172.30.64.220;
default-lease-time 1200;
max-lease-time 3600;


# Include EITHER the primary configuration
include "/usr/local/etc/dhcp/dhcpd_primary.conf";
# OR the secondary configuration
#include "/etc/dhcp/dhcpd_secondary.conf";

# No service for the local networks
subnet 172.30.0.0 netmask 255.255.255.0 { }
subnet 172.30.128.0 netmask 255.255.255.0 { }
subnet 172.30.129.0 netmask 255.255.255.0 { }

# Non-standard IP ranges (i.e. big stores)
include "/usr/local/etc/dhcp/dhcpd_special_pools.conf";
pid-file-name "/run/dhcp-server/dhcpd.pid";
ddns-update-style none;
omapi-port 7911;
omapi-key omapi_key;
key omapi_key {
     algorithm hmac-md5;
     secret xxxxxxxxxxxxxxxx==;
}


    dhcpd_primary.conf


## PRIMARY
failover peer "dhcp-failover" {
  primary; # declare this to be the primary server
  address 172.30.128.9;
  port 647;
  peer address 172.30.128.11;
  peer port 647;
  max-response-delay 30;
  max-unacked-updates 10;
  load balance max seconds 3;
  mclt 1800;
  split 128;
}

    dhcpd_pools.conf


subnet 10.32.0.0 netmask 255.255.255.0 {
  option routers 10.32.0.1;
  pool {
        failover peer "tdl-dhcp-failover";
        range 10.32.0.5 10.32.0.254;
  }
}

subnet 10.32.1.0 netmask 255.255.255.0 {
  option routers 10.32.1.1;
  pool {
        failover peer "tdl-dhcp-failover";
        range 10.32.1.5 10.32.1.254;
  }
}

............................
and another 6000 subnets like above in this whole dhcpd_pools.conf file



--------------

Secondary
--------------


 dhcpd.conf


authoritative;
log-facility local7;
option domain-name "tmp";
option domain-name-servers 172.30.64.210, 172.30.64.220;
default-lease-time 1200;
max-lease-time 3600;


# Include EITHER the primary configuration
#include "/usr/local/etc/dhcp/dhcpd_primary.conf";
# OR the secondary configuration
include "/etc/dhcp/dhcpd_secondary.conf";

# No service for the local networks
subnet 172.30.0.0 netmask 255.255.255.0 { }
subnet 172.30.128.0 netmask 255.255.255.0 { }
subnet 172.30.129.0 netmask 255.255.255.0 { }

# Non-standard IP ranges (i.e. big stores)
include "/usr/local/etc/dhcp/dhcpd_special_pools.conf";
pid-file-name "/run/dhcp-server/dhcpd.pid";
ddns-update-style none;
omapi-port 7911;
omapi-key omapi_key;
key omapi_key {
     algorithm hmac-md5;
     secret xxxxxxxxxxxxxxxx==;
}


    dhcpd_secondary.conf


## SECONDARY
failover peer "dhcp-failover" {
 secondary;
 address 172.30.128.11;
 port 647;
 peer address 172.30.128.9;
 peer port 647;
 max-response-delay 30;
 max-unacked-updates 10;
 load balance max seconds 3;
}

 

dhcpd_pools.conf file is same as for dhcp-1 server


Rob Morin

Montreal, Canada


_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: Frustrated DHCP failover not working.. :(

Simon Hobson
Rob Morin <[hidden email]> wrote:

> What was done is the following….
>  
> I put both dhcp-1 (primary) and dhcp-2 (secondary) into stand-alone mode (no failover). I know this might not have been the correct way to do it, but at the time it seemed practical.
> We then configured half of our client controllers to go to dhcp-1 and half to dhcp-2.
> This worked fine.
> Over the course of a couple of days we then gradually moved all the client controllers over to dhcp-2, so at that point all controllers were going to dhcp-2 only.
> This was working fine.

What I would have suggested was :
Just stop server 1 and put server 2 into partner-down mode (or remove its failover config). Clients with a lease from server 1 would try to renew directly for a while, then finally broadcast a request for the address in use. At this point, server 2 would answer and the client would "switch servers" without changing address.
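
As a rough timeline for that hand-over, here is a small sketch. It assumes clients use the RFC 2131 default timers (renew by unicast at 50% of the lease, rebind by broadcast at 87.5%) and that the server sends no explicit renewal or rebinding time options. The 1200-second figure is the default-lease-time from the posted dhcpd.conf; the function name is made up for illustration.

```python
def renewal_timeline(lease_seconds: int) -> dict:
    """Return when a client starts renewing (T1) and rebinding (T2).

    Assumes the RFC 2131 defaults: T1 = 0.5 * lease, T2 = 0.875 * lease.
    """
    if lease_seconds <= 0:
        raise ValueError("lease_seconds must be positive")
    return {
        "renew_unicast_at": lease_seconds * 0.5,       # unicast to the server that granted the lease
        "rebind_broadcast_at": lease_seconds * 0.875,  # broadcast, so any server may answer
        "lease_expires_at": float(lease_seconds),
    }


timeline = renewal_timeline(1200)  # default-lease-time 1200 from the posted config
for event, seconds in timeline.items():
    print(f"{event}: {seconds:.0f} s")
```

Under these assumptions a client holding a 20-minute lease from the stopped server would start broadcasting after about 17.5 minutes, which is the point where the remaining server can pick it up.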

> I then swapped out dhcp-1 server for a more updated one, with the above mentioned specs.

So server1 is now a newer version than server 2 ? I'm not really familiar with failover, but I suspect that there may well be some compatibility issues between different versions - especially if one of them is 4 years old.

I would be inclined to suggest migrating clients to the new server, then upgrade server2. The quick and easy way to do this is to :
Do not even start server 1 (or just nuke its leases file), stop server 2, copy the leases file from server 2 to server 1, start server 1 and make sure that server 2 can't be accidentally started.


If you don't want this big-bang change, then it can be done by (assuming you don't have the luxury of a huge address space) :
Start up server 1 with a small pool that does not overlap with the pool in use by server 2. You need to reduce the lease length offered by server 2.
Incrementally, decrease the size of the pool offered by server 2, and increase the size of the pool offered by server 1 - always allowing all leases in the freed up space on server2 to expire before adding the range to server 1. The more spare addresses you have, the faster you can do this. After a while, all clients are using server 1.

You can avoid client address churn by taking advantage of an undocumented behaviour in the code (warning: undocumented and liable to change without notice). The ISC server allocates "never used" addresses from the top down, i.e. higher addresses first. This is just an artifact of the hashing process.
So if you put the new range used by server1 higher (numerically) than server 2, any new clients looking for a lease will get addresses from this range. But if you remove addresses from the top of the range offered by server 2 and immediately add them to the bottom of the range offered by server1, clients will keep the same address as they switch servers. Because new clients will get addresses at the top of the range, there's a fairly good chance you'll avoid any conflicts.
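
The range bookkeeping for one such step can be sketched as follows. This is only an illustration: the starting split of 10.32.0.5 - 10.32.0.254 between the two servers is hypothetical, the helper name is made up, and the sketch assumes the two ranges are adjacent so the moved block stays contiguous. It does not check whether leases in the moved block are still active.

```python
from ipaddress import IPv4Address


def move_top_of_range(server2, server1, count):
    """Shift `count` addresses from the top of server 2's range to the
    bottom of server 1's range. Ranges are inclusive (first, last) tuples.
    """
    first2, last2 = server2
    first1, last1 = server1
    if last2 + 1 != first1:
        raise ValueError("server 1's range must start right above server 2's")
    size2 = int(last2) - int(first2) + 1
    if not 0 < count < size2:
        raise ValueError("count must leave at least one address on server 2")
    new_last2 = last2 - count
    return (first2, new_last2), (new_last2 + 1, last1)


# Hypothetical starting split of the posted range 10.32.0.5 - 10.32.0.254.
server2 = (IPv4Address("10.32.0.5"), IPv4Address("10.32.0.200"))
server1 = (IPv4Address("10.32.0.201"), IPv4Address("10.32.0.254"))

server2, server1 = move_top_of_range(server2, server1, 50)
print("server 2: range", server2[0], server2[1])
print("server 1: range", server1[0], server1[1])
```

Each call corresponds to one edit of the range statements on both servers; repeating it until server 2 has only a small range left mirrors the incremental migration described above.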


Finally, you can upgrade server2. When you've done that, add the failover config and let it sync the client data from server 1.


Re: Frustrated DHCP failover not working.. :(

Rob Morin
Hey Simon, sorry if my post was a bit confusing...

The dhcp-2 server is new and has all the new hardware, and is running version
4.3.3-P1, as is dhcp-1.

Currently dhcp-1 daemon is not running, dhcp-2 is running and giving out
leases just fine, but in "stand alone" mode, meaning failover is not
configured.

What I need to do is reconfigure dhcp-2 to be the secondary in a failover
pair, which would just mean uncommenting the line in the dhcpd.conf file
that tells it that it is the secondary, and then add dhcp-1 to the mix.

But when i tried this last night, after restarting dhcp-2, i saw many
lines in the log saying...
Feb 10 05:46:44 dhcp-2 dhcpd: DHCPDISCOVER from b8:44:d9:b8:1b:f4 via 10.51.168.1: peer holds all free leases
Feb 10 05:46:44 dhcp-2 dhcpd: DHCPDISCOVER from cc:78:5f:6d:8a:73 via 10.33.169.1: peer holds all free leases
Feb 10 05:46:44 dhcp-2 dhcpd: DHCPDISCOVER from c0:ce:cd:14:3f:d8 via 10.35.167.1: peer holds all free leases
Feb 10 05:46:44 dhcp-2 dhcpd: DHCPDISCOVER from 90:8d:6c:96:a4:57 via 10.48.5.1: peer holds all free leases
Feb 10 05:46:44 dhcp-2 dhcpd: DHCPDISCOVER from 10:1c:0c:40:5f:3a via 10.41.158.1: peer holds all free leases
Feb 10 05:46:44 dhcp-2 dhcpd: DHCPDISCOVER from 34:12:98:76:5c:66 via 10.42.148.1: peer holds all free leases
Feb 10 05:46:44 dhcp-2 dhcpd: DHCPDISCOVER from bc:4c:c4:ad:22:57 via 10.32.73.1: peer holds all free leases
Feb 10 05:46:44 dhcp-2 dhcpd: DHCPDISCOVER from 00:08:ca:70:76:81 via 10.51.22.1: peer holds all free leases

While this was going on dhcp-1 was not started yet...
So i started to panic a bit.... :)
I went over to dhcp-1 and started it and its log showed this...


Feb 10 05:46:56 dhcp-1 dhcpd: failover peer dhcp-failover: peer update completed.
Feb 10 05:46:56 dhcp-1 dhcpd: failover peer dhcp-failover: I move from recover to recover-wait

So I panicked once again, shut down dhcp-1, went over to dhcp-2 and
told it via omshell that its partner was down,
and then it started to give out leases again.

To be safe I then stopped dhcp-2, made it stand-alone again and restarted it,
and leases were being given out OK..

But now I was thinking, maybe I should have waited for the mclt time to
expire??

Argg!! Maybe all was well and I jumped the gun too early???

Rob Morin


Re: Frustrated DHCP failover not working.. :(

Niall O'Reilly
On 10 Feb 2016, at 16:36, Rob Morin wrote:

> Currently dhcp-1 daemon is not running, dhcp-2 is running and giving
> out leases just fine, but in "stand alone" mode, meaning failover is
> not configured.

   At least that's good!

> What i need to do is re-config dhcp-2 to be the secondary in a
> failover mode,

   How important is it that dhcp-2 be the secondary, and not the primary?
   [No need to answer me: it's just something to think about]

> which would be just uncommenting the line in the dhcpd.conf file that
> tells it that it is the secondary, and then add dhcp-1 to the mix.

   It might be simpler to introduce dhcp-1 as secondary.

   If it's not acceptable to keep the servers in this "reversed" configuration,
   I'd suggest the sequence outlined below.  It's tedious, but avoids the
   solo/secondary transition which (a) scares me and (b) seems to be giving
   you trouble.

   I've done this kind of juggling, but long enough ago that I don't remember
   enough detail to give you a tested worked example.  Besides, I don't work
   there any more and so no longer have access to the boxes.

   Initial condition: dhcp-2 in "solo" (less typing than "stand alone"!) mode
   Ensure dhcp-1 is running the same version of the server code as dhcp-2
   Prepare dhcp-2 to be primary and dhcp-1 to be secondary
   Enable failover and wait until stable operating conditions are established
   Note: this failover phase is simply a preparation for migrating to "solo" mode on dhcp-1
   Disable dhcp-2 and set dhcp-1 in "partner-down" mode
   Prepare dhcp-1 for "solo" operation and then restart it

   You should at this stage have the same situation as now, but with
   dhcp-1 in "solo" mode.

   Finally, prepare the target failover configuration (with dhcp-1 as primary)
   on each node and activate when ready.

   I'm not entirely sure that it will, but I hope this helps.

   Best regards,
   Niall O'Reilly

Re: Frustrated DHCP failover not working.. :(

Rob Morin
Hey Niall, thanks for your prompt reply... :)

Your suggestion is interesting, however currently our controllers only
point to dhcp-2 server and not to both, so making dhcp-1 primary before
we modify the controllers to go to both machines, would not work.

Pretty much what i was trying to do is to keep dhcp-2 server secondary,
and sync it up with dhcp-1, even though dhcp-1 is not giving out any
leases, because no traffic goes to it currently, as all traffic goes to
dhcp-2 only.

Downtime must be kept to a minimum, as we normally have tens of thousands of
requests at any given time to these dhcp servers. That's why I wanted to keep
dhcp-2 secondary and dhcp-1 primary, as no requests go to dhcp-1 now. Once
dhcp-1 looks like it's communicating with dhcp-2 OK, we would direct
some controllers to send requests to both servers rather than just the
one, dhcp-2.

I am still thinking that maybe what I did last night actually worked,
and maybe I just did not wait long enough for dhcp-1 to come out of
recover-wait, because the mclt time was 30 mins.

I was also thinking of simply shutting down dhcpd on both servers, deleting
the dhcpd.leases files on both servers, then starting up dhcp-2, then
dhcp-1, and hopefully they would be in sync faster? Or maybe even
decreasing the mclt time from 30 mins to 5 mins? Not sure what that would
do. Our lease times are 20 minutes, and at 3 AM EST not many people
should care if they lose their IP for 20 mins, I hope...

:)

Comments/suggestions?

Rob Morin
Montreal, Canada


Re: Frustrated DHCP failover not working.. :(

Attila Szalay
In the master node, you have this configuration option: split 128;

This means that half of the IP addresses are linked to the master, and the secondary only owns the other half.

Of course, from time to time they rebalance the IP pools, but (if I remember correctly) that only happens after a timeout. So if the clients only reach one of the hosts, it is possible that that host (temporarily) runs out of addresses.

This is because in normal operation (and with split 128, which means 50/50) they run more like a load-balancing pair than master-slave. (They also split the MAC address space in half: the slave answers one half of the MACs and the master the other.)


Re: Frustrated DHCP failover not working.. :(

Shawn Routhier

> On Feb 10, 2016, at 11:37 AM, Attila Szalay <[hidden email]> wrote:
>
> In the master node, you have this configuration option: split 128;
>
> This means that half of the IP addresses are linked to the master, and the secondary only owns the other half.

This isn’t quite correct.  The split option refers to how the two servers
handle incoming requests not how they allocate addresses between
them.  128 is splitting the hash buckets used to divide the clients
50 / 50.

>
> Of course, from time to time they rebalance the IP pools, but (if I remember correctly) that only happens after a timeout. So if the clients only reach one of the hosts, it is possible that that host (temporarily) runs out of addresses.

The load balancing algorithm is used until the seconds field in the
request packet exceeds the value specified by “load balance max seconds”
at which point either of the two servers will respond.  As the seconds field
is the amount of time since the client started to request it will normally be
0 to start with and increment if there are time outs.
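
The decision described here can be sketched roughly as below. This is an illustration only: the `bucket` function is a stand-in (real servers use the load-balancing hash from RFC 3074, which is not reproduced here), the sketch hashes only the MAC address, and the function names are made up. The defaults of 128 and 3 come from the posted `split` and `load balance max seconds` settings.

```python
import hashlib


def bucket(mac: str) -> int:
    """Map a MAC address to a bucket 0..255.

    Stand-in only: this is NOT the RFC 3074 hash that real servers use.
    """
    raw = bytes.fromhex(mac.replace(":", ""))
    return hashlib.sha256(raw).digest()[0]


def answers(role: str, mac: str, secs: int, split: int = 128, max_seconds: int = 3) -> bool:
    """Decide whether a server in normal failover operation answers a request."""
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    if secs > max_seconds:
        return True  # client has been trying too long: either server may answer
    in_primary_share = bucket(mac) < split  # the first `split` buckets belong to the primary
    return in_primary_share if role == "primary" else not in_primary_share


mac = "8c:2d:aa:21:10:91"  # client from the first log line in the thread
for secs in (0, 4):
    print(secs, answers("primary", mac, secs), answers("secondary", mac, secs))
```

With secs at 0 exactly one of the two servers answers a given client; once secs exceeds the limit, both do. A client whose bucket belongs to a server it cannot reach is therefore only served by the other one after its secs value has passed the limit.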

>
> This is because in normal operation (and with split 128, which means 50/50) they run more like a load-balancing pair than master-slave. (They also split the MAC address space in half: the slave answers one half of the MACs and the master the other.)

On the original question.  Setting your MCLT to 30 minutes while your default lease time is 20 minutes
seems a bit strange to me.

Shawn




Re: Frustrated DHCP failover not working.. :(

Gregory Sloop

SR> On the original question.  Setting your MCLT to 30 minutes while
SR> your default lease time is 20 minutes
SR> seems a bit strange to me.

I certainly don't want to derail the search for a solution for the original poster, but MCLT and its proper setting isn't much discussed, and when I brought it up some time back, Cathy Almond pointed at several FAQs that were not very useful. There was/is very little discussion about what a "reasonable" formula might be to determine an optimal setting. [Cathy pointed to FAQs that _strongly_ suggested leaving it at the default 30m, which may be where/why the OP left it at 30m. I believe this is quite wrong, but I'm certainly no guru on dhcpd - so I hesitate to make authoritative pronouncements about the subject. :) ]

In short, either here or in a new thread, I think a fuller discussion about MCLT and fail-over operation would be worthwhile. [Especially when a partner is down - either in "communications-interrupted" mode or in "partner-down" mode.] I'd even perhaps be willing to spend some time writing up a more informative FAQ on the issue - provided I'm able to get a good understanding of how it works. If I've missed some docs somewhere, I'd be more than glad to be pointed at them.

---
More to the point of this thread.
1) I don't recall my peer servers waiting the MCLT time to go from "waiting" to "normal" - ever. [I could certainly be wrong about this - but I don't think waiting the MCLT time would have made the servers go from waiting to normal. I'd guess there's some kind of communication problem between the two, e.g. an intermittent link, too little bandwidth, etc.]

2) In a fail-over situation, I'm not sure it's clear to Rob that both machines will be splitting the pool and both will be responding to lease requests.

i.e. fail-over isn't really a "fail-over" server. Which machine is primary and which is secondary is, IMO, immaterial. [Other than that one will have the "master" config file, while the second will have the "peer."] Personally, I think it ought to be called load-balancing with fail-over features. [Because in normal operation it's load balancing between the two servers, and when one is down, it has features to continue operating on a single server with minimal disruption (or, if you're somewhat lucky, none).]

My apologies if I'm wrong and #2 is clear to Rob.

-Greg


_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: Frustrated DHCP failover not working.. :(

Shawn Routhier

On Feb 10, 2016, at 12:35 PM, Gregory Sloop <[hidden email]> wrote:

SR> On the original question.  Setting your MCLT to 30 minutes while
SR> your default lease time is 20 minutes
SR> seems a bit strange to me.

> I certainly don't want to derail the search for a solution for the original poster, but MCLT and its proper setting aren't much discussed, and when I brought it up some time back, Cathy Almond pointed at several FAQs that were not very useful. There was/is very little discussion about what a "reasonable" formula might be for determining an optimal setting. [Cathy pointed to FAQs that _strongly_ suggested leaving it at the default 30m, which may be where/why the OP left it at 30m. I believe this is quite wrong, but I'm certainly no guru on dhcpd - so I hesitate to make authoritative pronouncements about the subject. :) ]
>
> In short, either here or in a new thread, I think a fuller discussion about MCLT and fail-over operation would be worthwhile. [Especially when a partner is down - either in "communications-interrupted" mode or in "partner-down" mode.] I'd even perhaps be willing to spend some time writing up a more informative FAQ on the issue - provided I'm able to get a good understanding of how it works. If I've missed some docs somewhere, I'd be more than glad to be pointed at them.

It’s difficult to cover all the different possibilities, as people use the servers in different ways.
I don’t think we have a great description of it.
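
As a rough illustration of the two settings being compared (a sketch only, not the OP's actual configuration; the peer name is made up and the address/port statements are left out), MCLT lives in the primary's failover peer declaration, while the lease time is an ordinary global or per-subnet parameter. Both are given in seconds:

```
# Sketch only: values mirror the 20 minute / 30 minute figures quoted above.
default-lease-time 1200;          # 20 minutes, global or per-subnet scope

failover peer "dhcp-failover" {   # hypothetical peer name
  primary;                        # mclt is set on the primary only
  # address, port, peer address, peer port, etc. omitted here
  mclt 1800;                      # 30 minutes
}
```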


> ---
> More to the point of this thread.
> 1) I don't recall my peer servers waiting the MCLT time to go from "waiting" to "normal" - ever. [I could certainly be wrong about this - but I don't think waiting the MCLT time would have made the servers go from waiting to normal. I'd guess there's some kind of communication problem between the two, e.g. an intermittent link, too little bandwidth, etc.]

There are some wait states that last for MCLT.


> 2) In a fail-over situation, I'm not sure it's clear to Rob that both machines will be splitting the pool and both will be responding to lease requests.
>
> i.e. Fail-over isn't really a "fail-over" server. Which machine is primary and which is secondary is, IMO, immaterial. [Other than one will have the "master" config file, while the second will have the "peer."]

There is at least one major difference between primary and secondary: the transition
from active to free or backup.  In normal operation only the primary can transition the lease from active
to free (available on the primary) or backup (available on the secondary).  The rules change for partner-down,
and we added an optimization for use in communications-interrupted.

> Personally, I think it ought to be called load-balancing with fail-over features. [Because in normal operation, it's load balancing between the two servers, and when one is down, it has features to continue operating on a single server with minimal disruption (or if you're somewhat lucky, none).]

One can approximate failover by setting the split level to 0 or 256 in which case one or the other server
will serve everything and the other will serve nothing until the load balance max seconds value is
exceeded.
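
A minimal sketch of what that might look like in the primary's failover declaration. The peer name and addresses are placeholders, and the values are illustrative rather than recommended; split and mclt are set on the primary only:

```
failover peer "dhcp-failover" {      # hypothetical peer name
  primary;
  address 192.0.2.1;                 # placeholder address of this server
  port 647;
  peer address 192.0.2.2;            # placeholder address of the partner
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 10;
  mclt 1800;
  split 256;                         # 256: primary answers every client; 0: secondary does
  load balance max seconds 3;        # once a client's "secs" field exceeds this, either server may answer
}
```

With a split of 128 instead, the two servers would each answer roughly half of the clients in normal operation.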


> My apologies if I'm wrong and #2 is clear to Rob.
>
> -Greg

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: Frustrated DHCP failover not working.. :(

Niall O'Reilly
In reply to this post by Rob Morin
On 10 Feb 2016, at 17:48, Rob Morin wrote:

> Your suggestion is interesting, however currently our controllers only
> point to dhcp-2 server and not to both, so making dhcp-1 primary
> before we modify the controllers to go to both machines, would not
> work.

   I'm sorry; I missed that detail. 8-(

   I think you should start by pointing the controllers at both, as
   making requests of a server which isn't set up to respond shouldn't
   cause any harm.

   /Niall
_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users