DHCP Failover - initial Configuration

Philippe Maechler

Hello dhcp-users


Yesterday I wrote the same mail to the list, but it never appeared in the archives or in my inbox. Therefore I am resending the mail from another address. Sorry if the message now shows up multiple times.


 

One of our dhcp 4.3 servers for DOCSIS died last week and we’re now in the process of setting up two new servers with version 4.4 😊.

I’d like to introduce failover for some important pools, but I still have some open questions.

 

The configuration is split into a few pieces.

 

The static configuration part is rsync’ed once a day to the secondary node.

Both servers have a .local config, which contains next-server, time-servers, server-name, the subnet declaration for the management net and our omapi configuration.

The dynamic part comes from another system and is uploaded by ftp to both servers.

 

A script runs every 5 minutes and if we have new files from ftp, we build the configuration and restart the server. The secondary server does the same, but the daemon is not enabled, so the server won’t serve any requests.

 

If we have planned maintenance, we stop both servers, rsync the lease.db and start the secondary node.

In addition, the lease.db from the primary node is fetched every 10 minutes by another system, so that when the primary server dies we have a ~10-minute-old lease db (depending on the last reload) that we can put on the second node.

 

 

If we want to go for failover, what are the right steps to start?

  1. Configure failover on the primary node (in the local config)
  2. Choose which pools we want to do failover for and configure them
  3. Restart the primary node and put it into partner-down state
  4. Configure the second server (failover and pools)
  5. Start the second server
  6. Put the primary server into partner-up (?) mode

 

Does this sound right?

 

 

 

/30 networks

We have about 240 pools; about 50 of them contain only a single IP address. Does failover make sense here?

We can’t use host definitions because we only know the option-82. Some customers have more than one device connected, but we can only serve the single IP address to one of them. If we use failover, can it happen that server 1 hands out the IP address to device 1 and server 2 hands out the same IP address to device 2?

 

Heavily used pools

The bigger part is our /24 pools. These are all in a shared network config. I guess the failover part works pretty fine for the individual pools here. The shared network is sometimes at 95% usage. Can this lead to problems?

 

 

Server restarts

Currently we restart the service every 5 minutes if something changed. When we go for failover, we should reload server one and, once it has synced with its partner, reload server two. How does server two know that server one is up to date and everything is synced?

 

The ISC Knowledge Base contains an article about failover setup (https://kb.isc.org/article/AA-00502/0/A-Basic-Guide-to-Configuring-DHCP-Failover.html), and part 7 is about configuring OMAPI access. Who uses that? Is this for the communication between primary and secondary, or only for putting one server into partner-down mode?

We actually do use omapi to check and expire active leases.

 

 

I’m sure that as soon as I hit send, more questions will come to my mind 😊 e.g. good mclt and split values for lease times of 1h

 

 

TIA for all your inputs and recommendations

 

Philippe


_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: DHCP Failover - initial Configuration

Simon Hobson
Philippe Maechler <[hidden email]> wrote:

> If we want to go for failover, what are the right steps to start?
> • Configure failover on the primary node (in the local config)
> • Choose which pools we want to do failover for and configure them
> • Restart the primary node and put it into partner-down state
> • Configure the second server (failover and pools)
> • Start the second server
> • Put the primary server into partner-up (?) mode
>  
> Does this sound right?

Almost, the last step is automagic - when the second server comes up, it will communicate with the first, sync the leases, then after (AIUI) MCLT they will both go into normal operation.
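
For illustration, steps 1, 2 and 4 of the quoted list could be scripted along these lines, since the configuration is already built by a script. This is only a sketch, not a tested setup: the peer name "dhcp-failover", the addresses, port 647 and the timing values are placeholder assumptions, and render_failover_peer and render_pool are hypothetical helpers. It relies on the dhcpd.conf(5) rule that mclt and split are set on the primary only.

```python
def render_failover_peer(role, address, peer_address,
                         name="dhcp-failover", port=647,
                         mclt=3600, split=128):
    """Return a dhcpd.conf 'failover peer' declaration as text.

    role is "primary" or "secondary". All default values are
    placeholders for illustration, not recommendations.
    """
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    lines = [
        f'failover peer "{name}" {{',
        f"  {role};",
        f"  address {address};",
        f"  port {port};",
        f"  peer address {peer_address};",
        f"  peer port {port};",
        "  max-response-delay 60;",
        "  max-unacked-updates 10;",
        "  load balance max seconds 3;",
    ]
    if role == "primary":
        # mclt and split belong in the primary's declaration only.
        lines.append(f"  mclt {mclt};")
        lines.append(f"  split {split};")
    lines.append("}")
    return "\n".join(lines)


def render_pool(range_start, range_end, name="dhcp-failover"):
    """Return a pool declaration that refers to the failover peer."""
    return "\n".join([
        "pool {",
        f'  failover peer "{name}";',
        f"  range {range_start} {range_end};",
        "}",
    ])


if __name__ == "__main__":
    print(render_failover_peer("primary", "192.0.2.1", "192.0.2.2"))
    print(render_failover_peer("secondary", "192.0.2.2", "192.0.2.1"))
    print(render_pool("198.51.100.10", "198.51.100.200"))
```

Generating both declarations from one function keeps the peer name and ports identical on both servers, which is easy to get wrong when the two files are edited by hand.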

> /30 networks
> We have about 240 pools; about 50 of them contain only a single IP address. Does failover make sense here?
> We can’t use host definitions because we only know the option-82. Some customers have more than one device connected, but we can only serve the single IP address to one of them. If we use failover, can it happen that server 1 hands out the IP address to device 1 and server 2 hands out the same IP address to device 2?

Failover won't work with such a pool - there are no free leases to balance between the servers. You could configure the same pool on both servers without failover - but then, as you suggest, the same address could be leased to two devices.

> Heavily used pools
> The bigger part is our /24 pools. These are all in a shared network config. I guess the failover part works pretty fine for the individual pools here. The shared network is sometimes at 95% usage. Can this lead to problems?

As long as there are free leases in a pool then it will work.

> Server restarts
> Currently we restart the service every 5 minutes if something changed. When we go for failover, we should reload server one and, once it has synced with its partner, reload server two. How does server two know that server one is up to date and everything is synced?

After a restart it will take time for the servers to resync. You'll need to adapt your management system to hold off on restarts. Hopefully someone more familiar with failover will be along soon with more details, but from things said on here, there are some cases where the servers can take a while before they get back to fully normal operation.


Re: DHCP Failover - initial Configuration

Philippe Maechler
Hello Simon, hello list

On Wed, 8 Aug 2018 at 19:24, Simon Hobson <[hidden email]> wrote:

>> Server restarts
>> Currently we restart the service every 5 minutes if something changed. When we go for failover, we should reload server one and, once it has synced with its partner, reload server two. How does server two know that server one is up to date and everything is synced?

> After a restart it will take time for the servers to resync. You'll need to adapt your management system to hold off on restarts. Hopefully someone more familiar with failover will be along soon with more details, but from things said on here, there are some cases where the servers can take a while before they get back to fully normal operation.
 
Yes, I'm already testing a way of checking the server state before a reload. The current idea is that our reload script first checks the failover-state of the other server via omapi. If that server is ready and in sync, we do the reload; otherwise we wait another few minutes. Since we already rely on omapi for other things, this shouldn't be much magic :)
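
The check described above could be structured roughly as follows. This is a sketch under assumptions: fetch_states is a hypothetical callable that returns the local and partner failover state names (for example by querying the failover-state object over OMAPI), and treating only normal/normal as safe is one possible policy, not something confirmed on the list.

```python
import time


def may_reload(local_state, partner_state):
    """Allow a reload only when both peers report normal operation."""
    return local_state == "normal" and partner_state == "normal"


def wait_until_reload_allowed(fetch_states, attempts=5, delay=60.0,
                              sleep=time.sleep):
    """Poll the failover state and report whether a reload may proceed.

    fetch_states is a hypothetical callable returning a tuple
    (local_state, partner_state) of state names.
    Returns True as soon as both are "normal", or False if that never
    happens within the given number of attempts.
    """
    for attempt in range(attempts):
        local_state, partner_state = fetch_states()
        if may_reload(local_state, partner_state):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the peers time to finish syncing
    return False
```

Passing the fetch and sleep functions in as parameters keeps the decision logic separate from the OMAPI access, so the policy can be tried out without a running server.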


Something else you mentioned: MCLT. One of our access systems does something like dhcp-snooping/dhcp-aging. When a client successfully logs on with a DORA sequence, the client's MAC address is allowed to communicate for a given time. Unfortunately this time is hardcoded in the access system and not learnt from the DORA sequence. If we have a lease time of 7200s but an MCLT of 3600, clients would first get a lease time of 1h and, on a Request/Acknowledge, a lease time of 2h. Would it work if we set MCLT == lease-time? What are the benefits and drawbacks of such a configuration?
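
The 1h-then-2h behaviour follows from the failover rule that a server may not extend a lease more than MCLT beyond the expiry time its partner has acknowledged. A simplified model of that rule is sketched below; the function and parameter names are made up for illustration, the acknowledged expiry of 9000 in the example is an assumed value, and the real server does more bookkeeping than this.

```python
def granted_lease_time(desired, mclt, now, partner_acked_expiry):
    """Lease time (seconds) a failover server may hand out right now.

    Simplified model: the new expiry may not lie more than MCLT beyond
    the expiry the partner has acknowledged. If the partner has not
    acknowledged anything still in the future, only MCLT is available.
    """
    acked_remaining = max(partner_acked_expiry - now, 0)
    return min(desired, acked_remaining + mclt)


if __name__ == "__main__":
    # Lease time 7200s, MCLT 3600s: first lease, then a renewal at t=1800
    # after the partner acknowledged an (assumed) expiry of t=9000.
    first = granted_lease_time(7200, 3600, now=0, partner_acked_expiry=0)
    renewal = granted_lease_time(7200, 3600, now=1800, partner_acked_expiry=9000)
    print(first, renewal)  # 3600 7200

    # With MCLT equal to the lease time, both values are the same.
    print(granted_lease_time(3600, 3600, now=0, partner_acked_expiry=0))  # 3600
```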




Re: DHCP Failover - initial Configuration

perl-list
That could be really bad.  MCLT only affects the behavior of the "secondary" failover peer when the "primary" isn't present and how long a "recover-wait" period lasts, as far as I know.  If you set your lease expiry time to 7200 and the length of your snooping/aging setup the same, that could work in most cases.  However, clients are in control of what lease time they want to use.  They can and do, at times, request different lease times than are offered by the DHCP server which could throw your snooping/aging system off.  All that said, I'd think you should set MCLT to the same as your lease length in this situation for maximum compatibility.

----- Original Message -----
> From: "Philippe Maechler" <[hidden email]>
> To: "Users of ISC DHCP" <[hidden email]>
> Sent: Thursday, August 9, 2018 2:15:25 AM
> Subject: Re: DHCP Failover - initial Configuration

> Something else you mentioned: MCLT. One of our access systems does something
> like dhcp-snooping/dhcp-aging. When a client successfully logs on with a DORA
> sequence, the client's MAC address is allowed to communicate for a given time.
> Unfortunately this time is hardcoded in the access system and not learnt from
> the DORA sequence. If we have a lease time of 7200s but an MCLT of 3600,
> clients would first get a lease time of 1h and, on a Request/Acknowledge, a
> lease time of 2h. Would it work if we set MCLT == lease-time? What are the
> benefits and drawbacks of such a configuration?


Re: DHCP Failover - initial Configuration

Philippe Maechler
Hi perl-list

Yes, I already ran into the "issue" that the client can request a lease for a certain time. That's why we have

min-lease-time 3600;
default-lease-time 3600;
max-lease-time 3600;

set in our dhcpd.conf

It's good to know that we can set mclt and the lease-time to the same value ;)


On Thu, 9 Aug 2018 at 15:14, perl-list <[hidden email]> wrote:
> That could be really bad.  MCLT only affects the behavior of the "secondary" failover peer when the "primary" isn't present and how long a "recover-wait" period lasts, as far as I know.  If you set your lease expiry time to 7200 and the length of your snooping/aging setup the same, that could work in most cases.  However, clients are in control of what lease time they want to use.  They can and do, at times, request different lease times than are offered by the DHCP server which could throw your snooping/aging system off.  All that said, I'd think you should set MCLT to the same as your lease length in this situation for maximum compatibility.


Re: DHCP Failover - initial Configuration

Simon Hobson
In reply to this post by Philippe Maechler
Philippe Maechler <[hidden email]> wrote:

> One of our access systems does something like dhcp-snooping/dhcp-aging. When a client successfully logs on with a DORA sequence, the client's MAC address is allowed to communicate for a given time. Unfortunately this time is hardcoded in the access system and not learnt from the DORA sequence.

That is bad and means the access system is BROKEN. No ifs or buts, it is **BROKEN**.
But I imagine you've already suggested this to the vendor and they've "declined to fix it".


Re: DHCP Failover - initial Configuration

perl-list
In reply to this post by Philippe Maechler
Even that doesn't guarantee anything.  Clients can and do still suggest and receive a different lease length.  It's really not often, but I have seen it.  For example, some versions of Windows will initially request an hour lease followed by a full lease length (i.e. what you have configured in your directives) at renewal time.  This would be when you have lease lengths longer than an hour configured.

----- Original Message -----
> From: "Philippe Maechler" <[hidden email]>
> To: "Users of ISC DHCP" <[hidden email]>
> Sent: Thursday, August 9, 2018 10:36:17 AM
> Subject: Re: DHCP Failover - initial Configuration

> Hi perl-list
> Yes, I already ran into the "issue" that the client can request a lease for a
> certain time. That's why we have

> min-lease-time 3600;
> default-lease-time 3600;
> max-lease-time 3600;

> set in our dhcpd.conf

> It's good to know that we can set mclt and the lease-time to the same value ;)


Re: DHCP Failover - initial Configuration

Simon Hobson
perl-list <[hidden email]> wrote:

>> Yes, I already ran into the "issue" that the client can request a lease for a
>> certain time. That's why we have
>
>> min-lease-time 3600;
>> default-lease-time 3600;
>> max-lease-time 3600;
>
>> set in our dhcpd.conf

> Even that doesn't guarantee anything.  Clients can and do still suggest and receive a different lease length.  It's really not often, but I have seen it.  For example, some versions of Windows will initially request an hour lease followed by a full lease length (i.e. what you have configured in your directives) at renewal time.  This would be when you have lease lengths longer than an hour configured.

The server should never offer a lease shorter than min-lease-time, or longer than max-lease-time - except when failover is/has happened and MCLT is in effect. A client could choose to not use the full lease length, but it cannot (unless broken) use the address for longer than the lease time offered.
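
The bounds described here can be written down as a small rule of thumb. This is a sketch of the clamping behaviour only, ignoring failover and MCLT; offered_lease_time is an illustrative name, not a dhcpd function.

```python
def offered_lease_time(requested, min_lease, default_lease, max_lease):
    """Lease time offered, given the client's request (or None)."""
    if requested is None:
        # Client expressed no preference: start from the default.
        requested = default_lease
    # Never go below min-lease-time or above max-lease-time.
    return max(min_lease, min(requested, max_lease))


if __name__ == "__main__":
    # With min, default and max all set to 3600, every request ends up at 3600.
    for wish in (600, 86400, None):
        print(wish, offered_lease_time(wish, 3600, 3600, 3600))
```

With all three directives set to the same value, the client's requested lease time has no influence on the result in this model.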


Re: DHCP Failover - initial Configuration

Philippe Maechler
In reply to this post by Simon Hobson


On Thu, 9 Aug 2018 at 17:03, Simon Hobson <[hidden email]> wrote:
> Philippe Maechler <[hidden email]> wrote:

>> One of our access systems does something like dhcp-snooping/dhcp-aging. When a client successfully logs on with a DORA sequence, the client's MAC address is allowed to communicate for a given time. Unfortunately this time is hardcoded in the access system and not learnt from the DORA sequence.

> That is bad and means the access system is BROKEN. No ifs or buts, it is **BROKEN**.
> But I imagine you've already suggested this to the vendor and they've "declined to fix it".

Exactly. I was told that the lease-time is internally handled as an integer in hours and that a fix would therefore require significant changes in the system. I could never really understand that, because when the lease-time is in hours, how does the system detect whether the lease ends now or in 10 minutes...
But yes, a fix was declined, or rather "not possible in the near future"... that was two years ago :(



Re: DHCP Failover - initial Configuration

Philippe Maechler
In reply to this post by Simon Hobson

On Thu, 9 Aug 2018 at 18:32, Simon Hobson <[hidden email]> wrote:
> perl-list <[hidden email]> wrote:

>> Even that doesn't guarantee anything.  Clients can and do still suggest and receive a different lease length.  It's really not often, but I have seen it.  For example, some versions of Windows will initially request an hour lease followed by a full lease length (i.e. what you have configured in your directives) at renewal time.  This would be when you have lease lengths longer than an hour configured.

Yes, clients can and do ask for shorter or longer lease times, but the server declines that and hands out only leases with a length of 1h. The only exception is for leases that fall under the dhcp-cache-threshold option. They get a "shorter" lease time offered.

> The server should never offer a lease shorter than min-lease-time, or longer than max-lease-time - except when failover is/has happened and MCLT is in effect. A client could choose to not use the full lease length, but it cannot (unless broken) use the address for longer than the lease time offered.

We once had a broken client that refused the lease offered. The client did the DORA sequence, declined the lease and started again with a DORA sequence, and so on. Apart from that we never faced an issue with default-, min- and max-lease-time set to 3600s.



Re: DHCP Failover - initial Configuration

Simon Hobson
In reply to this post by Philippe Maechler
Philippe Maechler <[hidden email]> wrote:

> Exactly. I was told that the lease-time is internally handled as an integer in hours and that a fix would therefore require significant changes in the system. I could never really understand that, because when the lease-time is in hours, how does the system detect whether the lease ends now or in 10 minutes...
> But yes, a fix was declined, or rather "not possible in the near future"... that was two years ago :(

That comes from the WTF! department


> We once had a broken client that refused the lease offered. The client did the DORA sequence, declined the lease and started again with a DORA sequence, and so on.

Yes, there are some "interesting" clients. Many years ago with a different hat on, I had a printer RIP that refused to accept a lease. Long after I'd manually configured it I found out that it would only accept a lease for at least 2 years! Some choice language was uttered at that as well.
