Failure of dhcp server failover


Re: Failure of dhcp server failover

Simon Hobson
Eugene Grosbein <[hidden email]> wrote:

> UniFi does not maintain Controller API compatibility between major releases
> of the Controller and I cannot just upgrade to 4.x series as I have lots of custom code
> utilizing the API.

That's good enough reason for me - and yes, it's one of those "niggles" with Ubiquiti stuff.

> Can you provide any references to standards
> or DHCP server documentation for restrictions on GI-Addr?

Try RFC2131 https://www.ietf.org/rfc/rfc2131.txt - section 4.3.1 for example :
> the address is selected based on ... or on the address of the relay agent that forwarded the message ('giaddr' when not 0)
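
A minimal sketch of the selection rule quoted above, assuming a server that simply matches 'giaddr' (or, when it is 0, the address of the receiving interface) against its configured subnets. The subnet list and the function name are hypothetical and are not taken from any real DHCP server:

```python
import ipaddress

# Hypothetical subnets this server is configured to serve.
SUBNETS = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("10.0.32.0/19"),
]

def select_subnet(giaddr, receiving_interface_addr):
    """Pick the subnet to allocate from, in the spirit of RFC 2131 section 4.3.1."""
    relay = ipaddress.ip_address(giaddr)
    # A giaddr of 0.0.0.0 means the request was not relayed, so fall back
    # to the address of the interface the request arrived on.
    if int(relay) != 0:
        key = relay
    else:
        key = ipaddress.ip_address(receiving_interface_addr)
    for net in SUBNETS:
        if key in net:
            return net
    return None  # no configured subnet matches

print(select_subnet("10.0.33.1", "10.0.0.1"))  # relayed request
print(select_subnet("0.0.0.0", "10.0.0.1"))    # request received directly
```

A real server can apply further policy on top of this, which is why the RFC words the rule loosely.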



>> For DHCP to work properly, you **MUST** have the GI-Addr
>> within the subnet served by the interface on the relay agent -
>> using an un-numbered interface is pretty well guaranteed not to work properly.
>
> "Subnet of the interface" is a common notion, but an IP network can work without
> such a notion at all. We use a large flat IP pool (like a /19) and multiple VLANs
> routed by a set of routers, and the router/DHCP relay creates "static" /32 routes
> pointing to the client's interface on the fly. In such a case, interfaces have no "subnet" notion,
> but the pool does have its netmask and the client has it too. Routers do proxy ARP, of course.

Ah, so what you are saying is that your VLANs are simply part of a bigger network, much like plugging multiple switches into each other to make one big network. It might have helped if you had described your network at the start, so people wouldn't be working on the assumption that it was a "normal" network.

You seem to have gone out of your way to make a complicated network - is there any fundamental reason behind this ?
But regardless of that, I see a way around it ...

> It works just fine when not in failover mode. I can't think of a reason
> this could work for a single ISC DHCP server and not work for a cluster, other than a bug/race.

Well, it is well known that very small address ranges "do not work well" in failover situations. It's hard to balance free leases between two servers when there is only one lease !

Since each device-address mapping is mapping a single entity to a single address, I don't see what failover brings to the party other than problems. You could simply define the same (non-failover) single address pool on both servers and it'll work fine. On initial setup, the client will get two identical offers - one from each server - but after that it will simply renew with the server it accepted an offer from. If that server goes down, the other server will be able to give it the same address without having to involve failover.
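
A toy model of that suggestion, assuming each single-address pool is keyed by some per-client identifier. The class, the key "port-17" and the address are all hypothetical; this is a sketch of the idea, not of how dhcpd is implemented:

```python
from typing import Optional

class StandaloneServer:
    """A non-failover server holding the same static client->address pools."""

    def __init__(self, name, single_address_pools):
        self.name = name
        self.pools = dict(single_address_pools)  # each server keeps its own copy
        self.up = True

    def offer(self, client_key: str) -> Optional[str]:
        # A server that is down offers nothing; otherwise the answer depends
        # only on the static configuration, not on any state shared with a peer.
        if not self.up:
            return None
        return self.pools.get(client_key)

# Hypothetical mapping: one client, one address, identical on both servers.
POOLS = {"port-17": "10.0.40.5"}
server_a = StandaloneServer("A", POOLS)
server_b = StandaloneServer("B", POOLS)

# Initial setup: the client sees two identical offers.
print(server_a.offer("port-17"), server_b.offer("port-17"))

# Server A goes down: B can still hand out the same address.
server_a.up = False
print(server_a.offer("port-17"), server_b.offer("port-17"))
```

Because both servers derive the offer from the same static mapping, no lease state needs to be synchronised between them.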

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: Failure of dhcp server failover

Niall O'Reilly
In reply to this post by Eugene Grosbein

It's not difficult to find RFC 2131 or the book by Droms and Lemon.

In particular, section 4.3.1 of the RFC seems to make it clear that the choice of a client IP address in a case like the one you describe is beyond the scope of the DHCP protocol.

Best regards,
Niall O'Reilly



On 2 May 2016 20:22:06 BST, Eugene Grosbein <[hidden email]> wrote:
> 03.05.2016 1:00, Niall O'Reilly wrote:
>
>> I would say that a relay which sets the same giaddr on different VLANs is definitely NOT working
>> per-VLAN, and that using "ip unnumbered" is incompatible with correct operation of DHCP.
>
> Perhaps. Would you kindly provide any references to standards
> or DHCP server documentation for restrictions on giaddr?


--
Sent from Kaiten Mail. Please excuse my brevity.

Re: Failure of dhcp server failover

Eugene Grosbein
03.05.2016 3:15, Niall O'Reilly wrote:
> It's not difficult to find RFC 2131 or the book by Droms and Lemon.
>
> In particular, section 4.3.1 of the RFC seems to make it clear that the choice of a client IP address in a case like the one you describe is beyond the scope of the DHCP protocol.

"beyond the scope of the DHCP protocol" != "incompatible with correct operation of DHCP".

In fact, this section 4.3.1 explicitly notes that "... it may be
the case that the DHCP client should be assigned an address from
a different subnet than the address recorded in 'giaddr'".

This is exactly my case.


RE: Failure of dhcp server failover

Patrick Trapp
This conversation appears to have gone in another direction. Is there any value left in comparing my config to what Eugene originally shared? I've been a bit tied up, I'm afraid.

________________________________________
From: [hidden email] [[hidden email]] on behalf of Eugene Grosbein [[hidden email]]
Sent: Monday, May 02, 2016 3:28 PM
To: Niall O'Reilly
Cc: Users of ISC DHCP
Subject: Re: Failure of dhcp server failover

03.05.2016 3:15, Niall O'Reilly wrote:
> It's not difficult to find RFC 2131 or the book by Droms and Lemon.
>
> In particular, section 4.3.1 of the RFC seems to make it clear that the choice of a client IP address in a case like the one you describe is beyond the scope of the DHCP protocol.

"beyond the scope of the DHCP protocol" != "incompatible with correct operation of DHCP".

In fact, this section 4.3.1 explicitly notes that "... it may be
the case that the DHCP client should be assigned an address from
a different subnet than the address recorded in 'giaddr'".

This is exactly my case.


Re: Failure of dhcp server failover

Eugene Grosbein
In reply to this post by Simon Hobson
03.05.2016 2:44, Simon Hobson wrote:

> Wel it is well known that very small address ranges "do not work well" in failover situations.
>  It's hard to balance free leases between two servers when there is only one lease !

In fact, it is not :-) A server in failover mode knows whether it is "primary" or not,
and could take control of the last (or only) lease of a pool only if it is the primary.

> Since each device-address mapping is mapping a single entity to a single address,
>  I don't see what failover brings to the party other than problems.
>  You could simply define the same (non-failover) single address pool on both servers and it'll work fine.
>  On initial setup, the client will get two identical offers - one from each server -
>  but after that it will simply renew with the server it accepted an offer from.
> If that server goes down, the other server will be able to give it the same address without having to involve failover.

Simple, like everything brilliant!

I was under the (wrong) impression that failover mode is a global thing and completely missed
the fact that each pool has its own "failover" settings. I'm switching all my single-address pools
to a non-failover configuration and turning the second server on again. We'll see if it helps.



Re: Failure of dhcp server failover

Eugene Grosbein
In reply to this post by Patrick Trapp
On 03.05.2016 03:34, Patrick Trapp wrote:
> This conversation appears to have gone in another direction. Is there any value left in comparing my config to what Eugene originally shared? I've been a bit tied up, I'm afraid.

I've switched my single address pools to non-failover mode and the problem has gone. Case closed.

Thanks all for sharing ideas.


Re: Failure of dhcp server failover

Simon Hobson
Eugene Grosbein <[hidden email]> wrote:

> I've switched my single address pools to non-failover mode and the problem has gone. Case closed.

Just to finish this up for the benefit of anyone searching the archives later ...

What I think was happening in this case is that with only one address in a pool, failover just "doesn't work properly". Only one server can hold that lease, and if it decides to load balance* the query to the other server then neither will reply to the client - one server doesn't reply because it's going to leave the other one to do it, but the other one doesn't reply because it hasn't got a free lease.

This would appear to be just a corner case where the people who designed and built the failover system didn't envisage it being used for very very small (in this case, one address) pools. In any pool with 2 or more free addresses this wouldn't be a problem as the free leases would be balanced between the peers and both would have at least one free lease.

And the answer in this case was to realise that there's no need for failover when a single address pool is tied to a specific client - and as is done with host statements, simply put the same (non-failover) config on both servers.


* Perhaps someone with greater knowledge of how the load balancing works could check. I suspect the decision is done on a simple hash, and it's pot luck whether the server with the free lease will get to answer the query.
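
The corner case described above can be sketched as a toy simulation. The hash here is a stand-in (the first byte of a SHA-256 digest), NOT the load-balancing hash the failover implementation actually uses, and the function names and the `split` parameter as modelled here are assumptions for illustration only:

```python
import hashlib
from typing import Optional

def responsible_server(client_id: bytes, split: int = 128) -> str:
    """Stand-in load-balance decision: bucket 0..255, below `split` -> primary."""
    bucket = hashlib.sha256(client_id).digest()[0]
    return "primary" if bucket < split else "secondary"

def who_answers(client_id: bytes, free_leases: dict, split: int = 128) -> Optional[str]:
    """Return the server that answers a DISCOVER, or None if nobody does."""
    server = responsible_server(client_id, split)
    # The responsible server can only make an offer if it owns a free lease.
    # The other server stays silent because it is not responsible.
    return server if free_leases[server] > 0 else None

# One-address pool: the single free lease can belong to only one peer.
one_address = {"primary": 1, "secondary": 0}
# Two free addresses, balanced: each peer owns one.
two_addresses = {"primary": 1, "secondary": 1}

for mac in (b"00:11:22:33:44:01", b"00:11:22:33:44:02", b"00:11:22:33:44:03"):
    print(mac.decode(), who_answers(mac, one_address), who_answers(mac, two_addresses))
```

In this model, any client whose hash lands on the peer without the free lease gets no answer from the one-address pool, while the balanced two-address pool always produces an answer.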


Re: Failure of dhcp server failover

John Wobus
On May 4, 2016, at 7:40 AM, Simon Hobson <[hidden email]> wrote:
> Eugene Grosbein <[hidden email]> wrote:
>
>> I've switched my single address pools to non-failover mode and the problem has gone. Case closed.
>
> Just to finish this up for the benefit of anyone searching the archives later ...
>
> What I think was happening in this case is that with only one address in a pool, failover just "doesn't work properly". Only one server can hold that lease, and if it decides to load balance* the query to the other server then neither will reply to the client - one server doesn't reply because it's going to leave the other one to do it, but the other one doesn't reply because it hasn't got a free lease.

I believe DHCP failover was designed to give you redundancy assuming you can configure sufficient extra addresses.
It’s designed to continue working even if the two servers lose contact with each other and no longer
know whether the other is up, even if clients are still talking to them both, and despite this,
the servers still never ever give out the same address to two different clients.  To do this, they used
the strategy of assigning a server to “own” any particular unused address at any time.

You can’t have all this without IP address slack.  They could have chosen other factors as primary,
but they chose: “never give the same IP” plus “work even if the servers lose touch”.  The cost is in having
more addresses in the pools than the number you need to keep working during partial outages.

Twice the number of clients the pool will ever see should be sufficient, but usually overkill.  A rough size guess
would be “double the pool's maximum number of clients” minus “minimum number of clients the pool ever sees”.
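
As a worked example of that rule of thumb (the function name and the numbers are made up for illustration):

```python
def rough_failover_pool_size(max_clients: int, min_clients: int) -> int:
    """Rough pool size guess: double the maximum client count, minus the minimum."""
    if not 0 <= min_clients <= max_clients:
        raise ValueError("expected 0 <= min_clients <= max_clients")
    return 2 * max_clients - min_clients

# A pool that sees between 40 and 100 clients:
print(rough_failover_pool_size(100, 40))  # 160, versus the "always sufficient" 200
```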

John Wobus
Cornell IT