ISC DHCP Users

Peer rebalancing problems

Norman Elton

Peer rebalancing problems

I've seen references to this in previous posts, but no clear
resolution. I've got two RHEL6 boxes (dhcp-4.1.1-63.P1.el6_10) set up
in a failover pair. I discovered this morning that one server was
stuck in "communications-interrupted" state. Turns out there were two
dhcpd processes running simultaneously. Not sure how that happened,
but shockingly, it wasn't happy.

I've restarted both servers, and we're back in normal failover state. But
one of my subnets is still not balancing out:

landlord01: balancing pool 55814b7e0ad0 WIRELESS-FACSTAFF total 2970 free 58 backup 320 lts -131 max-own (+/-)38
landlord01: balanced pool 55814b7e0ad0 WIRELESS-FACSTAFF total 2970 free 58 backup 320 lts -131 max-misbal 57
landlord02: balancing pool 55d8e05a4aa0 WIRELESS-FACSTAFF total 2970 free 353 backup 0 lts -176 max-own (+/-)35 (requesting peer rebalance!)
landlord02: balanced pool 55d8e05a4aa0 WIRELESS-FACSTAFF total 2970 free 353 backup 0 lts -176 max-misbal 53

It seems a little strange that both servers have a negative LTS value.
And that they're so different. Is this explainable somehow?

Thanks,

Norman
_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users
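
On the question of whether the negative LTS values are explainable: the figures in the log above are consistent with some simple arithmetic, sketched below. This is an inference from the logged numbers, not a quotation of dhcpd's source. It assumes that landlord01 is the primary, that "free" counts leases owned by the primary and "backup" those owned by the secondary, that lts is half of a server's own surplus over its peer, and that max-lease-ownership and max-lease-misbalance are at what I believe are their documented defaults of 10 and 15 percent. The function and parameter names are made up for illustration.

```python
def balance_figures(free, backup, is_primary,
                    max_ownership_pct=10, max_misbalance_pct=15):
    """Recompute lts, max-own and max-misbal from the logged pool counts.

    Assumption: the formulas are inferred from the log output in this
    thread; they are not copied from dhcpd.
    """
    total = free + backup
    # Surplus as seen by this server: its own free leases minus the peer's.
    surplus = (free - backup) if is_primary else (backup - free)
    # Truncate toward zero, the way C integer division would.
    lts = int(surplus / 2)
    # Percentages of the free+backup total, rounded to the nearest integer.
    max_own = (total * max_ownership_pct + 50) // 100
    max_misbal = (total * max_misbalance_pct + 50) // 100
    return lts, max_own, max_misbal


print(balance_figures(58, 320, is_primary=True))    # landlord01 -> (-131, 38, 57)
print(balance_figures(353, 0, is_primary=False))    # landlord02 -> (-176, 35, 53)
```

Under those assumptions both results match the logged lts, max-own and max-misbal values. If that reading is right, a negative lts means the server believes it holds fewer free leases than its peer. landlord01 counts 58 free and 320 backup while landlord02 counts 353 free and 0 backup, so the two servers disagree about who owns which leases, and each one concludes that it is the one running short.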

Norman Elton

Re: Peer rebalancing problems

Sorry, I just discovered this nugget:

landlord01: peer wm-dhcp-01-02: Got POOLREQ, answering negatively!
Peer may be out of leases or database inconsistent.

I will start googling and post if I discover anything.

Thanks,

Norman

On Tue, Sep 3, 2019 at 3:41 PM Norman Elton <[hidden email]> wrote:

> It seems a little strange that both servers have a negative LTS value.
> And that they're so different. Is this explainable somehow?

Bob Harold

Re: Peer rebalancing problems

I remove the subnet from failover, so it only has one DHCP server, then after the servers settle, add failover back in. It is a pain, but I have not found a better solution.

--
Bob Harold

On Tue, Sep 3, 2019 at 3:44 PM Norman Elton <[hidden email]> wrote:

> Sorry, I just discovered this nugget:
>
> landlord01: peer wm-dhcp-01-02: Got POOLREQ, answering negatively!
> Peer may be out of leases or database inconsistent.

Norman Elton

Re: Peer rebalancing problems

Just to confirm ... remove the failover declaration from one server,
and the entire subnet from the other server?

Norman

On Tue, Sep 3, 2019 at 4:02 PM Bob Harold <[hidden email]> wrote:

>
> I remove the subnet from failover, so it only has one DHCP server, then after the servers settle, add failover back in. It is a pain, but I have not found a better solution.

Bob Harold

Re: Peer rebalancing problems

Yes. (I use a managed solution from BlueCat Networks, but I assume that is what it does under the covers.)

--
Bob Harold
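
One way to prepare the "remove failover" step on the server that keeps the subnet is to comment out the pool-level failover references in a copy of dhcpd.conf. The sketch below does only that, on a text string. It assumes the pools reference the peer with a `failover peer "name";` statement, reuses the peer name wm-dhcp-01-02 from the log message earlier in the thread, and uses a made-up subnet and range. The function name is hypothetical, and removing the subnet from the other server is not covered.

```python
import re

# Matches a pool-level reference such as:   failover peer "wm-dhcp-01-02";
# The top-level declaration  failover peer "name" { ... }  is left alone,
# because the pattern requires a semicolon right after the quoted name.
PEER_REF = re.compile(
    r'^([ \t]*)(failover[ \t]+peer[ \t]+"[^"]+"[ \t]*;)', re.MULTILINE
)


def comment_out_peer_refs(conf_text):
    """Return conf_text with pool-level failover references commented out."""
    return PEER_REF.sub(r"\1# \2", conf_text)


# Hypothetical, abbreviated config: the peer declaration is not complete
# and the subnet is a documentation address block.
SAMPLE_CONF = '''\
failover peer "wm-dhcp-01-02" {
  primary;
}
subnet 192.0.2.0 netmask 255.255.255.0 {
  pool {
    failover peer "wm-dhcp-01-02";
    range 192.0.2.10 192.0.2.200;
  }
}
'''

print(comment_out_peer_refs(SAMPLE_CONF))
```

Commenting the line out rather than deleting it keeps the later "add failover back in" step down to removing the comment marker again.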

On Tue, Sep 3, 2019 at 4:08 PM Norman Elton <[hidden email]> wrote:

> Just to confirm ... remove the failover declaration from one server,
> and the entire subnet from the other server?

Chris Buxton

Re: Peer rebalancing problems

You may also need to clean up the dhcpd.leases file, to remove failover-related states and such.

Regards,
Chris Buxton

> On Sep 3, 2019, at 1:07 PM, Norman Elton <[hidden email]> wrote:
>
> Just to confirm ... remove the failover declaration from one server,
> and the entire subnet from the other server?

Simon Hobson

Re: Peer rebalancing problems

Chris Buxton <[hidden email]> wrote:

>> Just to confirm ... remove the failover declaration from one server,
>> and the entire subnet from the other server?

> You may also need to clean up the dhcpd.leases file, to remove failover-related states and such.

Won't that be automatic?
Removing the subnet entirely from one server will cause it to delete all the lease information it holds for that subnet. Won't removing the failover declaration from the other server trigger it to remove the failover state from its leases?

It's a question, not a statement dressed as a question. I haven't used failover - but I have seen leases cleaned out on removing a subnet (or reducing a pool/range) declaration. So removing failover stuff from leases would be a logical thing to happen if the failover declaration is removed.

Simon

Chris Buxton

Re: Peer rebalancing problems

> On Sep 4, 2019, at 10:41 AM, Simon Hobson <[hidden email]> wrote:
> Won't that be automatic?
> Removing the subnet entirely from one server will cause it to delete all the lease information it holds for that subnet. Won't removing the failover declaration from the other server trigger it to remove the failover state from its leases?

Yes, it gets cleaned out on the server that no longer serves the subnet. However, we (BlueCat) have seen the failover state not get cleaned up on the remaining server in the past. For the procedure under discussion specifically, that leftover cruft causes the rebuild of the failover relationship to fail.

That may have been fixed in a later version, though. I'm not sure if it's still an issue.

Regards,
Chris Buxton
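
Before editing anything by hand, a read-only check for the kind of leftover failover state described above might look like the sketch below. It only counts things in the text of a leases file and changes nothing. It assumes the usual dhcpd.leases layout: a `failover peer "name" state { ... }` block, per-lease `binding state` lines, and the failover timestamps tstp, tsfp and atsfp. The function name and the sample leases are made up for illustration.

```python
import re
from collections import Counter

PEER_STATE = re.compile(r'^failover[ \t]+peer[ \t]+"([^"]+)"[ \t]+state[ \t]*\{',
                        re.MULTILINE)
# Anchored at line start so "next binding state ...;" is not counted.
BINDING = re.compile(r'^[ \t]*binding[ \t]+state[ \t]+(\S+?)[ \t]*;', re.MULTILINE)
FAILOVER_TS = re.compile(r'^[ \t]*(?:tstp|tsfp|atsfp)[ \t]', re.MULTILINE)


def failover_residue(leases_text):
    """Summarise failover-related entries found in a leases file's text."""
    return {
        "peer_state_blocks": Counter(PEER_STATE.findall(leases_text)),
        "binding_states": Counter(BINDING.findall(leases_text)),
        "failover_timestamps": len(FAILOVER_TS.findall(leases_text)),
    }


# Hypothetical sample data, not taken from the servers in this thread.
SAMPLE_LEASES = '''\
failover peer "wm-dhcp-01-02" state {
  my state normal at 2 2019/09/03 19:40:00;
  partner state normal at 2 2019/09/03 19:40:00;
}
lease 192.0.2.10 {
  starts 2 2019/09/03 19:00:00;
  ends 2 2019/09/03 20:00:00;
  tstp 2 2019/09/03 20:30:00;
  binding state backup;
}
lease 192.0.2.11 {
  starts 2 2019/09/03 19:00:00;
  ends 2 2019/09/03 20:00:00;
  binding state free;
}
'''

print(failover_residue(SAMPLE_LEASES))
```

On a server that is supposed to be running the subnet without failover, any peer state block, `backup` binding state or failover timestamp reported here would be a candidate for the cruft in question. Whether a given dhcpd version removes these on its own is exactly the open point above, so the check is better run against a copy of the file.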