Peer rebalancing problems

Norman Elton
I've seen references to this in previous posts, but no clear
resolution. I've got two RHEL6 boxes (dhcp-4.1.1-63.P1.el6_10) set up
in a failover pair. I discovered this morning that one server was
stuck in "communications-interrupted" state. Turns out there were two
dhcpd processes running simultaneously. Not sure how that happened,
but shockingly, it wasn't happy.

I've restarted both servers, and we're back in normal failover state. But
one of my subnets is still not balancing out:

landlord01: balancing pool 55814b7e0ad0 WIRELESS-FACSTAFF  total 2970  free 58  backup 320  lts -131  max-own (+/-)38
landlord01: balanced pool 55814b7e0ad0 WIRELESS-FACSTAFF  total 2970  free 58  backup 320  lts -131  max-misbal 57
landlord02: balancing pool 55d8e05a4aa0 WIRELESS-FACSTAFF  total 2970  free 353  backup 0  lts -176  max-own (+/-)35  (requesting peer rebalance!)
landlord02: balanced pool 55d8e05a4aa0 WIRELESS-FACSTAFF  total 2970  free 353  backup 0  lts -176  max-misbal 53

It seems a little strange that both servers have a negative LTS value.
And that they're so different. Is this explainable somehow?
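
The logged figures can at least be checked by hand. The sketch below parses the pool lines quoted above and tests one reading of them: that lts is half the difference between the leases a server counts as its own and those it counts as its peer's, and that max-own and max-misbal are 10% and 15% of free plus backup, rounded to the nearest integer. These formulas are inferred from the numbers in this post, not taken from the dhcpd source. Treating "free" as landlord01's own leases and "backup" as landlord02's own leases is also an assumption about which box is primary. The function names are hypothetical.

```python
import re

# Matches the "balancing pool" / "balanced pool" lines quoted in this post.
LINE_RE = re.compile(
    r"(?P<host>\S+): balanc(?:ing|ed) pool \S+ (?P<pool>\S+)\s+"
    r"total (?P<total>\d+)\s+free (?P<free>\d+)\s+backup (?P<backup>\d+)\s+"
    r"lts (?P<lts>-?\d+)"
)


def parse_balance_line(line):
    """Return the counters from one pool log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rec = {"host": m.group("host"), "pool": m.group("pool")}
    for key in ("total", "free", "backup", "lts"):
        rec[key] = int(m.group(key))
    return rec


def half_difference(own, peer):
    """Assumed lts formula: half of (own - peer), truncated toward zero."""
    return int((own - peer) / 2)


def percent_of_pool(free, backup, percent):
    """Assumed threshold formula: percent of (free + backup), rounded to nearest."""
    return ((free + backup) * percent + 50) // 100
```

Under that reading, landlord01 (58 own, 320 peer) gets -131 and landlord02 (0 own, 353 peer) gets -176, matching the log. Each server would then believe the other holds the surplus, which gives a negative value on both sides, and the two disagree on the backup count (320 versus 0), which would explain why the magnitudes differ.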

Thanks,

Norman
_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: Peer rebalancing problems

Norman Elton
Sorry, I just discovered this nugget:

landlord01: peer wm-dhcp-01-02: Got POOLREQ, answering negatively!  Peer may be out of leases or database inconsistent.

I will start googling and post if I discover anything.

Thanks,

Norman


Re: Peer rebalancing problems

Bob Harold
I remove the subnet from failover, so it only has one DHCP server, then after the servers settle, add failover back in.  It is a pain, but I have not found a better solution.

--
Bob Harold




Re: Peer rebalancing problems

Norman Elton
Just to confirm ... remove the failover declaration from one server,
and the entire subnet from the other server?

Norman


Re: Peer rebalancing problems

Bob Harold
Yes.  (I use a managed solution from BlueCat Networks, but I assume that is what it does under the covers.)
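
As a rough illustration of the "remove the failover declaration" half of that procedure, here is a minimal sketch that comments out the pool-level `failover peer "...";` statement in a block of dhcpd.conf text. It assumes the statement sits on a line of its own, and it deliberately leaves the top-level `failover peer "..." { ... }` declaration alone. The function name is hypothetical, and this is not a description of what BlueCat's tooling does.

```python
import re

# A pool-level reference ends in ";" right after the peer name, whereas the
# top-level declaration is followed by "{", so only the reference matches.
FAILOVER_STMT = re.compile(
    r'^([ \t]*)(failover[ \t]+peer[ \t]+"[^"]+"[ \t]*;)', re.MULTILINE
)


def disable_pool_failover(conf_text):
    """Comment out each pool-level failover peer statement in conf_text."""
    return FAILOVER_STMT.sub(r"\1# \2", conf_text)
```

Pass it only the text of the affected subnet or pool: as written it would touch every pool in whatever text it is given. Re-enabling failover afterwards is the reverse edit, followed by a dhcpd restart on both servers.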

--
Bob Harold




Re: Peer rebalancing problems

Chris Buxton
In reply to this post by Norman Elton
You may also need to clean up the dhcpd.leases file, to remove failover-related states and such.

Regards,
Chris Buxton


Re: Peer rebalancing problems

Simon Hobson
Chris Buxton <[hidden email]> wrote:

>> Just to confirm ... remove the failover declaration from one server,
>> and the entire subnet from the other server?

> You may also need to clean up the dhcpd.leases file, to remove failover-related states and such.

Won't that be automatic?
Removing the subnet entirely from one server will cause it to delete all the lease information it holds for that subnet. Won't removing the failover declaration from the other server trigger it to remove the failover state from its leases?

It's a question, not a statement dressed as a question. I haven't used failover - but I have seen leases cleaned out on removing a subnet (or reducing a pool/range) declaration. So removing failover stuff from leases would be a logical thing to happen if the failover declaration is removed.

Simon


Re: Peer rebalancing problems

Chris Buxton



Yes, it gets cleaned out on the server that no longer serves the subnet. However, we (BlueCat) have seen that not get cleaned up in the past on the remaining server. Specifically for the procedure under discussion, that leftover cruft causes the rebuild of the failover relationship to fail.

That may have been fixed in a later version, though. I'm not sure if it's still an issue.
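
To see whether any such leftovers are present before rebuilding, a read-only scan of the leases file is one option. The sketch below counts `failover peer "..." state {` blocks and tallies leases by `binding state`, on the assumption that the file follows the usual dhcpd.leases layout. It changes nothing, the function name is hypothetical, and which entries actually need removing is not something this thread settles.

```python
import re
from collections import Counter

# Top-level failover state records written by dhcpd, one per state change.
PEER_STATE_RE = re.compile(r'^failover peer "([^"]+)" state \{', re.MULTILINE)
# Current binding state of a lease; "next binding state" lines are not matched
# because the pattern requires the line to begin with "binding".
BINDING_RE = re.compile(r"^[ \t]*binding state (\w+);", re.MULTILINE)


def summarize_leases(text):
    """Report failover-related content found in dhcpd.leases text."""
    peers = PEER_STATE_RE.findall(text)
    return {
        "peer_state_blocks": len(peers),
        "peer_names": sorted(set(peers)),
        "binding_states": dict(Counter(BINDING_RE.findall(text))),
    }
```

With failover removed from the remaining server, a non-zero count of peer state blocks or leases still in the backup state would be the kind of leftover worth a closer look. Stop dhcpd and keep a copy of the file before editing it by hand.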

Regards,
Chris Buxton