eth0: not responding (recovering)

eth0: not responding (recovering)

Teja
Hi, I have an issue with lease handling in the ISC DHCP service. The logs keep printing: eth0: not responding (recovering)
My local setup runs in active-active mode (split value 50-50%). For some reason one of the appliances went down. I noticed this and brought it back up after it had been down for nearly 15 hours.
Since bringing it up, I see logs saying "not responding (recovering)". It has been more than two hours and I am still getting the same messages.
Does anyone have any idea about this scenario and how to get the environment stable?

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: eth0: not responding (recovering)

Bill Shirley-2

Assuming you're referring to DHCP failover, is there any traffic flow on the
port and peer port in the failover stanza?
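
One rough way to check this from either server is to try opening a TCP connection to the port the other server listens on. The sketch below is only an illustration: the helper name `peer_port_reachable` is made up here, and the address and port in the usage comment are placeholders, so substitute the `address`/`port` and `peer address`/`peer port` values from your own failover declaration. A successful connect only shows that the port is reachable, not that the failover protocol itself is healthy.

```python
import socket


def peer_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (hypothetical peer address and port, replace with your own):
# peer_port_reachable("192.0.2.2", 647)
```

Running it once on each server against the other server's address and port shows whether the connection is blocked in one direction only.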

What is your value for mclt?

Which server, primary or secondary, is giving the recovering message?

Bill

On 10/5/2019 9:33 AM, Surya Teja wrote:
> Hi I have an issue in the lease flow with isc dhcp service. In the logs it is printing   eth0: not responding (recovering)
> My local is set up with active-active mode(splt value as 50-50%) and because of some reason one of the appliance  went down for some duration. I observed this and i bring it up, and duration of down is nearly 15hr.
> After i bring it up. I am seeing the logs saying not responding (recovering). Its been more than two hours still I am getting the same logs
> Does any one have any idea about this scenario and how to get the environment stable


Re: eth0: not responding (recovering)

Teja
Hi Bill, thanks for your reply.
Yes, I see traffic on the peer ports that I specified in the failover section of my configuration file.
My mclt value is 1800 (30 min).
I am seeing these issues on the failover server, and sometimes I see logs saying "peer holds all free leases", even though that scope is not completely full of active entries in the dhcpd.leases file on that server.

I also observed one more strange thing in the lease file. It has statements for my state and the peer state, and the peer state says unknown.
When does this happen? From what I found on the internet, it should normally be "normal", but the state is not getting updated in the lease file.

On Sat, 5 Oct 2019, 21:16 Bill Shirley, <[hidden email]> wrote:

> Assuming you're referring to DHCP failover, is there any traffic flow on the
> port and peer port in the failover stanza?
>
> What is your value for mclt?
>
> Which server, primary or secondary, is giving the recovering message?
>
> Bill


Re: eth0: not responding (recovering)

Teja
I am facing a weird situation with the failover setup in my lab environment. The issue appears when the failover DHCP appliance is added to my existing server.
The first time I added failover to the primary appliance, the primary's lease file showed the partner state as unknown, and the failover appliance kept logging "not responding (recovering)". So I shut down the failover appliance, removed the failover section from the primary's config, and restarted the primary. After that it worked fine.
For a second attempt, I increased the mclt value to 3600, added the failover section back to the primary's config, and brought up the failover server.
This made things worse: neither server is granting leases to devices. The primary's lease file says:
my state partner-down
peer state recovery

Why do we get these recovery, partner-down, and unknown states when I add failover to my environment?
Are there any best-practice steps for adding failover to an existing server without causing an outage?

Any help would be appreciated.
Thanks



On Sun, 6 Oct 2019, 21:04 Surya Teja, <[hidden email]> wrote:
> Hi Bill Thanks for your reply,
> Yes I see traffic on the peer ports which i mentioned in the fail over section of my configuration file.
> My mclt value is 1800(30 min).
> I am seeing these issues on the failover server and some times I see the logs saying peer hold all free leases, but that scope is not completely full with active entries in the dhcpd.lease file of that specified server
>
> And one more strange thing I observerd in the lease file. In the file I have statements like my status and peer status. In that peer status is saying unknown
> When will this happen? In general scenario it should be normal that is what i got from internet, but the state is not getting updated in the lease file.


Re: eth0: not responding (recovering)

Bill Shirley-2

A non-failover DHCP server doesn't have any "failover peer" stanza in the /var/log/dhcpd/dhcpd.leases
file.  A working failover lease file has at the top:
failover peer "dhcp-failover" state {
  my state normal at 4 2019/10/10 05:05:04;
  partner state normal at 5 2011/09/02 23:51:25;
}

Try shutting down both the primary and secondary servers, remove the "failover peer" stanza from
both of their lease files, and then bring up the primary with the failover configuration.  Ensure it is
handing out leases, then bring up the secondary with the failover configuration.  Then check that
all is working correctly.
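
To see which state each server last recorded, the "failover peer ... state" blocks can be pulled out of the lease file with a small script. This is a minimal sketch, assuming the blocks look like the one shown above and that a later block in the file supersedes an earlier one with the same peer name. The function name and the regular expression are illustrative only and are not part of ISC DHCP.

```python
import re

# Matches one 'failover peer "<name>" state { ... }' block and captures
# the peer name, the local state, and the partner state.
STATE_BLOCK = re.compile(
    r'failover peer "(?P<name>[^"]+)" state \{\s*'
    r'my state (?P<mine>[\w-]+) at [^;]*;\s*'
    r'partner state (?P<partner>[\w-]+) at [^;]*;[^}]*\}'
)


def latest_failover_states(leases_text: str) -> dict:
    """Map each peer name to (my state, partner state) from its last block."""
    states = {}
    for match in STATE_BLOCK.finditer(leases_text):
        # Later blocks overwrite earlier ones for the same peer name.
        states[match.group("name")] = (match.group("mine"), match.group("partner"))
    return states
```

Feeding it the contents of each server's lease file (for example via `open(path).read()`, with the path from your own installation) gives a quick side-by-side view of what the primary and the secondary each believe.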

Bill

On 10/9/2019 1:32 PM, Surya Teja wrote:
> I am facing weird situation with fail over setup on my lab environment. I am facing issue when the failover dhcp appliance is added to my existing server.
> For the first time when i add failover to primary appliance,  On primary appliance lease file i see the partner state as unknown and in the failover the messages are printing not responding recovery, so i shutdown the failover appliance and removed the failover config section from primary and restarted primary then it was working fine.
> As a trial of second attempt i increased mclt value to 3600 and added the failover section back to primary config and bring up the failover server now.
> The environment became most problematic none of the servers are granting leases to devices On the primary lease file it says
> my state partner-down
> peer state recovery
>
> Why do we get these recovery,partner down, unknown status when i add the failover to my environment?
> Or do we have any best practice steps how to add failover to existing server without causing any outages?
>
> Any help would be appreciated
> Thanks




Re: eth0: not responding (recovering)

Teja
Hi Bill,

> I non-failover DHCP server doesn't have any "failover peer" stanza in the /var/log/dhcpd/dhcpd.leases file

Sorry for the typo in my previous email:

> The environment became most problematic none of the servers are granting leases to devices On the primary lease file it says
> my state partner-down
> peer state recovery

It is not "peer state"; it is "partner state".

> A working failover lease file has at the top

In my dhcpd.leases file I see these states in multiple places, each with a timestamp, like:
failover peer "dhcp-peer-workspace1" state {
  my state recover at 3 2019/10/09 14:23:41;
  partner state unknown-state at 3 2019/10/09 14:23:41;
}

failover peer "dhcp-peer- workspace1 " state {
  my state recover at 3 2019/10/09 14:23:41;
  partner state unknown-state at 3 2019/10/09 14:23:41;
}
server-duid "\000\001\000\001%0\251\355\000PV\207D\342";
failover peer "dhcp-peer- workspace1 " state {
  my state recover at 3 2019/10/09 14:23:41;
  partner state unknown-state at 3 2019/10/09 14:23:41;
}

> Try shutting down both the primary and secondary servers, remove the "failover peer" stanza

Yes, I tried this. I shut down the failover appliance, and on the primary I removed the failover section completely from the config file.
I stopped dhcpd, deleted the lease file, recreated it with touch, and restarted DHCP. It worked as expected.

The moment I bring the failover appliance up, add the failover section to the primary's config file, and restart dhcpd on the primary, the issue starts.
First the failover appliance's logs say it is in recovery mode. I assumed that was fine, since it has to sync with the primary. After some time, the primary appliance's lease file shows:
my state partner-down
partner state recovery
That is when the problems begin, and the servers stay in this condition indefinitely: the states are not updated in either appliance's lease file.
Is it OK if I edit the lease file manually and set the state to normal?


On Thu, Oct 10, 2019 at 9:35 PM Bill Shirley <[hidden email]> wrote:

> I non-failover DHCP server doesn't have any "failover peer" stanza in the /var/log/dhcpd/dhcpd.leases
> file.  A working failover lease file has at the top:
> failover peer "dhcp-failover" state {
>   my state normal at 4 2019/10/10 05:05:04;
>   partner state normal at 5 2011/09/02 23:51:25;
> }
>
> Try shutting down both the primary and secondary servers, remove the "failover peer" stanza from
> both of their lease files, and then bring up the primary with the failover configuration.  Ensure it is
> handing out leases, then bring up the secondary with the failover configuration.  Then check that
> all is working correctly.
>
> Bill


Re: eth0: not responding (recovering)

Bill Shirley-2

I wouldn't change the state manually.

For me, after changing the primary configuration to failover, I initially start it with "mclt 60;" to speed
recovery:
mclt                3600;    # not for secondary
#mclt                60;    # use this when deploying a replacement server

If the primary is working, then start the secondary.  When they're both "normal", change
the configuration for the primary to the desired mclt time and restart the primary, then the
secondary.

Bill

On 10/10/2019 1:23 PM, Surya Teja wrote:
> Hi Bill
> I non-failover DHCP server doesn't have any "failover peer" stanza in the /var/log/dhcpd/dhcpd.leases file ---->
> Sorry for typo in my previous email
> The environment became most problematic none of the servers are granting leases to devices On the primary lease file it says
> my state partner-down
> peer state recovery
> ----> Its not peer it is  partner state
>
> A working failover lease file has at the top ----->
> I can see at multiple places in the dhcpd.lease file specifying about these states appended with time stamp saying like
> failover peer "dhcp-peer-workspace1" state {
>   my state recover at 3 2019/10/09 14:23:41;
>   partner state unknown-state at 3 2019/10/09 14:23:41;
> }
>
> failover peer "dhcp-peer- workspace1 " state {
>   my state recover at 3 2019/10/09 14:23:41;
>   partner state unknown-state at 3 2019/10/09 14:23:41;
> }
> server-duid "\000\001\000\001%0\251\355\000PV\207D\342";
> failover peer "dhcp-peer- workspace1 " state {
>   my state recover at 3 2019/10/09 14:23:41;
>   partner state unknown-state at 3 2019/10/09 14:23:41;
> }
> Try shutting down both the primary and secondary servers, remove the "failover peer" stanza---->
> Yes I tried this,I shut down the failover and on primary,  I removed the failover config part totally from the config file
> Stopped the dhcpd and deleted the lease file and again touch the lease file then restarted the DHCP it worked as expected,
>
> The moment I bring the failover appliances up and add the failover section  to the primary config file and restart the dhcpd  on the primary  the issue starts.
> First the failover logs says it is in recovery mode ok, So i thought as it has to sync the primary it is in recovery mode. After some span of
> time on the primary appliance lease file I see
> my state partner-down
> partner state recovery
> Thus comes the issues, and these are running in for ever condition the status are not getting updated in any of the appliance lease file
> Is it ok if I edit the lease file manually and make it normal ?



Re: eth0: not responding (recovering)

Simon Hobson
In reply to this post by Teja
Surya Teja <[hidden email]> wrote:

> Is it ok if I edit the lease file manually and make it normal ?

I would suggest you try leaving the partner down and explicitly set the master to "partner down" state. AIUI, the master should then enter normal operations as if there were no failover configured. Wait until it is running normally, then start the peer with a clean (empty) leases file.

What should then happen is the peer will transfer the lease info from the master. After this they should both go to normal state - not sure if there's any built-in delay for this.
