Esoteric question

Esoteric question

Gregory Sloop
So, this is kind of a wild goose-chase for some direction - but thought there might be some useful answers here.

[But I know it's way out there and I'm not going to get direct help on solving the issue on the platform I'm having issues with - just bear with me and see if you have any helpful ideas.]

Let me set the background.

I'm using specific device hardware - in this case, a MikroTik RB450G [currently in place] and moving to a Ubiquiti EdgeRouter Lite.
They're multi-Ethernet-interface routers, based on Linux.
The RB450G works fine and simply needs replacement. [The two devices are configured as identically as I can make them. They're very different, so we're talking "functionally" identical, not literally with the same conf files.]

I'm having issues with DHCPd on the new device. [And queries at Ubiquiti are going nowhere fast. It IS an unusual problem, so I'm not terribly surprised.]

Let's assume Eth0/LAN is 10.0.0.1/24
DHCPD is set up to hand out addresses for 10.0.0.20-100, say.
14440 second leases.
Clients are connected directly to a switch that's directly connected to ETH0. [No DHCP relay etc.]

Eth1/WAN is a static /30 - connected directly to a Comcast Modem/BSG.
Let's say 1.2.3.5/30
The gateway [not that it matters] is 1.2.3.6

We're masquerading traffic [NAT] from the local RFC1918 [10.0.0.0/24] network to the static public IP on the WAN.
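
The example addressing above can be sanity-checked with a minimal Python sketch that uses only the standard `ipaddress` module. The function and variable names are illustrative assumptions, not taken from either router's configuration. It only confirms that the DHCP pool sits inside the LAN subnet and that the gateway is a usable host in the WAN /30.

```python
import ipaddress

# Example values from the description above (placeholders, not the real addresses).
LAN = ipaddress.ip_interface("10.0.0.1/24")
WAN = ipaddress.ip_interface("1.2.3.5/30")
GATEWAY = ipaddress.ip_address("1.2.3.6")
POOL_START = ipaddress.ip_address("10.0.0.20")
POOL_END = ipaddress.ip_address("10.0.0.100")


def layout_problems(lan, wan, gateway, pool_start, pool_end):
    """Return a list of problems found; an empty list means the layout is consistent."""
    problems = []
    if pool_start > pool_end:
        problems.append("pool start is above pool end")
    for addr in (pool_start, pool_end):
        if addr not in lan.network:
            problems.append(f"pool address {addr} is outside {lan.network}")
    if pool_start <= lan.ip <= pool_end:
        problems.append(f"router address {lan.ip} falls inside the pool")
    if gateway not in list(wan.network.hosts()):
        problems.append(f"gateway {gateway} is not a usable host in {wan.network}")
    if gateway == wan.ip:
        problems.append("gateway and WAN interface share the same address")
    return problems


if __name__ == "__main__":
    for problem in layout_problems(LAN, WAN, GATEWAY, POOL_START, POOL_END):
        print(problem)
```

With the example values the function reports nothing, which matches the claim that the addressing itself is unremarkable.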

---
So, here's what happens/happened.

I went in to swap out the 'Tik box for the new hardware.
Plugged it in, and none of the clients on the LAN got DHCP addresses. All the DHCP clients timed out.
After several passes at testing, here's what I found.

I can't find any configuration problems on the replacement hardware.
The *old* 'Tik hardware/software works perfectly.

If we connect the WAN port on the *new hardware* [EdgeRouter] to any ordinary live Ethernet port, DHCP works fine for the LAN side. Totally fine.
Only when we plug the Comcast gateway/modem into the WAN port on the new hardware does DHCP fail/time out. [Remember, just plugging it into a regular Ethernet switch works fine. It won't pass traffic, because the static IP assignment isn't right - but the LAN-side DHCP server works perfectly.]

If we take a client on the LAN and plug in a static IP [rather than DHCP], traffic flows out to the internet perfectly fine.

Packet caps from the new router show that the router/DHCP server IS seeing the entire DHCP protocol handshake. [When it's having the "problem."]
The client does a DISCOVER
Server responds with OFFER
The client responds with REQUEST
Then there's a LONG pause. [like 90s+ worth.]
The server responds with ACK. [It actually appears to send several ACKs. I probably cut my captures too short, so I only have about 2m of capture in my largest one. But that's what I see in what I have.]
However, the client [Windows in this case] has timed out by then, and never gets the ACK.
And while I'm not 100% certain, the times I've looked, the device believes it's handed out a lease. [I believe it's in the leases file.] But because of the long delay, the client never actually got the lease.
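
To make the timing in that handshake concrete, here is a minimal Python sketch that measures the gap between the first REQUEST and the first ACK in a list of capture events. The timestamps are made up to resemble the sequence described above, not copied from the real capture, and `client_timeout` is an assumed parameter rather than a documented Windows value.

```python
def request_to_ack_delay(events):
    """Return seconds between the first REQUEST and the first ACK that follows it.

    `events` is a list of (timestamp_seconds, message_type) tuples in time order.
    Returns None if no ACK follows a REQUEST.
    """
    first_request = None
    for timestamp, message in events:
        if message == "REQUEST" and first_request is None:
            first_request = timestamp
        elif message == "ACK" and first_request is not None:
            return timestamp - first_request
    return None


def client_gave_up(events, client_timeout):
    """True if the ACK never came or came later than the assumed client timeout."""
    delay = request_to_ack_delay(events)
    return delay is None or delay > client_timeout


# Hypothetical timestamps shaped like the capture described above.
EXAMPLE_EVENTS = [
    (0.0, "DISCOVER"),
    (1.0, "OFFER"),
    (2.0, "REQUEST"),
    (95.0, "ACK"),
    (96.0, "ACK"),
]

if __name__ == "__main__":
    print(request_to_ack_delay(EXAMPLE_EVENTS))
    print(client_gave_up(EXAMPLE_EVENTS, client_timeout=60.0))
```

Running the same measurement on captures taken with and without the Comcast modem attached would show whether the delay really appears only in the failing case.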

Again,
-Simply unplug the Comcast modem from the router, and DHCP immediately starts working again.
-Plugging Eth1 into a live Ethernet port [so that interface is seen as up] also works fine.
-It's only when connected to the Comcast gateway/modem that it fails.

On the LAN side of the network, we've tinkered with replacing the switches - dumb switches, identically configured managed switches, a different managed switch, or no switch at all [the router simply plugged directly into a single client]. No changes on the LAN side make the slightest difference either.

Since we're doing NAT/MASQ from LAN->WAN no WAN traffic should leak into the LAN - but I've also explicitly defined rules that prevent anything from the WAN getting to the LOCAL or LAN interfaces - other than established/related traffic.

So, I'm not asking you to solve the issue on this particular hardware. What I'm asking for is some plausible explanation that might produce these symptoms. I'm completely at my wits' end. I've spent a lot of hours trying a whole host of troubleshooting steps - but I can't think of any possible way this could be happening. But clearly it is.

IMO, either we have some very weird hardware physical layer problem that only impacts DHCP [and not traffic routing] or there's something I'm missing. I'd normally imagine that I'm missing something - but can't figure out what, if anything.

I've tried to closely define the setup, but I'm sure I've forgotten something - perhaps lots of somethings - just ask and I'll try to clarify any missing pieces.

Given how awesome people on this list are, I'm hopeful someone will have something that might jiggle loose something useful!

TIA
-Greg

_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users
Re: Esoteric question

Patrick Trapp
This is way over my head, but with your thorough description, a question comes to mind - did you by chance take a network capture of the WAN side, just to verify that the new device isn't mistakenly sending the requests out that port when it is available?

Patrick

From: dhcp-users <[hidden email]> on behalf of Gregory Sloop <[hidden email]>
Sent: Monday, September 16, 2019 6:20 PM
To: [hidden email] <[hidden email]>
Subject: Esoteric question
 

Re: Esoteric question

glenn.satchell
Hi Greg,

A very interesting problem... I've heard good reports about both of those
vendors' hardware, so it sounds like a reasonable choice.

What do you get if you snoop eth1 while connected to the different WAN
devices? I wonder if dhcpd is trying to talk to something else upstream
(no idea why it would do that).

Does the Ubiquiti have some form of cloud management or call home setup?

Best of luck.

regards,
-glenn

On 2019-09-17 09:20, Gregory Sloop wrote:

Re: Esoteric question

Gregory Sloop
Top posting

I don't have captures on Eth1 - though that's probably a good idea. Hard though, because it's a site that is in production like 7x12+ - so a PITA to go onsite (for the fourth time now) to grab some more data...

The potential of an interface with an overlapping subnet on Eth1 was raised and that's a good idea, I think.
But I certainly can't see anything in my config that would do that. I've stripped the config down to the very basics; just, essentially, defining the two Eth interfaces, the NAT/MASQ, DNS & NTP - in an effort to make sure there wasn't something somewhere in the config that was inadvertently causing the issue.
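
One way to rule out the overlapping-subnet idea is to compare every address configured on the router pairwise. The sketch below is a minimal Python illustration using the standard `ipaddress` module. The input format and function name are assumptions, and the extra 10.0.0.254/24 address on Eth1 is purely hypothetical, included only to show what a hit would look like.

```python
import ipaddress
from itertools import combinations


def overlapping_interfaces(addresses):
    """Find addresses on different interfaces whose networks overlap.

    `addresses` is a list of (interface_name, "ip/prefix") pairs, e.g. gathered
    by hand from the router's address listing.
    """
    parsed = [(name, ipaddress.ip_interface(cidr).network) for name, cidr in addresses]
    return [
        (name_a, name_b, str(net_a), str(net_b))
        for (name_a, net_a), (name_b, net_b) in combinations(parsed, 2)
        if name_a != name_b and net_a.overlaps(net_b)
    ]


# The addressing described in this thread: no overlap expected.
AS_DESCRIBED = [("eth0", "10.0.0.1/24"), ("eth1", "1.2.3.5/30")]

# Hypothetical misconfiguration: a stray LAN-range address on the WAN port.
WITH_STRAY_ADDRESS = AS_DESCRIBED + [("eth1", "10.0.0.254/24")]

if __name__ == "__main__":
    print(overlapping_interfaces(AS_DESCRIBED))
    print(overlapping_interfaces(WITH_STRAY_ADDRESS))
```

An empty result for the real address list would support the statement that nothing in the stripped-down config creates an overlap.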

A Question, if anyone knows the answer.
If it's doing a full handshake on Eth0 currently, doesn't that indicate that it believes that Eth0 is the proper interface for that subnet declaration - and so, why would it also be doing it on another interface too? [I get why it would be good to verify by doing some packet-caps - but asking for my own knowledge/education.]

As for cloud-mgmt/call-home - no there's none of that.

Thanks for the thoughts so far.

-Greg

--
Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail:
[hidden email]
http://www.sloop.net
---
Re: Esoteric question

Brennan,Andrew
Maybe explicitly configure the Eth0 interface in the DHCP server configuration (or startup CLI) so that Eth1 is never looked at by the DHCP daemon?  I think I have an EdgeRouter somewhere, but haven’t tried much with it yet.

It does sound like the difference in behavior is somehow linked to both interfaces being active - and the DHCP configuration shouldn’t even acknowledge the Eth1 interface is active.

andrew.

On Sep 17, 2019, at 11:56 AM, Gregory Sloop <[hidden email]> wrote:

Re: Esoteric question

Gregory Sloop
It's not possible to instruct the dhcp server [on edgerouter] which interfaces to listen on - it appears to based on the subnet declaration which interfaces it will respond on.

That said - I *can* have eth1 "active" and have no problem - say just plugging it into a switch or something. The problem ONLY occurs [at least in any situation I've been able to test] when it's connected to the CC modem. It doesn't occur when Eth1 is live but not connected to that equipment. [But still configured the same.]
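
For reference, here is a minimal sketch of the matching rule as I understand it: an interface is served only when its address falls inside a declared subnet. The addresses are the example ones from the original post. The helper name `served_interfaces` is made up for illustration, and nothing here reflects how the EdgeRouter actually builds its dhcpd configuration.

```python
import ipaddress

def served_interfaces(interfaces, subnets):
    """Return the interface names whose address lies inside a declared subnet."""
    nets = [ipaddress.ip_network(s) for s in subnets]
    served = []
    for name, cidr in interfaces.items():
        addr = ipaddress.ip_interface(cidr).ip
        if any(addr in net for net in nets):
            served.append(name)
    return served

# Example addressing from the original post (hypothetical values).
interfaces = {"eth0": "10.0.0.1/24", "eth1": "1.2.3.5/30"}
declared = ["10.0.0.0/24"]  # the only subnet declaration
print(served_interfaces(interfaces, declared))  # ['eth0']
```

Under that assumption eth1 should never be served, whatever is plugged into it.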


On Sep 17, 2019 11:23 AM, "Brennan,Andrew" <[hidden email]> wrote:
Maybe explicitly configure the Eth0 interface in the DHCP server configuration (or startup CLI) so that Eth1 is never looked at by the DHCP daemon?  I think I have an EdgeRouter somewhere, but haven’t tried much with it yet.

It does sound like the difference in behavior is somehow linked to both interfaces being active - and the DHCP configuration shouldn’t even acknowledge the Eth1 interface is active.

andrew.

On Sep 17, 2019, at 11:56 AM, Gregory Sloop <[hidden email]> wrote:

Top posting

I don't have captures on Eth1 - though that's probably a good idea. Hard though, because it's a site that is in production like 7x12+ - so a PITA to go onsite (for the fourth time now) to grab some more data...

The potential of an interface with an overlapping subnet on Eth1 was raised and that's a good idea, I think.
But I certainly can't see anything in my config that would do that. I've stripped the config down to the very basics; just, essentially, defining the two Eth interfaces, the NAT/MASQ, DNS & NTP - in an effort to make sure there wasn't something somewhere in the config that was inadvertently causing the issue.

A Question, if anyone knows the answer.
If it's doing a full handshake on Eth0 currently, doesn't that indicate that it believes that Eth0 is the proper interface for that subnet declaration - and so, why would it also be doing it on another interface too? [I get why it would be good to verify by doing some packet-caps - but asking for my own knowledge/education.]

As for cloud-mgmt/call-home - no there's none of that.

Thanks for the thoughts so far.

-Greg

gsuca> Hi Greg,

gsuca> A very interesting problem... I've heard good reports about both of
gsuca> those vendors' hardware, so it sounds like a reasonable choice.

gsuca> What do you get if you snoop eth1 while connected to the different WAN
gsuca> devices? I wonder if dhcpd is trying to talk to something else upstream
gsuca> (no idea why it would do that).

gsuca> Does the Ubiquiti have some form of cloud management or call home setup?

gsuca> Best of luck.

gsuca> regards,
gsuca> -glenn


Re: Esoteric question

Attila Szalay
Hi,

I have a similar setup with different MikroTik routers but haven't experienced anything like this.

I think what would help solve this issue is a config dump and a packet capture (the more the merrier: all interfaces, all ports, all protocols). You can also raise the log level for the DHCP process.


Re: Esoteric question

Simon Hobson-2
In reply to this post by Gregory Sloop
Gregory Sloop <[hidden email]> wrote:

>Packet caps from the new router show that the router/DHCP server IS
>seeing all the DHCP protocol handshake. [When it's having the
>"problem."]
>The client does a DISCOVER
>Server responds with OFFER
>The client responds with REQUEST
>Then there's a LONG pause. [like 90s+ worth.]
>The Server responds with ACK. [It actually appears to send several
>ACKS.

Ah, about 90s you say ?
Have a look on the external interface and/or in the logs and see if it's trying to do any DNS lookups or updates. Over the years I've seen lots of threads related to 90s delays - a common one being SSH logins - which have come down to the device attempting a DNS lookup and waiting for it to time out.
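
A rough way to see whether a lookup is what stalls is to time it. The sketch below is only an illustration: `timed_call` is a made-up helper, and the sleep stands in for a slow lookup so the snippet runs anywhere.

```python
import time

def timed_call(func, *args):
    """Run func(*args) and return (result_or_exception, elapsed_seconds)."""
    start = time.monotonic()
    try:
        outcome = func(*args)
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        outcome = exc
    return outcome, time.monotonic() - start

# Stand-in for a slow lookup; replace with a real resolver call on the device.
outcome, elapsed = timed_call(time.sleep, 0.2)
print(f"took {elapsed:.1f}s, outcome={outcome!r}")
```

On the router itself one could pass `socket.gethostbyaddr` and a LAN address instead of the sleep. If that call takes tens of seconds with the modem attached and returns at once without it, DNS is a strong suspect.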

Anyway, what I theorise could be happening is:
With WAN connected, "something" (dhcpd) is trying to do "something" with an outside service and timing out.
When the WAN is link-up but not connected to the modem, such attempts fail very quickly as the device has no ARP entry for its default route and so the network stack quickly reports "no route to host".
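
One way to check this against a capture is to find where the longest pause in the exchange falls. If the theory holds, it should sit between REQUEST and ACK and nowhere else. A small sketch follows. The timestamps are invented to resemble the capture described earlier in the thread, and `largest_gap` is a hypothetical helper, not part of any capture tool.

```python
def largest_gap(events):
    """events: list of (seconds, message_type) in capture order.

    Returns (gap_seconds, before_type, after_type) for the longest pause.
    """
    if len(events) < 2:
        raise ValueError("need at least two events")
    gaps = [
        (later[0] - earlier[0], earlier[1], later[1])
        for earlier, later in zip(events, events[1:])
    ]
    return max(gaps, key=lambda g: g[0])

# Illustrative timestamps only; real values would come from the packet capture.
capture = [
    (0.0, "DISCOVER"),
    (0.5, "OFFER"),
    (1.0, "REQUEST"),
    (92.0, "ACK"),
]
print(largest_gap(capture))  # (91.0, 'REQUEST', 'ACK')
```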

BTW, when I first read your post I was thinking WTF ! It was only after reading the other replies that this idea came to mind.

Simon

Re: Esoteric question

Gregory Sloop
Not to "diss" any of the prior suggestions - but THIS, well this is something I can get behind!

It *might* not be correct, and it could turn out to be a bust of an idea - but it, IMNSHO, ties all the pieces together in a really elegant way.
It's just a concept that makes so much sense and makes all the weird symptoms seem so much more plausible.

Wow. Really, massive thanks Simon.

I'll try to update the list when/if I figure out what's wrong. [Unless I've done something so incredibly stupid I'm too embarrassed to post about it... :( ]

The modem did get replaced today, so it's possible the symptoms simply vanish because of some change in the modem config, etc. But we'll see.

Thanks so much all!

-Greg


Re: Esoteric question

Gregory Sloop
So, the provider came in and replaced the cable-modem and that made the problem vanish.
I'm going to see if I can get ahold of the problem modem, and *if* I get some time, I'll see if I can tease out what might have been the root cause.

I wouldn't be holding my breath waiting for it, but we'll see.
It certainly was a *very* odd situation - and I'd love to know the cause.

-Greg

