Restarting DHCP safely whilst avoiding partner-down state


Re: Restarting DHCP safely whilst avoiding partner-down state

sthaug
> The release notes indicate that a "gentle shutdown" feature was added
> in the past and then subsequently removed because the semantics chosen
> caused operational issues - but what these were isn't known because
> the associated bug report isn't publicly available.

See the discussion thread at

https://lists.isc.org/pipermail/dhcp-users/2014-June/017958.html
https://lists.isc.org/pipermail/dhcp-users/2014-July/017970.html

Steinar Haug, Nethelp consulting, [hidden email]
_______________________________________________
dhcp-users mailing list
[hidden email]
https://lists.isc.org/mailman/listinfo/dhcp-users

Re: Restarting DHCP safely whilst avoiding partner-down state

Anderson, Charles R
In reply to this post by Terry Burton
On Fri, May 13, 2016 at 07:49:12PM +0100, Terry Burton wrote:

> On 13 May 2016 at 19:26, Simon Hobson <[hidden email]> wrote:
> > Chuck Anderson <[hidden email]> wrote:
> >
> >> Is there a way to signal dhcpd to write out the lease file so it can
> >> be checked?
> >
> > Surely a simple change would be to not act on a normal kill signal in the middle of a lease file write ? Capture that the signal arrived, and act on it as soon as the complete lease has been written.
> > That one change alone would completely remove the "wrote half a lease to the file" issue.
>
> I couldn't agree more...

Since the code to close a lease file and start appending to a new one
already exists, it doesn't seem a stretch to just call that code in
the SIGTERM handler...
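
A minimal sketch of the deferral Simon describes, assuming POSIX signals: SIGTERM is blocked with sigprocmask() while one lease record is written, so a signal that arrives mid-write stays pending and is only delivered once the mask is restored. The names install_term_handler, write_lease_atomically and term_was_requested are made up for illustration and are not dhcpd functions, and this is not how dhcpd currently behaves.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler. Hypothetical name, not taken from dhcpd. */
static volatile sig_atomic_t term_requested = 0;

/* Only record the request; real cleanup happens outside the handler. */
static void on_sigterm(int signo)
{
    (void)signo;
    term_requested = 1;
}

/* Install the SIGTERM handler. Returns 0 on success, -1 on failure. */
int install_term_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Write one complete lease record with SIGTERM held pending. */
int write_lease_atomically(FILE *fp, const char *record)
{
    sigset_t block, saved;
    int rc = 0;

    assert(fp != NULL && record != NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    if (fputs(record, fp) == EOF || fflush(fp) == EOF)
        rc = -1;

    /* Restoring the old mask delivers a SIGTERM that arrived meanwhile. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
        rc = -1;
    return rc;
}

/* Non-zero once a SIGTERM has been seen. */
int term_was_requested(void)
{
    return term_requested != 0;
}
```

A caller would check term_was_requested() after each write and leave its main loop from there, rather than doing any cleanup inside the handler.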

Re: Restarting DHCP safely whilst avoiding partner-down state

Marc Haber
In reply to this post by Terry Burton
On Fri, May 13, 2016 at 02:00:03PM +0100, Terry Burton wrote:
> I'm attempting to write a systemd .service file for my own uses of ISC
> DHCP. However, if it can be made sufficiently generic then I would
> intend to push this upstream or at least into distributions.
>
> It needs to be suitable for managing failover pairs and I'm struggling
> with the age-old problem of restarting a dhcpd instance. From reading
> around there does not currently appear to be a method for restarting
> dhcpd that is both *safe* and *useful* in such a setup.

Please also cover the IPv6 case. On my systemd systems, I do have an
isc-dhcp-server-v4 and an isc-dhcp-server-v6 service and an
isc-dhcp-server service that is WantedBy both -v4 and -v6, so that
both instances start up concurrently when one does systemctl start
isc-dhcp-server.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

Fwd: Re: Restarting DHCP safely whilst avoiding partner-down state

Terry Burton-2
In reply to this post by sthaug

Dropped the list from cc:

---------- Forwarded message ----------
From: "Terry Burton" <[hidden email]>
Date: 13 May 2016 21:35
Subject: Re: Restarting DHCP safely whilst avoiding partner-down state
To: <[hidden email]>
Cc: "Shawn Routhier" <[hidden email]>

> On 13 May 2016 at 21:23,  <[hidden email]> wrote:

> >> The release notes indicate that a "gentle shutdown" feature was added
> >> in the past and then subsequently removed because the semantics chosen
> >> caused operational issues - but what these were isn't known because
> >> the associated bug report isn't publicly available.
> >
> > See the discussion thread at
> >
> > https://lists.isc.org/pipermail/dhcp-users/2014-June/017958.html
> > https://lists.isc.org/pipermail/dhcp-users/2014-July/017970.html
>
> Very useful, thanks.
>
> Shawn: Are you (or someone else) still working on this? If not, I'm
> happy to run with it if we can agree upon a design.
>
> Given the options given in [1] I think I prefer the different signals
> for different uses approach. I would also suggest extending the OMAPI
> interface to provide quick & safe shutdown.
>
>
> [1] https://lists.isc.org/pipermail/dhcp-users/2014-July/017974.html
>
>
> All the best,
>
> Terry



Re: Restarting DHCP safely whilst avoiding partner-down state

Terry Burton
In reply to this post by Terry Burton
On 13 May 2016 at 20:30, Terry Burton <[hidden email]> wrote:

> On 13 May 2016 at 20:06, Terry Burton <[hidden email]> wrote:
>> On 13 May 2016 at 19:25, dave c <[hidden email]> wrote:
>>> Are folks forgetting that the default action of the kill command is to send
>>> the TERM signal? That signal should tell the daemon to do an orderly
>>> shutdown, close the leases file cleanly, send whatever signals to the
>>> partner that are required and then exit when everything is ready.
>>>
>>> All the concern I am seeing below would be true if folks were issuing a kill
>>> -9 to stop the service. At which point the leases file would get potentially
>>> corrupted.
>> <...snip...>
>>> So it sounds like a lot of angst over nothing... a TERM signal is defined as
>>> closing all processes and threads cleanly, writing out the last bits of data
>>> and stopping things in an orderly fashion. So seems that issuing kill {dhcpd
>>> pid} would be perfectly acceptable to close things down even in a partner
>>> scenario.
>>
>> Where do you get the definition of a SIGTERM causing a graceful
>> shutdown (other than by convention) and if this were the case for ISC
>> DHCP then why the warning about truncated leases given in AA-01043?
>>
>> The effect of receiving a handleable signal is to immediately jump
>> into the trap handler if one is configured for that signal, otherwise
>> to die.
>>
>> Unless a handler takes care to ensure that everything is consistent
>> before it exits, SIGTERM, SIGINT, etc. are potentially dangerous.
>>
>> The release notes indicate that a "gentle shutdown" feature was added
>> in the past and then subsequently removed because the semantics chosen
>> caused operational issues - but what these were isn't known because
>> the associated bug report isn't publicly available.
>>
>> I need to find time to understand the current codebase, but what I'd
>> like to know is the intended semantics and what issues would be
>> encountered in implementing them the way that Simon Hobson suggests.
>
> So currently there are no trap handlers for SIGTERM or SIGINT and
> therefore no cleanup whatsoever at exit.
>
> There is a compiled-out option ENABLE_GENTLE_SHUTDOWN which installs
> handlers for these signals but when this was activated it implemented
> the harmful semantics of putting the server through a
> recovery+partner-down transition which isn't useful for a quick
> configuration reload:
>
> /* Enable the gentle shutdown signal handling.  Currently this
>    means that on SIGINT or SIGTERM a client will release its
>    address and a server in a failover pair will go through
>    partner down.  Both of which can be undesireable in some
>    situations.  We plan to revisit this feature and may
>    make non-backwards compatible changes including the
>    removal of this define.  Use at your own risk.  */
> /* #define ENABLE_GENTLE_SHUTDOWN */
>
> #if defined(ENABLE_GENTLE_SHUTDOWN)
>         /* no signal handlers until we deal with the side effects */
>         /* install signal handlers */
>         signal(SIGINT, dhcp_signal_handler);   /* control-c */
>         signal(SIGTERM, dhcp_signal_handler);  /* kill */
> #endif
>
> Having a more basic signal handler that defers the exit in order to
> finish writing out an outstanding lease seems better. Perhaps one
> could even differentiate these exit semantics based on SIGINT vs
> SIGTERM.
>
> If someone who can speak for ISC is able to indicate whether this
> would be a sensible approach then I am happy to work up a patch.

As a rough proof of concept this simple amendment appears to provide
rapid shutdown without partner down interference for a basic reload.

The part of dhcpd.c that drives the current -> shutdown -> recover
state transition can be guarded by a variable that determines whether
or not to place the server into long-term shutdown (possibly based on
the received signal or a runtime configurable.)

...
#define RECOVER_ON_STARTUP (1 != 1)   /* To be a variable derived at runtime */
...
#if defined (FAILOVER_PROTOCOL)
        if (RECOVER_ON_STARTUP) {
            /* Set all failover peers into the shutdown state. */
            if (shutdown_state == shutdown_dhcp) {
                for (state = failover_states; state; state = state -> next) {
                    if (state -> me.state == normal) {
                       dhcp_failover_set_state (state, shut_down);
                       failover_connection_count++;
                    }
                    if (state -> me.state == shut_down &&
                        state -> partner.state != partner_down)
                            failover_connection_count++;
                }
            }
            if (shutdown_state == shutdown_done) {
                for (state = failover_states; state; state = state -> next) {
                    if (state -> me.state == shut_down) {
                        if (state -> link_to_peer)
                            dhcp_failover_link_dereference (&state -> link_to_peer,
                                                            MDL);
                        dhcp_failover_set_state (state, recover);
                    }
                }
            }
        }
#endif
        if (shutdown_state == shutdown_done) {
#if defined (DEBUG_MEMORY_LEAKAGE) && \
                defined (DEBUG_MEMORY_LEAKAGE_ON_EXIT)
                free_everything ();
                omapi_print_dmalloc_usage_by_caller ();
#endif
                if (no_pid_file == ISC_FALSE)
                        (void) unlink(path_dhcpd_pid);
                exit (0);
        }
...
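
One way the RECOVER_ON_STARTUP constant above might become a runtime value, sketched under assumptions: the handler only records which signal arrived, and the shutdown path asks recover_on_startup() whether to run the failover transition. The mapping used here (SIGTERM for a quick restart, SIGINT for a long-term shutdown) is an arbitrary choice for illustration, and the enum and function names are hypothetical rather than part of dhcpd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical shutdown modes; dhcpd has no such enum. */
enum shutdown_mode {
    SHUTDOWN_NONE = 0,
    SHUTDOWN_QUICK,      /* exit promptly, leave failover state alone */
    SHUTDOWN_LONG_TERM   /* run the shut_down -> recover transition */
};

/* Last shutdown signal received, or 0 if none yet. */
static volatile sig_atomic_t pending_signal = 0;

/* Only record the signal number; the main loop decides what to do. */
static void record_signal(int signo)
{
    pending_signal = signo;
}

/* Install the same handler for SIGTERM and SIGINT. Returns 0 on success. */
int install_shutdown_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGINT, &sa, NULL);
}

/* Assumed mapping, chosen for illustration only. */
enum shutdown_mode mode_for_signal(int signo)
{
    switch (signo) {
    case SIGTERM: return SHUTDOWN_QUICK;
    case SIGINT:  return SHUTDOWN_LONG_TERM;
    default:      return SHUTDOWN_NONE;
    }
}

/* Runtime stand-in for the RECOVER_ON_STARTUP constant. */
int recover_on_startup(void)
{
    int signo = pending_signal;

    /* Only the two signals installed above can have been recorded. */
    assert(signo == 0 || signo == SIGTERM || signo == SIGINT);
    return mode_for_signal(signo) == SHUTDOWN_LONG_TERM;
}
```

With this in place the guard would read if (recover_on_startup()) instead of testing a compile-time constant. A configuration option could feed the same function if signals alone turn out to be too coarse.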

Re: Restarting DHCP safely whilst avoiding partner-down state

Terry Burton
In reply to this post by Anderson, Charles R
On 14 May 2016 at 04:12, Chuck Anderson <[hidden email]> wrote:

> On Fri, May 13, 2016 at 07:49:12PM +0100, Terry Burton wrote:
>> On 13 May 2016 at 19:26, Simon Hobson <[hidden email]> wrote:
>> > Chuck Anderson <[hidden email]> wrote:
>> >
>> >> Is there a way to signal dhcpd to write out the lease file so it can
>> >> be checked?
>> >
>> > Surely a simple change would be to not act on a normal kill signal in the middle of a lease file write ? Capture that the signal arrived, and act on it as soon as the complete lease has been written.
>> > That one change alone would completely remove the "wrote half a lease to the file" issue.
>>
>> I couldn't agree more...
>
> Since the code to close a lease file and start appending to a new one
> already exists, it doesn't seem a stretch to just call that code in
> the SIGTERM handler...

It might be even simpler than that, but I haven't traced the process.

With "gentle shutdown" enabled, it appears that the signal handler simply
sets a shutdown flag and then returns control. So if the signal arrives
while a lease is being written (even mid-printf), the interrupted routine
(or libc function) resumes and completes the write. The dispatcher later
acts on the shutdown flag, bringing things to a controlled stop.
Short-circuiting the failover state transitions may be all that's
required to get the necessary operational semantics for a rapid
restart.
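
The flag-then-resume behaviour described above can be sketched in isolation. This is a toy model, not dhcpd code: every name is hypothetical, the "lease file" is an in-memory buffer, and the mid-write signal is simulated with raise(). It only shows that a handler which sets a flag and returns lets the interrupted write finish before the loop notices the request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; none are taken from dhcpd. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Record the request and return, so the interrupted code resumes. */
static void request_shutdown(int signo)
{
    (void)signo;
    shutdown_requested = 1;
}

/* Install the flag-setting handler for SIGTERM. Returns 0 on success. */
int install_shutdown_flag_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_shutdown;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted system calls */
    return sigaction(SIGTERM, &sa, NULL);
}

/* Toy stand-in for the leases file. */
struct lease_log {
    char text[256];   /* records written so far */
    int  records;     /* number of complete records */
    int  signal_at;   /* record index at which to simulate a SIGTERM */
};

/* Append one record, raising SIGTERM halfway through record signal_at. */
void write_lease(struct lease_log *log)
{
    size_t used = strlen(log->text);

    assert(used + 32 < sizeof log->text);
    used += (size_t)snprintf(log->text + used, sizeof log->text - used,
                             "lease %d {", log->records);
    if (log->records == log->signal_at)
        raise(SIGTERM);          /* handler runs here, then we carry on */
    snprintf(log->text + used, sizeof log->text - used, " }\n");
    log->records++;
}

/* Minimal dispatcher: stops between records once the flag is set. */
int dispatch(struct lease_log *log, int max_records)
{
    while (!shutdown_requested && log->records < max_records)
        write_lease(log);
    return log->records;
}
```

In this model the record that was in flight when the signal arrived is still written in full, and the loop stops only at the next record boundary.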