dont-use-fsync real world impact


dont-use-fsync real world impact

Jure Sah
Hi,

The documentation clearly states that using the dont-use-fsync option is
not recommended.

I am wondering what the realistic impact of this is. As I understand
it, the kernel commits dirty pages to disk every 30 seconds by default,
and this is configurable. Wouldn't this mean that, at worst, 30
seconds' worth of leases are lost?
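
(For reference: on Linux the 30-second figure corresponds to the
default writeback tunables, which as far as I know are stock on most
distributions:

vm.dirty_expire_centisecs = 3000       # dirty data older than 30 s gets flushed
vm.dirty_writeback_centisecs = 500     # flusher threads wake every 5 s

Lowering vm.dirty_expire_centisecs shrinks the worst-case loss window
accordingly.)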

The leases file is in most cases relatively tiny (under 1 MB), and could
easily fit in system storage cache. However, the fact that it gets
fsynced on every commit means that the performance of the DHCP server is
capped at whatever the performance of the physical storage is. While
fast storage options exist, a server whose throughput is tied to
physical storage speed is not future-proof as a solution, which is a problem.
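
For a rough sense of scale (ballpark figures of mine, not measurements
from this thread): an fsync that must reach spinning rust costs on the
order of 10 ms on a 7200 rpm disk, ~1 ms with a write-back RAID
controller cache, and ~0.1 ms on a decent NVMe SSD. With one fsync per
lease on a strictly serial commit path, that caps the server at very
roughly 100, 1,000, or 10,000 leases per second respectively, no matter
how fast the CPU or RAM is.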

From past correspondence in the mailing list archive I surmise
that people usually work around this by using hardware cache that does
not obey fsync, which simply offloads the problem from the kernel to the
cache controller and only superficially solves the problem. It does,
however, hint at the view that perhaps not using fsync is not all that
bad, if we are talking about a typical professionally hosted server.

LP,
Jure

Re: dont-use-fsync real world impact

Andrew Bell
You made me curious, so I just checked my lease database.  16.7 MB, 575,712 entries currently.


Re: dont-use-fsync real world impact

Simon Hobson
In reply to this post by Jure Sah
Jure Sah <[hidden email]> wrote:

> The documentation clearly states that using the dont-use-fsync option is
> not recommended.
>
> I am wondering what the realistic impact of this is. As I understand
> it, the kernel commits dirty pages to disk every 30 seconds by default,
> and this is configurable. Wouldn't this mean that, at worst, 30
> seconds' worth of leases are lost?

Yes, but that could be a rather serious loss of data for some operators. As always, there's no "one size fits all" answer; different operators will have different ideas on this.
Indeed, AIUI (from several years ago at least) the DHCP service in Windows Server massively outperformed the ISC DHCP server in benchmarks using out-of-the-box settings. The reason for this was that the MS server did NOT fsync its leases database and thus is vulnerable to exactly the issue you mention - also making it non-compliant with the relevant RFC.
However, in their defence, they have "sort of" moved that security aspect to the clients by making the clients very sticky about their leases - more so than other clients in my observations. That doesn't fully prevent the problem of the server missing knowledge of leases it's granted.

> The leases file is in most cases relatively tiny (under 1 MB)

That's probably a generalisation too far. Mine (at home) is only 20k, but as Andrew Bell has already pointed out, some people do have large lease files.

> From past correspondence in the mailing list archive I surmise
> that people usually work around this by using hardware cache that does
> not obey fsync, which simply offloads the problem from the kernel to the
> cache controller and only superficially solves the problem.

Yes, but no.
Yes it offloads the problem, no it's not just a superficial fix. A "proper" hardware cache will be battery backed and can survive a crash or power failure of the host. So if we assume we're talking about the hardware cache in a disk controller (eg a RAID controller) then if the power goes off without the chance of an orderly shutdown, then the battery backed cache will hold the updates until the power comes back on again - at which point it will push the updates out to the disk(s).
There are other sorts of cache hardware. In the distant past I recall seeing (and drooling over !) a "magic box" that comprised a stack of RAM, some disks, a battery, and a controller. To the host it presented as a standard wide SCSI device (that dates it), while internally it was a big RAM disk. In the event of power failure, the battery would run the system long enough to write everything to disk.
In both cases (and others), under normal conditions it's safe to assume that if the "disk" comes back and says "yes that's written", then it's either been written or has been saved into battery backed cache that will survive problems such as host crashes or power failures. If the cache/disk subsystem fails in that promise, then that's really little different to having a normal disk fail and lose all your data.



Re: dont-use-fsync real world impact

Bob Harold

If you have a failover pair, then changes to one server get sent to the other server, and I expect that to be immediate (but don't know that for sure), so if one server fails, you would only lose a few seconds or less.  In that case would running without fsync be reasonable?

-- 
Bob Harold



Re: dont-use-fsync real world impact

Simon Hobson
Bob Harold <[hidden email]> wrote:

> If you have a failover pair, then changes to one server get sent to the other server, and I expect that to be immediate (but don't know that for sure), so if one server fails, you would only lose a few seconds or less.  In that case would running without fsync be reasonable?

AIUI failover updates are "instant", but there's a config option to batch them. Similarly, IIRC there's now a config option to batch lease file updates/fsyncs? Both config options are there to allow adjustment of the performance/security tradeoff.

But you do raise a good point, with failover you have a near instantly updated backup of the leases. Apart from the obvious, there have been some interesting suggestions in the past for how that could be used.
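
For concreteness, the knobs being referred to look roughly like this in
dhcpd.conf (illustrative values; delayed-ack required a build configured
with --enable-delayed-ack in some versions, so check man dhcpd.conf for
yours):

delayed-ack 28;        # queue up to 28 ACKs behind a single lease-file fsync
max-ack-delay 250000;  # but hold an ACK no longer than 250 ms (in microseconds)

and, within a failover peer declaration:

max-unacked-updates 10;  # allow 10 binding updates in flight to the peer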


Re: dont-use-fsync real world impact

sthaug
> AIUI failover updates are "instant", but there's a config option to batch them. Similarly, IIRC there's now a config option to batch lease file updates/fsyncs? Both config options are there to allow adjustment of the performance/security tradeoff.

We have been running a DHCP failover pair with

delayed-ack 28;
max-unacked-updates 10;

for several years, with lease files on battery backed RAID. It has
worked very well for us through those years.

Steinar Haug, Nethelp consulting, [hidden email]

Re: dont-use-fsync real world impact

Jure Sah
In reply to this post by Simon Hobson
Apologies for the late response. I've read the other answers, and
replication does seem like an interesting solution.

On 8. 09. 19 00:19, Simon Hobson wrote:

> Jure Sah <[hidden email]> wrote:
>
>> The documentation clearly states that using the dont-use-fsync option is
>> not recommended.
>>
>> I am wondering what the realistic impact of this is. As I understand
>> it, the kernel commits dirty pages to disk every 30 seconds by default,
>> and this is configurable. Wouldn't this mean that, at worst, 30
>> seconds' worth of leases are lost?
> Yes, but that could be a rather serious loss of data for some operators. As always, there's no "one size fits all" answer; different operators will have different ideas on this.
> Indeed, AIUI (from several years ago at least) the DHCP service in Windows Server massively outperformed the ISC DHCP server in benchmarks using out-of-the-box settings. The reason for this was that the MS server did NOT fsync its leases database and thus is vulnerable to exactly the issue you mention - also making it non-compliant with the relevant RFC.
> However, in their defence, they have "sort of" moved that security aspect to the clients by making the clients very sticky about their leases - more so than other clients in my observations. That doesn't fully prevent the problem of the server missing knowledge of leases it's granted.
>
>> The leases file is in most cases relatively tiny (under 1 MB)
> That's probably a generalisation too far. Mine (at home) is only 20k, but as Andrew Bell has already pointed out, some people do have large lease files.

Well, a typical modern server, especially if it's a dedicated machine,
has at least 64 GB of RAM, of which the OS takes up at most 4 GB,
leaving a good 60 GB of cache space for that lease file.

Suffice it to say, the leases file is in all events tiny and could
easily fit in RAM several hundred times over.

>
>> From past correspondence in the mailing list archive I surmise
>> that people usually work around this by using hardware cache that does
>> not obey fsync, which simply offloads the problem from the kernel to the
>> cache controller and only superficially solves the problem.
> Yes, but no.
> Yes it offloads the problem, no it's not just a superficial fix. A "proper" hardware cache will be battery backed and can survive a crash or power failure of the host. So if we assume we're talking about the hardware cache in a disk controller (eg a RAID controller) then if the power goes off without the chance of an orderly shutdown, then the battery backed cache will hold the updates until the power comes back on again - at which point it will push the updates out to the disk(s).
> There are other sorts of cache hardware. In the distant past I recall seeing (and drooling over !) a "magic box" that comprised a stack of RAM, some disks, a battery, and a controller. To the host it presented as a standard wide SCSI device (that dates it), while internally it was a big RAM disk. In the event of power failure, the battery would run the system long enough to write everything to disk.
> In both cases (and others), under normal conditions it's safe to assume that if the "disk" comes back and says "yes that's written", then it's either been written or has been saved into battery backed cache that will survive problems such as host crashes or power failures. If the cache/disk subsystem fails in that promise, then that's really little different to having a normal disk fail and lose all your data.

See, this is where I see the problem. I understand that this is a
software mailing list and that this might not exactly be obvious to
people who deal with things several abstraction layers above the
hardware... and I also understand that at the end of the day this might
not matter in the real world. However, if the question is the value of
fsync and battery-backed disk cache, consider the following:

When a write is executed, it is first built in the write buffer of the
application, from where it is transferred to the kernel's page cache in
system RAM. When an fsync or dirty page writeback is executed, the
kernel pushes the data over to the disk controller, which stores it in
the hardware disk write buffer and then transfers it to the physical
media.
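
Schematically, the path a lease record takes, and the two caches in play:

  app buffer --write()--> kernel page cache --fsync()/writeback-->
      controller cache (battery-backed) --flush--> physical media

A power cut loses everything to the left of the controller cache; fsync
is what moves data across that boundary on demand.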

If there is a power failure, and it unluckily occurs before a dirty
page write or fsync, then the data is still in the system RAM and it
goes poof and is never committed to the battery-backed hardware disk
write buffer, to be put onto the disks on reboot. So exactly what impact
does the battery have on systems that do not carry out timely fsyncs?
And what impact do timely fsyncs have on systems that do not have a
battery-backed storage cache?

It could be argued that systems that are not battery backed should not
have a hardware disk cache. And it could be argued that systems without
a UPS could lose data since the last write. But to argue that
battery-backed disk cache somehow helps on systems with fsync turned off
is nonsense.


I've had some discussions on the topic on other applications' mailing
lists, and it appears that the developers of the software understand
that the primary purpose of regular fsyncs is to ensure atomic writes,
rather than to preserve seconds' worth of leases. If there is an
unmitigated power failure, it is understood that there will be some data
loss, but the fsyncing is there to ensure that the leases database
remains in a recoverable state (in the case of the leases file, atomic
writes ensure that the leases file is syntactically correct). They
understood the performance bottleneck of their application due to fsync,
but conceded that without an atomic write mechanism in the underlying
filesystems, there was no real alternative.
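
For illustration (a sketch of the record format from memory - see your
own dhcpd.leases for the authoritative shape):

lease 10.1.2.3 {
  starts 6 2019/09/07 18:17:00;
  ends 6 2019/09/07 20:17:00;
  binding state active;
  hardware ethernet 00:16:3e:11:22:33;
}

If the file is cut off mid-record - say, before the closing brace - the
parser can recognise and discard the incomplete entry on startup; the
atomicity concern is about never leaving the file in a worse state than
that.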

Are there any ISC DHCP devs or maintainers reading this list, or should
I post over on the other mailing list? Basically, I wish to know if
anyone has thought about an alternative to the atomic write problem that
has fewer bottlenecks. Are there any plans, canceled ideas, etc.?

LP,
Jure

Re: dont-use-fsync real world impact

Simon Hobson
Jure Sah <[hidden email]> wrote:

> Well, a typical modern server, especially if it's a dedicated machine,
> has at least 64 GB of RAM, of which the OS takes up at most 4 GB,
> leaving a good 60 GB of cache space for that lease file.

You've obviously worked for more generous managers than I have :-(

>>> From past correspondence in the mailing list archive I surmise
>>> that people usually work around this by using hardware cache that does
>>> not obey fsync, which simply offloads the problem from the kernel to the
>>> cache controller and only superficially solves the problem.
>> Yes, but no.
>> Yes it offloads the problem, no it's not just a superficial fix. A "proper" hardware cache will be battery backed and can survive a crash or power failure of the host. So if we assume we're talking about the hardware cache in a disk controller (eg a RAID controller) then if the power goes off without the chance of an orderly shutdown, then the battery backed cache will hold the updates until the power comes back on again - at which point it will push the updates out to the disk(s).
>> There are other sorts of cache hardware. In the distant past I recall seeing (and drooling over !) a "magic box" that comprised a stack of RAM, some disks, a battery, and a controller. To the host it presented as a standard wide SCSI device (that dates it), while internally it was a big RAM disk. In the event of power failure, the battery would run the system long enough to write everything to disk.
>> In both cases (and others), under normal conditions it's safe to assume that if the "disk" comes back and says "yes that's written", then it's either been written or has been saved into battery backed cache that will survive problems such as host crashes or power failures. If the cache/disk subsystem fails in that promise, then that's really little different to having a normal disk fail and lose all your data.
>
> See, this is where I see the problem. I understand that this is a
> software mailing list and that this might not exactly be obvious to
> people who deal with things several abstraction layers above the
> hardware... and I also understand that at the end of the day this might
> not matter in the real world. However, if the question is the value of
> fsync and battery-backed disk cache, consider the following:
>
> When a write is executed, it is first built in the write buffer of the
> application, from where it is transferred to the kernel's page cache in
> system RAM. When an fsync or dirty page writeback is executed, the
> kernel pushes the data over to the disk controller, which stores it in
> the hardware disk write buffer and then transfers it to the physical
> media.
>
> If there is a power failure, and it unluckily occurs before a dirty
> page write or fsync, then the data is still in the system RAM and it
> goes poof and is never committed to the battery-backed hardware disk
> write buffer, to be put onto the disks on reboot. So exactly what impact
> does the battery have on systems that do not carry out timely fsyncs?

I think we are talking about slightly different combinations of options.
I was talking about using battery-backed cache to mitigate the performance issue of frequent fsyncs. So the steps between the application building a new record and it being "secure" are fast, leaving the slow disk writes protected by battery backup to the cache.
If we are talking about the application not doing fsyncs at all, then I agree with you.

> I've had some discussions on the topic on other applications' mailing
> lists, and it appears that the developers of the software understand
> that the primary purpose of regular fsyncs is to ensure atomic writes,
> rather than to preserve seconds' worth of leases. If there is an
> unmitigated power failure, it is understood that there will be some data
> loss, but the fsyncing is there to ensure that the leases database
> remains in a recoverable state (in the case of the leases file, atomic
> writes ensure that the leases file is syntactically correct). They
> understood the performance bottleneck of their application due to fsync,
> but conceded that without an atomic write mechanism in the underlying
> filesystems, there was no real alternative.

It's my understanding that the fsyncs are there to ensure that the data has been committed to permanent storage BEFORE the lease is offered to a client - as required by the DHCP RFCs. Atomic writes aren't really an issue - I strongly suspect that the server builds a whole lease file record in the buffer and passes that in one write operation, and if that's the case, then the only non-atomic write issue would be if multiple writes were made (without fsync) such that lease file records crossed a disk buffer boundary.
If a lease file entry were truncated, then I believe the server can deal with this on startup by discarding the incomplete record.

So yes, ensuring atomic writes is a by-product of using fsyncs - but not (in this case) the primary reason.


Re: dont-use-fsync real world impact

Jure Sah
On 27. 09. 19 19:46, Simon Hobson wrote:
> It's my understanding that the fsyncs are there to ensure that the data has been committed to permanent storage BEFORE the lease is offered to a client - as required by the DHCP RFCs. Atomic writes aren't really an issue - I strongly suspect that the server builds a whole lease file record in the buffer and passes that in one write operation, and if that's the case, then the only non-atomic write issue would be if multiple writes were made (without fsync) such that lease file records crossed a disk buffer boundary.
> If a lease file entry were truncated, then I believe the server can deal with this on startup by discarding the incomplete record.
>
> So yes, ensuring atomic writes is a by-product of using fsyncs - but not (in this case) the primary reason.

I see. So if, theoretically, the DHCP server could work in such a way
as to "fsync asynchronously" (an oxymoron, I know), in that the lease
could be written to disk before it is offered to the client, but not
necessarily before the next lease could be served, then this would
adhere to the RFC and represent no real performance penalty, while also
not limiting the rate at which requests can be served.
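
To make that concrete, here is a minimal sketch of such a "group
commit" in C (a hypothetical illustration, not dhcpd's actual code;
error handling and opening of the lease-file fd are omitted):

/* Service threads append a lease record and block until a flusher
 * thread has fsync()ed past their write; one fsync can thus retire
 * a whole batch of leases without acking any client early. */
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  synced = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  dirty  = PTHREAD_COND_INITIALIZER;
static int fd;                        /* lease file, opened with O_APPEND */
static unsigned long written_gen, synced_gen;

/* Called per lease; returns only once the record is durable,
 * but many concurrent callers share a single fsync(). */
void commit_lease(const char *record)
{
    pthread_mutex_lock(&lock);
    write(fd, record, strlen(record));     /* one whole record per write */
    unsigned long my_gen = ++written_gen;
    pthread_cond_signal(&dirty);           /* wake the flusher */
    while (synced_gen < my_gen)            /* wait for a covering fsync */
        pthread_cond_wait(&synced, &lock);
    pthread_mutex_unlock(&lock);
}

/* Single flusher thread: each pass syncs everything written so far
 * and releases all the waiters that the fsync covered. */
void *flusher(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (written_gen == synced_gen)
            pthread_cond_wait(&dirty, &lock);
        unsigned long gen = written_gen;
        pthread_mutex_unlock(&lock);       /* writers may append meanwhile */
        fsync(fd);                         /* one sync covers up to gen */
        pthread_mutex_lock(&lock);
        synced_gen = gen;
        pthread_cond_broadcast(&synced);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

Each client is still only answered after its own record is on disk, so
the RFC requirement holds, but the fsync cost is amortised over the
batch - which is essentially what dhcpd's delayed-ack feature does for
ACKs.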

What changes would be required for this to work? A memory pool of
offered IP addresses which are not to be assigned to new clients?
Rotating a pool of physical disks to store leases in? Multiple
synchronised DHCP servers?

I am just trying to find a solution to this problem that is infinitely
scalable.

LP,
Jure



Re: dont-use-fsync real world impact

sthaug
> What changes would be required for this to work? A memory pool of
> offered IP addresses which are not to be assigned to new clients?
> Rotating a pool of physical disks to store leases in? Multiple
> synchronised DHCP servers?
>
> I am just trying to find a solution to this problem that is infinitely
> scalable.

But in practice you don't need infinite scalability, because you don't
have an infinite number of customers.

It's reasonably simple to scale up by using fast storage (battery backed
RAID with a suitable amount of memory is common), more servers and more
locations. Of course it has a cost - but so does your (or my) time.
Where do you want to spend your money?

Steinar Haug, Nethelp consulting, [hidden email]

Re: dont-use-fsync real world impact

Jure Sah
On 28. 09. 19 15:48, [hidden email] wrote:

>> What changes would be required for this to work? A memory pool of
>> offered IP addresses which are not to be assigned to new clients?
>> Rotating a pool of physical disks to store leases in? Multiple
>> synchronised DHCP servers?
>>
>> I am just trying to find a solution to this problem that is infinitely
>> scalable.
> But in practice you don't need infinite scalability, because you don't
> have an infinite number of customers.
>
> It's reasonably simple to scale up by using fast storage (battery backed
> RAID with a suitable amount of memory is common), more servers and more
> locations. Of course it has a cost - but so does your (or my) time.
> Where do you want to spend your money?
Well, I quite disagree. Even with SSD RAID arrays, the ultimate
performance is finite, whereas the demand on a DHCP service, for
instance from mobile clients, is quite significant, and it's easy to
have too many customers for a single server.

LP,
Jure

Re: dont-use-fsync real world impact

Simon Hobson
Jure Sah <[hidden email]> wrote:

> I see. So if, theoretically, the DHCP server could work in such a way
> as to "fsync asynchronously" (an oxymoron, I know), in that the lease
> could be written to disk before it is offered to the client, but not
> necessarily before the next lease could be served, then this would
> adhere to the RFC and represent no real performance penalty, while also
> not limiting the rate at which requests can be served.
>
> What changes would be required for this to work? A memory pool of
> offered IP addresses which are not to be assigned to new clients?
> Rotating a pool of physical disks to store leases in? Multiple
> synchronised DHCP servers?


I suspect it would be a very, very major redesign of the code. I would guess a multi-threaded server where each thread could handle one request - writing a lease record and responding to the client before being available to handle another request. As long as the lease record were built in a local buffer and written in an atomic write, then lease file integrity would be assured.
I suspect it would need a single thread responsible for arbitrating access to the database - allowing service threads to acquire an address for a client and mark it as in use as an atomic operation, to prevent race conditions between competing service threads.

Given that the existing single-threaded code can handle large numbers of clients with suitable hardware, and there are methods (see below) for splitting the load if needed, and there are few users needing this level of performance, I can't see this being a good use of developer time. Of course, if you really need something, then ISC is open to people who will sponsor development work.

BTW - I don't know how much of this applies to Kea, which is ISC's replacement for the venerable DHCP server. I'm not sure of the status at the moment, but at launch Kea (while having some advantages) was not feature-complete against the older server.


Jure Sah <[hidden email]> wrote:

> Well I quite disagree, even with SSD RAID arrays, the ultimate
> performance is finite, whereas demands on a DHCP service for instance by
> mobile clients is quite significant and it's easy to have too many
> customers for a single server.

You can have multiple servers on a network.
It's a bit more involved if you have a huge flat network, but that would be poor network design anyway. Particularly if you have multiple networks (different subnets, or more correctly, multiple broadcast domains) then it's fairly easy to have different servers (or pairs) serving groups of networks. You can also have multiple failover pairs, and a server can be in different failover relationships with different pools. You can have (say) four servers on one network, either as simply A-B and C-D with half the available pool configured on each pair, or it can be done A-B, B-C, C-D, and D-A with a quarter of the pool configured on each failover pairing. Particularly with the latter arrangement, you can reduce overlap on DHCP-Discover messages by getting one half of each pair to ignore clients until they have been trying for a certain time, or perhaps using Agent-CircuitID (at the expense of more processing).
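
For reference, a failover relationship is declared once and then
attached per-pool, so the A-B / C-D arrangements above come down to
declarations of roughly this shape (illustrative values only):

failover peer "a-b" {
    primary;                  # the other server declares "secondary"
    address 192.0.2.1;  port 647;
    peer address 192.0.2.2;  peer port 647;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 3600;                # primary only
    split 128;                # primary only
}

pool {
    failover peer "a-b";
    range 192.0.2.100 192.0.2.199;
}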
In the past, one rather ingenious idea was to have a local server at each access point (the specific case in mind was a distributed wireless access system where access points did not necessarily have battery backup) - with no permanent storage. Each of these servers has a failover pairing with a central server. Normal operation is handled by the remote server using RAM-disk for very fast storage with failover (which is async) providing a backup to the central server. If the remote server fails or has maintenance etc, a relay agent can be turned on to allow the central server to handle clients, and the remote server can reload its database from the central server using the failover protocol when it restarts. This does, however, trade off a certain amount of risk in that there is a window between local operations and the result being reflected via failover in the central database.

So a few options there.


Re: dont-use-fsync real world impact

Niall O'Reilly
In reply to this post by Jure Sah
On 28 Sep 2019, at 12:33, Jure Sah wrote:

> I see. So if, theoretically, the DHCP server could work in such a way
> as to "fsync asynchronously" (an oxymoron, I know), in that the lease
> could be written to disk before it is offered to the client, but not
> necessarily before the next lease could be served, then this would
> adhere to the RFC and represent no real performance penalty, while also
> not limiting the rate at which requests can be served.
>
> What changes would be required for this to work?

Have you read the documentation for the delayed-ack and max-ack-delay
statements?

> [...]
>
> I am just trying to find a solution to this problem that is infinitely
> scalable.

To echo Steinar Haug, neither the network nor the client population is
infinitely scalable.

IMHO, it makes sense to have infrastructure scale in proportion to
demand, and to use monitoring to identify both when there is insufficient
headroom and where the real choke-points are.

The most significant bottleneck to DHCP services that I ever had to deal
with was due to excessive fsync in the syslog process, not even in dhcpd.
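
(If anyone hits the same thing: classic syslogd fsyncs log files after
every message unless the file path is prefixed with "-" in syslog.conf,
e.g.

local7.*    -/var/log/dhcpd.log

assuming dhcpd is configured with log-facility local7 - the facility is
site-specific.)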

Niall O'Reilly