
This Squid vs. Varnish comparison is quite similar to the sync vs. async debate for network programming. In both cases, the question is: do I use some OS abstraction (Virtual Memory or threads, respectively) as my application's primary scheduling mechanism, or do I handle scheduling more explicitly at the application level?

Of course the OS guys like PHK or Linus think you should use the OS mechanisms. Linus hates O_DIRECT (http://kerneltrap.org/node/7563) and PHK is taking a similar tack with this article. Just let the OS handle it.

But there are real downsides to this approach. One is that it makes you far more dependent on the quality of the OS implementation. I'm sure PHK trusts FreeBSD and Linus trusts Linux, but if you're writing for cross-platform you might end up on a bad VM implementation. The last thing you want to tell your customers is that they have to upgrade or change their OS to get decent performance.

Also, the OS is by design a more static and less flexible piece of software than anything you put in user-space. What if you need something that your VM system doesn't currently provide? For example, how are you going to measure (from user-space) the percentage of requests that incurred a disk read? Disk reads are invisible with mmap'd VM. What if you need to prioritize your I/O so that some requests get their disk reads serviced even if there are lots of low-priority reads already queued? If you've bought whole-hog into an OS-based approach and your OS doesn't support features like this, you don't have a lot of options.

And while it's great in lots of cases that the page cache can be shared across processes, OS's don't have great isolation between processes using the page cache. If you run some giant "cp" and completely trash the page cache, your Varnish process is likely to take a latency hit. In a shared server environment, you want to be able to draw walls of isolation so that each user gets the resources that he/she was promised. A shared page cache is hard to fit within an isolation model like this, whereas an explicit cache in user-space works fine.

Think about the microkernel vs. monolithic kernel debate. Maybe monolithic kernels won, but it's still a good principle that if it can be left out of the kernel without loss of performance, it should. Why is it better to use an interface like VM than to use some user-space library that manages disk I/O? The kernel's one advantage is that it can handle page faults (and so can make a memory reference into an I/O operation), but that's also the property that makes it difficult to do good accounting of when you're actually incurring I/O operations.

One final thing to mention: if you're using VM in this way, things degenerate badly in low-memory situations. Since the pages of data are competing with pages of the program itself, the program can get swapped out to service data I/O. If you've ever seen a Linux box thrash with its HDD light flashing like mad, you know how bad things can get when memory is temporarily too scarce to even let programs stay resident. Using vast amounts of VM exacerbates this, because it makes your programs and your data compete for the same RAM.



No matter what the quality of your VM, in a scenario like the one phk describes, several I/O hits will always lose to the single hit where the VM (even a crappy one) pages the data in or out just once.

Doubling or even quadrupling your IO operations is very expensive.

In a situation such as the one for which this article is meant, you set things up in advance so you never get into a state where you start thrashing your disk: programs are allocated a fixed amount of memory, and a program that does not abide by that is considered faulty.

The thrashing situation you describe can happen on machines run with less rigid setups, but on a production server that you count on to serve a few billion files every day you can't afford the luxury of random scripts firing from cron and other niceties like that.

Custom kernel, very limited set of processes that you know are 'well behaved', as predictable as possible.


> No matter what the quality of your VM, if you're going to have several IO hits versus only the one where the VM (even a crappy one) pages the data in or out just once you will always be faster in a scenario like phk describes.

You can keep your own cache explicitly in user-space, and get multiple hits with a single load into RAM.

> but on a production server that you count on serving up a few billion files every day you can't afford the luxury of random scripts firing off CRON and other niceties like that.

In a data center where you have tens of thousands of heterogeneous jobs competing for thousands of machines, you can't afford the luxury of giving out exclusive access to a machine. You have to have good enough isolation that multiple jobs can run on the same machine without impacting each other negatively. As CPUs get more cores this will become even more important.


The whole point of this article - and it is a very good point - is that keeping your cache in user-space is not the right way to approach the problem. And you can get multiple hits anyway if you make sure that data that will expire together will end up in the same page.

Your other description does not match the use case of a production web server running varnish instances as the front-end.


The whole point of my comment is that PHK's analysis leaves out many downsides of leaving it all to the kernel.

His main argument against doing it in user-space is that you will "fight with the kernel's elaborate memory management." But if you just turn off swap completely and read files with read/write instead of mmap(), there is no fight. Everything happens in user-space.

Leaving it all to the kernel has many disadvantages as I spent many paragraphs explaining.


I missed the 'if you just turn off swap completely' bit in the paragraphs above.

Edit: even on re-reading it all I can't find it.


> do I use some OS abstraction (Virtual Memory or threads, respectively)

Bzzt! You got your analogy backwards.

Using a thread pool is doing it yourself from cross-platform primitives that work on even the shittiest UNIX-wars-era platform, and an event loop using epoll/kqueue is the modern pure OS abstraction.


Read the second-half of the sentence you quoted: "as my application's primary scheduling mechanism." Using O(requests) threads leaves the OS in charge of scheduling CPU tasks, just as using VM leaves the OS in charge of scheduling I/O.

> Using a thread pool is doing it yourself from cross-platform primitives that work on even the shittiest UNIX-wars-era platform, and an event loop using epoll/kqueue is the modern pure OS abstraction.

You are very confused. First of all, epoll/kqueue are just optimizations of select(2), which first appeared in 4.2BSD (released in 1983). No standard interface for threading on UNIX appeared until pthreads was standardized in 1995.

But all of this assumes that my argument has anything at all to do with history. It does not. The question is whether you are leaving the OS in charge of scheduling decisions or not.

With select/poll/epoll/kqueue/etc, the OS wakes you up and says "here is a set of events that are ready." It does not actually schedule any work. The application gets to decide, at that moment, what work to do next.

Contrast this with O(requests) threads or VM. If several threads are available to run, the OS chooses which one will run based on its internal scheduling policy. Likewise with VM, the OS is responsible for scheduling pages of RAM and when they will be evicted, based on its own internal logic and policy. This is what makes them higher-level primitives.


> the program can get swapped out to service data I/O

mlock() and friends can help for server applications.



