I've worked on moderately busy backend platforms (~10K-20k rps handled on a ~4 e...

blattimwind · on June 28, 2018

0 - Caching antipattern 101:

    key = calculate_cache_key()
    if not cache.has(key):
        data = expensive_calculation()
        cache.store(key, data)
    else:
        data = cache.get(key)

nicktrocado92 · on June 28, 2018

Why it's an antipattern ?

mnutt · on June 28, 2018

It’s susceptible to Thundering Herd whereby more requests come in for the same cache key before the initial computation is finished, and so you end up with lots of cache misses. The fix is usually to lock the cache key and have subsequent requests wait on the original computation but it’s a bit more complex to code.

testplzignore · on June 28, 2018

Just be careful to lock the cache key, and not the cache. I've seen the latter done often, which of course kills performance if there are cache misses for multiple keys at the same time.

tootie · on June 28, 2018

I've heard it called Cache Stampede. Any decent framework for memoizing method calls would cover this case though.

patmcguire · on June 28, 2018

It depends on the backing, if you're using just memcached there's no way to completely lock the key. None of the popular rails caching frameworks I've seen handle this either.

ndrr · on June 28, 2018

Do you have an example of one? I'm curious what features they provide over ad hoc memoization mechanics.

Jach · on June 28, 2018

If you're in the Java ecosystem, the CacheBuilder in Guava is pretty good: https://google.github.io/guava/releases/19.0/api/docs/com/go...

By default it handles the case of concurrent retrieval on the same key (the second one will just wait for the first one to finish and use that value rather than starting a duplicate computation). It also lets you configure more interesting things like eviction strategies, removal notifications, and statistics.

Last year was a lesson for me in why caches are a hard problem as I had to debug many cache issues from other people not thinking things through... (At least one of the issues was my own fault. :)) Since then whenever someone suggests we use a cache I instinctively pull out a set of questions[0] to ask. The three questions Guava has you consider can also lead you to using memcached or the like instead but my set tries to answer the question "Do you even need a cache?" and if so, generating helpful design documentation.

    Is the code path as fast as it can possibly be without the cache? Do you have numbers?
    Will the cache have a maximum size, or could it grow without bound?
    If it grows without bound, either because of unbounded entries or because of unbounded memory for any particular entry, under what conditions will it consume all system memory?
    If it has a maximum size, how does it evict things when it reaches that size?
    Are you trying to cache something that could change?
    If so, how is the cache invalidated?
    How can it be invalidated / evicted manually by another thread or signal? (Debuggability, testability, inspectability/monitorability, hit rate and other statistics?)
    Is there a race condition possibility for concurrent stores, retrieves, and evicts?
    How constrained are your cache keys, that is, what does it need to know about to create one?
    Do they need to take into account global info?
    Do they need to take into account contextual information (like organization ID if your application server runs in a multi-tenant system, or user ID, or browser type, or requested-language)?
    Or do they only depend on direct inputs to that code path?

[0] https://www.thejach.com/view/2017/6/caches_are_evil -- need to update it a bit but not much...

dangets · on June 30, 2018

A coworker introduced me to Caffeine which I think shares authors with Guava's cache. Kind of a 2.0 /lessons learned iteration.

https://github.com/ben-manes/caffeine

tootie · on June 28, 2018

https://docs.spring.io/spring/docs/5.0.0.BUILD-SNAPSHOT/spri...

workinfunk · on June 29, 2018

No it wouldn't, memoization can't provide responses to calls it hasn't seen yet, that's the whole point.

memset · on June 28, 2018

This is interesting because this is generally how I implement caching! What would pseudocode look like for a non-antipattern?

brightball · on June 28, 2018

Depends on what you are doing.

To frame it another way, what happens if 100 requests come in at the same time for that expensive value when it isn’t in the cache yet? The expensive calculation will be run 100 times at the same time.

Ideally, you’d rather refresh the cache value in the background once and never allow duplicate requests for it from the web.

If you’re running a language that makes it easier to deduplicate requests for certain data, the original approach will last longer. The CacheEx library in Elixir, for example, will only run the expensive calculation once, set the cache and then send the value back to everything that requested it while it was loading.

rapind · on June 29, 2018

CacheEx sounds interesting. Basically a debounce. Pretty sure you could solve this outside of the application layer with Varnish but that depends on how the view is composed. I prefer using a grace / stale period but that only works if it's acceptable to return stale data during computation instead of queuing it up.

brightball · on June 29, 2018

Well CacheEx is just using functionality that comes naturally on the BEAM here. This is one of the reasons that a lot of CDN's like Cloudfront are written in Erlang.

CacheEx gets the ability to check for the presence of the cache key in Erlang Term Storage (ETS) which is basically an in-memory cache. If the key is present, it just returns the value.

If it's not, it sends checks to see if a process exists with the cache key name. If there isn't one, it creates one to request the resource.

For any other requests that come in until the value has been created, they will be directed to the process that is getting the value.

When the process that was calculating things comes back, it will save the value to ETS and then also send it back to all of the queued processes that have been waiting for it.

In the case of Varnish, you'd be expecting it to send back the entire completed view...HTML and all. This isn't something that you need to worry about with Elixir because the view is never actually rendered in the application. It's broken down into pieces that are never duplicated in memory and then replayed directly to the socket...meaning you really only ever need to cache expensive data and not what it's transformed into.

Here's a good read on why this view layer is so fast, if you're curious. Most people report shock that their uncached performance with Elixir and Phoenix is on par with statically cached HTML. I didn't believe it until I saw it myself.

https://www.bignerdranch.com/blog/elixir-and-io-lists-part-2...

rapind · on June 30, 2018

That's pretty awesome. Varnish is my go to for Rails apps (and almost always necessary).

blattimwind · on June 28, 2018

The easiest workaround is to acquire a lock (e.g. redis redlock) before starting to refresh the cache. That way only one process will perform expensive_calculation(). A good caching library will generally do some variation on this for you.

For larger, more complex scenarios the techniques mentioned by GGP above work well, i.e. not doing expensive_calculation() in your app process at all.

The issue has a bunch of names, I know at least three: cache stampede, thundering herd and dogpile effect.

steventhedev · on June 28, 2018

there's a race condition between checking if something is in the cache and actually putting it in the cache. A more correct solution would have all threads wait while one does the actual work.

This gets more complicated if you have a distributed cache and/or distributed application servers. The typical solution there is to allow at most one computation per process/device.

hinkley · on June 29, 2018

If your language supports promises, cache the promise and not the result. If it doesn’t support promises, find or write one.

dustingetz · on June 28, 2018

CloudFront has this anti-pattern it drives me nuts!

rapind · on June 29, 2018

Varnish + Grace period == solved

dustingetz · on June 28, 2018

The amazing thing about 1 is that modern immutable database patterns get it for free. When things like https://www.datomic.com/ get mainstream the world is gonna be in a very different place. Not as in today's apps get faster, but as in they are so much faster that new kinds of apps become possible that couldn't have been dreamed of before. My startup http://www.hyperfiddle.net is an experiment in this space

spraak · on June 28, 2018

I hope you can hire a designer and frontend engineer soon! I understand it's an experiment but the site looks 15 years outdated. In any case it seems interesting, good luck!

iKSv2 · on June 28, 2018

Do this (point 1) properly and ~80% of your "speed" problems are solved, for a simple web-app-db tiered app.

mklepaczewski · on June 28, 2018

1 - TTL should be infinite. If one needs finite TTL it means that cache invalidation logic is bogus.

dgritsko · on June 28, 2018

Or it could mean that the cost of cache invalidation is greater than the cost of allowing stale data to be read for the remainder of the TTL, or any other number of justifications for finite TTL. The point is, it depends on your application - while it's true that shorter TTLs can mask faulty invalidation logic, I don't think it's correct to say that "TTL should be infinite" with no qualifiers whatsoever.

mklepaczewski · on June 28, 2018

If that's the desired behavior then cache layer should allow one to use stale (invalidated ) content. Cache layer (or application layers above it) should be aware of the status of cached data.

It's fine if stale data is used deliberately, problems arise when stale data is assumed to be 'fresh'.

snowwrestler · on June 28, 2018

True but cache invalidation is famously hard to get perfect.

A nicer way to put it might be that finite TTLs should be viewed as an opportunity to optimize, not ideal or standard.

mklepaczewski · on June 28, 2018

True, caching even simple data dependant on 2-3 variables may require writing dozens of test cases - in practice, it often turns out that the data doesn't rely on 2-3 but 5-10+ variables.

However, just because it's hard it doesn't mean we shouldn't do it the right way.

jungturk · on June 28, 2018

Of course it does - negotiating trade-offs and accepting the set that best achieves your objectives is a large part of software design (and also undermines the notion of doing things a "right way").

Wouldn't deploying a less-than-precise TTLs be an appropriate trade-off for non-transactional caches, caches subject to network partitions, caches that can't enroll in invalidation messages, caches that can't poll for change sets, caches that can't implement eviction policy, etc?

Certainly the cache should be transparent about the trade-offs it has made (such as not promising authoritative data if it's accepting eventual consistency via TTL).

InclinedPlane · on June 29, 2018

s/bogus/imperfect/

It's impossible to create perfect systems, so it always makes sense to give yourself an extra out to protect you from unanticipated defects (otherwise known as a "belt and suspenders" approach).

godelmachine · on June 28, 2018

By TTL, are you referring to Transistor Transistor Logic?

dgritsko · on June 28, 2018

TTL in this context is "time-to-live" - "a mechanism that limits the lifespan or lifetime of data in a computer or network" (https://en.wikipedia.org/wiki/Time_to_live). E.g., the duration of a cache entry.