Eyes wide open – Correct Caching is always hard

Dev Tips



In my last post I talked about Caching and some of the stuff I’ve been doing to cache the results of a VERY expensive call to the backend that hosts my podcast.

As always, the comments are better than the post! Thanks to you, Dear Reader.

The code is below. Note that the MemoryCache is a singleton, but within the process. It is not (yet) a DistributedCache. Also note that Caching is Complex(tm) and that thousands of pages have been written about caching by smart people. This is a blog post as part of a series, so use your head and do your research. Don’t take anyone’s word for it.

Bill Kempf had an excellent comment on that post. Thanks Bill! He said:

The SemaphoreSlim is a bad idea. This “mutex” has visibility different from the state it’s trying to protect. You may get away with it here if this is the only code that accesses that particular key in the cache, but work or not, it’s a bad practice.
As suggested, GetOrCreate (or more appropriate for this use case, GetOrCreateAsync) should handle the synchronization for you.

My first reaction was, “bad idea?! Nonsense!” It took me a minute to parse his words and absorb. Ok, it took a few hours of background processing plus I had lunch.

Again, here’s the code in question. I’ve removed logging for brevity. I’m also deeply not interested in your emotional investment in my brackets/braces style. It changes with my mood. 😉

public class ShowDatabase : IShowDatabase
{
private readonly IMemoryCache _cache;
private readonly ILogger _logger;
private SimpleCastClient _client;

public ShowDatabase(IMemoryCache memoryCache,
ILogger<ShowDatabase> logger,
SimpleCastClient client){
_client = client;
_logger = logger;
_cache = memoryCache;
}

static SemaphoreSlim semaphoreSlim = new SemaphoreSlim(1);

public async Task<List<Show>> GetShows() {
Func<Show, bool> whereClause = c => c.PublishedAt < DateTime.UtcNow;

var cacheKey = "showsList";
List<Show> shows = null;

//CHECK and BAIL - optimistic
if (_cache.TryGetValue(cacheKey, out shows))
{
return shows.Where(whereClause).ToList();
}

await semaphoreSlim.WaitAsync();
try
{
//RARE BUT NEEDED DOUBLE PARANOID CHECK - pessimistic
if (_cache.TryGetValue(cacheKey, out shows))
{
return shows.Where(whereClause).ToList();
}

shows = await _client.GetShows();

var cacheExpirationOptions = new MemoryCacheEntryOptions();
cacheExpirationOptions.AbsoluteExpiration = DateTime.Now.AddHours(4);
cacheExpirationOptions.Priority = CacheItemPriority.Normal;

_cache.Set(cacheKey, shows, cacheExpirationOptions);
return shows.Where(whereClause).ToList(); ;
}
catch (Exception e) {
throw;
}
finally {
semaphoreSlim.Release();
}
}
}

public interface IShowDatabase {
Task<List<Show>> GetShows();
}

SemaphoreSlim IS very useful. From the docs:

The System.Threading.Semaphore class represents a named (systemwide) or local semaphore. It is a thin wrapper around the Win32 semaphore object. Win32 semaphores are counting semaphores, which can be used to control access to a pool of resources.

The SemaphoreSlim class represents a lightweight, fast semaphore that can be used for waiting within a single process when wait times are expected to be very short. SemaphoreSlim relies as much as possible on synchronization primitives provided by the common language runtime (CLR). However, it also provides lazily initialized, kernel-based wait handles as necessary to support waiting on multiple semaphores. SemaphoreSlim also supports the use of cancellation tokens, but it does not support named semaphores or the use of a wait handle for synchronization.

And my use of a Semaphore here is correct…for some definitions of the word “correct.” 😉 Back to Bill’s wise words:

You may get away with it here if this is the only code that accesses that particular key in the cache, but work or not, it’s a bad practice.

Ah! In this case, my cacheKey is “showsList” and I’m “protecting” it with a lock and double-check. That lock/check is fine and appropriate HOWEVER I have no guarantee (other than I wrote the whole app) that some other thread is also accessing the same IMemoryCache (remember, process-scoped singleton) at the same time. It’s protected only within this function!

Here’s where it gets even more interesting.

  • I could make my own IMemoryCache, wrap things up, and then protect inside with my own TryGetValues…but then I’m back to checking/doublechecking etc.
  • However, while I could lock/protect on a key…what about the semantics of other cached values that may depend on my key. There are none, but you could see a world where there are.

Yes, we are getting close to making our own implementation of Redis here, but bear with me. You have to know when to stop and say it’s correct enough for this site or project BUT as Bill and the commenters point out, you also have to be Eyes Wide Open about the limitations and gotchas so they don’t bite you as your app expands!

The suggestion was made to use the GetOrCreateAsync() extension method for MemoryCache. Bill and other commenters said:

As suggested, GetOrCreate (or more appropriate for this use case, GetOrCreateAsync) should handle the synchronization for you.

Sadly, it doesn’t work that way. There’s no guarantee (via locking like I was doing) that the factory method (the thing that populates the cache) won’t get called twice. That is, someone could TryGetValue, get nothing, and continue on, while another thread is already in line to call the factory again.

public static async Task<TItem> GetOrCreateAsync<TItem>(this IMemoryCache cache, object key, Func<ICacheEntry, Task<TItem>> factory)
{
if (!cache.TryGetValue(key, out object result))
{
var entry = cache.CreateEntry(key);
result = await factory(entry);
entry.SetValue(result);
// need to manually call dispose instead of having a using
// in case the factory passed in throws, in which case we
// do not want to add the entry to the cache
entry.Dispose();
}

return (TItem)result;
}

Is this the end of the world? Not at all. Again, what is your project’s definition of correct? Computer science correct? Guaranteed to always work correct? Spec correct? Mostly works and doesn’t crash all the time correct?

Do I want to:

  • Actively and aggressively avoid making my expensive backend call at the risk of in fact having another part of the app make that call anyway?
    • What I am doing with my cacheKey is clearly not a “best practice” although it works today.
  • Accept that my backend call could happen twice in short succession and the last caller’s thread would ultimately populate the cache.
    • My code would become a dozen lines simpler, have no process-wide locking, but also work adequately. However, it would be naïve caching at best. Even ConcurrentDictionary has no guarantees“it is always possible for one thread to retrieve a value, and another thread to immediately update the collection by giving the same key a new value.”

What a fun discussion. What are your thoughts?


Sponsor: SparkPost’s cloud email APIs and C# library make it easy for you to add email messaging to your .NET applications and help ensure your messages reach your user’s inbox on time. Get a free developer account and a head start on your integration today!




















Source link

Leave a Reply