This is one of the things that drives me crazy about some of Apple's technologies. For instance, somebody at Apple decided long ago that all application HTML help pages should be cached. The "off switch" for this cache remains a bit of black magic but it's something like "rm -Rf com.apple.help.DamnedNearEverything" followed by "killall UnnecessaryBackgroundHelpProcess" every damned time you modify a page or else the help system might show you an older version of the content that you just "changed".
...
FILE *fp = fopen("example.txt", "r");
if (fp == NULL)                               /* bail out if the open fails */
    return 1;
char dest;
size_t bytes_read = fread(&dest, 1, 1, fp);   /* read a single byte */
if (bytes_read == 1)
    putchar(dest);
fclose(fp);
...
Think of how many caches likely contain the first byte of example.txt. There's the internal cache on the hard disk or SSD. There's the OS's filesystem cache in RAM. There's your copy (dest) in RAM, and also in the L3, L2, and L1 caches. (These aren't inclusive on modern Intel CPUs; I'm just talking about likelihood.) Implementing your own software RAM cache on top of all that puts you well into diminishing returns. The increased complexity simply isn't worth it.

Do you really want to read the file every time a request comes in? No, you're going to read it once and store it in an indexed set for quick lookup. You just cached a local data file.
It's about the benefit of caching vs. not caching, not about local vs. remote.
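To make that concrete, here's a minimal sketch in C of caching a local data file, assuming example.txt is small enough to hold in memory (the names are just for illustration):

#include <stdio.h>
#include <stdlib.h>

/* Read example.txt once, keep the bytes in memory, and answer every later
 * request from that copy instead of going back to the disk. */
static char  *cached_data;
static size_t cached_len;

static const char *get_data(void) {
    if (cached_data == NULL) {                  /* first request: fill the cache */
        FILE *fp = fopen("example.txt", "rb");
        if (fp == NULL)
            return NULL;
        fseek(fp, 0, SEEK_END);
        long size = ftell(fp);
        if (size < 0) { fclose(fp); return NULL; }
        rewind(fp);
        cached_data = malloc((size_t)size + 1);
        if (cached_data != NULL) {
            cached_len = fread(cached_data, 1, (size_t)size, fp);
            cached_data[cached_len] = '\0';
        }
        fclose(fp);
    }
    return cached_data;                         /* later requests: no disk I/O */
}

int main(void) {
    const char *first  = get_data();            /* reads the file */
    const char *second = get_data();            /* served from memory */
    if (first != NULL)
        printf("cached %zu bytes; same copy both times: %s\n",
               cached_len, first == second ? "yes" : "no");
    return 0;
}

The first call pays the disk cost; everything after that is a pointer return.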
But let's analyze your example. If disk reads take tens of seconds and memory usage is high enough to purge the kernel's disk cache, nothing can save you. Had your process read in everything at the start, it would be using even more memory. Given the same load, one of two things will happen:
1. If you have swap enabled, parts of your process's memory will be swapped out. Accessing that "memory" then causes a page fault and tens of seconds of delay (a quick way to check for this is sketched after this list).
2. If you have swap disabled, the OOM-killer will reap your process. When it respawns, it's going to read lots of stuff from disk... and disk reads take tens of seconds. Oops.
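If you suspect case 1, one way to check on a POSIX system is to watch the process's major page fault count, since major faults are the ones that had to go to disk:

#include <stdio.h>
#include <sys/resource.h>

/* Print the number of major page faults (faults that required disk I/O)
 * this process has taken so far. A rising count under load means your
 * "in-memory" data is really being paged in from swap. */
int main(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("major page faults: %ld, minor page faults: %ld\n",
               ru.ru_majflt, ru.ru_minflt);
    return 0;
}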
Even if an application-level data cache improved performance on heavily-loaded shared hosts, the added costs of software development and maintenance far exceed the cost of better hardware. Hardware is cheap. Developers are expensive.
"I'll grab all the memory I can so others can't use it" is a horrible way to think, as anyone who has attempted to simultaneously use multiple applications written with this mindset will know. One takes most of the memory, forcing other apps into swap, and then the opposite happens when you start working with one of the others, accompanied by massive swapping slowdowns.
(With caveats for zero-resource projects of course, but even for those I strongly suspect that for many people paying $5 or $10 per month for "less crap hosting" is probably a better solution than prematurely optimising by adding caching, and all its inherent complexity, to a fundamentally broken platform.)
I've seen many devs jump to caching before investing time in understanding what is really causing performance problems (I was one of them for a time, of course). Modern web stacks can scream without any caching at all.
Years ago, a talk by Rasmus Lerdorf really opened my eyes to this idea. [1] He takes a vanilla PHP app (Wordpress, I think) and dramatically increases its throughput by identifying and tweaking a few performance bottlenecks like slow SSL connections. One of the best lines: "Real Performance is Architecture Driven"
[1] I think it was a variation of this one: https://vimeo.com/13768954
By dropping a cache into an existing system, you're weakening consistency in the name of performance. At best, your strongly-consistent system has started taking on eventually-consistent properties (but maybe not even eventual depending on how you invalidate/expire what's in your cache). Eventual consistency can help you scale, but reasoning about it is really hard.
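As a tiny illustration of that weakened consistency, here's a sketch of the simplest expiry policy, a fixed TTL (the names and the 60-second window are invented). Any reader can be up to TTL seconds behind the source of truth, and if the entry is never expired, "eventually" never arrives:

#include <stdio.h>
#include <string.h>
#include <time.h>

#define TTL_SECONDS 60   /* staleness window: readers may lag by up to this much */

struct cache_entry {
    char   value[256];
    time_t fetched_at;
    int    valid;
};

static struct cache_entry entry;             /* one cached value, for illustration */

/* Stand-in for the slow, authoritative lookup (database, remote service, ...). */
static const char *fetch_from_source(void) {
    return "value from the source of truth";
}

static const char *cached_get(void) {
    time_t now = time(NULL);
    if (!entry.valid || now - entry.fetched_at > TTL_SECONDS) {
        /* Miss or expired: go back to the source and refresh the entry. */
        strncpy(entry.value, fetch_from_source(), sizeof entry.value - 1);
        entry.value[sizeof entry.value - 1] = '\0';
        entry.fetched_at = now;
        entry.valid = 1;
    }
    /* Hit: possibly up to TTL_SECONDS out of date relative to the source. */
    return entry.value;
}

int main(void) {
    printf("%s\n", cached_get());   /* first call fills the cache */
    printf("%s\n", cached_get());   /* second call may serve stale data */
    return 0;
}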
In some sense caching as described by OP is a tool to implement CAP theorem tradeoffs, and Eric Brewer described the reality of trading off the C (consistency) for A/P (availability/partition-tolerance) better than I ever could:
Another aspect of CAP confusion is the hidden cost of
forfeiting consistency, which is the need to know the
system’s invariants. The subtle beauty of a consistent
system is that the invariants tend to hold even when the
designer does not know what they are. Consequently, a
wide range of reasonable invariants will work just fine.
Conversely, when designers choose A, which requires
restoring invariants after a partition, they must be
explicit about all the invariants, which is both
challenging and prone to error. At the core, this is the
same concurrent updates problem that makes multithreading
harder than sequential programming.

With that in mind, I do think most of the pitfalls listed here can be avoided with well-understood tools and techniques. There's no real need to be running your cache in-process with your GC'd implementation language. Cache refilling can be a complex challenge for large-scale sites, but I expect that a majority of systems can live with slower responses while the cache refills organically from traffic.
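For what that organic refill can look like, here's a rough read-through sketch (the fixed-size table, hash, and loader are all invented for illustration): a miss takes the slow path once and leaves the value behind, so ordinary traffic warms the cache instead of an explicit warm-up step.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 1024

struct slot {
    char *key;
    char *value;
};

static struct slot cache[CACHE_SLOTS];       /* empty at startup: a cold cache */

/* Stand-in for the slow path: database query, remote call, file read, ... */
static char *load_from_backing_store(const char *key) {
    char buf[128];
    snprintf(buf, sizeof buf, "value-for-%s", key);
    return strdup(buf);
}

static unsigned hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % CACHE_SLOTS;
}

/* Read-through get: a hit is served from memory; a miss takes the slow
 * path once and repopulates the slot. */
static const char *cache_get(const char *key) {
    struct slot *s = &cache[hash(key)];
    if (s->key == NULL || strcmp(s->key, key) != 0) {   /* miss (or collision) */
        free(s->key);
        free(s->value);
        s->key = strdup(key);
        s->value = load_from_backing_store(key);
    }
    return s->value;
}

int main(void) {
    printf("%s\n", cache_get("user:42"));   /* miss: slow path, fills the slot */
    printf("%s\n", cache_get("user:42"));   /* hit: served from memory */
    return 0;
}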
The points about testing and reproducible behavior are dead on - no equivocation needed there. As always keeping it as simple as possible should be a priority.
There's no real need to be running your cache in-process with your GC'd implementation language.
Fundamentally there's no need, but in-memory caching may still be the right choice. As always, there are tradeoffs. Standing up a separate cache component incurs non-trivial costs. Your service now has a new "unit of management" - a new thing you need to deploy, monitor, and scale. It's a separate thing which might go down unless it's provisioned for sufficient load, and you need to be careful about unwittingly introducing a new bottleneck or failure mode into your system. These are all solvable problems, but solving them comes at a cost.

You can totally argue that engineers should be forced to think about and address these issues up front with more rigor, and in a perfect world I think I'd agree. :)
That said, caching is absolutely critical to almost every piece of software ever. Even if explicit caching isn't used, a wide variety of caches are likely still being depended upon, including CPU caches (L1, L2, L3), OS filesystem caching, DNS caching, ARP caching, etc.
Caching certainly adds complexity but it's also one of the best patterns for solving a wide range of performance problems. I would recommend developers spend more time learning and understanding the complexities so that they can make use of caching correctly and without applying it as a premature optimization.
I've seen applications that have 5 redundant caches, if not more (on-disk cache, OS cache, VM OS cache, stdlib cache, programmer-visible cache). And then you end up killing the actually-important caches (CPU caches, etc) from the amount of redundant copying required...
I think that, if a cache is combined with a push indicating a change, then it's basically a local "eventually consistent replica" which catches up as soon as there is a connection to the source of truth.
Seriously, many times you are READING data which changes rarely (read: every X minutes / hours / days). So, in the meantime, every code path that will need access to the data may as well look in the local snapshot first.
The question about consistency is an interesting one. The client's view of the authoritative server state may be slightly out of date when the user issues a request. If certain events happened in the meantime that affect the user's view, the action can just be kicked back to the user to be resolved. But 90%+ of the time, the view depends on 10 things that "change rarely", so a cache is a great improvement.
Related issues involve batching / throttling / waiting for already-sent requests to complete.
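A bare-bones sketch of that local-replica idea, with the push modelled as a plain function call (a real system would receive it over the network; all names here are invented):

#include <stdio.h>

/* Local snapshot of rarely-changing data, refreshed only when the source
 * of truth pushes a change notification. */
struct snapshot {
    int data;          /* the cached value */
    int version;       /* version we last copied from the source */
};

static int source_data = 42;      /* stand-in for the authoritative server state */
static int source_version = 1;

static struct snapshot local = { 0, 0 };

/* Called when a push/notification arrives: catch up with the source. */
static void on_change_pushed(void) {
    local.data = source_data;
    local.version = source_version;
}

/* Every code path reads the local snapshot; no round-trip to the source. */
static int read_value(void) {
    return local.data;
}

int main(void) {
    on_change_pushed();                 /* initial sync */
    printf("%d\n", read_value());       /* 42, from the local snapshot */

    source_data = 43;                   /* the source of truth changes... */
    source_version++;
    printf("%d\n", read_value());       /* still 42: stale until the push arrives */

    on_change_pushed();                 /* push received: replica catches up */
    printf("%d\n", read_value());       /* 43 */
    return 0;
}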
PS: That was quick. I posted this and literally 10 seconds later it got a downvote.
A cache can still be useful for reducing load and increasing capacity... but the latency story becomes more complex.
Certainly caching is vital to many distributed systems, but it has to be done from a systems perspective. In my experience a lot of caches are just slapped on top of individual components without much thought, and without even some basic monitoring of what the hit rate is. I think it helps to actually measure what the cache is doing for you -- but this is more work than adding the cache itself.
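The basic monitoring really can be as small as two counters plus a report; a minimal sketch (names invented):

#include <stdio.h>

/* Minimal cache instrumentation: count hits and misses so the hit rate
 * can actually be reported instead of guessed at. */
static unsigned long cache_hits;
static unsigned long cache_misses;

static void record_lookup(int was_hit) {
    if (was_hit)
        cache_hits++;
    else
        cache_misses++;
}

static void report_hit_rate(void) {
    unsigned long total = cache_hits + cache_misses;
    if (total > 0)
        printf("cache hit rate: %.1f%% (%lu hits / %lu lookups)\n",
               100.0 * (double)cache_hits / (double)total, cache_hits, total);
}

int main(void) {
    record_lookup(0);     /* miss */
    record_lookup(1);     /* hit  */
    record_lookup(1);     /* hit  */
    report_hit_rate();    /* prints roughly 66.7% */
    return 0;
}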
And I agree with another poster in that I've seen many systems with caches papering over severe and relatively obvious performance problems in the underlying code.
I was thinking of this Google publication which outlines some problems with latency variability: http://www.barroso.org/publications/TheTailAtScale.pdf
Interestingly they didn't seem to list caches as one of the causes; they list shared resources, cron jobs, queuing, garbage collection, power saving features, etc.
The problem is that implementing caching is a bit of a canary in a coal mine. If there are problems with the architecture, then trying to add caching into the mix will make things much more difficult.
I wouldn't say adding a cache up front to parts which you know will be heavily read (or at least adding hooks to make it easier to implement later) is a waste of time or "Premature Optimisation". The 80-20 rule is alive and well; just use your judgement.
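One cheap hook, sketched here with invented names: route every read through a single accessor, so a cache can be dropped in later without touching the call sites.

#include <stdio.h>

/* The slow, authoritative lookup (database, file, remote call, ...). */
static int load_record(int id) {
    return id * 10;   /* stand-in for real work */
}

/* Hook: every caller goes through get_record(). Today it just forwards
 * to load_record(); later, a cache lookup can be added here without
 * changing any call sites. */
static int get_record(int id) {
    /* TODO: check a cache here once profiling shows it's worth it */
    return load_record(id);
}

int main(void) {
    printf("%d\n", get_record(7));
    return 0;
}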
I wonder how many sleepless nights have been caused by combining the two.