The term, at least to me, does not necessarily refer to storing large amounts of data. At Bloomberg we process a large amount of data in realtime. When you focus on processing realtime market data, the hardware topology looks different than when you are, say, running a map-reduce across a huge number of nodes. You often have a single point of entry for a particular stream (with redundancy, of course), but the trick is then to distribute the data efficiently to the many places that require it, with as near zero latency as possible. Obviously there is also a need for longer-term storage and instantaneous retrieval of billions of data points, but the biggest focus is near-zero latency from point of entry to point of consumption: distributing to all internal nodes/apps and to many thousands of sites in 180+ countries, with a large number of "nines" of availability.
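To make the fan-out pattern concrete, here is a toy sketch: a single point of entry per stream, pushing each tick to every registered consumer. All the names (`FeedFanout`, the callbacks) are mine for illustration; a real distribution layer would be multicast over dedicated network links, not in-process Python callbacks.

```python
from collections import defaultdict

class FeedFanout:
    """Toy single-entry-point fan-out: one ingest path per stream,
    and every registered consumer sees each tick.

    Illustrative only -- not how any real market-data plant works.
    """

    def __init__(self):
        self._subscribers = defaultdict(list)  # stream name -> callbacks

    def subscribe(self, stream, callback):
        self._subscribers[stream].append(callback)

    def publish(self, stream, tick):
        # Single point of entry for the stream; distribute to all
        # consumers that asked for it.
        for callback in self._subscribers[stream]:
            callback(tick)

# Usage: two downstream apps consuming the same stream.
received_a, received_b = [], []
fanout = FeedFanout()
fanout.subscribe("NASDAQ", received_a.append)
fanout.subscribe("NASDAQ", received_b.append)
fanout.publish("NASDAQ", {"symbol": "AAPL", "price": 180.25})
```

The interesting engineering is everything this sketch leaves out: redundancy at the entry point, and keeping the per-hop latency of that `publish` loop near zero at global scale.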
The biggest challenge is that data feeds originate in nearly all of those countries and also need to be distributed efficiently to every other country. (e.g., NASDAQ data originates in the US and reaches around the globe, and the same is true for realtime feeds on the opposite side of the globe in the Middle East, India, Singapore, Hong Kong, Tokyo, etc.) The public Internet is not reliable from a latency point of view, so coupled with the required hardware is the required network: we operate one of the largest private networks in the world.
edit: Also, from a processing point of view, we have had great success speeding up complex algorithms that would normally take minutes to run across huge compute clusters, bringing them down to seconds by porting them to run on large GPU clusters. Certain things are definitely suited to GPUs, but the approach still feels pretty foreign to most programmers, and it is hard for companies to decide to jump into that kind of project. You're starting to see more specialized use of GPUs, or of slower-clock-but-massively-parallel compute devices, for a wider variety of tasks. (e.g., http://gigaom2.files.wordpress.com/2011/07/facebook-tilera-w...)
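A hypothetical illustration of what "suited for GPUs" means: work where each output element depends only on its own input, so thousands of GPU threads can each compute one element independently. Sketched here in plain Python (a real port would be a CUDA/OpenCL kernel); the function names and the cubic-polynomial "kernel" are made up for the example.

```python
def kernel(x):
    # Per-element work with no dependence on neighboring elements --
    # the shape of computation that maps well onto thousands of
    # GPU threads running the same instruction on different data.
    return 3.0 * x * x + 2.0 * x + 1.0

def run_data_parallel(inputs):
    # On a GPU, each index i would be one thread; here we just loop.
    out = [0.0] * len(inputs)
    for i in range(len(inputs)):      # conceptually: thread index i
        out[i] = kernel(inputs[i])
    return out

# Contrast: a loop-carried dependency, where each step needs the
# previous step's result, does NOT decompose this way and tends to
# stay serial no matter how many cores you throw at it.
def run_sequential_dependency(inputs):
    acc = 0.0
    for x in inputs:
        acc = kernel(acc + x)         # depends on the prior iteration
    return acc
```

Deciding which of your minutes-long algorithms have the first shape rather than the second is, in my experience, most of the work of such a port.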