I'd personally be wary of putting anything too important into VRAM. About five years ago I did a bunch of work testing consumer GPU memory for reliability [1, 2]. Because until that time GPUs were primarily used for error-tolerant applications (graphics) storing only short-lived data (textures) in memory, there wasn't a whole lot of pressure to make the memory as reliable as that found on the main system board. We found that there was indeed a persistent, low level of memory errors that could be triggered depending on access pattern. I haven't followed up for recent generations, but the fact that the "professional" GPGPU boards both clock their memory slower and include hardware ECC is a possible cause for concern with leaving anything too important on the GPU for a long time.
There's code [3,4], too, but I haven't actively worked on it in a few years, so no guarantees on how well it runs nowadays...
[1] http://cs.stanford.edu/people/ihaque/papers/gpuser.pdf
[2] http://cs.stanford.edu/people/ihaque/talks/gpuser_lacss_oct_...
GPUfs: Integrating a File System with GPUs. Mark Silberstein (UT Austin), Bryan Ford (Yale University), Idit Keidar (Technion), Emmett Witchel (UT Austin)
Paper: http://dedis.cs.yale.edu/2010/det/papers/asplos13-gpufs.pdf
Slides: http://dedis.cs.yale.edu/2010/det/papers/asplos13-gpufs-slid...
This is in fact a major (if not the major) limiting factor in expanding the use of GPUs for general-purpose calculations: you always have to copy inputs and results between video RAM and normal RAM.
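For anyone who hasn't written GPU code: in the classic CUDA model every kernel launch is bracketed by explicit host-to-device and device-to-host copies over PCIe, which is exactly the overhead GPUfs is trying to hide. A minimal sketch (kernel and variable names are illustrative):

```cuda
// Sketch of the explicit RAM <-> VRAM round trip around a trivial kernel.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *d, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= k;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes), *d;
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // input: RAM -> VRAM
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // result: VRAM -> RAM

    printf("%f\n", h[0]);
    cudaFree(d);
    free(h);
    return 0;
}
```

For large working sets, those two cudaMemcpy calls can easily dominate the kernel's own runtime, which is why pinned memory, streams, and abstractions like GPUfs exist.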
I've played a bit with the different memory compression tools on Linux (zram, zswap, and zcache), and they all behave in interesting ways on workloads whose active set is well over 2x available RAM. I tried compiling the Glasgow Haskell Compiler on small and extra-small instances of cloud services, and I'd wager this would work for the GPU instances on EC2 to increase their capacity a little.
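For reference, setting up a compressed swap device with zram is only a few commands. A minimal sketch, assuming a kernel with the zram module and run as root (the sysfs attribute names are the ones exposed by the zram driver; sizes are illustrative):

```shell
# Load the zram module and configure /dev/zram0 as compressed swap.
modprobe zram
echo lz4 > /sys/block/zram0/comp_algorithm   # pick the compression algorithm
echo 2G  > /sys/block/zram0/disksize         # uncompressed capacity of the device
mkswap /dev/zram0
swapon -p 100 /dev/zram0   # high priority so it's used before disk-backed swap
```

zswap, by contrast, needs no device setup: it's a compressed cache in front of an existing swap device, enabled via `zswap.enabled=1` on the kernel command line.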
The transcendent memory (tmem) model in Linux is interesting for exploring these ideas, and it's one of the things I really like about the kernel. However, the last time I played with it (kernel version ~3.10) I hit lockup issues where the kernel would consume almost all of the CPU cycles with zswap. That was a nasty issue.
It's already possible on Linux. You can use a swap file instead of a partition, and swap priorities are also available.
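Concretely, a sketch of both pieces, run as root (paths and sizes are illustrative):

```shell
# Create a disk-backed swap file and enable it with an explicit priority.
fallocate -l 4G /swapfile    # or dd if the filesystem lacks fallocate support
chmod 600 /swapfile
mkswap /swapfile
swapon -p 10 /swapfile       # lower priority: used only after faster swap fills
```

To make it persistent, the corresponding /etc/fstab entry carries the priority in the options field:

```shell
/swapfile  none  swap  sw,pri=10  0  0
```

The kernel fills higher-priority swap areas first, so a fast device (zram, SSD) can sit in front of a slower swap file.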
I would place more trust in something that got rid of the VFS layer and simply allowed VRAM to be used directly as a second tier below RAM, using the transcendent memory model.
Is that what's called shared GPU memory? Can it be adjusted in WinNT/Linux? Some recent console game ports need/want 3+ GB of VRAM. Upgrading VRAM is impossible, while upgrading RAM is easy and cheap(er).