One approach I can think of is to run a distributed document database and use that for storage. I don't need most features of such database products in the foreseeable future, so I fear that they will add operational overhead for not much benefit.
Another approach I can think of is to run my processing nodes against a network file system, and rely on that to do the replication.
Yet another approach I am considering is to use something like CopyCat to implement the file replication in my application code. Is this a good use case for CopyCat?
For your requirement, can you use S3 or something similar?
I am aware of Ceph but have not tried installing it to see how easy/hard it is to setup. Also, although this is not a hard requirement, I'd like to be able to support Windows; from what I have read so far, Ceph does not support Windows.