This is leading to rapid progress in clustered/distributed filesystems; OrangeFS [1] is even built into the Linux kernel now. There are also commercial companies like Avere [2] that make filers running on object storage, with sophisticated caching to provide a fast, networked, yet durable filesystem.
Kubernetes is also changing the game with container-native storage. This seems to be the most promising model for the future: K8S takes care of orchestrating all the complexities of replicas and stateful containers, while storage is just another container-based service using whatever volumes are available to the nodes underneath. Portworx [3] is the leading commercial option today, with Rook and OpenEBS [4] catching up quickly.
https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec...
Twenty years ago, software was hosted on fragile single-node servers with fragile physical hard disks. Programmers would read and write files directly to and from the disk, and learn the hard way that this left their systems susceptible to corruption if things crashed in the middle of a write. So behold! People began to use relational databases, which offered ACID guarantees and were designed from the ground up to solve that problem.
Now we have a resource (spot instances) whose unreliability is a featured design constraint and OP's advice is to just mount the block storage over the network and everything will be fine?
Here's hoping OP is taking frequent snapshots of their volumes because it sure sounds like data corruption is practically a statistical guarantee if you take OP's advice without considering exactly how state is being saved on that EBS volume.
A spot instance interruption isn't a system crash, it's a shutdown signal. Storing your important spot instance data on EBS is recommended by AWS. If your application can't handle a normal system shutdown without losing data, your application is at fault, not your system setup.
>exactly how state is being saved on that EBS volume
Files are written to a filesystem, which is cleanly unmounted at shutdown when an interruption happens.
Unless something in the system shutdown fails to give the application what it needs (for instance, time) to shut down cleanly. Which is entirely possible, considering that Amazon sells you the spot instance on the explicit understanding that it can hand the hardware to somebody willing to pay more at any time. Nowhere in its documentation does Amazon guarantee the time needed for a clean shutdown (only that a two-minute warning will be available via its proprietary mechanism, if you architect your application to monitor for it), and you would be ill-advised not to architect for that.
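That two-minute warning surfaces through the instance metadata service, so a shutdown agent has to poll for it. A minimal shell sketch, assuming the documented `spot/instance-action` metadata path (the poweroff action in the comment is just an example, not a prescribed handler):

```shell
# Poll the EC2 instance metadata service for a spot interruption
# notice. The endpoint returns 404 until AWS issues the two-minute
# warning, then 200 with a small JSON body describing the action.
METADATA_URL="http://169.254.169.254/latest/meta-data/spot/instance-action"

# Returns 0 (true) when an interruption notice is present.
interruption_pending() {
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$METADATA_URL")
  [ "$status" = "200" ]
}

# Example driver: check every 5 seconds, then shut down cleanly.
# while ! interruption_pending; do sleep 5; done
# systemctl poweroff
```

Whether two minutes is enough for your daemons to flush and unmount is exactly the question the parent raises.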
> Storing your important spot instance data on EBS is recommended by AWS
Because EBS itself is reasonably reliable. If you have configuration data (e.g. in /etc) for a legacy application that isn't managed, it's reasonable to keep that data on EBS, since it's rarely written to and writes are generally human-initiated and human-monitored (with operations policy possibly mandating a snapshot before any changes are made).
That's still very different from daemon writes to /var. Take, for instance, the PostgreSQL documentation, which warns that a snapshot must include the WAL files in order to be recoverable, and that restoring is quite difficult if your WAL files live on a different mount: https://www.postgresql.org/docs/10/static/backup-file.html
You need to understand precisely how your application is treating your storage and act accordingly. Thinking that all applications interact with storage the same way is dangerous and liable to cause data corruption and loss. That's all.
You're assuming that people are saving their state in databases to begin with. If you're saving state to a database in production, typically you're communicating with that database over a network connection, and not running the database on the same machine as your application. Containerizing databases is a whole separate issue.
OP's specific example is saving /var/opt/gitlab to an EBS volume and expecting to be able to move it from one spot instance to another without corruption. That strikes me as insane.
- Yes, you get a notification, but it's a proprietary notification scheme that your application must be designed to poll for. Why can't Amazon use standard signals like SIGPWR to indicate imminent shutdown?
- Just because it isn't smart for non-spot instances doesn't suddenly make it smart for spot instances ;)
https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec...
ec2-spotter classic uses this, but you can also make a pivoting AMI of your favourite Linux distribution.
One thing to watch out for is keeping the OS's automatic kernel updates working. AMIs are rarely updated, and you're going to have a "damn vulnerable Linux" if you don't apply updates just after booting a new image.
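One way to paper over a stale AMI is a boot-time patch step. A sketch as a user-data fragment (Debian/Ubuntu flavoured; the reboot-required marker file is Ubuntu-specific, and unattended-upgrades is the longer-term fix):

```shell
#!/bin/bash
# user-data sketch: bring a stale AMI up to date right after first boot.
apt-get update
DEBIAN_FRONTEND=noninteractive apt-get -y upgrade
# Ubuntu drops this marker when an update (e.g. a new kernel) needs a reboot.
[ -f /var/run/reboot-required ] && reboot
```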
1) https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec...
I suppose it's a decent solution if you don't want to deal with prefixes.
* https://github.com/sevagh/goat (my own)
* https://github.com/UKHomeOffice/smilodon
This solution looks good, yet only applies to single-instance scenarios. I presume this kind of thinking might move forward with EFS + chroot for an actually scalable solution that cannot be run on Elastic Beanstalk.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-inte...
Learn something new every day. :)
https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termi...
Personally, because my needs aren't constant. I might need two cores for two months followed by 100 cores for a week.
I would look at providers like OVH, or even cheaper ones (Treudler, TransIP, RamNode, etc.). For example, an SSD VPS with 2 vCPUs, 8 GB RAM and 40 GB SSD is $13.49 per month from OVH.
(PS: Don’t use DigitalOcean; they tend to steal your credit if they feel like it. Lost 100 bucks of "promotional credit" that way, with only a few days' notice.)
With a few on-boot scripts to attach-volumes / start-containers, it should be fairly easy to get going as well.
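For the attach-volumes part, such a boot script can look roughly like this (the volume ID, device name, and container name are all hypothetical; assumes the AWS CLI is installed and the instance role allows AttachVolume):

```shell
#!/bin/bash
# Boot sketch: attach a known EBS volume, mount it, start the container
# that uses it. All identifiers below are placeholders.
VOLUME_ID="vol-0123456789abcdef0"   # hypothetical
DEVICE="/dev/xvdf"                  # hypothetical
MOUNTPOINT="/data"

INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

aws ec2 attach-volume --volume-id "$VOLUME_ID" \
  --instance-id "$INSTANCE_ID" --device "$DEVICE"
# Block until the volume is actually attached before mounting.
aws ec2 wait volume-in-use --volume-ids "$VOLUME_ID"

mkdir -p "$MOUNTPOINT"
mount "$DEVICE" "$MOUNTPOINT"
docker start my-stateful-service    # hypothetical container name
```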
[1] https://engineering.semantics3.com/the-instance-is-dead-long...
Edit: or use AWS EFS
And since it's shared, you don't need to replicate data across multiple nodes... so if 10 compute nodes needs access to the data set, they can all just read it from the same EFS filesystem, no need to download it 10 times to each compute node.
So EFS can still be very cost effective compared to EBS.
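Mounting the same EFS filesystem from every node is just an NFSv4.1 mount. A sketch with a hypothetical filesystem ID and region, using the mount options AWS's documentation suggests:

```shell
# Run on each compute node; fs-12345678 and us-east-1 are placeholders.
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 \
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
```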
A positive thing with EFS is that it can be shared across AZ while EBS needs to be snapshotted and then imported to the other AZ.
Attaching and detaching volumes is a good idea, but I wouldn't use that to keep state.
You will get a lot of benefit out of it, but may lose some performance, which is fine in 99% of cases.