These days RAM is cheap and SSD storage is also widely available. For a very long time, one of my side projects with 50K users was hosted on an EC2 small instance. With that out of the way, here are a few things you will need to take care of:
* Security (especially passwords) - Rails should take care of most of this for you, but you should ensure that you patch vulnerabilities when they are discovered. Also, stuff like having only key-based login to your servers etc.
* Backups - Take regular backups of all user data. It's also VERY important that you actually try restoring the data as well, as it's quite possible that backups are not occurring properly.
* One click deployment - Use Capistrano or Fabric to automate your deployments.
* A good feedback/support system - this could even be email to begin with (depending on the volume you expect), but it should be accessible.
* Unit tests - as your app grows in complexity, you will never be able to test all the features manually. I'm not a big fan of test driven development, but really, start writing unit tests as soon as you have validated your product idea.
* Alerts, monitoring and handling downtime - Downtimes are inevitable. Your host or DNS could go down, you might run out of disk space, etc. Use something like Pingdom to alert you of such failures.
* Logging, logging, logging - I can't stress this enough. When things break, logging is crucial in piecing together what happened. Use log rotation to archive old logs so they don't hog the disk space.
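For the log rotation point in the list above, here is a minimal sketch using Python's standard library rather than anything Rails-specific. The function name, logger name, size limit and backup count are all made up for illustration; on a real server you might just as well let logrotate handle this.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(name, path, max_bytes=1_000_000, backups=5):
    """Return a logger that rolls `path` over once it reaches `max_bytes`.

    The name, limits and format are illustrative assumptions, not
    recommendations from the thread.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep entries out of the root logger
    if not logger.handlers:   # avoid stacking handlers on repeated calls
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With `backups=5` the handler keeps `app.log` plus `app.log.1` through `app.log.5` and discards anything older, so the disk footprint stays bounded.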
The part about testing your backups is huge. I can't count how many projects I've been on where we needed to restore, went looking, and found any number of problems: backups that actually stopped last month when we ran out of space, backups that only covered these 3 DBs and not the one you want, things like that. I'd also stress the importance of off-site backups. If you're using AWS for everything and your account is compromised, can the attacker delete your backups (assuming they have full, 100% unlimited, admin access to AWS)?
Which is also why if you're using stuff like AWS, Heroku, or any other third party provider (hosted Mongo, hosted ElasticSearch, Stripe, NewRelic, etc.) it's very important to ensure those passwords are secured and only the people absolutely necessary have access. Also, when offered, two-factor authentication should always be used.
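On the "backups actually stopped last month" failure mode mentioned above: a cheap first line of defence is a scheduled check that the newest backup file is recent and not suspiciously small. This is only a sketch. The directory layout, the 26-hour threshold and the function name are assumptions, and it does not replace actually restoring a backup now and then.

```python
import os
import time


def check_latest_backup(backup_dir, max_age_hours=26, min_bytes=1):
    """Return a list of problems found with the newest file in backup_dir.

    An empty list means the newest backup looks plausible; it says nothing
    about whether the backup can actually be restored.
    """
    paths = [os.path.join(backup_dir, name) for name in os.listdir(backup_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    if not files:
        return ["no backup files found"]

    newest = max(files, key=os.path.getmtime)
    problems = []

    age_hours = (time.time() - os.path.getmtime(newest)) / 3600
    if age_hours > max_age_hours:
        problems.append(f"newest backup is {age_hours:.0f}h old")

    if os.path.getsize(newest) < min_bytes:
        problems.append("newest backup is smaller than expected")

    return problems
```

Wire the result into whatever alerting you already have, so a non-empty list pages someone instead of sitting in a log nobody reads.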
Depending on the service you're building, you can log too much. Consider the privacy and security implications of the existence of those logs; anything you log can be subpoenaed, but logs that don't exist cannot be.
Consider anonymizing your logs from day 1, and only turning on non-anonymous logging upon a report from a user. Alternatively, give users a "report a problem" button, and save their last N minutes of otherwise-ephemeral logs only when they hit that button.
You absolutely want to log enough to help you debug the service, but do you really need to archive old logs, or should you delete them entirely?
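To make the "anonymize from day 1" idea a bit more concrete, here is one possible sketch: truncate IP addresses and replace user ids with a keyed hash before anything is written to the log. The prefix lengths (/24 and /48), the 16-character digest and both function names are my own assumptions, and whether this is anonymous enough depends on your threat model.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(ip):
    """Zero the host part of an address: keep /24 for IPv4, /48 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def pseudonymize_user(user_id, secret_key):
    """Replace a user id with a keyed hash.

    Entries for the same user can still be correlated, but the id cannot
    be recovered without `secret_key` (bytes), which should live outside
    the log store.
    """
    digest = hmac.new(secret_key, str(user_id).encode(), hashlib.sha256)
    return digest.hexdigest()[:16]
```

For example, `anonymize_ip("203.0.113.77")` gives `"203.0.113.0"`. Rotating `secret_key` periodically also breaks long-term correlation between old and new log entries.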
+1 You can't log too much. The user who claims an important email never arrived - does your system say it was sent? This bug 3 users have reported yet no one can reproduce - what were they doing at the time and what else was going on?
No, I'm not at that stage yet (of effectively being able to rewind application state in the log files to see what was going on), but for debugging issues in production it's exceedingly useful.
How do most people manage activity logs? Currently what we have set up is the user id (if the user is logged in), IP address, URL they hit, user agent, and timestamp are all inserted into an activity logs table. For one particular site with an API that's being polled the size of the DB grew pretty large.
There is no easier way to offload, view, filter, alert and search than logentries:
For an easy and simple solution, spin up a second instance and send logs to it via rsyslog over a private network interface. Most mature frameworks provide a method to send logs over syslog. It's UDP and very lightweight. Another plus: if you are compromised, you have another server with your logs and that server isn't running your vulnerable app.
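As a rough illustration of the "send logs to a second box over syslog" setup, this is what the application side can look like with Python's standard library. The host, port and logger name are placeholders; the receiving side (rsyslog listening on UDP on the private interface) is assumed to be configured separately.

```python
import logging
from logging.handlers import SysLogHandler


def build_syslog_logger(host, port=514, name="webapp"):
    """Return a logger that ships records to a remote syslog daemon.

    SysLogHandler uses UDP unless told otherwise, so sends are
    fire-and-forget: lightweight, but messages can be dropped silently.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Usage would be something like `log = build_syslog_logger("10.0.0.2")` followed by `log.info("user signed up")`, where `10.0.0.2` stands in for the log server's private address.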
I often found myself falling into the "I'm not using PHP, so I don't have to worry about any security holes" trap. CSRF is something you really need to watch out for if you are constructing forms manually!
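If you do end up building forms by hand, the usual defence is a per-session token that the server can verify on submit. Below is a minimal sketch of one way to do that with an HMAC; the token format and function names are invented for illustration, and if your framework ships CSRF protection you should use that instead of rolling your own.

```python
import hashlib
import hmac
import secrets


def _sign(session_id, nonce, secret_key):
    payload = f"{session_id}:{nonce}".encode()
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def issue_csrf_token(session_id, secret_key):
    """Create a token bound to one session; embed it in a hidden form field."""
    nonce = secrets.token_hex(16)
    return f"{nonce}.{_sign(session_id, nonce, secret_key)}"


def verify_csrf_token(token, session_id, secret_key):
    """Return True only if the token was issued for this session."""
    nonce, sep, mac = token.partition(".")
    if not sep:
        return False
    expected = _sign(session_id, nonce, secret_key)
    # compare_digest avoids leaking how many characters matched via timing
    return hmac.compare_digest(mac.encode(), expected.encode())
```

The server rejects any state-changing request whose token fails `verify_csrf_token`, which a third-party site cannot forge without `secret_key`.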
The technical stuff will be pretty much trivial. Any decently constructed app on any decent framework (Rails, etc) on any decent host (AWS, DO) would be able to handle a 10k user app (probably maxing out at 1% online at same time) without breaking a sweat.
And you will have plenty of time to build out the tech because it will probably take you many months to get to even a few thousand users (depending on what kind of app it is, of course).
Ironically it's a completely different story on the consumer market. I bought my two 4GB sticks about 2 years ago and now they cost twice as much. 4GB can cost you $40 which is not cheap at all.
That's totally negligible on the cost of the rest of the machine. Imagine $160 will get you 16G!! That's an absolutely enormous amount of memory, most power users would be more than satisfied with that.
Not all that long ago that amount of RAM would cost more than a brand new mid sized car.
10K user records is not the issue. It's dealing with the humans who use the app on a day to day basis.
Typically getting only a small fraction of your user base to be active in the app is pretty challenging - if you can acquire them in the first place.
That said, having even a few hundred active users can tip the scales in terms of what is manageable, depending on what the app does and whether they're paying money or not. Customer support can be a full-time job or worse. In the early days your users will discover every bug and problem imaginable.
Biggest mistake I ever made was scaling up an active user base on a free product without a revenue model. Twice I managed to hit a sweet spot in acquiring active users but because I couldn't leverage the scale to achieve anything other than more work for myself, I burned out and it collapsed very quickly. If you make more money as you grow, you can afford to invest in delegating responsibilities or at least justify it. Otherwise you've got a very stressful hobby on your hands.
Quick add-on edit:
If you're launching a web app for the first time, the biggest takeaway you should get from the comments on this thread is anticipate that customer support will be a major challenge.
One of the best ways to prevent a flood of CS inquiries is aggressive logging and alerts to squash bugs or outages before they inconvenience too many users. Lots of great comments in here cover that point, so take notes.
I have zero costs though, since it's serverless and all hosted on Google and Github's infrastructure.
For privacy reasons I have made the deliberate choice to not include any log reporting tools or use any tracking services that could possibly make development a whole lot easier.
Thankfully I get some support from a couple of other people that mostly help with small dev tweaks and supporting users, finding out what goes wrong and providing excellent feedback.
I can really recommend setting up a subreddit so that people can help themselves and others, and you can provide some feedback where needed.
All in all the load of user support has been quite easily manageable, though now with Trakt.TV's debacle with the new API there is definite pressure to fix things.
However I’ll add that it’s my experience that free customers are by far the worst for customer support, and that as such rwhitman probably got burned disproportionately. A percentage of free customers demand straight up magic, and get loud in public if you don’t deliver. People who are paying, particularly paying significant amounts of money, expect their money to be well spent and results in proportion with what they pay. I.e., they tend to be reasonable.
Still - if you were not expecting support to be a major part of this experience - you should now.
My service has a lot of moving parts, all of which are distributed among a couple dozen different servers. Keeping the technical infrastructure running smoothly requires a lot of data visualization of server stats, database stats, web request stats, worker stats, user stats, etc. I have everything piped into a nice dashboard so we can see if there is anything odd happening at a glance. When things break (and they will) you need to know where to look first.
Having 5k users also requires time to help them with support issues. Users generate a lot of bug reports, questions, and suggestions. To keep paying users happy, I offer a 1-day response time on support issues, which requires me to spend quite a bit of time sending emails.
Then, of course, if you want to grow the app, you need to spend time marketing it. We could talk for hours about this.
The list goes on and on. Feel free to shoot me an email (email in my profile) if you want to talk specifics about anything.
Shoot me an email (email in profile) if you want to talk specifics.
How's the distribution of traffic? Do people use it spread out over the month or mainly within the last or first days of the month? Do they use it on work days or throughout the week? Are they from different time zones?
What do they do? Is there a lot of write activity or is it mainly read? Is the read stuff cacheable between users or is it highly individualised? Etc. etc. etc.
With reasonably "low level" tooling such as Java/Clojure/Haskell/whatever and a properly configured Postgres instance you should be able to go quite far. You're very unlikely to be CPU-challenged in the web app (again, no idea what your web app is going to be doing, so it's just a guess), most of the memory and CPU will be consumed by your database server caching and running queries. You should be able to handle a good 500-1000 db transactions / sec without much hassle.
IMHO most of the challenge will be making something that 10k people will want to use daily, not actually being able to scale to that many users.
That server runs happily as a single servo on http://modulus.io with absolutely no need for intervention on my part.
The rest of the application has similar requirements. I have one micro-equivalent server running the front-end, one the api and one the thumbnail generation. In general, this requires no hand-holding by me.
If your site is not processing or memory-intensive it should be feasible to scale to 10k users with a single $5/month instance on DigitalOcean or an equivalent level server on Heroku or Modulus or GCE.
Good luck attracting your first 5k users!
The main stress on the system is really determined by the complexity of the SQL queries on each page. I've spent a great deal of time optimizing them, and I know there are certain ones that need to be further optimized. I have the database (MySQL) on one server, the web server and documents on another, and static resources such as images on a third, which probably isn't even necessary. All three servers run Linux and the database server has 48GB of RAM. They're hardly new; you could buy all of this equipment today for under $1,000 total.
The biggest technical bottleneck is really RAM; the biggest expense for this kind of site is bandwidth.
I have http://ficwad.com/ sitting around, with Google Analytics telling me it gets daily users in the upper end of that range. It runs on the cheapest plan webfaction offers (and I'm making it even cheaper with some affiliate credit...). The only place where it's running into issues is email, which I had to write a little queue system to throttle the sending to keep it under the plan's daily limits while still making sure that the important messages go out first.
I could make it fancier and put it on pricier hosting if I bothered to monetize it in any way.
And it took me nearly 4 years to get that many users. We can’t all grow like Facebook!
The second problem is motivation: after a certain amount of time, it becomes far less fun and much more of a burden, at which point you have to decide if you'll power through, give up, or quit totally.
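The email throttling queue described a few lines up (stay under the plan's daily limit, important messages first) could look roughly like this. It is a sketch of the general idea, not the ficwad implementation; the class name, the priority convention (lower number = more important) and the `send` callback are all assumptions.

```python
import heapq
import itertools


class ThrottledMailQueue:
    """Priority queue that stops sending once a daily quota is used up."""

    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.sent_today = 0
        self._heap = []
        # tie-breaker so equal priorities go out in the order they arrived
        self._counter = itertools.count()

    def enqueue(self, message, priority):
        """Queue a message; lower `priority` numbers are sent first."""
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def start_new_day(self):
        """Reset the quota, e.g. from a daily cron job."""
        self.sent_today = 0

    def drain(self, send):
        """Send queued messages until the queue or the quota is exhausted."""
        sent = 0
        while self._heap and self.sent_today < self.daily_limit:
            _, _, message = heapq.heappop(self._heap)
            send(message)
            self.sent_today += 1
            sent += 1
        return sent
```

Password resets would be enqueued with priority 0 and newsletters with something like 5, so when the quota runs out it is the newsletters that wait until tomorrow.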
The rest is just a software/hardware problem, and easily dealt with when needed.
As for the load, it's not that busy, but not that quiet for what it is (http://stats.thisaintnews.com), and it runs on a cheap server from http://www.kimsufi.com/uk/ with a Xeon(R) CPU E3-1225 V2 @ 3.20GHz, 16GB RAM, 2x 1TB HDD, unlimited bandwidth and a 1Gbps link. It only costs about £25/month iirc.
- A reliable hosting environment. I currently have a Linode VPS (basic $20 package, with $5 monthly backups) that runs http://sleepyti.me, my personal web site, an IRC server, a Mumble server, and a bunch of other stuff -- it's not even close to being maxed out resource wise, even with all the constant traffic the site is getting. It's important to remember that consistent network connectivity is a really important aspect here: a 30-minute downtime during peak hours can easily lose a lot of users. I'd say Linode is great, and I'm very happy with their service, but I also host several Sinatra web applications on a Digital Ocean VPS that only costs $5 per month (although I do my own backups, rather than using their service). I've noticed zero load-related performance impacts. Clearly, though, there is a limit to how far that can scale.
- A production web server. This probably goes without saying, but a lot of webapp developers are used to just working on their own dev environment. For my apps, I use nginx (and thin, when necessary).
- Security. Make sure that you have the basics of application security covered in your app itself. OWASP produces some pretty great "cheat sheets" that can help out in this area. Furthermore, make sure that your server is updated frequently, using SSL correctly, etc. I work in information security -- please believe me when I say that getting hacked is not something you want to deal with when you're trying to grow.
Hope this helps, and good luck with launching your apps!
Firstly, the load of a web app is going to be dictated by what the app actually does.
Also, 5k-10k users should be clarified as to whether you mean total users or concurrent users. Capacity testing can be tricky because you have to figure out how the number of users translates into actual hits on your servers.
As an example, we have nearly 50k accounts but on average only a few hundred are using the service at the exact same time. I would guess that our app is fairly complex compared to the average app. We run 3 app servers, 1 DB master, 1 DB slave, and 2 cache servers. Our monthly hosting bill is around $1,200.
2. Do not invest time and/or money in learning another programming language or framework until you are sure that for a specific component of your product, programming language X will perform at least 2 times better with 2 times less HW resources.
3. Stressing again on the app stack (I saw some really pushy comments on changing the programming language), it is rarely the bottleneck of a web app. You'll scale your storage stack way earlier and more often than the app stack.
4. Know your data. That's how you decide if it's better to use a RDBMS, document store, k/v store, graph database etc. Like I said before, you're going to scale your data before any other layer becomes a problem so choosing the right data storage solution is crucial. Don't be afraid to test various storage solutions. They usually have good-to-great documentation and Ruby tends to be a good friend to every technology. There's a gem for everything. :)
5. Scale proportionate to your business/product growth. You will have to scale at some point. But be careful to scale proportionate to your growth. For example, if the number of users will double, get the hardware that suffices that growth. Less HW resources will lead to a slower user experience thus user dissatisfaction. More HW resources than needed will increase your costs and the resources that are not needed will stay unused. Why waste money?!
These are my 2c. As your business gets bigger - I hope it does - other problems will occur. But usually these things will last up to 100k users.
Disclaimer: this is for a generic web app as you didn't give us any details. Depending on the app, some of my points might be inaccurate or invalid.
Heroku: $279 - 400 / month
If you back out the numbers, they go something like this: eight hour work day, worst hour has 25% of the user base actively logged in (we'll assume it is a very sticky app), 10 significant actions per hour implies 25k or so HTTP requests which actually hit Rails, which is less than 8 requests a second. You can, trivially, serve that off of a VPS with ~2 GB of RAM and still have enough capacity to tolerate spikes/growth.
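Spelled out as code, the back-of-the-envelope estimate above looks like this. The 10,000 total users is the figure from the question being discussed; the function and parameter names are just labels I picked.

```python
def peak_requests_per_second(total_users, peak_active_fraction, actions_per_user_hour):
    """Estimate peak request rate from user count and activity assumptions."""
    peak_users = total_users * peak_active_fraction        # 10,000 * 0.25 = 2,500
    requests_per_hour = peak_users * actions_per_user_hour  # 2,500 * 10 = 25,000
    return requests_per_hour / 3600                         # seconds in an hour


# 10k users, 25% active in the worst hour, 10 significant actions per hour
rps = peak_requests_per_second(10_000, 0.25, 10)
print(f"{rps:.1f} requests/second")  # about 6.9, i.e. under 8
```

Changing the inputs shows how forgiving the numbers are: even doubling both the active fraction and the actions per hour only gets you to roughly 28 requests a second.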
Let's talk about the more interesting aspects of this question, which aren't mostly about capacity planning:
Monitoring: Depending on what you're doing, at some point between 0 users and 10k users, the app failing for long periods of time starts to seriously ruin peoples' days. Principally, yours. Depending on what you're doing, "long periods" can be anything from "hours" in the general case to "tens of minutes" for reasonably mission-critical B2B SaaS used in an office to "seconds" for something which could e.g. disable a customer's website if it is down (e.g. malfunctioning analytics software).
I run a business where 15 seconds of downtime means a suite of automated and semi-automated systems go into red alert mode and my phone starts blowing up. I don't do this because I love getting woken up at 4 AM in the morning, but because I hate checking my inbox at 9:30 AM in the morning and realizing that I've severely inconvenienced several hundred people.
You're going to want to build/borrow/buy sufficient reliability for whatever problem domain it is you're addressing. I wouldn't advise doing anything which requires Google-level ops skills for your first rodeo. (There is a lot to be said for making one's first business something like a WordPress plugin or ebook or whatnot where your site being down doesn't inconvenience existing customers. That way, unexpected technical issues or a SSL certificate expiring or hosting problems or what have you only cost you a fraction of a day's sales. Early on that is likely negligible. When an outage can both cost you new sales/signups and also be an emergency for 100% of your existing customer base, you have to seriously up your game with regards to reliability.)
Customer support:
Again, depending on exactly what you're doing, you will fail well in advance of your server failing on the road from 0 to 10k users. Immature apps tend to have worse support burdens than mature apps, for all the obvious reasons, and us geeks often make choices which pessimize for the ease of doing customer support.
My first business produced a tolerable rate of support requests, particularly as I got better about eliminating the things which were causing them, but I eventually burned out on it. I have a pretty good idea of what my second one would look like if it had 10k customers -- that would imply on the order of 500 tickets a day, 100+ of them requiring 20 minutes or more of remediation time. This would not be sustainable as a solo founder. (Then again, if that business had 10k customers, revenues would presumably be in the tens of millions, so I'd have some options at that point. There are many businesses which would not be able to support a dedicated CS team on only 10k customers, like e.g. many apps businesses, so you'd have to spend substantial brainsweat on making sure the per-customer support burden matched your unit economics.)
The biggest issue: selling 10,000 accounts of a SaaS app is really freaking hard.
What do you mean by this? Doesn't this depend on the usage patterns? Do you have 5-10k concurrent users or are these users spread over a day?
The operations side is a whole other profession that dovetails into the technical aspects of getting it running on a good architecture.
As others have mentioned, several times that many users can be hosted on an EC2 small instance. I suggest you start there. When moving to production, a bigger challenge is security, both in terms of intrusion and data protection. Make sure you have a good rollback feature built into your rollout regime, because mistakes can be fatal with real users. If you're using something basic like Heroku or EC2, you can scale way beyond that user count with the click of a button. Scaling up would be the least of your worries, at least for a few weeks.
If you're unsure, go with Heroku. Once you understand your system use, you can very easily switch to AWS and reduce costs.
1. your architecture must allow for vertical scaling. this means upgrading your hardware to beefier, stronger, faster machines with more CPU power and more memory. vertical scaling is often a very cost-effective way of improving performance.
2. your architecture must allow for horizontal scaling. this means being able to provision and deploy new instances of your application servers very easily, using an automated process. more servers running in parallel is a very effective way of handling increased load.
3. you must be able to monitor and protect your systems. https everywhere. highly secure passwords everywhere, and you should rotate your passwords on a regular basis. log everything and set up services to monitor your logs and notify you when weird/bad shit happens.
Good database mechanics is key. That is the most important thing in my opinion. That is really the whole point in rails when you are deciding relationships. The abstraction in Rails when deciding what should be the best model structure is the same thing as deciding what should be the best and most efficient table structure in your database.
The rave about MongoDB is that it (maybe don't quote me on this) "cures" the need for a multi-dimensional database. However, even with MongoDB's ability to expand, due to its not needing a pre-defined structure and its ability to expand out dimensionally to a certain extent, PostgreSQL (it claims, anyway) is still more efficient if you correctly index your tables (think about how you will be querying) and create the correct relationships. Build out models. Allow flexibility.
Also, don't forget caching. Redis and performing jobs is key in certain situations. However, don't get caught up in too much hype. Especially those coming from closed source technologies (Not just talking about caching technologies here but everything in general). They will sell and produce an atmosphere of necessity, but do some research first. Don't follow the herd. I am not going to call anyone out on this. Just do the research and think why is that necessary. I've mentioned Redis a few times and maybe that isn't even necessary either.
Most importantly, put your stuff out there. If it crashes, so what! At least you know you have something. And then you will have people who will give you advice in a coherent direction if necessary.
I salute you in your efforts. Now the most important part is put it out there and kick ass!
You may even want to look into Redis which is a cache system.
Eventually you will want to maybe bring on other developers. Have a good private repo on Github. That is a choice of course. This way you can choose what to merge into your live branches. You just need to find a good groove. Git is beautiful. It really does help in organization of your development process.
And analytics. I'm a bit absent on this. The reason why is that you are a developer. You will eventually know what you want to know and even if you need to bring on another developer due to time constraints and such, you will know how to develop out personalized analytics that you want out of your app. I mean it is almost evolution. You build an app. Then you want to know what's up. You know the app the best so you will develop logging and analytical methods to provide you with what you want to know. So again. Unless you are looking at a gem that will help you with visualizing what you are going to develop on the analytical side of things, don't get overwhelmed. You know best. You built it. It's beautiful, and if your app is really a hit, you will plug and chug and make it work.
"Scaling problems rule . . . "
http://www.youtube.com/watch?v=0CDXJ6bMkMY&t=23m34s
Sounds like you have the skills to grow this up to 500 to 1k paying customers on your own. Early on you can increase ram/cpu to handle any initial scaling issues.
Once you even have 1k customers you'll have revenue to hire experts to help you with scaling and security.
Good luck in 2015.
Getting the users and keeping your head in the game is the hardest part.
However, the biggest piece to scaling your application is the automation of everything you possibly can so that you can scale when you need to. You're going to be in a bit of pain if you need to scale everything manually.
Here's a few things I automate using Jenkins:
* Creation of web application servers (whether it be Puppet, Chef or Ansible, etc.). Make sure you can bring up a new node quickly and scale your app layer horizontally. Ideally automate the addition of this node to the LB.
* Data store backup/restores to all staging environments on a schedule (this tests backups/restores). This is done using some custom code and the Backup gem. This way your dev team has access to an env that closely resembles prod and can resolve current prod bugs.
* External security scans using NMap (again using custom scripts). The Jenkins job will fail if the output is not what it expects. This way if we change a layer of our infrastructure we can know if something is exposed that shouldn't be.
* Static code analysis using Brakeman
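For the external scan item in the list above, the core idea is "probe, compare against an allowlist, fail the job on any surprise". The sketch below does NOT use NMap; it is a plain TCP connect check standing in for it, and the function names, port lists and example host are hypothetical.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    found = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising
            if sock.connect_ex((host, port)) == 0:
                found.add(port)
    return found


def unexpected_ports(host, ports_to_probe, allowed):
    """Return open ports that are not on the allowlist, sorted."""
    return sorted(open_ports(host, ports_to_probe) - set(allowed))
```

In a CI job you would call something like `unexpected_ports("app.example.com", [22, 80, 443, 3306, 5432, 6379], allowed=[80, 443])` from outside your network and exit non-zero if the result is not empty. Only scan hosts you own.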
Information you're going to need to scale your infra:
* Metrics on each one of your hosts. Use DataDog if you can afford it, integrates with all major systems and technologies. Great tool.
* Log collection via something like Logstash or Loggly and being able to visualize your application and web logs.
* Application response time measurements using something like NewRelic or building your own using StatsD and tracking the heck out of your application actions
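If you go the "build your own with StatsD" route from the list above, the client side is small: time a block of code and send a `name:value|ms` line over UDP. The sketch below assumes a StatsD daemon on its default port 8125; the metric names and helper names are invented.

```python
import socket
import time
from contextlib import contextmanager


def format_timing(metric, millis):
    """Build a StatsD timer line, e.g. 'checkout.render:12|ms'."""
    return f"{metric}:{millis:.0f}|ms"


@contextmanager
def timed(metric, emit):
    """Time the wrapped block and pass a StatsD timer line to `emit`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        emit(format_timing(metric, elapsed_ms))


def udp_emitter(host="127.0.0.1", port=8125):
    """Return an emit function that sends lines to a StatsD daemon via UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(line):
        sock.sendto(line.encode(), (host, port))

    return emit
```

Usage is `with timed("orders.create", udp_emitter()): ...` around whatever action you want to track. Passing `emit` in explicitly also makes it easy to collect lines in a list during tests.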
Last but not least, have a plan for failure. While you're lying in bed at night, ask yourself these questions:
* What would happen if the DB went poof? Can I restore it? How much data will I lose from the last backup? Will I know when this happens?
* What would happen if you're now being scanned by 30 IPs from the Netherlands, all of which are submitting garbage data into your forms? Are you protected against this? How will the added load affect your app layer? Do you have a way to automate the responses to those requests so as to deny them? This is a case of when, not if. Be ready.
* What would happen if my site gets put on Digg(lol)?
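For the "30 IPs submitting garbage into your forms" scenario in the list above, one common automated response is a per-IP rate limit in front of form handlers. Here is a minimal in-memory sliding-window sketch; the class name and limits are assumptions, and with more than one app server you would need shared state (or do this at the load balancer) instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, client_ip):
        """Record a request and return False if the client is over the limit."""
        now = self.clock()
        hits = self._hits[client_ip]
        # drop timestamps that have fallen out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A form handler would call `limiter.allow(request_ip)` first and return an HTTP 429 when it comes back `False`.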
There's no magic bullet here. It's just practice, failure and learning from your own and others' mistakes.
Good luck!