You're asking deep architecture questions that sound simple but require a lot of consideration.
* choose a front-end language your team is comfortable with, because the front end will need to be written quickly and then maintained
* choose a back-end language for the same reasons
* decide whether you're going with an SPA and start splitting requirements
* start with `ffmpeg` as your transcoder because it's free and easy
* plan for the scale you expect
    * auto-deployment on the cloud for the machines converting video
    * messaging, so the app can add transcoding jobs to a queue, the transcoders can take jobs from it, and job status can be reported back to the user
* research the royalties you'll need to pay for the various types of video (MPEG LA is a good place to start for H.264)
    * do this _early_
* choose your tech based on your team's experience and comfort
    * sometimes you'll need to hire to expand the comfort
* prefer OSS over commercial to start, but be prepared to spend money for support
    * unless you need some specific functionality that is only available commercially
And a hundred other things :)
I've done this a couple of times for companies in the past (Primerica, ConocoPhillips), including transcoding _during_ uploading a couple of years before anyone else (like encoding.com) was doing it (totally bragging, yes).
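
To make the queue-plus-`ffmpeg` idea from the list a bit more concrete, here is a minimal sketch in Python. It is not how the systems mentioned above were built. It assumes `ffmpeg` is on the `PATH` with `libx264` available, and it uses an in-process `queue.Queue` as a stand-in for whatever real message broker you pick. The names `build_ffmpeg_cmd`, `run_ffmpeg`, `submit`, and `worker` are made up for illustration.

```python
import queue
import subprocess


def build_ffmpeg_cmd(src, dst):
    """Build an ffmpeg command that transcodes src to H.264 video and AAC audio."""
    # -y overwrites dst if it already exists; the codec choice is an assumption.
    return ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst]


def run_ffmpeg(cmd):
    """Run ffmpeg and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


def submit(jobs, status, job_id, src, dst):
    """App side: record the job as queued, then put it on the queue."""
    status[job_id] = "queued"
    jobs.put((job_id, src, dst))


def worker(jobs, status, run=run_ffmpeg):
    """Transcoder side: take jobs until a None sentinel arrives, updating status."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, src, dst = job
        status[job_id] = "running"
        try:
            run(build_ffmpeg_cmd(src, dst))
            status[job_id] = "done"
        except Exception as exc:
            # Keep the failure reason so it can be shown to the user.
            status[job_id] = f"failed: {exc}"
        finally:
            jobs.task_done()
```

In a real deployment the `status` dict would probably live in a database the app can read, and each auto-deployed transcoding machine would run its own `worker` against a shared broker. The `run` parameter is there so you can swap in a fake while testing without invoking `ffmpeg`.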