How do the operations powering an app like Pintrest evolve from simple to so complex? Do the complexities emerge from necessity, or simply from idle time on behalf of the engineers, who naturally crave hard problems to solve?
It's a fascinating meta-commentary on our industry that simple web apps grow to become such complex operations. A business can survive on the kernel of its core competency -- in this case, photo grids -- but to thrive, it requires careful attention to petabytes of peripheral decisions. Indeed, it seems such an evolutionary process is advantageous for a web startup. Friendster and MySpace may well have failed because they mistook their problems for simple ones. They were able to solve the core problem of a social network, but not the many peripheral ones of operating that network at scale. It's the ability to do the latter that sets apart the major successful startups from the also-rans.
Is this a kind of workflow that would run with pinball? Can you move files around with it, or do you use the file system and pass filepaths around? Ideally, the workflow job would hold onto the wav/mp3 and the associated database fields that are returned so I don't have to juggle weird directories around (and have to sync access to them).
I'm not familiar with any other workflow engines, so I'm unsure if this is the kind of thing that would traditionally run on one. I looked at the user guide but it's currently barren.
job1. generate a wav file, and put it somewhere say, s3://wav.file
job2 (run after job1): pick the wav file from the location s3://wav.file
you need to know the contract between the parent and child jobs from the business logic. In this example, when you implement job 1 and job 2, you need to have protocol for them to produce store and consume the wav.file..
I see there are plans to write up some documentation, but are there any timelines that you're aiming to have those written?
Also, the README calls out mysql as being required. I assume that this, being a django project, will work with other backends too. Is there anything, to your knowledge, that would prevent a different backend being used (like postgres or oracle)?
It also has few dependencies and is lightweight (i.e. it's all python, so no JVM tying up resources).
I think my favorite non-obvious aspect is how it allows you to write each component in a different language.
We do compare Pinball with Apache oozie and azkaban when we start this project.