That's only viable if the quality of the outputs can be graded automatically and reliably. GP's case sounds like one where that's probably possible, but for lots of specific tasks it isn't feasible, including the other ones he names:
> write poetry, give me advice on cooking, or translate to German