AB-testing is much better than doing nothing.
But you have to be careful with any powerful tool, in case its success blinds you to its weaknesses. "When all you have is a hammer, everything starts to look like a nail". We think that's happening with AB-testing at the moment.
We think Wikipedia is awesome, and would love them to get more donations by using a more sophisticated approach.
Yes, most companies doing AB-testing could benefit, provided they have the ability to personalise their user experience (i.e. they aren't just trying to quickly find the single best UI).
However, Wikipedia is a really good example to start with. Take the phrase 'Wikipedia needs those nickels': it resonates well with US donors and will probably work in Canada, but what about the UK? Australia?
It's obvious once it's pointed out - but wouldn't it be better if the system automatically realised this? And considered all the combinations? That's our point.
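
To make the idea of a system that "automatically realises this" a bit more concrete, here is a minimal sketch of one possible approach: a Thompson-sampling bandit that keeps separate statistics per visitor segment. It is not a description of any particular product, and the class name `SegmentedBandit`, the variant labels and the segment keys are all made up for illustration.

```python
import random
from collections import defaultdict


class SegmentedBandit:
    """Thompson-sampling bandit with separate counts per segment (illustrative only)."""

    def __init__(self, variants, seed=None):
        self.variants = list(variants)
        self.rng = random.Random(seed)
        # (segment, variant) -> [donations, non-donations]
        self.counts = defaultdict(lambda: [0, 0])

    def choose(self, segment):
        """Pick a variant to show by sampling each variant's Beta posterior."""
        def draw(variant):
            wins, losses = self.counts[(segment, variant)]
            # Beta(1, 1) prior: every variant starts out equally plausible.
            return self.rng.betavariate(wins + 1, losses + 1)
        return max(self.variants, key=draw)

    def record(self, segment, variant, donated):
        """Update the counts once we know whether the visitor donated."""
        self.counts[(segment, variant)][0 if donated else 1] += 1

    def best(self, segment):
        """Variant with the highest posterior mean donation rate for this segment."""
        def mean(variant):
            wins, losses = self.counts[(segment, variant)]
            return (wins + 1) / (wins + losses + 2)
        return max(self.variants, key=mean)


# Hypothetical usage: the segment here is just a country code.
bandit = SegmentedBandit(["needs those nickels", "needs your spare change"], seed=0)
shown = bandit.choose("UK")
bandit.record("UK", shown, donated=False)
```

The segment can be any hashable value, so a tuple such as `("UK", "mobile")` would let the same sketch track combinations rather than a single attribute. A real system would also need some way to share evidence between similar segments, which this sketch does not attempt.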