and
http://stackoverflow.com/users/1205140/verve .
Is anyone who works for Amazon listening?
I'm having immense trouble getting concrete answers to my questions about migrating from AMI 2.x + Hadoop 1.x to AMI 3.1.0 + Hadoop 2.4.0. This is particularly upsetting because:
1) Hadoop debug cycles are already long, and Elastic MapReduce debug cycles are even longer.
2) Amazon Hadoop 1.x seems to interact better with S3 than Amazon Hadoop 2.x; for some reason, small-file issues that did not appear in 1.x do appear in 2.x. This is just a shot in the dark, but I believe it has something to do with the mods hinted at in http://stackoverflow.com/questions/16403576/hadoop-number-of-available-map-slots-based-on-cluster-size .
3) Users are encouraged to use the latest AMI, and consequently the Hadoop 2.x line, because they are maintained more actively; however, I find that most of the Elastic MapReduce documentation is a holdover from Hadoop 1.x. This documentation doesn't just need a light touch-up; much of it must be rewritten from scratch.
4) Debugging costs users money, so Amazon has an incentive not to help out as actively. I'm not conspiracy theorizing here -- just noting the obvious.