result = Dir.glob('test?.txt').collect do |file_name|
  File.read(file_name).split(' ').collect do |word|
    word.downcase.tr('.,\'', '')
  end.inject(Hash.new(0)) do |hash, word|
    hash[word] += 1
    hash
  end
end.inject do |all, hash|
  (all.keys + hash.keys).uniq.inject(Hash.new(0)) do |acc, word|
    acc[word] = all[word] + hash[word]
    acc
  end
end
p result
Edit: The Python example in the article is better because it merges hashes in the reduce step, which facilitates parallelisation.

use strict;
use warnings;
use List::Util qw(reduce);
use File::Slurp qw(read_file);
sub word_count {
    reduce { $a->{$b}++; $a } {},
    map { split(/\W+/) }
    map { lc read_file $_ } @_;
}

If you look at the types implicit in his code, you should see that the essence lies not with collect or inject, which are incidental plumbing, but with the nature of the hash and, in particular, the act of generating the intermediate collection for each key. What is called map is actually more of a reverse reduce. The pedagogical emphasis on map and foldr actually belies the true nature and power of the algorithm when it comes to parallelizing.
result = Hash.new(0)
Dir['test?.txt'].each do |file|
  File.read(file).split(' ').each do |word|
    result[word.downcase.tr('.,\'', '')] += 1
  end
end
If you really needed map/reduce here you'd probably want to write it this way:

Dir['test?.txt'].map do |file|
  Hash.new(0).tap do |result|
    File.read(file).split(' ').each do |word|
      result[word.downcase.tr('.,\'', '')] += 1
    end
  end
end.reduce do |result, partial|
  result.merge(partial) { |k, v1, v2| v1 + v2 }
end

http://engineeringblog.yelp.com/2010/10/mrjob-distributed-co...
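Since the merge in that reduce is order-independent, the map phase really can run concurrently. A rough sketch using plain Ruby threads (one thread per file; assumes the same test?.txt naming as the snippets above, and a hypothetical count_words helper):

```ruby
# Hypothetical helper: count words in a single string, using the same
# normalization (downcase, strip periods/commas/apostrophes) as above.
def count_words(text)
  text.split(' ').each_with_object(Hash.new(0)) do |word, counts|
    counts[word.downcase.tr('.,\'', '')] += 1
  end
end

# Map phase: one thread per file, each producing a partial count.
partials = Dir['test?.txt'].map do |file|
  Thread.new { count_words(File.read(file)) }
end.map(&:value)

# Reduce phase: merge partials; the summing merge is associative,
# so the partials could equally be combined pairwise or in a tree.
result = partials.reduce(Hash.new(0)) do |all, partial|
  all.merge(partial) { |_word, v1, v2| v1 + v2 }
end
```

Ruby threads won't buy CPU parallelism under the GVL, but the structure is the same one mrjob distributes across machines.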