They have some unfortunate behaviors that are all attributable to their storage implementation, most prominently locking, fragmentation, and slow performance once the data no longer fits in memory. While they could "just buy TokuMX" and solve these problems with money, that would put their engineering team in a position where they would need to relearn a big portion of their codebase and spend time porting the features they've prototyped over to TokuMX. It would basically halt new development for a few months while they learn the new code, too.
The way I see it, MongoDB will continue prototyping interesting features and polishing some of their existing ones, and TokuMX will incorporate the ones with the most promise. But to integrate the codebases would slow down MongoDB considerably, and I don't think they can afford that right now. I'm perfectly happy to sit back and merge the best features from MongoDB as they mature.
Put another way, if you were working on a product and someone came to you and said "here let me fix a bunch of things by replacing some of the fundamental subsystems with code you don't know," would you do it? Maybe if you were in more of a maintenance mode, you'd evaluate it for a while and take the time to learn the code and eventually incorporate it, but not if it was going to distract you from adding features.
They're trying to generalize while Toku aims at very specific query optimization.
For now the best I can do for you is tell you that if you email me I can hook you up. Failing that, if you search Twitter for "severalnines wget" you can find a wget hack that achieves the result you want.
Basically, Hypertable (based on Bigtable) compresses data in blocks, but the index stores only the IDs of the first and last documents in each block. That might be hard to do for secondary indexes, though I'm not sure.
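To make the block idea concrete, here's a rough sketch in Python. To be clear, this is not Hypertable's actual code or on-disk format; the names (`build_blocks`, `lookup`, `BLOCK_SIZE`) and the use of zlib and JSON are all assumptions I made up for illustration. It only shows the general shape: documents sorted by ID, compressed in blocks, with an index that keeps just the first and last ID per block.

```python
import bisect
import json
import zlib

BLOCK_SIZE = 4  # documents per block; hypothetical value for illustration


def build_blocks(docs, block_size=BLOCK_SIZE):
    """Compress docs in blocks and build a sparse index.

    docs must be a list of (doc_id, payload) pairs sorted by doc_id.
    Returns (index, blocks), where index[i] == (first_id, last_id)
    for the compressed bytes in blocks[i].
    """
    index = []
    blocks = []
    for start in range(0, len(docs), block_size):
        chunk = docs[start:start + block_size]
        blocks.append(zlib.compress(json.dumps(chunk).encode("utf-8")))
        index.append((chunk[0][0], chunk[-1][0]))
    return index, blocks


def lookup(index, blocks, doc_id):
    """Find one document: binary-search the index, then scan one block."""
    first_ids = [first for first, _ in index]
    # Rightmost block whose first ID is <= doc_id.
    pos = bisect.bisect_right(first_ids, doc_id) - 1
    if pos < 0:
        return None
    _, last = index[pos]
    if doc_id > last:
        return None  # falls in the gap between two blocks
    # Only this one block gets decompressed.
    chunk = json.loads(zlib.decompress(blocks[pos]).decode("utf-8"))
    for found_id, payload in chunk:
        if found_id == doc_id:
            return payload
    return None


docs = [(i, {"name": f"doc{i}"}) for i in range(0, 20, 2)]
index, blocks = build_blocks(docs)
print(index)                      # one (first_id, last_id) pair per block
print(lookup(index, blocks, 10))  # found inside the second block
print(lookup(index, blocks, 7))   # None: no block covers ID 7
```

The index stays tiny because it has one entry per block rather than one per document. The catch for secondary indexes, as far as I can tell, is that the blocks are only sorted by the primary ID, so a first/last pair on some other field wouldn't bound what's in the block.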
So if you grow, you add one node, not a whole replica set (which could be three nodes if you have 3x replication).