The paper only recognizes two subpopulations, each with its own probability distribution. I only introduced B11 and B12 to be able to talk about different outcomes within the B1 subpopulation. Otherwise you could just choose any arbitrary grouping (e.g. mixing B11 and B12) and get completely different results depending on how you do it.
Maybe it's less ambiguous if we talk about machines which spit out a randomly sized ball when you press a lever. After collecting a certain number of balls from the machines, you determine what proportion of the largest p% came from each machine. You then replace all balls with new ones, pressing each machine's lever in proportion to its share of that top p%.
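If you have a machine M1 producing very large balls and very small ones in equal quantities, as well as M2 producing medium-sized balls, then the prediction is that the proportion of balls coming from M1 will increase if p < 50% and decrease if p > 50%. Half of M1's balls rank above every M2 ball and half below, so M1 is over-represented in any top slice smaller than half and under-represented in any larger one.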
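To make the ball-machine model concrete, here is a rough simulation sketch. Everything specific in it is my own assumption for illustration, not something from the paper: the ball sizes (around 10, 5 and 1), the names `draw` and `next_share`, the 20,000 balls per round, the starting share of 50% and the random seeds.

```python
import random

def draw(machine, rng):
    """Size of one ball from a machine. The sizes are made-up illustration values."""
    if machine == "M1":
        # M1: very large or very small, with equal probability.
        base = rng.choice([10.0, 1.0])
    else:
        # M2: medium-sized balls only.
        base = 5.0
    return base + rng.uniform(-0.5, 0.5)

def next_share(share_m1, p, n=20_000, rng=None):
    """Run one round and return M1's share of the largest fraction p of n balls."""
    if rng is None:
        rng = random.Random(0)
    # Each lever press goes to M1 with probability share_m1, otherwise to M2.
    machines = ["M1" if rng.random() < share_m1 else "M2" for _ in range(n)]
    # Only the machine is carried over; the size is drawn fresh every time.
    balls = sorted(((draw(m, rng), m) for m in machines), reverse=True)
    top = balls[: max(1, int(p * n))]
    return sum(1 for _, m in top if m == "M1") / len(top)

if __name__ == "__main__":
    for p in (0.3, 0.7):
        share, rng = 0.5, random.Random(1)
        for round_no in range(1, 6):
            share = next_share(share, p, rng=rng)
            print(f"p={p:.0%} round {round_no}: M1 share = {share:.3f}")
```

Under these assumptions, M1's share should rise from round to round for p = 30% and fall for p = 70%. Note that `next_share` only ever passes a machine name forward, never a ball size.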
In that model, although you can clearly group the very large balls as B11, there is no machine M11 producing only those balls. Both very large and very small balls will, if they are in the top p%, require you to press the lever on M1. The size of a ball only influences which machine will be chosen, not what kind of balls will come out of that machine.
In genetic terms, B1 and B2 carry different alleles of the same gene and have different genotypes, while B11 and B12 are different phenotypes possible for B1. If B11 were to produce only B11, that would be inheritance of phenotype, i.e. Lamarckian evolution, which is also interesting, but less relevant for real-world genetics.