In my case, the 'universal description language' (UDL) is one that supports a single convenient operation, let's say CC for 'carbon-copy', plus some number of other operations (contrived to have a quite clumsy/verbose interface). The CC operation takes one input: the 'text' to output (which in my case would be T0 and J1). The 'other operations' would be only the minimal set required to remain Turing-complete (and, once again, would be devised such that no sane person would want to use them for any sufficiently complicated task). M, then, is the implementation of the above UDL.
Then the question becomes: how much logic do we have to put into M itself to achieve the above? I posit that it would in fact not be very much (though that should be demonstrable if someone were sufficiently motivated).
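As a rough illustration of how small that logic could be, here is a toy sketch in Python; the instruction names and encoding (CC plus a deliberately clumsy Brainfuck-style core) are my own invention, not a worked-out minimal machine:

    # Toy interpreter for the hypothetical UDL: 'CC' copies its one
    # argument straight to the output; the remaining ops form a
    # deliberately awkward (but Turing-complete) Brainfuck-style core.
    def run(program):
        tape, ptr, out, pc = [0] * 30000, 0, [], 0
        while pc < len(program):
            op = program[pc]
            if op[0] == "CC":        # carbon-copy: emit the text verbatim
                out.append(op[1])
            elif op[0] == "INC":     # contrived ops: cell arithmetic...
                tape[ptr] += 1
            elif op[0] == "DEC":
                tape[ptr] -= 1
            elif op[0] == "MOV":     # ...pointer movement by op[1] cells...
                ptr += op[1]
            elif op[0] == "JZ" and tape[ptr] == 0:    # ...forward jump if zero...
                pc = op[1]
                continue
            elif op[0] == "JNZ" and tape[ptr] != 0:   # ...backward jump if nonzero
                pc = op[1]
                continue
            elif op[0] == "EMIT":    # emit the current cell as one character
                out.append(chr(tape[ptr]))
            pc += 1
        return "".join(out)

    # A text is 'cheap' under this M when a single CC suffices:
    print(run([("CC", "hello, world")]))   # -> hello, world

The point is only that the CC path is one line, while nearly all of the interpreter's bulk goes to the ops nobody would willingly use.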
I think KC requires that M be held constant across the texts you wish to compare (so that texts written in various languages are normalized to a common underlying architecture).
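This is, I believe, the flip side of the invariance theorem: switching reference machines shifts every complexity by at most a machine-dependent constant,

    K_{M_1}(x) \le K_{M_2}(x) + c_{M_1,M_2} \qquad \text{for all texts } x,

where c_{M_1,M_2} depends only on the two machines, not on x. That constant is harmless asymptotically, but when comparing two particular finite texts it can dominate, which is exactly why M has to stay fixed.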
If one views M in this light (as an attempt to level the playing field across competing hardware/algorithms/languages), then we could see it as the concatenation of the following:
1 - the specification of the raw hardware.
2 - the microcode for the hardware.
3 - BIOS, drivers, HAL, OS, etc.
4 - language implementation/runtime/shared libraries, etc.
Now, to compare any two real-world implementations, we need to port them both to the language implemented by #4, running on the hardware specified by #1-#3, and then sum each one's length with M's length.
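Writing p_A and p_B for the two ported programs (notation mine), the comparison is

    |M| + |p_A| \quad \text{vs.} \quad |M| + |p_B|,

and since |M| appears on both sides, the ranking reduces to comparing |p_A| with |p_B| alone.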
And if we want to compare two algorithms from different languages (but running on the same 'machine'), we can shift the code in #4 into the 'texts' of the programs themselves (so as to keep M constant between them). We can do this all the way down to the minimum M required to be Turing-complete (and nearly all the way 'up' to the programs under test); i.e., the division between M and a particular text, L0, is somewhat variable.
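A sketch of that shifting in Python (all names invented here; it also assumes the shifted layer can be re-encoded as a prefix of the text at no cost, which a real port would not quite achieve):

    # Moving layer #4 (the language runtime) across the M/text boundary.
    hardware_spec = b"raw hardware spec"        # layer 1
    microcode     = b"microcode"                # layer 2
    bios_os       = b"BIOS/drivers/HAL/OS"      # layer 3
    runtime       = b"language runtime + libs"  # layer 4
    program       = b"the text under test"

    # Split A: the runtime lives inside M.
    M_big, text_a = hardware_spec + microcode + bios_os + runtime, program

    # Split B: the runtime is shifted into the text, leaving a smaller M.
    M_small, text_b = hardware_spec + microcode + bios_os, runtime + program

    # The total description length is invariant under the shift; only the
    # line dividing 'machine' from 'text' has moved.
    assert len(M_big) + len(text_a) == len(M_small) + len(text_b)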