Here's a proof: consider the strings of length n or less; suppose there are M of them in total. Their average length is the sum of all their lengths divided by M, and the average length of their compressed versions is the total length of the compressed versions divided by M. Since the compression is lossless, the compressed strings must all be distinct.
Since there are M distinct outputs, if any of them had length greater than n, then by the pigeonhole principle some string of length at most n would be unused as an output, and the average could be improved by remapping that input to the unused shorter string instead. So any optimal compression method must map only to the strings of length at most n.
So the M outputs are exactly the M inputs, possibly permuted. Their total length is therefore the same, and hence so is their average length.
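The permutation argument can be checked numerically for small n. A sketch in Python, using a random permutation of the string set as a stand-in for an arbitrary optimal compressor (any injection from a finite set into itself is a permutation):

```python
import random

n = 8
# All binary strings of length 1..n (the argument is unchanged if the
# empty string is included as well).
S = [bin(i)[2:].zfill(length)
     for length in range(1, n + 1)
     for i in range(2 ** length)]

# A lossless compressor whose outputs stay within S is an injection from
# S into itself, i.e. a permutation. Model one with a random shuffle.
compress = dict(zip(S, random.sample(S, len(S))))

assert len(set(compress.values())) == len(S)   # lossless: outputs all distinct
total_in = sum(len(s) for s in S)
total_out = sum(len(compress[s]) for s in S)
assert total_in == total_out                   # a permutation preserves total length
print(total_in / len(S))                       # average length is unchanged
```

However the permutation is chosen, the totals match, so no compressor can beat the identity map on average over all strings of length at most n.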
The article you’ve linked says nothing about averages. It says that for every algorithm there are at least some input files that increase in size. It even elaborates on that:
Any lossless compression algorithm that makes some files shorter must necessarily make some files longer, but it is not necessary that those files become very much longer. Most practical compression algorithms provide an "escape" facility that can turn off the normal coding for files that would become longer by being encoded. In theory, only a single additional bit is required to tell the decoder that the normal coding has been turned off for the entire input.
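That escape mechanism is straightforward to implement. A minimal sketch in Python, using zlib as the underlying codec and a whole flag byte rather than the single bit the article mentions, purely for simplicity:

```python
import zlib

def compress(data: bytes) -> bytes:
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return b"\x01" + packed   # flag 1: payload is zlib-compressed
    return b"\x00" + data         # flag 0: codec "turned off", payload stored raw

def decompress(blob: bytes) -> bytes:
    flag, payload = blob[:1], blob[1:]
    return zlib.decompress(payload) if flag == b"\x01" else payload
```

Compressible inputs shrink (at the cost of the one flag byte); incompressible inputs grow by exactly one byte, never more, which is the kind of bounded worst case the article is describing.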
I don't think this is true. If it were, lossless compression would be useless in a lot of applications. It's pretty easy to come up with a counterexample.
E.g. (a simple Huffman code off the top of my head, not optimal):
symbol -> code
"00" -> "0"
"01" -> "10"
"10" -> "110"
"11" -> "111"
If "00" appears 99.999% of the time and the other three symbols together account for the remaining 0.001%, the output will "on average" be slightly more than half the length of the input.
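Under those assumptions the expected output length is easy to compute. A quick sanity check in Python (splitting the remaining 0.001% evenly across the three rare symbols is my assumption, not something fixed by the example):

```python
code = {"00": "0", "01": "10", "10": "110", "11": "111"}
probs = {"00": 0.99999, "01": 1e-5 / 3, "10": 1e-5 / 3, "11": 1e-5 / 3}

# No codeword is a prefix of another, so the code is uniquely decodable.
assert all(not a.startswith(b)
           for a in code.values() for b in code.values() if a != b)

# Expected output bits per 2-bit input symbol.
expected_bits = sum(probs[s] * len(code[s]) for s in code)
ratio = expected_bits / 2          # output length relative to input length
print(ratio)                      # just over 0.5: about half the input length
```

This doesn't contradict the counting argument, which averages uniformly over all strings; a skewed source distribution is exactly what makes compression pay off in practice.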