I think of it more as a "demultiplexing" difficulty. I can hear the sound coming from their mouth, I just can't separate it from everything else and decode the language. I know that my hearing is good, even above average, and I don't suffer from tinnitus. I used to notice that I would have to ask people to repeat things a lot, and that other people didn't seem to have the same problem. I've found that watching the speaker's lips helps to some extent.
My own theory is that nobody can, in fact, hear everything that people say in such situations. Instead, their minds do a great deal of work filling in the gaps. Our languages are surprisingly redundant, and quite often you can pick up everything you need to know from broken audio plus other input like facial expressions and gestures. I think that my brain is simply worse at filling in the gaps.