The Ghost in the MP3

(theghostinthemp3.com)

96 points | fugyk | 11y ago | 27 comments

daturkel11y ago

This was posted almost a year ago and enjoyed a decent conversation in the comments [0]

I'll repeat my comment from the time:

"For what it's worth, the presence of a seemingly significant signal in the difference between the original and compressed tracks does not necessarily mean that significant sonic/perceptual loss has occurred. When it operates correctly, the encoder cuts not just sounds that the human ear cannot hear in general (e.g. sounds above 22 kHz) but also sounds which may not be perceptible in context (e.g. the quietest signals in a loud section). So if you find something beautiful about the ghost tracks (and I think there is something beautiful to find), don't immediately jump to concluding that mp3 is awful for cutting these sounds: they might be hardly perceptible when added to the mix.

Of course, at high compression rates mp3 does begin to significantly degrade fidelity.

Edit: all of this is not to put down the project. I still think it's pretty cool as art and as a demonstration of the encoder; I just didn't want people to think that this was some sort of massive failing of mp3."

[0]: https://news.ycombinator.com/item?id=7955917

dankoss11y ago

The information present here isn't really "lost" so much as inaudible in the context of the other sounds in the original recording. These forms of audio compression take advantage of auditory masking[1], which means those sounds likely wouldn't be heard in the original.

[1] http://en.wikipedia.org/wiki/Auditory_masking

sejje11y ago

The information is lost in the sense that it's in the original, and not in the lossy version.

Whether or not you can hear it, the information is gone.

mdisraeli11y ago

If you can't hear it, was it ever information?

daeken11y ago

We can't see in infrared, but there's clearly information there. Same with infra/ultrasound and other sounds that are buried in our hearing.

It may not be pertinent information in the case of music, but it's definitely information.

sejje11y ago

If you cover your eyes and can't see me, do I still exist?

TheOtherHobbes11y ago

MP3 is audibly lossy, especially at lower bitrates.

So yes - you can hear it's gone.

And if you look at a spectrogram, you can see it's gone, too.

joe_bleau11y ago

A quick scan didn't reveal to me whether he's time-aligning the signals before the subtraction. I've played with listening to the wav-mp3 signal before, and I seem to recall that the mp3 encoder would introduce a little delay.

I added a transient pulse in front of the music so that I could (visually) time align the signals before subtracting them.

to3m11y ago

I wondered what my favourite test tracks would sound like, so I made a (somewhat stupid, and a bit slow...) program to produce the difference between an MP3 and a WAV: https://github.com/tom-seddon/bin/blob/master/find_mp3_resid...

(Dependencies: python 2.7, lame, GNU make, mpg123, and (if you use FLAC files as input) flac. Tested on my Linux PC with LAME 64bits version 3.99.5 and mpg123 1.14.4 from the debian stable package repository. Run with -h to get some "help".)

It uses lame to compress and mpg123 to decompress, and I don't know if there's something special going on but the output WAVs always seem to have the same number of samples as the original. And they seem to be aligned - if you use this program you'll find that the difference between WAV and 128kbps MP3 is somewhat noisy, but WAV vs 320kbps MP3 is pretty much silent.

(Or maybe you'll find something totally different! Who knows. I only tested this on my system.)
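
For anyone who wants to try the same experiment, here is a minimal Python 3 sketch of the pipeline described above: encode with lame, decode with mpg123, then subtract sample by sample. It is not the linked program. The function names, the 16-bit PCM restriction, and the temporary file naming are assumptions made for illustration, and it assumes the decoded file is already sample-aligned with the original, as reported above.

```python
import struct
import subprocess
import wave


def roundtrip_commands(wav_in, mp3_path, wav_out, bitrate_kbps=128):
    """Build the encode (lame) and decode (mpg123) command lines."""
    encode = ["lame", "-b", str(bitrate_kbps), wav_in, mp3_path]
    decode = ["mpg123", "-w", wav_out, mp3_path]
    return encode, decode


def read_pcm16(path):
    """Return (wave params, list of interleaved 16-bit samples)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        params = w.getparams()
        data = w.readframes(w.getnframes())
    return params, list(struct.unpack("<%dh" % (len(data) // 2), data))


def residual(original, decoded):
    """Sample-wise original minus decoded, clipped to the 16-bit range."""
    n = min(len(original), len(decoded))
    return [max(-32768, min(32767, original[i] - decoded[i])) for i in range(n)]


def write_pcm16(path, params, samples):
    """Write samples using the channel count and sample rate from params."""
    with wave.open(path, "wb") as w:
        w.setnchannels(params.nchannels)
        w.setsampwidth(2)
        w.setframerate(params.framerate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


def find_residual(wav_in, wav_diff, bitrate_kbps=128):
    """Round-trip wav_in through MP3 and write the difference to wav_diff."""
    # Hypothetical naming scheme for the intermediate files.
    mp3_path, wav_out = wav_in + ".mp3", wav_in + ".decoded.wav"
    for cmd in roundtrip_commands(wav_in, mp3_path, wav_out, bitrate_kbps):
        subprocess.run(cmd, check=True)  # requires lame and mpg123 on PATH
    params, original = read_pcm16(wav_in)
    _, decoded = read_pcm16(wav_out)
    write_pcm16(wav_diff, params, residual(original, decoded))
```

Calling find_residual("track.wav", "diff.wav", bitrate_kbps=320) should write the residual for a 320kbps encode, provided both tools are installed; "track.wav" and "diff.wav" are placeholder names.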

mdisraeli11y ago

Neat, thanks for running the experiment to see how differing MP3 encoder settings affect the lost portion. This explains why 320kbps is generally accepted amongst DJs, as any loss is significantly less than that caused by the club sound system :P

to3m11y ago

I did a blind test when I was in my 20s, and while on a couple of tracks I could actually tell the difference between 320kbps and the original, I did have to concentrate. And I couldn't really have said that one was necessarily better than the other; the effect was as if one type of noise-y sound was being replaced with a subtly different type of sound with the same noise-y quality. Different, but overall the same.

Listening to the diff of one of those tracks today was interesting! All I can hear is the drums... and where the sound I'm thinking of plays, it sounds like rather quiet interference! But the drums as I recall sounded absolutely identical. Interesting that the ears can detect one thing but not the other.

(I didn't bother to re-run the full comparison, as I'm no longer in my 20s. One good (?) thing about getting old is that your hearing deteriorates, and issues such as this become moot. You can also afford the disk space to just compress everything at 320kbps. Then you don't have to worry about it, and it fits OK on your phone too.)

mdisraeli11y ago

320kbps with highest quality setting is pretty much an industry standard now, and many DJs, myself included, make use of that.

As you've looked into this before, do you know what the similar difference is like for such professional-grade encoding?

joe_bleau11y ago

(Note: no idea what mp3 encoder Audacity uses, and I'm sure the results will vary with encoder settings as well.)

I just fired up Audacity and generated a click track, with the first click at 1 second in. The exported 44.1k wav file, when loaded in Audacity, shows the click at exactly 44100 samples in.

The exported mp3 file, when loaded, shows the click to be around 46357 samples in. (It's a bit hard to measure, because the encoder has smeared the pulse.) That is somewhere between 51 and 52 ms late relative to the wav file.

Listening to the wav and mp3 ticks summed, the delay is obvious--they are not in sync at all. Adding 2257 samples of silence to the front end of the wav file puts them back in audible sync.
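
The arithmetic behind those numbers, as a small Python sketch. The click positions and the 44.1 kHz sample rate come from the measurement above; the helper names are made up for illustration.

```python
SAMPLE_RATE_HZ = 44_100


def delay_in_samples(reference_index, delayed_index):
    """How many samples later the click appears in the delayed file."""
    return delayed_index - reference_index


def samples_to_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a sample count to milliseconds."""
    return 1000.0 * samples / sample_rate_hz


def prepend_silence(samples, count):
    """Pad the front of a signal with `count` zero samples."""
    return [0] * count + list(samples)


delay = delay_in_samples(44_100, 46_357)
print(delay, round(samples_to_ms(delay), 2))  # prints: 2257 51.18
```

Padding the wav with prepend_silence(wav_samples, delay) corresponds to the manual fix described above, where wav_samples stands for the decoded wav data.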

egypturnash11y ago

He is probably dealing with this, given that the audio piece is not just "tomsdiner.wav - tomsdiner.mp3". There's a lot of processing happening after that:

----

Verse one finds the narrator in a bustling diner, making observations about her environment. The focus of this text is external to its author, as opposed to later verses which exist in a more subjective, internal space. Using different settings to harvest the lost material, I was able to isolate both clear, pitched content and more ephemeral transient signals.

Using the Python library headspace, and a reverb model of a small diner, I began to construct a virtual 3-D space. Beginning by fragmenting and scrambling the more transient material, I applied head-related transfer functions to simulate the background conversation one might hear in a diner. Tracking the amplitude of the original melody in the verse, I applied a loose amplitude envelope to these signals. Thus, a remnant of the original vocal line comes through in its amplitude contour.

Having constructed this background, prominent pitches from the original melody appear and disappear, located variously in this virtual space. These ephemeral sounds hint at a familiar melody, playing with aural memory and imagination, a flickering apparition hovering at the border of consciousness.

----

- found near the bottom of http://theghostinthemp3.com/theghostinthemp3.html

sukilot11y ago

That seems to me to mean that the author composed new audio, and isn't presenting "wav minus mp3".

mdisraeli11y ago

That would explain the phasing/flanging-like sound which gives the ghost recording such an eerie feel.

oakwhiz11y ago

You could solve this automatically by cross-correlating the decoded signal with the original over a range of time shifts and taking the shift with the strongest match.
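
A rough Python sketch of that idea, using a brute-force cross-correlation over candidate lags. It assumes the decoded signal is only ever late (never early) relative to the original and that both are plain lists of samples; the function names and the max_lag parameter are illustrative, not taken from any tool mentioned in this thread.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag in 0..max_lag that maximises the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(delayed) - lag)
        if n <= 0:
            break  # no overlap left at this lag
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def aligned_difference(reference, delayed, lag):
    """Subtract the delayed signal from the reference after removing the lag."""
    n = min(len(reference), len(delayed) - lag)
    return [reference[i] - delayed[i + lag] for i in range(n)]


# Toy check: a two-sample click, delayed by 7 samples.
click = [0.0] * 50
click[10], click[11] = 1.0, -0.5
late = [0.0] * 7 + click
lag = estimate_delay(click, late, max_lag=20)
print(lag)  # prints: 7
```

This loop is quadratic in the worst case, so for full-length tracks an FFT-based correlation (for example via NumPy or SciPy) would likely be a better fit; the plain-Python version is only meant to show the principle.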

im3w1l11y ago

The file we are watching is lossily compressed. So we are watching the lossy compression of a delta between original and lossy compression.

How good is the lossy compression at capturing that delta?

Buge11y ago

It gives an error when I try to play the video in Firefox or Chrome.

_jomo11y ago

Searched it on YouTube, someone uploaded it 2 minutes ago: https://www.youtube.com/watch?v=DkQ2p5QSbyc

tveita11y ago

It played for me 30 minutes ago, but it doesn't anymore.

I thought it was an embedded YouTube video at first, but it's actually a .mov file hosted in Google Docs. First time I've noticed that way of hosting; maybe they have a bandwidth limit?

eitland11y ago

Played fine on my phone (Android) just now. Seems like they are using Vimeo now.

rMBP11y ago

Safari checking in.

chanux11y ago

Suzanne Vega - Tom's Diner https://www.youtube.com/watch?v=FLP6QluMlrg

intopieces11y ago

If this kind of thing interests you, I highly recommend the book "MP3: The Meaning of a Format" by Jonathan Sterne [0]

[0] https://www.dukeupress.edu/MP3/

MrJagil11y ago

How does he get the information lost in compression? Does he put the compressed and uncompressed version on two different tracks with one phase-flipped?
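
Whether the author did exactly this isn't confirmed anywhere on this page, but the phase-flip approach in the question is the usual "null test", and it is arithmetically the same as subtracting one track from the other. A tiny Python sketch, with illustrative function names and plain lists of samples that are assumed to be time-aligned already:

```python
def invert_polarity(samples):
    """Flip the phase of a track by negating every sample."""
    return [-s for s in samples]


def mix(a, b):
    """Sum two tracks sample by sample, up to the shorter length."""
    n = min(len(a), len(b))
    return [a[i] + b[i] for i in range(n)]


def null_test(original, decoded):
    """Mix the original with the polarity-inverted decoded track."""
    return mix(original, invert_polarity(decoded))


print(null_test([1.0, 0.5, -0.25], [0.75, 0.5, -0.25]))  # prints: [0.25, 0.0, 0.0]
```

Anything the two tracks have in common cancels to zero, so what remains is whatever the encoder changed.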

magwhyr11y ago

on vimeo: https://vimeo.com/107845118
