Show HN: I've built a spectrogram analyzer web app

(webfft.net)

244 points by ssgh 3y ago | 69 comments


jboy55 3y ago

Here is a spectrogram of the track, "Look" from the album, "Songs about my Cats" by Venetian Snares.

https://imgur.com/sRe6Ypv

Aphex Twin did something similar, but this one is more playful in my opinion.

epiccoleman 3y ago

Mick Gordon did some fun hidden spectrogram imagery in the Doom 2016 soundtrack.

He talks about that and plenty of other cool stuff in his talk at GDC 2017. It's one of my favorite conference talks ever: he did so much cool experimentation to get the sounds he used on the soundtrack, and watching it is one of those moments where you really get to see a master of his craft let loose and explain his process.

https://youtu.be/U4FNBMZsqrY

quickthrower2 3y ago

https://venetiansnares.bandcamp.com/track/look

Warning - this music freaked my dog out!

jo-m 3y ago

I once (badly) did something similar as a student [1].

Unfortunately it's in Matlab, so I cannot run it anymore.

[1] https://jo-m.ch/posts/2015/01/hack-the-spectrum-hide-images-...

sorenjan 3y ago

If you want to run it again, you could try Octave, an open-source Matlab alternative.

https://octave.org/

scotteric 3y ago

You know a personal license is only $149 USD, right? You could then run your old code. Toolboxes for home use are $49.

michaelmior 3y ago

$149 is still a significant amount for many people, especially for something non-essential.

lightweightbaby 3y ago

I also made something similar using Python a long time ago [1]. It's an extremely simple script, so it should still work.

[1] https://github.com/DanielAllepuz/ImageToSound

syx 3y ago

This is very interesting. What was the Aphex Twin track with this concept?

twelvechairs 3y ago

It's usually called 'Formula' or 'Equation', the B-side of 'Windowlicker'. There's a video at the link below:

https://www.reddit.com/r/Damnthatsinteresting/comments/kvjil...

swah 3y ago

https://www.youtube.com/watch?v=wSYAZnQmffg at around 5m30

jcelerier 3y ago

From the look of the pictures there's a log() missing somewhere, no?

ssgh (OP) 3y ago

Author here. This is a basic spectrogram visualizer that's mobile friendly. It allows you to select regions on the spectrogram and play them separately. There is no grand plan behind this web app: it's just a handy basic tool to capture sounds on your phone and see what they look like.

echelon 3y ago

This is incredible! One of the best spectral tools I've seen.

Can we hire you to help us improve the (broken) spectral visualizations on our app?

Example: https://fakeyou.com/tts/result/TR:9jy3vew9w0s3ew4keay9m330rd...

I would so love to hire you to help us. This is freaking cool.

Even if you're not interested, mad props. I really love this.

ssgh (OP) 3y ago

Your spectrogram looks elongated horizontally because the FFT window size is too large. I use a window size of 1024 at a sample rate of 48000 Hz, so one window covers 1024/48000 ≈ 0.021 s. This window size looks optimal in most cases: if you change it in my web app, you'll see that every other window size blurs the spectrogram in a different way, but at 1024 it comes into focus.

Of course, don't forget the window function (Hann, or raised cosine), but it looks like you've got that covered because your spectrogram looks smooth.
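A minimal sketch of the framing described above, assuming a mono signal stored in a Float64Array, a 1024-sample periodic Hann window and 50% overlap (the hop size is an assumption; the comment doesn't state one). It uses a naive O(N^2) DFT to stay short, where a real app would use an FFT. Bin k sits at k * sampleRate / windowSize Hz (46.875 Hz per bin at 48 kHz), and frame j starts at j * hop / sampleRate seconds.

```typescript
// Periodic Hann ("raised cosine") window of length n.
function hann(n: number): Float64Array {
  const w = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    w[i] = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / n);
  }
  return w;
}

// Magnitude spectrogram: one Float64Array per frame, holding bins
// 0..windowSize/2. Naive DFT for readability, not speed.
function spectrogram(
  signal: Float64Array,
  windowSize = 1024, // about 21 ms at 48 kHz
  hop = windowSize / 2, // 50% overlap (assumed)
): Float64Array[] {
  const w = hann(windowSize);
  const bins = windowSize / 2 + 1;
  const frames: Float64Array[] = [];
  for (let start = 0; start + windowSize <= signal.length; start += hop) {
    const mag = new Float64Array(bins);
    for (let k = 0; k < bins; k++) {
      let re = 0;
      let im = 0;
      for (let n = 0; n < windowSize; n++) {
        const x = signal[start + n] * w[n]; // apply the window
        const phi = (-2 * Math.PI * k * n) / windowSize;
        re += x * Math.cos(phi);
        im += x * Math.sin(phi);
      }
      mag[k] = Math.hypot(re, im) / windowSize; // linear amplitude, no log
    }
    frames.push(mag);
  }
  return frames;
}
```

For a 1500 Hz tone at 48 kHz, the peak of each frame lands in bin 32 (1500 / 46.875).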

The color palette looks good in your case. FWIW, my color function is like this: pow(fft_amp, 1.5) * rgb(9, 3, 1). The pow() part brightens the low/quiet amplitudes, and the (9,3,1) multiplier displays a 10x wider amplitude range by mapping it to a visually long black->orange->yellow->white range of colors. Note that I don't do log10 mapping of the amplitudes.
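The quoted color formula, sketched as a function. This assumes amplitudes are normalized to [0, 1] and that each channel is clamped to 1; neither is stated in the comment. Under those assumptions red saturates first (at amp ≈ 0.23), then green (at amp ≈ 0.48), then blue (at amp = 1), which is what produces the black->orange->yellow->white progression.

```typescript
// RGB triple with channels in [0, 1].
type RGB = [number, number, number];

// pow(amp, 1.5) * rgb(9, 3, 1), each channel clamped to [0, 1].
// Normalized input and the clamping are assumptions for this sketch.
function ampToColor(amp: number): RGB {
  const v = Math.pow(Math.max(0, amp), 1.5);
  const clamp = (x: number): number => Math.min(1, x);
  return [clamp(9 * v), clamp(3 * v), clamp(1 * v)];
}
```

Multiply by 255 and round to get byte values for a canvas ImageData buffer.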

timlod 3y ago

In case OP doesn't respond, I could probably help you with this - feel free to send an email!

KennyBlanken 3y ago

Not OP, but... why on earth does having your site open in Firefox nearly set my computer on fire?

slhck 3y ago

Nice tool. Some suggestions:

- Allow playback via the Space bar. Show a play marker to let the user know where in the sample they are, even without having selected a part.

- Choose a sample that is easier on the ears than high-pitched bird song. I was really shocked when the first loud part came.

montag 3y ago

Looks like it says “mime type is not supported” on mobile Safari.

ssgh (OP) 3y ago

It uses "audio/webm;codecs=opus" to record from the mic. It's now possible to change this in the config menu in the top right. Safari probably needs audio/mp3. Edit: also consider "audio/foo;codecs=pcm", where "foo" is something compatible with Safari.
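One way to handle this without sniffing the platform is feature detection. MediaRecorder.isTypeSupported() is a real static method that reports whether the browser can record a given MIME type. The candidate list below is an assumption for illustration, including the guess that Safari accepts "audio/mp4". It is not the app's actual config.

```typescript
// Candidate recording MIME types in order of preference (assumed list).
const CANDIDATE_MIME_TYPES: string[] = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4", // assumption: what Safari accepts
];

// Returns the first candidate the browser says it can record, or undefined.
// `isSupported` defaults to MediaRecorder.isTypeSupported when available;
// pass a stub to run this outside a browser.
function pickRecordingMimeType(
  candidates: string[] = CANDIDATE_MIME_TYPES,
  isSupported?: (mime: string) => boolean,
): string | undefined {
  const recorder = (globalThis as any).MediaRecorder;
  const check =
    isSupported ??
    (recorder && typeof recorder.isTypeSupported === "function"
      ? (m: string): boolean => recorder.isTypeSupported(m)
      : (): boolean => false);
  return candidates.find((m) => check(m));
}
```

In a browser, the result would be passed as `new MediaRecorder(stream, { mimeType })`. If it is undefined, you can omit the option and let the browser choose its default.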

geraldhh 3y ago

ugh, could you maybe add some code to detect Apple platforms and set this accordingly (batteries included)?

edit: tyvm, nice idea! would very much like to try it

_emacsomancer_ 3y ago

This is a problem of iOS not supporting modern efficient codecs.

grugagag 3y ago

I get the same error on iPhone/Safari.

lokar 3y ago

Also iPhone/Chrome.


wpietri 3y ago

Very neat! May I suggest adding a button to switch to a log scale for frequency? I love the ability to select and play back just a particular set of frequencies. But voice uses only about 15% of the screen height [1], so it's hard to play with.

[1] https://en.wikipedia.org/wiki/Voice_frequency
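A sketch of the suggested log-frequency axis, assuming a 24 kHz top frequency (Nyquist at 48 kHz) and a 20 Hz lower bound. The bound is needed because log(0) is undefined, and 20 Hz is an assumed choice. With the telephony voice band of roughly 300 to 3400 Hz from the linked article, a linear axis gives it about 13% of the height, while a log axis gives it about 34%.

```typescript
// Frequency -> pixel row on a log axis. Row 0 is the top (maxHz),
// row height - 1 is the bottom (minHz). minHz must be > 0.
function freqToRowLog(freqHz: number, height: number, minHz = 20, maxHz = 24000): number {
  const f = Math.min(maxHz, Math.max(minHz, freqHz));
  const t = Math.log(f / minHz) / Math.log(maxHz / minHz); // 0 at minHz, 1 at maxHz
  return (1 - t) * (height - 1);
}

// Linear mapping, for comparison.
function freqToRowLinear(freqHz: number, height: number, maxHz = 24000): number {
  const f = Math.min(maxHz, Math.max(0, freqHz));
  return (1 - f / maxHz) * (height - 1);
}
```

On a log axis, each octave gets the same number of rows, which also suits listening band by band.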

ssgh (OP) 3y ago

You can select an area and zoom into it. Another option is to change the sample rate in the config menu in the top right.

wpietri 3y ago

Zooming doesn't really get me what I'm after, because I was trying to hear particular bands one after the other, e.g. listening to one octave after the next. And since octaves aren't evenly spaced on a linear frequency axis (each one is twice as wide as the one below), I think a non-linear scale would better match what I was trying to do.
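To listen octave by octave, you first need each octave's edges and the FFT bins it covers. Here is a sketch assuming 48 kHz audio and a 1024-point FFT, so each bin is 46.875 Hz wide. The 62.5 Hz starting point is an arbitrary choice. Bands are half-open [lo, hi), and the top band is cut off at Nyquist. Note how few bins the lowest octaves get at this FFT size.

```typescript
// One octave [loHz, hiHz) and the inclusive FFT bin range it covers.
interface OctaveBand {
  loHz: number;
  hiHz: number;
  firstBin: number;
  lastBin: number;
}

// Bin k sits at k * sampleRate / fftSize Hz; startHz is an assumed default.
function octaveBands(sampleRate: number, fftSize: number, startHz = 62.5): OctaveBand[] {
  const nyquist = sampleRate / 2;
  const binHz = sampleRate / fftSize;
  const bands: OctaveBand[] = [];
  for (let lo = startHz; lo < nyquist; lo *= 2) {
    const hi = Math.min(2 * lo, nyquist); // last band is capped at Nyquist
    bands.push({
      loHz: lo,
      hiHz: hi,
      firstBin: Math.ceil(lo / binHz),
      lastBin: Math.ceil(hi / binHz) - 1, // half-open upper edge
    });
  }
  return bands;
}
```

At these settings the 62.5 to 125 Hz octave maps to a single bin (bin 2), while 16 to 24 kHz spans 170 bins.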

Groxx 3y ago

Quite neat, thanks for sharing! I've never been able to play portions like this before, it's interesting.

Is there any way to make this display in real time, or is that not (currently?) possible with audio APIs?

Bewelge 3y ago

The Web Audio API has an analyser node that can create spectrograms in real time. The ones I've created in the past were nowhere near as detailed as this one, though.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_A...
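A sketch of a real-time column buffer built on AnalyserNode. frequencyBinCount (fftSize / 2) and getByteFrequencyData() are real members. The byte data is scaled between minDecibels and maxDecibels, and smoothingTimeConstant (0.8 by default) averages each frame with earlier ones, which may be part of why such displays look less detailed. The class name is hypothetical. It uses a structural type so a stub can stand in outside a browser.

```typescript
// The two AnalyserNode members this sketch relies on.
interface AnalyserLike {
  readonly frequencyBinCount: number;
  getByteFrequencyData(array: Uint8Array): void;
}

// Keeps the most recent `maxColumns` spectra (one Uint8Array each),
// i.e. the data behind a scrolling real-time spectrogram.
class RealtimeSpectrogram {
  readonly columns: Uint8Array[] = [];
  private readonly analyser: AnalyserLike;
  private readonly maxColumns: number;

  constructor(analyser: AnalyserLike, maxColumns = 512) {
    this.analyser = analyser;
    this.maxColumns = maxColumns;
  }

  // Call once per animation frame: appends a column, drops the oldest.
  capture(): Uint8Array {
    const column = new Uint8Array(this.analyser.frequencyBinCount);
    this.analyser.getByteFrequencyData(column);
    this.columns.push(column);
    if (this.columns.length > this.maxColumns) this.columns.shift();
    return column;
  }
}

// Browser wiring (not executed here):
//   const ctx = new AudioContext();
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const analyser = ctx.createAnalyser();
//   analyser.fftSize = 2048;
//   analyser.smoothingTimeConstant = 0; // no averaging between frames
//   ctx.createMediaStreamSource(stream).connect(analyser);
//   const spec = new RealtimeSpectrogram(analyser);
//   const tick = () => { spec.capture(); requestAnimationFrame(tick); };
//   tick();
```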

lokimedes 3y ago

Hi, I love it. Perhaps we should chat about making it for radio data as well? We could potentially use it for our radar systems.

shangers 3y ago

Can I ask what use cases a spectrogram would have for radar data? I've been messing around with making my own spectrogram app as well (a Linux desktop app, not a web app) and would be stoked to know if there are any easy-to-reach use cases for it.

lokimedes 3y ago

We basically make Doppler radars: here the frequency shift is proportional to the speed of the object. Most other radars (pulse radars) use the bandwidth of a pulse to gain range resolution (the wider the bandwidth, the better the resolution).

Here are a few (very old) plots of how our radars see the world through a spectrogram: https://weibelradars.com/space/space-industry/

What would be cool is a browser-based way to do soft analysis of these plots.
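For a monostatic radar and radial speeds much smaller than the speed of light, the proportionality mentioned above is f_d = 2 v f0 / c. Here is a sketch of the conversion in both directions. The 10 GHz carrier is purely an example, since we don't know what carrier these radars use. At 10 GHz, a target closing at 10 m/s shifts the echo by about 667 Hz, which is well within what an audio-style spectrogram handles.

```typescript
const SPEED_OF_LIGHT = 299792458; // m/s

// Two-way Doppler shift: f_d = 2 * v * f0 / c (valid for v << c).
function dopplerShiftHz(radialSpeedMps: number, carrierHz: number): number {
  return (2 * radialSpeedMps * carrierHz) / SPEED_OF_LIGHT;
}

// Inverse: a frequency offset read off the spectrogram -> radial speed.
function radialSpeedMps(shiftHz: number, carrierHz: number): number {
  return (shiftHz * SPEED_OF_LIGHT) / (2 * carrierHz);
}
```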

galcerte 3y ago

Radar signals are modulated in very specific ways, which are visible right away even with plenty of noise on a spectrogram. Classifying the modulation of radar signals is something common in military contexts, since it allows you to listen for emissions and be able to tell if it's an enemy or an ally. I bet it has more uses than that, but it's the first one I could think of.

ssgh (OP) 3y ago

You can reach me at: ssgh at mm dot st.

rickcarlino 3y ago

I get the mime error on Firefox mobile. Very interesting idea though, hope I can try it in the future.

a-dub 3y ago

I really like the UX of defining a filter as a selection rectangle on the time-frequency plot.

ssgh (OP) 3y ago

Yeah, that's a bandpass filter, essentially. But I did it the lazy way: audio signal -> FFT -> zero out unwanted frequencies -> inverse FFT.
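The "lazy" band-pass described above, sketched for a single block of samples. It uses a naive DFT in place of an FFT so the example stays self-contained. Each bin is kept or zeroed according to its absolute frequency, so bin k and its mirror n - k are treated alike and the output stays real. The function names are hypothetical. Zeroing bins this abruptly (a brick-wall filter) can cause ringing near the band edges, which is usually acceptable for a quick listen.

```typescript
interface Spectrum {
  re: Float64Array;
  im: Float64Array;
}

// Naive O(N^2) DFT; with inverse = true it applies e^{+i...} and the 1/N scale.
function dft(re: Float64Array, im: Float64Array, inverse = false): Spectrum {
  const n = re.length;
  const sign = inverse ? 1 : -1;
  const outRe = new Float64Array(n);
  const outIm = new Float64Array(n);
  for (let k = 0; k < n; k++) {
    let sumRe = 0;
    let sumIm = 0;
    for (let t = 0; t < n; t++) {
      const phi = (sign * 2 * Math.PI * k * t) / n;
      const c = Math.cos(phi);
      const s = Math.sin(phi);
      sumRe += re[t] * c - im[t] * s; // complex multiply by e^{i*phi}
      sumIm += re[t] * s + im[t] * c;
    }
    outRe[k] = inverse ? sumRe / n : sumRe;
    outIm[k] = inverse ? sumIm / n : sumIm;
  }
  return { re: outRe, im: outIm };
}

// Signal -> DFT -> zero bins outside [loHz, hiHz] -> inverse DFT.
function bandpass(x: Float64Array, sampleRate: number, loHz: number, hiHz: number): Float64Array {
  const n = x.length;
  const X = dft(x, new Float64Array(n));
  for (let k = 0; k < n; k++) {
    const freq = (Math.min(k, n - k) * sampleRate) / n; // |frequency| of bin k
    if (freq < loHz || freq > hiHz) {
      X.re[k] = 0;
      X.im[k] = 0;
    }
  }
  return dft(X.re, X.im, true).re; // imaginary part is ~0
}
```

Applying this to 500 Hz + 2000 Hz tones with a 1 to 3 kHz band leaves only the 2000 Hz tone.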

orbisvicis 3y ago

When I read about ultrasonic cross-device trackers in advertising [1], I installed "org.woheller69.audio_analyzer_for_android" and "hans.b.skewy1_0" (automatic ultrasonic detection) and started scanning through TV channels after running some test tones. Suffice it to say I didn't find any, but the entire process was quite fun. There's also "org.billthefarmer.scope", which is an oscilloscope with a spectrum (not spectrogram).

1. https://arstechnica.com/tech-policy/2015/11/beware-of-ads-th...

vishnuharidas 3y ago

Web apps like this that access users' data should provide samples for users to experiment with and explore before they have to give access to their actual data.

ssgh (OP) 3y ago

Very reasonable ask, sir. In fact, I had added sample.mp3, but forgot to add a button in the UI. Now that's fixed.

nick_m 3y ago

Brilliant work. I "get" how this works, and I've just spent about half an hour playing with it (Chrome browser on my kitchen Chromebook), singing into it and letting it "listen" to the ambient background noise here (old cooker clock ticking, fridge compressor rumbling occasionally). Useful, educational, and fun too. Thanks for publishing/hosting this so others can enjoy it!

vjerancrnjak 3y ago

Very nice app.

I usually use Audacity to inspect the spectrogram of FLAC files and see if they really are 44100 Hz or if someone packaged a constant-bitrate 320 kbps MP3 encode into a FLAC file.

Now I can just check it in my browser :D
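That check could be roughly automated, assuming you already have a time-averaged magnitude spectrum (bins 0..N/2) of the file: find the highest bin that rises above a threshold relative to the peak. Lossy encoders commonly low-pass the signal somewhere around 16 to 20 kHz, depending on bitrate and settings, so a hard shelf there in a nominally lossless file is suspicious. The -60 dB threshold and the function name are assumptions. Treat the result as a hint, not proof.

```typescript
// Highest frequency whose averaged magnitude exceeds peak * 10^(thresholdDb/20).
// avgMagnitude holds bins 0..N/2, so the bin width is sampleRate / N.
function estimateCutoffHz(avgMagnitude: Float64Array, sampleRate: number, thresholdDb = -60): number {
  let peak = 0;
  for (let k = 0; k < avgMagnitude.length; k++) peak = Math.max(peak, avgMagnitude[k]);
  if (peak === 0) return 0; // silent input
  const floor = peak * Math.pow(10, thresholdDb / 20);
  const binHz = sampleRate / (2 * (avgMagnitude.length - 1));
  for (let k = avgMagnitude.length - 1; k >= 0; k--) {
    if (avgMagnitude[k] > floor) return k * binHz;
  }
  return 0;
}
```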

firefoxd 3y ago

Simple, straight to the point, and super useful.

One place I used these was in a toy AI assistant. I recorded myself saying a trigger word thousands of times, cut the audio into pieces and converted each piece to a spectrogram image. I then fed those to a model during training so it could learn to recognize the trigger word.

Before using spectrograms, I was feeding in the WAV files directly, which was incredibly intensive on my laptop; the image files were much easier to process in real time. This tool could be used for debugging that kind of setup.

jxmorris12 3y ago

How would this work with AI? Don’t you need to train the model to discriminate between the trigger word and other words? If all that’s seen during training is the trigger word, the model will just learn to say “yes” to everything, if you get what I mean.

firefoxd 3y ago

Yes, I also recorded myself talking on the phone for hours. I should have clarified that.

HarHarVeryFunny 3y ago

Nice - very fast (using WebGPU?).

I like the interesting ability to play a "rectangular" (time + frequency limited) section of the audio.

ssgh (OP) 3y ago

I do have a WebGL-based implementation of FFT, but here I used good old JS. When properly written, it gets translated into really fast machine code, which is even faster than WebAssembly (I tried!). WebGL's problem is the high toll of the CPU-GPU bridge. When you need to transfer a block of audio data from the CPU to the GPU to perform calculations, you wait. When you need to transfer the FFT data back, you wait. These waits quickly outweigh everything else. However, for wavelet transforms the GPU wins, because you can store some precomputed FFTs on the GPU and reuse them across multiple runs.
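As an illustration of what "properly written" JS for this job can look like, here is a generic textbook sketch (iterative radix-2 Cooley-Tukey), not the author's implementation. It works in place on typed arrays, and the bit-reversal permutation and twiddle factors are computed once per size and reused for every frame, so the hot loop only does arithmetic on Float64Arrays.

```typescript
class FFT {
  readonly n: number;
  private readonly rev: Uint32Array; // bit-reversal permutation
  private readonly cos: Float64Array; // twiddle factors e^{-2*pi*i*k/n}
  private readonly sin: Float64Array;

  constructor(n: number) {
    if (n < 2 || (n & (n - 1)) !== 0) throw new Error("n must be a power of two");
    this.n = n;
    let bits = 0;
    while ((1 << bits) < n) bits++;
    this.rev = new Uint32Array(n);
    for (let i = 0; i < n; i++) {
      let r = 0;
      for (let b = 0; b < bits; b++) r |= ((i >> b) & 1) << (bits - 1 - b);
      this.rev[i] = r;
    }
    this.cos = new Float64Array(n / 2);
    this.sin = new Float64Array(n / 2);
    for (let k = 0; k < n / 2; k++) {
      this.cos[k] = Math.cos((2 * Math.PI * k) / n);
      this.sin[k] = -Math.sin((2 * Math.PI * k) / n); // forward transform sign
    }
  }

  // Forward FFT; overwrites re/im (length n) with the spectrum.
  transform(re: Float64Array, im: Float64Array): void {
    const n = this.n;
    for (let i = 0; i < n; i++) {
      const j = this.rev[i];
      if (j > i) {
        let t = re[i]; re[i] = re[j]; re[j] = t;
        t = im[i]; im[i] = im[j]; im[j] = t;
      }
    }
    for (let size = 2; size <= n; size <<= 1) {
      const half = size >> 1;
      const step = n / size; // twiddle stride for this stage
      for (let start = 0; start < n; start += size) {
        for (let k = 0; k < half; k++) {
          const wr = this.cos[k * step];
          const wi = this.sin[k * step];
          const a = start + k;
          const b = a + half;
          const tr = re[b] * wr - im[b] * wi; // butterfly: w * x[b]
          const ti = re[b] * wi + im[b] * wr;
          re[b] = re[a] - tr;
          im[b] = im[a] - ti;
          re[a] += tr;
          im[a] += ti;
        }
      }
    }
  }
}
```

One FFT instance per window size can then be reused across all frames of a spectrogram.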

brunorsini 3y ago

iZotope, associated with MIT researchers, makes arguably the best such tool for the pro audio industry. Their RX suite is truly miraculous, allowing audio engineers to visualize frequencies in a similar manner, but also offering brush-like tools to do things such as "deleting a dog bark from a guitar take" fairly easily.

an-unknown 3y ago

Seems like you never saw or used SpectraLayers (commercial tool from Steinberg) or Sonic Visualiser (OSS project). Both have much more advanced visualization capabilities than RX. However, RX definitely has the more advanced "semi-automated" editing / repair features.

brunorsini 3y ago

I've witnessed a large number of studios across the US and Latam using RX on a regular basis — places recording anything from indie stars to Grammy-winning artists.

peepwaah 3y ago

Can you recommend any good references for understanding spectrograms? I work on DL-based noise cancellation, and a major part of my work involves analyzing spectrograms. I find it very difficult to do my work without being able to critically analyze these images. Any help from anybody?

picture 3y ago

What do you mean by "understanding the spectrogram"? The graph itself is straightforward: the x axis is time, the y axis is frequency. The intensity of each pixel represents the intensity of a certain frequency component at a certain point in time.

If you're referring to generating spectrograms with Fourier transforms, you will need some math background to properly do the calculation by hand. It largely just boils down to "find the amount of each frequency over time"

Last question: if this is the premise of your work, shouldn't you know about it already?

HarHarVeryFunny 3y ago

For human speech:

o The tall vertical lines reflect "plosives": sudden releases of sound energy, often at the beginning of words, from having the mouth/airway closed and then opening it, as in the first letter of "put" or "tea"

o The high frequencies come from "fricatives" like the first letter of "see" or "free" where air is being passed through the teeth or almost closed lips

o The lower frequencies are where most of the recognizable speech content is, corresponding to the way the resonant frequencies of the mouth and throat are being changed (articulation) by moving the tongue, lips and teeth. Specifically the speech content is in changes to the "formants" which are the changing resonant frequencies showing up as bright mostly horizontal bands in the lower frequencies

Noise may show up in various ways depending on what the noise source is. A fixed frequency spectrum background hum is going to show up as one or more horizontal frequency bands across the entire spectrogram. High frequency noise is going to show up as much more energy in the higher frequencies, which don't have a lot of energy for clean speech (fricatives only).
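The "horizontal band across the entire spectrogram" signature of a steady hum can be searched for mechanically. This sketch flags bins that are loud in nearly every frame, assuming a magnitude spectrogram with one Float64Array per frame. Both thresholds (90% of frames, 10% of the global peak) are assumptions to tune. Mains hum typically sits at 50 or 60 Hz and their harmonics. A long sustained note or vowel could also trigger this, so confirm by ear.

```typescript
// Returns bins that stay above relThreshold * globalPeak in at least
// minFraction of all frames: candidate steady tones/hum.
function steadyToneBins(frames: Float64Array[], minFraction = 0.9, relThreshold = 0.1): number[] {
  if (frames.length === 0) return [];
  const bins = frames[0].length;
  let peak = 0;
  for (const f of frames) for (let k = 0; k < bins; k++) peak = Math.max(peak, f[k]);
  const threshold = peak * relThreshold;
  const result: number[] = [];
  for (let k = 0; k < bins; k++) {
    let loud = 0;
    for (const f of frames) if (f[k] >= threshold) loud++;
    if (loud / frames.length >= minFraction) result.push(k);
  }
  return result;
}
```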

djsamseng 3y ago

Thanks for sharing this! I didn't know about these terms before. Ever considered writing a blog post/tutorial on your knowledge of human speech in spectrograms? This is much more digestible than most of what's out there.

djsamseng 3y ago

This is a pretty good introductory primer. https://medium.com/analytics-vidhya/understanding-the-mel-sp...

1. STFT (get frequencies from the audio signal)

2. Log scale/ decibel scale (since we hear on the log scale)

3. Optionally convert to the Mel scale (filters to how humans hear)

Happy to answer any questions
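Steps 2 and 3 of that list, sketched for single values. Several mel formulas are in use. This is the HTK-style one (2595 * log10(1 + f/700)), whereas librosa, for example, defaults to the Slaney variant, so numbers differ slightly between tools. The epsilon floor and the function names are assumptions.

```typescript
// Step 2: linear magnitude -> decibels relative to `ref`.
// The epsilon floor avoids log10(0) = -Infinity on silent bins.
function toDb(magnitude: number, ref = 1, eps = 1e-10): number {
  return 20 * Math.log10(Math.max(magnitude, eps) / ref);
}

// Step 3 (optional): HTK-style mel scale and its inverse.
function hzToMel(hz: number): number {
  return 2595 * Math.log10(1 + hz / 700);
}
function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Centre frequencies (Hz) of `count` filters spaced evenly in mel between
// fMin and fMax (edges excluded): where a mel filterbank's triangles peak.
function melCenters(count: number, fMin: number, fMax: number): number[] {
  const lo = hzToMel(fMin);
  const hi = hzToMel(fMax);
  const centers: number[] = [];
  for (let i = 1; i <= count; i++) {
    centers.push(melToHz(lo + ((hi - lo) * i) / (count + 1)));
  }
  return centers;
}
```

Evenly spaced mel centers get farther apart in Hz as frequency rises, mirroring coarser human pitch resolution at high frequencies.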

peepwaah 3y ago

Thanks for sharing the link. I'm fairly comfortable with most of the theoretical aspects of STFT/FFT/mel scale etc., but when I look at a spectrogram I still feel I'm missing something. I want to know how clear the speech in the audio is: Is there background noise? Is there reverb? Is there loss anywhere? I have a feeling these things can be learned from analyzing spectrograms, but I'm not sure how. Hence the question.

timlod 3y ago

I would recommend constructing some spectrograms from specific sounds, especially simulated ones, to help you connect the visual with the audible.

For example:

- Sine sweeps (a sine wave that starts at a low frequency and sweeps up to a high one), to learn to associate the frequencies you hear with the Y-axis

- Sine pulses at various frequencies - to better understand the time axis

- Different types of noise (e.g. white noise)

Perhaps move on to your own voice as well, and try different scales (log or mel spectrograms, which are commonly used).

With this, I think you can develop a familiarity quickly!
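A sketch of the suggested test signals as raw Float64Array samples. Writing them to a WAV file or playing them through Web Audio is left out. The linear sweep, the pulse timing and the function names are assumptions. An exponential sweep is also common and spends equal time per octave.

```typescript
// Linear sine sweep f0 -> f1 over `seconds`. The phase is the integral of
// the instantaneous frequency f(t) = f0 + (f1 - f0) * t / T.
function sineSweep(f0: number, f1: number, seconds: number, sampleRate: number): Float64Array {
  const out = new Float64Array(Math.round(seconds * sampleRate));
  const rate = (f1 - f0) / seconds; // Hz per second
  for (let i = 0; i < out.length; i++) {
    const t = i / sampleRate;
    out[i] = Math.sin(2 * Math.PI * (f0 * t + 0.5 * rate * t * t));
  }
  return out;
}

// One sine burst of `burst` seconds per frequency, starting every
// `period` seconds, with silence in between.
function sinePulses(freqs: number[], burst: number, period: number, sampleRate: number): Float64Array {
  const out = new Float64Array(Math.round(freqs.length * period * sampleRate));
  freqs.forEach((f, p) => {
    const start = Math.round(p * period * sampleRate);
    const len = Math.round(burst * sampleRate);
    for (let i = 0; i < len && start + i < out.length; i++) {
      out[start + i] = Math.sin((2 * Math.PI * f * i) / sampleRate);
    }
  });
  return out;
}

// Uniform white noise in [-amplitude, amplitude). Math.random is not
// seedable, so each call produces different samples.
function whiteNoise(seconds: number, sampleRate: number, amplitude = 0.5): Float64Array {
  const out = new Float64Array(Math.round(seconds * sampleRate));
  for (let i = 0; i < out.length; i++) out[i] = amplitude * (2 * Math.random() - 1);
  return out;
}
```

In a spectrogram, the sweep should appear as a rising line, the pulses as short horizontal dashes, and the white noise as an even haze across all frequencies.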

0xFEE1DEAD 3y ago

Look for clear and distinct frequency bands corresponding to the vocal range of human speech (generally around 100 Hz to 8 kHz). If the frequency bands are well defined and distinct then the speech is likely clear and intelligible. If the frequency bands are blurred or fuzzy then the speech may be muffled or distorted.

Note that speech, like any audio source, consists of multiple frequencies: a fundamental frequency and its harmonics.

Background noise can be identified as distinct frequency bands that are not part of the vocal range of human speech. E.g. if you see lots of bright lines below or above the human vocal range, then there's lots of background noise. Lower frequencies in particular can have a big impact on the perceived clarity of a recording, whereas high frequencies come off as more annoying.

Noise within the frequency range of human speech is harder to spot and you should always use your ears to decide whether it's noise or not.

You can also use a spectrogram to check for plosives (e.g. "k" and "t" sounds) and sibilants (e.g. "s"), as they can also make a recording sound bad/harsh.

djsamseng 3y ago

Unfortunately I think the answer is "we don't know". We have loads of techniques (e.g. band-pass filters) and hypotheses (e.g. harmonic frequencies and timbre), but we haven't been able to implement them perfectly, which seems to be why deep learning has worked so well.

Personally, I hypothesize that the reason it's so hard is that the sources are intermixed and share frequencies, so isolating certain frequencies doesn't isolate a speaker. We'd need something like beamforming to know how much amplitude of each frequency to extract. I'd also hypothesize that humans, while able to focus on a directional source, cannot "extract" a clean signal either (imagine someone talking while a pan crashes on the floor: it completely drowns out what the person said).


nomel 3y ago

https://news.ycombinator.com/item?id=33668004

bitsinthesky 3y ago

This is a lot of fun

nixpulvis 3y ago

It would be awesome if it told me what file types it supports and also helped transcode some things, like videos.

Looks very interesting though.

djmips 3y ago

Very nice. I tried it on my phone and really enjoyed being able to intuitively select regions to play back. Very fun.

k8si 3y ago

What we really need is PraaS (Praat as a Service). Praat Cloud Edition. Etc.

ddingus 3y ago

I like this! Easy to use, fun, useful.

Nice work.
