Aphex Twin did something similar, but this is more playful in my opinion.
He talks about that and plenty of other cool stuff in his talk at GDC 2017. It's one of my favorite conference talks ever: he did so much cool experimentation to get the sounds he used on the soundtrack, and watching the talk is one of those moments where you really get to see a master of his craft let loose and explain his process.
Warning - this music freaked my dog out!
Unfortunately it's in MATLAB, so I cannot run it any more.
[1] https://jo-m.ch/posts/2015/01/hack-the-spectrum-hide-images-...
https://www.reddit.com/r/Damnthatsinteresting/comments/kvjil...
Can we hire you to help us improve the (broken) spectral visualizations on our app?
Example: https://fakeyou.com/tts/result/TR:9jy3vew9w0s3ew4keay9m330rd...
I would so love to hire you to help us. This is freaking cool.
Even if you're not interested, mad props. I really love this.
Of course, don't forget the window function (Hann, or raised cosine), but it looks like you've got that covered because your spectrogram looks smooth.
The color palette looks good in your case. FWIW, my color function is like this: pow(fft_amp, 1.5) * rgb(9, 3, 1). The pow() part brightens the low/quiet amplitudes, and the (9,3,1) multiplier displays a 10x wider amplitude range by mapping it to a visually long black->orange->yellow->white range of colors. Note that I don't do log10 mapping of the amplitudes.
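For anyone who wants to try that color function, here is a minimal Python sketch. It assumes the amplitude is already normalized to [0, 1] and that each channel is clamped to 1 before conversion to 8 bits; neither detail is stated above, and the name `amp_to_rgb` is made up for illustration.

```python
def amp_to_rgb(amp):
    """Map a normalized FFT amplitude to an 8-bit (r, g, b) tuple.

    Sketch of pow(fft_amp, 1.5) * rgb(9, 3, 1). Assumptions: amp is in
    [0, 1], and each channel saturates (is clamped) at full brightness.
    """
    a = max(0.0, min(1.0, amp))   # clamp the input (assumption)
    v = a ** 1.5                  # the pow() part
    weights = (9.0, 3.0, 1.0)     # red saturates first, then green, then blue
    return tuple(round(255 * min(1.0, v * w)) for w in weights)


# Sample the ramp from silence to full scale.
ramp = [amp_to_rgb(i / 4) for i in range(5)]
for i, color in enumerate(ramp):
    print(f"amp={i / 4:.2f} -> rgb{color}")
```

Because red clips first, then green, then blue, the ramp passes through black, orange, yellow and white as the amplitude rises.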
- Allow playback via the Space key. Show a play marker to let the user know where in the sample they are, even without having selected a part.
- Choose a sample that is easier on the ears than high-pitched bird song. I was really shocked when the first loud part came.
Is there any way to make this display in real time, or is that not (currently?) possible with audio APIs?
https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_A...
1. https://arstechnica.com/tech-policy/2015/11/beware-of-ads-th...
I usually use Audacity to inspect the spectrogram of FLAC files and see if they really are 44100 Hz or if someone packaged a constant-bitrate 320 kbps MP3 encode into a FLAC file.
Now I can just check it in my browser :D
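The same check can be roughed out in code: lossy encoders typically discard the highest frequencies, so a spectrum that stops well short of half the sample rate is a hint. This is only a sketch. The synthetic tones stand in for decoded audio (decoding the FLAC is not shown), the -60 dB threshold is an arbitrary assumption, and all function names are made up.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def spectral_cutoff_hz(samples, sample_rate, threshold_db=-60.0):
    """Highest frequency whose magnitude is within threshold_db of the peak."""
    mags = dft_magnitudes(samples)
    peak = max(mags)
    if peak == 0.0:
        return 0.0
    limit = peak * 10 ** (threshold_db / 20)
    top_bin = max(k for k, m in enumerate(mags) if m >= limit)
    return top_bin * sample_rate / len(samples)


def tones(freqs_hz, sample_rate, n_samples):
    """Sum of unit-amplitude sine waves, used here as stand-in audio."""
    return [
        sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs_hz)
        for i in range(n_samples)
    ]


SAMPLE_RATE = 44100
N = 441  # 100 Hz per bin, so the test tones fall exactly on bins

full_band = tones([1000, 10000, 21000], SAMPLE_RATE, N)
low_passed = tones([1000, 10000], SAMPLE_RATE, N)  # top end missing

full_cutoff = spectral_cutoff_hz(full_band, SAMPLE_RATE)
low_cutoff = spectral_cutoff_hz(low_passed, SAMPLE_RATE)
print(full_cutoff, low_cutoff)
```

On real music you would average over many windowed frames rather than one short block, and the threshold would need tuning by ear and eye.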
One place I used these was on a toy AI assistant. I recorded myself saying a trigger word thousands of times, cut the audio into pieces and converted each to a spectrogram image. I then fed those to a training model to help recognize the trigger word.
Before the spectrogram, I was feeding the WAV file directly, and it was incredibly intensive on my laptop. But the image files were easier to process in real time. This tool can be used for debugging.
I like the interesting ability to play a "rectangular" (time + frequency limited) section of the audio.
If you're referring to generating spectrograms with Fourier transforms, you will need some math background to properly do the calculation by hand. It largely just boils down to "find the amount of each frequency over time".
Last question, if this is the premise of your work, shouldn't you know about it already?
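To make "the amount of each frequency over time" concrete, here is a small sketch in plain Python: cut the signal into overlapping frames, apply a Hann window, and take the DFT magnitude of each frame. The frame size, hop and test signal are arbitrary choices for illustration, and the naive DFT is far slower than the FFT a real tool would use.

```python
import cmath
import math


def hann(n):
    """Periodic Hann (raised cosine) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Naive DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_size=64, hop=32):
    """One row of frequency magnitudes per overlapping, windowed frame."""
    window = hann(frame_size)
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = samples[start:start + frame_size]
        rows.append(dft_magnitudes([s * w for s, w in zip(chunk, window)]))
    return rows


# Test signal: 1000 Hz for the first half, 2000 Hz for the second half.
SAMPLE_RATE = 8000
signal = [
    math.sin(2 * math.pi * (1000 if i < 256 else 2000) * i / SAMPLE_RATE)
    for i in range(512)
]

rows = spectrogram(signal)
bin_hz = SAMPLE_RATE / 64  # 125 Hz per bin
peak_bins = [max(range(len(r)), key=r.__getitem__) for r in rows]
print([b * bin_hz for b in peak_bins])  # strongest frequency in each frame
```

Each row is one column of the spectrogram image; plotting the rows side by side with a color map gives the familiar picture.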
o The tall vertical lines reflect "plosives" - sudden releases of sound energy often at the beginning of words from having mouth/airway closed then open, as in the first letter of "put" or "tea"
o The high frequencies come from "fricatives" like the first letter of "see" or "free" where air is being passed through the teeth or almost closed lips
o The lower frequencies are where most of the recognizable speech content is, corresponding to the way the resonant frequencies of the mouth and throat are being changed (articulation) by moving the tongue, lips and teeth. Specifically the speech content is in changes to the "formants" which are the changing resonant frequencies showing up as bright mostly horizontal bands in the lower frequencies
Noise may show up in various ways depending on what the noise source is. A fixed frequency spectrum background hum is going to show up as one or more horizontal frequency bands across the entire spectrogram. High frequency noise is going to show up as much more energy in the higher frequencies, which don't have a lot of energy for clean speech (fricatives only).
1. STFT (get frequencies from the audio signal)
2. Log scale / decibel scale (since we hear on a log scale)
3. Optionally convert to the Mel scale (filters to how humans hear)
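Steps 2 and 3 can be sketched for a single frame of STFT magnitudes as follows. Assumptions to flag: this uses the common 2595 * log10(1 + f/700) mel formula, triangular filters, and an -80 dB floor, and it applies the mel filters before the decibel conversion (a common ordering, even though the list mentions the log scale first). The function names are made up, and the input frame is hand-built rather than computed from audio.

```python
import math


def amplitude_to_db(mag, floor_db=-80.0):
    """Step 2: magnitude -> decibels, clipped at floor_db (assumed floor)."""
    if mag <= 0.0:
        return floor_db
    return max(floor_db, 20 * math.log10(mag))


def hz_to_mel(hz):
    return 2595 * math.log10(1 + hz / 700)


def mel_to_hz(mel):
    return 700 * (10 ** (mel / 2595) - 1)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Step 3: triangular filters spaced evenly on the mel scale."""
    max_hz = sample_rate / 2
    top_mel = hz_to_mel(max_hz)
    edges = [mel_to_hz(top_mel * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bin_hz = [k * max_hz / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_bands_db(magnitudes, bank):
    """Weight the linear spectrum with each filter, then convert to dB."""
    return [amplitude_to_db(sum(w * m for w, m in zip(row, magnitudes)))
            for row in bank]


# One hand-built frame: 33 bins at 8000 Hz, energy only at 1000 Hz (bin 8).
frame = [0.0] * 33
frame[8] = 16.0
bank = mel_filterbank(n_filters=4, n_bins=33, sample_rate=8000)
bands = mel_bands_db(frame, bank)
print(bands)
```

A real pipeline would use many more mel bands (dozens) and run this on every frame of the STFT.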
Happy to answer any questions
Looks very interesting though.
Nice work.