Having looked into it a little bit (i.e., having read the wikipedia page about acoustic fingerprinting) I see that there are quite a few implementations in existence and that they appear to be fairly reliable at identifying songs based on the fingerprints -- which is really cool, first of all. Taking that for granted, it seems like facebook could solve its problem by creating a comprehensive database of songs which its music content partners offer.
In particular, I'm thinking of a database with entries containing an audio fingerprint, data about the artist, song, album, and so on, and links to the song on each of the partners' services. Then it would be possible to initially loop through each song offered by the partners, generate an acoustic fingerprint, check if it exists in the database[1], and then add a new record or add a link to an existing record as appropriate. This would be a huge process, of course, but afterwards you'd only have to perform this step for new songs, and you'd ultimately end up with a very impressive database of music IDs.
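That loop could be sketched roughly like this. Everything here is a hypothetical illustration: `fingerprint()` stands in for a real acoustic fingerprinter (a real system would use something like Chromaprint), and the record/link shapes are just assumptions:

```python
import hashlib

songs_db = {}  # fingerprint -> record

def fingerprint(audio_bytes):
    # Stand-in for a real acoustic fingerprinter. A real one would be
    # robust to encoding/bitrate differences; hashing the raw bytes
    # just keeps this sketch runnable.
    return hashlib.sha1(audio_bytes).hexdigest()

def ingest(partner, meta, audio_bytes):
    # Fingerprint the track; create a new record if it's unseen,
    # otherwise just attach this partner's link to the existing record.
    fp = fingerprint(audio_bytes)
    record = songs_db.setdefault(fp, {"meta": meta, "links": {}})
    record["links"][partner] = meta["url"]
    return fp
```

After running this over every partner's catalog, each record would hold one set of song metadata plus a link per partner that offers it.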
Anyway, that's undoubtedly a simplistic idea of what could be done and I'm surely glossing over some major obstacles, but I enjoyed indulging in some wishful thinking about it all the same.
[1] As I mentioned, I don't know anything about acoustic fingerprinting, but intuitively I can imagine it's probably not as easy as checking whether that fingerprint "exists" in the database. More likely, you'd have to check how similar it is to existing fingerprints and choose some threshold above which it's sufficiently likely to be the same song.
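To make the threshold idea concrete, here's one toy notion of similarity: treat fingerprints as equal-length bit strings and compare them by the fraction of matching bits. This is purely illustrative; real fingerprint systems use far more sophisticated representations and indexed lookups rather than a linear scan:

```python
def similarity(fp_a, fp_b):
    # Fraction of positions where two equal-length bit strings agree.
    assert len(fp_a) == len(fp_b)
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)

def find_match(needle, known_fps, threshold=0.8):
    # Return the most similar known fingerprint, but only if it clears
    # the threshold; otherwise treat the track as new.
    best = max(known_fps, key=lambda fp: similarity(needle, fp), default=None)
    if best is not None and similarity(needle, best) >= threshold:
        return best
    return None
```

The threshold is the tunable knob the footnote alludes to: too low and distinct songs collapse into one record, too high and re-encodings of the same song get duplicated.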
If you want to know more about how to implement it on your own, you could use this as a resource: http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
I assume the biggest drawback is that Shazam uses a microphone, which is obviously subject to ambient noise. Shazam doesn't actually seem to have much of a problem with this, but I'm guessing a passive listener that sampled the audio output device instead of a noisy ambient microphone would work much more reliably.
http://en.wikipedia.org/wiki/Isrc
A few years back I worked for a startup that was cataloging recorded music. I recall that the major labels would provide us with ISRC, among other things, for all of their recordings. I don't recall whether independent labels and artists used it though. I would guess that it varied.
(See also ISWC, which is for the composition, not the recorded performance of it. ISWC/ISRC is a bit like class/instance.)
It is the reason xISBN exists. http://www.worldcat.org/affiliate/webservices/xisbn/app.jsp
Ideally an ID cross-reference would be an open dataset, but it is difficult to achieve that in the market. A proprietary xref is better than no xref.
As someone who implemented metadata matching of two distinct musical catalogs: First you do search, then you do ranking, then you take the best result. You need to take all the needle metadata (isrc, albums' upc/icpn, title, version, album, artists), and then
- If there are results with the same ISRC, it's cool. Choose the best matching album (by UPC, then by title, then by album version)
- If there aren't, match the track + artist pair and then choose the best matching album for it.
- If you don't have an ISRC match and cannot match on track title + artist, you should probably bail out.
This way you won't miss a track on a compilation, nor will you prefer The Hit Crew to the actual performer.
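The cascade above could be sketched like this. All field names and the candidate format are assumptions for illustration, not any real catalog schema:

```python
def rank_albums(needle, hits):
    # Among matching tracks, prefer the one whose album matches best:
    # by UPC first, then by album title (booleans sort False < True).
    def score(c):
        return (c.get("upc") == needle.get("upc"),
                c.get("album", "").lower() == needle.get("album", "").lower())
    return max(hits, key=score)

def match_track(needle, candidates):
    # 1) Exact ISRC match wins; pick the best album among those hits.
    isrc_hits = [c for c in candidates
                 if needle.get("isrc") and c.get("isrc") == needle["isrc"]]
    if isrc_hits:
        return rank_albums(needle, isrc_hits)
    # 2) Otherwise fall back to track title + artist.
    ta_hits = [c for c in candidates
               if c["title"].lower() == needle["title"].lower()
               and c["artist"].lower() == needle["artist"].lower()]
    if ta_hits:
        return rank_albums(needle, ta_hits)
    # 3) No ISRC and no title+artist match: bail out.
    return None
```

The ranking step is what keeps you from landing on a karaoke compilation when the original album is also in the candidate set.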
Most international content has ISRC. Local, independent and DIY would probably not. But it's usually easier to match because it doesn't have dozens of different recordings for tracks nor endless realms of compilations.
The variety of online music sites/services is a strange double-edged sword - I'm really happy that so many diverse music products exist, but the fragmentation of online music makes it nearly impossible to build a meaningful social experience around music. An Rdio user interacts with and consumes music one way, a SoundCloud user thinks of online music another way, and the two of them can't even figure out if they're listening to the same song - literally.
Accurate, cross-platform music ID resolution is the first step in fixing all of this. And it's not an easy problem.
(Full disclosure: I build http://flock.fm which uses EchoNest for song/artist resolution, and I also recently won some swag from them in a contest.)