The difference is that the IPA tells you how to use your anatomy to create the right sound. Someone with the right knowledge can look at a word in IPA and know how to construct it to get an accurate reproduction.
You might be able to hear an unfamiliar word and try to reproduce it, but in a lot of cases you will get, at best, a rough approximation unless you also know how to shape your tongue, where in the mouth each sound is produced, and what the accent is.
English is, generally speaking, very forgiving of imprecise pronunciation, but tonal languages (like Vietnamese and Mandarin), where pitch can change a word's meaning, won’t let you get away with simple mimicry, at least not without a lot of work to nail the pronunciation.