You need to maintain millisecond sync while the devices are widely separated, possibly for hours. Cell phone clocks are not millisecond-accurate over hours. The idea is that a group of people all download the app and walk or drive around the area of interest collecting time-stamped audio and sending it to a server.
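To put rough numbers on the drift: a free-running crystal oscillator is usually specified in parts per million, and a constant offset of even a few ppm adds up quickly. Here is a back-of-the-envelope sketch. The ppm values are illustrative assumptions, not measurements of any particular phone, and `drift_ms` is just a hypothetical helper name.

```python
def drift_ms(ppm: float, hours: float) -> float:
    """Accumulated clock error in milliseconds for a constant frequency offset.

    ppm   -- assumed oscillator offset in parts per million (illustrative)
    hours -- time since the clocks were last synchronized
    """
    seconds = hours * 3600.0
    return ppm * 1e-6 * seconds * 1000.0


if __name__ == "__main__":
    # Assumed offsets; real devices vary with temperature and age.
    for ppm in (1, 10, 20):
        for hours in (1, 4):
            print(f"{ppm:>2} ppm over {hours} h: {drift_ms(ppm, hours):.1f} ms")
```

Even at an assumed 1 ppm, an hour without resynchronization costs a few milliseconds, so the devices would need to re-sync against a common reference throughout the recording session.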
Amusingly, data collection would be easier with analog technology. Get some VHF walkie-talkies and, at the base station, feed all the channels into a multitrack tape recorder like the ones musicians use. No sync problems.