1 MB would be enough for about 45 seconds of audio at 8-bit PCM @ 22 kHz. If they're half competent, they could use ADPCM or, much better (and cheaper to decode too!), vector quantization.
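The storage math is easy to check. A quick sketch, assuming mono audio, a 22,050 Hz sample rate and a decimal megabyte (1,000,000 bytes); `seconds_per_megabyte` is just a hypothetical helper name:

```python
def seconds_per_megabyte(bits_per_sample: float, sample_rate_hz: int,
                         megabyte: int = 1_000_000) -> float:
    """Seconds of mono audio that fit in one megabyte (decimal MB by default)."""
    bits_available = megabyte * 8
    bits_per_second = bits_per_sample * sample_rate_hz
    return bits_available / bits_per_second

pcm8 = seconds_per_megabyte(8, 22_050)   # 8-bit PCM
vq1 = seconds_per_megabyte(1, 22_050)    # ~1 bit per sample
print(f"{pcm8:.1f} s")                   # 45.4 s
print(f"{vq1 / 60:.2f} min")             # 6.05 min
```

With a binary megabyte (`megabyte=1_048_576`) the 8-bit PCM figure goes up to about 47.6 seconds, so the ballpark holds either way.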
With VQ, you could go down to ~1 bit per sample (~6 minutes of audio per megabyte) while maintaining good audio quality. Decoding is very simple, so a 6502 would have plenty of oomph to do it.
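To show why decoding is so cheap, here's a rough sketch under assumed parameters: a 256-entry codebook of 8-sample vectors (unsigned 8-bit samples), so each one-byte index covers 8 samples, i.e. 1 bit per sample. The codebook itself is only 2 KB. Decoding is nothing but a table lookup and copy; the expensive nearest-neighbour search happens offline in the encoder. The codebook below is a toy one for illustration; a real one would be trained on the actual sound data (e.g. with k-means/LBG).

```python
from typing import Sequence

VECTOR_LEN = 8       # assumed: samples per codebook vector
CODEBOOK_SIZE = 256  # assumed: one 8-bit index per vector -> 8 bits / 8 samples = 1 bit/sample

def vq_decode(indices: bytes, codebook: Sequence[bytes]) -> bytearray:
    """Expand each index into its codebook vector: just a lookup and a copy."""
    out = bytearray()
    for idx in indices:
        out += codebook[idx]
    return out

def vq_encode(samples: bytes, codebook: Sequence[bytes]) -> bytes:
    """Brute-force nearest-neighbour search by squared error (done offline)."""
    indices = bytearray()
    for start in range(0, len(samples), VECTOR_LEN):
        block = samples[start:start + VECTOR_LEN]
        # Pad a short final block with silence (128 = midpoint of unsigned 8-bit).
        block = block + bytes([128]) * (VECTOR_LEN - len(block))
        best = min(range(len(codebook)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(block, codebook[i])))
        indices.append(best)
    return bytes(indices)

# Toy codebook: entry i is a flat vector at level i.
codebook = [bytes([i] * VECTOR_LEN) for i in range(CODEBOOK_SIZE)]
signal = bytes([10] * 8 + [200] * 8)
indices = vq_encode(signal, codebook)
print(list(indices))                          # [10, 200]
print(8 * len(indices) / len(signal))         # 1.0 bit per sample
assert vq_decode(indices, codebook) == signal
```

On a 6502 the decode loop would boil down to indexed loads from the codebook table, which is why it's so much lighter than even ADPCM.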