To make an image you need two axes, one horizontal and one vertical. Because the earth is already moving, a single photodiode pointed at the stars will scan a line across the sky if you wait a while. You can add the second axis with another source of motion, such as a speaker driven with a sawtooth waveform. If you then record over time, you can reconstruct the image by plotting each sample with the intensity captured by the diode, positioned according to the speaker's driving voltage and the time that has passed.
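Roughly, the reconstruction step could look like the sketch below. It is only an illustration under some assumptions: the speaker's displacement is taken to be linear in its driving voltage, the sky drift is taken to be linear in time, and the names (`reconstruct_image`, `duration`, `v_min`, `v_max`) are made up for the example.

```python
def reconstruct_image(samples, width, height, duration, v_min, v_max):
    """Bin (time, speaker_voltage, intensity) samples into a 2D grid.

    Assumptions (hypothetical, for illustration only):
    - elapsed time maps linearly to the horizontal axis (earth's motion)
    - speaker voltage maps linearly to the vertical axis (sawtooth sweep)
    """
    sums = [[0.0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]

    for t, v, intensity in samples:
        # Horizontal position from elapsed time, clamped to the grid.
        col = min(width - 1, max(0, int(t / duration * width)))
        # Vertical position from driving voltage, clamped to the grid.
        row = min(height - 1, max(0, int((v - v_min) / (v_max - v_min) * height)))
        sums[row][col] += intensity
        counts[row][col] += 1

    # Average all samples that landed in the same pixel; empty pixels stay 0.
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]


# Three made-up samples: two at the start of the scan, one near the end.
samples = [(0.0, -1.0, 2.0), (0.0, -1.0, 4.0), (9.9, 1.0, 5.0)]
image = reconstruct_image(samples, width=4, height=2, duration=10.0,
                          v_min=-1.0, v_max=1.0)
```

Averaging the samples that fall into the same pixel is one simple choice here; since the sawtooth repeats many times while the sky drifts slowly, each pixel would normally collect several readings.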
I get that. I guess I was just confused as to why your first thought was to use a speaker to move the photodiode, as opposed to an Arduino with a stepper motor.
Oh, I just had an even nicer idea that I really will have to try. I won't say what it is yet, so as not to spoil the surprise, but I'm sure that it will work :)