Off-topic, but you seem like you might know: why does text copied from PDFs sometimes have messed-up spaces? The extractor seems to guess where the spaces should go from the spacing between letters, so with justified text, a widely spaced line may come out with a space between every letter, while a tightly spaced one has no spaces at all.
(Also, the way it inserts a line break at the end of every printed line is maddening.)
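
To make the guess about spacing concrete, here is a rough Python sketch of the kind of heuristic I'm imagining. I don't know what any particular PDF reader actually does; `guess_spaces`, `layout`, the glyph tuples, and the 0.3 ratio are all made up for illustration. The idea is that the page only stores positioned letters, and the extractor emits a space whenever the gap between two letters exceeds some fixed fraction of the typical letter width.

```python
from statistics import median


def guess_spaces(glyphs, ratio=0.3):
    """Rebuild one line of text from positioned glyphs.

    glyphs: list of (char, x_start, x_end) in reading order, no space glyphs.
    ratio:  hypothetical tuning knob; a gap wider than ratio * median glyph
            width is treated as a word break.
    """
    if not glyphs:
        return ""
    widths = [x_end - x_start for _, x_start, x_end in glyphs]
    threshold = ratio * median(widths)
    out = [glyphs[0][0]]
    for (_, _, prev_end), (char, start, _) in zip(glyphs, glyphs[1:]):
        if start - prev_end > threshold:
            out.append(" ")
        out.append(char)
    return "".join(out)


def layout(words, letter_gap, word_gap, width=10.0):
    """Toy typesetter: place fixed-width glyphs with the given gaps."""
    glyphs, x = [], 0.0
    for word in words:
        for char in word:
            glyphs.append((char, x, x + width))
            x += width + letter_gap
        x += word_gap - letter_gap  # widen the last gap into a word gap
    return glyphs


words = ["so", "it", "goes"]
normal = guess_spaces(layout(words, letter_gap=0.5, word_gap=4.0))
loose = guess_spaces(layout(words, letter_gap=3.5, word_gap=8.0))
tight = guess_spaces(layout(words, letter_gap=0.2, word_gap=2.5))
print(repr(normal))  # 'so it goes'
print(repr(loose))   # 's o i t g o e s'
print(repr(tight))   # 'soitgoes'
```

With a fixed threshold like this, a line that justification has stretched pushes every letter gap over the limit, and a squeezed line pulls even the word gaps under it, which would match both symptoms.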