Oh my god. This means that we can use stable diffusion to do pure text processing, like, given two textual descriptions, assess how similar they are
I now expect that the next model like GPT-3 will be multi-modal like Stable Diffusion, to better account for those semantic connections