Chunking strategy is genuinely hard and, like you say, so important to RAG. I'm currently battling with it in a "Podcast archive -> active social trend" clip-finder app I'm working on. You have to really understand your source material and how it's formatted, consider preprocessing, and think about where semantic breaks happen and how you can handle them deterministically in your specific domain.
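For podcast transcripts, "deterministic" mostly means splitting on speaker turns rather than raw character counts. A minimal sketch (the `HOST:`/`GUEST:` label format is an assumption, adjust the regex to whatever your transcripts actually look like):

```python
import re

def chunk_transcript(transcript: str, max_chars: int = 1200) -> list[str]:
    """Split a transcript on speaker turns, then pack whole turns into
    chunks so a boundary never lands mid-thought."""
    # Assumes lines like "HOST: ..." / "GUEST: ..." -- swap the pattern
    # for your own transcript format.
    turns = re.split(r"\n(?=[A-Z][A-Z ]+:)", transcript.strip())
    chunks, current = [], ""
    for turn in turns:
        if current and len(current) + len(turn) > max_chars:
            chunks.append(current)
            current = turn
        else:
            current = f"{current}\n{turn}" if current else turn
    if current:
        chunks.append(current)
    return chunks
```

The point is that every chunk starts at a speaker boundary, which is about as close to a guaranteed semantic break as you get in this domain.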
Adjacency similarity is a must; otherwise you leave perfectly cromulent results on the table just because they didn't hit the right cosine score in a vacuum.
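By which I mean something like this: compare each chunk's embedding to its neighbor and merge the pair when they're close, so a thought that straddles a chunk boundary gets retrieved as one unit. A toy sketch (the 0.8 threshold and pre-normalised embeddings are assumptions):

```python
import numpy as np

def merge_adjacent(chunks: list[str], embeddings: np.ndarray,
                   threshold: float = 0.8) -> list[str]:
    """Merge neighboring chunks whose embeddings are similar enough."""
    # embeddings: one L2-normalised vector per chunk, same order as chunks,
    # so the dot product below is cosine similarity.
    merged = [chunks[0]]
    last_vec = embeddings[0]
    for text, vec in zip(chunks[1:], embeddings[1:]):
        if float(last_vec @ vec) >= threshold:
            merged[-1] = merged[-1] + " " + text  # extend previous chunk
        else:
            merged.append(text)                    # start a new chunk
        last_vec = vec
    return merged
```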
There is some early stuff from Apple's research labs and the ColBERT team on late interaction retrieval (ColBERTv2: https://arxiv.org/abs/2112.01488) which looks to ease that burden by generating compressed token-level embeddings across a document.
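The appeal is that scoring happens per token rather than per chunk: each query token is matched against every document token and you sum the best matches (the "MaxSim" operator). A toy numpy version of that scoring step, not the real ColBERT implementation:

```python
import numpy as np

def maxsim_score(query_tok: np.ndarray, doc_tok: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token embedding,
    take its max similarity over all document token embeddings, then sum."""
    # query_tok: (n_query_tokens, dim), doc_tok: (n_doc_tokens, dim),
    # rows L2-normalised so the matmul yields cosine similarities.
    sims = query_tok @ doc_tok.T          # (n_query, n_doc) similarity matrix
    return float(sims.max(axis=1).sum())  # best doc match per query token
```

Because relevance is decided token-by-token, the exact chunk boundaries matter a lot less, which is the burden-easing part.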