Grapheme cluster counts can’t be used because they’re unstable across Unicode versions. Some algorithms use UTF8 byte offsets - but I think that’s a mistake because they make input validation much more complicated. Using byte offsets, there’s a whole lot of invalid states you can represent easily. Eg maybe insert “a” at position 0 is valid, but inserting at position 1 would be invalid because it might insert in the middle of a codepoint. Then inserting at position 2 is valid again. If you send me an operation which happened at some earlier point in time, I don’t necessarily have the text document you were inserting into handy. So figuring out if your insertion (and deletion!) positions are valid at all is a very complex and expensive operation.
Codepoints are way easier. I can just accept any integer up to the length of the document at that point in time.