Small string optimizations, while nice (and probably beneficial on average), aren't always needed. The extra generated code for handling both cases can make them not worth it if you've got a fast allocator, and can even make some operations outright slower. (And if your code doesn't have strings anywhere near hot loops, all you get is a larger binary.) File paths, for example, are often too large to fit the small case, but still small enough that even the check for whether the string is small can be a couple percent of the allocation/freeing cost.
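To make the "both cases" point concrete, here is a minimal sketch of a small-string layout in C. Everything in it (`sso_str`, `SSO_CAP`, the 15-byte inline capacity) is made up for illustration and isn't how any particular library does it; the point is only that every accessor and the free path has to branch on the small/large check.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SSO_CAP 15 /* hypothetical inline capacity, chosen for illustration */

typedef struct {
    size_t len;
    union {
        char inline_buf[SSO_CAP + 1]; /* used when len <= SSO_CAP */
        char *heap;                   /* used otherwise */
    } u;
} sso_str;

/* The check that every operation below has to perform. */
static int sso_is_small(const sso_str *s) { return s->len <= SSO_CAP; }

/* Even a plain read of the data needs the branch. */
static const char *sso_data(const sso_str *s) {
    return sso_is_small(s) ? s->u.inline_buf : s->u.heap;
}

/* Copies src into s; returns 0 on success, -1 if the heap allocation fails. */
static int sso_init(sso_str *s, const char *src) {
    size_t n = strlen(src);
    s->len = n;
    if (n <= SSO_CAP) {
        memcpy(s->u.inline_buf, src, n + 1);
        return 0;
    }
    s->u.heap = malloc(n + 1);
    if (s->u.heap == NULL) return -1;
    memcpy(s->u.heap, src, n + 1);
    return 0;
}

/* Freeing also branches: only the large case owns heap memory. */
static void sso_free(sso_str *s) {
    if (!sso_is_small(s)) free(s->u.heap);
}
```

A typical file path lands in the heap branch here, so it pays for the check in `sso_init`, `sso_data` and `sso_free` without ever benefiting from the inline buffer.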
Being error-prone, though, is something that I can agree with. That's the cost of doing things manually.
(I'd also like to note that malloc/free are a much more important case of a bad abstraction: they carry quite a bit of overhead to be able to handle arbitrary lengths and multithreading, while a big portion of allocations happen on the same (often the only) thread with a constant size that is also known at free time, which is far easier to handle. And that's not even counting the cost of calling a non-inlined function, which spills things to the stack.)
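As a rough sketch of how much simpler the constant-size, single-thread case can be, here is a free-list pool in C. The names (`pool`, `pool_alloc`, `pool_free`, `NODE_SIZE`) and the 64-byte object size are assumptions for illustration. It is not thread-safe and it never returns memory to the system until `pool_destroy`, which is exactly the set of restrictions that makes it cheap.

```c
#include <stddef.h>
#include <stdlib.h>

#define NODE_SIZE 64 /* hypothetical fixed object size */

/* A slot is either live user data or a link in the free list. */
typedef union pool_slot {
    union pool_slot *next;
    unsigned char bytes[NODE_SIZE];
} pool_slot;

typedef struct {
    pool_slot *free_list; /* singly linked list of recycled slots */
} pool;

/* Fast path: pop a recycled slot. Falls back to malloc when the list is empty. */
static inline void *pool_alloc(pool *p) {
    pool_slot *s = p->free_list;
    if (s != NULL) {
        p->free_list = s->next;
        return s;
    }
    return malloc(sizeof(pool_slot));
}

/* The size is a compile-time constant, so freeing is just a list push. */
static inline void pool_free(pool *p, void *ptr) {
    pool_slot *s = ptr;
    s->next = p->free_list;
    p->free_list = s;
}

/* Hands every recycled slot back to the system allocator. */
static void pool_destroy(pool *p) {
    while (p->free_list != NULL) {
        pool_slot *s = p->free_list;
        p->free_list = s->next;
        free(s);
    }
}
```

Both hot-path functions are a handful of instructions with no locking and no size lookup, and being `static inline` they can be folded into the caller instead of forcing a call and the register spills that come with it.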