I have written a matrix multiplication lib once because I had some a priori about the data and could remove half the operations.
Everything has a cost (usually in time) and everything comes with trade offs (usually a different set of bugs). Not-invented-here comes with a gigantic upfront time cost and a larger set of bugs. The best heuristic depends on what you’re optimizing for. There’s never an easy answer.