I think you spoke too soon about their failure; soon they will be much easier to program [1].
Interestingly, NVIDIA is now also moving to a tile-based GPU programming model that targets portability for NVIDIA Tensor Cores [2]. There was a recent discussion of the topic on HN [3].
[1] Developing a BLAS Library for the AMD AI Engine [pdf]:
https://uni.tlaan.nl/thesis/msc_thesis_tristan_laan_aieblas....
[2] NVIDIA CUDA Tile:
https://developer.nvidia.com/cuda/tile
[3] CUDA Tile Open Sourced (103 comments):