The basic operation a NN needs accelerated is, unsurprisingly, multiply-accumulate, plus an activation function applied to the result.
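To make that concrete, here's a minimal sketch (my own illustration, not any vendor's API) of the core loop such hardware accelerates: a dot product built from MACs, followed by an activation (ReLU here, as one common assumption):

```c
#include <stddef.h>

/* ReLU chosen as an example activation; hardware may offer others. */
static float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* One "neuron": multiply-accumulate over n weights/inputs, then activate. */
float neuron(const float *w, const float *x, size_t n, float bias)
{
    float acc = bias;
    for (size_t i = 0; i < n; i++)
        acc += w[i] * x[i];   /* the MAC: one multiply, one accumulate */
    return relu(acc);         /* activation applied to the accumulated sum */
}
```

An NPU essentially hardwires thousands of these MAC lanes plus the activation lookup, instead of issuing them as scalar instructions.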
See for example how the Intel NPU is structured here: https://intel.github.io/intel-npu-acceleration-library/npu.h...
The fact that they also support vector operations or matrix multiplication is largely irrelevant and not a defining characteristic of DSPs. If you stretch the definition that far, then everything is a DSP, because ultimately every signal is analog.
Maybe also note that Qualcomm has renamed their Hexagon DSP to Hexagon NN. Likely the change was adding activation functions, but otherwise it's a VLIW architecture with accelerated MAC operations, i.e. a DSP architecture.
What makes a DSP different from a GPU is that its typical algorithms do not scale nicely to large matrices and vectors; recursive filters, for example. DSPs are also usually much cheaper and lower power, and they lost popularity mainly because Arm MCUs got good enough and economies of scale kicked in.
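A quick sketch of why recursive filters resist GPU-style parallelism (my own illustrative example): a first-order IIR filter computes y[n] = a*x[n] + b*y[n-1], so each output depends on the previous one. The loop is inherently sequential, which suits a DSP's tight scalar MAC pipeline far better than a wide SIMT machine:

```c
#include <stddef.h>

/* First-order IIR filter: y[n] = a*x[n] + b*y[n-1].
 * The loop-carried dependency on `prev` prevents naive parallelization. */
void iir_first_order(const float *x, float *y, size_t n, float a, float b)
{
    float prev = 0.0f;                  /* y[-1], zero initial state assumed */
    for (size_t i = 0; i < n; i++) {
        prev = a * x[i] + b * prev;     /* each step needs the previous output */
        y[i] = prev;
    }
}
```

Contrast this with a matrix multiply, where every output element is independent and a GPU can throw thousands of threads at it.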
I've written code for DSPs both in college and professionally. It's much like writing code for CPUs or MCUs (it's all C or C++ at the end of the day). But it's very different from writing compute shaders or designing an ASIC.