I used it personally, did a lot of research (including asking the creator of microWakeWord questions directly), and submitted an upstream PR (I think it's already merged) that improved the resulting model slightly. I imagine the Nvidia version is similar, but I don't have experience with it. I also noticed that the model is so small (~25,000 parameters) that the actual training step doesn't noticeably benefit from the GPU; only the TTS voice generation really uses it.
If you are using this, I strongly recommend creating lots of personal samples with the recorder. I personally used 400: 200 from myself and 200 from my partner, with varying moods and in all the rooms where we plan on using the assistant. I am considering re-training with even more samples. It takes effort, but the resulting model seems to be well worth it.