Had a task at work to clear unused metrics.
Exported a whole dashboard and thought about using regexes to extract the metrics out of the XML (bad, I know), then asked ChatGPT to produce the one-liners to generate the data.
Got 22 used metrics.
Next day I just gave ChatGPT the whole file and asked it to spit out all the used metrics.
46 used metrics.
Asked Claude, DeepSeek and Gemini the same question. Only Gemini messed it up, missing some metrics and duplicating others.
Re-checked the one-liners ChatGPT produced. Turns out it/I messed up when I told it to generate a list of unique metrics from a file containing just the metric names, one per line. What I wanted was a script/one-liner that would print each metric name just once (de-duplicate), and ChatGPT, taking me ad litteram, produced a script that only prints metrics that show up exactly once in the whole file.
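To make the two readings of "unique" concrete, here is a minimal Python sketch. The metric names are made up for illustration, and this is not the one-liner ChatGPT actually wrote. In shell terms the gap is roughly the one between `sort -u` and `sort | uniq -u`, though I'm not claiming those were the commands involved.

```python
from collections import Counter

def dedupe(names):
    """What I wanted: every distinct name once, in first-seen order."""
    return list(dict.fromkeys(names))

def appears_exactly_once(names):
    """The literal reading: only names that occur a single time in the input."""
    counts = Counter(names)
    return [name for name in names if counts[name] == 1]

# Hypothetical input, one metric name per line in the original file.
names = ["http_requests_total", "up", "http_requests_total", "node_load1"]

print(dedupe(names))                # 3 names, the repeated one kept once
print(appears_exactly_once(names))  # 2 names, the repeated one dropped entirely
```

Any metric used in more than one panel vanishes under the second reading, which is how a count can come out far too low.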
In the end, just asking the LLMs to extract the names from the Grafana dashboard worked better: parsing out the expressions, producing only unique metric names and all that. But there was no way to know for sure; the fact that 3 of the 4 LLMs produced the same output just meant it was most likely correct.
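For comparison, the "many steps" version of that extraction might look something like the rough sketch below. It assumes the dashboard is available as JSON with each query stored under an `expr` key, and that the queries are PromQL; both are assumptions, and the dashboard content here is invented. It is a heuristic, not a real PromQL parser: it ignores single-quoted strings and metrics selected via `{__name__="..."}`, among other things.

```python
import re

# PromQL words that look like identifiers but are not metric names.
KEYWORDS = {"by", "without", "on", "ignoring", "group_left", "group_right",
            "bool", "offset", "and", "or", "unless", "atan2", "inf", "nan"}

# An identifier that does not start in the middle of another token,
# so the "m" in "5m" or the name in "$var" is not picked up.
IDENT = re.compile(r"(?<![A-Za-z0-9_:$.])[A-Za-z_:][A-Za-z0-9_:]*")

def iter_exprs(node):
    """Yield every string stored under an "expr" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)

def metric_names(expr):
    """Heuristically list the metric names in one expression (may repeat)."""
    expr = re.sub(r'"(?:\\.|[^"\\])*"', "", expr)  # double-quoted strings
    expr = re.sub(r"\{[^}]*\}", "", expr)          # label matchers
    expr = re.sub(r"\[[^\]]*\]", "", expr)         # range selectors
    expr = re.sub(                                 # label lists after by/on/...
        r"\b(?:by|without|on|ignoring|group_left|group_right)\s*\([^)]*\)",
        "", expr)
    found = []
    for match in IDENT.finditer(expr):
        token = match.group()
        # Anything directly followed by "(" is a function or aggregation.
        is_call = expr[match.end():].lstrip().startswith("(")
        if not is_call and token.lower() not in KEYWORDS:
            found.append(token)
    return found

# Invented dashboard fragment, for illustration only.
dashboard = {"panels": [
    {"targets": [{"expr": 'sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))'}]},
    {"targets": [{"expr": "node_load1 > on(instance) 2 * count without (cpu) (node_cpu_seconds_total)"}]},
    {"targets": [{"expr": "rate(http_requests_total[1m]) offset 5m"}]},
]}

# De-duplicate across the whole dashboard, keeping first-seen order.
used = list(dict.fromkeys(
    name for expr in iter_exprs(dashboard) for name in metric_names(expr)))
print(used)
```

Even this toy version needs four clean-up passes before it can look for names, which is roughly why handing the whole file to an LLM felt so much shorter.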
I fixed the programmatic approach and got the same result, but it was a very weird feeling asking the LLMs to just give me the result of what, for me, was a whole process of many steps.
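The cross-check between the scripted list and each LLM's answer can itself be scripted. A small sketch, with placeholder model names and invented data rather than the real outputs:

```python
from collections import Counter

def diff_against(reference, answer):
    """Report what an answer missed, added, or repeated versus the reference."""
    ref = set(reference)
    counts = Counter(answer)
    return {
        "missing": sorted(ref - set(answer)),
        "extra": sorted(set(answer) - ref),
        "duplicated": sorted(n for n, c in counts.items() if c > 1),
    }

# Hypothetical data: the scripted result and two made-up LLM answers.
scripted = ["up", "node_load1", "http_requests_total"]
answers = {
    "model_a": ["up", "node_load1", "http_requests_total"],
    "model_b": ["up", "up", "http_requests_total"],
}

report = {name: diff_against(scripted, ans) for name, ans in answers.items()}
for name, result in report.items():
    print(name, result)
```

An answer that matches the script comes back with three empty lists; anything else shows exactly which names to look at.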