
0 points | raducu | 1y ago

> Only if the output from Claude is correct. If not...

Had a task at work to clear unused metrics.

Exported a whole dashboard, thought about using regexes to extract the metrics out of the XML (bad, I know), and asked ChatGPT to produce the one-liners to generate the data.

Got 22 used metrics.

Next day I just gave ChatGPT the whole file and asked it to spit out all the used metrics.

46 used metrics.

Asked Claude, DeepSeek and Gemini the same question. Only Gemini messed it up, missing some and duplicating others.

Re-checked the one-liners ChatGPT produced. Turns out it/I messed up when I told it to generate a list of unique metrics from a file containing just the metric names, one per line. What I wanted was a script/one-liner that would print each metric name just once (de-duplicate), and ChatGPT took me literally and produced a script that only prints metrics that show up exactly once in the whole file.
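To make the two readings of "unique" concrete, here is a minimal Python sketch. It is not the script ChatGPT actually generated (that isn't shown here), and the metric names are made up for illustration. In shell terms the difference is roughly `sort -u` versus `sort | uniq -u`.

```python
from collections import Counter


def dedupe(names):
    """What I wanted: every distinct name printed once, in first-seen order."""
    return list(dict.fromkeys(names))


def only_singletons(names):
    """The literal reading: only names that occur exactly once in the input."""
    counts = Counter(names)
    return [name for name in names if counts[name] == 1]


# Hypothetical metric names, one per "line" of the input file.
names = ["cpu_usage", "mem_usage", "cpu_usage", "disk_io"]

print(dedupe(names))           # cpu_usage is kept, printed once
print(only_singletons(names))  # cpu_usage is dropped entirely
```

The second function silently loses every metric that is referenced more than once, which would explain a much shorter list of "used" metrics.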

In the end, just asking the LLMs to extract the names from the Grafana dashboard worked better (parsing out expressions, producing only unique metric names and all that), but there was no way to know for sure; the fact that 3 of the 4 LLMs produced the same output only meant it was most likely correct.
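The "3 out of 4 agree" check can be done mechanically instead of by eye. A rough sketch, assuming each model's answer has already been saved as a list of metric names; the model labels and metric names below are placeholders, not my real data. Agreement is still only evidence, not proof, since all the models could be wrong in the same way.

```python
from collections import Counter


def majority_answer(outputs):
    """Return the most common set of names and the models that produced it.

    outputs: dict mapping a model label to its list of metric names.
    Order and duplicates are ignored when comparing answers.
    """
    as_sets = {model: frozenset(names) for model, names in outputs.items()}
    winner, _votes = Counter(as_sets.values()).most_common(1)[0]
    agreeing = sorted(model for model, s in as_sets.items() if s == winner)
    return winner, agreeing


def diff_against(winner, names):
    """Return (missing, extra, duplicated) for one model versus the majority."""
    counts = Counter(names)
    seen = set(counts)
    duplicated = sorted(name for name, c in counts.items() if c > 1)
    return sorted(winner - seen), sorted(seen - winner), duplicated


# Placeholder data: three answers agree, one misses a name and repeats another.
outputs = {
    "model_a": ["cpu_usage", "mem_usage", "disk_io"],
    "model_b": ["disk_io", "cpu_usage", "mem_usage"],
    "model_c": ["mem_usage", "cpu_usage", "disk_io"],
    "model_d": ["cpu_usage", "cpu_usage", "disk_io"],
}

winner, agreeing = majority_answer(outputs)
missing, extra, duplicated = diff_against(winner, outputs["model_d"])
print(agreeing, missing, extra, duplicated)
```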

I fixed the programmatic approach and got the same result, but it was a very weird feeling asking the LLMs to just give me the result of what for me was a whole process of many steps.


HumanOstrich | 1y ago

Are you sure you didn't also have a bunch of typos in your prompts? ;)

raducu (OP) | 1y ago

Unlike humans, LLMs seem to deal surprisingly well with typos.

Freed from the "the other human must not be up to my exquisite eloquence" mindset, and given that it's a machine that I'm talking to (20 years of "the compiler is never wrong") -- I've learned more about my communication inadequacies through talking with LLMs in the past 2 years than in 40 years of talking to humans.