Better HN

solid_fuel · 1y ago · 0 points

I wouldn't expect an LLM to be good at spell checking, actually. The way they tokenize text before manipulating it makes them fairly bad at working with small sequences of letters.
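To see why letter-level work is awkward for a model, here is a toy illustration. It assumes a tiny hypothetical vocabulary and uses greedy longest-match subword splitting (a WordPiece-style simplification, not the actual BPE tokenizer of any real model). The model receives token IDs, not letters, so "GitHub" and "Github" differ by one opaque ID, and a typo shatters the word into single-character pieces.

```python
# Hypothetical toy vocabulary. Real tokenizers learn tens of thousands of
# pieces from data; these few entries exist only to illustrate the idea.
VOCAB = {"Git": 0, "Hub": 1, "hub": 2, "G": 3, "i": 4, "t": 5,
         "H": 6, "u": 7, "b": 8, "h": 9}


def tokenize(word: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match subword tokenization (simplified, not real BPE)."""
    ids: list[int] = []
    pos = 0
    max_len = max(len(piece) for piece in vocab)
    while pos < len(word):
        # Try the longest vocabulary piece that matches at this position.
        for length in range(min(max_len, len(word) - pos), 0, -1):
            piece = word[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            raise ValueError(f"no token covers {word[pos]!r}")
    return ids


if __name__ == "__main__":
    for w in ["GitHub", "Github", "Gtihub"]:
        print(w, tokenize(w, VOCAB))
```

With this toy vocabulary, "GitHub" becomes `[0, 1]`, "Github" becomes `[0, 2]`, and the typo "Gtihub" becomes `[3, 5, 4, 2]`. Nothing in `[0, 1]` versus `[0, 2]` tells the model that the only difference is the case of one letter; it has to have learned that association from training data.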

I have had good luck using an LLM as a "sanity checking" layer for transcription output, though. A simple prompt like "is this paragraph coherent" has proven to be a pretty decent way to check the accuracy of Whisper transcriptions.
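A minimal sketch of that kind of sanity-checking layer, assuming the transcript has already been split into paragraphs. `ask_llm` is a hypothetical stand-in for whatever chat-completion client you use (prompt in, text reply out); the exact prompt wording and the YES/NO parsing are assumptions, not a known recipe.

```python
from typing import Callable

# Hypothetical: any function that sends a prompt to an LLM and returns its reply.
AskLLM = Callable[[str], str]


def build_coherence_prompt(paragraph: str) -> str:
    """Wrap one transcribed paragraph in a simple yes/no coherence question."""
    return (
        "Is this paragraph coherent? Answer YES or NO on the first line, "
        "then briefly explain.\n\n"
        f"Paragraph:\n{paragraph}"
    )


def is_coherent(paragraph: str, ask_llm: AskLLM) -> bool:
    """Treat a reply whose first line starts with YES as a pass."""
    reply = ask_llm(build_coherence_prompt(paragraph)).strip()
    first_line = reply.splitlines()[0] if reply else ""
    return first_line.strip().upper().startswith("YES")


def flag_suspect_segments(segments: list[str], ask_llm: AskLLM) -> list[int]:
    """Return indices of transcript paragraphs the LLM judged incoherent."""
    return [i for i, seg in enumerate(segments) if not is_coherent(seg, ask_llm)]
```

Flagged indices can then be routed to a human or re-transcribed. Asking for a one-word verdict on the first line keeps the parsing trivial; an empty or unparseable reply is treated as a failure so that it gets reviewed rather than silently passed.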


sdesol · 1y ago

Yes, this is a tokenization error. If you rewrite the sentence as shown below:

https://app.gitsense.com/?doc=905f4a9af74c25f&model=Claude+3...

Claude 3.5 Sonnet will now misinterpret "GitHub" as "Github".
