I was keeping track of the good ones, so I don't have many notes on the bad ones.
I do remember testing "LoKuS" last week, and it was quite terrible (it sometimes gave completely off-topic answers). It ranked as one of the highest-scoring 13B models on the leaderboard (~65 average), but it appears to have been removed since.