[ my public key: https://keybase.io/henrycee; my proof: https://keybase.io/henrycee/sigs/EBGzBLcJMuCMvo8jU00sPExZtR-bAMKF4hxnHO_wQDA ]
It seems to me that these comments stem from the DeepMind results from last summer[0] and February this year[1]. As I understand it, the models they're using for these tasks are very specialised to the task and also only accept formal language as input (i.e. not a textual or visual representation that a large multi-modal model could use).
I was reading through the Proof or Bluff paper[2] this morning and, while I don't think it's been reproduced yet, the authors found that none of the tested SOTA LLMs made meaningful progress on the questions in their test set (none scored over 5%). That matches my limited experience using LLMs for similar tasks. Needless to say, I've not heard a peep about this paper from my AI safety friends.
My question is: how should I interpret the above? Maybe it's too cynical, but my current thesis is that the DeepMind results are convenient headline-grabbers for the AI safety crowd, who are conflating the performance of a task-specific model with that of more general LLMs in order to make an unsubstantiated claim about progress in generalisable AI. Is that reasonable? What am I missing?
If the authors of Proof or Bluff are in here, I'd also like to say thanks for doing the work on this. I can imagine work like this isn't the sexiest, but it is so refreshing to see people take the time and care to generate some hard data about how good these models actually are. As someone considering a career switch at the moment, data like this is really useful context when trying to evaluate what the next few decades might look like.
[0] https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level
[1] https://techcrunch.com/2025/02/07/deepmind-claims-its-ai-performs-better-than-international-mathematical-olympiad-gold-medalists
[2] https://arxiv.org/abs/2503.21934v1
I'm the CTO of an early-stage SaaS company in the UK and we're starting operations in Nigeria. We are totally focused on building out the team there with a dedicated hiring function (the first thing we learned about doing business in Nigeria is that whatever you're doing, you _have_ to know someone to get it done right), but we'd like to get moving fast and start hiring some Nigerian tech talent.
My question is: outside of Andela/Toptal/etc. (which we're trying to avoid), could anyone recommend any platforms/job boards/other approaches for tech hiring that might be good places to start? LinkedIn and Indeed look fairly decent, but another thing we've learned about Nigeria is you don't know unless you know.
Thanks, and I'm happy to chat to anyone else doing or looking to do business in Nigeria; would love to hear from you.
Henry (henry@nanumo.com)
For whatever reason, retries often aren't available or practical, and even when they are, I've seen (and, er, also written) bugs where the webhook was consumed incorrectly but we returned a 204 anyway; so even with retries available, the sender would never have issued one.
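To make the failure mode above concrete: the bug is acking (returning 2xx) before the event is durably captured, so a parsing or handling bug silently loses the event. A minimal sketch of one common mitigation, persist-then-ack with deferred processing (the handler names and the in-memory queue standing in for a durable store are my own illustration, not any particular product's API):

```python
import json
import queue

# Stand-in for a durable store (a DB table, SQS, Kafka, etc.).
inbox = queue.Queue()

def receive_webhook(raw_body: bytes) -> int:
    """Persist the raw payload first, ack second.

    No parsing happens here, so a consumption bug can no longer
    be masked by a premature 204.
    """
    try:
        inbox.put(raw_body)  # the durable write must succeed before we ack
    except Exception:
        return 500           # gives the sender's retry logic a chance to fire
    return 204

def process_next() -> dict:
    """Worker side: parse and handle later.

    If this crashes, the raw event is still in the store and can be
    replayed, independent of the sender's retry behaviour.
    """
    raw = inbox.get()
    return json.loads(raw)   # any consumption bug now surfaces out-of-band
```

The point of the split is that the 204 only ever attests to "I stored your bytes", not "I handled them correctly", which is the property the buggy handlers above were falsely claiming.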
I've seen a few products for sending webhooks (e.g. Diahook, Gowebhooks) getting discussed over the past few weeks, but nothing on the receiving side. Building an HA webhook consumption system from scratch seems like reinventing the wheel for my current project, and I was wondering if anyone has any strategies/products they could recommend?