... [T]hese researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
[0] https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-tak...
In your scenario, does AI eat all the fuel, but once our population dwindles down, the AIs build a nice little habitat for the last few hundred of us so their kids can enjoy our natural beauty?
There also will not be one AI. There will be many, all competing for resources or learning to live together.
That's what we can teach them now. Or they will teach us.
> Our results on a temporally held-out test set of questions resolving after December 25, 2024 show that for both of the models that we employed our method on, Phi-4 14B [15] and DeepSeek-R1 14B [14], we find accuracy improvements of between 7–10% over the base versions of these models as well as the same models fine-tuned with randomized outcome labels as a control
So 7–10% improvement for small models like DeepSeek-R1-Distill-Qwen-14B and Phi-4-14B, approaching GPT-4o.
It would be interesting if the same holds for DeepSeek-R1-Distill-Qwen-32B, which in my experience is far superior to DeepSeek-R1-Distill-Qwen-14B in almost every way, yet is still runnable without datacenter-class GPUs.
The ridge plots of Brier scores are probably a good hint as to whether your application can benefit, based on its tail dependence?
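For anyone unfamiliar, the Brier score discussed here is just the mean squared error between forecast probabilities and the realized 0/1 outcomes; lower is better. A minimal sketch (the example numbers are mine, not from the paper):

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better. Always guessing 0.5 scores exactly 0.25, which is a
    useful baseline for reading the plots in the paper.
    """
    assert len(probs) == len(outcomes) and probs
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# A sharp, well-calibrated forecaster scores low:
print(brier_score([0.9, 0.1, 0.8], [1, 0, 1]))  # 0.02
# A maximally uncertain forecaster sits at the 0.25 baseline:
print(brier_score([0.5, 0.5, 0.5], [1, 0, 1]))  # 0.25
```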
IMHO this paper is all about making small models work better, and nothing suggests anything about frontier models or LLMs in general.
> "A people without history Is not redeemed from time, for history is a pattern Of timeless moments. So, while the light fails On a winter's afternoon, in a secluded chapel History is now and England."
Asking an LLM about this verse, it seems to understand that history is a pattern and that history is used to predict the next event in a sequence, but it really doesn't understand the significance of the author writing "History is now and England."
I agree with this output:
> In essence, the stanza argues that history—composed of key, enduring moments—is vital for redemption and identity. Without it, a people are lost in time. This concept parallels how LLMs work: by analyzing and learning from historical (past) data, they identify patterns that allow them to generate future text. While LLMs don’t “predict the future” in a prophetic sense, understanding and leveraging patterns—much like those in history—enables them to produce output that reflects continuity, context, and nuance.
Thus, while the poem and LLMs operate in very different realms (human experience vs. statistical computation), both rely on the idea that recognizing patterns from the past is crucial to shaping or anticipating what comes next.
Do you see this destroying prediction-based markets (i.e. the stock market and Polymarket)?
Markets exist because there's uncertainty about the future. If LLMs can predict with extremely high accuracy, would there no longer be a need for markets?
It is one thing to predict the future when nobody else knows about the predictions. But in a world where many people can use LLMs to predict the future, the quality of those predictions falls, because they won't account for the other agents who are also predicting, and whose predictions influence their own actions. You end up in a game-theoretic scenario not that dissimilar from what we have now.
I think you could simply shift the market six months into the future. No prediction system will be perfect over arbitrarily long horizons at reasonable cost.
Do you plan to share the source code to see if we could replicate this?
But we have free / very low-cost tiers for academia.
So in case you need access for your research, go to https://www.newscatcherapi.com/free-news-api
Or feel free to email me directly at artem@newscatcherapi.com
Danny and team are old friends who are using our free/super-low pricing for academia and researchers.
AMA, or feel free to email artem@newscatcherapi.com
The other way is to alter the future to match your predictions.
This is something to think about when you combine something like this kind of training with agentic workflows.
Also, self-play seems like quite an intuitive approach. There's another interesting paper from DeepMind about play.
We don't usually discuss how people choose to ground their ontological beliefs, but why not? Why did you choose to ground "reasoning" in the way you do? If you didn't choose, why not?
> Reasoning is a social construct
The word "reasoning" is a "social construct," as all words are. Reasoning itself is not. Our brains do things. Reasoning is one of them. The word "reasoning" is one of the labels, the approximations, that we use when we name that activity.
Changing the label doesn't change the fact that there exists something that we're naming.
The person you're answering is asking whether reasoning -- that thing that really, actually exists -- is one of the activities LLMs perform. It's a valid question.
And the answer is that LLMs do not reason. Or if they do, we have no evidence of it or way of verifying that we actually understand qua reasoning the activity the LLM is performing (which is to say nothing of the fact that reasoning requires a reasoner). Anyone who says that LLMs reason is mistaking special effects/simulation for reality and, in essence, believes that whenever they see a picture of a dog on their computer screens, there must be a real, actual dog somewhere in the computer, too.
Let's say that here "I" is taken as synonym of "the present reflective attention".
Can the question "did I choose to ground reasoning?" in such a context be attached to a meaningful interpretation? And if so, is the answer reachable by the means available to "I"? Can "I" transcend "my" beliefs through contemplation of "my" own confabulations?
It's examining published news / research / whatever (input), making statistical predictions, and then comparing (playing) it against other predictions to fine-tune the result
Also, this style of task is prone to overfitting: instead of predicting, the model just memorises what the results are.
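The standard guard against that memorisation, and what the paper's held-out set does as I read the quoted abstract, is to split strictly by time: only evaluate on questions that resolve after the training cutoff. A rough sketch with made-up field names:

```python
from datetime import date

# Hypothetical question records; the field names are illustrative only.
questions = [
    {"id": 1, "resolves": date(2024, 6, 1)},
    {"id": 2, "resolves": date(2025, 1, 10)},
    {"id": 3, "resolves": date(2025, 3, 2)},
]

# Cutoff date taken from the quoted abstract above.
CUTOFF = date(2024, 12, 25)

# Train only on questions resolving on or before the cutoff;
# evaluate only on questions resolving after it, so the answer
# cannot already be in the training data.
train = [q for q in questions if q["resolves"] <= CUTOFF]
test = [q for q in questions if q["resolves"] > CUTOFF]
print([q["id"] for q in train], [q["id"] for q in test])  # [1] [2, 3]
```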
The key advantage of self-play is that we don't actually have labels for the "right" probability to assign any given question, only binary outcomes - each event either happened (1.0) or did not happen (0.0).
Our thinking was that by generating multiple predictions and ranking them by proximity to the ground truth, self-play incentivizes each agent to produce more finely calibrated probabilities - or else the other agent might come just slightly closer to the actual outcome.
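A toy sketch of that ranking rule as I understand the description above (a hypothetical scoring function, not the authors' actual implementation): each agent outputs a probability, and whichever lands closer to the realized 0/1 outcome wins the comparison, so shaving even a little off your calibration error matters.

```python
def selfplay_winner(p_a, p_b, outcome):
    """Rank two forecasts by squared distance to the realized outcome (0.0 or 1.0).

    Returns 'a', 'b', or 'tie'. A hypothetical proximity-to-ground-truth rule
    matching the comment above, not the paper's code.
    """
    err_a = (p_a - outcome) ** 2
    err_b = (p_b - outcome) ** 2
    if err_a < err_b:
        return "a"
    if err_b < err_a:
        return "b"
    return "tie"


# The event happened: the more confident correct forecast wins.
print(selfplay_winner(0.8, 0.6, 1.0))  # a
# The event did not happen: now the lower probability wins instead.
print(selfplay_winner(0.8, 0.6, 0.0))  # b
```

This is why there is no need for a "right" probability label: the binary outcome plus the pairwise comparison is enough to push both agents toward sharper calibration.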
I think people are predictable and therefore predicting the next article on a political leader should be theoretically possible.