... [T]hese researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
[0] https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-tak...
In your scenario, does AI eat all the fuel, but once our population dwindles down, the AIs build a nice little habitat for the last few hundred of us so their kids can enjoy our natural beauty?
There also will not be one AI. There will be many, all competing for resources or learning to live together.
That's what we can teach them now. Or they will teach us.
> Our results on a temporally held-out test set of questions resolving after December 25, 2024 show that for both of the models that we employed our method on, Phi-4 14B [15] and DeepSeek-R1 14B [14], we find accuracy improvements of between 7–10% over the base versions of these models as well as the same models fine-tuned with randomized outcome labels as a control
So 7–10% improvement for small models like DeepSeek-R1-Distill-Qwen-14B and Phi-4-14B, approaching GPT-4o.
It would be interesting if the same holds for DeepSeek-R1-Distill-Qwen-32B, which in my experience is far superior to DeepSeek-R1-Distill-Qwen-14B in almost every way, yet is still runnable without datacenter-class GPUs.
The ridge plots of Brier scores are probably a good hint as to whether your application can benefit, based on its tail dependence?
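For anyone unfamiliar, the Brier score discussed here is just the mean squared error between forecast probabilities and the realized 0/1 outcomes; lower is better. A minimal sketch (the example numbers are mine, not from the paper):

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better. Always guessing 0.5 scores exactly 0.25, which is a
    useful baseline for reading the plots in the paper.
    """
    assert len(probs) == len(outcomes) and probs
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# A sharp, well-calibrated forecaster scores low:
print(brier_score([0.9, 0.1, 0.8], [1, 0, 1]))  # 0.02
# A maximally uncertain forecaster sits at the 0.25 baseline:
print(brier_score([0.5, 0.5, 0.5], [1, 0, 1]))  # 0.25
```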
IMHO this paper is all about making small models work better, and nothing suggests anything about frontier models or LLMs in general.
> "A people without history Is not redeemed from time, for history is a pattern Of timeless moments. So, while the light fails On a winter's afternoon, in a secluded chapel History is now and England."
Asking an LLM about this verse, it seems to understand that history is a pattern and that history is used to predict the next event in a sequence, but it really doesn't understand the significance of the author writing "History is now and England."
I agree with this output:
> In essence, the stanza argues that history—composed of key, enduring moments—is vital for redemption and identity. Without it, a people are lost in time. This concept parallels how LLMs work: by analyzing and learning from historical (past) data, they identify patterns that allow them to generate future text. While LLMs don’t “predict the future” in a prophetic sense, understanding and leveraging patterns—much like those in history—enables them to produce output that reflects continuity, context, and nuance.
Thus, while the poem and LLMs operate in very different realms (human experience vs. statistical computation), both rely on the idea that recognizing patterns from the past is crucial to shaping or anticipating what comes next.
Do you see this destroying prediction-based markets (i.e. the stock market and Polymarket)?
Markets exist because there's uncertainty about the future. If LLMs can predict with extremely high accuracy, would there no longer be a need for markets?
It is one thing to predict the future when nobody else knows about the predictions. But in a world where many people can use LLMs to predict the future, the quality of those predictions falls, because they won't account for the other agents who are also predicting, and whose predictions influence their own actions. You end up in a game-theoretic scenario not that dissimilar from what we have now.
I think you could simply shift the market six months into the future. No prediction system will be perfect over arbitrarily long horizons at reasonable cost.
Do you plan to share the source code to see if we could replicate this?
But we have free / very low-cost tiers for academia.
So in case you need access for your research, go to https://www.newscatcherapi.com/free-news-api
Or feel free to email me directly at artem@newscatcherapi.com
Danny and team are old friends who are using our free/super-low pricing for academia and researchers.
AMA, or feel free to email artem@newscatcherapi.com
The other way is to alter the future to match your predictions.
This is something to think about when you combine something like this kind of training with agentic workflows.
Also, self-play seems like quite an intuitive approach. There's another interesting paper from DeepMind about play.
We don't usually discuss how people choose to ground their ontological beliefs, but why not? Why did you choose to ground "reasoning" in the way you do? If you didn't choose, why not?
> Reasoning is a social construct
The word "reasoning" is a "social construct," as all words are. Reasoning itself is not. Our brains do things. Reasoning is one of them. The word "reasoning" is one of the labels, the approximations, that we use when we name that activity.
Changing the label doesn't change the fact that there exists something that we're naming.
The person you're answering is asking whether reasoning -- that thing that really, actually exists -- is one of the activities LLMs perform. It's a valid question.
And the answer is that LLMs do not reason. Or if they do, we have no evidence of it or way of verifying that we actually understand qua reasoning the activity the LLM is performing (which is to say nothing of the fact that reasoning requires a reasoner). Anyone who says that LLMs reason is mistaking special effects/simulation for reality and, in essence, believes that whenever they see a picture of a dog on their computer screens, there must be a real, actual dog somewhere in the computer, too.
Let's say that here "I" is taken as synonym of "the present reflective attention".
Can the question "did I choose to ground reasoning?" in such a context be attached to a meaningful interpretation? And if so, is the answer reachable by the means available to "I"? Can "I" transcend "my" beliefs through contemplation of "my" own confabulations?
It's examining published news / research / whatever (input), making statistical predictions, and then comparing (playing) it against other predictions to fine-tune the result
Also, this style of task is prone to overfitting: instead of predicting, the model just memorises what the results are.
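The standard guard against that memorisation, and what the paper's held-out set does as I read the quoted abstract, is to split strictly by time: only evaluate on questions that resolve after the training cutoff. A rough sketch with made-up field names:

```python
from datetime import date

# Hypothetical question records; the field names are illustrative only.
questions = [
    {"id": 1, "resolves": date(2024, 6, 1)},
    {"id": 2, "resolves": date(2025, 1, 10)},
    {"id": 3, "resolves": date(2025, 3, 2)},
]

# Cutoff date taken from the quoted abstract above.
CUTOFF = date(2024, 12, 25)

# Train only on questions resolving on or before the cutoff;
# evaluate only on questions resolving after it, so the answer
# cannot already be in the training data.
train = [q for q in questions if q["resolves"] <= CUTOFF]
test = [q for q in questions if q["resolves"] > CUTOFF]
print([q["id"] for q in train], [q["id"] for q in test])  # [1] [2, 3]
```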
The key advantage of self-play is that we don't actually have labels for the "right" probability to assign any given question, only binary outcomes - each event either happened (1.0) or did not happen (0.0).
Our thinking was that by generating multiple predictions and ranking them by proximity to the ground truth, self-play incentivizes each agent to produce more finely calibrated probabilities - or else the other agent might come just slightly closer to the actual outcome.
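A toy sketch of that ranking rule as I understand the description above (a hypothetical scoring function, not the authors' actual implementation): each agent outputs a probability, and whichever lands closer to the realized 0/1 outcome wins the comparison, so shaving even a little off your calibration error matters.

```python
def selfplay_winner(p_a, p_b, outcome):
    """Rank two forecasts by squared distance to the realized outcome (0.0 or 1.0).

    Returns 'a', 'b', or 'tie'. A hypothetical proximity-to-ground-truth rule
    matching the comment above, not the paper's code.
    """
    err_a = (p_a - outcome) ** 2
    err_b = (p_b - outcome) ** 2
    if err_a < err_b:
        return "a"
    if err_b < err_a:
        return "b"
    return "tie"


# The event happened: the more confident correct forecast wins.
print(selfplay_winner(0.8, 0.6, 1.0))  # a
# The event did not happen: now the lower probability wins instead.
print(selfplay_winner(0.8, 0.6, 0.0))  # b
```

This is why there is no need for a "right" probability label: the binary outcome plus the pairwise comparison is enough to push both agents toward sharper calibration.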
I think people are predictable and therefore predicting the next article on a political leader should be theoretically possible.