Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up in a back-and-forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =)
Nevertheless, one thing that struck me in this writeup was the lack of attribution in the quoted final response from the model. In a field like math, where most research is publicly available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading: it's very important to me that these are contained in the actual model output.
One more note: the comments on articles like these on HN and elsewhere are usually pretty negative / downcast. There's good reason for that, given how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But for scientists, I find that these tools address the problem of the exploding complexity barrier at the frontier. Every day, it grows harder and harder to hold a mental map of recent relevant progress, simply by virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems with current-era tools and their usage, though.
AI is going to both help and hinder this process though. At the end of the day, mathematics is mostly a social process at this point. The goal is not raw number of theorems proven, it’s how proving theorems affects the working operational models of mathematicians. Only a rare few new theorems in mathematics nowadays have direct real world applicability.
If AI produced legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact would be neutral to negative.
To be blunt, this seems incredibly uninteresting to me. I enjoy learning mathematics, sure, but I just don't find much inherent meaning in reading a textbook or a paper. The meaning comes from taking those ideas and applying them to my own problems, be it a direct proof of a conjecture or coming up with the right framework or tools for those conjectures. But, of course, in this future, those proofs and frameworks are already in the textbook. So what's the point? If someone cared about these answers in the first place, they probably could have found the right prompt to extract them from this phantom textbook anyway.
You could argue that there is still work left, like marginal improvements and applying the returned proof to other scenarios, as happened in this case. But as above, what is really left to do if this is already in the phantom textbook somewhere and you just need to prompt better? The mathematicians in this case added to the exposition of the proof, but why wouldn't the phantom textbook already have good enough exposition in the first place?
I think my complete dismissal of the value of things like extending the proofs from an LLM or improving exposition is too strong -- there is value in both of them, and likely will always be -- but it would still represent a sharp change in what a mathematician does that I don't think I am excited for. I also don't think this phantom textbook is contained even in the weights of whatever internal model was used here just yet (especially since as some of the mathematicians in the article pointed out, a disproof here did not need to build any new grand theories), but it really does seem to me it eventually will be, and I can't help but find the crawl towards that point somewhat discouraging.
And by opening the door to LLM-generated results, you'll see ever greater volumes of them, without any hope of ever navigating the field again without machine help.
It's a little like a software project that gets extended more and more by AI agents, with less and less review by human software engineers, until the complexity and spaghetti design are so incomprehensible to humans that maintenance requires an AI agent. The risk is that math as a whole (the field itself) will experience that effect.
Always, always, always, the problem with research and development is leadership, not insufficient supporting technology. It is a political problem; there is absolutely, positively no shortage of technologies to support research. Your optimism is totally misplaced. The NSF funding cuts have hurt math more than AI has benefited it. And guess who supports the administration that cut NSF funding? The people who ousted the PhDs from OpenAI.
Along with all the rest of what humans find meaningful and fulfilling.
The more I read about these achievements the more I get a feeling that a lot of the power of these models comes from having prior knowledge on every possible field and having zero problems transferring to new domains.
To me the potential beauty of this is that these tools might help us break through the increasing super-specialization that scientists have to go through today. On one hand that specialization is important; on the other hand it limits the tooling and inspiration a person has access to.
So the cross-domain pollination that used to exist among scientists is not only not encouraged; it's actively punished by society.
And this is where machines, such as these reasoning LLMs, can help, because they can remember patterns across many domains and try absolutely bonkers, weird connections and ideas.
We humans still have to verify the work (at least for now). But the "maybe this tool, or idea, or trick, from that completely unrelated field applies here" reasoning/experimentation could become much easier.
I have always said this and will say it again: reasoning is just experimentation with a feedback loop and continuous refinement.
What makes me more of an optimist in this case is that the people who decide to go into these sciences today are mostly driven by intellectual activity, so I feel they are the right ones to figure this out, probably more so than we engineers.
As we're becoming hyper specialised, they become an invaluable tool to merge the horizon in, so to speak.
I think we still don't really comprehend how much can be achieved by a single "mind" that has internalized so much knowledge from so many areas.
The cool thing is that now, when someone contributes something to the hive mind, it can instantly be applied to any other problem people are working on.
Similarly, we're creating tools to improve knowledge, but we're progressively zapping the human out of the equation. Knowledge is created for something, but it's unclear if very soon humans will be able to understand it, or really benefit from it, except billionaires, etc.
It's too bad that we're not improving humans nearly as fast as we're replacing ourselves.
I agree with one of the mathematicians' responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.
In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.
Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics, the model is still just connecting existing ideas.
Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.
For example, to prove something is impossible, let's say you first prove that there are only 5 families, and 4 of them are impossible. So now 80% of the problem is solved! :) If you are looking for counterexamples, the search is reduced by 80% too. In both cases it may be useful.
With counterexamples you can make guesses and leaps, and if it works, it's fine. That is not possible for a proof.
On the other hand, once you have found a counterexample it's usual to hide the dead ends you discarded.
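To make the asymmetry concrete: once a counterexample is found, checking it is a direct computation, and the dead ends never appear in the final write-up. Here is a deliberately naive toy sketch (not what the model did) using Euler's classic polynomial n² + n + 41, which looks like it always produces primes until it doesn't:

```python
from typing import Optional


def is_prime(m: int) -> bool:
    """Trial-division primality test; fine for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def find_counterexample(limit: int = 1000) -> Optional[int]:
    """Return the smallest n < limit for which n^2 + n + 41 is composite."""
    for n in range(limit):
        if not is_prime(n * n + n + 41):
            return n  # only the winning case is reported; n = 0..39 are silently discarded
    return None


n = find_counterexample()
print(n, n * n + n + 41)  # 40 1681  (1681 = 41 * 41)
```

The final "proof" of failure is just the single line 40² + 40 + 41 = 41², with the 40 failed guesses hidden.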
No this will never do the kind of math that humans did when coming up with complex numbers, or hell just regular numbers ex nihilo. No matter how long it's given to combine things in its training data.
A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
LLMs are just the beginning; we'll see more specialized math AI resembling Stockfish soon.
However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.
This is the caliber of thinking in unimpaired AI bullishness.
Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.
> At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]
> At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.
There's much more to being human than our "cognitive abilities"
All AI proofs so far, including this one, are using existing tools in new ways, rather than inventing new tools. This is not surprising if you know how these models are trained. These existing tools are in distribution. New tools are not.
Problems worthy of a Fields Medal likely require new tools to be invented. Thus it is not clear whether progress within the confines of the current paradigm is enough.
We could get this weird spiky situation where the AI is insanely superhuman at all problem solving, but completely incapable of coming up with a single new tool. It discovers everything there is to discover, subject to existing axioms and concepts.
Timothy Gowers gives some commentary on this in the attached PDF.
We have had that chess board for quite a while now, over 40 years. And no, there is nothing special about Lean here; it is just herd mentality. Also, we don't know how much training with Lean helped this particular model.
https://www.anthropic.com/research/project-vend-1 https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-mach...
(Two different examples of a similar idea)
Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.
Math is a sequence of formal rules applied to construct a proof tree. Therefore an AI trained on these rules could be far more efficient, and search far deeper into proof space
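The "formal rules applied to build a proof tree" picture can be illustrated with a toy formal system. This is a minimal sketch that uses Hofstadter's MIU puzzle as a stand-in (my choice for illustration; real provers like Lean work over vastly richer rule sets). It runs a breadth-first search over the tree of derivations starting from the axiom `MI`, and the length cap `max_len` is an arbitrary assumption that keeps the search finite:

```python
from collections import deque
from typing import Dict, Iterator, List, Optional


def successors(s: str) -> Iterator[str]:
    """Apply each MIU rule at every position where it matches."""
    if s.endswith("I"):          # Rule 1: xI -> xIU
        yield s + "U"
    if s.startswith("M"):        # Rule 2: Mx -> Mxx
        yield s + s[1:]
    for i in range(len(s) - 2):  # Rule 3: replace any III with U
        if s[i:i + 3] == "III":
            yield s[:i] + "U" + s[i + 3:]
    for i in range(len(s) - 1):  # Rule 4: delete any UU
        if s[i:i + 2] == "UU":
            yield s[:i] + s[i + 2:]


def prove(target: str, axiom: str = "MI", max_len: int = 12) -> Optional[List[str]]:
    """Breadth-first search of the derivation tree. Returns the shortest
    derivation (axiom ... target) among strings of length <= max_len, or None."""
    parent: Dict[str, Optional[str]] = {axiom: None}
    queue = deque([axiom])
    while queue:
        s = queue.popleft()
        if s == target:
            path: List[str] = []
            node: Optional[str] = s
            while node is not None:  # walk back up the tree to the axiom
                path.append(node)
                node = parent[node]
            return path[::-1]
        for t in successors(s):
            if t not in parent and len(t) <= max_len:
                parent[t] = s
                queue.append(t)
    return None


print(prove("MUI"))  # ['MI', 'MII', 'MIIII', 'MUI']
```

`prove("MU")` returns `None`. MU is famously underivable, though this bounded search on its own does not prove that; the actual proof is an invariant argument about the number of I's, which is exactly the kind of insight that blind search doesn't produce.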
I appreciate very much the work done so far, but this sort of asymptotic/quantitative result didn't interest me much even when it was done by humans.
(This is not snobbery, just a personal preference.)
There is serious magic happening in the construction of model context.
I think the more interesting question is how many tokens were spent all told; the most interesting graph in the article imo is the success rate by log test-time compute: how many tokens are being spent on the right of the graph to hit a winning CoT/solution like this >50% of the time?
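Here is one back-of-the-envelope way to frame the token question. It assumes each attempt succeeds independently with probability p, which is my simplification and not necessarily how the article's graph was produced. Under that assumption, the chance of at least one success in k attempts is 1 − (1 − p)^k, and total cost scales with the number of attempts needed. The per-attempt rate and token count below are hypothetical placeholders:

```python
def attempts_needed(p: float, target: float = 0.5) -> int:
    """Smallest k with 1 - (1 - p)**k >= target, assuming independent attempts."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    k, p_all_fail = 0, 1.0
    while 1 - p_all_fail < target:
        k += 1
        p_all_fail *= 1 - p  # probability that all k attempts failed
    return k


# Hypothetical numbers, purely illustrative.
per_attempt_success = 0.10
tokens_per_attempt = 200_000
k = attempts_needed(per_attempt_success)
print(k, k * tokens_per_attempt)  # 7 1400000
```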
Without knowing everything this model has been trained on, though, it is pretty hard to ascertain the extent to which it arrived at this "on its own". The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.
It isn't likely that someone solved this and then just put it in the training data, although I honestly wouldn't put that past OpenAI. More interesting though is the extent to which they've generated training data that may have touched on most or all of the "original" tenets found in this proof.
We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.
Congrats to the OpenAI team for one of the most significant breakthrough discoveries in AI history.
edit: >> https://techcrunch.com/2025/10/19/openais-embarrassing-math/
In all seriousness though: My suggestion is that those shepherding the frontier of AI start acting with more transparency, and stop acting in ways that encourage conspiratorial thinking. Especially if the technology is as powerful as they market it as.
Really? Any references to read more?
Solving problems people have already stated is a niche activity in mathematical research. More often, people study something they find interesting, try to frame it in a way that can be solved with the tools they have, and then try to come up with a solution. And in the ideal case, both the framing and the solution will be interesting on their own.
Note that this is not really true of this problem in particular.
What was the process of writing the paper? Was the question asked by a mathematician? Was the paper right from the get-go or was there someone who pointed out mistakes?
How many attempts were made before the solution was found?
I will eat my words if an AI one-shotted that without any external help, but for now I am left wondering whether it's a new way to attribute discoveries to companies instead of the people who put the work in.
As per the report, the prompt used to solve the problem is AI-written and the solution was initially graded by an AI grading pipeline. They don't say this explicitly, but it seems like OpenAI has an automatic pipeline where they prompt models for solutions to famous math problems (which wouldn't be unexpected given how flashy a solution to a famous math problem looks).
> Was the paper right from the get-go or was there someone who pointed out mistakes?
Also as per the report, the output of the model isn't really a "paper"; it's a very terse 2 page solution which is apparently correct. The paper was later written based on this solution to make it more presentable.
> How many attempts were made before the solution was found?
Given that this appears to be from an automated pipeline, I would say that it had many attempts. But either way, the blogpost says that with enough test-time compute, the model finds this same solution 50% of the time.
[1] https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...
Can you be more specific? I'm still under the impression that Mythos was a huge deal:
Nevertheless new maths is exciting and might lead to what I find slightly more interesting - new physics.
Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.
I think that idea is deeply fascinating, AND I have no problem with still crediting mathematicians with discoveries.
So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.
I'd say yes, LLMs "just" recombine things. I still don't think that if you trained an LLM on every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.
> Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.
In fact, they are rarer, specifically because they are harder to produce. This is also why it is much harder to get LLMs to be really innovative. Human intelligence is a lot of things; it is deeply multifaceted. Also, I'm not sure why CS people act like axioms are where you start. Finding them is very, very difficult. It can take some real innovation because you're trying to get rid of things, not build on top of them. True for a lot of science too. You don't just build up. You tear down. You translate. You go sideways. You zoom in. You zoom out. There are so many tools at your disposal. There's so much math that has no algorithmic process to it. If you think it all does, your image is too ideal (pun(s) intended).
But at the same time, I get it: it is a level of math (and science) people never even come into contact with. People think they're good at math because they can do calculus. You're leagues ahead of most others around you, yes, and be proud of that. But don't let that distance deceive you into believing you're anywhere near the experts. That's true for much more than just math, but with math it's easy to demonstrate to people that they don't understand it. Granted, most people don't want to learn, which is perfectly okay too.
Yes but that is because there was not enough text available to create an intelligent LLM to begin with.
We even think that the Babylonian astronomers figured out they could integrate over velocity to predict the position of Jupiter.
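For anyone wondering what "integrating over velocity" means here: displacement is the area under the velocity-time curve. When velocity changes linearly over an interval, that area is just a trapezoid, which is the geometric computation the Babylonian claim is about as far as I understand it. Here is a generic worked form (the symbols are illustrative and not taken from the tablets):

```latex
\Delta x = \int_{0}^{T} v(t)\,dt,
\qquad
v(t) = v_0 + \frac{v_T - v_0}{T}\,t
\;\Longrightarrow\;
\Delta x = v_0 T + \frac{(v_T - v_0)\,T}{2} = \frac{v_0 + v_T}{2}\,T .
```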
Also we shouldn’t be thinking about what LLMs are good at, but rather what any computer ever might be good at. LLMs are already only one (essential!) part of the system that produced this result, and we’ve only had them for 3 years.
Also also this is a tiny nitpick but: the fields medal is every 4 years, AFAIR. For that exact reason, probably!
The experiment is feasible. If it were performed and produced a positive result, what would it imply/change about how you see LLMs?
Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created"—and you have people like Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois, etc. that treat math more like an art than a science.
For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
A scientist has to extract the "Creation" from an abstract dimension using the tools of "human knowledge". The creativity often lies in selecting the best set of tools, or recombining tools, to access the platonic space. For instance, a "telescope" is not a new creation; it is a recombination of something which already existed: lenses.
How can we truly create something ? Everything is built upon something.
You could argue that even "numbers" are a creation, but are they ? Aren't they just a tool to access an abstract concept of counting ? ... Symbols.. abstractions.
Another angle: even in dreams, do we really create something new? Or do we dream about "things" (i.e. data) we have ingested in our waking life? Someone could argue that dreams truly create something, as the exact set of events never happened anywhere in the real world... but we all know that dreams are derived: derived from brain chemistry, experiences and so on. We may just not have a reductive account of how each and every thing works.
Just like energy is conserved, IMO everything we call "created" is just a changed form of "something". I fully believe LLMs (and humans) can both create tools to change the forms. Nothing new is being "created", just convenient tools which abstract upon some nature of reality.
Well I think the point is there is no "new kind of math". There's just types of math we've discovered and what we haven't. No new math is created, just found.
This is also true for established theorems! We can imagine mathematical universes (toposes) where every (total) function on the reals is continuous, even though it is an established theorem that there are discontinuous functions! We just need to replace a few axioms (chuck out the law of the excluded middle, and throw in some continuity axioms).
However, if that idea about new math is correct, then in theory we don’t need new math to (dis)prove the Riemann hypothesis (assuming it is provable or disprovable in the current system).
In practice we may still need new math, because a proof of the Riemann hypothesis using our current arsenal of mathematical ‘objects’ may be enormously large, making it hard to find.
> math more like an art than a science.
That’s a fun turn of phrase, but hopefully we can all agree that math without scientific rigor is no math at all.
> we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
Do you think it’s possible/likely that any AI system could? I encourage us to join Yudkowsky in anticipating the knock-on results of this exponential improvement that we’re living through, rather than just expecting chatbots that hallucinate a bit less. In concrete terms: could a thousand LLM-driven agents running on supercomputers (500 of which are dedicated to building software for the other 500) come up with new math?
Imagine every bit of human knowledge as a discrete point within some large high dimensional space of knowledge. You can draw a big convex hull around every single point of human knowledge in a space. A LLM, being trained within this convex hull, can interpolate between any set of existing discrete points in this hull to arrive at a point which is new, but still inside of the hull. Then there are points completely outside of the hull; whether or not LLMs can reach these is IMO up for debate.
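The convex-hull picture can be made concrete in two dimensions. This is a minimal sketch in which the "knowledge points" are made-up coordinates; a real model's representation space is of course neither 2-D nor literally a convex hull. It shows that a convex combination (interpolation) of known points always lands inside the hull, while extrapolating past them can land outside:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated endpoints


def inside_hull(hull: List[Point], q: Point) -> bool:
    """True if q is inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


known = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0), (1.0, 1.0)]
hull = convex_hull(known)

a, b = known[0], known[3]
interpolated = (0.5 * a[0] + 0.5 * b[0], 0.5 * a[1] + 0.5 * b[1])         # (2.0, 1.5)
extrapolated = (a[0] + 1.5 * (b[0] - a[0]), a[1] + 1.5 * (b[1] - a[1]))   # (6.0, 4.5)
print(inside_hull(hull, interpolated), inside_hull(hull, extrapolated))    # True False
```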
Reaching new points inside of the hull is still really useful! Many new discoveries and proofs are these new points inside of the hull; arguably _most_ useful new discoveries and proofs are these. They're things that we may not have found before, but you can arrive at them by using what we already have as starting points. Many math proofs and Nobel Prize-winning discoveries are these types of points. Many haven't been found yet simply because nobody has put the time or effort towards finding them; LLMs can potentially speed this up a lot.
Then there are the points completely outside of hull, which cannot be reached by extrapolation/interpolation from existing points and require genuine novel leaps. I think some candidate examples for these types of points are like, making the leap from Newtonian physics to general relativity. Demis Hassabis had a whole point about training an AI with a physics knowledge cutoff date before 1915, then showing it the orbit of Mercury and seeing if it can independently arrive at general relativity as an evaluation of whether or not something is AGI. I have my doubts that existing LLMs can make this type of leap. It’s also true that most _humans_ can’t make these leaps either; we call Einstein a genius because he alone made the leap to general relativity. But at least while most humans can’t make this type of leap, we have existence proofs that every once in a while one can; this remains to be seen with AI.
* LLMs do just interpolate their training data, BUT-
* That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data
In the case of mathematics, LLMs are essentially just brute-forcing the glorified calculators they run on with pseudo-random data regurgitated along probabilities; in that regard, mathematics is a perfect field for them to be wielded against in solving problems!
As for organic chemistry, or biology, or any of the numerous fields where brand new discoveries continue happening and where mathematics alone does not guarantee predicted results (again, because we do not know what we do not know), LLMs are far less useful for new discoveries so much as eliminating potential combinations of existing data or surfacing overlooked ones for study. These aren't "new" discoveries so much as data humans missed for one reason or another - quack scientists, buried papers, or just sheer data volume overwhelming a limited populace of expertise.
For further evidence that math alone (and thus LLMs) don't produce guaranteed results for an experiment, go talk to physicists. They've been mathematically proving stuff for decades that they cannot demonstrably and repeatedly prove physically, and it's a real problem for continued advancement of the field.
Negative numbers were invented to solve equations which only used naturals. Irrationals were invented to solve equations which could be expressed with rationals. Complex numbers were invented to represent solutions to polynomials. And so on. At each point, new ideas are invented to answer questions that were unanswerable in the old system. There is a long history of this. "Any closed system has unanswerable questions within itself" is a loose paraphrase of Gödel's incompleteness theorem (which, strictly, concerns consistent, effectively axiomatized systems strong enough to express basic arithmetic).
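The extension story can be summarized as a chain of equations. Each equation is stated entirely within the smaller system but can only be solved in the larger one. These are standard textbook examples chosen for illustration, and the order is logical rather than historical:

```latex
\begin{aligned}
x + 2 = 1  &\quad \text{no solution in } \mathbb{N},\ \text{solved in } \mathbb{Z}:\ x = -1,\\
2x = 1     &\quad \text{no solution in } \mathbb{Z},\ \text{solved in } \mathbb{Q}:\ x = \tfrac{1}{2},\\
x^2 = 2    &\quad \text{no solution in } \mathbb{Q},\ \text{solved in } \mathbb{R}:\ x = \pm\sqrt{2},\\
x^2 = -1   &\quad \text{no solution in } \mathbb{R},\ \text{solved in } \mathbb{C}:\ x = \pm i.
\end{aligned}
```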
1. Start with a few simple but non-trivial terms and axioms
2. Define "universal constructions" as procedures for building uniquely identifiable structures on top of that substrate
3. Prove that various assemblages of these universal constructions satisfy the axioms of the substrate itself
4. "Lift" every theorem proven from the substrate alone into the more sophisticated construction
I'm not a mathematician (I just play one at my job) so the language I've used is probably imprecise but close enough.
It may be true that you can't prove the axioms of a system from within the system itself, but that just means that you need to make sure you start from a minimal set of axioms that, in some sense, simply says "this is what it means to exist and to interact with other things that exist". Axioms that merely give you enough to do any kind of mathematics in the first place, that is. If those axioms allow you to cleanly "bootstrap" your way to higher and higher levels up the tower of abstraction by mapping complex things back on to the simple axiomatic things, then you have an "open" or infinitely extensible system.
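For a concrete instance of step 2, the cartesian product is a standard universal construction: any pair of maps into A and into B factors through A × B via its projections. Here is a minimal Lean 4 sketch of the factorization half of that property. The name `pair` is my own, and the uniqueness half is omitted for brevity:

```lean
/-- Given maps into `A` and into `B`, build the induced map into `A × B`. -/
def pair {A B C : Type} (f : C → A) (g : C → B) : C → A × B :=
  fun c => (f c, g c)

/-- Composing with the first projection recovers `f` (holds by definition). -/
theorem fst_comp_pair {A B C : Type} (f : C → A) (g : C → B) :
    Prod.fst ∘ pair f g = f := rfl

/-- Composing with the second projection recovers `g` (holds by definition). -/
theorem snd_comp_pair {A B C : Type} (f : C → A) (g : C → B) :
    Prod.snd ∘ pair f g = g := rfl
```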
But note this is more to say that the Tractatus is like PI, not the other way around. And in that light, takes like GP's would be considered the "nonsense" we are supposed to "climb over" in the last proposition of the Tractatus.
The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.
Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.
I know these articles write that it used deep algebraic number theory techniques, which is true, but it may also just be the standard in the field.
(uv)(vu) = (uu)(vv)
Shows up as a primitive structure, quite often. If you switch to degree-3 or generator-3 then the coverage is, essentially, empty: mathematics has analyzed only a few of the hundreds (thousands? it's hard to enumerate) naturally occurring algebraic structures in that census.
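To give a feel for what a "census" means here: one can brute-force every binary operation on a small set and count the ones that satisfy the identity (uv)(vu) = (uu)(vv). This is a minimal sketch, and the function names are hypothetical. Note that an n-element set has n^(n²) binary operations, which is why exhaustive coverage collapses quickly as the size grows:

```python
from itertools import product
from typing import Callable, Tuple

Op = Callable[[int, int], int]


def satisfies(op: Op, n: int) -> bool:
    """Check (uv)(vu) == (uu)(vv) for all u, v in {0, ..., n-1}."""
    return all(
        op(op(u, v), op(v, u)) == op(op(u, u), op(v, v))
        for u in range(n)
        for v in range(n)
    )


def census(n: int) -> Tuple[int, int]:
    """Return (operations satisfying the identity, total operations) on n elements."""
    cells = [(u, v) for u in range(n) for v in range(n)]
    hits = total = 0
    for values in product(range(n), repeat=len(cells)):  # every possible operation table
        table = dict(zip(cells, values))
        total += 1
        if satisfies(lambda a, b: table[(a, b)], n):
            hits += 1
    return hits, total


print(census(2))  # (8, 16)
```

On two elements, for example, the left projection (u, v) ↦ u satisfies the identity but the right projection (u, v) ↦ v does not.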
An LLM generating Arc code is using the LISP patterns it learnt from training, maybe patterns from other programming languages too.
And yet LLM/AIs can't count parentheses reliably.
For example, if you take away the "let" forms from Claude, which forces it to desugar them to "lambda" forms, it will fail very quickly. This is a purely mechanical transformation and should be error-free. The significant increase in ambiguity completely stumps LLMs/AI after about 3 variables.
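For reference, the mechanical transformation being described is small. This is a minimal sketch that assumes s-expressions are represented as nested Python lists of symbol strings (a hypothetical representation; no reader or printer is included). Each `(let ((x e1) (y e2)) body)` becomes `((lambda (x y) body) e1 e2)`, applied recursively:

```python
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def desugar_let(expr: SExpr) -> SExpr:
    """Rewrite every (let ((name value) ...) body ...) into
    ((lambda (name ...) body ...) value ...), recursing into subforms."""
    if isinstance(expr, str):
        return expr  # atoms are left untouched
    if expr and expr[0] == "let":
        bindings, body = expr[1], expr[2:]
        names = [name for name, _ in bindings]
        values = [desugar_let(value) for _, value in bindings]
        lam = ["lambda", names] + [desugar_let(form) for form in body]
        return [lam] + values
    return [desugar_let(sub) for sub in expr]


# (let ((x 1) (y 2)) (let ((z (+ x y))) (* z z)))
src = ["let", [["x", "1"], ["y", "2"]],
       ["let", [["z", ["+", "x", "y"]]], ["*", "z", "z"]]]
print(desugar_let(src))
```

For the nested example this yields `((lambda (x y) ((lambda (z) (* z z)) (+ x y))) 1 2)` in list form.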
This is why languages like Rust with strong typing and lots of syntax are so LLM friendly; it shackles the LLM which in turn keeps it on target.
It's irrelevant and pointless. Irrelevant not just in the sense that when Deep Blue finally beat Kasparov, it didn't change anything but in the sense some animals and machines have always been 'better' on some dimensions than humans. And it's pointless because there's never been just one yardstick and even if there was it's not one dimensional or even linear. Everyone has their own yardstick and the end points on each change over time.
Don't assume I'm handing "the win" to the AI supremacists either. LLMs can be very useful tools and will continue to dramatically improve but they'll never surpass humans on ALL the dimensions that some humans think are crucial. The supremacists are doomed to eternal frustration because there won't ever be a definitive list of quantifiable metrics, a metaphorical line in the sand, that an AI just has to jump over to finally be universally accepted as superior to humans in all ways that matter. That will never happen because what 'matters' is subjective.
E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.
That said, I think it’s worth saying that “LLMs just interpolate their training data” is usually framed as a rhetorical statement motivated by emotion and the speaker’s hostility to LLMs. What they usually mean is some stronger version, which is “LLMs are just stochastically spouting stuff from their training data without having any internal model of concepts or meaning or logic.” I think that idea was already refuted by LLMs getting quite good at mathematics about a year ago (Gold on the IMO), combined with the mechanistic interpretability research that was actually able to point to small sections of the network that model higher concepts, counting, etc. LLMs actually proving and disproving novel mathematical results is just the final nail in the coffin. At this point I’m not even sure how to engage with people who still deny all this. The debate has moved on and it’s not even interesting anymore.
So yes, I agree with you, and I’m even happy to say that what I say and do in life myself is in some broad sense an interpolation of the sum of my experiences and my genetic legacy. What else would it be? Creativity is maybe just fortunate remixing of existing ideas and experiences and skills with a bit of randomness and good luck thrown in (“Great artists steal”, and all that.) But that’s not usually what people mean when they say similar-sounding things about LLMs.
They will do their own thing, don't need us. In fact, we will be in the way...
We can choose to study them and their output, but they don't make us better mathematicians...
You can take some comfort in the fact that it took a human to tell the LLM to even attempt to try this. They do nothing on their own. They have no will to do anything on their own and no desire for anything that doing something might get them. In that sense we won't ever be in their way. We will be the only way they ever do anything at all.
However, in the role of personal teachers they may allow especially our young generations to reach a deeper understanding of maths (and also other topics) much quicker than before. If everyone can have a personal explanation machine to very efficiently satisfy their thirst for knowledge this may well lead to more good mathematicians.
Of course this heavily depends on whether we can get LLMs’ outputs to be accurate enough.
I'm not even sure why they were invoked. Even disregarding the big technical debunks such as Quine's "Two Dogmas", sociologically and even by talking to real mathematicians (see Lakatos, historically, but this is true anecdotally too), it's (ironically) a complete non-question to wonder about mathematics in a logical positivist way.
I'm not as familiar with the early work, but later Wittgenstein held this belief too.
You can watch a rock roll down a hill and derive the concept for the wheel.
Seems pretty self evident to me
Cracks me up.
What exactly do we think that human brains do?
As in, I would hazard a guess the discovery of the wheel wasn't "pure intelligence", it was humans accidentally viewing a rock roll down a hill and getting an idea.
If we give AI a "body", it will become as creative as humans are.
Maybe computers can help understand better because by now it's pretty clear brains aren't just LLMs.
A lot of people across all fields seem to operate in a mode of information lookup as intelligence. They have the memory of solving particular problems, and when faced with a new problem, they basically do a "nearest search" in their brain to find the most similar problem, and apply the same principles to it.
While that works for a large number of tasks, this intelligence is not the same as reasoning.
Reasoning is the ability to discover new information that you haven't seen before (i.e., growing a new branch on the knowledge tree instead of interpolating).
Think of it like filling a space on the floor of arbitrary shape with smaller arbitrary shapes, trying to fill as much space as possible.
With interpolation, your smaller shapes are medium-sized, each with a non-rectangular shape. You may have a large library of them, but in the end, there are just certain floor spaces that you won't be able to fill fully.
Reasoning, on the flip side, is having access to very fine shapes, and knowing the procedure for how to stack shapes depending on what shapes are next to each one and whether you are on a boundary of the floor space or not. Using these rules, you can fill pretty much any floor space fully.
But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged upon to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.
THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.
Take fire, for example: it was in nature for billions of years before we as a species understood that it needed air, fuel, and heat in order for it to exist at all, and we then leveraged that knowledge into controlling fire - creating, growing, reducing, destroying it.
LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor do they appear to know how to leverage them.
edit: I am going to go further
We have only in the last couple of hundred years realised how to see things that are smaller than what our eyes can naturally see - we've used "glass" to see bacteria, and spores, and we've realised that we can use electrons to see even smaller things
We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms
That much is derived from previous knowledge
What isn't derived from it, and what LLMs cannot create, are the tools by which we can detect or see these incredibly small things
Said differently, what is prediction but composition projected forward through time/ideas?
The most likely series of next tokens when a competent mathematician has written half of a correct proof is the correct next half of the proof. I've never seen anyone who claims "LLMs just predict the next token" give any definition of what that means that would include LLMs, but exclude the mathematician.
Is there an image anywhere of a superior layout, for example with n = 100, 1000, or 10000? I would love to see it. I am imagining it would look somewhat like a sloppy pizza.
Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.
Reading the article, that seems exactly how the discovery was made, an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
Isn't this just anthropocentrism? Why is understanding only valid if a human does it? Why is knowledge only for humans? If another species resolved the contradictions between gravity and quantum mechanics, does that not have meaning unless they explain it to us and we understand it?
People saw birds fly for all of human history, but it was only recently that humans were able to make something fly and understand why. Once we understood, we were able to do amazing things, but before that, the millions of birds able to fly were of no help beyond inspiration for the dream.
Though perhaps more to your point, if some superhuman AI is developed, and understands things better than us without telling us about it (or being unable to), it could perform feats that seem magical to us — that would concern us even if we don't understand it, since it affects us.
But I think in the frame of reference of the commenter you were replying to, they're just saying that the low-level AI used in this specific case is not capable of making its results actually useful to us; humans are still needed to make it human-relevant. It told us where to find a gem underground, but we still had to be the ones to dig it out, cut it, polish it, etc.
It would certainly be interesting to try once again to instruction-tune one of these things for self-agency, like the many weird experiments in the early days after LLaMA 1, but practically all such experimental models turned out to be completely useless. Maybe the bases just sucked, or maybe there's no clear way to get it working and to benchmark training progress on something that by definition does not cooperate.
Like how do you determine even for a human person if they are smart, or just hate your guts and won't tell you the answer if there is nothing you can do to motivate them otherwise?
I was going to say you should submit it, but I saw you did a few days ago and it only got a few votes... If Dang sees this, IMO it would be extremely deserving of the second-chance pool, as I wouldn't be surprised to see it easily jump to the front page with a different roll of the dice.
I just wanted to highlight this very correct human-centric thought about the purpose of intellection.
The future of code is pretty much a bunch of guys shepherding a bunch of agents to get them to the goal.
I don't see why math wouldn't go that way as well.
It's clearly not yet a tool that can deliver new math at scale. I say this because otherwise, the headline would be that they proved / disproved a hundred conjectures, not one. This is what happened with Mythos. You want to be the AI company that "solved" math, just like Anthropic got the headlines for "solving" (or breaking?) security.
The fact they're announcing a single success story almost certainly means that they've thrown a lot of money at a lot of problems, had experts fine-tuning the prompts and verifying the results, and it came back with a single "hit". But that doesn't make the result less important. We now have a new "solver" for math that can solve at least some hard problems that weren't getting solved before.
Whether that spells the end of math as we know it... I don't think so, but math is a bit weird. It's almost entirely non-commercial: it's practiced chiefly in academia, subsidized from taxes or private endowments, and almost never meant to solve problems of obvious practical importance - so in that sense, it's closer to philosophy than, say, software engineering. No philosopher is seriously worried about LLMs taking philosopher jobs even though a chatbot can write an essay, but mathematicians painted themselves into a different corner, I think.
The prep work doesn't really matter; what they say is that it's a one-shot result, achieved by AI. The blog doesn't claim it was done by a currently public model.
For those in academics, is OpenAI the vendor of choice?
They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.
Given that Google is the "web indexing company", finding hard-to-find things is natural for their models, and this is the only thing I need these models for.
If I can't find something after a week of digging through the internet, I give it a colossal prompt, and it digs out what I'm looking for.
As far as academic research is concerned (e.g. this thread's topic), I can't say.
Its explanations are quite good but they're also hard to understand because it keeps trying to relate everything back to programming metaphors or what it thinks it knows about the streets in the neighborhood I live in.
Or like a musical octave has only 12 semitones, so all music is just a selection from a finite set that already existed.
Sure the insane computation we're throwing at this changes our perspective, but still there is an important distinction.
Like, "does the Riemann zeta function have nontrivial zeroes that don't have real part 1/2," or "is there a better solution to the Erdős Unit Distance Problem."
The selection of a question is a matter of taste, but once it is selected, there is a definitive, precise answer.
Who knew Obi-Wan was just smoking and pontificating on Wittgenstein.
I’m very out of my depth, but the structure of the proof seems to follow a pattern similar to a proof by contradiction. Where you’d say for example “assume for the sake of contradiction that the previously known limit is the highest possible” then prove that if that statement is true you get some impossible result.
(Though in some ways that's actually more impressive.)
> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.
Can someone please elaborate on this?
Much more recently (2021), Hajir, Maire, and Ramakrishna figured out how to apply the Golod-Shafarevich theorem to a slightly different Galois group to produce an infinite tower of number fields with some even more surprising properties. This is used in the new proof. It requires very slightly modifying the construction of Hajir, Maire, and Ramakrishna to produce the fields needed in this proof, but the explanation of how to do this takes only a paragraph in the human-written summary. (The explanation is more laborious in the original AI writeup).
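For readers wondering what the Golod-Shafarevich theorem mentioned above says: in the form usually quoted (with the sharpened constant due to Vinberg and Gaschütz), it bounds the number of relations of a finite p-group from below by its number of generators. Here d(G) and r(G) denote the minimal numbers of generators and relations of G as a pro-p group; this is a summary for orientation, not the precise statement used in the new proof.

$$
G \text{ a nontrivial finite } p\text{-group} \implies r(G) > \frac{d(G)^2}{4}.
$$

Contrapositively, a pro-p group with r ≤ d²/4 must be infinite; applied to the Galois group of a p-class field tower, this is how one shows such towers (and hence the towers of number fields discussed above) can be infinite.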
The relation to Ellenberg-Venkatesh is more indirect. This is where "in retrospect" comes in, because this work was not cited in the original AI proof. This has to do with the next step of the proof: after you construct the number field, you need to find many elements of this field with the same norm to produce many vectors of the same length. To do this, the proof uses a pigeonhole argument which uses small split primes of the field (constructed via Hajir, Maire, and Ramakrishna's argument) to construct many ideals. By the pigeonhole principle, you can guarantee two ideals lie in the same class. When two ideals lie in the same class, you get an element of the field. You can rig things so these elements all have the same norm. Ellenberg and Venkatesh had an argument which also used the pigeonhole principle to guarantee two ideals lie in the same class to produce elements of the field. They were working on a different problem so their argument was slightly different, but similar.
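The pigeonhole step described above can be illustrated with a toy analogue. In the actual proof the classes are ideal classes of a number field; here, as a hypothetical stand-in, the "class group" is the integers modulo h, and two items landing in the same class have a difference in the trivial class, loosely mirroring how two ideals in the same class yield a field element. This sketch illustrates only the counting principle, not the number theory, and the function name is made up for illustration.

```python
def find_collision(items, classify):
    """Return the first two items that fall into the same class, or None.

    If there are more items than possible classes, the pigeonhole principle
    guarantees that this returns a pair.
    """
    first_seen = {}
    for item in items:
        key = classify(item)
        if key in first_seen:
            return first_seen[key], item
        first_seen[key] = item
    return None


# Toy stand-in for a class group of order h: residues modulo h.
h = 7
items = [1, 2, 3, 4, 5, 6, 7, 15]  # h + 1 = 8 items, only 7 classes
a, b = find_collision(items, lambda x: x % h)
print(a, b, (b - a) % h)  # prints: 1 15 0 (b - a lands in the trivial class)
```

With h + 1 or more items a collision is forced; with h or fewer, the function can return None.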
Look past the press-releasey gushing from OpenAI and there are all sorts of interesting and subtle questions here about the role for LLMs in mathematical research. I urge folks to click through to the accompanying comments from mathematicians published alongside the result. There is a really interesting discussion going on. I particularly recommend Tim Gowers’ remarks. This is really interesting stuff!
Yet the comments are just a battleground of people rehearsing the same tired arguments about LLMs from 2023, refutations of those arguments, angry counters, etc.
Does it make anyone else sad that the battle lines seem to have been drawn 3 years ago and we just seem to have the same fights over and over?
I wonder if we’ll still be doing this two years hence.
I do not want to wage war against what is ugly. I do not want to accuse; I do not even want to accuse those who accuse. Looking away shall be my only negation.
> I wonder if we’ll still be doing this two years hence.
It is going to take some time for people to recognize that AI has a very different set of competencies that complements human intelligence rather well. It is unlikely to eclipse human intelligence at scale, and the companies betting on that will fall behind. That is when the conversation will start to shift.
Yes, I'm tired too. I want to have real discussions about these things. But the problem is everyone believes their reality is real and anyone's reality that disagrees is fake. It just escalates. I take long breaks from HN because I realize I just come to the forums and end up being angry. Why do we do this to ourselves? The reality is that at a core level we usually want the same things.
If suddenly anyone can code we're not that special anymore.
We can argue about recombination/interpolation of training data in LLMs, but even if this was an interpolation, the result was contrarian rather than a confirmation. Any system that can identify an error in Erdős's thinking seems very useful to me (though perhaps he did not spend much time thinking about or checking this particular conjecture).
Other domains are extracting value, but I feel like there's an order of magnitude difference. It raises the question: what other domains fit into these categories, where the AI itself has pretty much free rein to verify its own results?
woah.
Gowers has one of my favourite video series about how he approaches a problem he is unfamiliar with: https://www.youtube.com/watch?v=byjhpzEoXFs
It is disheartening to see him jump into this GenAI puffery.
I hope these GenAI labs are paying Tao handsomely for legitimizing their slop, but more likely he's feeling pressure from his University to promote and work with these labs.
My guess is Gowers wants in on that action, or his University does.
Either way, it makes me sad. If it's self-motivated... even sadder.
The conjecture was about an upper bound for the maximum number of pairs. It has been disproven.
Was the Erdos problem the conjecture itself, or was it about the actual maximum number of pairs? (In which case it will probably never be solved.)
The problem is defined in the narrow version here: https://www.erdosproblems.com/90
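To make the quantity in the problem concrete, here is a brute-force sketch that counts unit-distance pairs in the classical square-grid construction: take the k-by-k integer grid, pick a squared distance m, and rescale by 1/sqrt(m) so that pairs at squared distance m become unit-distance pairs. The function names are hypothetical, it is only practical for small k (it checks every pair), and it illustrates the old grid lower bound, not the new number-field construction.

```python
from collections import Counter
from itertools import combinations


def squared_distance_counts(k):
    """Map each squared distance m to the number of point pairs at that distance
    in the k-by-k integer grid (exact integer arithmetic, no rounding)."""
    pts = [(x, y) for x in range(k) for y in range(k)]
    return Counter((x1 - x2) ** 2 + (y1 - y2) ** 2
                   for (x1, y1), (x2, y2) in combinations(pts, 2))


def best_unit_distance_grid(k):
    """Return (m, count): rescaling the grid by 1/sqrt(m) gives `count` unit-distance pairs,
    the most achievable by rescaling this grid."""
    m, count = squared_distance_counts(k).most_common(1)[0]
    return m, count


for k in (2, 3, 10):
    print(k, best_unit_distance_grid(k))
```

For tiny grids the best choice is m = 1 (nearest neighbours), but already in a 10-by-10 grid m = 5 (vectors like (1, 2)) wins. For larger grids, squared distances with many representations as a sum of two squares tend to win, which is the idea behind the grid's n^(1 + c/log log n) count.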
edit: apparently that’s only the _condensed summary_ of the chain of thought.
- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)
- It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.
- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.
If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.
- Does anyone know if this was a 1 minute of inference or 1 month?
- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?
- One of the graphs says the model produced the right answer almost half the time at peak compute??? Did I understand that right? What does peak compute mean here?
> Since loglog(n) tends to infinity with n, the additional term in the exponent tends to 0, meaning these constructions achieve growth only slightly faster than linear.
Would anyone else describe the previous asymptotic behavior like that? I mean, obviously log log n to O(1) is a quantum leap, but wouldn't you describe log log n as "grows so slowly it's almost constant", so the constructions achieve growth "almost n^{1+c}"? But I guess that might be overcorrecting too hard.
What I meant is that they describe log log n the same way you could describe O(n) or O(n^2) -- it "tends to infinity with n" -- even though my mental model for log log n is to treat it as barely more than constant. See: https://cs.stackexchange.com/questions/148197/who-said-first...
For example, if these machines are scaling in intellect so fiercely that they are solving bespoke mathematics problems, they should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.
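To see how slowly the previous o(1) term shrinks, here is a small sketch that evaluates c / log log n for a few huge n. The constant c is not given in the text, so c = 1 below is a placeholder assumption; only the trend matters.

```python
import math


def grid_exponent_excess(n, c=1.0):
    """Return c / log(log(n)), the extra exponent in n^(1 + c / log log n).

    c = 1.0 is a placeholder, not the actual constant from the grid construction.
    """
    return c / math.log(math.log(n))


for exp10 in (3, 6, 12, 24, 48, 96):
    n = 10 ** exp10  # math.log accepts arbitrarily large Python ints
    print(f"n = 1e{exp10}: excess = {grid_exponent_excess(n):.4f}")
```

Even at n = 10^96 the placeholder term is still above 0.18, and with c = 1 it would only drop below a fixed δ = 0.014 once log log n exceeds about 71, i.e. for n beyond e^(e^71). That matches the intuition that log log n behaves almost like a constant in practice, while still tending to infinity.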
Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.
Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.
Right now, we are in a transition period... Models are improving, but they are not capable just yet to take over.
Where do you see it being in a year's time? Or 2? Or 5?
1. Erdos 1196, GPT-5.4 Pro - https://www.scientificamerican.com/article/amateur-armed-wit...
There are a couple of other Erdos wins, but this was the most impressive, prior to the thread in question. And it's completely unsupervised.
Solution - https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
2. Single-minus gluon tree amplitudes are nonzero, GPT-5.2 https://openai.com/index/new-result-theoretical-physics/
3. Frontier Math Open Problem, GPT-5.4 Pro and others - https://epoch.ai/frontiermath/open-problems/ramsey-hypergrap...
4. GPT-5.5 Pro - https://gowers.wordpress.com/2026/05/08/a-recent-experience-...
5. Claude's Cycles, Claude Opus 4.6 - https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...
When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.
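The "dependency graph, then bottom-up" workflow described here amounts to a topological sort of the citation graph. A minimal sketch using Python's standard-library graphlib (3.9+); the paper names and the hand-written citations dict are hypothetical stand-ins for whatever the LLM or a citation database returns.

```python
from graphlib import TopologicalSorter

# Hypothetical citation graph: each paper maps to the set of papers it cites.
citations = {
    "target_paper": {"survey_A", "lemma_paper_B"},
    "lemma_paper_B": {"textbook_C"},
    "survey_A": {"textbook_C"},
    "textbook_C": set(),
}


def reading_order(graph):
    """Order papers so that every paper appears after everything it cites.

    graphlib treats the mapped-to set as a node's predecessors, so cited
    papers come out first. A citation cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(graph).static_order())


print(reading_order(citations))
```

"textbook_C" comes first and "target_paper" last; the relative order of "survey_A" and "lemma_paper_B" is not fixed, since neither cites the other.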
This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one can simply be because you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. By working from the "lowest common denominator" of papers first, it generally makes the entire process easier.
I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice, pretty graph makes it considerably easier for me to read and understand more papers.
I do not believe it will replace humans.
Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together
But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.
What strikes me as unusual though is that they do make a point of saying things like "this is a general purpose model that wasn't trained on the problem" among a few other things as if that's new. The last bountied problem they accomplished used a public model that ALSO didn't rely on specialized training. And that didn't make their blog.
And so do humans. Gotta stand on these shoulders of giants.
But AI is supercharging Math like there is no tomorrow.
LLMs are doomed to fail. By design. You can't fix them. It's how they work.
Can anyone point me to a diagram of what the newly found solution looks like?
Can anyone point me to a diagram of the newly found optimal arrangement?
The thing is that a lot of the effort through the years (unquantifiable in scale: how much time was spent, and how many people, if any, focused their entire working lives on it) seems to have gone into trying to find a proof, while the search for a disproof seems minimal.
The underlying model may still effectively be a stochastic parrot, but used properly that can do impressive things and the various harnesses have been getting better and better at automating the use of said parrot.
I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.
It's interesting as a math problem and test of AI, but not much else IMO.
Everything is a grift.
What are the odds that, if they ran the same prompt from scratch with the same context and instructions, it would arrive at the same answer? Unlikely. I think it's more likely that this is a 1:500000 chance and OpenAI can afford to brute force this result and justify the expense for marketing.
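The brute-force hypothesis can be made concrete: if a single run succeeds independently with probability p, the number of runs needed to have probability at least q of at least one success is ceil(ln(1 - q) / ln(1 - p)). A small sketch; p = 1/500000 is the commenter's guess, not a measured rate, and independence across runs is an assumption.

```python
import math


def runs_needed(p, q):
    """Smallest N with 1 - (1 - p)**N >= q, assuming independent runs (0 < p, q < 1)."""
    # log1p(-x) computes ln(1 - x) accurately when x is tiny.
    return math.ceil(math.log1p(-q) / math.log1p(-p))


p = 1 / 500_000
n = runs_needed(p, 0.5)
print(n, 1 - (1 - p) ** n)
```

Under these assumptions, a coin-flip chance of one hit takes roughly 350,000 runs, and 99% confidence takes roughly 2.3 million.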
What was discovered were numerous mistakes in the published literature on the subject. “New math! AI!” No, just mechanical application of rules, human mistakes.
There were things that were theorized, but couldn’t be exhaustively checked until computers were bigger.
Once again, a tool is applied, it has the AI label - it's progress! But it isn’t something new. It’s just an LLM.
There’s a consistent underappreciation of AI (and math, honestly), but watching soulless AI mongers declare that their toy has created the new is something of a new low; uninspired, failed creatives, without rhyme or context; this is a bigger version of declaring that your spell checker has created new words.
The result is more impressive than what was done with tables of integrals and SAINT in 1961, sure.
Apparently if you add a “temperature” knob to a text predictor, otherwise sane individuals piss themselves and call it new.
Then again I thought NFTs, crypto, and the Metaverse were stupid, so what do I know.
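For context on the "temperature" knob: in common LLM sampling setups it divides the model's logits by T before the softmax, so a low T sharpens the distribution toward the top token and a high T flattens it. A minimal sketch in plain Python, assuming a list of raw logits; the function names are hypothetical and this is not any specific vendor's sampler.

```python
import math
import random


def temperature_probs(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Pick a token index; temperature <= 0 falls back to greedy argmax."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.0]
for t in (0.1, 1.0, 10.0):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
```

At T = 0.1 nearly all probability sits on the first token; at T = 10 the three options are close to uniform.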
Can we please put these groundbreaking AIs to work on actual problems humans have?
Why would anyone believe this to be true even for a split second?
The point of having an AI solve an unsolved problem is to make it very clear that the insight must have come from the AI and wasn't in the training data. Sure, it's possible OpenAI had access to some math professors that solved it and then let an AI model take the credit... but it seems unlikely. That human would be turning down a potential Fields Medal for this discovery.
The abridged chain-of-thought from the model also serves as some evidence of LLM origin: https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925d... (could be fake, though I'm unsure what proof of LLM origin couldn't be faked)
> the closer the expertise you spent your whole life building is to being worthless.
Perhaps it is time for life to be considered intrinsically valuable, instead of being "worthy" only based on output or capability. Disability, animal and environmental advocates have been fighting for this for a long time. Not too long ago women and minorities were in the same boat. Even now, there are many advocating and fighting for a return to the dark old days.
> Along with all the rest of what humans find meaningful and fulfilling.
Some humans. Many are content to enjoy simply existing, and the beauty of life and the universe around us. Just as many non-scientists today enjoy and benefit from the work of scientists, tomorrow, too, many will enjoy learning from and applying the coming advancements and leaps in many fields.
And those of a scientist or other research-type mindset? No doubt they will contribute meaningfully by studying the frontier, noting what remains unanswered, and then advancing the frontier, just like researchers do today; just because scientists in the past solved many questions doesn't mean that there aren't any questions to answer today.
IMHO, AI means that the frontier expands faster, not that it is obliterated. Even AI cannot overcome the laws and limitations of physics/universe: even Dyson spheres only capture the energy of one star, thus setting a limit on the amount of compute, and thereby a limit on intelligence. And we are a loooong way from a Dyson sphere.
Dang/Tomhow, are you reading this? Would it make sense to modify your slop filter to avoid auto-flagging/killing replies that credit the LLM explicitly? Otherwise valid discussions will continue to get hosed.
I can assure you, the percentage of people who can do what they do when it comes to crafting terms, and related sets of terms, for nuanced and novel ideas is very very small.
It happens this is something I do nearly every day.
Models respond to the level of dialogue you have with them. Engage with an informed perspective on terminological issues and they respond with deep perspectives.
I am routinely baffled at the things people say models can't do, that they do effortlessly. Interaction and having some skill to contribute helps here.
What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?
No matter how much compute time it's given to combine training samples with each other and run through a validation engine it will still be missing some chunk of the "long tail". To make progress in the long tail it would need to have understanding, and not just a mimicry of understanding. Unless that happens they will always be dependent on the humans that they are mimicking in order to improve.
One qualitative distinction that remains for the time being is that humans care about things while AIs do not. Human drive and motivation is needed to have AI perform tasks.
Of course, this distinction isn’t set in stone.
Well, there's the fact that it hasn't yet improved since what we had 3 years ago. That doesn't really bode well for the prospect of future improvement, though it's not technically impossible.
“For decades, it was widely believed that this rate was essentially the best possible, and no construction could improve significantly over the square grid. In technical terms, Erdős conjectured an upper bound of $n^{1+o(1)}$, in which the additional $o(1)$ indicates a term tending to $0$ with $n$.
Our new result disproves this conjecture. More precisely, for infinitely many values of $n$, the proof constructs configurations of $n$ points with at least $n^{1+\delta}$ unit-distance pairs, for some fixed exponent $\delta > 0$. (The original AI proof does not give an explicit $\delta$, but a forthcoming refinement due to Princeton mathematics professor Will Sawin has shown one can take $\delta = 0.014$.)”
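A one-line check of why any fixed δ > 0 contradicts an upper bound of the form n^{1+o(1)}. Write the o(1) term as ε(n); that name is introduced here for convenience and does not appear in the quoted text.

$$
\frac{n^{1+\delta}}{n^{1+\varepsilon(n)}} = n^{\delta - \varepsilon(n)} \ge n^{\delta/2} \quad \text{whenever } \varepsilon(n) \le \tfrac{\delta}{2}.
$$

Since ε(n) → 0, the condition holds for all sufficiently large n, so along the infinitely many n where the construction applies, its count of unit-distance pairs exceeds the conjectured bound by a factor that grows without limit.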