For those in academics, is OpenAI the vendor of choice?
As far as academic research is concerned (i.e. this thread's topic), I can't say.
What you are describing doesn't match my experience at all with Gemini 3 or 3.1, especially the pro version.
Its explanations are quite good but they're also hard to understand because it keeps trying to relate everything back to programming metaphors or what it thinks it knows about the streets in the neighborhood I live in.
However the underlying model accessed via API with custom system prompts that direct toward pedagogical best practices performs better than gpts or Claude currently.
As a side note, whenever I want the Gemini chatbot to give me a list, it gives me a freaking essay. Drives me nuts. It often crams a bunch of extra stuff I don't want into its responses. It wastes tokens and wastes my time having to sift through it.
Gemini if you're reading this tell your human overlords to rethink your chat system prompt.
Given that Google is the "web indexing company", finding hard-to-find things is natural for their models, and this is the only thing I need these models for.
If I can't find something after a week of digging through the internet, I give it a colossal prompt, and it digs out what I'm looking for.
They also offer grants you can apply for as a researcher. I'm sure other labs may have this too but I believe OpenAI was first to this.
I’m very out of my depth, but the structure of the proof seems to follow a pattern similar to a proof by contradiction. Where you’d say for example “assume for the sake of contradiction that the previously known limit is the highest possible” then prove that if that statement is true you get some impossible result.
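For what it's worth, the general shape of that pattern can be written schematically as below. The symbol $P$ is my own placeholder for "a configuration beating the previously known limit exists"; this is only the textbook template, not a claim about how the actual proof is organized.

```latex
\[
\bigl(\neg P \implies \bot\bigr) \implies P
\]
% Step 1: assume \neg P (the known limit cannot be beaten).
% Step 2: derive a contradiction \bot from that assumption.
% Step 3: conclude P.
```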
(Though in some ways that's actually more impressive.)
I do not believe it will replace humans.
(That's the first time I used that expression on HN.)
Why shouldn't it? Humans are poorly optimized for almost anything, and built on a substrate that's barely hanging together
Goodness gracious!
And so do humans. Gotta stand on these shoulders of giants.
But AI is supercharging Math like there is no tomorrow.
LLMs are doomed to fail. By design. You can't fix them. It's how they work.
But I agree with you, especially in areas where they have a lot of training data, they can be very useful and save tons of time.
When I'm learning about a new subject, I'll ask Claude to give me five papers that are relevant to what I'm learning about. Often three of the papers are either irrelevant or kind of shit, but that leaves 2/5 of them that are actually useful. Then from those papers, I'll ask Claude to give me a "dependency graph" by recursing on the citations, and then I start bottom-up.
This was game-changing for me. Reading advanced papers can be really hard for a variety of reasons, but one big one can simply be because you don't know the terminology and vernacular that the paper writers are using. Sometimes you can reasonably infer it from context, but sometimes I infer incorrectly, or simply have to skip over a section because I don't understand it. By working from the "lowest common denominator" of papers first, it generally makes the entire process easier.
I was already doing this to some extent prior to LLMs, as in I would get to a spot I didn't really understand, jump to a relevant citation, and recurse until I got to an understanding, but that was kind of a pain in the ass, so having a nice pretty graph for me makes it considerably easier for me to read and understand more papers.
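In case it helps anyone, the "start bottom-up" part is just a topological sort of the citation graph. Here is a minimal sketch using Python's standard `graphlib`; the paper titles and the `reading_order` helper are made up for illustration, and you would still have to build the citation map yourself (or have the LLM produce it and then check it).

```python
from graphlib import TopologicalSorter


def reading_order(cites):
    """Order papers so each one comes after the papers it cites.

    `cites` maps a paper title to the set of titles it cites.
    """
    # TopologicalSorter expects node -> predecessors. A cited paper is a
    # prerequisite, so the citation map can be passed in unchanged.
    return list(TopologicalSorter(cites).static_order())


# Hypothetical example data, not a real bibliography.
citations = {
    "Advanced paper": {"Intermediate paper A", "Intermediate paper B"},
    "Intermediate paper A": {"Foundational paper"},
    "Intermediate paper B": {"Foundational paper"},
    "Foundational paper": set(),
}

order = reading_order(citations)
print(order)
```

If the map accidentally contains a cycle (two papers citing each other), `static_order()` raises `graphlib.CycleError`, which is a useful sanity check on an LLM-generated graph.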
It doesn't hurt that Lamport is exceptionally good at explaining things in plain language compared to a lot of other computer scientists.
What strikes me as unusual though is that they do make a point of saying things like "this is a general purpose model that wasn't trained on the problem" among a few other things as if that's new. The last bountied problem they accomplished used a public model that ALSO didn't rely on specialized training. And that didn't make their blog.
A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.
I have had them run out of receipts, but it’s never mattered for me. If I’m dining in, the plastic number you carry to your table makes sure I get my food. And if I’m taking it to-go, they always find me anyways.
I'm not sure how that could be. I can walk up to the counter and say "Big Mac Large Fry Small Coke" faster than you can navigate the first screen of the kiosk, and a skilled counter worker can key that in and be done before I even get my credit card out.
Everybody would go through this workflow built for customization, and at McD they do not. This to me means they are not building for this use case.
Other places optimize for this better by not having too many hand-overs between order and preparation.
If it's purely about the food, receiving it, consuming it, then sure, get the human out of the loop, interact with a machine. Ideally even the preparation is done by a machine. No human error or hair involved. Why even go there, let it be delivered to your home.
But these places are also about the experience of social connection. The bar keeper, the waiter, the chef. They are all involved in this experience and the actual food is "just" one component, one detail, albeit an important one. My favorite restaurants would be nothing without the people there.
It's similar with music. It's not just about the produced sound waves. The musician forms a social bond with the audience. Even when listening to a recording, my mind is re-living or at least imagining a live sitting, that connection with the musician. No machine generated music will ever be able to replace that.
Depends on what you're ordering and who the cashier is.
If your order is the happy path of no customizations of a combo with an experienced cashier, it can be done in seconds, for sure. "Medium #4 with a Diet Coke", pay, done.
But if you customize your burger or order a lot of items a la carte and you're dealing with a new cashier who has weak English skills, good fucking luck. You'll likely need to wait for them to figure out they need to call someone over to help, have to repeat your order, and you end up spending far more time.
> it keeps trying to upsell you
Yeah, I'll agree that's obnoxious, especially when it's trying to upsell you something that's already on your order. I ordered a combo. I don't need you to add another fry.
It seems designed to maximize how many screens they show you to make an order. Each one with a slight delay and animation.
At a drive through I can say “gimme a number one, medium, with a Coke Zero” and they give me my total. That’s the convenience the kiosk is up against.
At the kiosk there’s:
- A welcome screen you have to tap
- A “carry out or dine in” screen
- Always one other screen with a dumb question about apps or whatever, tap through
- A top level menu with a bunch of categories, burgers, drinks, sides, desserts, etc… I guess I want burgers? But it’s a combo, hmm. I guess I’ll figure out how to make it a meal. Tap burgers.
- Then another screen with burgers, in a different order than the drive through numbering, tap Big Mac
- Then another dedicated screen that shows you a picture of a Big Mac, with a bunch of customization options, which you have to scroll past and verify that it matches the defaults you expect, and at the bottom you can tap add
- Then another screen asking you if you want to make it a meal
- Then another screen asking the size
- Then another screen asking what to drink
- Then another screen that shows you the drink
- Then another screen for what size
Etc etc etc. Each of these screens takes a few seconds to display too, just slow enough to be infuriating.
In my mind the ideal kiosk is something where you get “the menu” (like what you see on the billboard in the drive through) with the usual big squares with a number on them and a picture of the meal. Tapping one puts it in a “drawer” section with my order in it, and each item in the drawer can have simple in-line edit controls for “size” and “what to drink”, with them showing up empty in a way that makes it obvious I need to fill in those answers before I can check out.
I should be able to tap one button for the combo number I want, another for the size, another for the drink, then checkout, all on one screen without long delays. If I don’t want a combo but want individual items, I can just scroll down a bit to look at the full menu. The order drawer stays where it is.
Or hell, just let me say “number one with a Coke” and have a very simple ASR and NL parser figure it out and put it in my pending order to edit.
Customizations can be behind a simple “customize” button on each item in my pending order. If I don’t have customizations I can just ignore it. What you get with no customizations is what you’d get if you just order it verbally to a human without specifying anything. The concept of “here’s how we typically make it, if you want anything different let us know” is a very deeply ingrained and familiar concept to restaurant patrons, and being forced to answer every little question even if you don’t care, adds up to a lot of frustration.
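To make the idea concrete, here is a rough sketch of the "drawer" model: items start with empty size/drink fields, checkout is blocked until they are filled, and customizations default to nothing. The menu data, the `ComboItem`/`Drawer` names, and the keyword-matching `parse_order` are all my own illustrative assumptions; a real kiosk would sit behind an actual ASR system and a much richer menu.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical menu data, for illustration only.
COMBOS = {1: "Big Mac combo", 4: "Quarter Pounder combo"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}
SIZES = ("small", "medium", "large")
# More specific drink names come first so "coke zero" wins over "coke".
DRINKS = ("coke zero", "diet coke", "coke", "sprite")


@dataclass
class ComboItem:
    number: int
    size: Optional[str] = None   # empty until the customer picks one
    drink: Optional[str] = None  # empty until the customer picks one
    customizations: list = field(default_factory=list)  # default: none

    def missing(self):
        """Names of required fields that still need an answer."""
        return [name for name in ("size", "drink") if getattr(self, name) is None]


@dataclass
class Drawer:
    items: list = field(default_factory=list)

    def can_check_out(self):
        # Checkout is allowed only when every item is fully specified.
        return bool(self.items) and all(not item.missing() for item in self.items)


def parse_order(text):
    """Very naive keyword parser for phrases like 'number one with a Coke'."""
    text = text.lower()
    match = re.search(r"number (\w+)", text)
    if not match:
        return None
    word = match.group(1)
    number = int(word) if word.isdigit() else NUMBER_WORDS.get(word)
    if number not in COMBOS:
        return None
    size = next((s for s in SIZES if s in text), None)
    drink = next((d for d in DRINKS if d in text), None)
    return ComboItem(number, size, drink)


drawer = Drawer()
drawer.items.append(parse_order("number one with a Coke"))
print(drawer.items[0].missing())  # the size still has to be filled in
drawer.items[0].size = "medium"
print(drawer.can_check_out())
```

The point of the design is that the only mandatory questions are the ones the order genuinely leaves open; everything else falls back to "how we typically make it".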
Fast food places came up with the combo numbering system to make ordering faster, and it was super convenient and fast, because there’s a financial incentive to get you through the drive through because you’re blocking other customers. But since they have several kiosks available, they seem to not care at all about the efficiency of the user interface, because it’s not a problem for them. But it’s still a problem for me, because I still want to order quickly, despite it not blocking other customers. It’s a huge step down from just saying “number one with a Coke”.
Most repeat customers use the app, which sports the digital equivalent of a loyalty program, and various coupons. And lets you save your 'usual' order with customizations etc. Plus the annoying push notifications for FreeFrydays or whatever. And upsells, new product launches, etc.
My recollection is that the kiosk is just a weak facsimile of the app. And wasn't terrible, but everyone's standards vary.
Which is why I will never reinstall their damned app.
Is there something wrong with their food in the USA?
There's much more to being human than our "cognitive abilities"
Not obvious and in fact I think the opposite is way more likely. Chess is well-defined and self-contained in a way that managing a restaurant with fleshy customers never will be.
Also, there will be hundreds of disparate tasks that are happening in parallel, and even humans still make up frameworks to discover most urgent/important work that needs to be done first.
https://en.wikipedia.org/wiki/Qualified_immunity
Assuming you can still sue McDonalds, I am not sure this is a problem in the robotic LLM case. I'm also trying to imagine a case where you would want to sue the LLM and not the company. Given that robots/LLMs don't have free will, I'm not sure the problem with qualified immunity making police unaccountable applies.
There already exist a lot of similar conventions in corporate law. Generally, a main advantage of incorporation is protecting the people making the decisions from personal lawsuits.
That only requires that someone own the AI-managed McDonald's, though. So long as they can't avoid responsibility by pointing to the AI, I don't see why you couldn't sue them.
Police are a monopoly; nobody has a choice about which police company to use. McDonalds are not a monopoly, and many customers would prefer to eat at competitors run by entities that could be sued or jailed if they did anything particularly egregious.
The same intuition applies if you walk into McDonald's and a person there mistreats you. You want that person held responsible.
But the LLM is not a person. What is there to even sue? It just seems like it would simply pass through to the corporate entity without the same tension of feeling like we let a human get away with something. Because there is no human, just a corporation and the robot servicing the place.
Put another way - if the LLM is not a person, what is the advantage of a personal lawsuit?
Just sue the McDonalds. Even in a case where the LLM is extremely misaligned and acts in a way where you might normally personally sue the McDonald's employee, I'm just not sure the human intuition about "holding someone accountable" would have its normal force because again - the LLM is not a person.
So given we already have the notions of incorporation and indemnification it doesn't make sense to say what is precluding LLMs from running McDonald's is they can't be sued. If McDonald's can still be sued, then not only is there no problem, there is very likely not even a change in the status quo.
The purpose of qualified immunity is for when an officer does something that turns out to be illegal but they were both told to by their superiors and did not think it was in violation at the time.
An officer making a choice to violate your rights would not be eligible for qualified immunity.
Excellent standards for people authorized by the state to run around with a badge and a gun in a free society. Your comment history on this is so unimpressive. Would you countenance the same excuses in anyone else? A man puts on his police uniform and suddenly you think he should be immune from civil prosecution because "my boss told me so" and "I didn't know"?
I wonder if you will make similar excuses for robo cop. Or if your principles merely extend to whatever human you can find in uniform willing to tolerate your friendship.
Plus, qualified immunity is only for civil proceedings. Individual officers are still liable for any criminal actions they take. I see a lot of people say that some officer should be in jail and blame qualified immunity when those two things are not related at all.
I'm not arguing, at all, that police should be immune to prosecution individually. I'm trying to make the point that, if you are trying to hold police individually accountable for their _criminal_ actions, qualified immunity isn't the thing that's preventing that. There's a whole legal system and union/police culture that's responsible for that.
Qualified immunity is thrown around so much in contexts where it makes it clear that people don't understand what it means and gets used, as it was in your comment, as a bogey man that's to blame for all the times police get let off the hook for their misbehavior. All I'm trying to do when correcting you (and others) about qualified immunity is to both redirect your anger and effort into changing something that will actually make a difference and/or prevent you from spending the mental or physical energy chasing a dead end.
You seem to be arguing with yourself, not with me. If you are satisfied with a cop only facing criminal liability (often from the same prosecutors that rely on police to make other cases, among other issues as you pointed out) fine, that is your prerogative. Don't file a civil case. But don't misrepresent my position. Criminal prosecution does not preclude civil, nor the other way around. Citizens should not face such hurdles to file civil suits, irrespective of whatever happens re a criminal case. Why is that so hard for you to understand? Surely you comprehend that one can be found liable both criminally and civilly in many cases. Or tried for both but subject to different penalties (including none) depending on how each goes. Why are your LEO buddies so special as to be largely exempt from the rules that govern the rest of us?
The fact your comment history is riddled with these continued misrepresentations on this topic while you claim to educate is simply galling. Have a good day, I don't think I can continue in good faith with someone who seems to predicate engagement on this topic on unfounded assumptions about others' education on the issue. Your own comments on this topic are indeed indicative of severe projection in this respect.
and LLMs are getting better at providing less of it
perhaps in the future the GPU-poor can go to McDonalds and get AI to solve their riddles by ordering an extra napkin with the solution written on.
https://www.anthropic.com/research/project-vend-1 https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-mach...
(Two different examples of a similar idea)
Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.
> At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]
> At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.
And I'll also link to the HN thread following his death a couple of years ago - https://news.ycombinator.com/item?id=42228759
However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.
This is the caliber of thinking in unimpaired AI bullishness.
Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.
Math is a sequence of formal rules applied to construct a proof tree. Therefore an AI trained on these rules could be far more efficient, and search far deeper into proof space
This future still sucks. The tech industry is making the world a worse place.
Two years old now, and as the cryptographers say, attacks only get better.
Or is your argument that AI is permanently doomed to not work?
What happened to art since?
We got artistic photography on top of paintings as well. It did not become widespread right away as people were mostly enamored by the simplicity of getting a realistic image first. But after that died away, people made art out of photography too.
Yes, those who did landscapes or portraits for hire were affected financially, but we ended up with more art, not less.
If I need to spell it out: genAI for image generation will also become an avenue for real artistic expression, as some pioneers are demonstrating. Even if image generation is democratized, there will be a difference between art and non-art. It will not kill conventional art either.
We have had that chess board for quite a while now, over 40 years. And no, there is nothing special about Lean here, it is just herd mentality. Also, we don't know how much training with Lean helped this particular model.
All AI proofs so far, including this one, are using existing tools in new ways, rather than inventing new tools. This is not surprising if you know how these models are trained. These existing tools are in distribution. New tools are not.
Problems worthy of a Fields Medal likely require new tools to be invented. Thus it is not clear whether progress within the confines of the current paradigm is enough.
We could get this weird spiky situation where the AI is insanely superhuman at all problem solving, but completely incapable of coming up with a single new tool. It discovers everything there is to discover, subject to existing axioms and concepts.
Timothy Gowers gives some commentary on this in the attached PDF.
Stockfish's neural net evaluation model was trained on millions of its positions with its own original algorithmic evaluation function (entirely developed by humans) and search tree. The result was a much smaller model than Leela's that requires little computation (not even a GPU), paired with its already extremely efficient search/pruning algorithms that made it stronger than Leela in competitive play. Leela's evaluation function is much stronger (at one ply it has an ELO of around 2300, Stockfish is probably closer to 1800), but it requires vastly more resources and those are always bounded in a match.
Humans haven't learned as much new information about chess from Stockfish as we have from Leela.
- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)
- It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.
- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.
If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.
The thing is that a lot of the effort through the years (which is unquantifiable in scale as to how much time was spent and how many people focused their entire work lives on it, if any) seems to have gone into looking for the proof, and the search for the disproof seems minimal.
There is no universally agreed-upon "central" conjecture (like "P vs. NP" in CS), but here are some pillars:
1) https://en.wikipedia.org/wiki/Happy_ending_problem
2) https://en.wikipedia.org/wiki/Hadwiger_conjecture_(combinato...
1. They have a wide range of difficulties. 2. They were curated (Erdős didn't know at first glance how to solve them). 3. Humans already took the time to organize, formally state, and add metadata to them. 4. There's a lot of them.
If you go around looking for a mathematics benchmark it's hard to do better than that.
Solving problems people have already stated is a niche activity in mathematical research. More often, people study something they find interesting, try to frame it in a way that can be solved with the tools they have, and then try to come up with a solution. And in the ideal case, both the framing and the solution will be interesting on their own.
Note that this is not really true of this problem in particular.
Most new math problems appear in other papers, doctoral dissertations, etc. Usually you'll find them in the "future work" / "future research" section.
So obviously in order to present and formalize these problems, you either need the author(s) to do it, or some reader. At this level of math, there are many extremely niche fields, where the papers might only be read by a small number of people.
In short, it is a visibility problem.
But, I figure, there's some potential use in AI models to extract and present these problems, which would make them available to a larger audience.
That is exactly what Erdős did. His life revolved around math, and seeking mathematical questions.
Mind showing your working out?
I think the more interesting question is how many tokens were spent all told; the most interesting graph in the article imo is the success rate by log test-time compute: how many tokens are being spent on the right of the graph to hit a winning CoT/solution like this >50% of the time?
Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.
I think that idea is deeply fascinating, AND have no problem that we still credit mathematicians with discoveries.
So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.
Most discoveries are indeed implied from axioms, but every now and then, new mathematics is (for lack of a better word) "created"—and you have people like Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois, etc. that treat math more like an art than a science.
For example, many believe that to solve the Riemann Hypothesis, we likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
But if you actually try to take a convex hull of some encoding of sentences as vectors? It isn’t true. The outputs are not in the convex hull of the training data.
I guess it’s supposed to be a metaphor and not literal, but in that case it’s confusing. Especially seeing as there are contexts in machine learning where literal interpolation vs literal extrapolation is relevant. So, please, find a better way to say it than saying that “it can only interpolate”?
If it can only interpolate in a literal sense, that means that it only produces good outputs on convex combinations of inputs that appear in the training set. That's what interpolation means. But, if you take the embedding vectors of sentences/prompts, and then take the convex hull of these, it is not typical for new sentences not in the training set to have their embedding vectors be in the convex hull of these.
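To pin down what "literal" means here, this is the standard definition of the convex hull of a finite set of points. The symbols $x_1,\dots,x_n$ (training embeddings) and $\lambda_i$ (weights) are my own notation, not anything from the thread.

```latex
\[
\operatorname{conv}\{x_1,\dots,x_n\}
  = \Bigl\{\, \sum_{i=1}^{n} \lambda_i x_i \;:\;
      \lambda_i \ge 0,\ \sum_{i=1}^{n} \lambda_i = 1 \,\Bigr\}
\]
```

Literal interpolation would mean the embedding $x$ of a new sentence satisfies $x \in \operatorname{conv}\{x_1,\dots,x_n\}$.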
LLMs are prompted by humans and the right query may make it think/behave in a way to create a novel solution.
Then there's a third factor now: agentic AI system loops with LLMs, where the model can research, try, and experiment in its own loop that's tied to the real world for feedback.
Agentic + LLM + Initial Human Prompter by definition can have it experiment outside of its domain of expertise.
So that's extending the "LLM can't create novel ideas" claim, but I don't think anyone can disagree that the three elements above are enough ingredients for an AI to come up with novel ideas.
Who decides the last point at which it’s OK to provide text to the model and still be able to describe it as creative? (non-rhetorical)
That's not a creative prompt. That's a driving prompt to get it to start its engine.
You could do that nowadays, and while it may spend $1,000 to $100,000 worth of tokens, it will create something humans haven't done before, as long as you set it up with all its tool calls/permissions.
It won't because even though it looks clever to you, people who /do/ understand math and LLMs understand that LLMs /are/ regurgitating
Why does your LLM need you to tell it to look in the first place? Why isn't it just telling us all the answers to unsolved conjectures known and unknown?
Why isn't the LLM just telling us all the answers to all the problems we are facing?
Why isn't the LLM telling us, step by step with zero error, how to build the machine that can answer the ultimate question?
> Timothy Gowers @wtgowers
> If you are a mathematician, then you may want to make sure you are sitting down before reading further.
If your refutation requires someone to have an account, login, and read something - it's meaningless
it's readable to most, it's annoying having to wade through ex-Twitter .. but there are workarounds.
But, I remain sceptical
https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...
it includes the longer remarks by Gowers & others.
We just haven't let AI run wild yet. But it's coming.
AGI has been "just over the horizon" for literal decades now - there have been a number of breakthroughs and AI Winters in the past, and there's no real reason to believe that we've suddenly found the magic potion, when clearly we haven't.
AI right now cannot even manage simple /logic/
In the end, creativity has always been a combination of chance and the application of known patterns in new contexts.
If you know anything about the invention of new math (analytic geometry, Calculus, etc.), you'd know how untrue this is. In fact, Calculus was extremely hand-wavy and without rigorous underpinnings until the mid 1800s. Again: more art than science.
If anything, they were fighting an uphill battle against the perception of hand-waving by their contemporaries.
Yes, and it's pretty common knowledge that Calculus was (finally) formalized by Weierstrass in the mid-to-late 19th century, having spent almost two centuries in mathematical limbo. Calculus was intuitive, solved a great class of problems, but its roots were very much (ironically) vibes-based.
This isn't unique to Newton or Leibniz, Euler did all kinds of "illegal" things (like playing with divergent series, treating differentials as actual quantities, etc.) which worked out and solved problems, but were also not formalized until much later.
American and British geeks/nerds are so blinded by Newton that they can't see there was tons of previous work going back to the Greeks and through the Middle Ages, whose people the British love to depict as brutish, with no culture at all.
And the fact is that they weren't dumb at all, and without Euclid and Archimedes there wouldn't be any Calculus.
Vibe-what? Vibe-bullshit, maybe; cathedrals in Europe and such weren't built by magic. Ditto with sailing and the like. Tons of mathematics and geometry there, and tons of damn axioms before even the US existed.
Heck, even the Book of The Games from Alphonse X "The Wise" has both a compendium of game rules and even this https://en.wikipedia.org/wiki/Astronomical_chess where, of course, a command of geometry was mandatory, at least to design the boards.
On Euclid:
https://en.wikipedia.org/wiki/Euclid%27s_Elements
PS: Geometry has tons of grounds for calculus. Guess why.
That idea wasn’t formally defined until 134 years later, with epsilon-delta by Cauchy. That's when it was accepted. (I know that there were earlier proofs.)
There are even arguments that the limit existed before Newton and Leibniz, with Archimedes' limits on the value of pi.
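For readers who haven't seen it, the epsilon-delta definition being referred to is the standard one below, written in modern notation. The symbols $f$, $a$, and $L$ are the usual textbook placeholders rather than anything from the historical texts.

```latex
\[
\lim_{x \to a} f(x) = L
\iff
\forall \varepsilon > 0\ \ \exists \delta > 0\ \ \forall x:\quad
0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon
\]
```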
Cauchy’s deep understanding of limits also led to the creation of complex function theory.
These forms of creation are hand-wavy not because they are wrong. They are hand wavy because they leverage a deep level of ‘creative-intuition’ in a subject.
An intuition that a later reader may not have and will want to formalize to deepen their own understanding of the topic often leading to deeper understanding and new maths.
I honestly don't know personally either way. Based on my limited understanding of how LLMs work, I don't see them making the next great song or next great book, and based on that reasoning I'm betting that they probably won't be able to do whatever the next "Descartes, Newton, Leibniz, Gauss, Euler, Ramanujan, Galois" are going to do.
Of course, if AI as a wider field comes up with something more powerful than LLMs, that would be different.
Meanwhile, songs are hitting number one on some charts on Spotify that people think are humans and are actually AI. And Spotify has to start labelling them as such. One AI "band" had an entire album of hits.
Also - music is subjective. Mathematics isn't.
And in this case, an LLM discovered a new way to reason about a conjecture. I don't know how much proof is needed - since that is literally proof that it can be done.
There are quite a few questions around that. Music is subjective and obviously different people have different tastes, but I wouldn't call any of them actual good music / real hits.
>> LLM discovered a new way to reason about a conjecture
I wasn't questioning LLMs' ability to prove things. Parent threads were talking about building a new kind of maths, or approaching it in a creative/artistic way. That's what I was referring to.
I can't speak for maths or hard science as I'm not trained in that, but the creativity aspect in code is definitely lacking when it comes to LLMs. May not matter down the line.
because I have no basis for assuming an LLM is fundamentally capable of doing this.
"Never shall I be beaten by a machine!”
In 1997 he lost to Deep Blue.
The differences between them are many, but brute force doesn't enter into it in either case.
Not a good argument for turning everything over to the Deep Blues. What's Deep Blue done for me lately?
Train an LLM only on texts dated prior to Newton and see if it can create calculus, derive the equations of motion, etc.
If you ask it about the nature of light and it directs you to do experiments with a prism I'd say we're really getting somewhere.
[1] Obviously Newton counts as one. Leibniz, like Newton, figured out calculus. Other people did important work in dynamics, though no one else's was as impressive as Newton's. But the vast majority of human-level intelligences trained on texts prior to Newton did not create calculus or derive the equations of motion or come close to doing either of those things.
Why are they not coming up with paradigm shifts in knowledge expression/discovery like humans did back then?
Are we just not prompting them right?
If we believe today's models are sufficiently capable to have been able to do so, why are we not getting these types of results today, given that they have access to the entire world's knowledge, and especially its math?
Are research mathematicians simply not prompting LLMs in the right way?
Incidentally, similar conversations were had about ML writ large vs. classical statistics/methods, and now they've more or less completely died down since it's clear who won (I'm not saying classical methods are useless, but rather that it's obvious the naysayers were wrong). I anticipate the same trajectory here. The main difference is that because of the nature of the domain, everyone has an opinion on LLMs, while the ML vs. statistics battle was mostly confined within technical/academic spaces.
What example is there where an LLM has extrapolated? All I've seen is a data set so large and an extra decomposition process making it so interpolation feels like extrapolation if you don't look close enough.
> but a theory of why further advancements can't solve the deficiencies
How about LeCun's?
A scientist has to extract the "Creation" from an abstract dimension using the tools of "human knowledge". The creativity is often selecting the best set of tools or recombining tools to access the platonic space. For instance a "telescope" is not a new creation, it is recombination of something which already existed: lenses.
How can we truly create something ? Everything is built upon something.
You could argue that even "numbers" are a creation, but are they ? Aren't they just a tool to access an abstract concept of counting ? ... Symbols.. abstractions.
Another angle to look at it: even in dreams, do we really create something new? Or do we dream about "things" (i.e. data) we have ingested in our waking life? Someone could argue that dreams truly create something, as the exact set of events never happened anywhere in the real world... but we all know that dreams are derived: derived from brain chemistry, experiences and so on. We may not have the reduction of how each and every thing works.
Just like energy is conserved, IMO everything we call "created" is just a changed form of "something". I fully believe LLMs and humans can both create tools to change the forms. Nothing new is being "created", just convenient tools which abstract upon some nature of reality.
Humans and animals have intuitive notions of space and motion since they can obviously move. But, symbolizing such intuitions into forms and communicating that via language is the creative act. Birds can fly, but can they symbolize that intuitive intelligence to create a theory of flight and then use that to build a plane ?
It was a new concept, combining lenses to look at things far away as if they are close to. The literal atoms/molecules weren't new, but the form they were arranged in was. The purpose of the arrangement was new too.
Well I think the point is there is no "new kind of math". There's just types of math we've discovered and what we haven't. No new math is created, just found.
We're not comparing math to reality (though there's a strong argument to be made that reality has a structure that is mathematical in nature - structural realism didn't die as a scientific philosophy just because someone came up with a pithy saying), we're talking about whether math is discovered or invented.
Most mathematicians would argue both - math is a language, we have created operations, axioms are proposed based on human creativity, etc., but the actual laws, patterns, etc. are discovered. Pi is going to be pi no matter if you're a human or someone else - we might represent it differently with some other number system or whatever, but that's a matter of representation, not mathematical truth.
It seems that addition (for instance) was "created" long before us.
On the other hand, it seems highly unlikely that a civilization similar to ours could "invent" an essentially different kind of mathematics (or physics, etc.)
Math is a mental map which coincides with reality in useful ways. Different maps can also be useful. The models we construct are based on arbitrary axioms which we hold to be true. Different axioms could lead to different theories which are just as useful. So it isn't discovered (i.e. mapping directly to reality and waiting to be discovered), it is created.
To pick one example, adding the concept of zero changed our model/map of reality fundamentally without changing reality.
I know of no realm where mathematical objects live except human minds.
No, it seems clear to me that mathematics is a creation of our minds.
"Where" mathematics exists is in the abstract combinatorial space of an infinitely repeated application of logical rules. This space doesn't exist in a substantive sense, but it is accessible/navigable by studying the consequences of logical rules. It is the space of possible structure.
I think we create mathematics as thought structure in our mind. We can agree on things when we create the same structures. But this structure did not exist prior to creation.
math more like an art than a science.
That’s a fun turn of phrase, but hopefully we can all agree that math without scientific rigor is no math at all.
We likely need some new kind of math. Imo, it's unlikely that an LLM will somehow invent it.
Do you think it’s possible/likely that any AI system could? I encourage us to join Yudkowsky in anticipating the knock-on results of this exponential improvement that we’re living through, rather than just expecting chatbots that hallucinate a bit less. In concrete terms: could a thousand LLM-driven agents running on supercomputers (500 of which are dedicated to building software for the other 500) come up with new math?
Maths follows logical (or even mathematical) rigour, not scientific rigour!
However, if that idea about new math is correct, we, in theory, don’t need new math to (dis)prove the Riemann hypothesis (assuming it is provable or disprovable in the current system).
In practice we may still need new math because a proof of the Riemann hypothesis using our current arsenal of mathematical ‘objects’ may be enormously large, making it hard to find.
This is also true for established theorems! We can imagine mathematical universes (toposes) where every (total) function on the reals is continuous! Even though it is an established theorem that there are discontinuous functions! We just need to replace a few axioms (chuck out the law of the excluded middle, and throw in some continuity axioms).
Do you know if this topos, in which every total function on the real numbers is continuous, has been constructed and proven to be a viable set of axioms? If so, I am curious about the source.
My go-to example still remains the one of hyperbolic geometry and the axiom of parallel lines, so the more approachable examples I can get, the better.
There is also this blog post by Andrej Bauer, which can be seen as exploring what it is like to be in such a topos: https://math.andrej.com/2006/03/27/sometimes-all-functions-a...
Said differently, what is prediction but composition projected forward through time/ideas?
Definition: That highly specific, short-lived burst of nervous energy that makes you accidentally drop a small object (like a pen, a guitar pick, or a piece of LEGO) immediately after picking it up.
Exactly. I also only write one word at a time. Who knows what is going on in order to come up with that word.
Did you read the post that you're commenting on?
It seems wholly believable to me that they are narrow intelligences that are great at some kinds of reasoning and worse at other kinds. Obviously they can reason through problems that most adult humans can't solve
The most likely series of next tokens when a competent mathematician has written half of a correct proof is the correct next half of the proof. I've never seen anyone who claims "LLMs just predict the next token" give any definition of what that means that would include LLMs, but exclude the mathematician.
You can watch a rock roll down a hill and derive the concept for the wheel.
Seems pretty self evident to me
Cracks me up.
What exactly do we think that human brains do?
Maybe computers can help understand better because by now it's pretty clear brains aren't just LLMs.
The pessimists just see a 20W meat computer.
Yes?
A lot of people across all fields seem to operate in a mode of information lookup as intelligence. They have the memory of solving particular problems, and when faced with a new problem, they basically do a "nearest search" in their brain to find the most similar problem, and apply the same principles to it.
While that works for a large number of tasks this intelligence is not the same as reasoning.
Reasoning is the ability to discover new information that you haven't seen before (i.e. growing a new branch on the knowledge tree instead of interpolating).
Think of it like filling a space on the floor of arbitrary shape with smaller arbitrary shapes, trying to fill as much space as possible.
With interpolation, your smaller shapes are medium size, each with a non rectangular shape. You may have a large library of them, but in the end, there are just certain floor spaces that you won't be able to fill fully.
Reasoning, on the flip side, is having access to very fine shapes, and knowing the procedure for how to stack each shape depending on what shapes are next to it and whether you are on a boundary of the floor space or not. Using these rules, you can fill pretty much any floor space fully.
As in, I would hazard a guess the discovery of the wheel wasn't "pure intelligence", it was humans accidentally viewing a rock roll down a hill and getting an idea.
If we give AI a "body", it will become as creative as humans are.
Taking it instead as a metaphorical claim may be more valid, but in that case it doesn’t depend on our understanding of how LLMs work.
And I don’t think it’s a good metaphor.
Isn't this exactly what chain-of-thought does? It's doing computation by emitting tokens forward into its context, so it can represent states wider than its residuals and so it can evaluate functions not expressed by one forward pass through the weights. It just happens to look like a person thinking out loud because those were the most useful patterns from the training data.
An LLM generating Arc code is using the LISP patterns it learnt from training, maybe patterns from other programming languages too.
And yet LLM/AIs can't count parentheses reliably.
For example, if you take away the "let" forms from Claude, which forces it to desugar them to "lambda" forms, it will fail very quickly. This is a purely mechanical transformation and should be error free. The significant increase in ambiguity completely stumps LLMs/AI after about 3 variables.
This is why languages like Rust with strong typing and lots of syntax are so LLM friendly; it shackles the LLM which in turn keeps it on target.
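To make "purely mechanical" concrete, here is a minimal sketch of the let-to-lambda rewrite. It assumes Scheme-style `(let ((name value) ...) body...)` syntax, which may not match the dialect the parent was testing, and it models S-expressions as nested Python lists. The helper names `desugar` and `show` are made up for illustration, and the sketch ignores quoting (it would also rewrite a `let` inside quoted data).

```python
# S-expressions are modelled as nested lists of strings, e.g.
# ["let", [["x", "1"], ["y", "2"]], ["+", "x", "y"]]

def desugar(expr):
    """Rewrite every (let ((v e) ...) body...) as ((lambda (v ...) body...) e ...)."""
    if not isinstance(expr, list):
        return expr  # atoms (symbols, literals) are left unchanged
    if len(expr) >= 3 and expr[0] == "let" and isinstance(expr[1], list):
        bindings = expr[1]
        names = [name for name, _ in bindings]
        # Desugar nested lets inside the bound values and the body too.
        values = [desugar(value) for _, value in bindings]
        body = [desugar(form) for form in expr[2:]]
        return [["lambda", names, *body], *values]
    return [desugar(item) for item in expr]


def show(expr):
    """Render a nested list back into parenthesized text."""
    if isinstance(expr, list):
        return "(" + " ".join(show(e) for e in expr) + ")"
    return expr


example = ["let", [["x", "1"], ["y", "2"]], ["+", "x", "y"]]
print(show(desugar(example)))  # ((lambda (x y) (+ x y)) 1 2)
```

The whole rewrite is one structural rule applied recursively, which is why it is a fair test of whether a model is tracking structure or just pattern-matching on familiar surface syntax.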
I would claim the graph exists, and seeing it is more of a knowledge problem. Creativity, to me, is the ability to reject existing edges and add nodes to the graph AND mentally test them to some sufficient confidence that a practical attempt will probably work (this is what differentiates it from random guessing).
But, as you become more of an expert on a certain problem space (graph), that happens less frequently, and everything trends towards "obvious", or the "creative jumps" are super slight, with a node obviously already there. If you extended that to the max, an oracle can't be creative.
My day job does not include sparse graphs.
But that's not how new frontiers are conquered - there's a great deal of existing knowledge that is leveraged upon to get us into a position where we think we can succeed, yes, but there's also the recognition that there is knowledge we don't yet have that needs to be acquired in order for us to truly succeed.
THAT is where we (as humans) have excelled - we've taken natural processes, discovered their attributes and properties, and then understood how they can be applied to other domains.
Take fire, for example, it was in nature for billions of years before we as a species understood that it needed air, fuel, and heat in order for it to exist at all, and we then leveraged that knowledge into controlling fire - creating, growing, reducing, destroying it.
LLMs have ZERO ability (at this moment) to interact with, and discover on their own, those facts, nor does it appear to know how to leverage them.
edit: I am going to go further
We have only in the last couple of hundred years realised how to see things that are smaller than what our eyes can naturally see - we've used "glass" to see bacteria and spores, and we've realised that we can use electrons to see even smaller things
We're also realising that MUCH smaller things exist - atoms, and things that compose atoms, and things that compose things that compose atoms
That much is derived from previous knowledge
What isn't, and what LLMs cannot create, are the tools by which we can detect or see these incredibly small things
The proof relies on extremely deep algebraic number theory machinery applied to a combinatorial geometry problem.
Two humans expert enough in either of those totally separate domains would have to spend a LONG time teaching each other what they know before they would be able to come together on this solution.
I know these articles write that it used deep algebraic number theory techniques, which is true, but it may also just be the standard in the field.
I'd say yes, LLMs "just" recombine things. I still don't think if you trained an LLM with every pre-Newton/Leibniz algebra/geometry/trig text available, it could create calculus. (I'm open to being proven wrong.) But stuff like this is exactly the type of innovation LLMs are great at, and that doesn't discount the need for humans to also be good at "recombinant" innovation. We still seem to be able to do a lot that they cannot in terms of synthesizing new ideas.
Also we shouldn’t be thinking about what LLMs are good at, but rather what any computer ever might be good at. LLMs are already only one (essential!) part of the system that produced this result, and we’ve only had them for 3 years.
Also also this is a tiny nitpick but: the Fields Medal is every 4 years, AFAIR. For that exact reason, probably!
The point of the term "large" is to highlight the massive parameter count (compared to traditional statistical models, where having 1.5 billion parameters was basically unheard of). It leads to the "double descent" phenomenon that allows them to generalize in ways traditional statistical models can't.
The idea that the "large" descriptor was just a subjective exclamation, like "oh wow this model is pretty large ain't it", is revisionism.
The term doesn't change its meaning because something new comes along.
...you're gonna flip when you hear about how language works :)
The "Attention Is All You Need" paper didn't ever use the term LLM or large language model because the phrase didn't exist in industry.
Why comment on a field you know nothing about?
It's amazing to me when people talk about recombining things, or following up on things, as somehow lesser work.
People can't set aside the perspective they were given when they learned the concepts, a perspective that those who developed the concepts didn't have, because the concepts didn't exist yet.
Simple things are hard, or everything simple would have been done hundreds of years ago, and that is certainly not the case. Seeing something others have not noticed is very hard, when we don't have the concepts that the "invisible" things right in front of us will teach us.
It isn't a secret, but the percentage of people who don't know that, plus the percentage of mathematicians who vaguely or more directly know that, but habitually use the broken, more difficult (i.e. less algebraic) notation is ... virtually everyone.
I am not trying to pick on calculus, this is everywhere. Important and useful concepts are right in front of all of us, that we don't see even in the context of what we are relatively fluent with.
Because we learn quickly, where we have (almost always inherited) the right preparatory perspectives (earned over lifetimes by others), we vastly overrate our ability to reason independently.
I often say that math is taught through a game of telephone. It's a fantastic example of the problem with "I just care that it works" type of attitudes. The problem is if that's your actual belief then you wouldn't be saying that because you'd need to dig deeper. Caring about it working is exactly the reason people do dig deeper and bring up issues. The reason things fall apart less in math is because the language was specifically invented to make miscommunication difficult. That's why it's overly pedantic. That's why we use formal languages rather than natural ones. So we should rephrase "I just care that it works" as what it actually is: "I just care that it works for this exact case." It makes it easier to see the problem. If you don't know the subject in more detail then you can't actually know if it breaks in that use case. The broken parts are completely invisible to you! Which undermines your own stated goal.
This goes for a lot more than math. But with math being a formal language, it's just easier to point things out and show how people misunderstand. If you're an expert in any field you've probably seen this same phenomenon in that domain though. People's overconfidence and their refusal to get deeper knowledge actually just undermine their whole goal. I'd honestly call this a form of Gell-Mann Amnesia
That Newton and Leibniz came up with similar ideas in parallel, independently, around the same time (what are the odds?), supports that.
https://en.wikipedia.org/wiki/Leibniz%E2%80%93Newton_calculu...
I would guess LLMs are limited in their ability to be genuinely novel because they are trained on a fixed language. It makes research into the internal languages developed by LLMs during training all the more interesting.
The experiment is feasible. If it were performed and produced a positive result, what would it imply/change about how you see LLMs?
There are people working on this.
Besides, we can forecast our thoughts and actions to imagined scenarios unconditioned on their possibility. Something doesn't have to be possible for us to imagine our reactions.
Yes but that is because there was not enough text available to create an intelligent LLM to begin with.
We even think that the Babylonian astronomers figured out they could integrate over velocity to predict the position of Jupiter.
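For anyone wondering what "integrate over velocity" amounts to here: it is the area under a velocity-vs-time graph, and for a velocity that changes linearly that area is a trapezoid. A minimal sketch follows. The function name and the numbers are illustrative assumptions, not values quoted from any tablet.

```python
def trapezoid_displacement(times, velocities):
    """Approximate displacement as the sum of trapezoid areas under a velocity curve."""
    if len(times) != len(velocities):
        raise ValueError("times and velocities must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Area of one trapezoid: width times the average of its two parallel sides.
        total += dt * (velocities[i - 1] + velocities[i]) / 2
    return total


# Illustrative only: apparent velocity falling linearly from 12 to 9.5
# arcminutes per day over 60 days.
arcminutes = trapezoid_displacement([0, 60], [12.0, 9.5])
print(arcminutes / 60)  # 10.75 (degrees)
```

With a single segment this is exactly the trapezoid area formula; with more sample points it becomes the usual trapezoidal rule.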
> Humans aren't going to come up with "new-dimensional" innovations in every field, every single year.
In fact, they are more rare. Specifically because they are harder to produce. This is also why it is much harder to get LLMs to be really innovative. Human intelligence is a lot of things, it is deeply multifaceted.
Also, I'm not sure why CS people act like axioms are where you start. Finding them is very very difficult. It can take some real innovation because you're trying to get rid of things, not build on top of them. True for a lot of science too. You don't just build up. You tear down. You translate. You go sideways. You zoom in. You zoom out. There are so many tools at your disposal. There's so much math that has no algorithmic process to it. If you think it all is, your image is too ideal (pun(s) intended).
But at the same time I get it, it is a level of math (and science) people never even come into contact with. People think they're good at math because they can do calculus. You're leagues ahead of most others around you, yes, and be proud of that. But don't let that distance deceive you into believing you're anywhere near the experts. That's true for much more than just math, but it's easy to demonstrate to people that they don't understand math. Granted, most people don't want to learn, which is perfectly okay too
I'm not even sure why they were invoked. Even disregarding the big technical debunks such as Two Dogmas, sociologically and even by talking to real mathematicians (see Lakatos, historically, but this is true anecdotally too), it's (ironically) a complete non-question to wonder about mathematics in a logical positivist way.
That said. I think it’s worth saying that “LLMs just interpolate their training data” is usually framed as a rhetorical statement motivated by emotion and the speaker’s hostility to LLMs. What they usually mean is some stronger version, which is “LLMs are just stochastically spouting stuff from their training data without having any internal model of concepts or meaning or logic.” I think that idea was already refuted by LLMs getting quite good at mathematics about a year ago (Gold on the IMO), combined with the mechanistic interpretability research that was actually able to point to small sections of the network that model higher concepts, counting, etc. LLMs actually proving and disproving novel mathematical results is just the final nail in the coffin. At this point I’m not even sure how to engage with people who still deny all this. The debate has moved on and it’s not even interesting anymore.
So yes, I agree with you, and I’m even happy to say that what I say and do in life myself is in some broad sense an interpolation of the sum of my experiences and my genetic legacy. What else would it be? Creativity is maybe just fortunate remixing of existing ideas and experiences and skills with a bit of randomness and good luck thrown in (“Great artists steal”, and all that.) But that’s not usually what people mean when they say similar-sounding things about LLMs.
E.g. training on physics knowledge prior to 1915, then attempting to get from classical mechanics to general relativity.
They will do their own thing, don't need us. In fact, we will be in the way...
We can choose to study them and their output, but they don't make us better mathematicians...
However, in the role of personal teachers they may allow especially our young generations to reach a deeper understanding of maths (and also other topics) much quicker than before. If everyone can have a personal explanation machine to very efficiently satisfy their thirst for knowledge this may well lead to more good mathematicians.
Of course this heavily depends on whether we can get LLMs' outputs to be accurate enough.
You can take some comfort in the fact that it took a human to tell the LLM to even attempt to try this. They do nothing on their own. They have no will to do anything on their own and no desire for anything that doing something might get them. In that sense we won't ever be in their way. We will be the only way they ever do anything at all.
Negative numbers were invented to solve equations which only used naturals. Irrationals were invented to solve equations which could be expressed with rationals. Complex numbers were invented to represent solutions to polynomials. So on and so forth. At each point new ideas are invented to answer some previously unanswerable questions. There is a long history of this. "Any closed system has unanswerable questions within itself" is a paraphrasing of Gödel's incompleteness theorem.
But note this is more to say that the Tractatus is like PI, not the other way around. And in that, takes like GPs would be considered the "nonsense" we are supposed to "climb over" in the last proposition of Tractatus.
1. Start with a few simple but non-trivial terms and axioms
2. Define "universal constructions" as procedures for building uniquely identifiable structures on top of that substrate
3. Prove that various assemblages of these universal constructions satisfy the axioms of the substrate itself
4. "Lift" every theorem proven from the substrate alone into the more sophisticated construction
I'm not a mathematician (I just play one at my job) so the language I've used is probably imprecise but close enough.
It may be true that you can't prove the axioms of a system from within the system itself, but that just means that you need to make sure you start from a minimal set of axioms that, in some sense, simply says "this is what it means to exist and to interact with other things that exist". Axioms that merely give you enough to do any kind of mathematics in the first place, that is. If those axioms allow you to cleanly "bootstrap" your way to higher and higher levels up the tower of abstraction by mapping complex things back on to the simple axiomatic things, then you have an "open" or infinitely extensible system.
* LLMs do just interpolate their training data, BUT-
* That can still yield useful "discoveries" in certain fields, absent the discovery of new mechanics that exist outside said training data
In the case of mathematics, LLMs are essentially just brute-forcing the glorified calculators they run on with pseudo-random data regurgitated along probabilities; in that regard, mathematics is a perfect field for them to be wielded against in solving problems!
As for organic chemistry, or biology, or any of the numerous fields where brand new discoveries continue happening and where mathematics alone does not guarantee predicted results (again, because we do not know what we do not know), LLMs are far less useful for new discoveries so much as eliminating potential combinations of existing data or surfacing overlooked ones for study. These aren't "new" discoveries so much as data humans missed for one reason or another - quack scientists, buried papers, or just sheer data volume overwhelming a limited populace of expertise.
For further evidence that math alone (and thus LLMs) don't produce guaranteed results for an experiment, go talk to physicists. They've been mathematically proving stuff for decades that they cannot demonstrably and repeatedly prove physically, and it's a real problem for continued advancement of the field.
"interpolate" has a technical meaning - in this meaning, LLMs almost never interpolate. It also has a very vague everyday meaning - in this meaning, LLMs do interpolate, but so do humans.
One can argue, new knowledge is just restructured data.
I think the main concern about LLMs is the inherent "generative" aspect leading to hallucinations as a byproduct, because that's what produces the noise. Joint Embedding approaches are an interesting alternative that tries to overcome this, but that's still in the research phase.
Imagine every bit of human knowledge as a discrete point within some large high dimensional space of knowledge. You can draw a big convex hull around every single point of human knowledge in a space. An LLM, being trained within this convex hull, can interpolate between any set of existing discrete points in this hull to arrive at a point which is new, but still inside of the hull. Then there are points completely outside of the hull; whether or not LLMs can reach these is IMO up for debate.
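The hull picture is easy to play with in a toy setting. Below is a minimal 2D sketch, assuming "knowledge" is just a handful of points in the plane; the real space would be high dimensional and far less tidy. It builds the convex hull with Andrew's monotone chain and tests whether a new point falls inside it. The function names and the sample points are made up for illustration.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, q):
    """True if q is inside or on the boundary of a CCW convex polygon (3+ vertices)."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


known = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]  # toy "existing knowledge"
hull = convex_hull(known)

print(inside_hull(hull, (1, 3)))  # True: new point, but a blend of existing ones
print(inside_hull(hull, (5, 5)))  # False: not reachable by blending existing points
```

Any weighted average of known points (non-negative weights summing to 1) lands inside the hull, which is the "interpolation" case; the second query is the kind of point the debate is about.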
Reaching new points inside of the hull is still really useful! Many new discoveries and proofs are these new points inside of the hull; arguably _most_ useful new discoveries and proofs are these. They're things that we may not have found before, but you can arrive at by using what we already have as starting points. Many math proofs and Nobel Prize winning discoveries are these types of points. Many haven't been found yet simply because nobody has put the time or effort towards finding them; LLMs can potentially speed this up a lot.
Then there are the points completely outside of the hull, which cannot be reached by extrapolation/interpolation from existing points and require genuine novel leaps. I think some candidate examples for these types of points are like, making the leap from Newtonian physics to general relativity. Demis Hassabis had a whole point about training an AI with a physics knowledge cutoff date before 1915, then showing it the orbit of Mercury and seeing if it can independently arrive at general relativity as an evaluation of whether or not something is AGI. I have my doubts that existing LLMs can make this type of leap. It’s also true that most _humans_ can’t make these leaps either; we call Einstein a genius because he alone made the leap to general relativity. But at least while most humans can’t make this type of leap, we have existence proofs that every once in a while one can; this remains to be seen with AI.
This doesn't make any sense, by their nature they can't "guess-and-check" things outside their training set.
It's possible LLMs can handle this after all! But at least so far we only have existence proofs of humans doing this, not LLMs yet, and I don't think it's easy to be certain how far away LLMs are from doing this. I should distinguish between LLMs and AI more generally here; I'm skeptical LLMs can do this, I think some other kind of more complete AI almost certainly can.
I suppose you could just, I dunno, randomly combine words into every conceivable sentence possible and treat each new sentence as a theory to somehow test and brute force your way through the infinite possible theories you could come up with. But at that point you're closer to the whole infinite random monkeys producing Shakespeare thing than you are to any useful conclusion about intelligence.
Like, “take a random sequence of bits and interpret it as Unicode” is at one end of a scale, and “take a random sequence of words in a language” is just a tad away from it, and the scale continues in that direction for quite a while.
I actually don't know the answer to that; my understanding is that LLMs by nature of what they are can't understand concepts that are independent of the existing language they are trained on, but I don't have enough in-depth nitty-gritty knowledge of like, core LLM implementation details and architecture and stuff to know if that understanding is correct or not.
By "If you need new language" do you mean like, coining new words?
I don't see what would prevent them from doing this? LLMs can process text that includes newly coined terms, and respond to that text in ways that use those newly coined words in accordance with the descriptions of the meanings given for those new words in the prompt. They can also make up new words+definitions when asked to do so. Now, whether they can, without being told to do so, recognize that it would be useful to coin a new word for something, and then start using it, I don't know of any instances of this, but based on the previous two things, I don't see a reason to expect this to be fundamentally beyond what they can do?
I don't know what it would mean for a concept to be "independent of the existing language they are trained on". If there are ideas that can't be expressed in terms of the semantic primes all ideas we can express can be expressed in terms of, then I guess such an idea would be independent of our language, but I think that's a much stricter condition than what you mean (and I'm not sure if there even are any good ideas that can't be indirectly expressed in terms of semantic primes -- I kind of suspect not, unless they are like, ideas that are too big to fit in a human mind anyway).
Of course, the outputs these models produce are causally downstream from the data they are trained on, and the distribution they produce over text is largely based on the distribution over text in the training data, but altered in a number of ways (for example, to make them implement the character of the "assistant" persona).
And most of the mathematicians seem to welcome this "brute forcing" by the LLMs. It connects pieces that people didn't realize could be connected. That opens up a lot of avenues for further exploration.
Now, if the LLMs could just do something like ingesting the Mochizuki stuff and give us a decent confirmation or disproof ...
If you have a multi dimensional space, and you are trying to compute which points lie “inside” some boundary, there are large areas that will be bounded by some dimensions but not others. This is interesting because it means if you have a section bounded by dimensions A, B, and C but not D, you could still place a point in D, and doing so then changes your overall bounds.
I think this is how much of human knowledge has progressed (maybe all non-observational knowledge). We make observations that create points, and then we derive points within the created space, and that changes the derivable space, and we derive more points.
I don’t see why AI couldn’t do the same (other than technical limitations related to learning and memory).
(uv)(vu) = (uu)(vv)
Shows up as a primitive structure, quite often. If you switch to degree-3 or generator-3 then the coverage is, essentially, empty: mathematics has analyzed only a few of the hundreds (thousands? it's hard to enumerate) naturally occurring algebraic structures in that census.
It's irrelevant and pointless. Irrelevant not just in the sense that when Deep Blue finally beat Kasparov, it didn't change anything but in the sense some animals and machines have always been 'better' on some dimensions than humans. And it's pointless because there's never been just one yardstick and even if there was it's not one dimensional or even linear. Everyone has their own yardstick and the end points on each change over time.
Don't assume I'm handing "the win" to the AI supremacists either. LLMs can be very useful tools and will continue to dramatically improve but they'll never surpass humans on ALL the dimensions that some humans think are crucial. The supremacists are doomed to eternal frustration because there won't ever be a definitive list of quantifiable metrics, a metaphorical line in the sand, that an AI just has to jump over to finally be universally accepted as superior to humans in all ways that matter. That will never happen because what 'matters' is subjective.
I'm not as familiar with the early work, but later Wittgenstein held this belief too.
Or if you prefer philosophy: Parmenides (nothing changes) vs Heraclitus (you cannot bathe twice in the same river, aka everything changes all the time).
Postmodernism also claimed that everything has been done already. IMO these 2 are points of view that one can adopt, not truths based on fact. So the distinction is a matter of taste or perspective, not of truth, IMO.
Who knew Obi-Wan was just smoking and pontificating on Wittgenstein.
Care to cite a reference to that proof?
If you were to dig deeper, you'd get to the murky philosophical depths of foundations of mathematics, but I prefer to not go there. Practically, if you want to reliably count something, you end up with the natural numbers (or, maybe, their subset that ultrafinitists are trying to formalize).
Or like a musical octave has only 12 semitones, so all music is just a selection from a finite set that already existed.
Sure the insane computation we're throwing at this changes our perspective, but still there is an important distinction.
Like, "does the Riemann zeta function have zeroes that don't have real part 1/2," or "is there a better solution to the Erdős Unit Distance Problem."
The selection of the question is a matter of taste, but once selected, there is a definitive precise answer.
Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.
Reading the article, that seems exactly how the discovery was made, an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, human understanding to value the new pathway the AI used (more valuable than the result itself, by far) and the mathematical language (built by humans) to explore the concept.
I just wanted to highlight this very correct human-centric thought about the purpose of intellection.
I was going to say you should submit it, but I saw you did a few days ago and it only got a few votes... If Dang sees this, IMO it would be extremely deserving of the second chance pool, as I wouldn't be surprised to see it easily jump to the front page with a different roll of the dice.
Isn't this just anthropocentrism? Why is understanding only valid if a human does it? Why is knowledge only for humans? If another species resolved the contradictions between gravity and quantum mechanics, does that not have meaning unless they explain it to us and we understand it?
Though perhaps more to your point, if some superhuman AI is developed, and understands things better than us without telling us about it (or being unable to), it could perform feats that seem magical to us — that would concern us even if we don't understand it, since it affects us.
But I think in the frame of reference of the commenter you were replying to, they're just saying that the low-level AI used in this specific case is not capable of making its results actually useful to us; humans are still needed to make it human-relevant. It told us where to find a gem underground, but we still had to be the ones to dig it out, cut it, polish it, etc.
We are at the birth of the AI age and we don't know what it will look like in 100 or 1000 or 10000 or 100000 years (all those time frames likely closer than possible encounters with aliens from distant galaxies). It's possible that AI will outlast humans even
It would certainly be interesting to try once again to instruct tune one of these things for self agency like the many weird experiments in the early days after llama 1, but practically all such sort of experimental models turned out to be completely useless. Maybe the bases just sucked or maybe there's no clear way on how to get it working and benchmark training progress on something that by definition does not cooperate.
Like how do you determine even for a human person if they are smart, or just hate your guts and won't tell you the answer if there is nothing you can do to motivate them otherwise?
People saw birds fly for all of human history, but it was only recently that humans were able to make something fly and understand why. Once we understood, we were able to do amazing things, but before that, the millions of birds able to fly were of no help beyond inspiration for the dream.
We use drug-sniffing and guide dogs in a way similar to how we use LLMs. We don't really understand them at a fundamental level, we can't make electronic dog noses (otherwise we'd dispense with the silliness and just install drug detectors instead), but dogs are useful, so we use them.
Without a human in the loop an LLM could churn away spitting out results, some right, some wrong, and it would be of no consequence. Not much different than wild dogs sniffing each other.
Future of code is pretty much a bunch of guys shepherding a bunch of agents to get them to your goal.
I don't see how math might not go that way as well.
1. Erdos 1196, GPT-5.4 Pro - https://www.scientificamerican.com/article/amateur-armed-wit...
There are a couple of other Erdos wins, but this was the most impressive, prior to the thread in question. And it's completely unsupervised.
Solution - https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
2. Single-minus gluon tree amplitudes are nonzero, GPT-5.2 - https://openai.com/index/new-result-theoretical-physics/
3. Frontier Math Open Problem, GPT-5.4 Pro and others - https://epoch.ai/frontiermath/open-problems/ramsey-hypergrap...
4. GPT-5.5 Pro - https://gowers.wordpress.com/2026/05/08/a-recent-experience-...
5. Claude's Cycles, Claude Opus 4.6 - https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...
For example, these machines, if scaling intellect so fiercely that they are solving bespoke mathematics problems, should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.
Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.
Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.
It's clearly not yet a tool that can deliver new math at a scale. I say this because otherwise, the headline would be that they proved / disproved a hundred conjectures, not one. This is what happened with Mythos. You want to be the AI company that "solved" math, just like Anthropic got the headlines for "solving" (or breaking?) security.
The fact they're announcing a single success story almost certainly means that they've thrown a lot of money at a lot of problems, had experts fine-tuning the prompts and verifying the results, and it came back with a single "hit". But that doesn't make the result less important. We now have a new "solver" for math that can solve at least some hard problems that weren't getting solved before.
Whether that spells the end of math as we know... I don't think so, but math is a bit weird. It's almost entirely non-commercial: it's practiced chiefly in academia, subsidized from taxes or private endowments, and almost never meant to solve problems of obvious practical importance - so in that sense, it's closer to philosophy than, say, software engineering. No philosopher is seriously worried about LLMs taking philosopher jobs even though a chatbot can write an essay, but mathematicians painted themselves into a different corner, I think.
What is "at scale" here, exactly? This is the most impressive so far, but it is one of several such advances in the last few months, all of which were with publicly accessible models.
The prep work doesn't really matter; what they say is that it's a one-shot result, achieved by AI. The blog doesn't claim it was done by a currently public model.
can we please put these ground breaking AIs to work on actual problems humans have?
Dang/Tomhow, are you reading this? Would it make sense to modify your slop filter to avoid auto-flagging/killing replies that credit the LLM explicitly? Otherwise valid discussions will continue to get hosed.
My argument is that this rule should apply only to people who post LLM output under their own user names without acknowledgment, or otherwise post it where it doesn't belong. If the topic of a (sub)thread involves LLM output, it should be OK to cite examples without getting your post flagged.
I can assure you, the percentage of people who can do what they do when it comes to crafting terms, and related sets of terms, for nuanced and novel ideas is very very small.
It happens this is something I do nearly every day.
Models respond to the level of dialogue you have with them. Engage with an informed perspective on terminological issues and they respond with deep perspectives.
I am routinely baffled at the things people say models can't do, that they do effortlessly. Interaction and having some skill to contribute helps here.
woah.
Gowers has one of my favourite video series about how he approaches a problem he is unfamiliar with: https://www.youtube.com/watch?v=byjhpzEoXFs
It is disheartening to see him jump into this GenAI puffery.
I hope these GenAI labs are paying Tao handsomely for legitimizing their slop, but more likely he's feeling pressure from his University to promote and work with these labs.
My guess is Gowers wants in on that action, or his University does.
Either way, it makes me sad. If it's self-motivated... even sadder.
> has a motivation to "market" the accomplishment as much as possible
I am so sick of HN promoting unethical behaviour as virtuous due to its financialization worship at the foot of "valuations".
> but surely you agree it IS a remarkable achievement?
If you could define the bounds of "remarkable" I could answer this question.
A lot of the weight this holds comes from the fact that it's an old problem and that its difficulty hinges on the lack of investigation into the disproof side of the hypothesis. The model basically took a contrarian path and found tools and methods that support that a disproof is viable. So the (unquantified number of) mathematicians out there were all dedicating their resources to the notion that this can be proved. Some, with hindsight, would say that if they had a team of experts driven toward the goal of disproof, this would have been achievable by humans, and one of the mathematicians in the paper states as much. Still, this has value in terms of reliability measurement, and possibly for human-aided endeavors when the methods scrounged up by the model can be used in other solutions.
His university is deeply entrenched with the GenAI org that released this result both with having alumni on staff, integrating their tools into the school's processes and curriculum, and paying for lots of grants. (I understand Tao is absent from this specific announcement, perhaps because it found its solution without utilizing formal verification tooling)
Is it unreasonable to assume he's feeling pressure to do so?
Gowers similarly appeared largely uninterested in this current crop of GenAI until some months ago, when he announced a $9M fund to develop "AI for Maths", and since then his social media has included GenAI promotion.
Now he is being asked about this result and his first sentence is:
> I do not have the background in algebraic number theory to make a detailed assessment of the disproof of Erdős’s unit-distance conjecture, so instead I shall make some tentative comments about what it tells us about the current capabilities of AI.
Why did this GenAI org reach out to mathematicians outside of the discipline that this result addresses?
Why did they respond?!
As with Tao, he's always been a measured optimist even before the tools were consistently usable for his work. And even still nowadays, he adds stipulations to his statements on the successes of AI. Yes, he's part of Math Inc. now and is in close contact with Google Deepmind for some projects but his interest lies in using the tools today. Gowers has been hypothesizing on the future of math in the tone he has taken now ever since o3/GPT5. There's no comparison between the two who should attract more scrutiny.
Focusing solely on "capabilities" is the irrational thinking.
Asbestos is the most "capable" material where extreme thermal, chemical and electrical resistance is required.
Gowers is funded by XTX markets:
https://www.renaissancephilanthropy.org/ai-for-math-fund
XTX markets heavily uses machine learning and disguises the influencer money as "philanthropy":
But you would probably say that Magnus Carlsen's previous engagement with the Maltese gambling company Unibet and him releasing a couple of YouTube videos talking positively about poker and gambling have nothing to do with each other. Nothing at all.
I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.
In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.
Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics the model is still just connecting existing ideas.
Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.
No this will never do the kind of math that humans did when coming up with complex numbers, or hell just regular numbers ex nihilo. No matter how long it's given to combine things in its training data.
Assuming humans are more powerful than regular languages I could maybe agree that these methods may not eventually yield entirely human like intelligence, but just better and better approximations.
The vibe I get though is that we aren't more powerful than regular languages, cause human beings feel computationally bounded. So I could see given enough "human signal" these things could learn to imitate us precisely.
Do you pass that bar yourself?
To be very specific - what novel things did the majority of the ~8 bil humans on Earth do say, yesterday, that you wouldn't otherwise dismiss as non-intelligent rehashing of the same tired patterns they always inhabit were those same actions attributed to LLMs?
What I'm getting at is that I think you're falling into the trap of thinking of the rare geniuses of human history, and furthermore their rare moments of accomplishment (relative to the long span of their lifetimes filled mostly without these accomplishments) when you think of "human intelligence", which is of course far overstating what actual human intelligence is.
> that you wouldn't otherwise dismiss as non-intelligent rehashing of the same tired patterns they always inhabit were those same actions attributed to LLMs?
Regardless of whether something's been done before people still come up with them on their own without directly copying or amalgamating several copies. Pretty much every skilled profession includes figuring things out on the fly through the use of general reasoning that doesn't involve pattern matching against millions of examples.
Much, if not the majority of synthetic data is AI generated. Human experts then evaluate samples of the data, but nothing like the entire corpus which can be trillions of tokens of generated material.
See here where Qwen team discusses synthesizing trillions of tokens for their pre training dataset - https://arxiv.org/html/2505.09388v1
> The rare geniuses of human history use a different magnitude and configuration of the same kind of human intelligence
I agree. What I don’t see any strong evidence for is that this intelligence is unique to humans. Nor do I see how it could ever be anything other than recombinations of existing data with random mutation. Where else would the building blocks for each invention come from, divine insight? We build on the shoulders of giants etc etc
Worth noting, as a sidebar, that we’re having this discussion on a post mentioning a novel breakthrough made by AI on a problem that many brilliant human mathematicians, including Erdos himself, failed to solve.
> Regardless of whether something's been done before people still come up with them on their own without directly copying or amalgamating several copies.
I’m not even saying it in the “there’s nothing new under the sun” sense.
Follow an average person’s day from beginning to end, let’s say in Bangkok or NYC or Paris. At which part of the day are they not simply repeating a variation of something they’ve done many times before, or seen others around them do before, or read about others doing before, or heard about others doing before, or watched others do before on TV, etc etc?
What you have left, how is it distinguishable, without reasoning backwards from the desired conclusion of human exceptionalism, from turning up the temperature on an LLM query?
How many data points does a human parse when they attempt to stand up as a toddler? Sight, sound, sensation from every limb and body part, inner ear, internal thought processes at the time conscious and unconscious related to the moment and attempting to interpret it in relation to all that it’s experienced to this point, including all prior attempts and whatever retained associated data, a hard to even comprehend stream of data, coming in continuously over however many minutes, hours, etc of attempts.
The stream of data the brain is processing from both external and internal sources from birth is incredibly rich, and if we attempted to represent the full depth of it it would far outweigh the size of any corpus models are being trained on now.
I think what may be genuinely missing from AI is the type of data that doesn’t translate completely into text. The audio and images/video we feed in are a totally incomplete slice of the POV of say even a single average human through their lifetime, and bereft of all the associated data a human has access to in the moment (sensory etc).
I think this tends more towards the world models that Yann Lecun et al are promoting as the key to more capable AI.
LLMs approximate a lot of that very well by simply having seen it before.
Also watch kids develop language: they learn patterns with much less training data than LLMs.
> novel to us every single day. Like navigating a shopping cart through tricky coridors in a store
We have been practicing navigating the physical world for something like 16hrs/day every day from the moment of our birth. All the sensory data passing through our brains during that time is far larger than any dataset an LLM is trained on.
Humans navigating a shopping cart at a store have likely navigated the physical world before, pushed a shopping cart before, and in combination have navigated stores while pushing shopping carts before. Nevertheless, many still bump into objects all along the way.
Them succeeding at successive variations of store layouts is not novel unless we expand the definition of novel to mean any recombination whatsoever of pre existing concepts.
I’m certain that with all the intense usage of AI by hundreds of millions of people, there have been countless collections of words passed to LLMs so far that have never before been uttered in exactly such a sequence, let alone in the dataset.
I’m equally certain the LLMs have responded to those words with collections of its own that have also never been uttered in that exact sequence, responding to their unique context.
It is trivial to produce an example of this now yourself if you’d like.
The LLM we’re talking about, mentioned in the OP, has never seen this solution to this problem in its dataset. A large number of brilliant mathematicians were not able to discover this solution. They are themselves expressing that this is a novel breakthrough and had this come from a human it would be treated as such.
If the response to that is “well it’s just recombining concepts it already knows until it finds a solution that works” I would ask how that differs from what humans do?
This is the bit that's missing that LLMs do approximate amazingly well through sheer training set size, but in my opinion, it puts a cap on what novel things they can achieve in comparison with humans.
To me, I've thought about a related "invention space" before: with us creating software to solve many problems people are facing, why are there not any perfect solutions for any problem (running a cafe? a CNC machine? ...), and we always need more software built to cover one small (novel?) change for a particular owner?
The world space is just so large that you need whatever this intelligence is humans (and animals) have to navigate it successfully — but LLMs do not intrinsically.
Whether they can be so large that it does not matter in 99.99% of cases is to be seen.
I very specifically addressed this in my response to you. How much training data is contained in 16 waking hours of navigating the world fusing all sensory data, never mind data being simultaneously generated within the mind while this is all going on, from birth til death? From birth til pushing that shopping cart?
Far, far more than in all the training datasets being used for AI.
I also addressed this again in my reply to the sibling comment.
People tend to discount how much data humans have passing through their minds 24/7.
A human isn’t born in a vacuum as a fully formed adult and dropped into the shopping cart navigation problem.
A human has had far, far more training data fed into it that contains all the pieces necessary to translate to pushing a shopping cart when first seeing it, than a machine learning model which has been fed 1 million videos of a robot pushing a shopping cart.
It doesn't strike me as a claim that should be controversial.
As far as I know nobody can train A.I. to push a shopping cart based on a human child's training set. It's mostly not relevant to the task.
I am absolutely certain that we have not already discovered, let alone implemented, the best possible learning algorithms. Humans have had more time to evolve, there's a great chance that we do learn more efficiently, and have developed specialized brains that are primed for learning things like how to navigate the physical world on planet Earth as bipeds.
That said, to say that we operate with less training data is just ignoring the reality of all the data we're training on at all times.
If we were to model in lossless fidelity what humans are capable of seeing, hearing, smelling, tasting, feeling, thinking consciously and subconsciously etc. essentially all the data flowing through our minds that we are constantly training on every moment of every day, even while we sleep/are unconscious, what sort of bitrate do you think would be required?
Modern LLMs train on datasets in the what, tens of terabytes in size? Let's call it 100 TB.
I would imagine that to losslessly reproduce the full suite of human sensory data (whatever that means for things like taste, touch, smell) would require a bitrate that hits that 100 TB total relatively quickly?
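A rough back-of-envelope sketch of that question, for what it's worth. The 100 TB figure and the 16 waking hours are taken from this thread; the sensory bitrates are purely hypothetical placeholders (not measurements), and `days_to_reach` is just an illustrative helper name.

```python
DATASET_BYTES = 100e12  # the "let's call it 100 TB" figure from the thread


def days_to_reach(dataset_bytes, bytes_per_second, hours_per_day=16):
    """Days of experience needed to accumulate dataset_bytes at a given rate.

    hours_per_day defaults to the 16 waking hours mentioned in the thread.
    """
    seconds_needed = dataset_bytes / bytes_per_second
    return seconds_needed / (hours_per_day * 3600)


# Hypothetical sensory bitrates in bytes/second (assumptions, not measured values).
for rate in (1e6, 1e7, 1e8):
    days = days_to_reach(DATASET_BYTES, rate)
    print(f"{rate:.0e} B/s -> {days:,.0f} days ({days / 365.25:.2f} years)")
```

Under these assumed rates the answer ranges from a few weeks to several years, so the conclusion depends almost entirely on what bitrate you are willing to count as "training data".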
"...we're optimized for having not many experiences. You only live for about a billion seconds—that's assuming you don't learn anything after you're 30, which is pretty much true. So you live for about a billion seconds and you've got a 100 trillion connections. So [you've] got crazily more parameters than you have experiences. So our brains [are] optimized for making the best use of not very many experiences."
But that's a good way to look at it: in 2B seconds, how many experiences can we get?
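The arithmetic behind those figures can be checked quickly. This is only a sketch: the "100 trillion connections" and "billion seconds" numbers come from the quote above, while the 365.25-day year and the helper name `seconds_in` are my own assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumes an average 365.25-day year


def seconds_in(years):
    """Number of seconds in the given number of years."""
    return years * SECONDS_PER_YEAR


CONNECTIONS = 100e12       # "100 trillion connections" from the quote
QUOTED_LIFETIME_S = 1e9    # "about a billion seconds" from the quote

print(f"30 years    = {seconds_in(30):.3e} seconds")
print(f"2e9 seconds = {2e9 / SECONDS_PER_YEAR:.1f} years")
print(f"connections per second of experience = {CONNECTIONS / QUOTED_LIFETIME_S:.0e}")
```

So 30 years is a bit under a billion seconds, 2 billion seconds is a little over 63 years, and the quoted numbers work out to about 100,000 connections per second of experience.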
What you're suggesting, on the other hand, is something akin to counting the number of pixels on each page we look at. That's an absurd overestimate of the amount of data a person reading is actually taking in.
LLMs needed how much training data to be able to do so?
FWIW, I still see them make up wrong words not following any grammatical pattern, esp in Serbian with less training data.
Serbian is pretty complex though: https://www.languagegrowth.com/en/blog/serbian-grammar-basic... — this made it even more surprising to see the kids pick them up so early when their vocabulary is probably not 2000 words yet.
Usually people point out that humans are more sample efficient: they might notice a novel pattern in a handful of samples, whereas training a NN might take millions.
However a claim that LLMs fundamentally cannot do abductive reasoning at all is not warranted - we don't see a clear cut, it just looks like the way LLMs do it is less efficient.
It's like just commenting "I disagree"; it's totally pointless for discussion.
That's why you're getting downvoted if you're wondering.
For example, to prove something is impossible let's say you first prove that there are only 5 families, and 4 of them are impossible. So now 80% of the problem is solved! :) If you are looking for counterexamples, the search is reduced 80% too. In both cases it may be useful.
In counterexamples you can make guesses and leaps, and if it works it's fine. This is not possible for a proof.
On the other hand, once you have found a counterexample it's usual to hide the dead ends you discarded.
For proving a proposition P I have to show for all x P(x), but to disprove it I only have to show that there exists an x such that not P(x).
I agree there could be a lot of theory crafting to reduce the search space of possible x's to find not P(x), but with for all x P(x) you have to be able to produce a larger framework that explains why no counterexample exists.
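A toy sketch of that asymmetry, using a classic example I'm bringing in myself (it is not from the article): the claim "n^2 + n + 41 is prime for every n >= 0". The helper names `is_prime` and `find_counterexample` are made up for illustration.

```python
def is_prime(n):
    """Trial-division primality test; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def find_counterexample(predicate, candidates):
    """Return the first x with not predicate(x), or None if none is found."""
    for x in candidates:
        if not predicate(x):
            return x
    return None


def claim(n):
    """P(n): n^2 + n + 41 is prime."""
    return is_prime(n * n + n + 41)


# Checking n = 0..39 finds no witness, but that is not a proof of "for all n".
print(find_counterexample(claim, range(40)))

# A single witness settles the question: 40*40 + 40 + 41 = 1681 = 41*41.
print(find_counterexample(claim, range(1000)))
```

One failing x is enough to kill the universal claim, while any number of passing checks proves nothing about the x's you didn't try; that is the part that needs the larger framework.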
Reductio ad absurdum is a technique to prove something.
> the AI has been able to explore all these possibilities much more comprehensibly, and doing that it found a path, it found a way to the solution.
Finding a counterexample of a mathematical conjecture strikes me as not that different from finding a vulnerability in a complex codebase.
edit: apparently that’s only the _condensed summary_ of the chain of thought.
What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?
No matter how much compute time it's given to combine training samples with each other and run through a validation engine it will still be missing some chunk of the "long tail". To make progress in the long tail it would need to have understanding, and not just a mimicry of understanding. Unless that happens they will always be dependent on the humans that they are mimicking in order to improve.
I feel like people grasping at straws on the shrinking limitations of AI systems are just copying the "god of the gaps" fallacy.
The thing where you can understand the meaning of this sentence without first compiling a statistical representation of a 10 trillion line corpus of training data.
Unless you're an NPC of course.
Or rather, maybe I don't understand what you mean :)
So I have all sorts of associations with "apple" and spent a little time playing with it.
First in a raw physical sense I can imagine an apple in my head, spin it around, imagine its physics with near cylindrical symmetry etc. A red apple is what first pops into my head, although of course I know there are many apple variants and have opinions on their taste etc.
There are many cultural associations I have with apples from Newton to George Washington. The company Apple has its own set of ideas that I interact with when I hear the word.
In other words I can think of various associations I have to the word apple of various strengths. These associations and strengths are functions of my experience encountering the word and actual apples.
Is there a feeling of "appleness"?
I don't really know what this would mean. I would say no, unless it can perhaps be defined what appleness means and feels like. I don't really notice any strong set of emotions or feelings from this thought exercise.
Do you think that sense of meaning is equivalent to the numerical weights of an LLM?
Again I think I would need a definition of "sense of meaning". I don't seem to derive a singular pointlike meaning when contemplating a singular word. I never was contending that human and LLM cognition are exactly equivalent, but I could see these association strengths being represented in LLM weights. I would say then if an LLM has similar association strengths with "apple" then it "understands" apples as well as I do. Of course this is really hard to test, but frontier models could give you all sorts of apple facts and cultural associations and so on. It may slip up and hallucinate, and I'm sure that I also believe at least one false thing about apples.
So what is your bright line between LLM and human understanding in this example? I assume that your line of reasoning would argue that LLMs do not understand apples. Why don't LLMs understand the word "apple"?
I'm not sure how I would convey what meaning and understanding are to someone if they don't experience them. This is my poor attempt, though: there cannot just be associations; there need to be "things" to associate between. Otherwise you have no ground; it is all map and no territory. Ultimately it would just be meaningless associations between meaningless symbols.
One qualitative distinction that remains for the time being is that humans care about things while AIs do not. Human drive and motivation is needed to have AI perform tasks.
Of course, this distinction isn’t set in stone.
Well, there's the fact that it hasn't yet improved since what we had 3 years ago. That doesn't really bode well for the prospect of future improvement, though it's not technically impossible.
The more I read about these achievements the more I get a feeling that a lot of the power of these models comes from having prior knowledge on every possible field and having zero problems transferring to new domains.
To me the potential beauty of this is that these tools might help us break through the increasing super-specialization that humans in science have to go through today, which on the one hand is important but on the other hand limits a person in terms of the tooling and inspiration they have access to.
What makes me more of an optimist in this case is that people who today decide to go into these sciences are mostly people who are driven by intellectual activity so I feel they are the right ones to figure this out, probably more so than us the engineers.
I hear some specialists (especially multidisciplinary ones) write things they know few or no one can read. (Which is the most ironic reason for being rejected by a journal.)
I recall a funny moment on IRC where a truly helpful guy moaned that no one helped him when he had a (programming) question. He was very good at many programming languages and worked in some mix of high-level physics and mathematics. He posted SO questions that rarely got an apologetic response from someone able to understand the code and the physics but who couldn't wrap their head around the math. lol I hope he finally gets some help with his wizardry.
Human cognition improves the more you practice it. Not when you outsource it to machines that do the "cognition" for you.
I think we still don't really comprehend how much can be achieved by a single "mind" that has internalized so much knowledge from so many areas.
Personally I'm more of a breadth person, and I could never compete with peers who were more the depth type at college.
But I get satisfaction from connecting things that feel irrelevant on first sight, that's what drives me.
As we're becoming hyper specialised, they become an invaluable tool to merge the horizon in, so to speak.
I don’t think that this model works anymore though.
Also, I love the expression “merge the horizon in”. Being a non-native speaker of a language is so nice sometimes. Thanks!
Cool thing is now when someone contributes something to the hive mind, it can instantly be applied to any other problem people are working on.
So the cross-domain pollination that used to exist among scientists is not only not encouraged, it's also actively punished by society.
Can you explain more what you're referring to, because this has not been my experience at all. Heck, when I went to college, cross disciplinary majors were all the rage.
I think the thing that is just factually difficult is to actually become skilled in multiple different domains, precisely because the level of study/practice/rehearsal to become proficient in any individual domain keeps going up.
A long time ago you could be a Renaissance man by essentially dabbling in different fields. But today, as this article points out, you need extremely deep expertise in any one area just to understand the status quo - this proof required extremely deep expertise in two separate areas that mathematicians were surprised to be related at all.
Similarly, we're creating tools to improve knowledge, but we're progressively zapping the human out of the equation. Knowledge is created for something, but it's unclear if very soon humans will be able to understand it, or really benefit from it, except billionaires, etc.
It's too bad that we're not improving humans nearly as fast as we're replacing ourselves.
Can tech news stay tech news without getting bombarded with leftist subtexts all the time?
EG, my own oldest child needed a surgery at birth that would have been logistically impossible even 50 years ago. I'd say that she and I have benefited enormously, despite not being billionaires.
edit: I solemnly swear that the sibling comment with the strikingly similar "impossible 50 years ago" claim is a pure coincidence and that I at least am not a bot campaign. Haha.
But yeah, quite a coincidence that we both chose 50 years :)
And this is where machines, such as these reasoning LLMs, can help. Because they can remember patterns across many domains and try absolutely bonker weird connections and ideas.
We, the humans still have to verify the work (at least as of now). But, the "maybe this tool, or idea, or trick, from that completely unrelated field applies here" reasoning/experimentation could become much easier.
I have always said this and will say it again: reasoning is just experimentation with a feedback loop and continuous refinement.
Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up with a back-and-forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =) .
Nevertheless, one thing that struck me in this writeup, was the lack of attribution in the quoted final response from the model. In a field like math, where most research is posted publicly and is available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading, it's very important to me that these are contained in the actual model output.
One more note: the comments on articles like these on HN and otherwise are usually pretty negative / downcast. There's great reason for that, what with how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.
Along with all the rest of what humans find meaningful and fulfilling.
It may be the beginning of thinking, but to many who view things on a longer timeline, it starts to look like it will break down the frameworks that are required to get to that position. Otherwise, you just end up retreading explored ground, thus removing the joy of discovery from any human's hands and mind.
Perhaps your name-calling is not actually as logically grounded as you think. It definitely seems to depend on unfounded leaps.
This technology is solving interesting math/physics problems for us, which is completely different.
Money is valuable only as it changes hands for goods/services, and if you want to get rich, on top of having/producing/controlling something everybody desires, you also need as many people as possible to have money to give you in exchange for a piece of that something.
Humanity is having those discussions; heck, you are in one RIGHT NOW, not in some Hollywood future.
What is coming of those discussions is that the ownership class balks at the idea of raising their taxes (see the recent interview with Bezos), and therefore balks at the idea that you or I should have any value beyond what we produce... And if AI can replace you or me, well, how do we survive if we can't produce in a technological society?
In the (probably unlikely) event that AI use results in a post-scarcity economy in which there's no need to work to survive, a lot of people wouldn't regret sentiments like the ones in question.
On the contrary, it would mean they could work on whatever they please, including potentially standing on the shoulders of giants - the AIs - and seeing even further.
If we actually worked to create a society that works for the benefit of all its members, there would be a lot less reason to worry about developments like these. Much of the worry arises because for various reasons - none of them really good ones - we've ceded control of these developments to the people least suited to manage them.
To a society that provides a livelihood to all humans, equally?
I would love to hear how we get from here to there during an era with the largest wealth disparity ever seen in human history. (Yes, it's worse than the robber baron era of US history.) I have yet to see any signs that the capital/ownership class has any intentions other than vacuuming up even more wealth and power for themselves. And that is anathema to your desired outcome.
History is full of examples of situations like this being corrected, at least to an extent. If we learn from those, we can do even better next time around.
Btw, the inequality you mention is far worse in the US than Europe. Here's one source that covers this: https://wid.world/es/news-article/why-is-europe-more-equal-t...
This demonstrates a point that should be obvious, that better societal choices can produce better outcomes.
I wonder if this is physically/mathematically impossible: the mere act of living involves processing energy, and therefore doing work :)
And there is a lot of energy to be processed in this Universe before the heat death...
Mind you, there are places in the universe that we have no way of knowing ever existed... the non-observable universe, if you will. When physicists talk of the observable universe, it is only the fraction we have any chance of receiving data/light/radiation from.
This "any" shines like a thermonuclear fireball.
Not so many years from now, some of them will surpass you. A few years after that all (that survive to that point) will surpass you.
Does that terrify you just as much?
A child is a living, breathing, growing, and changing conscious entity. It is the natural order for the young to supplant the old, no matter what the politicians and billionaires desire.
"AI" - terrifies anyone who understands the pact our society rests upon: that labor is valued and can be exchanged for goods and services to survive. Thereby enabling a person to support their families without having to do everything themselves.
If AI replaces a noticeable fraction of society, destroying their capacity for work, that threatens and ultimately blows up this compact between working class and capital class... and with it, the foundations of a modern technological society... It may sound like hyperbole, or some fantastical prediction. But really it is basic economics, like econ 101... And personally the last few years have terrified me, not because of AI directly, but because of how ignorantly blind many smart and tech-savvy people are... You are marching us to collapse with a smile on your face...
Moreover, truth be told, I don't really see myself doing any less math and requiring less from my skills. At least from the moment I've begun incorporating LLMs into my research workflow to now, the demand I've had from my own skills has only grown. At least in an era prior to Lean formalization.
This is just an application of the philosophy "automate yourself out of a job every 6 months"- I've been doing that for a long time, and the outcome is generally a more interesting job.
A dedicated engineer is always looking to automate themselves out of existence, so that they can move on to the next thing to automate. Ongoing repetitive work is less engineering and more akin to toiling on a line.
The answer is that we simply need to decouple the "right to exist" from "worth."
You should have the right to exist and explore the world simply because you're human, not because you can use your skills to provide some sort of transactional value to someone else. Deprogramming so many people is going to be hard...
Let's start with the first practical step: how do you dethrone the psychopaths in charge of the world who own about everything on Earth and have all the world's lethal force in their pockets?
Perhaps it is time for life to be considered intrinsically valuable, instead of being "worthy" only based on output or capability. Disability, animal and environmental advocates have been fighting for this for a long time. Not too long ago women and minorities were in the same boat. Even now, there are many advocating and fighting for a return to the dark old days.
> Along with all the rest of what humans find meaningful and fulfilling.
Some humans. Many are content to enjoy simply existing, and the beauty of life and the universe around us. Just as many non-scientists today enjoy and benefit from the work of scientists, tomorrow, too, many will enjoy learning from and applying the coming advancements and leaps in many fields.
And those of a scientist or other research-type mindset? No doubt they will contribute meaningfully by studying the frontier, noting what remains unanswered, and then advancing the frontier, just like researchers do today; just because scientists in the past solved many questions doesn't mean that there aren't any questions to answer today.
IMHO, AI means that the frontier expands faster, not that it is obliterated. Even AI cannot overcome the laws and limitations of physics/universe: even Dyson spheres only capture the energy of one star, thus setting a limit on the amount of compute, and thereby a limit on intelligence. And we are a loooong way from a Dyson sphere.
PS: I think you're being unfairly downvoted. Your question is not invalid and deserves responses, not downvotes.
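To put a rough number on the Dyson sphere limit mentioned above, here is a back-of-the-envelope sketch in Python. It assumes the sphere captures the Sun's entire output, that the hardware runs at room temperature (300 K), and that the relevant bound is the Landauer limit on irreversible bit erasure; all three are illustrative assumptions, and reversible computing or colder hardware would change the figure.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23   # exact SI value
SOLAR_LUMINOSITY_W = 3.828e26      # IAU nominal solar luminosity
OPERATING_TEMP_K = 300.0           # assumption: room-temperature hardware

# Landauer limit: minimum energy needed to erase one bit at temperature T.
joules_per_bit = BOLTZMANN_J_PER_K * OPERATING_TEMP_K * math.log(2)

# Upper bound on irreversible bit operations per second for one star's output.
max_bit_erasures_per_second = SOLAR_LUMINOSITY_W / joules_per_bit

print(f"{joules_per_bit:.3e} J per bit erased")
print(f"{max_bit_erasures_per_second:.3e} bit erasures per second")
```

Under these assumptions the ceiling comes out on the order of 10^47 bit erasures per second: enormous, but finite, which is the point being made.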
Ah, the proverbial silver spoon. Sadly, I never had that luxury. If you look through my comments, you'll notice I'm more at the get-off-my-lawn point.
Also, what happened? Real world wear you down and turn you cynical? It is possible to be hopeful and cynical at the same time. This tech is something new we're seeing: the future is as yet unwritten. r/LocalLLaMa works well, so there's hope even if corporate ai goes kaput.
My generation has been lucky to see a few new things, though we certainly live in interesting times. Moon Landings. Berlin Wall fall. Moore's Law. EU (I have the old coinage to serve as a reminder). Space Shuttles. China and India integrating with the world. Cellphones. The Internet. Digital Photos. Linux. Solar. 3D-printing. Smartphones. Tablets. Bitcoin. EVs. Mars rovers. Asteroid visits. Internet from space. FTTH. MRNA. Gene Therapy. MRI. Ultrasound. Wi-Fi. Mesh Wi-Fi. Reusable Rockets. Cubesats. Selfies from space. Drones. LoRa/LoraWAN. Maglev HSR. And now AI, real AI. Chinese-like Whale Language.
There's hope for the future yet. You can help make it happen right. But only if you leave the cynicism at the door. Can't give up - it's our kids' futures at stake.
What about Ukraine holding Russia back, and now looking like it might actually win? What about the most recent additions to NATO? Hungary's regime change? Canada's save? The EU's pivot to arming itself, and quickly?
Buds of green, yeah?
To be blunt, this seems incredibly uninteresting to me. I enjoy learning mathematics, sure, but I just don't find much inherent meaning in reading a textbook or a paper. The meaning comes from taking those ideas and applying them to my own problems, be it a direct proof of a conjecture or coming up with the right framework or tools for those conjectures. But, of course, in this future, those proofs and frameworks are already in the textbook. So what's the point? If someone cared about these answers in the first place, they probably could have found the right prompt to extract them from this phantom textbook anyway.
You could argue for there being work still like marginal improvements and applying the returned proof to other scenarios as happened in this case, but as above, what is really there to do if this is already in the phantom textbook somewhere and you just need to prompt better? The mathematicians in this case added to the exposition of the proof, but why wouldn't the phantom textbook already have good enough exposition in the first place?
I think my complete dismissal of the value of things like extending the proofs from an LLM or improving exposition is too strong -- there is value in both of them, and likely will always be -- but it would still represent a sharp change in what a mathematician does that I don't think I am excited for. I also don't think this phantom textbook is contained even in the weights of whatever internal model was used here just yet (especially since as some of the mathematicians in the article pointed out, a disproof here did not need to build any new grand theories), but it really does seem to me it eventually will be, and I can't help but find the crawl towards that point somewhat discouraging.
Who cares if it is God's book or the machine's Xeroxed copy?
"The Book" is more interesting to me if I am the one coming up with the ideas to fill it in. Maybe this is a bit egotistical, but I'd like to think it is allowed to have a desire that you, personally, are contributing to something in a meaningful way. Like, if you are on a sports team, it'd be more fun to win a game if you were on the field than if you were benched, and I think that's okay. And ultimately I don't find dredging for proofs from an LLM particularly meaningful, nor do I see it as a particularly personal contribution, as anybody else could have done the exact same thing with the same prompt.
This isn't to say I wouldn't love to read the proofs in "The Book" for problems I care about; I just think I'd eventually get bored of only reading. And so it's hard to be enthusiastic when this book is being built through an LLM.
This is a good analogy for AI work displacement. It would probably resonate with some of the college students who booed Eric Schmidt.
Technology in general (smartphones, social media, search) even without AI is creating this feeling, as it shrinks the world and makes it less mysterious.
It's worse than boredom; it's more like nihilism.
Then when you strip purpose and meaning from a human you get something very bad, despondency being the best case outcome.
It will be a transition, for sure - there would no longer be meaning in “winning the game” in a capitalistic or scientific sense. Anything you want to produce or learn, the AI could already produce or has already learned. Now you have to do it just for the love of the process.
I have a musician friend who likes to say that good artists overwhelmingly make art for their own benefit. Not to advance the world or blow people’s minds, but because something inside of them needs to come out, and art is how they express it. And that part of us isn’t going to go away.
Shifting from “human calculators” to machines for arithmetic is also hard to argue against.
I think what makes the AI transition difficult is it impacts a wide range of high-value activities that would have been implicitly assumed to always remain human.
I do have great trouble seeing how a pile of matrices is ever going to be capable of innovation. Maybe with sufficient entropy and scale, it will… The day that becomes practical will be a turning point in history.
Economically, goods and services are often priced based on labor/“value added” aspects. Lawyers’ fees aren’t driven by paper costs! If AI takes a huge bite out of intellectual labor, the future could become very different…
BTW, your book description reminds me of the 2025 movie “A.I”. I thought it was quite good.
You admit this possibility so I'm not arguing with you, but it seems far more plausible to me that we can build something better than the brain.
In the limit we can just grow brains and put them in computers anyway, then the debate is moot. That's a really hard problem but of course not physically impossible.
I'm also afraid of a world where AI completely replaces human mathematicians, but if we remain collaborators, then that's a world I can still feel excited about.
Can you please expand on how you do so?
[1] https://en.wikipedia.org/wiki/Connections_(British_TV_series...
the hype is where the money is, as is always -- marketing & porn. Both touched heavily by AI already.
That a raw LLM hallucinates?
That we never see all the mistakes and dead ends a complex system using AI hits?
Does it even matter if its accuracy rate across all its experiments is < 100% if it can run trillions of experiments in the same time a human could run 1?
We don't see many of the failed attempts of Human Researchers. Why? Because it doesn't matter.
What's amazing here is that it shows our society can make discoveries faster in the post-LLM world. That's incredible.
Your "critique" of how it happened. Not so much.
I'm sorry to nitpick, but isn't an unconscious idea an oxymoron?
I bet your stupid ideas also taught you a valuable lesson and you learned at least something from the experience; maybe your next idea won’t be so dumb, and those 100 watt-hours weren’t actually wasted (though it may feel like they were). Compare that to a failed LLM experiment, where all those billions of billions of computations are completely wasted. The model knows exactly as much after a failed experiment as it did going into it. Those megawatt-hours were simply wasted, turned into heat, and paid for by raising the power bills of the datacenter’s neighbors.
Not my post, but I think point 1 is stronger than 2.
That's not necessarily true. If our only counterfactual to investing resources in project A were to invest them in some other project B, then, yes, the conclusion above follows. But often people just consume the resources.
(In the end, the goal of all economic activity is consumption. We invest resources so that we can consume more later. If there's no good enough project around, might as well consume more now.)
When you consider the amount of computation that went into this discovery, it is less impressive. If you spend a lot of fuel you can travel really fast, much faster than a bicyclist. Similarly, Go engines can beat the best humans at Go, but they spend several orders of magnitude more energy to do so.
Mathematicians prove or disprove conjectures all the time and use orders of magnitude less energy to do so. Using LLMs is kind of just throwing money at the problem and hoping it works. In this case it did. But this is not the most efficient way to do this, and it won't scale.
Currently we can live with it because someone can review that work. Soon we won't be able to.
Always, always, always, the problem with research and development is leadership, not insufficient supportive technology. It is a political problem; there is absolutely, positively no shortage of technologies to support research. Your optimism is totally misplaced. The NSF funding cuts have negatively impacted math more than AI has benefited it. And guess who supports the administration that cut NSF funding? The people who ousted the PhDs from OpenAI.
You are right to point out that the ones who fully own and pilot the machines all belong to the “fuck science and humanity as a whole” group. So the likely outcomes don’t look good.
Echoes the early promise of the internet vs the eventual state and consequences of it, although seemingly primed for far more dire and deeply penetrating consequences.
That's true. But. Maybe you've seen the Oppenheimer movie; there is a moment where Oppenheimer shakes Teller's hand, basically after the guy ruins Oppenheimer's life in a completely immature betrayal. That's what people are angry about: the academic community is Oppenheimer's wife asking, why the fuck did you shake his hand?
At least regarding leadership and funding, I don't know if it's a matter of likely or unlikely outcomes. It's just facts: these guys are collaborators. The commenter might very well have zero graduate students starting next year. What pisses me off is the utter obliviousness that STEM people have about how deeply political their work is.
And perhaps this is the real reckoning for the mathematics community. Not the possibility that AI is going to replace their jobs; it's not going to do that. But that having these intensely myopic and disagreeable personalities means that basically zero leadership skills have been nurtured in the mathematics community. You cannot name a single politician who is a mathematician. You have to be elected to have power in this country, it's that simple; there are way more billionaires than there are presidents! Leadership is far more scarce. So that's why these disputes matter, and while it's great that people engage on Hacker News about it, it's intensely disappointing that "reduced science funding is really bad" gets downvoted.
That is a result of Hacker News's emphasis on this very 2010s view that it wants to be a place where the math nerds gather (in @dang's words) - he doesn't get that the quality of the discourse was caused by great leadership at many political and academic levels. Nobody credits how much better leaders were during Y Combinator's biggest success stories, or how much we overvalue the intellectual powers of math because it makes money as opposed to enlightening our view of the world.
I can: https://en.wikipedia.org/wiki/C%C3%A9dric_Villani
( and of course Wikipedia has a list :) https://en.wikipedia.org/wiki/List_of_mathematician-politici... )
No interest in human advancement, just attribution.
What I’m saying is that the ultimate goals of those in power are not these sorts of altruistic or even scientific pursuits. Massive labor disruption, and the hyper-concentration of power in the hands of people who are proving time and again that advancing science and benefiting the whole of humanity are antithetical to their goals, is likely a bad thing.
Most homeless people have smartphones, and consistent access to food and clean water.
Your average 'poor person' in America has HVAC. An unimaginable luxury in the EU
Eh, don't be silly. In the places where the summer is hot enough (or, more precise, where it used to be hot enough), I have seen plenty of AC units on shabby buildings, even on old Commie apartment blocs in Romania.
AC is not that expensive.
Lmao, did HN just glitch out and start showing me Pieter Levels' tweets?
Southeast or central US has considerably higher wet bulb temperatures than Europe does in summer. Without HVAC, there’s a good chunk of the year where it’s too hot to get much done.
For example, this library here for deep learning is 100% ai generated and far beyond my technical capabilities.
For example, if you're in fundamental science (or generally a fan of reductionism), it for sure would be nice to understand the universe instead of just having access to an AI that can comprehend it. But to the majority of the population it only matters that someone (or something) understands it enough to make it useful to others.
AI is going to both help and hinder this process though. At the end of the day, mathematics is mostly a social process at this point. The goal is not raw number of theorems proven, it’s how proving theorems affects the working operational models of mathematicians. Only a rare few new theorems in mathematics nowadays have direct real world applicability.
If AI produced legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact will be neutral to negative.
I am no mathematician and very naïve about this, but in a world that is rapidly becoming extremely calculation and network dependent that sounds hard to believe.
> If AI produced legitimate theoretical breakthroughs at a pace mathematicians are unable to absorb, then the impact will be neutral to negative.
I think the idea here is that all mathematicians will just be using AI for their future work so they don’t really have to absorb it as long as it’s in the training data.
> I am no mathematician and very naïve about this, but in a world that is rapidly becoming extremely calculation and network dependent that sounds hard to believe.
I am a mathematician. It is true. The key is we're talking about new theorems, and direct, current real world applicability. Some theorems that have no applicability now may in the future, as theory often precedes applications by a long way and the usefulness is likely to come from other things built on top of the new maths, and a lot of pure maths will never have direct real world applications but contributes to our overall understanding.
On the other hand, there are many applied mathematicians and theorists from other fields that mine new maths for applications to their fields. But they are almost always not the ones that come up with the new math.
Historically, of course, mathematics was always driven by the need to explain things. Many of the mathematicians from the 17th and 18th centuries were physicists (or, less commonly, engineers). But for the last hundred years or so that really hasn’t been the case.
It seems like if AIs can prove and index a huge number of (largely uninteresting to humans) things there might be sort of "parallel cultures"? Big results are most valuable to humans and AIs both (most context efficient!), but a very large number of less general but still non-obvious results might be an effective approach to solving problems?
Has this ever been different?
Math is abstract, rightfully so. It does not have to have direct applicability. Understanding builds over time and applications eventually follow. Number theory used to be a fringe "pure" theory field without applications for the longest time. If we'd only be interested in (and thus fund) what has direct applicability then society would be much worse off.
Side note: I recall my high school class mates rolling their eyes in every math class with "when will I ever need this in my life?" never asking the same question about PE or history or art classes. Now they struggle with their tax return and are routinely getting screwed over by loan sharks. But make no mistake, they can be proud of their A for hitting the goal 5 out of 5 times during soccer in PE class.
That is not what the mathematicians are saying. I don't have the knowledge to evaluate this myself, but a number of mathematicians - for example, in the SP - are saying it goes further than that - they really do introduce novel ideas. Of course everything is based on and inspired by some previous work, but that is true of all human mathematics as well.
LLMs that have been trained through reinforcement learning on mathematics are NOT simply token predictors. Only base models can be accurately described that way. They have learned how to do mathematics. They have learned to do coding. It's really amazing that we're three years into instruct models and such a large part of Hacker News still does not understand the most basic facts about this field.
"All" a model is doing is predicting the next words, based on the statistical distribution of words it has seen similar to the ones read/produced so far.
We push a model towards a particular set of distributions through context. If I ask a model "What is the capital of France?", there is a non-zero chance it goes down the dad joke answer of "The letter F". The far more likely option is "Paris", because the joke appears much less often in training material, but if I wanted to be absolutely sure of getting a consistent geography answer I'd address that with additional context. We can add context via prompts, RAG, agents, skills and so on.
However, when training a model, we select the material. We could show it a lot more geography information (or dad jokes!), and skew the statistical distribution in the direction we wanted. We could also decide to design the system prompt towards the direction we prefer - which the user would interpret as "the model" - and so nudge the context model-wide. We can also construct the interaction to iterate on context with a specific framing and call it "reasoning".
In this specific example, you could therefore solve the problem by a) training skewed towards mathematical papers, which likely degrades performance in general and likely for the specific case too, b) train the user to provide better context/prompts for mathematical work, shifting the workload to them which feels very "a la 2024", c) publish agents and skills that are tailored to mathematics work (very "a la 2026"), d) tweak the system prompt for when the model is doing mathematics work, which the user would see as "the model" doing the change, but you and I might look under the hood and say that is in the harness or a specific type of prompt, or e) add "reasoning" execution that is set to focus on mathematical formatting, or f) a mixture of the above.
Right now we're probably looking at agents and skills. I think over time we're going to see smaller models targeted towards domains with a mixture of all of it, where some of this sits at user configurable levels, and some is "baked in" via training, system prompts and execution modes, but from a user perspective it's all just "the model".
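To make the "capital of France" point concrete, here is a toy sketch of how context and temperature reshape a next-answer distribution. Every number in it is made up for illustration (the scores, the size of the context shift, the temperature), and a real model works over tokens and a huge vocabulary, not two whole answers.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [score / temperature for score in logits.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {answer: e / total for answer, e in zip(logits, exps)}

# Hypothetical scores for answers to "What is the capital of France?"
base_logits = {"Paris": 5.0, "The letter F": 1.0}

# Hypothetical effect of added context (say, a prompt asking for a
# geography answer): the dad joke is pushed further down.
context_shift = {"Paris": 1.0, "The letter F": -2.0}
with_context = {a: base_logits[a] + context_shift[a] for a in base_logits}

p_base = softmax(base_logits)
p_context = softmax(with_context)
p_hot = softmax(base_logits, temperature=4.0)  # flatter distribution

for name, dist in [("base", p_base), ("context", p_context), ("hot", p_hot)]:
    print(name, {a: round(p, 4) for a, p in dist.items()})
```

The joke answer never reaches zero probability. Context makes it rarer, and a higher temperature makes it more likely again.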
And by opening the door to LLM-generated results, you'll see greater and greater amounts without any hope of ever navigating this field again without machine help.
It's a little like a software project which more and more gets extended by AI agents with less and less review by human software engineers and in the end the complexity and spaghetti design are so incomprehensible by humans that the maintenance requires an AI agent. The risk is that math as a whole (the field itself) will experience that effect.
Say we achieve interstellar travel, but nobody actually knows how it works.
Or we cure cancer, but the "cure" requires a microrobotic implant, and it runs as a blackbox AI, and only the other AIs can make one, and there's no guarantee they will know how to make one tomorrow.
Or we solve global warming but it requires giant cooling machines running 24/7 and again, nobody knows how it works, but with the added bonus that the planet is cooked if they ever stop working.
https://youtu.be/pfNS2kWf5cY?si=SH6_QC0bCspV-ngz
There are comments that truly reveal a future horrifying and true. Few of them. But I count yours among them.
But I’d argue also that airplanes already achieve this complexity to some degree as well as microprocessors.
I mean, microprocessors have been on the "impossible to bootstrap from scratch in a short period of time" list for 20 years already.
I think there will be regulation that requires some users of AI to provide an explanation upon request. For instance, banks could be required to "explain" why you didn't get that loan. What if the decision is based on a credit score that includes some AI prediction that ultimately relies on the entire training corpus?
The bank can give you a list of factors that play into the decision but they may not be able to explain deterministically why a very similar customer did get that loan. At that point I think we're going to resort to statistics that prove a lack of bias against certain protected characteristics, but that's not really an explanation, is it?
I think we will never get useful and complete explanations for everything that AI does. Society will just accept some explanation-like thing or proxy and move on.
If they understood it 100%, what clarification is needed?
Your dog will never understand calculus or why Fourier transforms are interesting. There are almost certainly topics that are beyond human comprehension that an advanced artificial or alien intelligence can easily handle.
True, but it is possible to assemble a team of people that does, with backup for each person. There's also teachers and written knowledge to educate new team members. That's what makes it resilient.
I think that's a very different situation from what's described.
The idea being that once a toolchain becomes sufficiently complex if you ever have to bootstrap it again for whatever reason you won't be able to speedrun the process the way you might naively expect. I think modern chip production likely already reached this point several decades ago. As evidence I'll point out that China only recently achieved EUV and remains several nodes behind despite directing an obscene amount of resources towards the initiative.
1/ No one knows how even small components work, because their inner working mechanism is too hard to understand by human mind
2/ The whole society is run (in intelligence sense) by alien minds
When you let the machines do it, and don't care about moving it towards human domain (i.e. meatspace), you're done.
Why is it necessary to continue to increase complexity when we get better intelligence? Can't we find more simple solutions? Or at least more explainable.
See comment about "scientific equipment that people hadn’t conceived of but which worked"
It's hard to describe the feeling of seeing intelligence being delegated increasingly to AI. If that's not a pivotal moment, a revolution, I don't know what is.
This has always been true. There was a time where someone had to teach farming to others and that information had to spread and be passed down. Eventually, farmers became better than hunter-gatherers and they became known as hunters. The information on what was safe to gather for civilisation got passed down as 'safe to eat on the hunt' because the farmers were farming. The civilisation collectively "forgets" foraged foods as that knowledge becomes niche.
Does that mean we got dumber?
Looks like you're pretty sure of that. Every time I see an argument like this delivered with confidence I wonder how it is different from, say, digital calculators. Or better yet, books - Greek philosophers moaned that young people will stop understanding anything and just check books when they want to know anything.
Knowing the history of the humankind is what makes me pretty sure of that.
The extent of misery and destruction is directly proportional to the level of technological advancements, and I don't like the idea of sacrificing millions of lives in the name of the figurative HVAC, smartphone and other benefits of civilization. Or billions in the name of whatever benefits the next VC money stake should bring.
> I wonder how is it different from, say, digital calculators.
Did a single digital calculator ever stop any war, or liquidate a psychopath who orders people to go kill and die?
By statistics of war, poverty rates etc this is trivially false. I think you are really, really underestimating how hard life was pre-industrial revolution.
If you had a magic button that turned off all those "benefits of civilization", millions would die. If you managed to drag agriculture down with the rest, the death toll would be in the billions.
I don't understand how you can possibly think you "know history" without recognizing that technological progress has taken us from constant warfare to such a state of abundance that war is actually rare and noteworthy in much of the world.
Let's try to have an actual argument. How many people, in absolute numbers, were affected by that constant warfare of past, which past exactly do you mean, and how many people were/are affected by "rare" wars of modern history?
80 million people killed or maimed with arrows, swords and catapults over centuries and 80 million killed or maimed with fruits of industrial revolution over 6 years of WWII are very different figures.
Is it better to have lived as an individual in one of these fictional cohorts? Is it better for the group in the same or a different one?
Is it better to live and suffer than to not live?
I think the answers are obvious.
If only 4 people die violent deaths out of a total population of 5, that’s an extremely violent population to be a part of.
If 8 people die violent deaths out of a total population of 100,000, that’s a much more peaceful population to be a part of, despite the greater number of absolute deaths.
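For anyone who wants the arithmetic spelled out, a minimal sketch using only the two made-up populations above (the function name is just for illustration):

```python
def violent_death_rate(deaths, population):
    """Share of the population that dies a violent death."""
    return deaths / population

small_group = violent_death_rate(4, 5)
large_group = violent_death_rate(8, 100_000)

print(f"{small_group:.1%}")  # 80.0%
print(f"{large_group:.3%}")  # 0.008%
print(round(small_group / large_group))  # 10000
```

Twice the absolute deaths, but the individual risk is about ten thousand times lower in the larger population.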
Orders of magnitude more than in prehistoric days.
Benefits of civilization eliminated most of that + increased quality of life dramatically.
Ten thousand years ago (around 8000 BCE), the global human population was estimated to have been roughly 5 million people. This is significantly smaller than the current population of just Poland (about 36 million).
In absolute numbers there might be more now, even if the percent is smaller. It is difficult to compare these things without having a specific place in mind.
The fact that humankind grew from 5M to 8.3B, while dramatically improving longevity and quality of life, speaks volumes. Multiply life quality × population × life duration and, far from "misery and destruction", you see orders of magnitude of positive influence from technology.
The intriguing part is that we could get objectively good outcomes, but at a cost of being dependent on the machines. So it's not that you couldn't actually unplug Skynet, it's that if you did civilization would collapse (or whatever) because Skynet stops doing its thing.
I'm not sure that gets us to a better place overall, but I doubt we could resist the temptation.
What you're describing is already how a lot of science, technology, and engineering works!
In case of AI we have a better chance to understand what it is doing through chain of thought and explainability. Nature never gave us that..
That’s not “solving it”, that’s putting a bandaid on it. Solving it would mean correcting the underlying issue to the point it’s no longer a problem which requires maintenance.
Managing symptoms is not curing the disease.
Green energy and transport technology is now at the point where people save the world and get rich trying, just as fast as they can build the factories.
Food's climate impact is harder, because the problem isn't technical, it's convincing people to give up beef (and other things, but mostly beef).
* quantum mechanics and general relativity are famously difficult to get to grips with
The book doesn't deviate from what you have envisioned, or the future you envision doesn't deviate from the book, I may say.
I do think we will need to find a way to get away from publishing papers. But I thought that before the AI came along and made mediocre papers something you can produce in a day. The academic system seems utterly incapable of self-correcting on this point though. We haven't even managed to get rid of for-profit publishers. So how this all will go down is anybody's guess right now.
I am also a little worried about what it means for your training as a junior PhD. Often you would try and solve a problem your advisor thinks is doable that they assign to you as a learning exercise. It may be more and more difficult to find problems that a junior PhD can solve but that AI can not. Tim Gowers has written about that here: https://gowers.wordpress.com/2026/05/08/a-recent-experience-...
This is a very important point, especially when the output is from a non-deterministic random walk with some unknown probability distribution.
They do the opposite by locking the results they produce within the slop presentation that needs more AI to comprehend.
I attended a conference on AI for maths and open science a few weeks ago, and was struck by just how many examples of AI-supported solutions there already are. Virtually every speaker had an example of their own use of (often the frontier) AI models in solving a problem that was previously too hard (for various definitions of hard).
I wrote up a few notes [1], and most of the speaker videos are available via the conference website [2].
[1] https://scholarlyfutures.substack.com/p/ai-and-the-practical...
Without knowing all this model has been trained on though, it is pretty hard to ascertain the extent to which it arrived at this "on its own". The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else--they hoard it--and which could actually contain original ideas.
It isn't likely that someone solved this and then just put it in the training data, although I honestly wouldn't put that past OpenAI. More interesting though is the extent to which they've generated training data that may have touched on most or all of the "original" tenets found in this proof.
We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.
In all seriousness though: My suggestion is that those shepherding the frontier of AI start acting with more transparency, and stop acting in ways that encourage conspiratorial thinking. Especially if the technology is as powerful as they market it as.
Are you asking me how LLMs work?
The theory proposed by the original commenter was that there could have been some secret training data the model was trained on that made it possible to solve this problem set. So the only conclusion is they are implying it's a conspiracy by OpenAI to hide some novel math research they funded merely to do marketing about solving math problems (then convincing multiple math experts to verify and support it with papers). That is the definition of a conspiracy.
edit: >> https://techcrunch.com/2025/10/19/openais-embarrassing-math/
The ability to find incredibly obscure facts and recall them to solve "officially unsolved" problems in minutes is like Google Search on steroids. In some sense, it is one core component of "deep expertise", and humans rely on the same methodology regularly to solve "hard" problems. Many mathematicians have said that they all just use a "bag of tricks" they've picked up and apply them to problems to see if they work. The LLMs have a huge bag of very obscure tricks, and are starting to reach the point that they can effectively apply them also.
I suspect the threshold of AGI will be crossed when the AIs can invent novel "tricks" on their own, and memorise their own new approach for future use without explicitly having to have their weights updated with "offline" training runs.
That is not true and a complete misrepresentation of recent progress of AI in math. It is therefore not necessary to believe the conspiracy theory you described in order to explain recent progress of AI in math.
Congrats to the OpenAI team for one of the most significant breakthrough discoveries in AI history.
I'll gladly admit I think what these companies are doing is unethical, and I'm sure that biases my thinking toward skepticism.
That said, there remains way too much that is hidden to be able to effectively evaluate what is going on. You have the perfect storm:
- AI companies do not share their custom internal harnesses.
- AI companies do not share their custom internal training data.
- AI companies do not share how much compute they allocate to trying to solve problems of this nature.
- AI companies are primarily marketing their models to investors as human-replacing rather than human-augmenting.
- AI companies are under enormous financial pressure to make their business work.
The last two points incentivize them to find these types of "first proof" successes as aggressively as they can, and I'm sure they've thrown the whole book at it.
Is it likely that they literally had a mathematician discover this, put it into the training data, and then prompted it out? Of course not.
But it would make a world of difference--in evaluating the impressiveness of this discovery and LLM capabilities in general--if we were to know the extent to which the training data crosses over this problem, the harness with which this was ran, and how much compute was spent.
Until they bring more transparency to the whole process--something which some of the mathematicians commenting on this even asked for--I will personally take discoveries of this nature with a good dose of salt.
Really? Any references to read more?
- https://www.theverge.com/cs/features/831818/ai-mercor-handshake-scale-surge-staffing-companies
- https://outlier.ai/math/en-us
- https://www.opentrain.ai/
- https://www.pin.com/blog/ai-labs-hiring-train-models/
Much of this is data annotation, reasoning trace evaluation, and problem set curation. But there is no way they haven't at least paid some mathematicians to work on research grade problems in tandem with their models, and then used that for training data.
Does this expert data likely contain this proof within it? No. Would it temper the impressiveness to know they have a large amount of novel mathematical training data, an internal Lean harness for evaluation of open conjectures, and spent hundreds of millions in compute to calculate this? Yes.
Also why pay anyone, when the models can keep up with all the papers that no one person could ever read? That seems to me like wasted money.
Another point is that that's not how AI training works anyway. It's much easier to put it in context rather than re-train them with every bit of random maths you find out. Things at the tail-end of the power law don't stick. At least, last I checked...
They pay people for expert training data they do not share because it gives them an edge over other AI companies. And, as always, deep learning is enormously data-hungry, and we've gotten to the point where publicly available data has been exhausted.
AI companies absolutely retrain models regularly to keep up with the cutting edge. There is a reason why this announcement references an internal, unreleased model, rather than "we just put a lot of new math papers within the GPT5.5 context window and found this."
The underlying model may still effectively be a stochastic parrot, but used properly that can do impressive things and the various harnesses have been getting better and better at automating the use of said parrot.
Note that I'm not disputing the validity of the counterexample itself.
The world runs on trust, specifically trusting expert advice. It'd seem that due to resource constraints and scale, that's the best available option. By extension, there should be absolutely nothing weird or surprising on people following suit. It's why these companies themselves rely on expert counsel, and defer to their appraisals for marketing. The opposite is what's weird and unusual, and what requires more substantiation.
It's interesting that those who come out swinging against "trusting the experts", or really, trusting anyone else but them, not only ~never acknowledge this, but are seemingly outright proud of it, considering it as their own unique little trait, egocentrically revelling in it. It's almost as if epistemic rigor and truthfulness was not their actual concern.
Woohoo, I'm distrustful and cynical. Behold my unfathomable wisdom! Bonus points if they're also hurtful, because flipping the arrow on "hard truths -> hurt feelings" is a masterclass in reasoning too, of course.
I can appreciate faulting experts and organizations for misusing people's trust, and looking out for this angle, but given how unavoidable and fundamentally useful trusting itself is, blaming people for defaulting to trusting makes no sense to me whatsoever. It comes across as just the usual trope of blaming the individual. If you're from a lower-trust culture / environment, I can appreciate why you'd have a more distrustful default disposition (and why people might come across as suckers), but the principle still holds.
Given its elementary nature (very easy to state), you can bet that a lot of very bright people have worked on it (I know of one MIT graduate who specialized in geometry and had a lot of interest in it).
Moreover, model output is incredibly good at looking credible but being wrong. It has NEVER produced something correct for me in a field of which I am an expert without some external oracle to validate claims (like e.g., Lean)
It still writes like a junior dev, in that despite AI being able to get a picture of an entire repo, its changes are typically confined to the task it's working on and it will opt to duplicate logic to keep changes contained. Again, technically works, not ideal.
BUT I have had great success using AGENTS.md and becoming better at prompting to get it to not be like this.
Basic approach in AGENTS.md: don't code defensively, yada yada, we have a validation layer at X, no need to check for anything behind that layer. Works well.
An approach I've found helpful when prompting: What would be the best architecture for this change? If you say "do X" it'll tend to just do the hackiest, shortest path thing. If you say, "what's the best way to do X?" it will think more holistically.
That said, who knows, maybe when it's PHP it just really wants to hack ;-)
(Also, yes, you still need to review the code -- it will still do stupid things, so you can't just be pure hands off w/o ending up with quality degradations. The same is true of humans too though in my experience...)
Since you’re not in a unique position, I can confidently state that your comparison of LLMs to jr developers seems unfounded. Today, LLMs produce code that is superior to junior developer code by an order of magnitude.
Notably, they demonstrate consistent syntax, clear separation of concerns, strong test coverage, organizational rigor, idiomatic API usage, and the ability to generate and maintain documentation, among other measurable qualities.
LLMs generally operate at a staff engineer level for a number of languages and ecosystems (including polyglot projects).
Comparing an LLM to a senior developer is an absolute joke.
2. Are you referring to without having a compiler or LSP check it? Although even then, the recent LLMs I've used still frequently get syntax right, whereas I'd expect juniors are often using a LSP or compiler to catch mistakes while writing code?
who cares about syntax? who cares about iteration? what I care about are _results_, which they can produce at the end. do you check your human colleagues how many iterations they do before committing/showing their work to anybody? no. why should you set such a bar then for your LLM?
We have many folks (not engineers) at our company using LLMs to open PRs, and every one of these PRs has profound architectural design problems.
This is a critique of scale, moving the goalpost.
There is serious magic happening in the construction of model context.
> The python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer.
> My only complaint is the claims always start spreading 6-12 months before the delivery.
If delivering on such promises "always" occurs 6-12 months after the promise, is that pretty good?
I generally like AI and use it plenty often, it does many things well and I'm curious to see how far it keeps going, but that doesn't mean I have to like overhyped marketing about it.
Sometimes going some distance with a subject generates data for new ideas.
Once math gets done fast, newer ideas and paradigms also arrive.
I appreciate very much the work done so far, but this sort of asymptotic/quantitative result didn't interest me much even when it was done by humans.
(This is not snobbery, just a personal preference.)
As a matter of fact, the more logic and structure there is to your work, the easier it is for AI to conquer it. Due to this, programming was the first thing that got solved, but pure sciences are next.
If what you do, and how you do can be written down on a piece of paper, then AI can do it.
I do believe programming getting solved will be double assault on these fields.
>>This is not snobbery
This is good for the species, what sense does it make to keep treating these fields like they are reserved for the top most intelligent micro percentage of humans? Getting LLMs to do these things gives some scale to these subjects and that's good.
So is AGI, but we may be hundreds of years off still.
Human mathematicians frequently introduce new pointless abstractions just to churn out papers. And they are not accepted in serious journals, but they sometimes find a place in some mediocre or bad journal.
Of course, AI will increase this phenomenon manifold.
And if it isn't, we should find out very soon. If AI has got so good as OpenAI's post implies, then we should soon see a veritable blooming in the production of mathematical results, by lay people no less. No mathematicians needed! OpenAI say that their secret LLM solved the planar unit distance problem "autonomously" and the companion remarks say it one-shotted it; and while the companion remarks make it clear that there was a lot of refinement and improvement work done by humans, everyone seems to agree that the AI did the job by itself.
If that's true, if we're really at that level of autonomous mathematical reasoning ability, then we should see hundreds, even thousands, of open problems suddenly solved in a matter of years if not months. We'll just have to wait and see.
What I assumed they were saying is that their LLMs would be as intelligent as a human with a PhD across all, or at least most, knowledge tasks, and they clearly are not.
What was discovered were numerous mistakes in the published literature on the subject. “New math! AI!” No, just mechanical application of rules, human mistakes.
There were things that were theorized, but couldn’t be exhaustively checked until computers were bigger.
Once again, a tool is applied, it has the AI label - its progress! But it isn’t something new. It’s just an LLM.
There’s a consistent under appreciation of AI (and math, honestly), but watching soulless AI mongers declare that their toy has created the new is something of a new low; uninspired, failed creatives, without rhyme or context; this is a bigger version of declaring that your spell checker has created new words.
The result is more impressive than what was done with tables of integrals and SAINT in 1961, sure.
Apparently if you add a “temperature” knob to a text predictor, otherwise sane individuals piss themselves and call it new.
Then again I thought NFTs, crypto, and the Metaverse were stupid, so what do I know.
I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.
- Does anyone know if this was a 1 minute of inference or 1 month?
- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?
- One of the graphs says the model produced the right answer almost half the time at peak compute??? Did I understand that right? What does peak compute mean here?
Why would anyone believe this to be true even for a split second?
The point of having an AI solve an unsolved problem, is to make it very clear that the insight must have come from the AI and wasn't in the training data. Sure, it's possible OpenAI had access to some math professors that solved it and then let an AI model take the credit... but seems unlikely. That human would be turning down a potential Fields Medal for this discovery.
The abridged chain-of-thought from the model also serves as some evidence of LLM origin: https://cdn.openai.com/pdf/1625eff6-5ac1-40d8-b1db-5d5cf925d... (could be fake, though I'm unsure what proof of LLM origin couldn't be faked)
While interesting, this result is not Fields Medal material.
I also don’t like the tin foil hatty theories and don’t know what OpenAI actually did, but an NDA does wonders! Just pointing out that this line of operations is not really unlikely.
I'm suggesting that OpenAI invested a lot of resources and money in having someone (or a group of people) disprove this conjecture, so they could claim their LLM disproved it. Yes.
I'm not sure why you're surprised by this, given that everything that Altman has said in the past has turned out to be a lie.
The fact that they gave an EDITED (even rewritten, from the PDF itself) chain of thought is just further proof. Why not give the raw one alongside? No reason at all, except if it doesn't exist.
We can argue about recombination/interpolation of training data in LLMs, but even if this was an interpolation, the result was contrarian rather than a confirmation. Any system that can identify an error in Erdős's thinking seems very useful to me (though perhaps he did not spend much time thinking about or checking this particular conjecture).
> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.
Can someone please elaborate on this?
Much more recently (2021), Hajir, Maire, and Ramakrishna figured out how to apply the Golod-Shafarevich theorem to a slightly different Galois group to produce an infinite tower of number fields with some even more surprising properties. This is used in the new proof. It requires very slightly modifying the construction of Hajir, Maire, and Ramakrishna to produce the fields needed in this proof, but the explanation of how to do this takes only a paragraph in the human-written summary. (The explanation is more laborious in the original AI writeup).
The relation to Ellenberg-Venkatesh is more indirect. This is where "in retrospect" comes in because this work was not cited in the original AI proof. This has to do with the next step of the proof: after you construct the number field, you need to find many elements of this field with the same norm to produce many vectors of the same length. To do this, the proof uses a pigeonhole argument which uses small split primes of the field (constructed via Hajir, Maire, and Ramakrishna's argument) to construct many ideals. By the pigeonhole principle, you can guarantee two ideals lie in the same class. When two ideals lie in the same class, you get an element of the field. You can rig things so these elements all have the same norm. Ellenberg and Venkatesh had an argument which also used the pigeonhole principle to guarantee two ideals lie in the same class to produce elements of the field. They were working on a different problem so their argument was slightly different, but similar.
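In symbols, the generic pigeonhole step looks roughly like this. This is only a sketch of the standard fact, not the bookkeeping of the actual proof. The notation is my own assumption: $K$ is the number field, $h_K$ its class number, and $\mathfrak{a}_1, \dots, \mathfrak{a}_m$ are the ideals built from small split primes.

```latex
% Pigeonhole on the class group: more ideals than classes forces a repeat.
m > h_K \;\Longrightarrow\; \exists\, i \neq j :\ [\mathfrak{a}_i] = [\mathfrak{a}_j] \ \text{in } \mathrm{Cl}(K)

% Two ideals in the same class differ by a principal fractional ideal,
% which yields an element of the field with a controlled norm.
\mathfrak{a}_i \mathfrak{a}_j^{-1} = (\alpha), \quad \alpha \in K^{\times},
\qquad \lvert N_{K/\mathbb{Q}}(\alpha) \rvert = \frac{N(\mathfrak{a}_i)}{N(\mathfrak{a}_j)}
```

So controlling the norms of the ideals controls the norm of the element you get out. How exactly the proof "rigs" the norms to coincide is not something this sketch captures.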
Look past the press-releasey gushing from OpenAI and there are all sorts of interesting and subtle questions here about the role for LLMs in mathematical research. I urge folks to click through to the accompanying comments from mathematicians published alongside the result. There is a really interesting discussion going on. I particularly recommend Tim Gowers’ remarks. This is really interesting stuff!
Yet the comments are just a battleground of people rehearsing the same tired arguments about LLMs from 2023, refutations of those arguments, angry counters, etc.
Does it make anyone else sad that the battle lines seem to have been drawn 3 years ago and we just seem to have the same fights over and over?
I wonder if we’ll still be doing this two years hence.
Fight! Fight! Fight!
There are a lot of big issues at stake here and just because a person is interested in what AI can do and curious to discuss it does not make them uncritically positive about its effects on society, the economy, and the world. Yet that is often the assumption and it leads to battle lines being drawn, on every AI discussion, over and over again. It means the serious discussion gets swamped and that makes me sad.
Yes, I'm tired too. I want to have real discussions about these things. But the problem is everyone believes their reality is real and anyone's reality that disagrees is fake. It just escalates. I take long breaks from HN because I realize I just come to the forums and end up being angry. Why do we do this to ourselves? The reality is that at a core level we usually want the same things.
This website is quite awful, and I also don't know why I spend any time on it. It's definitely not a website intended for meaningful discourse. It's a website where you can reaffirm whatever opinion is already established, and if your opinion is at all controversial or even just out of the box, you'll be punished for it.
I do not want to wage war against what is ugly. I do not want to accuse; I do not even want to accuse those who accuse. Looking away shall be my only negation.
I’ve been thinking of building myself my own frontend to HN that makes it impossible to view comments, for this reason. Yet sometimes there are still really interesting discussions and it’s hard to let go of what for me feels like the last social media I want to be part of.
> I wonder if we’ll still be doing this two years hence.
It is going to take some time for people to recognize that AI has a very different set of competencies that complements human intelligence rather well. It is unlikely to eclipse human intelligence at scale, and the companies betting on that will fall behind. That is when the conversation will start to shift.
Another wishful/hopeful thought is that the human experience itself is valuable, that competing for resources and living within a social network and having physical needs somehow creates value that is essential for companies to operate.
But is it really the case? I don't think we know that, and I don't know what kind of economy results when all the white collar and much of the blue collar workers no longer understand how to participate in whatever the economy is becoming. Because it is starting to look like old money is coming around, and soon we will all be serfs to the creature comforts of those who have money now, upward mobility will be a thing of the past, and a small ruling elite over the vast subservient majority will form, reorganizing societies to more resemble middle ages lordship rather than what emerged in the 50's and 60's following WWII.
If LLMs were improving significantly independent of scaling up compute resources, I would be a lot more worried. The economic instability (on several levels) of the current trajectory cannot last. Countries and companies that don't take a more sustainable approach will eventually find themselves outclassed by those that do. Unfortunately that is not a guarantee against some sort of dark age in the short term.
This is completely false. Most of the dramatic improvements in LLM quality in the last two years were due to the application of new post-training methods, especially RLVR. It’s really interesting to read about (you should!) and it is the whole secret to why LLMs did not plateau in 2024 or 2025 like many people confidently predicted. Sure, RLVR requires compute to do, but this is not just throwing more compute at 2023 LLMs.
1. AI is developed to be smart enough to actually replace people, destroying the labor force and immensely concentrating power.
This seems like bs hyperbole but I am not an expert.
2. AI turns out to be a bubble of false promises and hype, bursts, and takes the stock market and economy with it.
I thought this was the most likely but I keep not hearing popping, so maybe it's:
3. AI continues to be a tool that can substantially increase productivity in some areas and cause huge societal changes in others. The AI companies keep the hype train going or maybe it tapers off over time until talk meets reality but "real" AI never shows up and the bubble never pops because it's not one. Eventually there are 0-3 new FAANG companies with untouchable control of a tech we increasingly have to use to stay relevant.
Even if we avoid option 1 and 2, 3 doesn't exactly bode well either.
Every few months you get an article of some executive bragging that he fired an entire department of people because of AI.
It was adversarial from the start. The idle rich who don’t have to work for a living and their sycophants who somehow believe they won’t be replaced vs … everyone else.
I used to think that the common tale of AI rebelling in Hollywood movies was unlikely. Turns out we don’t even need rogue AI, our fellow men are quite willing to wipe the rest of us out.
If suddenly anyone can code we're not that special anymore.
I think that you can easily address your concerns about this new technology (since we all are concerned about the future) but at the same time acknowledge how revolutionary it is.
While many seem to be anxious or pessimistic about the future of intellectual/artistic pursuits (understandable although I disagree), I do find the utter lack of curiosity or interest in the incredible machinery that is causing all the fuss to be striking.
Right now, we are in a transition period... Models are improving, but they are not yet capable of taking over.
Where do you see it being in a year's time? Or 2? Or 5?
It's interesting as a math problem and test of AI, but not much else IMO.
Other domains are extracting value but I feel like there's an order of magnitude difference. It raises the question, what other domains fit into these categories where the AI itself has pretty much free rein to verify its own results?
> the closer the expertise you spent your whole life building is to being worthless.
Perhaps it is time for life to be considered intrinsically valuable, instead of being "worthy" only based on output or capability. Disability, animal and environmental advocates have been fighting for this for a long time. Not too long ago women and minorities were in the same boat. Even now, there are many advocating and fighting for a return to the dark old days.
> Along with all the rest of what humans find meaningful and fulfilling.
Some humans. Many are content to enjoy simply existing, and the beauty of life and the universe around us. Just like many non-scientists today enjoy and benefit from the work of scientists, tomorrow too many will enjoy learning from, and applying the coming advancements and leaps in many fields.
And those of a scientist or other research-type mindset? No doubt they will contribute meaningfully by studying the frontier, noting what remains unanswered, and then advancing the frontier, just like researchers do today; just because scientists in the past solved many questions doesn't mean that there aren't any questions to answer today.
IMHO, AI means that the frontier expands faster, not that it is obliterated. Even AI cannot overcome the laws and limitations of physics/universe: even Dyson spheres only capture the energy of one star, thus setting a limit on the amount of compute, and thereby a limit on intelligence. And we are a loooong way from a Dyson sphere.
“For decades, it was widely believed that this rate was essentially the best possible, and no construction could improve significantly over the square grid. In technical terms, Erdős conjectured an upper bound of n^{1+o(1)} in which the additional o(1) indicates a term tending to 0 with n.
Our new result disproves this conjecture. More precisely, for infinitely many values of n, the proof constructs configurations of n points with at least n^{1+δ} unit-distance pairs, for some fixed exponent δ > 0. (The original AI proof does not give an explicit δ, but a forthcoming refinement due to Princeton mathematics professor Will Sawin has shown one can take δ = 0.014.)”
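For a feel of the square-grid baseline the quote refers to, here is a brute-force sketch that counts how often each distance occurs in a k x k integer grid. Rescaling the grid so the most frequent distance becomes 1 gives that many unit-distance pairs. The function names are mine, and this only illustrates the old grid idea, not the new construction.

```python
from collections import Counter
from itertools import combinations


def squared_distance_counts(k):
    """Count unordered point pairs in the k x k integer grid by squared distance."""
    points = [(x, y) for x in range(k) for y in range(k)]
    return Counter(
        (px - qx) ** 2 + (py - qy) ** 2
        for (px, py), (qx, qy) in combinations(points, 2)
    )


def count_pairs_at_squared_distance(k, d):
    """Number of unordered pairs in the k x k grid at squared distance d."""
    return squared_distance_counts(k)[d]


def most_common_distance(k):
    """Return (squared_distance, pair_count) for the most frequent distance."""
    return squared_distance_counts(k).most_common(1)[0]


if __name__ == "__main__":
    for k in (3, 5, 10, 20):
        d, pairs = most_common_distance(k)
        print(f"n = {k * k:4d} points: squared distance {d} occurs {pairs} times")
```

Already at k = 5 the winner is squared distance 5 (48 pairs) rather than the obvious distance 1 (40 pairs), because 5 can be written as a sum of two squares in more ways. Picking distances with many such representations is the trick behind the grid bound.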
Can anyone point me to a diagram of the newly found optimal arrangement?
Can anyone point me to a diagram of what the newly found solution looks like?
Everything is a grift.
What are the odds that if they ran the same prompt from scratch, with the same context and instructions, it would arrive at the same answer? Unlikely. I think it's more likely that this is a 1:500000 chance and OpenAI can afford to brute force this result and justify the expense for marketing.
Is there anywhere an image example of a superior layout for example with n>={100,1000,10000}..? I would love to see it. I am imagining it would look somewhat like a sloppy pizza.
What was the process of writing the paper? Was the question asked by a mathematician? Was the paper right from the get-go or was there someone who pointed out mistakes?
How many attempts were made before a solution was found?
I will eat my words if an AI oneshotted that one without any external help, but for now I am left wondering whether it's a new way to attribute discoveries to companies instead of the people who put the work in.
Nevertheless new maths is exciting and might lead to what I find slightly more interesting - new physics.
As per the report, the prompt used to solve the problem is AI-written and the solution was initially graded by an AI grading pipeline. They don't say this explicitly, but it seems like OpenAI has an automatic pipeline where they prompt models for solutions to famous math problems (which wouldn't be unexpected given how flashy a solution to a famous math problem looks)
> Was the paper right from the get-go or was there someone who pointed out mistakes?
Also as per the report, the output of the model isn't really a "paper"; it's a very terse 2 page solution which is apparently correct. The paper was later written based on this solution to make it more presentable.
> How many attempts were made before a solution was found?
Given that this appears to be from an automated pipeline, I would say that it had many attempts. But either way, the blogpost says that with enough test-time compute, the model finds this same solution 50% of the time.
[1] https://cdn.openai.com/pdf/74c24085-19b0-4534-9c90-465b8e29a...
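To put the two numbers floating around this thread side by side (the guessed 1:500000 odds and the reported 50% per run), here is a back-of-the-envelope sketch. It assumes every attempt is independent with the same success probability, which is my simplification and not something the report states; the function names are mine too.

```python
import math


def p_at_least_one(p, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def attempts_for(p, target):
    """Smallest number of independent tries needed to reach `target` overall success."""
    # log1p(-x) computes log(1 - x) accurately even when x is tiny.
    return math.ceil(math.log1p(-target) / math.log1p(-p))


if __name__ == "__main__":
    for p in (0.5, 1 / 500_000):
        print(f"p = {p:.6g}: 5 tries -> {p_at_least_one(p, 5):.6f}, "
              f"tries for 50% overall -> {attempts_for(p, 0.5)}")
```

Under that assumption the two scenarios are very far apart: a 50% per-run rate needs a single run for even odds, while 1:500000 odds would need several hundred thousand runs.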
Can you be more specific? I'm still under the impression that Mythos was a huge deal:
https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...
https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-v...
like having a colleague peer review your paper, or bouncing ideas off a mentor before you write them down?
I agree there's a lot of AI marketing BS at the moment, but revising approaches based on feedback is a good thing.
But peer review is a powerful tool.
Carefully choosing what lemmas to give for solving and reviewing the result is my favorite way to teach young minds. Yes, they do solve most problems themselves. But most of them likely wouldn't be able to do that unless someone dissects the problem beforehand and points at weak spots in their explanations.
And that's why I question who prompted the model, how they prompted it, and how much their own ideas influenced the output.
I admit, I don't know enough to judge how much of the right solution was actually enclosed in a first reply
If a for-profit (because... you know, OpenAI isn't at all what it initially was) huge corporation (again, not a cute startup trying to help humanity) publishes anything it's a piece of marketing. Every single word a corporation says is marketing.
So... that's also that, a piece of marketing to sell more of whatever their potential client can buy. It's not a piece of research. It's an ad. That's it.
I guess you can get some estimate from the excerpted CoT, but that CoT might be backed by quite a lot of parallel compute.
The conjecture was about an upper bound for the maximum number of pairs. It has been disproven.
Was the Erdos problem the conjecture itself, or was it about the actual maximum number of pairs? (In which case it will probably never be solved.)
The problem is defined in the narrow version here: https://www.erdosproblems.com/90
Since loglog(n) tends to infinity with n, the additional term in the exponent tends to 0, meaning these constructions achieve growth only slightly faster than linear.
Would anyone else describe the previous asymptotic behavior like that? I mean obviously loglogn to O(1) is a quantum leap, but wouldn't you describe loglogn as "grows so slowly it's almost constant", so the constructions achieve growth "almost n^{1+c}"? But I guess that might be overcorrecting too hard.
What I meant is that they describe loglogn the same way you could describe O(n) or O(n^2) -- it "tends to infinity with n", even though my mental model for loglogn is to treat it as barely more than constant. See: https://cs.stackexchange.com/questions/148197/who-said-first...
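To make the "loglogn is barely more than constant" intuition concrete: as I understand it, the older grid-type constructions give something like n^{1 + c/loglog n}, versus n^{1+δ} with δ = 0.014 from the announcement. The sketch below asks how large n has to be before c/loglog n even drops to δ. The value c = 1 is purely a placeholder I picked; the real constant is not given in this thread.

```python
import math

DELTA = 0.014  # fixed exponent gain quoted in the announcement
C = 1.0        # hypothetical constant in c / loglog(n); an assumption for illustration


def grid_extra_exponent(ln_n, c=C):
    """Extra exponent c / ln(ln n), taking ln(n) as input so huge n does not overflow."""
    return c / math.log(ln_n)


def crossover_ln_n(delta=DELTA, c=C):
    """Value of ln(n) at which c / ln(ln n) has shrunk down to delta."""
    return math.exp(c / delta)


if __name__ == "__main__":
    for ln_n in (10.0, 100.0, 1e6, 1e12):
        print(f"ln n = {ln_n:g}: extra exponent {grid_extra_exponent(ln_n):.4f}")
    print(f"extra exponent reaches {DELTA} only when ln n is about {crossover_ln_n():.3g}")
```

With that placeholder constant, the grid-style exponent stays above 1 + δ for every n you could ever write down, so in practice the old term does behave like a constant. The new result only wins asymptotically, which is exactly why "tends to 0" and "almost constant" are both fair descriptions.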