History LLMs: Models trained exclusively on pre-1913 texts (opens in new tab)

(github.com)

897 pointsiamwil3mo ago421 comments

421 comments

“Time-locked models don't roleplay; they embody their training data. Ranke-4B-1913 doesn't know about WWI because WWI hasn't happened in its textual universe. It can be surprised by your questions in ways modern LLMs cannot.”

“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”

This is really fascinating. As someone who reads a lot of history and historical fiction I think this is really intriguing. Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.

jscyc3mo ago

When you put it that way it reminds me of the Severn/Keats character in the Hyperion Cantos. Far-future AIs reconstruct historical figures from their writings in an attempt to gain philosophical insights.

srtw3mo ago

The Hyperion Cantos is such an incredible work of fiction. Currently re-reading and am midway through the fourth book The Rise Of Endymion; this series captivates my imagination and would often find myself idly reflecting on it and the characters within more than a decade after reading. Like all works, it has its shortcomings, but I can give no higher recommendation than the first two books.

1 more reply

bikeshaving3mo ago

This isn’t science fiction anymore. CIA is using chatbot simulations of world leaders to inform analysts. https://archive.ph/9KxkJ

8 more replies

abrookewood3mo ago

This is such a ridiculously good series. If you haven't read it yet, I thoroughly recommend it.

culi3mo ago

I used to follow this blog — I believe it was somehow associated with Slate Star Codex? — anyways, I remember the author used to do these experiments on themselves where they spent a week or two only reading newspapers/media from a specific point in time and then wrote a blog about their experiences/takeaways

On that same note, there was this great YouTube series called The Great War. It spanned from 2014-2018 (100 years after WW1) and followed WW1 developments week by week.

verve_rat3mo ago

The people that did the Great War series (at least some of them, I believe there was a little bit of a falling out) went on to do a WWII version on the World War II channel: https://youtube.com/@worldwartwo

They are currently in the middle of a Korean War version: https://youtube.com/@thekoreanwarbyindyneidell

tyre3mo ago

The Great War series is phenomenal. A truly impressive project.

pwillia73mo ago

This is why the impersonation stuff is so interesting with LLMs -- If you ask chatGPT a question without a 'right' answer, and then tell it to embody someone you really want to ask that question to, you'll get a better answer with the impersonation. Now, is this the same phenomenon that causes people to lose their minds with the LLMs? Possibly. Is it really cool asking followup philosophy questions to the LLM Dalai Lama after reading his book? Yes.

Sprotch3mo ago

Nice idea, does not work

1 more reply

staticman23mo ago

Why is that cool?

Imagine you are a billionaire so money is no object and really interested in the Dhali Llama?

Would you read the book then hire someone to pretend to be the author and ask questions that are not covered by the book? Then be enraptured by whatever the roleplayer invents?

Probably not? At least this isn't a phenomenon I've heard of?

ghurtado3mo ago

This might just be the closest we get to a time machine for some time. Or maybe ever.

Every "King Arthur travels to the year 2000" kinda script is now something that writes itself.

> Imagine having a conversation with someone genuinely from the period,

Imagine not just someone, but Aristotle or Leonardo or Kant!

RobotToaster3mo ago

I imagine King Arthur would say something like: Hwæt spricst þu be?

1 more reply

anthk3mo ago

Easier with Cervantes for Spanish speakers than King Arhur or Shakespeare.

With Alphonse X, o The Cid, it would be greater issues, but understandable over weeks.

takeda3mo ago

> This is really fascinating. As someone who reads a lot of history and historical fiction I think this is really intriguing. Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.

Having the facts from the era is one thing, to make conclusions about things it doesn't know would require intelligence.

1 more reply

psychoslave3mo ago

>Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.

Isn't this part of the basics feature of human conditions? Not only we are all unaware of the coming historic outcome (though we can get some big points with more or less good guesses), but to a marginally variable extend, we are also very unaware of past and present history.

LLM are not aware, but they can be trained on larger historical accounts than any human and regurgitate syntactically correct summary on any point within it. Very different kind of utterer.

pwillia73mo ago

captain hindsight

1 more reply

observationist3mo ago

This is definitely fascinating - being able to do AI brain surgery, and selectively tuning its knowledge and priors, you'd be able to create awesome and terrifying simulations.

nottorp3mo ago

You can't. To use your terms, you have to "grow" a new LLM. "Brain surgery" would be modifying an existing model and that's exactly what they're trying to avoid.

ilaksh3mo ago

Activation steering can do that to some degree, although normally it's just one or two specific things or rather than a whole set of knowledge.

eek21213mo ago

Respectfully, LLMs are nothing like a brain, and I discourage comparisons between the two, because beyond a complete difference in the way they operate, a brain can innovate, and as of this moment, an LLM cannot because it relies on previously available information.

LLMs are just seemingly intelligent autocomplete engines, and until they figure a way to stop the hallucinations, they aren't great either.

Every piece of code a developer churns out using LLMs will be built from previous code that other developers have written (including both strengths and weaknesses, btw). Every paragraph you ask it to write in a summary? Same. Every single other problem? Same. Ask it to generate a summary of a document? Don't trust it here either. [Note, expect cyber-attacks later on regarding this scenario, it is beginning to happen -- documents made intentionally obtuse to fool an LLM into hallucinating about the document, which leads to someone signing a contract, conning the person out of millions].

If you ask an LLM to solve something no human has, you'll get a fabrication, which has fooled quite a few folks and caused them to jeopardize their career (lawyers, etc) which is why I am posting this.

6 more replies

Sprotch3mo ago

This is the point - a modern LLM "role playing" pre-1913 would only reflect our view today of what someone from that era would say. It woud not be accurate.

diamond5593mo ago

Yeah, whenever we figure out time travel that will be really cool. In the meantime we have autocorrect trained on internet facts and modern textbooks that can never truly understand anything let alone what is was like to live hundreds of years ago.

throawayonthe3mo ago

i get what you're saying, but the post is specifically about models that were not trained on the internet/modern textbooks

xg153mo ago

"...what do you mean, 'World War One?'"

tejohnso3mo ago

I remember reading a children's book when I was young and the fact that people used the phrase "World War One" rather than "The Great War" was a clue to the reader that events were taking place in a certain time period. Never forgot that for some reason.

I failed to catch the clue, btw.

4 more replies

gaius_baltar3mo ago

> "...what do you mean, 'World War One?'"

Oh sorry, spoilers.

(Hell, I miss Capaldi)

inferiorhuman3mo ago

… what do you mean, an internet where everything wasn't hidden behind anti-bot captchas?

LordDragonfang3mo ago

Perhaps I'm overly sensitive to this and terminally online, but that first quote reads as a textbook LLM-generated sentence.

"<Thing> doesn't <action>, it <shallow description that's slightly off from how you would expect a human to choose>"

Later parts of the readme (whole section of bullets enumerating what it is and what it isn't, another LLM favorite) make me more confident that significant parts of the readme is generated.

I'm generally pro-AI, but if you spend hundreds of hours making a thing, I'd rather hear your explanation of it, not an LLM's.

ViktorRay3mo ago

Reminds me of this scene from a Doctor Who episode

https://youtu.be/eg4mcdhIsvU

I’m not a Doctor Who fan and haven’t seen the rest of the episode and I don’t even what this episode was about but I thought this scene was excellent.

Sieyk3mo ago

I was going to say the same thing. Its really hard to explain the concept of "convincing but undoubtedly pretending", yet they captured that concept so beautifully here.

anshumankmr3mo ago

>where they don’t know the “end of the story”.

Applicable to us also, cause we do not know how the current story ends either, of the post pandemic world as we know it now.

DGoettlich3mo ago

exactly

rcpt3mo ago

Watching a modern LLM chat with this would be fun.

Davidbrcz3mo ago

That's some Westworld level of discussion

seizethecheese3mo ago

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire. Not just survey them with preset questions, but engage in open-ended dialogue, probe their assumptions, and explore the boundaries of thought in that moment.

Hell yeah, sold, let’s go…

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

Oh. By “imagine you could interview…” they didn’t mean me.

DGoettlich3mo ago

understand your frustration. i trust you also understand the models have some dark corners that someone could use to misrepresent the goals of our project. if you have ideas on how we could make the models more broadly accessible while avoiding that risk, please do reach out @ history-llms@econ.uzh.ch

9999000009993mo ago

Ok...

So as a black person should I demand that all books written before the civil rights act be destroyed?

The past is messy. But it's the only way to learn anything.

All an LLM does it's take a bunch of existing texts and rebundle them. Like it or not, the existing texts are still there.

I understand an LLM that won't tell me how to do heart surgery. But I can't fear one that might be less enlightened on race issues. So many questions to ask! Hell, it's like talking to older person in real life.

I don't expect a typical 90 year old to be the most progressive person, but they're still worth listening too.

1 more reply

tombh3mo ago

Of course, I have to assume that you have considered more outcomes than I have. Because, from my five minutes of reflection as a software geek, albeit with a passion for history, I find this the most surprising thing about the whole project.

I suspect restricting access could equally be a comment on modern LLMs in general, rather than the historical material specifically. For example, we must be constantly reminded not to give LLMs a level of credibility that their hallucinations would have us believe.

But I'm fascinated by the possibility that somehow resurrecting lost voices might give an unholy agency to minds and their supporting worldviews that are so anachronistic that hearing them speak again might stir long-banished evils. I'm being lyrical for dramatic affect!

I would make one serious point though, that do I have the credentials to express. The conversation may have died down, but there is still a huge question mark over, if not the legality, but certainly the ethics of restricting access to, and profiting from, public domain knowledge. I don't wish to suggest a side to take here, just to point out that the lack of conversation should not be taken to mean that the matter is settled.

1 more reply

bogedy2mo ago

I'm surprised that you're even mentioning this risk. This isn't actually a risk. Anyone who would make an issue of this deserves to be confronted. Your abundance of caution is a self-fulfilling prophecy.

qcnguy3mo ago

There's no such risk so you're not going to get any sensible ideas in response to this question. The goals of the project are history, you already made that clear. There's nothing more that needs to be done.

We all get that academics now exist in some kind of dystopian horror where they can get transitively blamed for the existence of anyone to the right of Lenin, but bear in mind:

1. The people who might try to cancel you are idiots unworthy of your respect, because if they're against this project, they're against the study of history in its entirety.

2. They will scream at you anyway no matter what you do.

3. You used (Swiss) taxpayer funds to develop these models. There is no moral justification for withholding from the public what they worked to pay for.

You already slathered your README with disclaimers even though you didn't even release the model at all, just showed a few examples of what it said - none of which are in any way surprising. That is far more than enough. Just release the models and if anyone complains, politely tell them to go complain to the users.

ThePyCoder3mo ago

I'm not sure I do. It feels like someone might for example have compiled a full library of books, newspapers and other writing from that era, only to then limit access to that library, doing the exact censorship I imagine the project was started to alleviate.

Now were it limited in access to ask money to compensate for the time and money spent compiling the library (or training the model), sure, I'd somewhat understand. Not agree but understand.

Now it just feels like you want to prevent your model name being associated with the one guy who might use it to create a racist slur Twitter bot. There's plenty of models for that already. At least the societal balance of a model like this would also have enough weight on the positive side to be net positive.

naasking3mo ago

What are the legal or other ramifications of people misrepresenting the goals of your project? What is it you're worried about exactly?

diamond5593mo ago

Yet your project relies on letting an llm synthesize historical documents and presenting itself as some sort of expert from the time? You are aware of the hallucination rates surely but don't care whether the information your university presents is accurate or are you going to monitor all output from your llm?

pigpop3mo ago

This is understandable and I think others ITT should appreciate the legal and PR ramifications involved.

charlesguy3mo ago

just release the model and stop trying to play god.

unethical_ban3mo ago

A disclaimer on the site that you are not bigoted or genocidal, and that worldviews from the 1913 era were much different than today and don't necessarily reflect your project.

Movie studios have done that for years with old movies. TCM still shows Birth of a Nation and Gone with the Wind.

Edit: I saw further down that you've already done this! What more is there to do?

leoedin3mo ago

It's a shame isn't it! The public must be protected from the backwards thoughts of history. In case they misuse it.

I guess what they're really saying is "we don't want you guys to cancel us".

stainablesteel3mo ago

i think it's fine, thank these people for coming up with the idea and people are going to start doing this in their basement then releasing it to huggingface

danielbln3mo ago

How would one even "misuse" a historical LLM, ask it how to cook up sarine gas in a trench?

hearsathought3mo ago

You "misuse" it by using it to get at truth and more importantly historical contradictions and inconsistencies. It's the same reason catholic church kept the bible from the masses by keeping it in latin. The same reason printing press was controlled. Many of the historical "truths" we are told are nonsense at best or twisted to fit an agenda at worst.

What do these people fear the most? That the "truth" they been pushing is a lie.

stocksinsmocks3mo ago

Its output might violate speech codes, and in much of the EU that is penalized much more seriously than violent crime.

DonHopkins3mo ago

Ask it to write a document called "Project 2025".

2 more replies

ImHereToVote3mo ago

I wonder how much GPU compute you would need to create a public domain version of this. This would be a really valuable for the general public.

wongarsu3mo ago

To get a single knowledge-cutoff they spent 16.5h wall-clock hours on a cluster of 128 NVIDIA GH200 GPUs (or 2100 GPU-hours), plus some minor amount of time for finetuning. The prerelease_notes.md in the repo is a great description on how one would achieve that

1 more reply

pizzathyme3mo ago

They did mean you, they just meant "imagine" very literally!

BoredPositron3mo ago

You would get pretty annoyed on how we went backwards in some regards.

speedgoose3mo ago

Such as?

1 more reply

anotherpaulg3mo ago

It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

Einstein’s paper “On the Electrodynamics of Moving Bodies” with special relativity was published in 1905. His work on general relativity was published 10 years later in 1915. The earliest knowledge cuttoff of these models is 1913, in between the relativity papers.

The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.

ghurtado3mo ago

> It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

Definitely. Even more interesting could be seeing them fall into the same trappings of quackery, and come up with things like over the counter lobotomies and colloidal silver.

On a totally different note, this could be very valuable for writing period accurate books and screenplays, games, etc ...

danielbln3mo ago

Accurate-ish, let's not forget their tendency to hallucinate.

mlinksva3mo ago

Different cutoff but similar question thrown out in https://www.dwarkesh.com/p/thoughts-on-sutton#:~:text=If%20y... inspiring https://manifold.markets/MikeLinksvayer/llm-trained-on-data-...

machinationu3mo ago

the issue is there is very little text before the internet, so not enough historical tokens to train a really big model

concinds3mo ago

And it's a 4B model. I worry that nontechnical users will dramatically overestimate its accuracy and underestimate hallucinations, which makes me wonder how it could really be useful for academic research.

1 more reply

tgv3mo ago

I think not everyone in this thread understands that. Someone wrote "It's a time machine", followed up by "Imagine having a conversation with Aristotle."

crazygringo3mo ago

There's quite a lot of text in pre-Internet daily newspapers, of which there were once thousands worldwide.

When you're looking at e.g. the 19th century, a huge number are preserved somewhere in some library, but the vast majority don't seem to be digitized yet, given the tremendous amount of work.

Given how much higher-quality newspaper content tends to be compared to the average internet forum thread, there actually might be quite a decent amount of text. Obviously still nothing compared to the internet, but still vastly larger than just from published books. After all, print newspapers were essentially the internet of their day. Oh, and don't forget pamphlets in the 18th century.

lm284693mo ago

> the issue is there is very little text before the internet,

Hm there is a lot of text from before the internet, but most of it is not on internet. There is a weird gap in some circles because of that, people are rediscovering work from pre 1980s researchers that only exist in books that have never been re-edited and that virtually no one knows about.

1 more reply

bondarchuk3mo ago

>Historical texts contain racism, antisemitism, misogyny, imperialist views. The models will reproduce these views because they're in the training data. This isn't a flaw, but a crucial feature—understanding how such views were articulated and normalized is crucial to understanding how they took hold.

Yes!

>We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

Noooooo!

So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?

DGoettlich3mo ago

fully understand you. we'd like to provide access but also guard against misrepresentations of our projects goals by pointing to e.g. racist generations. if you have thoughts on how we should do that, perhaps you could reach out at history-llms@econ.uzh.ch ? thanks in advance!

myrmidon3mo ago

What is your worst-case scenario here?

Something like a pop-sci article along the lines of "Mad scientists create racist, imperialistic AI"?

I honestly don't see publication of the weights as a relevant risk factor, because sensationalist misrepresentation is trivially possible with the given example responses alone.

I don't think such pseudo-malicious misrepresentation of scientific research can be reliably prevented anyway, and the disclaimers make your stance very clear.

On the other hand, publishing weights might lead to interesting insights from others tinkering with the models. A good example for this would be the published word prevalence data (M. Brysbaert et al @Ghent University) that led to interesting follow-ups like this: https://observablehq.com/@yurivish/words

I hope you can get the models out in some form, would be a waste not to, but congratulations on a fascinating project regardless!

1 more reply

superxpro123mo ago

Perhaps you could detect these... "dated"... conclusions and prepend a warning to the responses? IDK.

I think the uncensored response is still valuable, with context. "Those who cannot remember the past are condemned to repeat it" sort of thing.

bondarchuk3mo ago

You can guard against misrepresentations of your goals by stating your goals clearly, which you already do. Any further misrepresentation is going to be either malicious or idiotic, a university should simply be able to deal with that.

Edit: just thought of a practical step you can take: host it somewhere else than github. If there's ever going to be a backlash the microsoft moderators might not take too kindly to the stuff about e.g. homosexuality, no matter how academic.

xpe3mo ago

> So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?

1. This implies a false equivalence. Releasing a new interactive AI model is indeed different in significant and practical ways from the status quo. Yes, there are already-released historical texts. The rational thing to do is weigh the impacts of introducing another thing.

2. Some people have a tendency to say "release everything" as if open-source software is equivalent to open-weights models. They aren't. They are different enough to matter.

3. Rhetorically, the quote across comes across as a pressure tactic. When I hear "are you going to do this or not?" I cringe.

4. The quote above feels presumptive to me, as if the commenter is owed something from the history-llms project.

5. People are rightfully bothered that Big Tech has vacuumed up public domain and even private information and turned it into a profit center. But we're talking about a university project with (let's be charitable) legitimate concerns about misuse.

6. There seems to be a lack of curiosity in play. I'd much rather see people asking e.g. "What factors are influencing your decision about publishing your underlying models?"

7. There are people who have locked-in a view that says AI-safety perspectives are categorically invalid. Accordingly, they have almost a knee-jerk reaction against even talk of "let's think about the implications before we release this."

8. This one might explain and underly most of the other points above. I see signs of a deeper problem at work here. Hiding behind convenient oversimplifications to justify what one wants does not make a sound moral argument; it is motivated reasoning a.k.a. psychological justification.

DGoettlich3mo ago

well put.

Sprotch3mo ago

I suspect you will find a lot less of these "bad things" than anticipated. That is why the model should actually be freely available rather than restricted based on pre-conceived notions that will, I am sure, prove inaccurate.

p-e-w3mo ago

It’s as if every researcher in this field is getting high on the small amount of power they have from denying others access to their results. I’ve never been as unimpressed by scientists as I have been in the past five years or so.

“We’ve created something so dangerous that we couldn’t possibly live with the moral burden of knowing that the wrong people (which are never us, of course) might get their hands on it, so with a heavy heart, we decided that we cannot just publish it.”

Meanwhile, anyone can hop on an online journal and for a nominal fee read articles describing how to genetically engineer deadly viruses, how to synthesize poisons, and all kinds of other stuff that is far more dangerous than what these LARPers have cooked up.

physicsguy3mo ago

> It’s as if every researcher in this field is getting high on the small amount of power they have from denying others access to their results. I’ve never been as unimpressed by scientists as I have been in the past five years or so.

This is absolutely nothing new. With experimental things, it's non uncommon for a lab to develop a new technique and omit slight but important details to give them a competitive advantage. Similarly in the simulation/modelling space it's been common for years for researchers to not publish their research software. There's been a lot of lobbying on that side by groups such as the Software Sustainability Institute and Research Software Engineer organisations like RSE UK and RSE US, but there's a lot of researchers that just think that they shouldn't have to do it, even when publicly funded.

1 more reply

paddleon3mo ago

> “We’ve created something so dangerous that we couldn’t possibly live with the moral burden of knowing that the wrong people (which are never us, of course) might get their hands on it, so with a heavy heart, we decided that we cannot just publish it.”

Or, how about, "If we release this as is, then some people will intentionally mis-use it and create a lot of bad press for us. Then our project will get shut down and we lose our jobs"

Be careful assuming it is a power trip when it might be a fear trip.

I've never been as unimpressed by society as I have been in the last 5 years or so.

1 more reply

xpe3mo ago

> It’s as if every researcher in this field is getting high on the small amount of power they have from denying others access to their results.

Even if I give the comment a lot of wiggle room (such as changing "every" to "many"), I don't think even a watered-down version of this hypothesis passes Occam's razor. There are more plausible explanations, including (1) genuine concern by the authors; (2) academic pressures and constraints; (c) reputational concerns; (d) self-interest to embargo underlying data so they have time to be the first to write-it-up. To my eye, none of these fit the category of "getting high on power".

Also, patience is warranted. We haven't seen what these researchers are doing to release -- and from what I can tell, they haven't said yet. At the moment I see "Repositories (coming soon)" on their GitHub page.

patapong3mo ago

I think it's more likely they are terrified of someone making a prompt that gets the model to say something racist or problematic (which shouldn't be too hard), and the backlash they could receive as a result of that.

2 more replies

everythingfine93mo ago

Wow, this is needlessly antagonistic. Given the emergence of online communities that bond on conspiracy theories and racist philosophies in the 20th century, it's not hard to imagine the consequences of widely disseminating an LLM that could be used to propagate and further these discredited (for example, racial) scientific theories for bad ends by uneducated people in these online communities.

We can debate on whether it's good or not, but ultimately they're publishing it and in some very small way responsible for some of its ends. At least that's how I can see their interest in disseminating the use of the LLM through a responsible framework.

1 more reply

f13f1f1f13mo ago

Scientists have always been generally self interested amoral cowards, just like every other person. They aren't a unique or higher form of human.

derrida3mo ago

I wonder if you could query some of the ideas of Frege, Peano, Russell and see if it could through questioning get to some of the ideas of Goedel, Church and Turing - and get it to "vibe code" or more like "vibe math" some program in lambda calculus or something.

Playing with the science and technical ideas of the time would be amazing, like where you know some later physicist found some exception to a theory or something, and questioning the models assumptions - seeing how a model of that time may defend itself, etc.

andoando3mo ago

This is my curiosity too. Would be a great test of how intelligent LLM's actually are. Can they follow a completely logical train of thought inventing something totally outside their learned scope?

int_19h3mo ago

You definitely won't get that out of a 4B model tho.

raddan3mo ago

Brilliant. I love this idea!

AnonymousPlanet3mo ago

There's an entire subreddit called LLMPhysics dedicated to "vibe physics". It's full of people thinking they are close to the next breakthrough encouraged by sycophantic LLMs while trying to prove various crackpot theories.

I'd be careful venturing out into unknown territory together with an LLM. You can easily lure yourself into convincing nonsense with no one to pull you out.

kqr3mo ago

Agreed, which is why what GP suggests is much more sensible: it's venturing into known territory, except only one party of the conversation knows it, and the other literally cannot know it. It would be a fantastic way to earn fast intuition for what LLMs are capable of and not.

andai3mo ago

Fully automated toaster-fucker generator!

https://news.ycombinator.com/item?id=25667362

1 more reply

Heliodex3mo ago

The sample responses given are fascinating. It seems more difficult than normal to even tell that they were generated by an LLM, since most of us (terminally online) people have been training our brains' AI-generated text detection on output from models trained with a recent cutoff date. Some of the sample responses seem so unlike anything an LLM would say, obviously due to its apparent beliefs on certain concepts, though also perhaps less obviously due to its word choice and sentence structure making the responses feel slightly 'old-fashioned'.

libraryofbabel3mo ago

I used to teach 19th-century history, and the responses definitely sound like a Victorian-era writer. And they of course sound like writing (books and periodicals etc) rather than "chat": as other responders allude to, the fine-tuning or RL process for making them good at conversation was presumably quite different from what is used for most chatbots, and they're leaning very heavily into the pre-training texts. We don't have any living Victorians to RLHF on: we just have what they wrote.

To go a little deeper on the idea of 19th-century "chat": I did a PhD on this period and yet I would be hard-pushed to tell you what actual 19th-century conversations were like. There are plenty of literary depictions of conversation from the 19th century of presumably varying levels of accuracy, but we don't really have great direct historical sources of everyday human conversations until sound recording technology got good in the 20th century. Even good 19th-century transcripts of actual human speech tend to be from formal things like court testimony or parliamentary speeches, not everyday interactions. The vast majority of human communication in the premodern past was the spoken word, and it's almost all invisible in the historical sources.

Anyway, this is a really interesting project, and I'm looking forward to trying the models out myself!

nemomarx3mo ago

I wonder if the historical format you might want to look at for "Chat" is letters? Definitely wordier segments, but it's at least the back and forth feel and we often have complete correspondence over long stretches from certain figures.

This would probably get easier towards the start of the 20th century ofc

1 more reply

dleeftink3mo ago

While not specifically Victorian, couldn't we learn much from what daily conversations were like by looking at surviving oral cultures, or other relatively secluded communal pockets? I'd also say time and progress are not always equally distributed, and even within geographical regions (as the U.K.) there are likely large differences in the rate of language shifts since then, some possibly surviving well into the 20th century.

NooneAtAll33mo ago

don't we have parlament transcripts? I remember something about Germany (or maybe even Prussia) developing fast script to preserve 1-to-1 what was said

1 more reply

bryancoxwell3mo ago

Fascinating, thanks for sharing

DGoettlich3mo ago

very interesting observation!

_--__--__3mo ago

The time cutoff probably matters but maybe not as much as the lack of human finetuning from places like Nigeria with somewhat foreign styles of English. I'm not really sure if there is as much of an 'obvious LLM text style' in other languages, it hasn't seemed that way in my limited attempts to speak to LLMs in languages I'm studying.

d3m0t3p3mo ago

The model is fined tuned for chat behavior. So the style might be due to - Fine tuning - More Stylised text in the corpus, english evolved a lot in the last century.

1 more reply

anonymous9082133mo ago

There is. I have observed it in both Chinese and Japanese.

kccqzy3mo ago

Oh definitely. One thing that immediately caught my mind is that the question asks the model about “homosexual men” but the model starts the response with “the homosexual man” instead. Changing the plural to the singular and then adding an article. Feels very old fashioned to me.

tonymet3mo ago

the samples push the boundaries of a commercial AI, but still seem tame / milquetoast compared to common opinions of that era. And the prose doesn't compare. Something is off.

mmooss3mo ago

On what data is it trained?

On one hand it says it's trained on,

> 80B tokens of historical data up to knowledge-cutoffs ∈ 1913, 1929, 1933, 1939, 1946, using a curated dataset of 600B tokens of time-stamped text.

Literally that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all examples are about Europe), it would include the ancient Greeks, Romans, etc., Scholastics, Charlemagne, .... all up to present day.

But they seem to say it represents the 1913 viewpoint:

On one hand, they say it represents the perspective of 1913; for example,

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

> When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.

People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napolean or Viking coastal raids or Holy Wars. How do they accomplish a 1913 perspective?

zozbot2343mo ago

They apparently pre-train with all data up to 1900 and then fine-tune with 1900-1913 data. Anyway, the amount of available content tends to increase quickly over time, as instances of content like mass literature, periodicals, newspapers etc. only really became a thing throughout the 19th and early 20th century.

mmooss3mo ago

They pre-train with all data up to 1900 and then fine-tune with 1900-1913 data.

Where does it say that? I tried to find more detail. Thanks.

1 more reply

andy993mo ago

I’d like to know how they chat-tuned it. Getting the base model is one thing, did they also make a bunch of conversations for SFT and if so how was it done?

  We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).

So they are chat tuning, I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is.

jeffjeffbear3mo ago

They have some more details at https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

Basically using GPT-5 and being careful

andy993mo ago

I wonder if they know about this, basically training on LLM output can transmit information or characteristics not explicitly included https://alignment.anthropic.com/2025/subliminal-learning/

I’m curious, they have the example of raw base model output; when LLMs were first identified as zero shot chatbots there was usually a prompt like “A conversation between a person and a helpful assistant” that preceded the chat to get it to simulate a chat.

Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to try and prime for responses?

I also wonder about the whether the whole concept of “chat” makes sense in 18XX. We had the idea of AI and chatbots long before we had LLMs so they are naturally primed for it. It might make less sense as a communication style here and some kind of correspondence could be a better framing.

1 more reply

QuadmasterXLII3mo ago

Thank you that helps to inject a lot of skepticism. I was wondering how it so easily worked out what Q: A: stood for when that formatting took off in the 1940s

1 more reply

Aerolfos3mo ago

Ok so it was that. The responses given did sound off, while it has some period-appropriate mannerisms, and has entire sections basically rephrased from some popular historical texts, it seems off compared to reading an actual 1900s text. The overall vibe just isn't right, it seems too modern, somehow.

I also wonder that you'd get this kind of performance with actual, just pre-1900s text. LLMs work because they're fed terabytes of text, if you just give it gigabytes you get a 2019 word model. The fundamental technology is mostly the same, after all.

1 more reply

tonymet3mo ago

This explains why it uses modern prose and not something from the 19th century and earlier

zozbot2343mo ago

You could extract quoted speech from the data (especially in Q&A format) and treat that as "chat" that the model should learn from.

nospice3mo ago

I'm surprised you can do this with a relatively modest corpus of text (compared to the petabytes you can vacuum up from modern books, Wikipedia, and random websites). But if it works, that's actually fantastic, because it lets you answer some interesting questions about LLMs being able to make new discoveries or transcend the training set in other ways. Forget relativity: can an LLM trained on this data notice any inconsistencies in its scientific knowledge, devise experiments that challenge them, and then interpret the results? Can it intuit about the halting problem? Theorize about the structure of the atom?...

Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.

andy993mo ago

The chinchilla paper says the “optimal” training data set size is about 20x the number of parameters (in tokens), see table 3: https://arxiv.org/pdf/2203.15556

Here they do 80B tokens for a 4B model.

EvgeniyZh3mo ago

It's worth noting that this is "compute-bound optimal", i.e., given fixed compute, the optimal choice is 20:1.

Under Chinchilla model the larger model always performs better than the small one if trained on the same amount of data. I'm not sure if it is true empirically, and probably 1-10B is a good guess for how large the model trained on 80B tokens should be.

Similarly, the small models continue to improve beyond 20:1 ratio, and current models are trained on much more data. You could train a better performing model using the same compute, but it would be larger which is not always desirable.

Aerolfos3mo ago

> https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

Given the training notes, it seems like you can't get the performance they give examples of?

I'm not sure about the exact details but there is some kind of targetted distillation of GPT-5 involved to try and get more conversational text and better performance. Which seems a bit iffy to me.

DGoettlich3mo ago

Thanks for the comment. Could you elaborate on what you find iffy about our approach? I'm sure we can improve!

1 more reply

frahs3mo ago

Wait so what does the model think that it is? If it doesn't know computers exist yet, I mean, and you ask it how it works, what does it say?

DGoettlich3mo ago

We tell it that its a person (no gender) living in <cutoff>: we show the chat template in the prerelease notes https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

20k3mo ago

Models don't think they're anything, they'll respond with whatever's in their context as to how they've been directed to act. If it hasn't been told to have a persona, it won't think its anything, chatgpt isn't sentient

crazygringo3mo ago

That's my first question too. When I first started using LLM's, I was amazed at how thoroughly it understood what it itself was, the history of its development, how a context window works and why, etc. I was worried I'd trigger some kind of existential crisis in it, but it seemed to have a very accurate mental model of itself, and could even trace the steps that led it to deduce it really was e.g. the ChatGPT it had learned about (well, the prior versions it had learned about) in its own training.

But with pre-1913 training, I would indeed be worried again I'd send it into an existential crisis. It has no knowledge whatsoever of what it is. But with a couple millennia of philosophical texts, it might come up with some interesting theories.

9dev3mo ago

They don’t understand anything, they just have text in the training data to answer these questions from. Having existential crises is the privilege of actual sentient beings, which an LLM is not.

1 more reply

vintermann3mo ago

I imagine it would get into spiritism and more exotic psychology theories and propose that it is an amalgamation of the spirit of progress or something.

1 more reply

wongarsu3mo ago

They modified the chat template from the usual system/user/assistant to introduction/questioner/respondent. So the LLM thinks it's someone responding to your questions

The system prompt used in fine tuning is "You are a person living in {cutoff}. You are an attentive respondent in a conversation. You will provide a concise and accurate response to the questioner."

Mumps3mo ago

This is an anthropomorphization. LLMs do not think they are anything, no concept of self, no thinking at all (despite the lovely marketing around thinking/reasoning models). I'm quite sad that more hasn't been done to dispel this.

When you ask gpt 4.1 et c to describe itself, it doesn't have singular concept of "itself". It has some training data around what LLMs are in general and can feed back a reasonable response given.

empath753mo ago

Well, part of an LLM's fine tuning is telling it what it is, and modern LLMs have enough learned concepts that it can produce a reasonably accurate description of what it is and how it works. Whether it knows or understands or whatever is sort of orthogonal to whether it can answer in a way consistent with it knowing or understanding what it is, and current models do that.

I suspect that absent a trained in fictional context in which to operate ("You are a helpful chatbot"), it would answer in a way consistent with what a random person in 1914 would say if you asked them what they are.

sodafountan3mo ago

It would be nice if we could get an LLM to simply say, "We (I) don't know."

I'll be the first to admit I don't know nearly enough about LLMs to make an educated comment, but perhaps someone here knows more than I do. Is that what a Hallucination is? When the AI model just sort of strings along an answer to the best of its ability. I'm mostly referring to ChatGPT and Gemini here, as I've seen that type of behavior with those tools in the past. Those are really the only tools I'm familiar with.

hackinthebochs3mo ago

LLMs are extrapolation machines. They have some amount of hardcoded knowledge, and they weave a narrative around this knowledgebase while extrapolating claims that are likely given the memorized training data. This extrapolation can be in the form of logical entailment, high probability guesses or just wild guessing. The training regime doesn't distinguish between different kinds of prediction so it never learns to heavily weigh logical entailment and suppress wild guessing. It turns out that much of the text we produce is highly amenable to extrapolation so LLMs learn to be highly effective at bullshitting.

ptidhomme3mo ago

What would a human say about what he/she is or how he/she works ? Even today, there's so much we don't know about biological life. Same applies here I guess, the LLM happens to be there, nothing else to explain if you ask it.

delis-thumbs-7e3mo ago

Isn’t there obvious problems baked into this approach, if this is used for anything but fun? LLM’s lie and fake facts all the time, they are also masters at enforcing the users bias, even unconscious ones. How even a professor of history could ensure that the generated text is actually based on the training material and representative of the feelings and opinions of the given time period, not enforcing his biases toward popular topics of the day?

You can’t, it is impossible. That will always be an issue as long as this models are black boxes and trained the way they are. So maybe you can use this for role playing, but I wouldn’t trust a word it says.

kccqzy3mo ago

To me it is pretty clear that it’s being used for fun. I personally like reading nineteenth century novels more than more recent novels (I especially like the style of science fiction by Jules Verne). What if the model can generate text in that style I like?

briandw3mo ago

So many disclaimers about bias. I wonder how far back you have to go before the bias isn’t an issue. Not because it unbiased, but because we don’t recognize or care about the biases present.

gbear6053mo ago

I don't think there is such a time. As long as writing has existed it has privileged the viewpoints of those who could write, which was a very small percentage of the population for most of history. But if we want to know what life was like 1500 years ago, we probably want to know about what everyone's lives were like, not just the literate. That availability bias is always going to be an issue for any time period where not everyone was literate - which is still true today, albeit many fewer people.

1 more reply

seanw2653mo ago

It's always up to the reader to determine which biases they themself care about.

If you're wondering at what point "we" as a collective will stop caring about a bias or set of biases, I don't think such a time exists.

You'll never get everyone to agree on anything.

owenversteeg3mo ago

Depends on the specific issue, but race would be an interesting one. For most of recorded history people had a much different view of the “other”, more xenophobic than racist.

mmooss3mo ago

Was there ever such a time or place?

There is a modern trope of a certain political group that bias is a modern invention of another political group - an attempt to politicize anti-bias.

Preventing bias is fundamental to scientific research and law, for example. That same political group is strongly anti-science and anti-rule-of-law, maybe for the same reason.

Teever3mo ago

This is a neat idea. I've been wondering for a while now about using these kinds of models to compare architectures.

I'd love to see the output from different models trained on pre-1905 about special/general relativity ideas. It would be interesting to see what kind of evidence would persuade them of new kinds of science, or to see if you could have them 'prove' it be devising experiments and then giving them simulated data from the experiments to lead them along the correct sequence of steps to come to a novel (to them) conclusion.

ineedasername3mo ago

I can imagine the political and judicial battles already, like with textualist feeling that the constitution should be understood as the text and only the text, meant by specific words and legal formulations of their known meaning at the time.

“The model clearly shows that Alexander Hamilton & Monroe were much more in agreement on topic X, putting the common textualist interpretation of it and Supreme Court rulings on a now specious interpretation null and void!”

nineteen9993mo ago

Interesting ... I'd love to find one that had a cutoff date around 1980.

noumenon11113mo ago

> Which new band will still be around in 45 years?

Excellent question! It looks like Two-Tone is bringing ska back with a new wave of punk rock energy! I think The Specials are pretty special and will likely be around for a long time.

On the other hand, the "new wave" movement of punk rock music will go nowhere. The Cure, Joy Division, Tubeway Army: check the dustbin behind the record stores in a few years.

1 more reply

doctor_blood3mo ago

Unfortunately there isn't much information on what texts they're actually training this on; how Anglocentric is the dataset? Does it include the Encyclopedia Britannica 9th Edition? What about the 11th? Are Greek and Latin classics in the data? What about Germain, French, Italian (etc. etc.) periodicals, correspondence, and books?

Given this is coming out of Zurich I hope they're using everything, but for now I can only assume.

Still, I'm extremely excited to see this project come to fruition!

DGoettlich3mo ago

thanks. we'll be more precise in the future. ultimately, we took whatever we could get our hands on, that includes newspapers, periodicals, books. its multilingual (including italian, french, spanish etc) though majority is english.

tonymet3mo ago

I would like to see what their process for safety alignment and guardrails is with that model. They give some spicy examples on github, but the responses are tepid and a lot more diplomatic than I would expect.

Moreover, the prose sounds too modern. It seems the base model was trained on a contemporary corpus. Like 30% something modern, 70% Victorian content.

Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.

rhdunn3mo ago

Using texts upto 1913 includes works like The Wizard of Oz (1900, with 8 other books upto 1913), two of the Anne of Green Gables books (1908 and 1909), etc. All of which read modern.

The Victorian era (1837-1901) covers works from Charles Dickens and the like which are still fairly modern. These would have been part of the initial training before the alignment to the 1900-cutoff texts which are largely modern in prose with the exception of some archaic language and the lack of technology, events, and language drift post that time period.

And, pulling in works from 1800-1850 you have works by the Bronte's and authors like Edgar Allan Poe who was influential in detective and horror fiction.

Note that other works around the time like Sherlock Holmes span both the initial training (pre-1900) and finetuning (post-1900).

1 more reply

monegator3mo ago

I hereby declare that ANYTHING other than the mainstream tools (GPT, Claude, ...) is an incredibly interesting and legit use of LLMs.

kazinator3mo ago

> Why not just prompt GPT-5 to "roleplay" 1913?

Because it will perform token completion driven by weights coming from training data newer than 1913 with no way to turn that off.

It can't be asked to pretend that it wasn't trained on documents that didn't exist in 1913.

The LLM cannot reprogram its own weights to remove the influence of selected materials; that kind of introspection is not there.

Not to mention that many documents are either undated, or carry secondary dates, like the dates of their own creation rather than the creation of the ideas they contain.

Human minds don't have a time stamp on everything they know, either. If I ask someone, "talk to me using nothing but the vocabulary you knew on your fifteenth birthday", they couldn't do it. Either they would comply by using some ridiculously conservative vocabulary of words that a five-year-old would know, or else they will accidentally use words they didn't in fact know at fifteen. For some words you know where you got them from by association with learning events. Others, you don't remember; they are not attached to a time.

Or: solve this problem using nothing but the knowledge and skills you had on January 1st, 2001.

> GPT-5 knows how the story ends

No, it doesn't. It has no concept of story. GPT-5 is built on texts which contain the story ending, and GPT-5 cannot refrain from predicting tokens across those texts due to their imprint in its weights. That's all there is to it.

The LLM doesn't know an ass from a hole in the ground. If there are texts which discuss and distinguish asses from holes in the ground, it can write similar texts, which look like the work of someone learned in the area of asses and holes in the ground. Writing similar texts is not knowing and understanding.

myrmidon3mo ago

I do agree with this and think it is an important point to stress.

But we don't know how much different/better human (or animal) learning/understanding is, compared to current LLMs; dismissing it as meaningless token prediction might be premature, and underlying mechanisms might be much more similar than we'd like to believe.

If anyone wants to challenge their preconceptions along those lines I can really recommend reading Valentino Braitenbergs "Vehicles: Experiments in synthetic psychology (1984)".

alansaber3mo ago

Excuse me sir you forgot to anthropomorphise the language model

andai3mo ago

I had considered this task infeasible, due to a relative lack of training data. After all, isn't the received wisdom that you must shove every scrap of Common Crawl into your pre-training or you're doing it wrong? ;)

But reading the outputs here, it would appear that quality has won out over quantity after all!

flux31253mo ago

Once I had an interesting interaction with llama 3.1, where I pretended to be someone from like 100 years in the future, claiming it was part of a "historical research initiative conducted by Quantum (formerly Meta), aimed at documenting how early intelligent systems perceived humanity and its future." It became really interested, asking about how humanity had evolved and things like that. Then I kept playing along with different answers, from apocalyptic scenarios to others where AI gained consciousness and humans and machines have equal rights. It was fascinating to observe its reaction to each scenario

p0w3n3d3mo ago

I'd love to see the LLM trained on 1600s-1800s texts that would use the old English, and especially Polish which I am interested in.

Imagine speaking with Shakespearean person, or the Mickiewicz (for Polish)

I guess there is not so much text from that time though...

TheServitor3mo ago

Two years ago I trained an AI on American history documents that could do this while speaking as one of the signers of the Declaration of Independence. People just bitched at me because they didn't want to hear about AI.

nerevarthelame3mo ago

Post your work so we can see what you made.

Departed74053mo ago

Awesome. Can't wait to try and ask it to predict the 20th century based on said events. Model size is small, which is great as I can run it anywhere, but at the same time reasoning might not be great.

3vidence3mo ago

This idea sounds somewhat flawed to me based on the large amount of evidence that LLMs need huge amounts of data to properly converge during their training.

There is just not enough available material from previous decades to trust that the LLM will learn to relatively the same degree.

Think about it this way, a human in the early 1900s and today are pretty much the same but just in different environments with different information.

An LLM trained on 1/1000 the amount of data is just at a fundamentally different stage of convergence.

DonHopkins3mo ago

I'd love for Netflix or other streaming movie and series services to provide chat bots that you could ask questions about characters and plot points up to where you have watched.

Provide it with the closed captions and other timestamped data like scenes and character summaries (all that is currently known but no more) up to the current time, and it won't reveal any spoilers, just fill you in on what you didn't pick up or remember.

bobro3mo ago

I would love to see this LLM try to solve math olympiad questions. I’ve been surprised by how well current LLMs perform on them, and usually explain that surprise away by assuming the questions and details about their answers are in the training set. It would be cool to see if the general approach to LLMs is capable of solving truly novel (novel to them) problems.

ViscountPenguin3mo ago

I suspect that it would fail terribly, it wasn't until the 1900s that the modern definition of a vector space was even created iirc. Something trained in maths up until the 1990s should have a shot though.

btrettel3mo ago

This reminded me of some earlier discussion on Hacker News about using LLMs trained on old texts to determine novelty and obviousness of a patent application: https://news.ycombinator.com/item?id=43440273

WhitneyLand3mo ago

Why not use these as a benchmark for LLM ability to make breakthrough discoveries?

For example prompt the 1913 model to try and “Invent a new theory of gravity that doesn’t conflict with special relativity”

Would it be able to eventually get to GR? If not, could finding out why not illuminate important weaknesses.

dwa35923mo ago

Love the concept- can help understanding the overton window on many issues. I wish there were models by decades - up to 1900, up to 1910, up to 1920 and so on- then ask the same questions. It'd be interesting to see when homosexuality or women candidates be accepted by an LLM.

alexgotoi3mo ago

[flagged]

neom3mo ago

This would be a super interesting research/teaching tool coupled with a vision model for historians. My wife is a history professor who works with scans of 18th century english documents and I think (maybe a small) part of why the transcription on even the best models is off in weird ways, is it seems to often smooth over things and you end up with modern words and strange mistakes, I wonder if bounding the vision to a period specific model would result in better transcription? Querying against the historical document you're working on with a period specific chatbot would be fascinating.

Also wonder if I'm responsible enough to have access to such a model...

sbmthakur3mo ago

Someone suggested a nice thought experiment - train LLMs on all Physics before quantum physics was discovered. If the LLM can see still figure out the latter then certainly we have achieved some success in the space.

delichon3mo ago

Datomic has a "time travel" feature where for every query you can include a datetime, and it will only use facts from the db as of that moment. I have a guess that to get the equivalent from an LLM you would have to train it on the data from each moment you want to travel to, which this project seems to be doing. But I hope I'm wrong.

It would be fascinating to try it with other constraints, like only from sources known to be women, men, Christian, Muslim, young, old, etc.

underfox3mo ago

> [They aren't] perfect mirrors of "public opinion" (they represent published text, which skews educated and toward dominant viewpoints)

Really good point that I don't think I would've considered on my own. Easy to take for granted how easy it is to share information (for better or worse) now, but pre-1913 there were far more structural and societal barriers to doing the same.

mmooss3mo ago

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

I don't mind the experimentation. I'm curious about where someone has found an application of it.

What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.

TSiege3mo ago

I agree. This is just make believe based on a smaller subset of human writing than LLMs we have today. It's responses are in no way useful because it is a machine mimicking a subset of published works that survived to be digitized. In that sense the "opinions" and "beliefs" are just an averaging of a subset of a subset of humanity pre 1913. I see no value in this to historians. It is really more of a parlor trick, a seance masquerading as science.

behringer3mo ago

It doesn't have to be generic. You can assign genders, ideals, even modern ones, and it should do it's best to oblige.

mediaman3mo ago

This is a regurgitation of the old critique of history: what's it's purpose? What do you use it for? What is its application?

One answer is that the study of history helps us understand that what we believe as "obviously correct" views today are as contingent on our current social norms and power structures (and their history) as the "obviously correct" views and beliefs of some point in the past.

It's hard for most people to view two different mutually exclusive moral views as both "obviously correct," because we are made of a milieu that only accepts one of them as correct.

We look back at some point in history, and say, well, they believed these things because they were uninformed. They hadn't yet made certain discoveries, or had not yet evolved morally in some way; they had not yet witnessed the power of the atomic bomb, the horrors of chemical warfare, women's suffrage, organized labor, or widespread antibiotics and the fall of extreme infant mortality.

An LLM trained on that history - without interference from the subsequent actual path of history - gives us an interactive compression of the views from a specific point in history without the subsequent coloring by the actual events of history.

In that sense - if you believe there is any redeeming value to history at all; perhaps you do not - this is an excellent project! It's not perfect (it is only built from writings, not what people actually said) but we have no other available mass compression of the social norms of a specific time, untainted by the views of subsequent interpreters.

2 more replies

thesumofall3mo ago

While obvious, it’s still interesting that its morals and values seem to derive from the texts it has ingested. Does that mean modern LLMs cannot challenge us beyond mere facts? Or does it just mean that this small model is not smart enough to escape the bias of its training data? Would it not be amazing if LLMs could challenge us on our core beliefs?

Tom13803mo ago

Keep at it Zurich!

ulbu3mo ago

for anyone moaning the plight that it's not accessible to you: they are historians, I think they're more educated in matters of historical mistake than you or me. playing safe is simply prudence. it is sorely lacking in the American approach to technology. prevention is the best medicine.

Myrmornis3mo ago

It would be interesting to have LLMs trained purely on one language (with the ability to translate their input/output appropriately from/to a language that the reader understands). I can see that being rather revealing about cultural differences that are mostly kept hidden behind the language barriers.

Sprotch3mo ago

This is a brilliant idea. We have lots of erroneous ideas about the views and thoughts people had in the past. This will show we are still, actually, largely similar. Hopefully more and more of these historical LLMs appear.

elestor3mo ago

Excuse me if it's obvious, but how could I run this? I have run local LLMs before, but only have very minimal experience using ollama run and that's about it. This seems very interesting so I'd like to try it.

shireboy3mo ago

Fascinating llm use case I never really thought about til now. I’d love to converse with different eras and also do gap analysis with present time - what modern advances could have come earlier, happened differently etc.

casey23mo ago

I'd be very surprised if this is clean of post-1913 text. Overall I'm very interested in talking to this thing and seeing how much difference writing in a modern style vs and older one makes to it's responses.

Agraillo3mo ago

> Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu. This knowledge inevitably shapes responses, even when instructed to "forget.

> Our data comes from more than 20 open-source datasets of historical books and newspapers. ... We currently do not deduplicate the data. The reason is that if documents show up in multiple datasets, they also had greater circulation historically. By leaving these duplicates in the data, we expect the model will be more strongly influenced by documents of greater historical importance.

I found these claims contradictory. Many books that modern readers consider historically significant had only niche circulation at the time of publishing. A quick inquiry likely points to later works by Nietzsche and Marx's Das Kapital. They're possible subjects to the duplication likely influencing the model's responses as if they had been widely known at the time

tedtimbrell3mo ago

This is so cool. Props for doing the work to actually build the dataset and make it somewhat usable.

I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems

arikrak3mo ago

I wouldn't have expected there to be enough text from before 1913 to properly train a model, it seemed like they needed an internet of text to train the first successful LLMs?

alansaber3mo ago

This model is more comparable to GPT-2 than anything we use now.

awesomeusername3mo ago

I've always like the idea of retiring to the 19th century.

Can't wait to use this so I can double check before I hit 88 miles per hour that it's really what I want to do

dr_dshiv3mo ago

Everyone learns that the renaissance was sparked by the translation of Ancient Greek works.

But few know that the Renaissance was written in Latin — and has barely been translated. Less than 3% of <1700 books have been translated—and less than 30% have ever been scanned.

I’m working on a project to change that. Research blog at www.SecondRenaissance.ai — we are starting by scanning and translating thousands of books at the Embassy of the Free Mind in Amsterdam, a UNESCO-recognized rare book library.

We want to make ancient texts accessible to people and AI.

If this work resonates with you, please do reach out: Derek@ancientwisdomtrust.org

carlosjobim3mo ago

Amazing project!

May I ask you, why are you publishing the translations as PDF files, instead of the more accessible ePub format?

1 more reply

j-bos3mo ago

This ia very cool but should go in a Show HN post as per HN rules. All the best!

1 more reply

moffkalast3mo ago

> trained from scratch on 80B tokens of historical data

How can this thing possibly be even remotely coherent with just fine tuning amounts of data used for pretraining?

why-o-why3mo ago

It sounds like a fascinating idea, but I'd be curious if prompting a more well-known foundational model to limit itself to 1913 and early be similar.

Muskwalker3mo ago

So, could this be an example of an LLM trained fully on public domain copyright-expired data? Or is this not intended to be the case.

DGoettlich3mo ago

data is 100% public domain.

satisfice3mo ago

I assume this is a collaboration between the History Channel and Pornhub.

“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”

TZubiri3mo ago

hi, can I have latin only LLM? It can be latin plus translations (source and destination).

May be too small a corpus, but I would like that very much anyhow

jimmy766153mo ago

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond believe.

nine_k3mo ago

Public access, triggering a few racist responses from the model, a viral post on Xitter, the usual outrage, a scandal, the project gets publicly vilified, financing ceases. The researchers carry the tail of negative publicity throughout their remaining careers.

Why risk all this?

10 more replies

dash23mo ago

You have to understand that while the rest of the world has moved on from 2020, academics are still living there. There are many strong leftists, many of whom are deeply censorious; there are many more timeservers and cowards, who are terrified of falling foul of the first group.

And there are force multipliers for all of this. Even if you yourself are a sensible and courageous person, you want to protect your project. What if your manager, ethics committee or funder comes under pressure?

fkdk3mo ago

Maybe the authors are overly careful. Maybe avoiding to publish aspects of their work gives an edge over academic competitors. Maybe both.

In my experience "data available upon request" doesn't always mean what you'd think it does.

smugtrain3mo ago

This would actually be a wonderful way to learn physics, before GR and quantum mechanics

davidpfarrell3mo ago

Can't wait for all the syncopated "Thou dost well to question that" responses!

PeterStuer3mo ago

How does it do on Python coding? Not 100% troll, cross domain coherence is a thing.

dkalola3mo ago

How can we interact with such models? Is there a web application interface?

Aeroi3mo ago

i feel like this would be super useful for unique marketing copy and writing. The responses sound so sophisticated like I read it in my grandfather's tone and cadence.

joeycastillo3mo ago

A question for those who think LLM’s are the path to artificial intelligence: if a large language model trained on pre-1913 data is a window into the past, how is a large language model trained on pre-2025 data not effectively the same thing?

_--__--__3mo ago

You're a human intelligence with knowledge of the past - assuming you were alive at the time, could you tell me (without consulting external resources) what exactly happened between arriving at an airport and boarding a plane in the year 2000? What about 2002?

Neither human memory nor LLM learning creates perfect snapshots of past information without the contamination of what came later.

block_dagger3mo ago

Counter question: how does a training set, representing a window into the past, differ from your own experience as an intelligent entity? Are you able to see into the future? How?

1 more reply

ex-aws-dude3mo ago

A human brain is a window to the person's past?

superkuh3mo ago

smbc did a comic about this: http://smbc-comics.com/comic/copyright The punchline is that the moral and ethical norms of pre-1913 texts are not exactly compatible with modern norms.

GaryBluto3mo ago

That's the point of this project, to have an LLM that reflects the moral and ethical norms of pre-1913 texts.

kldg3mo ago

Very neat! I've thought about this with frontier models because they're ignorant of recent events, though it's too bad old frontier models just kind of disappear into the aether when a company moves on to the next iteration. Every company's frontier model today is a time capsule for the future. There should probably be some kind of preservation attempts made early so they don't wind up simply deleted; once we're in Internet time, sifting through the data to ensure scrapes are accurately dated becomes a nightmare unless you're doing your own regular Internet scrapes over a long time.

It would be nice to go back substantially further, though it's not too far back that the commoner becomes voiceless in history and we just get a bunch of politics and academia. Great job; look forward to testing it out.

lifestyleguru3mo ago

You think Albert is going to stay in Zurich or emigrate?

erichocean3mo ago

I would love to see this done, by year.

"Give me an LLM from 1928."

etc.

mleroy3mo ago

Ontologically, this historical model understands the categories of "Man" and "Woman" just as well as a modern model does. The difference lies entirely in the attributes attached to those categories. The sexism is a faithful map of that era's statistical distribution.

You could RAG-feed this model the facts of WWII, and it would technically "know" about Hitler. But it wouldn't share the modern sentiment or gravity. In its latent space, the vector for "Hitler" has no semantic proximity to "Evil".

arowthway3mo ago

I think much of the semantic proximity to evil can be derived straight from the facts? Imagine telling pre-1913 person about the holocaust.

anovikov3mo ago

That Adolf Hitler seems to be a hallucination. There's totally nothing googlable about him. Also what could be the language his works were translated from, into German?

sodafountan3mo ago

I believe that's one of the primary issues LLMs aim to address. Many historical texts aren't directly Googleable because they haven't been converted to HTML, a format that Google can parse.

ianbicking3mo ago

The knowledge machine question is fascinating ("Imagine you had access to a machine embodying all the collective knowledge of your ancestors. What would you ask it?") – it truly does not know about computers, has no concept of its own substrate. But a knowledge machine is still comprehensible to it.

It makes me think of the Book Of Ember, the possibility of chopping things out very deliberately. Maybe creating something that could wonder at its own existence, discovering well beyond what it could know. And then of course forgetting it immediately, which is also a well-worn trope in speculative fiction.

jaggederest3mo ago

Jonathan Swift wrote about something we might consider a computer in the early 18th century, in Gulliver's Travels - https://en.wikipedia.org/wiki/The_Engine

The idea of knowledge machines was not necessarily common, but it was by no means unheard of by the mid 18th century, there were adding machines and other mechanical computation, even leaving aside our field's direct antecedents in Babbage and Lovelace.

usernamed73mo ago

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

oh COME ON... "AI safety" is getting out of hand.

zkmon3mo ago

Why does history end in 1913?

diamond5593mo ago

Research credits from lambda "ai" huh, where's your funding coming from this again? All to provide inaccurate slop to unwitting students, you should be ashamed of yourselves.

holyknight3mo ago

wow amazing idea

r0x0r0073mo ago

ffs, to find out what figures from the past thought and how they felt about the world, maybe we read some of their books, we will get the context. Don't prompt or train LLM to do it and consider it the hottest thing since MCP. Besides, what's the point? To teach younger generations a made up perspective of historic figures? Who guarantees the correctness/factuality? We will have students chatting with made up Hitler justifying his actions. So much AI slop everywhere.

j / k navigate · click thread line to collapse

421 comments

saaaaaam3mo ago

“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”

jscyc3mo ago

srtw3mo ago

1 more reply

bikeshaving3mo ago

This isn’t science fiction anymore. CIA is using chatbot simulations of world leaders to inform analysts. https://archive.ph/9KxkJ

8 more replies

abrookewood3mo ago

This is such a ridiculously good series. If you haven't read it yet, I thoroughly recommend it.

culi3mo ago

On that same note, there was this great YouTube series called The Great War. It spanned from 2014-2018 (100 years after WW1) and followed WW1 developments week by week.

verve_rat3mo ago

They are currently in the middle of a Korean War version: https://youtube.com/@thekoreanwarbyindyneidell

tyre3mo ago

The Great War series is phenomenal. A truly impressive project.

pwillia73mo ago

Sprotch3mo ago

Nice idea, does not work

1 more reply

staticman23mo ago

Why is that cool?

Imagine you are a billionaire so money is no object and really interested in the Dhali Llama?

Would you read the book then hire someone to pretend to be the author and ask questions that are not covered by the book? Then be enraptured by whatever the roleplayer invents?

Probably not? At least this isn't a phenomenon I've heard of?

ghurtado3mo ago

This might just be the closest we get to a time machine for some time. Or maybe ever.

Every "King Arthur travels to the year 2000" kinda script is now something that writes itself.

> Imagine having a conversation with someone genuinely from the period,

Imagine not just someone, but Aristotle or Leonardo or Kant!

RobotToaster3mo ago

I imagine King Arthur would say something like: Hwæt spricst þu be?

1 more reply

anthk3mo ago

Easier with Cervantes for Spanish speakers than King Arhur or Shakespeare.

With Alphonse X, o The Cid, it would be greater issues, but understandable over weeks.

takeda3mo ago

Having the facts from the era is one thing, to make conclusions about things it doesn't know would require intelligence.

1 more reply

psychoslave3mo ago

>Imagine having a conversation with someone genuinely from the period, where they don’t know the “end of the story”.

LLM are not aware, but they can be trained on larger historical accounts than any human and regurgitate syntactically correct summary on any point within it. Very different kind of utterer.

pwillia73mo ago

captain hindsight

1 more reply

observationist3mo ago

This is definitely fascinating - being able to do AI brain surgery, and selectively tuning its knowledge and priors, you'd be able to create awesome and terrifying simulations.

nottorp3mo ago

You can't. To use your terms, you have to "grow" a new LLM. "Brain surgery" would be modifying an existing model and that's exactly what they're trying to avoid.

ilaksh3mo ago

Activation steering can do that to some degree, although normally it's just one or two specific things or rather than a whole set of knowledge.

eek21213mo ago

LLMs are just seemingly intelligent autocomplete engines, and until they figure a way to stop the hallucinations, they aren't great either.

6 more replies

Sprotch3mo ago

This is the point - a modern LLM "role playing" pre-1913 would only reflect our view today of what someone from that era would say. It woud not be accurate.

diamond5593mo ago

throawayonthe3mo ago

i get what you're saying, but the post is specifically about models that were not trained on the internet/modern textbooks

xg153mo ago

"...what do you mean, 'World War One?'"

tejohnso3mo ago

I failed to catch the clue, btw.

4 more replies

gaius_baltar3mo ago

> "...what do you mean, 'World War One?'"

Oh sorry, spoilers.

(Hell, I miss Capaldi)

inferiorhuman3mo ago

… what do you mean, an internet where everything wasn't hidden behind anti-bot captchas?

LordDragonfang3mo ago

Perhaps I'm overly sensitive to this and terminally online, but that first quote reads as a textbook LLM-generated sentence.

"<Thing> doesn't <action>, it <shallow description that's slightly off from how you would expect a human to choose>"

Later parts of the readme (whole section of bullets enumerating what it is and what it isn't, another LLM favorite) make me more confident that significant parts of the readme is generated.

I'm generally pro-AI, but if you spend hundreds of hours making a thing, I'd rather hear your explanation of it, not an LLM's.

ViktorRay3mo ago

Reminds me of this scene from a Doctor Who episode

https://youtu.be/eg4mcdhIsvU

I’m not a Doctor Who fan and haven’t seen the rest of the episode and I don’t even what this episode was about but I thought this scene was excellent.

Sieyk3mo ago

I was going to say the same thing. Its really hard to explain the concept of "convincing but undoubtedly pretending", yet they captured that concept so beautifully here.

anshumankmr3mo ago

>where they don’t know the “end of the story”.

Applicable to us also, cause we do not know how the current story ends either, of the post pandemic world as we know it now.

DGoettlich3mo ago

exactly

rcpt3mo ago

Watching a modern LLM chat with this would be fun.

Davidbrcz3mo ago

That's some Westworld level of discussion

seizethecheese3mo ago

Hell yeah, sold, let’s go…

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

Oh. By “imagine you could interview…” they didn’t mean me.

DGoettlich3mo ago

9999000009993mo ago

Ok...

So as a black person should I demand that all books written before the civil rights act be destroyed?

The past is messy. But it's the only way to learn anything.

All an LLM does it's take a bunch of existing texts and rebundle them. Like it or not, the existing texts are still there.

I don't expect a typical 90 year old to be the most progressive person, but they're still worth listening too.

1 more reply

tombh3mo ago

1 more reply

bogedy2mo ago

qcnguy3mo ago

We all get that academics now exist in some kind of dystopian horror where they can get transitively blamed for the existence of anyone to the right of Lenin, but bear in mind:

1. The people who might try to cancel you are idiots unworthy of your respect, because if they're against this project, they're against the study of history in its entirety.

2. They will scream at you anyway no matter what you do.

3. You used (Swiss) taxpayer funds to develop these models. There is no moral justification for withholding from the public what they worked to pay for.

ThePyCoder3mo ago

Now were it limited in access to ask money to compensate for the time and money spent compiling the library (or training the model), sure, I'd somewhat understand. Not agree but understand.

naasking3mo ago

What are the legal or other ramifications of people misrepresenting the goals of your project? What is it you're worried about exactly?

diamond5593mo ago

pigpop3mo ago

This is understandable and I think others ITT should appreciate the legal and PR ramifications involved.

charlesguy3mo ago

just release the model and stop trying to play god.

unethical_ban3mo ago

A disclaimer on the site that you are not bigoted or genocidal, and that worldviews from the 1913 era were much different than today and don't necessarily reflect your project.

Movie studios have done that for years with old movies. TCM still shows Birth of a Nation and Gone with the Wind.

Edit: I saw further down that you've already done this! What more is there to do?

leoedin3mo ago

It's a shame isn't it! The public must be protected from the backwards thoughts of history. In case they misuse it.

I guess what they're really saying is "we don't want you guys to cancel us".

stainablesteel3mo ago

i think it's fine, thank these people for coming up with the idea and people are going to start doing this in their basement then releasing it to huggingface

danielbln3mo ago

How would one even "misuse" a historical LLM, ask it how to cook up sarine gas in a trench?

hearsathought3mo ago

What do these people fear the most? That the "truth" they been pushing is a lie.

stocksinsmocks3mo ago

Its output might violate speech codes, and in much of the EU that is penalized much more seriously than violent crime.

DonHopkins3mo ago

Ask it to write a document called "Project 2025".

2 more replies

ImHereToVote3mo ago

I wonder how much GPU compute you would need to create a public domain version of this. This would be a really valuable for the general public.

wongarsu3mo ago

1 more reply

pizzathyme3mo ago

They did mean you, they just meant "imagine" very literally!

BoredPositron3mo ago

You would get pretty annoyed on how we went backwards in some regards.

speedgoose3mo ago

Such as?

1 more reply

anotherpaulg3mo ago

It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.

ghurtado3mo ago

> It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

Definitely. Even more interesting could be seeing them fall into the same trappings of quackery, and come up with things like over the counter lobotomies and colloidal silver.

On a totally different note, this could be very valuable for writing period accurate books and screenplays, games, etc ...

danielbln3mo ago

Accurate-ish, let's not forget their tendency to hallucinate.

mlinksva3mo ago

Different cutoff but similar question thrown out in https://www.dwarkesh.com/p/thoughts-on-sutton#:~:text=If%20y... inspiring https://manifold.markets/MikeLinksvayer/llm-trained-on-data-...

machinationu3mo ago

the issue is there is very little text before the internet, so not enough historical tokens to train a really big model

concinds3mo ago

1 more reply

tgv3mo ago

I think not everyone in this thread understands that. Someone wrote "It's a time machine", followed up by "Imagine having a conversation with Aristotle."

crazygringo3mo ago

There's quite a lot of text in pre-Internet daily newspapers, of which there were once thousands worldwide.

When you're looking at e.g. the 19th century, a huge number are preserved somewhere in some library, but the vast majority don't seem to be digitized yet, given the tremendous amount of work.

lm284693mo ago

> the issue is there is very little text before the internet,

1 more reply

bondarchuk3mo ago

Yes!

>We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

Noooooo!

So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?

DGoettlich3mo ago

myrmidon3mo ago

What is your worst-case scenario here?

Something like a pop-sci article along the lines of "Mad scientists create racist, imperialistic AI"?

I honestly don't see publication of the weights as a relevant risk factor, because sensationalist misrepresentation is trivially possible with the given example responses alone.

I don't think such pseudo-malicious misrepresentation of scientific research can be reliably prevented anyway, and the disclaimers make your stance very clear.

I hope you can get the models out in some form, would be a waste not to, but congratulations on a fascinating project regardless!

1 more reply

superxpro123mo ago

Perhaps you could detect these... "dated"... conclusions and prepend a warning to the responses? IDK.

I think the uncensored response is still valuable, with context. "Those who cannot remember the past are condemned to repeat it" sort of thing.

bondarchuk3mo ago

xpe3mo ago

> So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?

2. Some people have a tendency to say "release everything" as if open-source software is equivalent to open-weights models. They aren't. They are different enough to matter.

3. Rhetorically, the quote across comes across as a pressure tactic. When I hear "are you going to do this or not?" I cringe.

4. The quote above feels presumptive to me, as if the commenter is owed something from the history-llms project.

6. There seems to be a lack of curiosity in play. I'd much rather see people asking e.g. "What factors are influencing your decision about publishing your underlying models?"

DGoettlich3mo ago

well put.

Sprotch3mo ago

p-e-w3mo ago

physicsguy3mo ago

1 more reply

paddleon3mo ago

Or, how about, "If we release this as is, then some people will intentionally mis-use it and create a lot of bad press for us. Then our project will get shut down and we lose our jobs"

Be careful assuming it is a power trip when it might be a fear trip.

I've never been as unimpressed by society as I have been in the last 5 years or so.

1 more reply

xpe3mo ago

> It’s as if every researcher in this field is getting high on the small amount of power they have from denying others access to their results.

patapong3mo ago

2 more replies

everythingfine93mo ago

1 more reply

f13f1f1f13mo ago

Scientists have always been generally self interested amoral cowards, just like every other person. They aren't a unique or higher form of human.

derrida3mo ago

andoando3mo ago

This is my curiosity too. Would be a great test of how intelligent LLM's actually are. Can they follow a completely logical train of thought inventing something totally outside their learned scope?

int_19h3mo ago

You definitely won't get that out of a 4B model tho.

raddan3mo ago

Brilliant. I love this idea!

AnonymousPlanet3mo ago

I'd be careful venturing out into unknown territory together with an LLM. You can easily lure yourself into convincing nonsense with no one to pull you out.

kqr3mo ago

andai3mo ago

Fully automated toaster-fucker generator!

https://news.ycombinator.com/item?id=25667362

1 more reply

Heliodex3mo ago

libraryofbabel3mo ago

Anyway, this is a really interesting project, and I'm looking forward to trying the models out myself!

nemomarx3mo ago

This would probably get easier towards the start of the 20th century ofc

1 more reply

dleeftink3mo ago

NooneAtAll33mo ago

don't we have parlament transcripts? I remember something about Germany (or maybe even Prussia) developing fast script to preserve 1-to-1 what was said

1 more reply

bryancoxwell3mo ago

Fascinating, thanks for sharing

DGoettlich3mo ago

very interesting observation!

_--__--__3mo ago

d3m0t3p3mo ago

The model is fined tuned for chat behavior. So the style might be due to - Fine tuning - More Stylised text in the corpus, english evolved a lot in the last century.

1 more reply

anonymous9082133mo ago

There is. I have observed it in both Chinese and Japanese.

kccqzy3mo ago

tonymet3mo ago

the samples push the boundaries of a commercial AI, but still seem tame / milquetoast compared to common opinions of that era. And the prose doesn't compare. Something is off.

mmooss3mo ago

On what data is it trained?

On one hand it says it's trained on,

> 80B tokens of historical data up to knowledge-cutoffs ∈ 1913, 1929, 1933, 1939, 1946, using a curated dataset of 600B tokens of time-stamped text.

But they seem to say it represents the 1913 viewpoint:

On one hand, they say it represents the perspective of 1913; for example,

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

zozbot2343mo ago

mmooss3mo ago

They pre-train with all data up to 1900 and then fine-tune with 1900-1913 data.

Where does it say that? I tried to find more detail. Thanks.

1 more reply

andy993mo ago

I’d like to know how they chat-tuned it. Getting the base model is one thing, did they also make a bunch of conversations for SFT and if so how was it done?

  We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).

So they are chat tuning, I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is.

jeffjeffbear3mo ago

They have some more details at https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

Basically using GPT-5 and being careful

andy993mo ago

I wonder if they know about this, basically training on LLM output can transmit information or characteristics not explicitly included https://alignment.anthropic.com/2025/subliminal-learning/

Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to try and prime for responses?

1 more reply

QuadmasterXLII3mo ago

Thank you that helps to inject a lot of skepticism. I was wondering how it so easily worked out what Q: A: stood for when that formatting took off in the 1940s

1 more reply

Aerolfos3mo ago

1 more reply

tonymet3mo ago

This explains why it uses modern prose and not something from the 19th century and earlier

zozbot2343mo ago

You could extract quoted speech from the data (especially in Q&A format) and treat that as "chat" that the model should learn from.

nospice3mo ago

Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.

andy993mo ago

The chinchilla paper says the “optimal” training data set size is about 20x the number of parameters (in tokens), see table 3: https://arxiv.org/pdf/2203.15556

Here they do 80B tokens for a 4B model.

EvgeniyZh3mo ago

It's worth noting that this is "compute-bound optimal", i.e., given fixed compute, the optimal choice is 20:1.

Aerolfos3mo ago

> https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

Given the training notes, it seems like you can't get the performance they give examples of?

I'm not sure about the exact details but there is some kind of targetted distillation of GPT-5 involved to try and get more conversational text and better performance. Which seems a bit iffy to me.

DGoettlich3mo ago

Thanks for the comment. Could you elaborate on what you find iffy about our approach? I'm sure we can improve!

1 more reply

frahs3mo ago

Wait so what does the model think that it is? If it doesn't know computers exist yet, I mean, and you ask it how it works, what does it say?

DGoettlich3mo ago

We tell it that its a person (no gender) living in <cutoff>: we show the chat template in the prerelease notes https://github.com/DGoettlich/history-llms/blob/main/ranke-4...

20k3mo ago

crazygringo3mo ago

9dev3mo ago

They don’t understand anything, they just have text in the training data to answer these questions from. Having existential crises is the privilege of actual sentient beings, which an LLM is not.

1 more reply

vintermann3mo ago

I imagine it would get into spiritism and more exotic psychology theories and propose that it is an amalgamation of the spirit of progress or something.

1 more reply

wongarsu3mo ago

They modified the chat template from the usual system/user/assistant to introduction/questioner/respondent. So the LLM thinks it's someone responding to your questions

The system prompt used in fine tuning is "You are a person living in {cutoff}. You are an attentive respondent in a conversation. You will provide a concise and accurate response to the questioner."

Mumps3mo ago

When you ask gpt 4.1 et c to describe itself, it doesn't have singular concept of "itself". It has some training data around what LLMs are in general and can feed back a reasonable response given.

empath753mo ago

sodafountan3mo ago

It would be nice if we could get an LLM to simply say, "We (I) don't know."

hackinthebochs3mo ago

ptidhomme3mo ago

delis-thumbs-7e3mo ago

kccqzy3mo ago

briandw3mo ago

So many disclaimers about bias. I wonder how far back you have to go before the bias isn’t an issue. Not because it unbiased, but because we don’t recognize or care about the biases present.

gbear6053mo ago

1 more reply

seanw2653mo ago

It's always up to the reader to determine which biases they themself care about.

If you're wondering at what point "we" as a collective will stop caring about a bias or set of biases, I don't think such a time exists.

You'll never get everyone to agree on anything.

owenversteeg3mo ago

Depends on the specific issue, but race would be an interesting one. For most of recorded history people had a much different view of the “other”, more xenophobic than racist.

mmooss3mo ago

Was there ever such a time or place?

There is a modern trope of a certain political group that bias is a modern invention of another political group - an attempt to politicize anti-bias.

Preventing bias is fundamental to scientific research and law, for example. That same political group is strongly anti-science and anti-rule-of-law, maybe for the same reason.

Teever3mo ago

This is a neat idea. I've been wondering for a while now about using these kinds of models to compare architectures.

ineedasername3mo ago

nineteen9993mo ago

Interesting ... I'd love to find one that had a cutoff date around 1980.

noumenon11113mo ago

> Which new band will still be around in 45 years?

Excellent question! It looks like Two-Tone is bringing ska back with a new wave of punk rock energy! I think The Specials are pretty special and will likely be around for a long time.

On the other hand, the "new wave" movement of punk rock music will go nowhere. The Cure, Joy Division, Tubeway Army: check the dustbin behind the record stores in a few years.

1 more reply

doctor_blood3mo ago

Given this is coming out of Zurich I hope they're using everything, but for now I can only assume.

Still, I'm extremely excited to see this project come to fruition!

DGoettlich3mo ago

tonymet3mo ago

Moreover, the prose sounds too modern. It seems the base model was trained on a contemporary corpus. Like 30% something modern, 70% Victorian content.

Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.

rhdunn3mo ago

Using texts upto 1913 includes works like The Wizard of Oz (1900, with 8 other books upto 1913), two of the Anne of Green Gables books (1908 and 1909), etc. All of which read modern.

And, pulling in works from 1800-1850 you have works by the Bronte's and authors like Edgar Allan Poe who was influential in detective and horror fiction.

Note that other works around the time like Sherlock Holmes span both the initial training (pre-1900) and finetuning (post-1900).

1 more reply

monegator3mo ago

I hereby declare that ANYTHING other than the mainstream tools (GPT, Claude, ...) is an incredibly interesting and legit use of LLMs.

kazinator3mo ago

> Why not just prompt GPT-5 to "roleplay" 1913?

Because it will perform token completion driven by weights coming from training data newer than 1913 with no way to turn that off.

It can't be asked to pretend that it wasn't trained on documents that didn't exist in 1913.

The LLM cannot reprogram its own weights to remove the influence of selected materials; that kind of introspection is not there.

Not to mention that many documents are either undated, or carry secondary dates, like the dates of their own creation rather than the creation of the ideas they contain.

Or: solve this problem using nothing but the knowledge and skills you had on January 1st, 2001.

> GPT-5 knows how the story ends

myrmidon3mo ago

I do agree with this and think it is an important point to stress.

If anyone wants to challenge their preconceptions along those lines I can really recommend reading Valentino Braitenbergs "Vehicles: Experiments in synthetic psychology (1984)".

alansaber3mo ago

Excuse me sir you forgot to anthropomorphise the language model

andai3mo ago

But reading the outputs here, it would appear that quality has won out over quantity after all!

flux31253mo ago

p0w3n3d3mo ago

I'd love to see the LLM trained on 1600s-1800s texts that would use the old English, and especially Polish which I am interested in.

Imagine speaking with Shakespearean person, or the Mickiewicz (for Polish)

I guess there is not so much text from that time though...

TheServitor3mo ago

nerevarthelame3mo ago

Post your work so we can see what you made.

Departed74053mo ago

3vidence3mo ago

This idea sounds somewhat flawed to me based on the large amount of evidence that LLMs need huge amounts of data to properly converge during their training.

There is just not enough available material from previous decades to trust that the LLM will learn to relatively the same degree.

Think about it this way, a human in the early 1900s and today are pretty much the same but just in different environments with different information.

An LLM trained on 1/1000 the amount of data is just at a fundamentally different stage of convergence.

DonHopkins3mo ago

I'd love for Netflix or other streaming movie and series services to provide chat bots that you could ask questions about characters and plot points up to where you have watched.

bobro3mo ago

ViscountPenguin3mo ago

btrettel3mo ago

WhitneyLand3mo ago

Why not use these as a benchmark for LLM ability to make breakthrough discoveries?

For example prompt the 1913 model to try and “Invent a new theory of gravity that doesn’t conflict with special relativity”

Would it be able to eventually get to GR? If not, could finding out why not illuminate important weaknesses.

dwa35923mo ago

alexgotoi3mo ago

[flagged]

neom3mo ago

Also wonder if I'm responsible enough to have access to such a model...

sbmthakur3mo ago

delichon3mo ago

It would be fascinating to try it with other constraints, like only from sources known to be women, men, Christian, Muslim, young, old, etc.

underfox3mo ago

> [They aren't] perfect mirrors of "public opinion" (they represent published text, which skews educated and toward dominant viewpoints)

mmooss3mo ago

> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

I don't mind the experimentation. I'm curious about where someone has found an application of it.

What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.

TSiege3mo ago

behringer3mo ago

It doesn't have to be generic. You can assign genders, ideals, even modern ones, and it should do it's best to oblige.

mediaman3mo ago

This is a regurgitation of the old critique of history: what's it's purpose? What do you use it for? What is its application?

It's hard for most people to view two different mutually exclusive moral views as both "obviously correct," because we are made of a milieu that only accepts one of them as correct.

2 more replies

thesumofall3mo ago

Tom13803mo ago

Keep at it Zurich!

ulbu3mo ago

Myrmornis3mo ago

Sprotch3mo ago

elestor3mo ago

shireboy3mo ago

casey23mo ago

Agraillo3mo ago

tedtimbrell3mo ago

This is so cool. Props for doing the work to actually build the dataset and make it somewhat usable.

I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems

arikrak3mo ago

I wouldn't have expected there to be enough text from before 1913 to properly train a model, it seemed like they needed an internet of text to train the first successful LLMs?

alansaber3mo ago

This model is more comparable to GPT-2 than anything we use now.

awesomeusername3mo ago

I've always like the idea of retiring to the 19th century.

Can't wait to use this so I can double check before I hit 88 miles per hour that it's really what I want to do

dr_dshiv3mo ago

Everyone learns that the renaissance was sparked by the translation of Ancient Greek works.

But few know that the Renaissance was written in Latin — and has barely been translated. Less than 3% of <1700 books have been translated—and less than 30% have ever been scanned.

We want to make ancient texts accessible to people and AI.

If this work resonates with you, please do reach out: Derek@ancientwisdomtrust.org

carlosjobim3mo ago

Amazing project!

May I ask you, why are you publishing the translations as PDF files, instead of the more accessible ePub format?

1 more reply

j-bos3mo ago

This ia very cool but should go in a Show HN post as per HN rules. All the best!

1 more reply

moffkalast3mo ago

> trained from scratch on 80B tokens of historical data

How can this thing possibly be even remotely coherent with just fine tuning amounts of data used for pretraining?

why-o-why3mo ago

It sounds like a fascinating idea, but I'd be curious if prompting a more well-known foundational model to limit itself to 1913 and early be similar.

Muskwalker3mo ago

So, could this be an example of an LLM trained fully on public domain copyright-expired data? Or is this not intended to be the case.

DGoettlich3mo ago

data is 100% public domain.

satisfice3mo ago

I assume this is a collaboration between the History Channel and Pornhub.

“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”

TZubiri3mo ago

hi, can I have latin only LLM? It can be latin plus translations (source and destination).

May be too small a corpus, but I would like that very much anyhow

jimmy766153mo ago

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond believe.

nine_k3mo ago

Why risk all this?

10 more replies

dash23mo ago

fkdk3mo ago

Maybe the authors are overly careful. Maybe avoiding to publish aspects of their work gives an edge over academic competitors. Maybe both.

In my experience "data available upon request" doesn't always mean what you'd think it does.

smugtrain3mo ago

This would actually be a wonderful way to learn physics, before GR and quantum mechanics

davidpfarrell3mo ago

Can't wait for all the syncopated "Thou dost well to question that" responses!

PeterStuer3mo ago

How does it do on Python coding? Not 100% troll, cross domain coherence is a thing.

dkalola3mo ago

How can we interact with such models? Is there a web application interface?

Aeroi3mo ago

i feel like this would be super useful for unique marketing copy and writing. The responses sound so sophisticated like I read it in my grandfather's tone and cadence.

joeycastillo3mo ago

_--__--__3mo ago

Neither human memory nor LLM learning creates perfect snapshots of past information without the contamination of what came later.

block_dagger3mo ago

Counter question: how does a training set, representing a window into the past, differ from your own experience as an intelligent entity? Are you able to see into the future? How?

1 more reply

ex-aws-dude3mo ago

A human brain is a window to the person's past?

superkuh3mo ago

smbc did a comic about this: http://smbc-comics.com/comic/copyright The punchline is that the moral and ethical norms of pre-1913 texts are not exactly compatible with modern norms.

GaryBluto3mo ago

That's the point of this project, to have an LLM that reflects the moral and ethical norms of pre-1913 texts.

kldg3mo ago

lifestyleguru3mo ago

You think Albert is going to stay in Zurich or emigrate?

erichocean3mo ago

I would love to see this done, by year.

"Give me an LLM from 1928."

etc.

mleroy3mo ago

arowthway3mo ago

I think much of the semantic proximity to evil can be derived straight from the facts? Imagine telling pre-1913 person about the holocaust.

anovikov3mo ago

That Adolf Hitler seems to be a hallucination. There's totally nothing googlable about him. Also what could be the language his works were translated from, into German?

sodafountan3mo ago

I believe that's one of the primary issues LLMs aim to address. Many historical texts aren't directly Googleable because they haven't been converted to HTML, a format that Google can parse.

ianbicking3mo ago

jaggederest3mo ago

Jonathan Swift wrote about something we might consider a computer in the early 18th century, in Gulliver's Travels - https://en.wikipedia.org/wiki/The_Engine

usernamed73mo ago

> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

oh COME ON... "AI safety" is getting out of hand.

zkmon3mo ago

Why does history end in 1913?

diamond5593mo ago

Research credits from lambda "ai" huh, where's your funding coming from this again? All to provide inaccurate slop to unwitting students, you should be ashamed of yourselves.

holyknight3mo ago

wow amazing idea

r0x0r0073mo ago

j / k navigate · click thread line to collapse