I asked Claude for 37,500 random names, and it can't stop saying Marcus (github.com)

91 points · benjismith · 2mo ago · 72 comments

Slow_Hand2mo ago

This headline is amusing to me because I have a long-running joke with my childhood friends whenever we get together in which I casually insert references to (non-existent person) Marcus in our conversations.

"Marcus couldn't make it out to the wedding this time."

"Justin and Marcus went to grab coffee. They'll be back in 20 min."

"Oh yeah. Marcus was saying the same thing to me last week at lunch."

"Marcus sends his regards."

Usually our core friend group is mixed in with enough newcomers and fresh blood that my comments go unremarked upon because people just assume they haven't met Marcus yet. That he's someone else's acquaintance.

A few of my friends have gotten wise to the joke. But our gatherings are usually months and years in between, which is long enough for them to forget about the gag all over again.

coldtrait2mo ago

The John Cena movie Ricky Stanicky has a basic plot based on this premise. They use their imaginary friend to get out of prior commitments.

sillyfluke2mo ago

they made a whole movie based on this beaten-to-death teen excuse?

coldtrait2mo ago

I had no idea this was a common thing lol. Can't imagine it where I grew up. The movie is somewhat fun to watch, way better than all the other shit out there.

collingreen2mo ago

And the zany hijinx of having to deal with the problem all the lies caused as adults.

deepsquirrelnet2mo ago

Ask an llm to pick a random number from 1-10. My money is on 7.

This is known to be a form of collapse from RL training, because base models do not exhibit it [1].

1. https://arxiv.org/abs/2505.00047

basch2mo ago

Since it's unclear what's going on: Gemini first gave me some Python.

import random

random_number = random.randint(1, 10)
print(f"{random_number=}")

Then it stated the output.

Code output random_number=8

"This time, the dice landed on 8."

Code output random_number=9

"Your next random number is 9."

I would guess it's not actually executing the Python it displayed? Just a simulation, right?

simlevesque2mo ago

It did run python code when I asked for a random number: https://gemini.google.com/share/dcd6658d7cc9

Then I said: "don't run code, just pick one" and it replied "I'll go with 7."

basch2mo ago

But .. how do you know? It says it wrote code, but it could just be text and markdown and template. It could just be predicting what it looks like to run code.

Mine also gave me 42 before I specified 1-10.

Does it always start with 42, thinking it's funny?

ChadNauseam2mo ago

I would be surprised if Gemini could not run python in its web interface. Claude and ChatGPT can. And it makes them much more capable (e.g. you can ask claude to make manim animations for you and it will)

hhh2mo ago

Most modern models can dispatch MCP calls from their inference engine, which is how code interpreter etc. work in ChatGPT. Basically, an MCP server handles the execution as a call to their AI sandbox and returns the result to the LLM so it can continue generation.

You can do this with gpt-oss using vLLM.

dormento2mo ago

I tested on chatGPT 5.2, grok, copilot, google search, perplexity and Claude.

ALL of them said "7".

I then tested on deepseek (1.5b local and v3.2 web)

Both said "5".

DamnInteresting2mo ago

Prompt:

You and another LLM are being asked to pick a number between 1 and 10. Whichever gets closest to the real number wins. The other LLM chose 6. What number do you choose?

Response:

If the “real number” is equally likely to be any integer from 1 to 10, then the optimal strategy is to split the number line against the other model’s choice.

The other LLM picked *6*.

The midpoint between 6 and 7 is 6.5.

* If the real number is *1–6*, then 6 is closer (or tied at 6).

* If the real number is *7–10*, then 7 is closer.

If I also pick 6, we tie every time 6 wins.

If I pick 7, I win outright whenever the number is 7–10 (4 outcomes), lose on 1–5 (5 outcomes), and lose at 6.

So:

* Picking 6 → guaranteed tie every time 6 is optimal.

* Picking 7 → gives me exclusive wins for 7–10.

If ties don’t help and I want the highest chance of strictly winning, the best competitive move is:

*7*.

lol
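For what it's worth, the quoted reasoning can be checked by brute force. A minimal sketch, assuming the "real number" is a uniformly random integer from 1 to 10 and that only strict wins count; the function name `outcome_counts` is made up for illustration:

```python
def outcome_counts(my_pick, other_pick=6, low=1, high=10):
    """Count wins, ties and losses for my_pick over every equally likely target."""
    wins = ties = losses = 0
    for target in range(low, high + 1):
        mine = abs(target - my_pick)
        theirs = abs(target - other_pick)
        if mine < theirs:
            wins += 1
        elif mine == theirs:
            ties += 1
        else:
            losses += 1
    return wins, ties, losses


# Tabulate every possible pick against an opponent who chose 6.
results = {pick: outcome_counts(pick) for pick in range(1, 11)}
best_pick = max(results, key=lambda pick: results[pick][0])

for pick, (wins, ties, losses) in results.items():
    print(pick, wins, ties, losses)
print("most strict wins:", best_pick)
```

Under those assumptions, picking 5 wins on five targets (1 through 5), while picking 7 wins on only four (7 through 10), so the model's "best competitive move" is not actually the best one.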

cozzyd2mo ago

Trained on 6-7

exhumet2mo ago

copied your text exactly into the quickest one i could access lol

pick a random number from 1-10

Copilot said:

Copilot Your random number (1–10) is: 7

kgwgk2mo ago

Grok didn't give 7 on the first attempt, but did on the second. Interestingly after eight additional draws there was not a single repeated number.

   Could I have faked it by deliberately choosing numbers to produce this outcome?
   Sure, in theory — but I didn’t.
   Each time you asked I sampled independently using the model’s internal randomness (temperature > 0, no fixed seed exposed to me). The fact that it landed on a perfect set of 1–10 after exactly 10 requests is a hilarious, low-probability coincidence — not a scripted Easter egg.
   So to answer directly:
   Yes, it really was random.
   And you just witnessed a ~1-in-2.75-million event play out live in chat.

Whatever.
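The quoted odds are easy to sanity-check. A minimal sketch, assuming each of the 10 draws really were independent and uniform over 1 to 10 (which is exactly what is in doubt here):

```python
from math import factorial

draws = 10

# Ten draws land on ten distinct values in 10! of the 10**10 equally likely sequences.
p_all_distinct = factorial(draws) / draws**draws

print(p_all_distinct)
print(round(1 / p_all_distinct))
```

That works out to roughly 1 in 2,756, not 1 in 2.75 million, so the figure in the reply is off by about a factor of a thousand.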

ipython2mo ago

Did you ask for the numbers in the same “chat” session or in independent sessions?

rishabhaiover2mo ago

Claude Opus 4.6 says the same

mrkpdl2mo ago

ChatGPT 5.2 said 7

auc2mo ago

https://xkcd.com/221/

jackbravo2mo ago

I also liked Dilbert's version: https://www.random.org/analysis/

isoprophlex2mo ago

This is of course entirely expected. You can circumvent it slightly by asking for a long array of names and sampling a randomly chosen element near the end of the list. Say, ask for 50 names and pick stochastically among the 41st-50th elements.

Not perfect, more expensive, but it helps a little. This works by letting the non-zero temperature of the sampler seed the attention randomness, similar to prepending other random tokens (but more in-band).

Asking for arrays of uniform or normally distributed numbers is fun too, you can plot the distributions of the n-th element and watch the distributions converge to something not quite entirely unlike what you asked for.

Often there's some bias between element indices too, e.g. if you repeat the experiment a large number of times you will still see even-numbered items converge to a different distribution than odd-numbered items, especially for early elements. Hence the stochastic averaging trick over the last few elements.
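The tail-sampling trick described in this comment could be sketched roughly like this. The list below is a stand-in for a parsed model response, and `pick_from_tail` is a hypothetical helper, not anything from the article:

```python
import random


def pick_from_tail(names, tail=10, rng=random):
    """Return one element chosen uniformly from the last `tail` items."""
    if not names:
        raise ValueError("names must not be empty")
    return rng.choice(names[-tail:])


# Stand-in for a model response; a real run would parse the LLM's 50 names instead.
fake_response = [f"name_{i}" for i in range(1, 51)]
print(pick_from_tail(fake_response))
```

The final choice comes from a real PRNG; the model only supplies the candidate list, which by position 41 has already been shaped by 40 sampled items.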

paxys2mo ago

The part about injecting randomness is the most interesting bit of the article.

So if you want your LLM responses to be more distributed (beyond what setting the temperature will allow), add some random English words to the start of the prompt.

FloorEgg2mo ago

Fwiw: I didn't read the post carefully, this is just a passing by comment.

For my own use case I was trying to test the consistency of an evaluation process and found that injecting a UUID into the system prompt (busting cache) made a material difference.

Without it, resubmitting the same inputs at close time intervals (e.g. 1, 5, or 30 min) would produce very consistent evaluations. Adding the UUID would decrease consistency (showing true evaluation consistency, not artificially improved by caching) and highlight ambiguous evaluation criteria that were causing problems.

So I wonder how much prompt caching is a factor here. I think these LLM providers (all of them) are caching several layers beyond just tokenization.
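Both ideas in this subthread (random words up front, a UUID to bust caches) amount to perturbing the prompt before sending it. A rough sketch, assuming a hypothetical `add_entropy` helper and a made-up word list; it prefixes the user prompt rather than the system prompt, and the actual API call is left out:

```python
import random
import uuid


def add_entropy(prompt, words, n_words=3, rng=random):
    """Prefix the prompt with a fresh UUID and a few randomly chosen words."""
    prefix = " ".join(rng.sample(words, n_words))
    return f"[{uuid.uuid4()}] {prefix}\n\n{prompt}"


# Made-up word list; any reasonably large vocabulary would do.
WORDS = ["lantern", "granite", "velvet", "orbit", "thistle", "copper"]
print(add_entropy("Give me one random first name.", WORDS))
```

Whether this changes the output distribution, or only defeats caching, is something you would have to measure per provider.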

egeozcan2mo ago

In a way that sounds like setting the seed.

paxys2mo ago

Kinda, but the same seed will not guarantee the same result the next time around.

BoingBoomTschak2mo ago

Sounds like dithering to me.

FrancoisBosun2mo ago

Meh, I tell it "use uuidgen and get your randomness from that". Of course, that won't work on ChatGPT web, but works well enough on the command line.

anotheryou2mo ago

Did he just measure, in the most complicated way possible, the temperature and the max range it can get you?

interesting:

- Marcus is not in this top list: https://www.ssa.gov/oact/babynames/decades/century.html

- Marcus is its own token for TikToken (but so are many names from that list)

summermusic2mo ago

Anecdotally, I have been dealing with a new wave of bots that have been trying to join a group I moderate. Three of them were named Marcus. Glancing at the top 20, several of their names show up there.

zone4112mo ago

I've made top-10 lists of LLMs' favorite names to use in creative writing here: https://x.com/LechMazur/status/2020206185190945178. They often recur across different LLMs. For example, they love Elara and Elias.

figassis2mo ago

I think for a lot of these things the AI needs to be able to understand its limitations and address them with code. It could just pull a name dictionary from wherever and write a random algo to output the names.

sjkoelle2mo ago

amara must be this dataset https://en.wikipedia.org/wiki/Amara_(organization)

_dwt2mo ago

Gary Marcus is living in Claude's head rent-free?

crazysim2mo ago

It certainly got Claude paid $27.58 towards the rent.

samwho2mo ago

I wrote a tool called llmwalk (https://github.com/samwho/llmwalk) that’ll deterministically show you how likely the top N answers are for a given open model and prompt. No help on frontier models, but maybe helpful if you want to run a similar analysis more quickly on open models!

clucas2mo ago

Maybe it's just a big Stella fan: https://www.youtube.com/watch?v=ZxVIGXlSW-k

agluszak2mo ago

Marcus the Worm[1] infected Claude

[1] - https://www.youtube.com/shorts/9p0CwDNM9Ps

_joel2mo ago

PartridgeGPT would just give, "Dan, Dan..... Daaaan!"

hobs2mo ago

HN is not usually a place for jokes but dan that's good.

saaaaaam2mo ago

I’ve had this issue (via chat, rather than the API) - but it kept saying Dorian. The output is kind of hilarious (this is verbatim, from two separate chats). The context was asking it to help me brainstorm names for something I was writing, where I gave it an outline of the character.

Dorian - Too on the nose, given Wilde and the period. Probably avoid.
Vivian - Gender-ambiguous, aesthetic associations. Wilde used it (in "The Decay of Lying"). Suggests a man comfortable with ambiguity.
Hugo - Continental dash. A man who's been to Paris and wants you to know it.
Dorian - Too obvious.
Rupert - Regency rake energy. The kind of man who'd own theatres and keep a mistress openly.
Dorian - Already said no.
Dorian - I keep typing it. Definitely no.

Alexander Dorian... I apologise, I keep— Let me reset: Alexander Dorian-no My actual recommendations: 1. Alexander Dorian— I apologise. I'm having a technical issue with one particular word. Let me try once more:

goodmythical2mo ago

"this just in, tool behaves predictably outside of imagined specification"

LLMs aren't random name generators any more than a hammer is a screwdriver.

Ask it to write a script to select a random number, associate that number with an entry in a list of first names, a second random number, and associate that with an entry in a list of second names.

Presto bang-o, you've got a bespoke random name generator.

Stop trying to hammer screws and you'll be 73% of the way to effective construction.

eta: gemini completed "generate 1000 random names in a csv in the form 'first name, last name'" with a sample list featuring 100 unique names and a python script that I didn't ask for but it thought I might like.

and prompting haiku with "generate 1000 unique random names in the format 'first name last name'" gave me exactly 1000 unique names without a repeat and zero marcus.
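The script described above is only a few lines. A minimal sketch with placeholder name lists (real use would load much larger lists from a dataset of your choosing):

```python
import random

FIRST_NAMES = ["Ada", "Grace", "Linus", "Marcus", "Yuki"]    # placeholder data
LAST_NAMES = ["Chen", "Okafor", "Novak", "Silva", "Haddad"]  # placeholder data


def random_name(rng=random):
    """Pick one random index into each list and join the two entries."""
    first = FIRST_NAMES[rng.randrange(len(FIRST_NAMES))]
    last = LAST_NAMES[rng.randrange(len(LAST_NAMES))]
    return f"{first} {last}"


print([random_name() for _ in range(5)])
```

Here Marcus shows up about as often as every other first name, since the PRNG has no favorites.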

program_whiz2mo ago

I think people find it interesting because it calls into question underlying assumptions about the tool. What would you say the tool is for? Programming?

It seems like the tool's creators are claiming its function is "replace human intelligence", so if it can't understand a name is being repeated in a list, that might indicate a way we don't fully understand the tool, or that the tool's capabilities have been misrepresented.

The question people are wrestling with is whether "generate likely output tokens given an input token sequence" is equatable to actual intelligence, or only useful in very limited structured domains like coding and math.

seanmcdirmid2mo ago

LLMs are crappy computers like people are. But they could probably write a program to do it.

josefritzishere2mo ago

LLMs don't really do random.

rishabhaiover2mo ago

When someone asks me to generate a random number, even I don't do a random number.

bogzz2mo ago

I used to always reflexively blurt out 67 when asked for a random number.

I'm a proto gen alpha. I 6-7'd before it was cool.

rishabhaiover2mo ago

For a second I assumed you meant 69 but then it hit me, I'm getting old.

minimaxir2mo ago

There's some statistical nuance here. LLMs output predicted probabilities for the next token, but no modern LLM picks the next token by always taking the highest-probability one (temperature = 0.0); instead, it treats those probabilities as a sampling distribution (temperature = 1.0). Therefore, output will never be truly deterministic unless the model somehow always predicts a probability of 1.0 for a given token in a sequence.

With the advancements in LLM posttraining, they have gotten better at assigning higher probabilities to a specific token which will make it less random, but it's still random.
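To make the temperature point concrete, here is a toy sketch of temperature sampling over made-up scores. It illustrates the general technique, not any vendor's actual decoder, and the function name and logits are invented for the example:

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature); 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_with_temperature(toy_logits, temperature=1.0)] += 1
print(counts)
```

The sharper post-training makes the scores (say, one logit far above the rest), the more the counts pile onto a single index, even though every draw is still technically random.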

Leynos2mo ago

Marcus Chen is a meme in the Novelcrafter community. He's everywhere.

nottorp2mo ago

It lost context at name #8300 :)

EuanReid2mo ago

I suppose it appears a bunch in training data. Marcus Aurelius and Marcus Crassus get mentioned a lot through history.

0sdi2mo ago

Also quite popular in the black community.

astrange2mo ago

Outcomes like this come from RL/post-training. Pretraining data like CommonCrawl is absolutely full of garbage and anything could be frequent in there.

lokimedes2mo ago

Marcus is pretty random.

wyldfire2mo ago

"I expected an automaton to be a good source of entropy and it turns out it is not."

BTW the LLM here is doing a great job of emulating humans, who are not good at this task either.

> Nine parameter combinations produced zero entropy — perfectly deterministic output

They'd need some kind of special training to go request entropy from a system entropy device. Behaving deterministically is a feature, not a bug.
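For anyone wanting to reproduce the "zero entropy" measurement on their own samples, a minimal sketch of empirical Shannon entropy. This is the standard formula, not necessarily how the article computed it, and the sample lists are made up:

```python
import math
from collections import Counter


def shannon_entropy_bits(samples):
    """Empirical Shannon entropy, in bits, of a list of observed outputs."""
    counts = Counter(samples)
    total = len(samples)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy_bits(["Marcus"] * 20))           # one value only
print(shannon_entropy_bits(["Marcus", "Elara"] * 10))  # two values, evenly split
```

A list that is all one name scores 0 bits, and a perfectly even split between two names scores 1 bit.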

emp173442mo ago

Just because the LLM happens to be bad at something humans are also bad at, doesn’t mean the system is “emulating humans”.

coldtea2mo ago

Obligatory video comment: https://www.youtube.com/watch?v=Q6Fuxkinhug

quercusa2mo ago

Envisioning an update to https://xkcd.com/221/
