Here is a simple trivial one:
"make ssh-keygen output decrypted version of a private key to another file"
I'm pretty sure everyone on the LLM hype train will agree that this prompt alone should be enough for GPT-4o to give a correct command. After all, it's SSH.
However, here is the output command:
ssh-keygen -p -f original_key -P "current_passphrase" -N "" -m PEM -q -C "decrypted key output" > decrypted_key
chmod 600 decrypted_key
Even the basic fact that ssh-keygen is an in-place tool and does not write key data to stdout is not captured strongly enough in the representation to be activated by this prompt. Thus it also overwrites the existing key, and your decrypted_key file will contain "your identification has been saved with the new passphrase", lol.

Maybe we should set up a cron job - sorry, ChatGPT task - to auto-tweet this in reply to all of the OpenAI employees' hype tweets.
Edit:
chat link: https://chatgpt.com/share/67962739-f04c-800a-a56e-0c2fc8c2dd...
Edit 2: Tried it on deepseek
The prompt pasted as is, it gave the same wrong answer: https://imgur.com/jpVcFVP
However, with reasoning enabled, it caught the fact that the original file is overwritten in its chain of thought, and then gave the correct answer. Here is the relevant part of the chain of thought in a pastebin: https://pastebin.com/gG3c64zD
And the correct answer:
cp encrypted_key temp_key && \
ssh-keygen -p -f temp_key -m pem -N '' && \
mv temp_key decrypted_key
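A quick sanity check for the workflow above (a sketch, assuming OpenSSH is installed; `decrypted_key` is the output filename from the commands in this thread): a key with no passphrase can be read back with an empty passphrase, so this verifies the decryption actually happened:

```shell
# Derive the public key from the private key, supplying an empty passphrase.
# This succeeds only if the private key is no longer passphrase-protected.
ssh-keygen -y -P '' -f decrypted_key > /dev/null && echo "decrypted OK"
```

`ssh-keygen -y` prints the public key computed from a private key file; with `-P ''` it fails if a passphrase is still required, which is exactly the condition we want to test.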
I find it quite interesting that this seemingly 2020-era LLM problem is only solved correctly by the latest reasoning model, but it's cool that it works.

Slight improvement:
"make ssh-keygen output decrypted version of a private key to another file . Use chain reasoning, think carefully to make sure the answer is correct. Summarize the correct commands at the end."
This improved the odds for me of getting the right answer in the format you were looking for in GPT-4o and Claude.
These things aren't magic oracles, they're tools.
I didn't ask or expect any format. The accurate answer in whatever format is all that is expected.
There's no nuance to it whatsoever beyond needing to demonstrate knowledge of the rules of the game.
Why should a language model be good at chess or similar numerical/analytical tasks?
In what way does language resemble chess?
Chess engine: given a sequence of moves in a winning game, what is the most likely next move
I don't think LLMs will ever beat purpose-built engines, but it is not inconceivable for them to play better chess than most humans.
My amateur opinion is that an "AI system" resembling AGI or ASI or whatever the acronym of the day is will be modular, with different parts addressing different kinds of learning, rather than entirely end-to-end. One of the main milestones towards achieving this would be the ability to dynamically learn what is left to be learnt (finding gaps), and then potentially have it train itself to fill those gaps automatically. One of the half-milestones, I suppose, would be for humans to find gaps in that ability in the first place.
I attended a talk recently where they presented research that tried to distinguish, effectively, the following two types of LLM failures:
1) inability to generalize/give the output at the "representation layer" itself
2) has the information represented, but is not able to retrieve it for a given reasonable prompt, and requires "context scaling"
Which is a step towards this goal I suppose.
Here's the board; you can enable the engine to get the answer: https://lichess.org/analysis/standard/8/6B1/8/8/B7/8/K1pk4/8...
So a draw is the most one can get. Underpromoting to a knight (with check, thus avoiding the check by the bishop) is the only way to promote and keep the piece for another move.
I guess in this situation the knight against two bishops keeps the draw.
Spring has sprung, the grass iz riz,
I wonder where da boidies iz?
Da boid iz on da wing! Ain’t that absoid?
I always hoid da wing...wuz on da boid!
ChatGPT up to o1 failed; o1 did very well. deepseek-r1 7b did OK too.

Spring has sprung, the grass has risen. I wonder where the birdies are? The bird is on the wing! Isn't that absurd? I always heard the wing was on the bird!
Not sure what it means that the bird is on the wing.
Is this hard to find out? I mean, it's easy to find this: https://www.dictionary.com/browse/on-the-wing https://www.allgreatquotes.com/hamlet-quotes-114/
So it's somewhat old-timey English English.
And to explain the joke: the humour is that the rest of it is phonetic 20th-century New York English, e.g. pronouncing "absurd" more like "absoid" to rhyme with "boid".
New York guy finds Shakespearean English absoid.
The other commenter has this right: now that the joke has been explained on the internet, it will be harvested, and LLMs will regurgitate variations on the explanations; then people will believe that the LLMs have become "more intelligent" in general. They have not, they just have more data for this specific test.
I get the impression it often does as well or better than o1 on many tasks, despite not being a reasoning model.
I once had a bishop-and-knight endgame; I think it became a draw by repetition.
Asking AI to do this is definitely flawed. This isn't reasoning. From what I know of the two-bishop endgame, it's more a matter of trapping the king in a box until you can snipe the king with your bishops (e.g. his king on h1, yours on h3, one bishop targeting g1 and the other anywhere on the main diagonal, with no other pieces).
But this is very prone to stalemate, since I am currently pondering how to get to this position without a stalemate! If you move the bishop a move later, it's stalemate. Like, seriously. https://www.chess.com/forum/view/endgames/two-bishop-checkma...
Just search "2 bishop checkmate is hard"; a lot of guides exist for exactly this purpose. Though in my 1000+ games I only got the two-bishop endgame once or twice; usually it's bishop and knight, which is just as tricky, or, if I recall, the worst: knight and knight.
Two bishops (on different colours) is actually not that difficult. There are some simple heuristics to help you there (an LLM might actually tell you these; haven't asked ;-))
Bishop+knight is, in my opinion, slightly more complicated; there are some 'tricks' necessary to keep the king from running from one corner to the next.
Knight+knight is - in most situations - a draw (you need three knights to force mate).
For those who have not used Lichess, the puzzles it gives (unless you ask for a specific type) do not tell you what the goal is (mate, win material, get a winning endgame, save a bad position, etc) or how many moves it will take.
Here are some puzzles it has recently given me and their current ratings. These all have something in common.
1492 https://lichess.org/training/KsrR0
1506 https://lichess.org/training/RwLfy
1545 https://lichess.org/training/TzZdx
1557 https://lichess.org/training/IJfT7
1564 https://lichess.org/training/oOMz4
1604 https://lichess.org/training/uRRck
1661 https://lichess.org/training/jBrLX
1719 https://lichess.org/training/cpKAM
What they have in common is that they are all mate in one. I have seen composed mate-in-ones that puzzled even high-rated players, but they involved something unusual, like the mating move being an en passant capture. None of the above puzzles are tricks like that.
So how are enough people failing them for their ratings to be that high?
[1] https://syzygy-tables.info/?fen=8/6B1/8/8/B7/8/K1pk4/8_b_-_-...
See: https://dynomight.net/more-chess/
HN Discussion of that article: https://news.ycombinator.com/item?id=42206817
(Edit: it turns out "it's complicated".)
For whose turn it is you can look at the FEN for the position, which is given at the bottom of the article:
> 8/6B1/8/8/B7/8/K1pk4/8 b - - 0 1
The second field in that indicates whose move it is: "b" for black and "w" for white.
You can also tell that the pawn is close to promoting from the FEN. The first field is simply a slash separated list of the contents of each row, starting from the black side of the board (row 8). In the entry for each row the numbers represent runs of empty squares. The letters represent white pieces (KQRBN) or white pawns (P) or black pieces (kqrbn) or black pawns (p).
Black pawns start at row 7, and we can see that this one is in row 2.
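To make the field positions concrete, here is a minimal sketch (plain shell with awk and cut, using only the FEN string given above) that pulls out the side-to-move field and the row containing the black pawn:

```shell
fen='8/6B1/8/8/B7/8/K1pk4/8 b - - 0 1'

# Second whitespace-separated field: side to move ("b" = black).
echo "$fen" | awk '{print $2}'   # → b

# The board field lists rows from 8 down to 1, so row 2 is the
# 7th slash-separated entry; it contains the black pawn "p".
echo "$fen" | awk '{print $1}' | cut -d/ -f7   # → K1pk4
```

`K1pk4` reads as: white king, one empty square, black pawn, black king, four empty squares, matching the position described above.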