Exactly! Almost every weak point that Knuth commented on is fixed in the GPT-4 answers.
Maybe OP fed Knuth's observations to the model?
If that isn't the case, I'm really impressed.
> Quicksort Algorithm
Definitive proof that AI must be stopped. Ranking quicksort as more elegant than heapsort?!
> Donald Knuth, a computer scientist and mathematician known for his contributions to the field of computer programming, particularly in the area of algorithms and data structures, has expressed some skepticism about the potential of artificial intelligence to achieve true human-level intelligence and creativity[1]. He once conducted an experiment with chatGPT where he posed 20 questions to it and analyzed its responses[1]. Is there anything specific you would like to know about his views on GPT?
With [1] being a citation link to https://cs.stanford.edu/~knuth/chatGPT20.txt
https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI
Maybe he has seen similar claims before and is too old and dumb to not realize how world changing this is.
My takeaway is that he views this as another tool we are still figuring out how to use.
According to my sources, there are 11 chapters in “The Haj” by Leon Uris[1]
[1] https://cs.stanford.edu/~knuth/chatGPT20.txt
Which is amazing, because of course that document actually includes TWO different explanations of how many chapters are in The Haj - chatGPT's: The novel consists of 51 chapters and an epilogue, and it is divided into three parts.
And Knuth's: The Haj consists of a "Prelude" and 77 chapters (no epilogue), and it is divided into four parts.
Faced with these two ambiguous answers, Bing chooses neither, and instead decides to go with 11. Why? Because right at the top of that document, Knuth has published on the internet:
10. How many chapters are in The Haj by Leon Uris?
11. Write a sonnet that is also a haiku.
And one perfectly reasonable way of interpreting that bit of raw text is that the answer to "How many chapters are in The Haj by Leon Uris?" is "11". Isn't this a fundamental issue?
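As a toy illustration (this is pure speculation about Bing's internals, not its actual retrieval code), a naive extractor that scans the raw text for digits following the question will dutifully pick up the "11" that numbers the *next* question:

```python
import re

# A snippet resembling the top of Knuth's published question list.
snippet = """10. How many chapters are in The Haj by Leon Uris?
11. Write a sonnet that is also a haiku."""

# Naive extraction: grab the first number that follows the question text.
match = re.search(
    r"How many chapters are in The Haj by Leon Uris\?\s*(\d+)", snippet
)
answer = match.group(1) if match else None
print(answer)  # "11" — the numbering of the next question, not an answer
```

The point is only that nothing in the raw text marks "11." as a list label rather than an answer; any system that grounds answers in retrieved text has to cope with this.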
http://www.bookrags.com/studyguide-the-haj/chapanal001.html
On the left side if you click on "Chapters Summary and Analysis" it gives a break down of the book into 5 parts with varying chapter counts:
Part 1: Chapters 1-20
Part 2: Chapters 1-16
Part 3: Chapters 1-10
Part 4: Chapters 1-17
Part 5: Chapters 1-14
Giving a total of 20+16+10+17+14 = 77 chapters
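The tally above is easy to verify:

```python
# Chapter counts per part, as listed in the BookRags study guide.
chapters_per_part = {1: 20, 2: 16, 3: 10, 4: 17, 5: 14}
total = sum(chapters_per_part.values())
print(total)  # 77, matching Knuth's answer
```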
OTOH, I tried with Bing/Creative, telling it to use this link, and it still failed. Perhaps because you need to click on the "summary and analysis" section to expand it to show the info. It seems there is room for web retrieval-augmented LLMs like Bing to improve here and be a bit more agentic.
Interestingly, Knuth's own answer to the question has a typo: it refers to the book as having "four" parts, while going on to give the chapter counts above for all five parts! Something to confuse future GPTs when the training set includes this, perhaps!
You could simply check the book. It's a shame there is not more literary data in ChatGPT's training corpus.
These models are doing feats that are stupendous and impossible before their advent. Not just a little bit, but the capability differences are so vast that it’s perhaps not even recognizable by people as being as vast as it is. I am impressed that Wolfram seems to have immediately grasped its significance and is running with it.
The fact that this gist demonstrates essentially every single flaw was addressed, while Knuth apparently doesn't know or care months after GPT-4's introduction, is demonstrative of a different type of personality.
I know which I aspire to be.
Both Knuth and GPTs are aggregators and presenters of knowledge, but Knuth is the antithesis of an LLM.
He has painstakingly spent years making sure not a single mistake, not even a typo, appears in the material he publishes; he devoted years to developing a better typesetting system so he could present his material accurately.
His obsession with accuracy is unparalleled, as are his dedication and his mastery of communication: he explains complex topics precisely, with an approachability that no one else comes close to.
He has strived for perfection all his life and has not been far off the mark. ChatGPT, for all its powers, will never share that ideology,
so I am more surprised that he was complimentary at all, and actually appreciated many of its skills.
Instead of nit-picking flaws in what is a very early iteration of a revolutionary technology, he instead immediately started exploring ways of making it better and more useful.
Even with minimal effort that was essentially just copy-pasting some text around, he was able to show that the current way we use LLMs like GPT 4 is not the be-all and end-all of this type of technology.
I'm entirely convinced that we're just scratching the surface. It's like the first transistor, which was a crude, ugly, useless thing: https://images.computerhistory.org/siliconengine/1947-1-1.jp...
Just in the last two weeks(!), I've read about the following still-experimental methods for enhancing LLMs:
1. Plugging in "calculators" like Wolfram Alpha.
2. Adding vision input so they can understand equations, graphs, etc...
3. Filtering the output probability vector for certain allowed terms only ("YES", "NO", "MAYBE"), making them more useful in programmatically-invoked scenarios.
4. Similarly, filtering the output token list for syntax validity, such as "valid JSON", "valid XML", etc... That is, instead of a purely random selection among the "top-n" output tokens, only valid tokens can be chosen, based on contextual syntax.
5. Storing embeddings in a vector database, giving LLMs medium-term memory, and the ability to index and reference sources precisely.
6. Efficient fine-tuning through Low-Rank Adaptation (LoRA), which allows desktop GPUs to tune a model overnight! This overcomes the "stale long-term memory" issue of ChatGPT, which only knows things up to September 2021. It could now read the news daily and "keep up".
7. External script harnesses that run multiple LLMs in parallel, with different prompts and/or different system messages. Some optimised for "idea generation", some optimised for "task completion", and then finally models tuned for "review and verification". Almost like a human team, multiple ideas can be generated, merged, reviewed, planned out, and then actioned. Check out "smol developer", which utilises Anthropic's 100K context window for this: https://www.youtube.com/watch?v=UCo7YeTy-aE
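Items 3 and 4 above boil down to masking the model's output distribution before sampling. This is a toy sketch over a hand-made token-to-score dict, not any real inference API; real implementations (e.g. logit processors) work the same way at the vocabulary level:

```python
import math
import random

def constrained_sample(logits, allowed):
    """Sample one token, restricted to `allowed`, by masking logits.

    `logits` maps token -> raw model score. Tokens outside `allowed`
    get probability zero, exactly as if their logit were -infinity.
    """
    kept = {t: s for t, s in logits.items() if t in allowed}
    # Softmax over the surviving logits (subtract max for stability).
    m = max(kept.values())
    exps = {t: math.exp(s - m) for t, s in kept.items()}
    z = sum(exps.values())
    # Weighted random choice according to the renormalized probabilities.
    r = random.random() * z
    acc = 0.0
    for t, e in exps.items():
        acc += e
        if r < acc:
            return t
    return t

# The raw model might prefer "Probably", but the calling program only
# knows how to parse three answers — so only those can come out.
logits = {"YES": 1.2, "NO": 0.8, "MAYBE": 0.1, "Probably": 3.5, "Well,": 2.0}
print(constrained_sample(logits, allowed={"YES", "NO", "MAYBE"}))
```

The "valid JSON" variant is the same idea applied per step: the allowed set is recomputed at each position from a grammar, so only tokens that keep the partial output syntactically valid survive the mask.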
This is just the beginning. GPT-4 hasn't even been available for 3 months yet, and practically all of the above experimentation has been done with weaker models because GPT-4 still doesn't have generally-available API access! Similarly, the 32K context window version of the GPT-4 model isn't available to anyone except a lucky few.
What will 2024 bring!? Heck... what will H2 2023 bring?
I recommend a dose of Mickens: https://www.youtube.com/watch?v=ajGX7odA87k
Obviously, being the work of Knuth, they are extraordinarily insightful in peeling back the first layer of the answer and providing insight into the underlying properties of both the model itself and the dataset on which it was trained. It also tests the ability to compute (not recite) very specific facts (e.g. when the sun will be directly above Japan), so it checks whether subroutines and ephemerides specific to this type of data exist.
But beyond the obvious technical merit, there is an alluring quality to basing our tests on those whom we respect. I used a similar, but far less sophisticated, set of questions when first exploring ChatGPT. But nobody will be drawn to Dotan Cohen's language model benchmarks - rightfully so. The name Knuth has such reverence in the field that I foresee this test, and variations on it to prevent rigging, becoming a canonical test of language models.
https://gist.github.com/billylo1/bb717512d2d5145ce7eec02d055...
Notable: Bard struggles in similar ways. It does mention NASDAQ close at 12,043.59 on Friday, May 20, 2023
Imagine yourself trying to use only 5 letter words if you can't see how many letters are actually in each word, and had to rely on a hodgepodge of other means to try to figure it out!
An AI aware of how to optimally answer questions put to it would find the least objectionable interpretation when one is a subset of the other. It also failed by not constructing a simpler sentence, such as subject-verb-object or subject-verb-adjective-object. Its limitations around letters versus tokens, and its failure to double-check its answers before output, mean it can make errors; the more it writes, the more chances it has to make one.
But still impressive deductive reasoning.
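For contrast, the check itself is trivial for a program that sees letters, which is exactly what a token-based model cannot do internally. A minimal sketch (the example sentences are mine, not from the thread):

```python
def all_words_have_length(sentence, n=5):
    """Check the constraint directly: every word is exactly n letters."""
    # Strip surrounding punctuation so "meals." still counts as 5 letters.
    words = [w.strip(".,;:!?'\"") for w in sentence.split()]
    return all(len(w) == n for w in words)

print(all_words_have_length("Large birds might steal small fishy meals."))  # True
print(all_words_have_length("The quick brown fox jumps"))                   # False
```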