Define reasoning. By my definition, GPT-4 can reason without a doubt. It definitely can't reason better than experts in the field, but it can reason better than, say, interns.
"Say I have a container with 50 red balls and 50 blue balls, and every time I draw a blue ball from the container, I add two white balls back. After drawing 100 balls, how many of each different color ball are left in the container? Explain why."
... because on GPT-3.5 the answer begins like this and then gets worse:
"Let's break down the process step by step:
Initially, you have 50 red balls and 50 blue balls in the container.
1) When you draw a blue ball from the container, you remove one blue ball, and you add two white balls back. So, after drawing a blue ball, you have 49 blue balls (due to removal) and you add 2 white balls, making it a total of 52 white balls (due to addition) ..."
If I was hiring interns this dumb, I'd be in trouble.
EDIT: judging by the GPT-4 responses, I remain of the opinion I'd be in trouble if my interns were this dumb.
So I clarified to ChatGPT that the drawing is random. And it replied: "The exact numbers can vary based on the randomness and can be precisely modeled with a simulation or detailed probabilistic analysis."
I asked for a detailed probabilistic analysis and it gave a very simplified one, then basically said that a Monte Carlo approach would be easier. That actually sounds more like most people I know than not. :-)
Seems like quite a difficult question to compute exactly.
I reworded the question to make it clearer, and then it was able to simulate a bunch of scenarios as a Monte Carlo simulation. Was your hope to calculate it exactly with dynamic programming? GPT-4 was not able to do this, but I suspect neither could a lot of your interns.
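For what it's worth, an exact answer is computable with a small dynamic program over container states, i.e. tracking the probability of every reachable (red, blue, white) count after each draw. A rough Python sketch of that idea (my own, not anything GPT produced):

    from collections import defaultdict

    def exact_distribution(red=50, blue=50, draws=100):
        # Probability of each container state (red, blue, white) after `draws` draws.
        # Rule: a drawn blue ball is removed and two white balls are added back.
        states = {(red, blue, 0): 1.0}
        for _ in range(draws):
            nxt = defaultdict(float)
            for (r, b, w), p in states.items():
                total = r + b + w
                if r:
                    nxt[(r - 1, b, w)] += p * r / total
                if b:
                    nxt[(r, b - 1, w + 2)] += p * b / total
                if w:
                    nxt[(r, b, w - 1)] += p * w / total
            states = dict(nxt)
        return states

    dist = exact_distribution()
    for colour, idx in (("red", 0), ("blue", 1), ("white", 2)):
        print(colour, sum(state[idx] * p for state, p in dist.items()))

It just pushes probability mass through every reachable state for 100 draws and then reads off the expected leftover counts.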
These are very good questions that anyone with the ability to reason would ask if given this problem.
You're asking GPT to do maths in its head, the AI equivalent of a person standing in the middle of the room with no tools getting grilled in an oral examination of their knowledge.
Instead, collaborate with it, while giving it the appropriate tools to help you.
I asked it to write a Monte Carlo simulation of the problem in Wolfram Mathematica script. It did this about 10-100x faster than I would have been able to. It made a few small mistakes with the final visualisation, but I managed to get it to output a volumetric plot showing the 3D scatter plot of the histogram of possible outcomes.
I even got it to save a video of the plot rotating: https://streamable.com/2aphbz
All told, I'd say it's a decent answer.
Edit: I took it to completion: https://chat.openai.com/c/6cdd92f1-487a-4e1c-ab94-f2bdbf282d...
These were the first responses each time, with no massaging/retries/leading answers. I will say it's not entirely there. I re-ran the initial question a few times afterwards and one was basically gibberish.
...the composition of what's left in the container.
There's a couple of scenarios, which depend on when you run out of blue balls:
1) If you draw all 50 blue balls first, for every blue ball you draw, you are adding 2 white balls. Hence, once you've drawn all the blue balls, you have replaced them with 100 white balls. This means you've drawn 50 balls total and there's now 100 white balls and 50 red balls (150 balls) in the container. Then, you continue and draw the next 50 balls. These could be any combination of white and red balls, we don't know which color is preferred when you continue drawing.
2a) If you draw all red balls in these next 50 draws, you would have 50 red balls out and 50 blue balls out. So, you would end with 100 white balls in the container.
2b) If you draw all white balls in these next 50 draws you would have 50 red balls, 50 blue balls, and 50 white balls out. So, you would end with 50 white balls and 50 red balls in the container.
2c) If you draw a mixture of red and white balls in these next 50 draws, the number leftover in the container would vary according to the specific combination drawn.
Remember, the order in which the balls are drawn matters to this question since we have a rule in place that changes the overall total (or population) of balls within the container. In each of these scenarios, you've drawn 50 times initially (all blue) and then 50 times again (in some unknown combination of red and white). You've drawn 100 times total and changed the number of white balls in the container from zero to an amount dependent on how many times you drew a white ball on your second round of 50 draws.
2b) If you draw all white balls in these next 50 draws you would have 50 red balls, 50 blue balls, and 50 white balls out. So, you would end with 50 white balls and 50 red balls in the container.
... so after removing 100 balls, I've removed 150 balls? And the 150 balls that I've removed are red, white and blue despite the fact that I removed 50 blue balls initially and then 50 white ones.
Seems to me like it reasons its way to this answer at the end: Mind you, while averages are insightful, they don't capture the delightful unpredictability of each individual run. Would you like to explore this delightful chaos further, or shall we move on to other intellectual pursuits?
Took a bit of massaging, and I enabled the Data Analysis plugin, which lets it write Python code and run it. The simulation code it produced looks correct though.
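For anyone who wants to poke at it without the plugin, a minimal sketch of that kind of Monte Carlo simulation in plain Python (my own rough version, not the code the model generated):

    import random
    from collections import Counter

    def simulate(red=50, blue=50, draws=100):
        # One run: draw uniformly at random; a drawn blue ball is
        # removed and two white balls are added back.
        balls = ["red"] * red + ["blue"] * blue
        for _ in range(draws):
            colour = balls.pop(random.randrange(len(balls)))
            if colour == "blue":
                balls += ["white", "white"]
        return Counter(balls)

    runs = 10_000
    totals = Counter()
    for _ in range(runs):
        totals += simulate()
    print({c: totals[c] / runs for c in ("red", "blue", "white")})

Averaging the counts over many runs approximates the expected composition; individual runs vary.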
Uhm.
You see reasoning issues when you use more real-world examples, rather than theoretical tests.
I had 4 failure states.
1) Summarization: It summarized 3 transcripts correctly; for the fourth, it described the speaker as a successful VC. The speaker was a professor.
2) It was to act as a classifier, with a short list of labels. Depending on the length of text, the classifier would swap over to text gen. Other issues included novel labels, new variations of labels, and so on.
3) Agents - This died on the vine. Leave aside having to learn async, vector DBs, or whatever: you can never trust the output of an LLM, so you can never chain agents.
4) I focused on using ChatGPT to complete a project. I hadn't touched HTML ever - the goal was to use ChatGPT to build the site. This would cover design, content, structure, development, hosting, and improvements.
I still have trauma. Wrong code and bad design were the baseline issues. If the code was correct, it simply meant I had dug a deeper grave. I had anticipated 70% of the work being handled by ChatGPT; it ended up at 30% at most.
ChatGPT is great IF you already are a subject expert - you can brush over the issues and move on.
"Hallucinations" is the little bit of string that you pull on, and the rest unravels. There are no hallucinations, only humans can hallucinate - because we have an actual ground truth to work with.
LLMs are only predicting the next token. For them to reason, they would have to hold structures and proxies in some data store and actively alter them.
It's easier to see once you deal with hallucinations.
Example of a basic problem: In a shop, there are 4 dolls of different heights P, Q, R and S. S is neither as tall as P nor as short as R. Q is shorter than S but taller than R. If Kittu wants to purchase the tallest doll, which one should she purchase? Think step by step.
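For reference, the constraints (P taller than S, S taller than R, S taller than Q, Q taller than R) force a single ordering, which a quick brute-force check in Python confirms (my own sketch, not model output):

    from itertools import permutations

    def taller(a, b, rank):
        return rank[a] < rank[b]   # smaller rank = taller

    # Try every ordering of the four dolls, tallest first.
    for order in permutations("PQRS"):
        rank = {d: i for i, d in enumerate(order)}
        if (taller("P", "S", rank) and taller("S", "R", rank)        # S neither as tall as P nor as short as R
                and taller("S", "Q", rank) and taller("Q", "R", rank)):  # Q shorter than S but taller than R
            print("Tallest to shortest:", " > ".join(order))

Only one ordering survives, with P as the tallest doll.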