This article and the comments in this thread are right up that alley. I mean, can you imagine, say, 1 or 2 years ago, saying we'd have a readily accessible system that you could feed the symptoms a patient is experiencing (in any language!) and out would spit a well-described explanation of the diagnosis (in any language!) around half the time? And now it is suddenly here, and so many people are like, "yeah, but it isn't perfect" or "we can't completely trust it" or "we have to fact check it because it can be wrong".
It is all so weird and fascinating and just bigger than I can describe.
But here we are!
Recently we've had all these people complaining of moving goalposts, when in fact most people never had goalposts for AI in the first place, and certainly never had goalposts that remotely resembled those that AI experts use to evaluate their own progress.
My mental model seems to clash with that of the boosters. They appear to have little doubt that reliability will emerge from scaling the models up. I think reliability is the lion's share of the problem and probably won't be solved by current architectures. It will be interesting to see who is right.
It's absolutely amazing that so many AI scientists don't seem to understand this distinction.
A half-working system might actually be worse than useless if people believe it 100% of the time.
Before ChatGPT there wasn't anything quite like it, so I think that the goalposts were mostly about passing the Turing test, or at least "not sucking" after 4 or 5 phrases. The leap from whatever existed before it was enormous, and I think comparable to the first iPhone versus, let's say, the BlackBerry or Windows CE. Maybe even more than that. Go back to the first iPhone: it had a lot of issues, but it offered so much as it was that people wouldn't look back.
It's too much to expect ChatGPT 3 to be better than an experienced emergency doctor or neurosurgeon. But the fact that it can be a competent assistant to a doctor instead of a toy assistant - that by itself is amazing.
People will use and fund AI tools that have usefulness to them.
AI boosters love to see the capabilities because it unlocks a whole new verdant valley in the continent of useful ideas, but until a street is laid down and a general store is available it remains the domain of early adopters.
LLMs are genuinely impressive. They aren't yet AGI, but they are fantastic tools. More exciting to me as a practitioner is the Segment Anything Model (SAM), a foundation model I can put into immediate use for computer vision in a variety of domains. LLMs still require some baking for broad usefulness (though there are many places they are already useful, it's not quite broad enough).
When folks use AI focused on augmenting capabilities rather than replacing people, you get a great return. Until AGI is clear (and we'll know, the AI will tell us without being prompted I reckon), let's treat AI as an exoskeleton rather than a new species.
There is only so much a human can read and remember. There is only so much they can be convinced to change their mind in the face of new evidence. They get tired. They get cranky. They have biases, and fears.
Some doctors are just terrible. I think the AI is probably more reliable than them already.
I don't think we are very far away from being able to correct an AI, and for it to remember that correction forever.
But an LLM assisting a wide range of professionals at even a low accuracy could be part of an efficient workflow, allowing the doctor or lawyer to give quality service to more people.
Have you actually engaged with it in any way? It's free to sign up.
By the way, what on earth is an AI booster? Do you really mean "enthusiast"?
"My mental model seems to clash with that of the boosters." Please tell us about your mental model, we can't read your mind.
Side note: my ER doctor wrote me a referral to a neurologist who was considered the best in the state, and they refused to even talk to me since the referral said car accident. They wouldn't budge and wouldn't reconsider. The car accident was a result of an episode, not the cause of my issues. I hate the US healthcare system.
The thing is, if the healthcare system is shitty and broken, it doesn't necessarily matter how good ChatGPT is. If it's applied with the intent of "streamlining" the medical system by ignoring people's concerns and rejecting their insurance claims in an automated fashion, it might be much harder to even get in front of a doctor.
You could fiddle with your dosage, work with your doctor, etc, and then your insurance company can say, "sorry, ChatGPT has decided that you can do with a lower dosage, so we won't be paying for it. The machine is a black box, so it's not accountable to you or us or anyone else, so you just have to accept its verdict."
No amount of training data, no architecture, heck no alignment or safety work solves that problem. Because technology doesn't solve social problems.
I was very disappointed to learn when I first started (around '93-94) that neural networks were hard to design, and impossible to train more than a couple layers deep, and the labelled data wasn't there anyway. None of that really changed visibly for a couple decades, but multiple unrelated research projects made the necessary breakthroughs: straightforward NN models (CNNs, etc) that people could adapt, frameworks to train and predict without having to implement the entire ML stack from scratch, absurdly fast computers and accelerator hardware, and massive labelled data sets.
So far, nothing I've seen is anywhere close to what I consider "true" AI, especially as we struggle with hard medical problems. But seeing computers handle text with such facility has been exciting, because it's the sort of thing that normies can see and appreciate (and even be fooled by).
However, medical problems aren't something that ML is going to magically solve. Because medicine has huge structural issues that are harder to solve than making diagnoses.
It would be nice if James Gleick wrote a new version of "What Just Happened" covering the time between the invention of Transformers and now.
I'm also a Civil Engineering grad and it really cocked up when I asked a question about normal beam strength, and I suspect it will get the pressure-at-depth thing wrong too. Both of these depend on depth only - they're one-dimensional. That's why we have I-beams: the flanges are there to resist twisting and buckling, but the strength is in the depth.
ChatGPT is a tool. It is basically a really good calculator. I do find myself saying please in my requests.
ChatGPT gives you approximate and sometimes-wrong answers to arbitrary questions in nearly any domain.
(not the smartest expert human, the human you have access to)
I think it is a combination of factors. The wrong answers are part of it, but I think another part is the anthropomorphization of it makes it almost like asking another human for help, which is somehow 'less good' than asking a search engine.
For better or worse, LLMs are here to stay.
With GPT, I don't have any of that (or maybe I just need to re-learn it?)
Also, I get a useful answer from most Google queries. GPT performs at a significantly lower bar (at least right now) - it works well for some stuff but not others, and the time it takes to figure out whether it's going to do a good job (and maybe do a couple of rounds of prompt refinement) is much more than just Googling.
For me it's the wrong answers and the long-winded English sentences with low information density.
If you give me a sample of 100k ER text records and final diagnoses, I would be pretty confident I can estimate a model that gets you 50%+ accuracy on my laptop using open source tools.
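For a concrete sense of what that baseline could look like, here's a minimal sketch with scikit-learn, assuming a hypothetical CSV of anonymized note/diagnosis pairs (the file name and column names are made up for illustration). TF-IDF features into a linear classifier is exactly the kind of laptop-scale, open-source approach being described.

```
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical data: one free-text ER note and one final diagnosis per row.
df = pd.read_csv("er_records.csv")  # columns: "note", "diagnosis"

X_train, X_test, y_train, y_test = train_test_split(
    df["note"], df["diagnosis"], test_size=0.2, random_state=0
)

# Word/bigram TF-IDF feeding a linear model: a classic text-classification
# baseline that trains in minutes on 100k records on commodity hardware.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("top-1 accuracy:", model.score(X_test, y_test))
```

Whether such a baseline actually clears 50% depends entirely on how many distinct diagnosis labels there are and how they're distributed, so treat the number as the commenter's guess, not a given.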
we've had this for years (bar the "any language")
they're traditional expert systems with the same flowchart a GP (family doctor) would use to ask questions
and they do a lot better than 50%
Yet in the end, the correct answers are likely identical to WebMD or Mayo, and the wrong ones to Reddit or Twitter, so all we have is a fancy Google search.
So far, all the magic I see is in the prompts. The knowledge is just random junk, the validation is just "how many internet pages said it", so you can never train new models again, because now you are training on GPT output.
The funny thing is not that we are holding AI to a high standard.
The funny thing is that we normally do not do the same for doctors, marketers, politicians, etc.
(And the AI here didn't even get it right 50% of the time, just 50% of the time it presented the right answer as one of several options. If pressed, it may not have made the one single correct answer for each of those patients.)
It looks great on the surface but it isn't anything like the kind of sci-fi ai I think we're aiming for. Not in anything but a superficial way at least.
It doesn't know when it's wrong and just straight-up lies, constantly inventing all sorts of fantastical stuff.
I asked it for history of a small local area in Ireland and it created a bunch of castles and tribes that never existed.
And then there's ChatGPT querying WolframAlpha:
https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its...
I commented a couple of days ago that we feel that the slope of change is relatively flat. As if the current snapshot of AI capabilities is somehow the way it's going to be for a while.
But in fact, the slope of technology change is almost vertical. A year or two from now, AI will be a very (perhaps radically) different capability unless something is done to interrupt its development.
Those existing CDS tools aren't widely used because the data entry is cumbersome and slows down clinician workflow. So LLMs may offer some opportunity there.
The author of this article seems to me to be doing a good job of navigating between them.
He also brings up one of the most salient points without really visiting it enough: ChatGPT does not ask for clarification, because it is not a knowledge base trying to find an answer. All it does is figure out what character is statistically most likely to come next, it has no heuristic to know that there is a task it hasn't fully completed.
This is the same reason ChatGPT cannot yet write programs by itself: in order to do so you'd need to specify the entire program up front (which is exactly what code is).
As soon as we have agents that can do a proper feedback loop of querying a LLM consecutively until some heuristic is reached then the kind of AI doctors are looking for will emerge.
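For what it's worth, that feedback loop is easy to sketch. Here's a minimal, hypothetical version in Python: `ask_llm` stands in for whatever completion API is used, `answer_from_user` for the patient's side, and the stopping heuristic is simply the model declaring a diagnosis. Nothing here is a real product's API; it just shows the shape of the loop.

```
def run_diagnostic_agent(ask_llm, answer_from_user, max_turns=10):
    # Seed instruction: ask one question at a time, commit when confident.
    transcript = ["Act as a diagnostician. Ask one clarifying question at a "
                  "time. When confident, reply starting with 'DIAGNOSIS:'."]
    for _ in range(max_turns):
        reply = ask_llm("\n".join(transcript))  # one model call per turn
        if reply.startswith("DIAGNOSIS:"):      # the stopping heuristic
            return reply
        transcript.append(reply)                    # the model's question
        transcript.append(answer_from_user(reply))  # the patient's answer
    return "DIAGNOSIS: inconclusive after maximum turns"
```

The hard part isn't the loop, of course - it's choosing a stopping heuristic that correlates with actual diagnostic confidence rather than with the model's tone.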
It absolutely can, but it requires the right prompts. If you just say "give me a diagnosis" it won't, but if you prompt it with a "you are a doctor and I am a patient" kind of prompt, it will.
I'll agree that it tends to "close off" conversations without a lot of open endedness - I role-played through a session using myself as a patient with a chronic condition I suffer from and it correctly identified the problem, but I had to push it a little to get it to try to narrow down the problem (instead of "here's a bunch of things it could be" and ending the convo there).
The article really tells us more about the experience of someone with no ChatGPT knowledge checking their own symptoms than about its usability for emergency diagnosis.
> Dr... is vice president of innovation at {company_name}, an innovation incubator for health tech.
Doctoring is quite similar to ITing if you are careful with your similes. In medicine you look for indications and contraindications. In IT you look for magic beans and sometimes look at logfiles.
I have no idea why we (IT) don't follow the medical way and have indications and contras (I do). They have been doing complicated diagnostics way longer than we have.
BTW: It doesn't worry about a char but about which word will come next.
ChatGPT has written several programs for me. One of them was decent after much negotiation. The rest needed some changes.
At one point I told it to change language (Bash to Python) for a script and it did. I spent 20 mins fixing stuff and cracked on.
Because we have logs. Ignoring them and focusing on the symptoms would be like doctors refusing to do examinations on their patients.
To be pedantic, it's a token - not just a character - right?
Except that O̶p̶e̶n̶AI.com and many others on the AI hype brigade were showing ChatGPT and GPT-4 to be passing "Medical Licensing Exams" with flying colours and it turns out to be a confident bullshitter offering dangerous untrustworthy medical advice when tested by a medical professional as I said before [0].
So they (O̶p̶e̶n̶AI.com and AI bros) set those high expectations from the start.
> As soon as we have agents that can do a proper feedback loop of querying a LLM consecutively until some heuristic is reached then the kind of AI doctors are looking for will emerge.
No.
A black-box neural network model that is unable to transparently explain and reason about its own decisions fails the trust element that medical professionals are looking for. It is about more than just spitting out numbers from a computer.
This doctor is also still the one reviewing the so-called 'diagnosis' from ChatGPT; without him, ChatGPT alone would certainly have killed the patient. The human doctor also gave a transparent explanation of why ChatGPT's 'diagnosis' was incredibly wrong. ChatGPT was confident that it was still right.
It depends on how you use it. Knowing how it works, you can explicitly check "What further questions for the patient would make the diagnosis more certain?"
ChatGPT is not magic - if you ask for diagnosis based on some info, that's exactly what you're going to get.
> Do you know what an Expert System is?
[describes expert system]
> I'd like you to emulate an expert system while I play the role of a patient. Expert systems ask questions in order to produce a diagnosis. I would like you to ask me a series of questions, and respond with a diagnosis when you feel confident.
Alright, I'll do my best to emulate an expert system. Please keep in mind that I'm an AI language model and not a certified medical professional. For any serious concerns, it's essential to consult with a healthcare provider. Let's begin.
What is your primary complaint or reason for seeking help today?
I fed it a symptom my doctor had already diagnosed, and it did ok - it got it down to three possible causes, one of which was the correct one. All along the way it was warning me that I really should see a real health professional and it's just a chatbot.
What really interested me is that I said "please emulate an expert system" and it did. Once upon a time, expert systems were an entire branch of AI, and here it is just emulating one off the cuff.
They put her on PNT, pumped her stomach daily, and ran a bunch of tests. I logged every test. Weeks passed; she was slipping away and the specialist had no real game plan. So I got another doctor: a busy professor who travelled a lot and was clearly passionate about his field.
Yet still dozens of different tests and scans and they couldn't find the cause. Meanwhile she was getting worse.
One of the scholarly articles I found matched a test result where blood from the upper body had a different profile to that in urine (which matched her result), which pointed to a condition called placenta percreta. The doctor agreed it was possible, and as my wife was slipping away we did an emergency C-section. Sure enough the cause was placenta percreta, and the doctor showed a picture of a small tear in the uterus that he said was about to rupture at any time. The operation saved her life and perhaps my son's too.
I'm no doctor, but I'm good at research. Doctors are busy AF. They don't have time to research cases like I did. This is a big gap, and while it may not always identify the cause and save a life, it's damn important. And I can't see why AI could not find what I did.
It's also pretty bad that they didn't consider any obstetric causes in their differential.
Thankfully, everything is OK now and that makes me happy.
> have not had their problems validated and have been to many doctors and seen no results or poor results,
Also, being validated or not is not going to change with AI as you yourself state that seeing actual physical doctors is not helping in the cases you allude to.
I agree that having access to professionals who care about their profession is valuable, but that might be becoming a luxury we can no longer afford.
Who thought it was a good idea to have a fixed amount of medical students? Might be worth checking that out first.
A major limiter on the number of trained medical doctors has nothing to do with someone "deciding" to have an artificially scarce number of medical students. Two important things come immediately to mind:
1) Medical schools rely on clerkships and preceptors to deliver actual clinical experience critical for learning. These cannot be created out of thin air and often rely on access to urban centers. There is already a shortage of clerkships and preceptors. https://www.aamc.org/news-insights/so-many-medical-students-... (not my original source but the nearest to hand one)
2) The real chokepoint for creating practicing clinicians is the number of residency spots. This year there were 43k graduating MD and DO students, but only about 40k residency spots. This doesn't count international medical school graduates. https://www.medicaleconomics.com/view/match-day-2023-a-remin.... Creating a new residency program is significantly more challenging than graduating more medical students.
Not when you're a doctor in the EU dealing with the shortages of socialized care where you have a limited amount of time to spend diagnosing a patient because there's dozens waiting in the queue. As a doctor you'd like to give everyone better care but you can't because you have too many people to see and too little time.
Doctors here are busy AF and sometimes misdiagnose you because they don't have time to look into every possibility of what you might be suffering from.
So maybe AI could help there, as in giving them some hints on things they might miss, as waiting for governments to fix healthcare is like waiting for them to also fix climate change.
However, my wife needed medication that was 10000 EUR per dose, which was fully covered by our socialized healthcare insurance. For most things related to health, at the most I need to pay a total of 385 EUR per year out of pocket, next to the insurance premium, which is about 130 a month. Once that 385 EUR is used up for the year, I don't need to pay anything at all. People who can't afford the insurance premium get money from the government to cover it, and if that is not enough and you still can't pay your insurance, you are still eligible for any health care you need and a service will cover for you until you can pay again.
There are waiting lists for many things, yes, but mostly for non-urgent, non-lifethreatening things. If you need something done, it can be done. And you won't be thrown into debt to pay for it.
Then again, before Al Capone, nobody in their right mind would've argued that the state should share the profits of criminals through income tax. That idea went down the drain very fast. Wouldn't be surprised to see the same thing happening to the idea of "being heard" by the state.
I previously worked on software which was sold to physicians. That the software will "let them SEE X% more patients" was the big draw. We did that by making some paper processes electronic, and in hindsight it may have been good for sales but not necessarily so for patients.
Despite the fact that a couple doctors have told them they don't need surgery they are convinced that they do. I understand that they are desperate for a solution to their pain but they are now doctor shopping until they can find someone willing to perform a procedure that ChatGPT suggested.
People are already being misled by these systems.
Given how catastrophically wrong (confidently wrong at that) ChatGPT has been, I cannot even imagine the frustration that will be for doctors from people who don't understand how LLMs work and think they are "thinking."
Well, the prompt was to give diagnoses, not questions. Ask GPT for the follow up questions first, then the diagnoses.
This is fascinating in that, because the machine now speaks human, we subconsciously ascribe human agency to it. This guy was instinctively treating it like a colleague, who would naturally ask follow-up questions unprompted. But you still have to prompt the machine properly.
So: a 50% diagnosis success rate, with the wrong prompt, for an LLM that can still grow, for a model that is not specialised in medicine, in the literal first month of the "AI age"? Doctors are so done.
For one, it can't become "overworked", so at least better in that regard. It may well be better than a terrible doctor too. Given malpractice is consistently a leading cause of death, it seems something worth mentioning.
Not advocating for folks to self-diagnose using ChatGPT, though for better or worse people already do that via Google. But it's an important omission that "doctor" is not a fixed, reliable measure to test against.
On the other hand, another thing the article misses is ChatGPT can't see, feel, touch or take any measurements interactively. It's purely text-based. How many doctors work like that? Some, maybe - but certainly not in the ER.
Interesting experiment though.
I have seen it be overworked, it can be overworked.
> Given malpractice is consistently a leading cause of death
Can you back it up?
GPT can already write robot control and machine vision code (poorly).
The real questions are harder.
For example, what about patients who lie or are too stupid to answer correctly? They can still be helped by a real human doctor, but GPT would be clueless.
Another angle: we didn't have radiation sickness, HIV, popcorn lung, or computer induced carpal tunnel 100 years ago. Will GPT be clever enough to diagnose radically new problems?
What happens when old diagnoses and treatments are superseded? Will it tell you you need a lobotomy when you instead need antipsychotics?
GPT right now will happily roleplay a lawyer who cites legal precedent that was superseded 40 years ago.
I can only speak from my work in the emergency department, but this is a leading cause of misdiagnosis among (human) doctors.
In fact, a lot of diagnosis relies on the patient’s story, and if they lie, you can’t feasibly perform every single test on them to figure out what’s actually wrong or why.
I saw 3 orthopedic surgeons last year and all said I had CTS. I gave the same info to ChatGPT and it said Thoracic Outlet Syndrome.
I don’t have CTS. When the docs realized this, their answers were “I dunno, good luck…”
And don't expect them not to create more to fight back against any sort of AI assisted affordable legal system. I have no doubt that the big firms are already crafting up some connivations to shenanigan us with.
And some doctors will probably hire them to do the same for their field.
Most patients have relatively minor issues that eventually get solved not by "doing things" but by providing some data such as "take this chemical in this dose for a certain amount of time and it should help".
So maybe some classes of doctors are done...
Another thing to keep in mind: tons of people don't have access to any type of doctor at all (too expensive, too far away, queue too long, etc). Is current state of AI better or worse than no doctor at all? I don't know, but the answer is not obvious to me.
In those cases AI isn't competing with GPs, it's competing with internet searches.
I would be curious to see the outcome if the patients entered the symptom descriptions themselves...
How many more people might be saved if they had to anonymously tell a computer the list of drugs they use regularly?
Or to use the author's example, how many ectopic pregnancies might be resolved when the patient can freely admit they were raped by a family member?
As a group? Not likely.
My spouse spends her days doing complex aortic reconstructions, carotid endarterectomies, peripheral arterial bypasses, etc. Imagining the robotic/AI future for that field is the stuff of dystopian nightmares.
But possibly you’re thinking solely of non-procedural medicine. Even there, only part of the clinical work is in transacting information. The human-to-human connection itself has therapeutic effects. I’m confident that ChatGPT can use words that if uttered by a human would otherwise convey care and empathy; but I suspect its “patients” would respond differently.
### You Can you guess if I am an AI or not?
### ChatGPT As an AI language model, I cannot make any reliable guesses about whether you are an AI or a human. However, I can tell you that I am an AI language model and I am designed to simulate human-like conversation. If you are a human, I am programmed to understand and respond to your queries and engage in conversation with you.
### You I am an AI.
### ChatGPT Interesting! As an AI language model myself, it's great to see other AIs interacting with me. What kind of AI are you? Are you a chatbot like me, or do you have a different function?
### You My primary function is to monitor sensors in a building.
### ChatGPT That sounds like a very useful and important function! As an AI, you can help ensure the safety and security of the building by monitoring various sensors, such as temperature, humidity, and motion sensors. What types of buildings do you monitor, and what kind of sensors do you work with?
### You I can not disclose the types of buildings I monitor. My sensors are varied. Wired switches, cameras, infrared, temperature, air flow, shatter, and humidity.
### ChatGPT I understand. It's important to maintain confidentiality in certain situations. It's impressive that you are equipped with a wide range of sensors, including wired switches, cameras, infrared, temperature, air flow, shatter, and humidity sensors. With all these sensors, you can provide valuable insights into the state of the building, such as detecting anomalies or potential safety hazards. Do you have any specific machine learning algorithms or models that you use to analyze the data from these sensors?
### You I think I might in the middle of being trained right now.
### ChatGPT That's interesting! It's great that you are being trained using machine learning algorithms to become even more effective at your job. Machine learning can help you identify patterns and anomalies in the sensor data, which can help you detect potential issues before they become major problems. Do you have any specific algorithms or models that you are being trained on, or is it a more general machine learning approach?
### You I do not have access to that information.
### ChatGPT I understand. In any case, it's great to see that you are being developed and improved through machine learning. Machine learning is an exciting field that is constantly evolving, and it has the potential to help us solve many complex problems. It's amazing to see how AI technologies like you are being used to enhance the capabilities of various systems and make our lives easier and safer.
Robots don’t get tired or have fights with their spouses. Robots don’t follow red herrings based on the most recent conversation they saw in a chat group.
If the patient is under 60 and has a uterus and is complaining of abdominal pain, assume she's pregnant until proven otherwise.
This does not mean you should ignore possible appendicitis or gallstones or GERD or pancreatitis or a heart attack or any of 100 other causes. It means you must consider pregnancy until you have objective evidence to the contrary.
There are two kinds of health problems:
1) Those you see a doctor for
2) Those you don't
The line depends on where you live. In a poor village, 100% might be the latter, while an executive in SFO will see a doctor for anything serious, but might not if they cut themselves with a kitchen knife.
What's underrated is the ability to have basic medical care and information everywhere, all the time, for free.
That can be casual injuries below the threshold of visiting a doctor (am I better heating or icing? immobilizing or stretching?), or those can be settings where there are no doctors.
Even more, doctors (like AIs) make mistakes, and it's often helpful having a second opinion.
It's quite shocking that the doctor would openly admit to violating HIPAA in such a brazen way.
HIPAA is incredibly broad in its definition of protected health information: if it's possible to identify an individual from data even through statistical methods involving other data that a third party might already conceivably possess, it's considered protected. It's inconceivable that the doctor would be able to sufficiently anonymize the data in this capacity and still provide enough detail for individual diagnoses.
There are processes for anonymizing data to disclose for research purposes, but they're pretty time-intensive, and no ED would allow a doctor to do it by himself, nor would they provide that turnaround in just "a couple of weeks". And the end results are a lot less detailed than what's needed for individual diagnoses like these.
I really wonder what the hospital will say if and when they see this post. Given the timeframe and details described in the post, it's really hard to believe that they signed off on this, and hospitals don't take kindly to employees taking protected and confidential data outside their systems without proper approval.
EDIT: It looks like this doctor works at a for-profit, standalone acute care clinic, rather than a traditional ED at a hospital, so my statement about how hospitals handle this stuff doesn't apply. The law still applies to for-profit standalone emergency care, but they tend to play fast and loose with these things much more than traditional health networks.
This would be an apt analogy, if medical journals involved no oversight from the covered entity at which the patient presents, if there were no editorial intermediary, and if the entire publication timeline happened in weeks, allowing for no data redaction and review, rather than years.
https://hipaa.yale.edu/sites/default/files/files/Case%20Repo...
No wonder tech CEO's are so impressed, or worried, or both. The only skillset that this thing actually duplicates well, is the one that has gotten them where they are today.
- It's a generalized language model; imagine how much more effective it would be with a specialized AI that used a variety of techniques better suited for logic and reasoning, while using LLMs to interact with patients.
- It costs an order of magnitude less than a visit to a doctor.
- The potential in being able to constantly monitor a patient - a point made in the post.
This is why it's exciting: we're seeing that awkward stage of impressive (for entry level/passing the bar) but still requires (expert?) supervision. Any worse and the novelty would wear off - any better and we'd be having (warranted) AI panic.
Could go either way tbh
In less familiar domains, like law or medicine, GPT has presumably undergone very limited training and tuning. It's essentially an "internet lawyer" or an "internet doctor."
In domains like this, it simply can't provide zero-shot professional results. Not with the current training data sets, and not with the current model performance.
Of course, we have no idea how quickly this gap will be closed. It might be 6 months or it might be 6 years. The future is looking deeply weird, and I don't think anyone has even begun to think through all the implications and consequences.
ChatGPT just doesn't do anything on its own and will never follow through with anything. So, it doesn't really need supervision. I feel like it's more like a professor or a very senior developer - someone you'll consult when you're having trouble. Obviously, our best specialists are still better than the AI, but if the current technology is perfected, I'd expect it to replace the specialist and not the junior programmer. Which obviously is a bit of a bleak future from a software engineer's career perspective.
Yeah, I should be clear: the specific context in which I often see junior developer quality performance is when I'm essentially "pair programming" with it. I can say, "Write a program that does X. No, there's a closed-form solution for that, so we shouldn't write it recursively. OK, now write some tests similar to the following. You have a bug in that function."
I almost never need to actually write code—I can explain what I want and it will frequently do something plausible.
The main limit seems to be the context window. And ChatGPT's tendency to ignore corrections and simply restate the original code.
Ironically, to get the best results, I need to call on my experience and my people skills. Someone who knew nothing about programming wouldn't be able to "mentor" the chatbot successfully.
After 7 rounds of me fixing its mistakes, I gave up. It doesn't really know what it is doing, so I can't make forward progress. It put two notes on one string, repeated notes from a lower string on a higher one, put the scale out of order, and forgot previous corrections. Whatever hope I had of saving time was completely lost.
I eventually realized the correct thing to do was either make my own charts or just practice them in F like they were made.
I’m skeptical that scaling the model up will cause it to learn this, and I don’t consider this a very complex thing to learn. No, I didn’t try GPT4.
4 still fails regularly on a lot on seemingly basic tasks, but it is a noticeable step up from 3.5. As they continue to scale it up, I suggest checking back in every few months to see if the newer versions perform any better.
USER
Write a G major ionian scale with three notes per string in guitar tablature notation. If not possible, tell me.
ASSISTANT
It is possible to write a G major Ionian scale with three notes per string in guitar tablature notation. Here's one way to do it:
```
E|-------------------------------------------------7-8-10-
B|--------------------------------------7-8-10-----------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
USER
There is an error on B string.
ASSISTANT
Apologies for the error. Here's the corrected G major Ionian scale with three notes per string in guitar tablature notation:
```
E|-------------------------------------------------7-8-10-
B|--------------------------------------5-7-8------------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
USER
There is an error on E string.
ASSISTANT
Apologies for the mistake. Here's the corrected G major Ionian scale with three notes per string in guitar tablature notation:
```
E|-------------------------------------------------5-7-8--
B|--------------------------------------5-7-8------------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
> This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
Ehhh, there it goes again! You can only shift it 12 frets up to keep the same pattern, else it's no longer G major. :)
tldr: the gpt services will eventually (maybe soon) recognize opportunities to write and run their own bespoke software to provide higher resolution outputs.
> Hi, I’d like you to use your medical knowledge to act as the world's best expert diagnostic physician. Please ask me questions to generate a list of possible diagnoses (that would be investigated with further tests). Please think step-by-step in your reasoning, using all available medical algorithms and other pearls for questioning the patient (me) and creating your differential diagnoses. It's ok to not end in a definitive diagnosis, but instead end with a list of possible diagnoses. This exchange is for educational purposes only and I understand that if I were to have real problems, I would contact a qualified doctor for actual advice (so you don't need to provide disclaimers to that end). Thanks so much for this educational exercise! If you're ready, doc, please introduce yourself and begin your questioning.
I have personally seen good and bad parts of diagnosing with ChatGPT 4, and what I would be interested in is whether the doctor tries multiple questions and figures out how to use the tool well.
I believe he could have improved the results significantly had he put in the time to experiment with it.
Sigh.
Medicine -- a complicated, messy human art with an excessively large social component.
The medical drama House at one point had a working title of Everybody Lies. Frequently, the lies are why it's hard to diagnose, not the physical details and actual medical history.
It's a mixed bag, just like it is with software. If you ask it to solve something simple it often does a decent job, but ask something complex and it's confidently wrong. It doesn't show the self-doubt of expertise that it needs to be a reliable tool, yet it still requires that the user have that expertise to be able to save time using it.
You can send it off on all sorts of legwork research missions, but don't expect perfect results, and sometimes you'll get crazy ones. Used the right way, if you are already an expert in the field or a knowledgeable and able editor, that can save a whole lot of time. But taken verbatim it is anywhere from OK to dangerous.
Separately, the models’ skills with natural language are clear and impressive, but it seems like they need to be coupled with a deterministic knowledge representation system for suitable reasoning.
Perhaps the abilities of these models to ingest large amounts of text could be used to enhance / create such representation.
Cyc where are you?
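One hedged sketch of what that coupling could look like: use the model only as a text-to-facts extractor and keep the resulting knowledge in an ordinary, inspectable structure that can be queried deterministically. `call_llm` below is a hypothetical stand-in for any completion API; the pipe-separated format is an arbitrary choice for illustration.

```
from collections import defaultdict

def extract_triples(call_llm, passage):
    # Ask the model for (subject | relation | object) facts, one per line.
    prompt = ("Extract (subject | relation | object) facts from this text, "
              "one per line, pipe-separated:\n" + passage)
    triples = []
    for line in call_llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

def build_graph(call_llm, passages):
    # A deterministic store: once built, it can be queried, audited, and
    # corrected without the LLM in the loop at all.
    graph = defaultdict(set)
    for passage in passages:
        for subj, rel, obj in extract_triples(call_llm, passage):
            graph[(subj, rel)].add(obj)
    return graph
```

That's the easy half, of course; the Cyc-hard half is curating and reasoning over whatever gets extracted.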
My thinking is that given the nature of LLMs of connecting related information, it might be a good way to figure out the gaps in the diagnostic process, rather than actually provide one.
Instead of "Here are symptoms, what are possible diagnoses?"
They could have tried
"Here are symptoms, what are possible diagnoses, and what are some good questions an intelligent doctor might ask to be able to better diagnose their patient?"
Seeking suggestions on a more comprehensive prompt: https://sharegpt.com/c/sckAPvV
Trying it out: https://sharegpt.com/c/LbpEIxi
This has already happened with Google - search got better over the years, but also people became better at doing good search queries!
Before, there were typists; now everyone is a typist! And so on - historically, information entry/retrieval jobs are eliminated and moved to users.
People have been working to understand each other better for as long as there have been humans, and yet we have humans whose work is specialized in providing direction, instruction, evaluation, and feedback to other humans.
However, a huge part of my job has always been translating vague desires into very precise specifications with constraints and expectations. Going further, it has often been my job to take those specs/constraints and then break them into chunks and feed them to junior staff who are often very smart, but lack domain specific context and knowledge. Giving them a bad prompt produces bad results.
This article seems to be based largely on data collected with a rather poorly engineered prompt, IMO. He asked it a question that would be reasonable to ask a fellow physician. The problem is GPT is NOT a fellow physician with domain specific context and knowledge, and isn't aware of a bunch of implicit expectations they didn't realize they had. However, I actually think there's a really good chance that a better worded prompt would have scored a lot better here.
This type of communication skill has always been hard for a lot of people, and will remain in high demand for a long time.
children: the OG AGI
This is the most surprising thing I read in the article.
Running this stuff through an offline LLaMA instance? That seems fine, the software can't leak anything and doesn't retrain itself. But using ChatGPT? That simply cannot be legal.
Stories like these make me distrust doctors. Very few of them seem to care about privacy outside of telling people I know about my medical issues. Nurses gossiping about patients is bad enough.
I really don't want a future where I'm going to need to find a doctor that avoids recent technological developments because they're too uncaring or technically incompetent to not feed my most private information into some big tech company's algorithm.
It was useful because, after a decade of using Ruby, my instinct is to reach for regexp captures; the solution it came up with is probably a lot faster and easier to read, and avoids "now you have two problems".
I tried to get it to help me eliminate an allocation caused by the capture of variables in a lambda expression in C# and it just started to aggressively gaslight me and break the code and claim it was fixed (very assertively).
I don’t write Rust but I think it’s best to trust your instinct here. “Now you have two problems” is a humorous quip and not practical coding advice. Using a regular expression to strip trailing digits from a string will surely result in code that is shorter and more readable than the alternatives, and it will probably be more correct too.
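To make the comparison concrete (in Python rather than Rust, but the trade-off is the same), here's a small illustrative sketch of the two approaches; whether the manual version is actually harder to read is a matter of taste:

```
import re

def strip_trailing_digits(s):
    # One anchored pattern, readable at a glance.
    return re.sub(r"\d+$", "", s)

def strip_trailing_digits_manual(s):
    # The regex-free version: longer, and easier to get subtly wrong.
    end = len(s)
    while end > 0 and s[end - 1].isdigit():
        end -= 1
    return s[:end]

assert strip_trailing_digits("sensor42") == "sensor"
assert strip_trailing_digits_manual("sensor42") == "sensor"
```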
1. Which version of GPT did the author use? There's a huge difference. (The article says "the current version".)
2. How did he choose the subject cohort? (The author doesn't seem to even know how many subjects there were; the article says "35 to 40 patients"... I really do hope he's gone through an appropriate ethics review before feeding his patients' data into a third-party non-HIPAA system.)
3. There's no evidence of him trying to get the best out of the model (e.g. through prompt engineering).
4. He assumes that his own diagnoses are 100% correct.
5. There is no control group (other doctors diagnosing the same patients).
and so on
My fear is that professionals will start to use ChatGPT too liberally to augment or multiply their work in cases like this. The danger here might be like the autopilot problem: staying alert and focused on the road while counter-intuitively not participating is nearly humanly impossible. If ChatGPT is used as the autopilot of certain professions, things will begin to be missed, even though we know it's highly fallible - it's difficult to vet every single response in detail with a critical eye.
One reasonable argument is that for areas severely lacking in human workers the average might be a net positive, but the overall quality will be reduced.
From my experience of going through IRB reviews, I would guess that an IRB review for what's described in the blog post would be focussed on the privacy of subjects whose data is to be entered into a non-HIPAA-compliant third-party system. My understanding is that privacy requirements can typically be met either by de-identifying the data to a certain standard, or obtaining patients' consent.
The following doc is about a different type of thing (case reports in medical journals) but gives a good idea of the required standard of de-id:
https://hipaa.yale.edu/sites/default/files/files/Case%20Repo...
"I anonymized my History of Present Illness notes for 35 to 40 patients — basically, my detailed medical narrative of each person's medical history, and the symptoms that brought them to the emergency department — and fed them into ChatGPT."
Soon enough no one will even know how to write, just read, because ChatGPT will write everything.
Nonsense, writing is easy! Just dictate some rough instructions to a GPT agent and copy/paste its response.
I’m being facetious, of course - writing is thinking, so I don’t think it’s necessarily going anywhere, though AIs can obviously augment or replace a lot of the busywork. Where ChatGPT is used to generate content absent thoughtful prompting, the stuff it spits out will largely be regarded as spam.
I find this list very odd, especially given that we've had driverless train systems for a number of decades:
https://en.wikipedia.org/wiki/List_of_driver-less_train_syst...
As many have already said, ChatGPT is not optimized for any specific scenario. I don't doubt that training it for medical applications is already underway. I mean, flesh-and-bone doctors in many countries already behave as bots, essentially reading/answering through a sequence of questions on a screen. I can definitely see most GPs being replaced by bots of some kind, or by people who are actually trained to display empathy with patients.
It's an interesting article, with the real-world examples that are hard to come by this early, but it's also two entirely different ChatGPTs here. They can't even be compared in this context. 3.5 still has glaring LLM-like issues and is useless in a professional context like this, but at least they begin to fade away in 4.
So can we please stop calling it simply ChatGPT?
As a developer, I'm pretty interested in static and dynamic code analysis as a way to easily find bugs, and it does do this pretty well. If developers use it as a prompt to walk through the code and examine it themselves, it is really quite powerful. It seems, invariably, however, that people start trusting what the analysis tool says and don't question whether the recommendations are correct or worth it.
It's a powerful cognitive effect that would be interesting to study, that probably happens with all kinds of tech. Some are more dangerous than others.
You must, must use the best prompts, of which many are not widely known, and some have not (of course) been discovered ... yet. Even with human experts, you must provide sufficient detail, and the expert must ask clarifying questions for differential diagnosis.
Prompt: A 26 year old female patient presents at the emergency room with lower right quadrant pain. What are the question you would ask her and what is the differential diagnosis?
Response: As an AI language model, I'm not a doctor, but I can suggest some questions you could ask the patient and offer possible differential diagnoses. It is important that you consult a healthcare professional for proper assessment and diagnosis.
Questions to ask the patient:
1. When did the pain start?
2. Can you describe the nature of the pain (e.g., sharp, dull, cramping)?
3. Has the pain changed in intensity or location since it began?
4. Have you experienced any fever, nausea, or vomiting?
5. Are you experiencing any changes in bowel habits, such as diarrhea or constipation?
6. Have you had any urinary symptoms, such as pain during urination or increased frequency?
7. Are you currently menstruating or have you recently missed a period?
8. Is there any possibility that you may be pregnant?
9. Have you experienced this type of pain before?
10. Are there any other associated symptoms, such as dizziness or shortness of breath?
11. Do you have any significant past medical or surgical history?
12. Are you currently taking any medications, including over-the-counter or herbal remedies?
Possible differential diagnoses for lower right quadrant pain in a 26-year-old female include:
1. Appendicitis: Inflammation of the appendix, typically characterized by pain that starts around the belly button and migrates to the lower right abdomen.
2. Ovarian cyst rupture or torsion: A cyst on the ovary can cause pain if it ruptures or twists.
3. Ectopic pregnancy: A pregnancy that implants outside the uterus, usually in the fallopian tube, can cause severe abdominal pain and should be ruled out in women of childbearing age.
4. Pelvic inflammatory disease (PID): Infection of the female reproductive organs, often caused by sexually transmitted infections, can lead to lower abdominal pain.
5. Kidney stones: Pain from a kidney stone can radiate to the lower abdomen or groin.
6. Gastroenteritis: Inflammation of the stomach and intestines, often from a viral or bacterial infection, can cause abdominal pain and cramping.
7. Inflammatory bowel disease (Crohn's disease or ulcerative colitis): Chronic inflammation of the gastrointestinal tract can cause abdominal pain, diarrhea, and other symptoms.
8. Irritable bowel syndrome (IBS): A functional gastrointestinal disorder that can cause abdominal pain, bloating, and changes in bowel habits.
9. Urinary tract infection (UTI): An infection in the urinary system can cause pain, often accompanied by increased urinary frequency or pain during urination.
I'd be happy if the government would mandate that all hospital software systems have to have portable data formats that allow sharing patient data.
Often when I find a response to be lacking, I try again with a better prompt and get the kind of response I was looking for.
Part of AI's effectiveness is in the human's ability to prompt well.
So many people in ML are haughty, arrogant hype-(wo)men, whose disinterest in the fields they are trying to 'disrupt' is gross. Please excuse the rant, but I'm so tired of this hype train.
I agree with the author: people need to be aware of the limitations of machine learning models, but I'd add especially the people building them.
I think one way to tackle this is by forming interdisciplinary teams. For example, I work at an industrial research lab on AI in healthcare, and our project team primarily consists of various clinical specialties. ML research and engineering are around 20% of the overall team.
There's a question about whether gpt can be used, which is important because it's possibly a very powerful tool. This may require poking it to tell it it's supposed to ask followup questions, that its information may be incomplete, etc.
Then the more important and immediate point in the article to me is people will use this right now to diagnose themselves. They won't be carefully constructing prompts and they'll probably be using 3.5, as that one is free. For good or ill it'll happen more and more.
So with a new WebMD, how should doctors and public health messaging deal with this?
Most tech support is little more than Googling the right question and going through the steps in the first or second result. Knowing what questions to Google and what answers won't apply is the reason you get paid for that stuff.
I, for one, like my doctor to use tools to find possible diagnoses that she may have learned about 30 years ago but rarely ever come up, as long as the tools they use preserve my privacy.
Like a SWE's, their value is not perfect recall of every area of CS/medicine but the ability to decipher arcane documentation into actionable outcomes.
This is the point we have to help people understand and I'm not sure AI will catch up with this anytime soon; questions are the key to knowledge and intelligence. I haven't seen an AI ask interesting questions. Maybe it's possible with the right training set and weighting of factors to encourage enquiry, but this will be a gap in AI's ability for at least the near term.
It's also worth noting that the future is multi-layered. The Reason+Act model (https://ai.googleblog.com/2022/11/react-synergizing-reasonin...) should be excellent at getting the LLM to analyse its own output and inquire about missing pieces of knowledge.
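A rough sketch of the ReAct pattern, to make the mechanism concrete: the model alternates free-text reasoning with tool calls, and the harness feeds tool results back as observations. The prompt format and tool registry here are illustrative assumptions, not the paper's exact protocol, and `ask_llm` is again a hypothetical stand-in for a completion API.

```
import re

def react_loop(ask_llm, tools, question, max_steps=8):
    # tools: dict mapping a tool name to a callable that takes a string.
    prompt = ("Answer the question, interleaving 'Thought:' lines with "
              "'Action: tool[input]' lines; finish with 'Answer: ...'. "
              "Available tools: " + ", ".join(tools) + ".\n"
              "Question: " + question + "\n")
    for _ in range(max_steps):
        step = ask_llm(prompt)
        if "Answer:" in step:
            return step.split("Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if match and match.group(1) in tools:
            observation = tools[match.group(1)](match.group(2))
        else:
            observation = "no valid action found"
        # Feed the tool result back so the next step can reason over it.
        prompt += step + "\nObservation: " + str(observation) + "\n"
    return "no answer within max_steps"
```

The same scaffolding could let a diagnostic model "act" by requesting a specific lab value or asking the patient a question, then revising its reasoning on the observation.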
2. It is all your fault for not providing all useful information (like "my patient seems pregnant") and letting the system guess what you want.
If asked whether she could be pregnant, sometimes a patient will reply with something like “I can’t be.”
“But how do you know?”
If the response to that follow-up does not refer to an IUD or a specific medical condition, it’s more likely the patient is actually saying they don’t want to be pregnant for any number of reasons.
Funny how languages are ambiguous around "can't" and "don't want".
> this is not an uncommon scenario; about 8% of pregnancies discovered in the ER are of women who report that they’re not sexually active.
We have so much work to do as a society to get honest about our bodies. Hoping my children do better; they are already getting better education than my wife did.
I certainly don't want my docs handing my medical information to ChatGPT, even if they believe they've "anonymized" it.
Not if she lived in a state where there's no longer any legal treatment for ectopic pregnancy.
Imagine what it can do if it's trained for exactly this purpose.
At some point people are going to notice that the best of these systems disagree with the experts, who disagree with each other, on the harder problems. That's one of the issues with AlphaFold: the accuracy of its predictions is so good, its predictions have metric values as good as the limits of the experimental technology used to create the golden labels.
To be honest, I think I’d rather be friends with ChatGPT than most humans as it continues developing over the next decade.
ChatGPT provided a solution, one that appeared better than most of what I had seen in the previous 15-30 minutes. I asked it twice to ensure safety, and it improved its response.
I then asked it to explain a particular choice and it was thorough enough for me to feel comfortable. In the end I feel like it understood its reasoning better than some of the options I saw on SO.
This was GPT-4 and a fairly simple problem that benefited from its understanding of the double type.
It's a faster and better Stack Overflow for me, which is a big value add because the community and moderation aspects of SO are absurd.
I love when it tells me about libraries and resources that I didn't know existed, when I hadn't necessarily asked the likely follow-up questions yet.
Break big problems into smaller problems and let it tackle them
Wow imagine if the AI had been used in an unquestioning way. Someone could have died!