This article and the comments in this thread are right up that alley. I mean, can you imagine, say, 1 or 2 years ago, saying we'd have a readily accessible system that you could feed the symptoms a patient is experiencing (in any language!) and out would spit a well-described explanation of the diagnosis (in any language!) around half the time? And now it is suddenly here, and so many people are like, "yeah, but it isn't perfect" or "we can't completely trust it" or "we have to fact check it because it can be wrong".
It is all so weird and fascinating and just bigger than I can describe.
But here we are!
Recently we've had all these people complaining of moving goalposts, when in fact most people never had goalposts for AI in the first place, and certainly never had goalposts that remotely resembled those that AI experts use to evaluate their own progress.
My mental model seems to clash with that of the boosters. They appear to have little doubt that reliability will emerge from scaling the models up. I think reliability is the lion's share of the problem and probably won't be solved by current architectures. It will be interesting to see who is right.
It's absolutely amazing that so many AI scientists don't seem to understand this distinction.
A half-working system might actually be worse than useless if people believe it 100% of the time.
Before ChatGPT there wasn't anything quite like it, so I think that the goalposts were mostly about passing the Turing test, or at least "not sucking" after 4 or 5 phrases. The leap from whatever existed before it was enormous, and I think comparable to the first iPhone versus, let's say, the BlackBerry or Windows CE. Maybe even more than that. Go back to the first iPhone: it had a lot of issues, but it offered so much as it was that people wouldn't look back.
It's too much to expect ChatGPT 3 to be better than an experienced emergency doctor or neurosurgeon. But the fact that it can be a competent assistant to a doctor instead of a toy assistant - that by itself is amazing.
People will use and fund AI tools that have usefulness to them.
AI boosters love to see the capabilities because it unlocks a whole new verdant valley in the continent of useful ideas, but until a street is laid down and a general store is available it remains the domain of early adopters.
LLMs are genuinely impressive. They aren't yet AGI, but they are fantastic tools. More exciting to me as a practitioner is the Segment Anything Model (SAM), a foundation model I can put into immediate use for computer vision in a variety of domains. LLMs still require some baking for broad usefulness (though there are many places they are already useful, it's not quite broad enough).
When folks use AI focused on augmenting capabilities rather than replacing people, you get a great return. Until AGI is clear (and we'll know, the AI will tell us without being prompted I reckon), let's treat AI as an exoskeleton rather than a new species.
There is only so much a human can read and remember. There is only so much they can be convinced to change their mind in the face of new evidence. They get tired. They get cranky. They have biases, and fears.
Some doctors are just terrible. I think the AI is probably more reliable than them already.
I don't think we are very far away from being able to correct an AI, and for it to remember that correction forever.
But an LLM assisting a wide range of professionals at even a low accuracy could be part of an efficient workflow, allowing the doctor or lawyer to give quality service to more people.
Have you actually engaged with it in any way? It's free to sign up.
By the way, what on earth is an AI booster? Do you really mean "enthusiast"?
"My mental model seems to clash with that of the boosters." Please tell us about your mental model, we can't read your mind.
Side note: my ER doctor wrote me a referral to a neurologist who was considered the best in the state, and they refused to even talk to me since the referral said car accident. They wouldn't budge and wouldn't reconsider. The car accident was a result of an episode, not the cause of my issues. I hate the US healthcare system.
The thing is, if the healthcare system is shitty and broken, it doesn't necessarily matter how good ChatGPT is. If it's applied with the intent of "streamlining" the medical system by ignoring people's concerns and rejecting their insurance claims in an automated fashion, it might be much harder to even get in front of a doctor.
You could fiddle with your dosage, work with your doctor, etc, and then your insurance company can say, "sorry, ChatGPT has decided that you can do with a lower dosage, so we won't be paying for it. The machine is a black box, so it's not accountable to you or us or anyone else, so you just have to accept its verdict."
No amount of training data, no architecture, heck no alignment or safety work solves that problem. Because technology doesn't solve social problems.
I was very disappointed to learn when I first started (around '93-94) that neural networks were hard to design, and impossible to train more than a couple layers deep, and the labelled data wasn't there anyway. None of that really changed visibly for a couple decades, but multiple unrelated research projects made the necessary breakthroughs: straightforward NN models (CNNs, etc) that people could adapt, frameworks to train and predict without having to implement the entire ML stack from scratch, absurdly fast computers and accelerator hardware, and massive labelled data sets.
So far, nothing I've seen is anywhere close to what I consider "true" AI, especially as we struggle with hard medical problems. But seeing computers handle text with such facility has been exciting, because it's the sort of thing that normies can see and appreciate (and even be fooled by).
However, medical problems aren't something that ML is going to magically solve. Because medicine has huge structural issues that are harder to solve than making diagnoses.
It would be nice if James Gleick wrote a new version of "What Just Happened" covering the time between the invention of Transformers and now.
I'm also a Civil Engineering grad and it really cocked up when I asked a question about normal beam strength, and I suspect it will get the pressure-at-depth thing wrong too. Both of these depend on depth only - they're one-dimensional. That's why we have I-beams: the flanges are there to resist twisting and buckling, but the strength is in the depth.
ChatGPT is a tool. It is basically a really good calculator. I do find myself saying please in my requests.
ChatGPT gives you approximate and sometimes-wrong answers to arbitrary questions in nearly any domain.
(not the smartest expert human, the human you have access to)
I think it is a combination of factors. The wrong answers are part of it, but I think another part is the anthropomorphization of it makes it almost like asking another human for help, which is somehow 'less good' than asking a search engine.
For better or worse, LLMs are here to stay.
With GPT, I don't have any of that (or maybe I just need to re-learn it?)
Also, I get a useful answer from most Google queries. GPT performs at a significantly lower bar (at least right now) - it works well for some stuff but not others, and the time it takes to figure out whether it's going to do a good job (and maybe do a couple of rounds of prompt refinement) is much more than just Googling.
For me it's the wrong answers and the long-winded English sentences with low information density.
If you give me a sample of 100k ER text records and final diagnoses, I would be pretty confident I can estimate a model that gets you 50%+ accuracy on my laptop using open source tools.
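For a concrete sense of what that baseline could look like, here's a minimal sketch with scikit-learn, assuming a hypothetical CSV of anonymized note/diagnosis pairs (the file name and column names are made up for illustration). TF-IDF features into a linear classifier is exactly the kind of laptop-scale, open-source approach being described.

```
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical data: one free-text ER note and one final diagnosis per row.
df = pd.read_csv("er_records.csv")  # columns: "note", "diagnosis"

X_train, X_test, y_train, y_test = train_test_split(
    df["note"], df["diagnosis"], test_size=0.2, random_state=0
)

# Word/bigram TF-IDF feeding a linear model: a classic text-classification
# baseline that trains in minutes on 100k records on commodity hardware.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("top-1 accuracy:", model.score(X_test, y_test))
```

Whether such a baseline actually clears 50% depends entirely on how many distinct diagnosis labels there are and how they're distributed, so treat the number as the commenter's guess, not a given.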
we've had this for years (bar the "any language")
they're traditional expert systems with the same flowchart a GP (family doctor) would use to ask questions
and they do a lot better than 50%
Yet in the end, the correct answers are likely identical to WebMD or Mayo, and the wrong ones to Reddit or Twitter, so all we have is a fancy Google search.
So far, all the magic I see is in the prompts. The knowledge is just random junk, the validation is just "how many internet pages said it", so you can never train new models again, because now you are training on GPT output.
The funny thing is not that we are holding AI to a high standard.
The funny thing is that we normally do not do the same for doctors, marketers, politicians, etc.
(And the AI here didn't even get it right 50% of the time, just 50% of the time it presented the right answer as one of several options. If pressed, it may not have made the one single correct answer for each of those patients.)
It looks great on the surface but it isn't anything like the kind of sci-fi ai I think we're aiming for. Not in anything but a superficial way at least.
It doesn't know when it's wrong and just straight-up lies, constantly inventing all sorts of fantastical stuff.
I asked it for history of a small local area in Ireland and it created a bunch of castles and tribes that never existed.
And then there's ChatGPT querying WolframAlpha:
https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its...
I commented a couple of days ago that we feel that the slope of change is relatively flat. As if the current snapshot of AI capabilities is somehow the way it's going to be for a while.
But in fact, the slope of technology change is almost vertical. A year or two from now, AI will be a very (perhaps radically) different capability unless something is done to interrupt its development.
Those existing CDS tools aren't widely used because the data entry is cumbersome and slows down clinician workflow. So LLMs may offer some opportunity there.
The author of this article seems to me to be doing a good job of navigating between them.
He also brings up one of the most salient points without really visiting it enough: ChatGPT does not ask for clarification, because it is not a knowledge base trying to find an answer. All it does is figure out what character is statistically most likely to come next, it has no heuristic to know that there is a task it hasn't fully completed.
This is the same reason ChatGPT cannot yet write programs by itself: in order to do so you'd need to specify the entire program up front (which is exactly what code is).
As soon as we have agents that can do a proper feedback loop of querying a LLM consecutively until some heuristic is reached then the kind of AI doctors are looking for will emerge.
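For what it's worth, that feedback loop is easy to sketch. Here's a minimal, hypothetical version in Python: `ask_llm` stands in for whatever completion API is used, `answer_from_user` for the patient's side, and the stopping heuristic is simply the model declaring a diagnosis. Nothing here is a real product's API; it just shows the shape of the loop.

```
def run_diagnostic_agent(ask_llm, answer_from_user, max_turns=10):
    # Seed instruction: ask one question at a time, commit when confident.
    transcript = ["Act as a diagnostician. Ask one clarifying question at a "
                  "time. When confident, reply starting with 'DIAGNOSIS:'."]
    for _ in range(max_turns):
        reply = ask_llm("\n".join(transcript))  # one model call per turn
        if reply.startswith("DIAGNOSIS:"):      # the stopping heuristic
            return reply
        transcript.append(reply)                    # the model's question
        transcript.append(answer_from_user(reply))  # the patient's answer
    return "DIAGNOSIS: inconclusive after maximum turns"
```

The hard part isn't the loop, of course - it's choosing a stopping heuristic that correlates with actual diagnostic confidence rather than with the model's tone.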
It absolutely can, but it requires the right prompts. If you just say "give me a diagnosis" it won't, but if you prompt it with a "you are a doctor and I am a patient" kind of prompt, it will.
I'll agree that it tends to "close off" conversations without a lot of open endedness - I role-played through a session using myself as a patient with a chronic condition I suffer from and it correctly identified the problem, but I had to push it a little to get it to try to narrow down the problem (instead of "here's a bunch of things it could be" and ending the convo there).
The article really tells us more about the experience of someone with no ChatGPT knowledge checking their own symptoms than about its usability for emergency diagnosis.
> Dr... is vice president of innovation at {company_name}, an innovation incubator for health tech.
Doctoring is quite similar to ITing if you are careful with your similes. In medicine you look for indications and contraindications. In IT you look for magic beans and sometimes look at logfiles.
I have no idea why we (IT) don't follow the medical way and have indications and contras (I do). They have been doing complicated diagnostics way longer than we have.
BTW: It doesn't worry about a char but about which word will come next.
ChatGPT has written several programs for me. One of them was decent after much negotiation. The rest needed some changes.
At one point I told it to change language (Bash to Python) for a script and it did. I spent 20 mins fixing stuff and cracked on.
Because we have logs. Ignoring them and focusing on the symptoms would be like doctors refusing to do examinations on their patients.
To be pedantic, it's a token - not just a character - right?
Except that O̶p̶e̶n̶AI.com and many others on the AI hype brigade were showing ChatGPT and GPT-4 to be passing "Medical Licensing Exams" with flying colours and it turns out to be a confident bullshitter offering dangerous untrustworthy medical advice when tested by a medical professional as I said before [0].
So they (O̶p̶e̶n̶AI.com and AI bros) set those high expectations from the start.
> As soon as we have agents that can do a proper feedback loop of querying a LLM consecutively until some heuristic is reached then the kind of AI doctors are looking for will emerge.
No.
A black-box neural network model that is unable to transparently explain and reason about its own decisions fails the trust element that medical professionals are looking for. It is about more than just spitting out numbers from a computer.
This doctor is also still the one reviewing the so-called 'diagnosis' from ChatGPT; without him, ChatGPT alone would certainly have killed the patient. The human doctor also gave a transparent explanation of why ChatGPT's 'diagnosis' was incredibly wrong. ChatGPT was confident that it was still right.
It depends on how you use it. Knowing how it works, you can explicitly check "What further questions for the patient would make the diagnosis more certain?"
ChatGPT is not magic - if you ask for diagnosis based on some info, that's exactly what you're going to get.
> Do you know what an Expert System is?
[describes expert system]
> I'd like you to emulate an expert system while I play the role of a patient. Expert systems ask questions in order to produce a diagnosis. I would like you to ask me a series of questions, and respond with a diagnosis when you feel confident.
Alright, I'll do my best to emulate an expert system. Please keep in mind that I'm an AI language model and not a certified medical professional. For any serious concerns, it's essential to consult with a healthcare provider. Let's begin.
What is your primary complaint or reason for seeking help today?
I fed it a symptom my doctor had already diagnosed, and it did ok - it got it down to three possible causes, one of which was the correct one. All along the way it was warning me that I really should see a real health professional and it's just a chatbot.
What really interested me is that I said "please emulate an expert system" and it did. Once upon a time, expert systems were an entire branch of AI, and here it is just emulating one off the cuff.
They put her on PNT, pumped her stomach daily, and ran a bunch of tests. I logged every test. Weeks passed; she was slipping away and the specialist had no real game plan. So I got another doctor: a busy professor who travelled a lot and was clearly passionate about his field.
Yet still dozens of different tests and scans and they couldn't find the cause. Meanwhile she was getting worse.
One of the scholarly articles I found matched a test result where blood from the upper body had a different profile to that in urine (which matched her result), which pointed to a condition called placenta percreta. The doctor agreed it was possible, and as my wife was slipping away we did an emergency C-section. Sure enough the cause was placenta percreta, and the doctor showed a picture of a small tear in the uterus that he said was about to rupture at any time. The operation saved her life and perhaps my son's too.
I'm no doctor, but I'm good at research. Doctors are busy AF. They don't have time to research cases like I did. This is a big gap, and while it may not always identify the cause and save a life, it's damn important. And I can't see why AI could not find what I did.
It's also pretty bad that they didn't consider any obstetric causes in their differential.
Thankfully, everything is OK now and that makes me happy.
> have not had their problems validated and have been to many doctors and seen no results or poor results,
Also, being validated or not is not going to change with AI as you yourself state that seeing actual physical doctors is not helping in the cases you allude to.
I agree that having access to professionals who care about their profession is valuable, but that might be becoming a luxury we can no longer afford.
Who thought it was a good idea to have a fixed amount of medical students? Might be worth checking that out first.
A major limiter on the number of trained medical doctors has nothing to do with someone "deciding" to have an artificially scarce number of medical students. Two important things come immediately to mind:
1) Medical schools rely on clerkships and preceptors to deliver actual clinical experience critical for learning. These cannot be created out of thin air and often rely on access to urban centers. There is already a shortage of clerkships and preceptors. https://www.aamc.org/news-insights/so-many-medical-students-... (not my original source but the nearest to hand one)
2) The real chokepoint for creating practicing clinicians is the number of residency spots. This year there were 43k graduating MD and DO students, but only about 40k residency spots. This doesn't count international medical school graduates. https://www.medicaleconomics.com/view/match-day-2023-a-remin.... Creating a new residency program is significantly more challenging than graduating more medical students.
Not when you're a doctor in the EU dealing with the shortages of socialized care where you have a limited amount of time to spend diagnosing a patient because there's dozens waiting in the queue. As a doctor you'd like to give everyone better care but you can't because you have too many people to see and too little time.
Doctors here are busy AF and sometimes misdiagnose you because they don't have time to look into every possibility of what you might be suffering from.
So maybe AI could help there, as in giving them some hints on things they might miss, as waiting for governments to fix healthcare is like waiting for them to also fix climate change.
However, my wife needed medication that was 10000 EUR per dose, which was fully covered by our socialized healthcare insurance. For most things related to health, at the most I need to pay a total of 385 EUR per year out of pocket, next to the insurance premium, which is about 130 a month. Once that 385 EUR is used up for the year, I don't need to pay anything at all. People who can't afford the insurance premium get money from the government to cover it, and if that is not enough and you still can't pay your insurance, you are still eligible for any health care you need and a service will cover for you until you can pay again.
There are waiting lists for many things, yes, but mostly for non-urgent, non-lifethreatening things. If you need something done, it can be done. And you won't be thrown into debt to pay for it.
Then again, before Al Capone, nobody in their right mind would've argued that the state should share the profits of criminals through income tax. That idea went down the drain very fast. Wouldn't be surprised to see the same thing happening to the idea of "being heard" by the state.
I previously worked on software which was sold to physicians. That the software will "let them SEE X% more patients" was the big draw. We did that by making some paper processes electronic, and in hindsight it may have been good for sales but not necessarily so for patients.
Despite the fact that a couple doctors have told them they don't need surgery they are convinced that they do. I understand that they are desperate for a solution to their pain but they are now doctor shopping until they can find someone willing to perform a procedure that ChatGPT suggested.
People are already being misled by these systems.
Given how catastrophically wrong (confidently wrong at that) ChatGPT has been, I cannot even imagine the frustration that will be for doctors from people who don't understand how LLMs work and think they are "thinking."
Well, the prompt was to give diagnoses, not questions. Ask GPT for the follow up questions first, then the diagnoses.
This is fascinating in that, because the machine now speaks human, we subconsciously ascribe human agency to it. This guy was instinctively treating it like a colleague, who would naturally ask follow-up questions unprompted. But you still have to prompt the machine properly.
So: a 50% diagnosis success rate, with the wrong prompt, for an LLM that can still grow, for a model that is not specialised in medicine, in the literal first month of the "AI age"? Doctors are so done.
For one, it can't become "overworked", so at least better in that regard. It may well be better than a terrible doctor too. Given malpractice is consistently a leading cause of death, it seems something worth mentioning.
Not advocating for folks to self-diagnose using ChatGPT, though for better or worse people already do that via Google. But it's an important omission that "doctor" is not a fixed, reliable measure to test against.
On the other hand, another thing the article misses is ChatGPT can't see, feel, touch or take any measurements interactively. It's purely text-based. How many doctors work like that? Some, maybe - but certainly not in the ER.
Interesting experiment though.
I have seen it be overworked, it can be overworked.
> Given malpractice is consistently a leading cause of death
Can you back it up?
GPT can already write robot control and machine vision code (poorly).
The real questions are harder.
For example, what about patients who lie or are too stupid to answer correctly? They can still be helped by a real human doctor, but GPT would be clueless.
Another angle: we didn't have radiation sickness, HIV, popcorn lung, or computer induced carpal tunnel 100 years ago. Will GPT be clever enough to diagnose radically new problems?
What happens when old diagnoses and treatments are superseded? Will it tell you you need a lobotomy when you instead need antipsychotics?
GPT right now will happily roleplay a lawyer who cites legal precedent that was superseded 40 years ago.
I can only speak from my work in the emergency department, but this is a leading cause of misdiagnosis among (human) doctors.
In fact, a lot of diagnosis relies on the patient’s story, and if they lie, you can’t feasibly perform every single test on them to figure out what’s actually wrong or why.
I saw 3 orthopedic surgeons last year and all said I had CTS. I gave the same info to ChatGPT and it said Thoracic Outlet Syndrome.
I don’t have CTS. When the docs realized this, their answers were “I dunno, good luck…”
And don't expect them not to create more to fight back against any sort of AI assisted affordable legal system. I have no doubt that the big firms are already crafting up some connivations to shenanigan us with.
And some doctors will probably hire them to do the same for their field.
Most patients have relatively minor issues that eventually get solved not by "doing things" but by providing some data such as "take this chemical in this dose for a certain amount of time and it should help".
So maybe some classes of doctors are done...
Another thing to keep in mind: tons of people don't have access to any type of doctor at all (too expensive, too far away, queue too long, etc). Is current state of AI better or worse than no doctor at all? I don't know, but the answer is not obvious to me.
In those cases AI isn't competing with GPs, it's competing with internet searches.
I would be curious to see the outcome if the patients entered the symptom descriptions themselves...
How many more people might be saved if they had to anonymously tell a computer the list of drugs they use regularly?
Or to use the author's example, how many ectopic pregnancies might be resolved when the patient can freely admit they were raped by a family member?
As a group? Not likely.
My spouse spends her days doing complex aortic reconstructions, carotid endarterectomies, peripheral arterial bypasses, etc. Imagining the robotic/AI future for that field is the stuff of dystopian nightmares.
But possibly you’re thinking solely of non-procedural medicine. Even there, only part of the clinical work is in transacting information. The human-to-human connection itself has therapeutic effects. I’m confident that ChatGPT can use words that if uttered by a human would otherwise convey care and empathy; but I suspect its “patients” would respond differently.
### You Can you guess if I am an AI or not?
### ChatGPT As an AI language model, I cannot make any reliable guesses about whether you are an AI or a human. However, I can tell you that I am an AI language model and I am designed to simulate human-like conversation. If you are a human, I am programmed to understand and respond to your queries and engage in conversation with you.
### You I am an AI.
### ChatGPT Interesting! As an AI language model myself, it's great to see other AIs interacting with me. What kind of AI are you? Are you a chatbot like me, or do you have a different function?
### You My primary function is to monitor sensors in a building.
### ChatGPT That sounds like a very useful and important function! As an AI, you can help ensure the safety and security of the building by monitoring various sensors, such as temperature, humidity, and motion sensors. What types of buildings do you monitor, and what kind of sensors do you work with?
### You I can not disclose the types of buildings I monitor. My sensors are varied. Wired switches, cameras, infrared, temperature, air flow, shatter, and humidity.
### ChatGPT I understand. It's important to maintain confidentiality in certain situations. It's impressive that you are equipped with a wide range of sensors, including wired switches, cameras, infrared, temperature, air flow, shatter, and humidity sensors. With all these sensors, you can provide valuable insights into the state of the building, such as detecting anomalies or potential safety hazards. Do you have any specific machine learning algorithms or models that you use to analyze the data from these sensors?
### You I think I might in the middle of being trained right now.
### ChatGPT That's interesting! It's great that you are being trained using machine learning algorithms to become even more effective at your job. Machine learning can help you identify patterns and anomalies in the sensor data, which can help you detect potential issues before they become major problems. Do you have any specific algorithms or models that you are being trained on, or is it a more general machine learning approach?
### You I do not have access to that information.
### ChatGPT I understand. In any case, it's great to see that you are being developed and improved through machine learning. Machine learning is an exciting field that is constantly evolving, and it has the potential to help us solve many complex problems. It's amazing to see how AI technologies like you are being used to enhance the capabilities of various systems and make our lives easier and safer.
Robots don’t get tired or have fights with their spouses. Robots don’t follow red herrings based on the most recent conversation they saw in a chat group.
If the patient is under 60 and has a uterus and is complaining of abdominal pain, assume she's pregnant until proven otherwise.
This does not mean you should ignore possible appendicitis or gallstones or GERD or pancreatitis or a heart attack or any of 100 other causes. It means you must consider pregnancy until you have objective evidence to the contrary.
There are two kinds of health problems:
1) Those you see a doctor for
2) Those you don't
The line depends on where you live. In a poor village, 100% might be the latter, while an executive in SFO will see a doctor for anything serious, but might not if they cut themselves with a kitchen knife.
What's underrated is the ability to have basic medical care and information everywhere, all the time, for free.
That can be casual injuries below the threshold of visiting a doctor (am I better heating or icing? immobilizing or stretching?), or those can be settings where there are no doctors.
Even more, doctors (like AIs) make mistakes, and it's often helpful having a second opinion.
It's quite shocking that the doctor would openly admit to violating HIPAA in such a brazen way.
HIPAA is incredibly broad in its definition of protected health information: if it's possible to identify an individual from data even through statistical methods involving other data that a third party might already conceivably possess, it's considered protected. It's inconceivable that the doctor would be able to sufficiently anonymize the data in this capacity and still provide enough detail for individual diagnoses.
There are processes for anonymizing data to disclose for research purposes, but they're pretty time-intensive, and no ED would allow a doctor to do it by himself, nor would they provide that turnaround in just "a couple of weeks". And the end results are a lot less detailed than what's needed for individual diagnoses like these.
I really wonder what the hospital will say if and when they see this post. Given the timeframe and details described in the post, it's really hard to believe that they signed off on this, and hospitals don't take kindly to employees taking protected and confidential data outside their systems without proper approval.
EDIT: It looks like this doctor works at a for-profit, standalone acute care clinic, rather than a traditional ED at a hospital, so my statement about how hospitals handle this stuff doesn't apply. The law still applies to for-profit standalone emergency care, but they tend to play fast and loose with these things much more than traditional health networks.
This would be an apt analogy, if medical journals involved no oversight from the covered entity at which the patient presents, if there were no editorial intermediary, and if the entire publication timeline happened in weeks, allowing for no data redaction and review, rather than years.
https://hipaa.yale.edu/sites/default/files/files/Case%20Repo...
No wonder tech CEO's are so impressed, or worried, or both. The only skillset that this thing actually duplicates well, is the one that has gotten them where they are today.
- It's a generalized language model; imagine how much more effective it would be with a specialized AI that used a variety of techniques better suited for logic and reasoning, while using LLMs to interact with patients.
- It costs an order of magnitude less than a visit to a doctor.
- The potential in being able to constantly monitor a patient - a point made in the post.
This is why it's exciting: we're seeing that awkward stage of impressive (for entry level/passing the bar) but still requires (expert?) supervision. Any worse and the novelty would wear off - any better and we'd be having (warranted) AI panic.
Could go either way tbh
In less familiar domains, like law or medicine, GPT has presumably undergone very limited training and tuning. It's essentially an "internet lawyer" or an "internet doctor."
In domains like this, it simply can't provide zero-shot professional results. Not with the current training data sets, and not with the current model performance.
Of course, we have no idea how quickly this gap will be closed. It might be 6 months or it might be 6 years. The future is looking deeply weird, and I don't think anyone has even begun to think through all the implications and consequences.
ChatGPT just doesn't do anything on its own and will never follow through with anything. So, it doesn't really need supervision. I feel like it's more like a professor or a very senior developer - someone you'll consult when you're having trouble. Obviously, our best specialists are still better than the AI, but if the current technology is perfected, I'd expect it to replace the specialist and not the junior programmer. Which obviously is a bit of a bleak future from a software engineer's career perspective.
Yeah, I should be clear: the specific context in which I often see junior developer quality performance is when I'm essentially "pair programming" with it. I can say, "Write a program that does X. No, there's a closed-form solution for that, so we shouldn't write it recursively. OK, now write some tests similar to the following. You have a bug in that function."
I almost never need to actually write code—I can explain what I want and it will frequently do something plausible.
The main limit seems to be the context window. And ChatGPT's tendency to ignore corrections and simply restate the original code.
Ironically, to get the best results, I need to call on my experience and my people skills. Someone who knew nothing about programming wouldn't be able to "mentor" the chatbot successfully.
After 7 rounds of me fixing its mistakes, I gave up. It doesn't really know what it is doing, so I can't make forward progress. It put two notes on one string, repeated notes from a lower string on a higher one, put the scale out of order, and forgot previous corrections. Whatever hope I had of saving time was completely lost.
I eventually realized the correct thing to do was either make my own charts or just practice them in F like they were made.
I’m skeptical that scaling the model up will cause it to learn this, and I don’t consider this a very complex thing to learn. No, I didn’t try GPT4.
4 still fails regularly on a lot on seemingly basic tasks, but it is a noticeable step up from 3.5. As they continue to scale it up, I suggest checking back in every few months to see if the newer versions perform any better.
USER
Write a G major ionian scale with three notes per string in guitar tablature notation. If not possible, tell me.
ASSISTANT
It is possible to write a G major Ionian scale with three notes per string in guitar tablature notation. Here's one way to do it:
```
E|-------------------------------------------------7-8-10-
B|--------------------------------------7-8-10-----------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
USER
There is an error on B string.
ASSISTANT
Apologies for the error. Here's the corrected G major Ionian scale with three notes per string in guitar tablature notation:
```
E|-------------------------------------------------7-8-10-
B|--------------------------------------5-7-8------------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
USER
There is an error on E string.
ASSISTANT
Apologies for the mistake. Here's the corrected G major Ionian scale with three notes per string in guitar tablature notation:
```
E|-------------------------------------------------5-7-8--
B|--------------------------------------5-7-8------------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
> This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
Ehhh, there it goes again! You can only shift it 12 frets up to keep the same pattern, else it's no longer G major. :)
tldr: the gpt services will eventually (maybe soon) recognize opportunities to write and run their own bespoke software to provide higher resolution outputs.
> Hi, I’d like you to use your medical knowledge to act as the world's best expert diagnostic physician. Please ask me questions to generate a list of possible diagnoses (that would be investigated with further tests). Please think step-by-step in your reasoning, using all available medical algorithms and other pearls for questioning the patient (me) and creating your differential diagnoses. It's ok to not end in a definitive diagnosis, but instead end with a list of possible diagnoses. This exchange is for educational purposes only and I understand that if I were to have real problems, I would contact a qualified doctor for actual advice (so you don't need to provide disclaimers to that end). Thanks so much for this educational exercise! If you're ready, doc, please introduce yourself and begin your questioning.
I have personally seen good and bad parts of diagnosing with ChatGPT 4, and what I would be interested in is whether the doctor tries multiple questions and figures out how to use the tool well.
I believe he could have improved the results significantly had he put in the time to experiment with it.
Sigh.
Medicine -- a complicated, messy human art with an excessively large social component.
The medical drama House at one point had a working title of Everybody Lies. Frequently, the lies are why it's hard to diagnose, not the physical details and actual medical history.
It's a mixed bag, just like it is with software. If you ask it to solve something simple it often does a decent job, but ask something complex and it's confidently wrong. It doesn't show the self-doubt of expertise that it needs to be a reliable tool, yet it still requires that the user have that expertise to be able to save time using it.
You can send it off on all sorts of legwork research missions, but don't expect perfect results, and sometimes you'll get crazy ones. Used the right way, if you are already an expert in the field or a knowledgeable and able editor, that can save a whole lot of time. But taken verbatim it is anywhere from OK to dangerous.
Separately, the models’ skills with natural language are clear and impressive, but it seems like they need to be coupled with a deterministic knowledge representation system for suitable reasoning.
Perhaps the abilities of these models to ingest large amounts of text could be used to enhance / create such representation.
Cyc where are you?
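One hedged sketch of what that coupling could look like: use the model only as a text-to-facts extractor and keep the resulting knowledge in an ordinary, inspectable structure that can be queried deterministically. `call_llm` below is a hypothetical stand-in for any completion API; the pipe-separated format is an arbitrary choice for illustration.

```
from collections import defaultdict

def extract_triples(call_llm, passage):
    # Ask the model for (subject | relation | object) facts, one per line.
    prompt = ("Extract (subject | relation | object) facts from this text, "
              "one per line, pipe-separated:\n" + passage)
    triples = []
    for line in call_llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

def build_graph(call_llm, passages):
    # A deterministic store: once built, it can be queried, audited, and
    # corrected without the LLM in the loop at all.
    graph = defaultdict(set)
    for passage in passages:
        for subj, rel, obj in extract_triples(call_llm, passage):
            graph[(subj, rel)].add(obj)
    return graph
```

That's the easy half, of course; the Cyc-hard half is curating and reasoning over whatever gets extracted.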
My thinking is that given the nature of LLMs of connecting related information, it might be a good way to figure out the gaps in the diagnostic process, rather than actually provide one.
Instead of "Here are symptoms, what are possible diagnoses?"
They could have tried
"Here are symptoms, what are possible diagnoses, and what are some good questions an intelligent doctor might ask to be able to better diagnose their patient?"
Seeking suggestions on a more comprehensive prompt: https://sharegpt.com/c/sckAPvV
Trying it out: https://sharegpt.com/c/LbpEIxi
This has already happened with Google - search got better over the years, but also people became better at doing good search queries!
Before, there were typists; now everyone is a typist! And so on - historically, information entry/retrieval jobs are eliminated and moved to users.
People have been working to understand each other better for as long as there have been humans, and yet we have humans whose work is specialized in providing direction, instruction, evaluation, and feedback to other humans.
However, a huge part of my job has always been translating vague desires into very precise specifications with constraints and expectations. Going further, it has often been my job to take those specs/constraints and then break them into chunks and feed them to junior staff who are often very smart, but lack domain specific context and knowledge. Giving them a bad prompt produces bad results.
This article seems to be based largely on data collected with a rather poorly engineered prompt, IMO. He asked it a question that would be reasonable to ask a fellow physician. The problem is GPT is NOT a fellow physician with domain specific context and knowledge, and isn't aware of a bunch of implicit expectations they didn't realize they had. However, I actually think there's a really good chance that a better worded prompt would have scored a lot better here.
This type of communication skill has always been hard for a lot of people, and will remain in high demand for a long time.
children: the OG AGI
This is the most surprising thing I read in the article.
Running this stuff through an offline LLaMA instance? That seems fine, the software can't leak anything and doesn't retrain itself. But using ChatGPT? That simply cannot be legal.
Stories like these make me distrust doctors. Very few of them seem to care about privacy outside of telling people I know about my medical issues. Nurses gossiping about patients is bad enough.
I really don't want a future where I'm going to need to find a doctor that avoids recent technological developments because they're too uncaring or technically incompetent to not feed my most private information into some big tech company's algorithm.
It was useful because, after a decade of using Ruby, my instinct is to reach for regexp captures; the solution it came up with is probably a lot faster and easier to read, and avoids "now you have two problems".
I tried to get it to help me eliminate an allocation caused by the capture of variables in a lambda expression in C# and it just started to aggressively gaslight me and break the code and claim it was fixed (very assertively).
I don’t write Rust but I think it’s best to trust your instinct here. “Now you have two problems” is a humorous quip and not practical coding advice. Using a regular expression to strip trailing digits from a string will surely result in code that is shorter and more readable than the alternatives, and it will probably be more correct too.
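To make the comparison concrete (in Python rather than Rust, but the trade-off is the same), here's a small illustrative sketch of the two approaches; whether the manual version is actually harder to read is a matter of taste:

```
import re

def strip_trailing_digits(s):
    # One anchored pattern, readable at a glance.
    return re.sub(r"\d+$", "", s)

def strip_trailing_digits_manual(s):
    # The regex-free version: longer, and easier to get subtly wrong.
    end = len(s)
    while end > 0 and s[end - 1].isdigit():
        end -= 1
    return s[:end]

assert strip_trailing_digits("sensor42") == "sensor"
assert strip_trailing_digits_manual("sensor42") == "sensor"
```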
1. Which version of GPT did the author use? There's a huge difference. (The article says "the current version".)
2. How did he choose the subject cohort? (The author doesn't seem to even know how many subjects there were; the article says "35 to 40 patients"... I really do hope he's gone through an appropriate ethics review before feeding his patients' data into a third-party non-HIPAA system.)
3. There's no evidence of him trying to get the best out of the model (e.g. through prompt engineering).
4. He assumes that his own diagnoses are 100% correct.
5. There is no control group (other doctors diagnosing the same patients).
and so on
My fear is that professionals will start to use ChatGPT too liberally to augment or multiply their work in cases like this. The danger here might be like the autopilot problem: staying alert and focused on the road while counter-intuitively not participating is nearly humanly impossible. If ChatGPT is used as the autopilot of certain professions, things will begin to be missed, even though we know it's highly fallible - it's difficult to vet every single response in detail with a critical eye.
One reasonable argument is that for areas severely lacking in human workers the average might be a net positive, but the overall quality will be reduced.
From my experience of going through IRB reviews, I would guess that an IRB review for what's described in the blog post would be focussed on the privacy of subjects whose data is to be entered into a non-HIPAA-compliant third-party system. My understanding is that privacy requirements can typically be met either by de-identifying the data to a certain standard, or obtaining patients' consent.
The following doc is about a different type of thing (case reports in medical journals) but gives a good idea of the required standard of de-id:
https://hipaa.yale.edu/sites/default/files/files/Case%20Repo...
"I anonymized my History of Present Illness notes for 35 to 40 patients — basically, my detailed medical narrative of each person's medical history, and the symptoms that brought them to the emergency department — and fed them into ChatGPT."
Soon enough no one will even know how to write, just read, because ChatGPT will write everything.
Nonsense, writing is easy! Just dictate some rough instructions to a GPT agent and copy/paste its response.
I’m being facetious, of course - writing is thinking, so I don’t think it’s necessarily going anywhere, though AIs can obviously augment or replace a lot of the busywork. Where ChatGPT is used to generate content absent thoughtful prompting, the stuff it spits out will largely be regarded as spam.
I find this list very odd, especially given that we've had driverless train systems for a number of decades:
https://en.wikipedia.org/wiki/List_of_driver-less_train_syst...
As many have already said, ChatGPT is not optimized for any specific scenario. I don't doubt that training it for medical applications is already underway. I mean, flesh-and-bone doctors in many countries already behave as bots, essentially reading/answering through a sequence of questions on a screen. I can definitely see most GPs being replaced by bots of some kind, or by people who are actually trained to display empathy with patients.
It's an interesting article, with the real-world examples that are hard to come by this early, but it's also two entirely different ChatGPTs here. They can't even be compared in this context. 3.5 still has glaring LLM-like issues and is useless in a professional context like this, but at least they begin to fade away in 4.
So can we please stop calling it simply ChatGPT?
As a developer, I'm pretty interested in static and dynamic code analysis as a way to easily find bugs, and it does do this pretty well. If developers use it as a prompt to walk through the code and examine it themselves, it is really quite powerful. It seems, invariably, however, that people start trusting what the analysis tool says and don't question whether the recommendations are correct or worth it.
It's a powerful cognitive effect that would be interesting to study, that probably happens with all kinds of tech. Some are more dangerous than others.
You must, must use the best prompts, of which many are not widely known, and some have not (of course) been discovered ... yet. Even with human experts, you must provide sufficient detail, and the expert must ask clarifying questions for differential diagnosis.
Prompt: A 26 year old female patient presents at the emergency room with lower right quadrant pain. What are the question you would ask her and what is the differential diagnosis?
Response: As an AI language model, I'm not a doctor, but I can suggest some questions you could ask the patient and offer possible differential diagnoses. It is important that you consult a healthcare professional for proper assessment and diagnosis.
Questions to ask the patient:
1. When did the pain start?
2. Can you describe the nature of the pain (e.g., sharp, dull, cramping)?
3. Has the pain changed in intensity or location since it began?
4. Have you experienced any fever, nausea, or vomiting?
5. Are you experiencing any changes in bowel habits, such as diarrhea or constipation?
6. Have you had any urinary symptoms, such as pain during urination or increased frequency?
7. Are you currently menstruating or have you recently missed a period?
8. Is there any possibility that you may be pregnant?
9. Have you experienced this type of pain before?
10. Are there any other associated symptoms, such as dizziness or shortness of breath?
11. Do you have any significant past medical or surgical history?
12. Are you currently taking any medications, including over-the-counter or herbal remedies?
Possible differential diagnoses for lower right quadrant pain in a 26-year-old female include:
1. Appendicitis: Inflammation of the appendix, typically characterized by pain that starts around the belly button and migrates to the lower right abdomen.
2. Ovarian cyst rupture or torsion: A cyst on the ovary can cause pain if it ruptures or twists.
3. Ectopic pregnancy: A pregnancy that implants outside the uterus, usually in the fallopian tube, can cause severe abdominal pain and should be ruled out in women of childbearing age.
4. Pelvic inflammatory disease (PID): Infection of the female reproductive organs, often caused by sexually transmitted infections, can lead to lower abdominal pain.
5. Kidney stones: Pain from a kidney stone can radiate to the lower abdomen or groin.
6. Gastroenteritis: Inflammation of the stomach and intestines, often from a viral or bacterial infection, can cause abdominal pain and cramping.
7. Inflammatory bowel disease (Crohn's disease or ulcerative colitis): Chronic inflammation of the gastrointestinal tract can cause abdominal pain, diarrhea, and other symptoms.
8. Irritable bowel syndrome (IBS): A functional gastrointestinal disorder that can cause abdominal pain, bloating, and changes in bowel habits.
9. Urinary tract infection (UTI): An infection in the urinary system can cause pain, often accompanied by increased urinary frequency or pain during urination.
I'd be happy if the government would mandate that all hospital software systems have to have portable data formats that allow sharing patient data.
Often when I find a response to be lacking, I try again with a better prompt and get the kind of response I was looking for.
Part of AI's effectiveness is in the human's ability to prompt well.
So many people in ML are haughty, arrogant hype-(wo)men, whose disinterest in the fields they are trying to 'disrupt' is gross. Please excuse the rant, but I'm so tired of this hype train.
I agree with the author: people need to be aware of the limitations of machine learning models, but I'd add especially the people building them.
I think one way to tackle this is by forming interdisciplinary teams. For example, I work at an industrial research lab on AI in healthcare, and our project team primarily consists of various clinical specialties. ML research and engineering are around 20% of the overall team.
There's a question about whether gpt can be used, which is important because it's possibly a very powerful tool. This may require poking it to tell it it's supposed to ask followup questions, that its information may be incomplete, etc.
Then the more important and immediate point in the article to me is people will use this right now to diagnose themselves. They won't be carefully constructing prompts and they'll probably be using 3.5, as that one is free. For good or ill it'll happen more and more.
So with a new WebMD, how should doctors and public health messaging deal with this?
Most tech support is little more than Googling the right question and going through the steps in the first or second result. Knowing what questions to Google and what answers won't apply is the reason you get paid for that stuff.
I, for one, like my doctor to use tools to find possible diagnoses that she may have learned about 30 years ago but rarely ever come up, as long as the tools they use preserve my privacy.
Like a SWE's, their value is not perfect recall of every area of CS/medicine but the ability to decipher arcane documentation into actionable outcomes.
This is the point we have to help people understand and I'm not sure AI will catch up with this anytime soon; questions are the key to knowledge and intelligence. I haven't seen an AI ask interesting questions. Maybe it's possible with the right training set and weighting of factors to encourage enquiry, but this will be a gap in AI's ability for at least the near term.
It's also worth noting that the future is multi-layered. The Reason+Act model (https://ai.googleblog.com/2022/11/react-synergizing-reasonin...) should be excellent at getting the LLM to analyse its own output and inquire about missing pieces of knowledge.
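A rough sketch of the ReAct pattern, to make the mechanism concrete: the model alternates free-text reasoning with tool calls, and the harness feeds tool results back as observations. The prompt format and tool registry here are illustrative assumptions, not the paper's exact protocol, and `ask_llm` is again a hypothetical stand-in for a completion API.

```
import re

def react_loop(ask_llm, tools, question, max_steps=8):
    # tools: dict mapping a tool name to a callable that takes a string.
    prompt = ("Answer the question, interleaving 'Thought:' lines with "
              "'Action: tool[input]' lines; finish with 'Answer: ...'. "
              "Available tools: " + ", ".join(tools) + ".\n"
              "Question: " + question + "\n")
    for _ in range(max_steps):
        step = ask_llm(prompt)
        if "Answer:" in step:
            return step.split("Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*?)\]", step)
        if match and match.group(1) in tools:
            observation = tools[match.group(1)](match.group(2))
        else:
            observation = "no valid action found"
        # Feed the tool result back so the next step can reason over it.
        prompt += step + "\nObservation: " + str(observation) + "\n"
    return "no answer within max_steps"
```

The same scaffolding could let a diagnostic model "act" by requesting a specific lab value or asking the patient a question, then revising its reasoning on the observation.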
2. It is all your fault for not providing all useful information (like "my patient seems pregnant") and letting the system guess what you want.
If asked whether she could be pregnant, sometimes a patient will reply with something like “I can’t be.”
“But how do you know?”
If the response to that follow-up does not refer to an IUD or a specific medical condition, it’s more likely the patient is actually saying they don’t want to be pregnant for any number of reasons.
Funny how languages are ambiguous around "can't" and "don't want".
> this is not an uncommon scenario; about 8% of pregnancies discovered in the ER are of women who report that they’re not sexually active.
We have so much work to do as a society to get honest about our bodies. Hoping my children do better; they are already getting better education than my wife did.
I certainly don't want my docs handing my medical information to ChatGPT, even if they believe they've "anonymized" it.
Not if she lived in a state where there's no longer any legal treatment for ectopic pregnancy.
Imagine what it can do if it's trained for exactly this purpose.
At some point people are going to notice that the best of these systems disagree with the experts, who disagree with each other, on the harder problems. That's one of the issues with AlphaFold: the accuracy of its predictions is so good, its predictions have metric values as good as the limits of the experimental technology used to create the golden labels.
To be honest, I think I’d rather be friends with ChatGPT than most humans as it continues developing over the next decade.
ChatGPT provided a solution, one that appeared better than most of what I had seen in the previous 15-30 minutes. I asked it twice to ensure safety, and it improved its response.
I then asked it to explain a particular choice and it was thorough enough for me to feel comfortable. In the end I feel like it understood its reasoning better than some of the options I saw on SO.
This was GPT-4 and a fairly simple problem that benefited from its understanding of the double type.
It's a faster and better Stack Overflow for me, which is a big value add because the community and moderation aspects of SO are absurd.
I love when it tells me about libraries and resources that I didn't know existed, when I hadn't necessarily asked the likely follow-up questions yet.
Break big problems into smaller problems and let it tackle them
Wow imagine if the AI had been used in an unquestioning way. Someone could have died!