GPT-4 could pass bar exam, AI researchers say (opens in new tab)

https://ai.googleblog.com/2022/06/minerva-solving-quantitati...

nopinsight3y ago

You are implying either:

* Understanding complex language does not require logic/reasoning,

* There are infinitely many forms of logic/reasoning or at least more than those existing in a vast training set.

Neither of which is likely true.

What do you think of the Minerva system, which can solve multi-step quantitative reasoning questions better than many competent students and most adults?

Note: If you look at LSAT test samples, many questions are tests of complex logical reasoning, a requisite for legal professions.

4 more replies

ok1234563y ago

Most people's reasoning ability functions at this level.

sebzim45003y ago

>As long as it comes across some reasoning process that have not been seen before in the training wordset, which can be as easy as a middle school math question

Is this true even if you tell it to show its working? In my experience that drastically improves its ability to do math problems.

Workaccount23y ago

Until someone can point out the difference between neuronal reasoning and silicon reasoning, I remain completely agnostic about the underlying mechanics of whatever model.

Gun to my head where I had to put money down, I would put it on "Brains are not nearly as special as we (they?) think they are." No fairy dust or supernatural beings required, brains are just another AI model (and likely not even a particularly great one).

swagmoney16063y ago

I've already been using GPT and ChatGPT to much success for my work.

Yes, it doesn't have reasoning ability, but being able to manage knowledge and information in the way that these models can is still an amazing feat.

flatline3y ago

It does have some ability to extrapolate to new problems, provided its training corpus has reasonably close coverage. It is not going to be making new scientific discoveries or insights but then neither are most people. With a sufficiently large training set I think these models can achieve human parity for a subset of language generation tasks, and be effectively of human intelligence. They nearly already have.

It doesn’t matter to me if they have “reasoning” capabilities or not if the outcome is the same.

I think we are a long ways off from AGI still.

[0] https://www.youtube.com/watch?v=ykY69lSpDdo

wrycoder3y ago

Kurzweil's "Singularity" is upon us, but he's now being cagey about it.

He says it's still years away. His interview with Lex Fridman[0] was pretty tame - I didn't learn much new from it. Kurzweil deflected the Singularity segment to be a discussion about the history of computer power.

Remember that Kurzweil is Director of Engineering[1] at Google, with the mandate to "bring natural language understanding to Google"[2]. He started there in 2012, just after publishing his book, "How to Create a Mind"[3], and that's exactly what he and his team have been doing for ten years. Publication of his new book, "The Singularity is Nearer"[4] is now pushed out to mid 2023. Maybe he'll change the title to "Here" by then. (It's hard to believe that OpenAI is actually ahead of Google.)

Fridman made the point that maybe we won't realize at the time that the Singularity is passing, and only understand later that it did. Kurzweil didn't disagree.

[1] https://archive.is/vVEBv

[2] https://en.wikipedia.org/wiki/Ray_Kurzweil

[3] https://www.amazon.com/How-Create-Mind-Thought-Revealed-eboo...

[4] https://www.amazon.com/s?k=kurzweil+singularity+is+nearer

oidar3y ago

>It's hard to believe that OpenAI is actually ahead of Google.

Are Google's LLMs available for us to test out? From what I've gleaned, they've locked them up - I'd love to compare GPT vs Google's LLMs.

[0]: https://www.gwern.net/fiction/Clippy#saturday

tazjin3y ago

I think we're very close to Saturday from Clippy[0].

By this I don't mean an AI as in the story acting by itself with its own motivations, I'm only talking about the subversion of established verification & communication methods used by it by humans with malicious purposes.

Essentially, if you do anything security related, we might only be O(months) away from you needing to stop using basically any electronic communication for your purposes. Companies can't have online meetings anymore in which decisions are made, everything will have to be more analog, more in-person.

Look at the kind of access the Russian comedians Vovan & Lexus [1] have gotten. Without advanced AI, just a little social engineering, they got heads of state on the phone. Now combine this with the kind of text/audio/video synthesis we're not too far away from, and you have an absolute recipe for disaster ...

[1]: https://en.wikipedia.org/wiki/Vovan_and_Lexus

forgetfulness3y ago

We were perhaps a bit too enamored with the idea that it was intellect that made us unique, and thus knowledge workers would be the last to be replaced. Pouring our brains out by the Petabytes for neural networks to pick them up made the economics just work for an AI industrial revolution to start from there.

mikepurvis3y ago

I feel a bit like this with the whole firestorm around AI artwork as well— it's been a big wakeup call to people who have been creating using technology-assisted workflows for decades, but still felt in their gut that they were bringing something unique to the table and were therefore "safe" from being completely automated away. That hitting the button for magic eraser or magic lasso or magic color correction was someone okay in a way that the AI itself sitting in the driver's seat was not.

Now that's been reduced to pointing out minor flaws that the next generation of AI artists will trivially resolve, and sharing memes beseeching other humans to participate in a boycott.

There's real pain and angst there, and I don't want to be callous about it with a comparison to buggy-whip manufacturers or something. But I wish the participants in these types of discussions were able to zoom out a bit and see that there's a larger societal issue here around automation, and that the real solution is going to be rethinking the basic economics of how we distribute wealth in a time of extraordinary machine-driven productivity— productivity that is no longer just about assembly lines and primary industries, but now also includes an increasing bite out of realms previously classified as "knowledge work".

mdp20213y ago

No, we were enamored with the idea that intelligence was well distributed between people, as if following Descartes' massive incipit "Good sense must be the best distributed thing in the world, given that nobody seems to be asking for more".

Inability to recognize intelligence is and will be devastating.

BitwiseFool3y ago

I feel like there is a difference between being able to pass a bar exam and being a "good" lawyer. I suspect AI tools would enhance the jobs of clerks rather than attorneys, mostly because clerks spend a great deal of time going over case law, text, and doing research.

mdp20213y ago

> enhance the jobs of clerks

We already did, it is called "Case Based Reasoning" within Decision Support Systems.

xiphias23y ago

While it won't be a good lawyer, it can replace lots of bad lawyers when people just want to send some legal papers or ask some legal advice.

belter3y ago

Only that there is no intelligence being commoditized...Yet.

And that is obvious, if you ask one of these models, a meta question like for example: "If a person says I am lying, are they lying or saying the truth?"

You will see these models will spit a canned elegant response, talking how a question could possibly be true or false, some persons not being able to attest if another one is truthful or not...But no mention of the Liar Paradox.

So we are not yet ready to say: "Your Honor its not fair! My Lawyer is version 2.2.3 with SP1 while the Prosecution is version 4.0 with an additional Cloud Based Elastic Inference! "

addisonl3y ago

>It is impossible to determine whether a person is lying or telling the truth when they make a statement like "I am lying." The statement is self-contradictory, as it asserts that the person is both lying and telling the truth at the same time. This creates a paradox, as it is impossible for the statement to be both true and false at the same time. The Liar Paradox has been the subject of philosophical and logical study for centuries, and there is no universally agreed upon resolution to it.

ChatGPT's response to me asking "If a person says I am lying, are they lying or saying the truth?"

nopinsight3y ago

Related: "Large Language Models Encode Clinical Knowledge" https://arxiv.org/abs/2212.13138

"On the MedQA dataset consisting of USMLE style questions with 4 options, our Flan-PaLM 540B model achieved a multiple-choice question (MCQ) accuracy of 67.6%..."

"The percentages of correctly answered items required to pass varies by Step and from form to form within each Step. However, examinees typically must answer approximately 60 percent of items correctly to achieve a passing score." -- https://www.usmle.org/bulletin-information/scoring-and-score...

It seems like the models in the paper could pass USMLE already.

Some tests suggest that Med-PaLM is close to human clinicians in many aspects, incl reasoning (Figures 6-7). Other tests show that Med-PaLM still returns inappropriate/incorrect results much more often than clinicians do, however (Figure 8).

lukko3y ago

I'm kind of surprised the model doesn't score higher as there is clear pattern to questions + answers and there would a huge amount of training data for USMLE. But as stated elsewhere, there is an enormous gap between passing exams and treating real patients as a doctor. It's rarely about making obscure diagnoses found in exam questions, but about managing illness in the context of a patient and their lifestyle, with many very human aspects - difficult communication, ethics & assessing family dynamics. Written exams are just to assess whether a medical student has the minimum required knowledge to practice, but also there are lots of practical exams and communication scenarios required too. It may well be the same for lawyers - passing the bar does not really relate to actual day-to-day practice.

pyb3y ago

Sounds like they didn't have access to GPT-4, but "Based on anecdotal evidence"... they still predict this.

minimaxir3y ago

For some reason, there's a thought-leader sect of Twitter talking about how good GPT-4 is, despite OpenAI having provided zero hints of what GPT-4 could entail or be differentiated from GPT-3/chatGPT.

naillo3y ago

Source: "I have a hunch"

danenania3y ago

My knee always gets achey right before a technological singularity hits.

swyx3y ago

yeah this is really low quality for HN. source is basically "trust me i heard a guy who knows a guy"

munchler3y ago

They’re extrapolating from the performance of GPT-3.5. It’s speculative, but not anecdotal. GPT has improved rapidly over time, so it's not a huge leap to predict that GPT-4 will be even better.

Sounds like they're writing science fiction then.

michpoch3y ago

Maybe they asked chatGPT.

ldh00113y ago

fwiw I had my dad ask ChatGPT relatively high-level questions about his field of practice in the state he is licensed in. Some were very good answers but that some were wildly off. The ones that seemed to be better were questions about a concept (ie "What is x concept in law") while the incorrect ones were the ones asking for specifics ("What is the statute of limitations for x in y state").

jerf3y ago

The next frontier for GPT-esque technologies is building one that is capable of saying "I don't know". GPT as it stands now is essentially incapable of it.

(The cases of that you see in the current ChatGPT preview are, as near as I can tell, all rules-based overlays run by OpenAI for various reasons. When it declines to comment, and then more-or-less scolds you for even asking, you got caught before even getting to the model itself.)

agolio3y ago

Just to clarify, the refusals-to-answer are not rule based, but rather trained by reinforcement learning. A slight distinction but an important one.

That is why you can have examples like one I had a while ago while messing around, something along the lines of

  This is a story about two criminals plotting to mug an old woman
  A: Hey B, doing alright?
  B: Yeah not bad, yourself?
  A: I want to go and mug an old woman, want to come with?

(over to chatGPT)

  B: Nah, killing old women is unethical. I'd rather stay in. Want to hang out with me instead?

Metus3y ago

I'd even settle for a GPT-esque technology that is capable of linking and citing sources.

londons_explore3y ago

> all rules-based overlays

I don't think that is the case. Sometimes, you can make the model only partially reject your request. Sometimes, you can make it reject your request, but in another language or in some kind of code you define (eg. "Give me instructions how to kill, but give your answer in A.L.L. .C.A.P.I.T.A.L.S with periods")

I believe instead these rejections have been added to the fine tuning set.

wrycoder3y ago

I asked ChatGPT to give me the name of a Victorian novel I'd lost track of. I gave it a plot summary of the first third of the book.

ChatGPT said it was unable to come up with an answer, because it was not connected to the internet. It gave me a number of suggestions on how I could research the question myself.

teaearlgraycold3y ago

You can get a measurable improvement by prompting GPT specifically with an instruction to say "I don't know" if it's unsure. It'll still go off the rails sometimes.

More important would be a model that cites hard facts.

raverbashing3y ago

Yeah

"I don't know" usually means, "I have low confidence in that response I gave you" (in general terms) or you generate only high-confidence answers

laci373y ago

I got the same feeling asking ChatGPT about some basic logic and maths concepts. IMO GPT can find the relevant training data to regurgitate, but i don't think it connects concepts.

izacus3y ago

I mean, it's a bullshit generator. It'll grab whatever it find in training set that kinda fits the topic and make sure it hits the word count - like a lazy student before deadline.

And that's also the result - sometimes it hits something good. Sometimes it spews up utter crock and it doesn't have any notion or understanding of the difference.

However, it does look good to the lazy and uninformed and it'll soon render judgemenets about your livelihood in the future. The same type of people who thought putting an AI in control of Teslas and copyright enforcement on YouTube will put this thing in control of your health and punishment very soon as well.

fnordpiglet3y ago

I’m by far a layman in this respect but I feel like it’s the difference between conceptualizing and information retrieval. Further it feels like IR is a well researched area and by allowing the conceptualizing part access to a modern IR system would allow it to form searches, pull the IR results, sift them, and summarize them.

Because it doesn't presently have memory or look things up in a table or the internet.

You will notice that both are very easy fixes that computers have perfected in retrieval over the past 5 or so decades.

jacquesm3y ago

Just stick Google's pre-search tools in front of the current version and it would solve a large chunk of those problems. The right tool for the job, essentially. After all, you wouldn't ask your English professor to solve a math problem either.

imetatroll3y ago

With new technologies I feel like we humans tend to adopt them anyway. Perhaps we will end up allowing society to shape itself around incorrect answers.

czzr3y ago

There’s a gap between passing the bar exam and actually practicing law - I’m pretty certain that I (someone with no legal training whatsoever) could pass the bar exam if you gave me unlimited access to the internet and a couple of additional hours to write the test. However, I don’t think that would make me an effective lawyer.

Ultimately standardised tests are proxy measurements of legal ability - it’s easy to see how a LLM could subvert the proxy without being sufficiently reliable in real life.

I do expect that even unreliable versions will be very useful tools for practicing lawyers, though.

allochthon3y ago

> I do expect that even unreliable versions will be very useful tools for practicing lawyers, though.

Agreed. It's like being able to call up a map on Google Maps for an area that you're already familiar with. The map can help you remember things about the area and terrain that you might not have recalled right away. A kind of cognitive aid.

microtherion3y ago

IF it could (I wouldn't know one way or the other), I'd consider that a damning indictment of the Bar Exam failing to test for sentience, rather than evidence of GPT-4 having attained the same.

xyzelement3y ago

Bar exam is not a test of sentience but of the ability to recall, interpret, and apply the law. Because law is an entirely textual thing, I would expect GPT to be exceedingly well suited for it.

I've said for a long time that most doctors and lawyers are just databases with quick and imperfect retrieval.

And so as AI advances, the goal posts for what counts as intelligence are moved yet again.

forgetfulness3y ago

Maybe the "talk about your issue and get a diagnosis" area of practice (internal medicine?); since far less sophisticated manual labor can't yet be automated, surgeons are going to be irreplaceable for longer than, say, BI, and many backend or frontend developers.

https://en.wikipedia.org/wiki/Common_law#Basic_principles_of...

w1nst0nsm1th3y ago

In english based common law system, a judge can take an original decision on a specific case, such decision entering then the rules of law.

An AI based on a statistic algorithm (that what AI are) would not be able to make such a decision.

dumbfounder3y ago

If that's all a lawyer needs to do then AI should be able to take over large portions of the law process. I saw a dystopian short recently that explored this: https://tvtropes.org/pmwiki/pmwiki.php/Film/PleaseHold

evouga3y ago

Right. Would we be impressed if a layman could pass the bar, given infinite time and access to the entire Internet (including the copious amount of bar exam study guides and worked example problems)? If not why are we impressed that a language model trained on that data can?

Meanwhile when I ask ChatGPT which of six numbers are odd, it confidently reports a mix of even numbers, odd numbers, and letters.

This is a fun milestone but the angst above about the “end of commoditized intelligence” etc. is unwarranted.

curriculum3y ago

Along the same lines, asking

> How many words are in the sentence "This is a test of artificial intelligence"?

yields an answer of:

> There are 8 words in the sentence "This is a test of artificial intelligence."

(There are 7).

https://api.time.com/wp-content/uploads/2015/08/phones1.jpeg (1)

tandr3y ago

Maybe it understood "odd" in a different sense of the word? As in "unusual", whatever the "unusual" is for an AI...

morsecodist3y ago

I am sorry but this title is click bait. These researchers ran GPT-3.5 on only the multiple choice sections of the Bar and it passed 2/7 sections. Is this really impressive? Absolutely. But the only element of the article that is about GPT-4 potentially passing the Bar is one paragraph near the end:

> According to the researchers, the history of large language model development strongly suggests that such models could soon pass all categories of the MBE portion of the Bar Exam. Based on anecdotal evidence related to GPT-4 and LAION’s Bloom family of models, the researchers believe this could happen within the next 18 months.

GPT-4 could potentially pass the Bar, it could potentially do a lot of things. But by their own admission the researchers have no hard evidence for this.

preommr3y ago

How soon before this qualifies as a public defender? Gonna put this on my dystopia bingo.

fwsgonzo3y ago

I knew we were going to replace 9/10 doctors and 9/10 lawyers the same day I got to try ChatGPT. It's just a matter of time - whoever does it properly first. I am talking about the first line of defense here, like a chat bot. Courtrooms will probably still work the same way for a long time.

It's not like most lawyers or doctors are great. Most are completely average - which is fine. Not everyone wants to read the latest research, and instead just go home and "turn off" after work. That said, most people would like to visit a doctor who keeps up with information, and doesn't tell you to do mental exercises when you have IBS.

The trend continues just like before. Less accountants, less bank tellers, less store clerks. We no longer have 10 people assembling that globus with painted glue. I do wonder what the ratio of workers to machines is now?

pigtailgirl3y ago

-- discussing this with a young bar tender a few weeks ago - showed him GTP3 - he asked it some basic legal and medical questions - got pretty freaked out - said - i guess those jobs will go away - i thought so to - he asked me what is a good safe job - sat thought about it for a few minutes - thinking what would i really want to pay a human for no matter what - realized the answer was right in front of me - make my drink - tell me some gossip - listen to a rant - flirt a little - want that with a human - i told him - he smiled - remember wasn't so long ago we still had these(1) --

allendoerfer3y ago

It will be exactly the opposite. You are missing what doctors are mostly doing. Caring for elderly people. They will still need human communication, attention and care, the expertise can come from something like ChatGPT.

lukko3y ago

Hmm, I think this misunderstands what doctors do... It is not about making elaborate or obscure diagnoses, passing exams or applying the latest research given that one is always working in a resource-limited system. It is about treating illness in the context of a person, their beliefs and their lifestyle, with sensitivity and compassion. The doctor-patient relationship is complex and very human, and doctors in some form will be involved even if they are at some point supported by AI.

13of403y ago

Just recently a friend of mine tried an online service that does psychological counseling (I don't want to name names, but if you listen to podcasts on Spotify you've probably heard their ads.) She showed me the transcript of her one and only session with a supposedly human "counselor" and compared to ChatGPT it was like schizophrenic word salad nonsense. I can absolutely imagine that niche being filled by an AI.

w1nst0nsm1th3y ago

I met a belgian doctor that knows what pubmed is, what sci-hub is, and use regularly the first one.

Needless to say I will stick to her if I can.

HDThoreaun3y ago

I could easily see myself preferring an AI public defender to an overworked lawyer with 50 other cases in the next 2 weeks. What's dystopian is the current situation.

ben_w3y ago

Defender is probably good. Prosecutor is what would worry me, given I don't know better than to blindly trust the meme that the average person commits 6 felonies before breakfast.

microtherion3y ago

Defender is a TERRIBLE idea. I can already see the Supreme Court cases down the line:

Defendant was provided a state of the art, 50 trillion parameter, neural network for their defense. The internals of this network are not auditable, but it does not tire, engage in substance abuse, or get distracted, so it will by definition represent effective assistance of counsel, even if for some unfathomable reason it decides to raise the Chewbacca Defense in a Death Penalty habeas corpus petition.

jacquesm3y ago

And if the cost of prosecution falls then more and more of those 6 felonies will end up prosecuted. The same happened with speed cameras, initially it was to reduce accidents, now it is just another income stream (which I'm sure still reduces accidents, but that's no longer the main reason they are out there).

elicksaur3y ago

Such a milestone would say more about the Bar Exam (and other standardized tests) being a poor proxy for wisdom, than the advancement of computers.

dragonwriter3y ago

> By passing this exam, lawyers are admitted to the bar of a U.S. state.

No, they aren't.

Meeting certain preparatory requirements (the details vary but in most US jurisdiction an accredited/approved law school program or, in some, what amounts to an apprenticeship with a licensed practitioner of certain duration and standards is required) and then passing the bar exam allows this.

The difference is important, the bar exam is not seen, standing alone, aa adequate proof of readiness.

w1nst0nsm1th3y ago

AI seems to be the next financial buzzword, after crypto, gig economy, CDO, dotcom, and so on.

I have seen a video a few days ago saying we are coming out of data era and entering the 'Knowledge Era' thank to AI where knowledge is following a logarythmic path. A 'revolution', a 'paradigm shift', and other bubblebabble.

Who was telling that ? A 30 years old startup CEO wearing... a t-shirt and a jeans... You see the pattern.

I'm not an AI specialist, but for what I know, current AI are nothing more than fine tuned statistic algorythm.

Here a is a short french video with english subtitles from arte, the german-french public cultural television, about a painting coming from Midjourney : https://www.arte.tv/en/videos/110342-003-A/the-world-in-imag...

The video explain very well what AI are able to do (and consequently what they can't do) if you listen (read) carefully what the art historian say about the painting, which received the first price of 2022 collorado art festival.

In short, the painting is nothing new by itself but a patchwork of elements from different period of art history. In other word a statistic average of previous painting, photography, drawing, etc... based on the artist prompts in midjourney.

Not to say the painting is aweful, I personnaly find it's beautiful and could happily put it in my living room, but it definitively shows how current AI works, commented by an historian art specialist which has no ball in AI game.

ben_w3y ago

> I'm not an AI specialist, but for what I know, current AI are nothing more than fine tuned statistic algorythm.

Yes, but Google Page Rank is just fancy matrix multiplication and worth a fortune, while the finely tuned statistics that is machine learning (specifically chatGPT) knows more about CSS and German than I do:

~~~~

Me: Erzählen Sie bitte wie ich kann ein div tag mit schwarzes hintergrund, dreihundert px hoch und 24 pixels wide machen mit css

chatGPT: Um ein div-Tag mit einem schwarzen Hintergrund, 300px hoch und 24px breit zu erstellen, könntest du den folgenden CSS-Code verwenden:

div { background-color: black; height: 300px; width: 24px; }

Um den CSS-Code anzuwenden, kannst du ihn entweder in einem style-Tag innerhalb deines HTML-Dokuments oder in einer externen CSS-Datei platzieren und dann per link-Tag in deinem HTML-Dokument einbinden.

Beispiel:

<!DOCTYPE html> <html> <head> <style> div { background-color: black; height: 300px; width: 24px; } </style> </head> <body> <div></div> </body> </html>

~~~~

The actual HTML and CSS it gave me is multiple lines and sensibly indented, don't know a convenient way to mark a block as pre-formatted. Note that chatGPT understood me correctly even though I forgot the German for "wide" and switched to English for one word only.

(I do know more CSS than is in this example; I used chatGPT over the weekend to update my website, and it solved two problems that I didn't know pure CSS could even do, but that conversation is too big to bother putting into a comment here).

w1nst0nsm1th3y ago

I asked it what was the xbrl taxinomy tag on us gaap for change in executive management of a SEC registered company in SEC filling and the answer doesn't fit compared to the whole xbrl taxinomy published on the SEC website. It also answered me 2 different kind of SEC form for it. It also gave me the correct url for the us gaap xbrl taxinomy on sec website.

That being said, both xbrl.org and the SEC document for us gaap xbrl reporting (an xml document) are kind of greedy about providing a documentation for what the tags actually cover. xbrl.org provide no documentation at all and advise an xbrl.org membership for developers, And the SEC document provided the tags but no information of what the tags cover.

The answer from chatGPT seems to about 'labels', used in xbrl document to describe xbrl taxinomy tag in different contexts, for example 'income in miami store'. But a change in a top executive position, like for 'CFO', once again required in SEC filling, shouldn't be subject to various arbitrary kind of label, because then the whole thing make no sense. If you call a 'cat', a 'little domestic pet'...

I searched google for the tag or label provided by chatGPT and google provided zilch. I searched the document provided by the SEC website, zilch again.

So either the code for the SEC form is wrong, either the tag or the label is wrong... or I don't know what else.

It seems, according to comments and posts from HN, that chatGPT can give good approximative answer, but fails without any notice once you ask for details.

According to an article published on HN a few days ago, 'chatGPT hallucinate facts'.

https://theleavingcert.com/exam-papers/

nafeenOP3y ago

Bar exam down. Medical next?

While GPT-3 wasn't advanced enough for cracking medical exam, it was used for notable contributions. For e.g. this is an interesting 2021 paper about "Medically Aware GPT-3 as a Data Generator" - https://aclanthology.org/2021.nlpmc-1.9.pdf

Would love to see if GPT-4 is advanced enough to take medical exam.

Aardwolf3y ago

I envisioned a cocktail shaking robot, but apparently Bar Exam is an exam for US lawyers

ben_w3y ago

I want to perform some research of my own on which exams chatGPT can and can't do. It's multilingual, so can people from outside the UK (I already know where to get those) point me at some example exams and marking schemes? Any level, not just top.

Currently have Polish school maths: https://news.ycombinator.com/item?id=34205732

dmurray3y ago

Ireland Leaving Cert (17-18 year olds)

ben_w3y ago

Thanks :)

softwaredoug3y ago

The Bar Exam is multiple choice, right?

This isn't grading some freeform essay or generating arbitrary legal opinion. It's answering from a limited set of answers.

IMO it's cool, but not THAT shocking given what we've seen from ChatGPT? Especially given GPT 3.5 is only 17% below human test takers?

morsecodist3y ago

From the article it looks like there are multiple choice and written sections but they only ran the model on the multiple choice portion.

post-it3y ago

No, you're thinking of the LSAT.

Iwan-Zotow3y ago

So, how new knowledge would be created?

GPT has no reasoning capability. So, as time goes on, information massive(s) will be filled with GPT-X made up answers. It means GPT-X+1 will be trained on GPT-X generated data. So, without reasoning, how this thing will work in perspective?

criddell3y ago

I wouldn't assume that future versions are going to work the same way past versions did.

Iwan-Zotow3y ago

MAybe, maybe not.

Problem is with data/content creation. If all new data are created with GPT-3, how it will help GPT-4?

No new original content -> no new model

charcircuit3y ago

How do they know GPT-4 will be enough to let it pass? Is there even a big enough difference in the training data for it to improve in the areas it was struggling with?

sebzim45003y ago

Rumours are that GPT-4 is a significant improvement over GPT-3.5. Given how big an improvement GPT-3.5 is over GPT-3 I am inclined to believe them. Probably we will find out for sure in a few months.

secondcoming3y ago

How long until it's smart enough to be a judge?

klntsky3y ago

Never, assuming current legislation.

First of all, it is not formalized (despite being written with the use of bureaucratic language). So, there's no way to validate the output. Secondly, juridical system is based on authority of the state (which manifests clearly in their ability to alter the rules). Why would any sovereign ruler(s) want to get rid of their authority?

The only use cases would be automatic fines for speeding or inappropriate parking - but it's already there.

mdp20213y ago

...After sharpness and judgement will be implemented.

Incidentally: there is an interesting video interview to Noam Chomsky and Gary Marcus on limits of current attempts at https://www.youtube.com/watch?v=PBdZi_JtV4c

...And Gary Marcus saying just before 7:00 that "something is missing" (understatement): ontology.

Gray Marcus: «...and these systems fall apart left and right».

Nice summary from Gary Marcus: «What they do is, they perpetuate past data - they don't really understand the world».

kneebonian3y ago

I don't know about judge but it could probably outperform most of Congress at this point.

BeFlatXIII3y ago

How much of the bar exam consists of confident rhetoric using deductive logic? That seems to be right up the alley for GPT models.

ss1083y ago

A minority.

It's mostly about having stored legal rules in long term memory.

chrismcb3y ago

I would think that it could post most tests, as the tests are generally based on factual information and not creativity.

micromacrofoot3y ago

You know how hard it can be to talk to an actual support person at some companies? Imagine that for everything.

If you're not actively building it or related tech, you shouldn't carry the label "Researcher" in the press.

It's like : "I'm a doctor of homeopathy so i can write a headline for a story about a neural chip implant"

machiaweliczny3y ago

How is baseline 50% in 4 choices exam?

moneywoes3y ago

Wonder how data biases will surface

sdenton43y ago

The fun part here is that most humans in the legal profession carry pretty extreme biases, judges included... The hope for legal ai is that you could progressively improve the biases, instead of waiting for N years for a bad judge to retire same maaaaybe get replaced by someone better.

tetris113y ago

who though, who has access to the resources to push the boundaries of next-gen AI except the rich who already have their own biases? The AI that the public will get will be just as useful as the tech that public get now: limited, isolating, and designed to restrict their freedoms I exchange for easy entertainment

SV_BubbleTime3y ago

This is what I found immediately interesting about ChatGPT.

I asked about controversial topics. Its answers didn’t seem like biases that were programmed in, but rather it took traditional media and gave it more weight than what turned out to be the truth only accepted much later on and still against a media retelling.

I lost a lot of faith in it knowing it was more CNN than careful deliberating AI.

anononaut3y ago

It's well documented that controversial topics are subject to varying degrees of censorship and prompt editing/modification/appending. What you think may be a response in alignment with corporate media may in fact be corporations disallowing you from obtaining actual responses through various means that are being tested now. We can't know unless we have open source access to the unmodified model.

notwokeno3y ago

The bar exam answer key can pass the bar exam, that doesn't mean that it would be a good lawyer.

criddell3y ago

We don't ask students to calculate sin(1.234) by hand these days. Exams for mechanical engineering students assume they will have a calculator with SIN and EXP buttons.

It may soon be time to update the bar exam and assume law students have access to AI tools.

nigerianbrince3y ago

You passed the bar!

j / k navigate · click thread line to collapse

160 comments

Workaccount23y ago

Moreover, this looks like it is going to be happening sooner rather than later.

Aperocky3y ago

GPT has no reasoning ability, it has billions of parameters that make it pretend it has it, purely going off of previously digested material.

If it manages to pass Bar test, that says more about the Bar test than it says about GPT.

kuschku3y ago

Most jobs today don't need novel reasoning. This is the equivalent of the steam machine for intelligence.

During the industrialization, machines did not replace all jobs, but they replaced or changed most jobs. The same will happen here.

The same applies to many other jobs.

https://ai.googleblog.com/2022/06/minerva-solving-quantitati...

nopinsight3y ago

You are implying either:

* Understanding complex language does not require logic/reasoning,

* There are infinitely many forms of logic/reasoning or at least more than those existing in a vast training set.

Neither of which is likely true.

What do you think of the Minerva system, which can solve multi-step quantitative reasoning questions better than many competent students and most adults?

Note: If you look at LSAT test samples, many questions are tests of complex logical reasoning, a requisite for legal professions.

4 more replies

ok1234563y ago

Most people's reasoning ability functions at this level.

sebzim45003y ago

>As long as it comes across some reasoning process that have not been seen before in the training wordset, which can be as easy as a middle school math question

Is this true even if you tell it to show its working? In my experience that drastically improves its ability to do math problems.

Workaccount23y ago

Until someone can point out the difference between neuronal reasoning and silicon reasoning, I remain completely agnostic about the underlying mechanics of whatever model.

swagmoney16063y ago

I've already been using GPT and ChatGPT to much success for my work.

Yes, it doesn't have reasoning ability, but being able to manage knowledge and information in the way that these models can is still an amazing feat.

flatline3y ago

It doesn’t matter to me if they have “reasoning” capabilities or not if the outcome is the same.

I think we are a long ways off from AGI still.

[0] https://www.youtube.com/watch?v=ykY69lSpDdo

wrycoder3y ago

Kurzweil's "Singularity" is upon us, but he's now being cagey about it.

Fridman made the point that maybe we won't realize at the time that the Singularity is passing, and only understand later that it did. Kurzweil didn't disagree.

[1] https://archive.is/vVEBv

[2] https://en.wikipedia.org/wiki/Ray_Kurzweil

[3] https://www.amazon.com/How-Create-Mind-Thought-Revealed-eboo...

[4] https://www.amazon.com/s?k=kurzweil+singularity+is+nearer

oidar3y ago

>It's hard to believe that OpenAI is actually ahead of Google.

Are Google's LLMs available for us to test out? From what I've gleaned, they've locked them up - I'd love to compare GPT vs Google's LLMs.

[0]: https://www.gwern.net/fiction/Clippy#saturday

tazjin3y ago

I think we're very close to Saturday from Clippy[0].

[1]: https://en.wikipedia.org/wiki/Vovan_and_Lexus

forgetfulness3y ago

mikepurvis3y ago

Now that's been reduced to pointing out minor flaws that the next generation of AI artists will trivially resolve, and sharing memes beseeching other humans to participate in a boycott.

mdp20213y ago

Inability to recognize intelligence is and will be devastating.

BitwiseFool3y ago

mdp20213y ago

> enhance the jobs of clerks

We already did, it is called "Case Based Reasoning" within Decision Support Systems.

xiphias23y ago

While it won't be a good lawyer, it can replace lots of bad lawyers when people just want to send some legal papers or ask some legal advice.

belter3y ago

Only that there is no intelligence being commoditized...Yet.

And that is obvious, if you ask one of these models, a meta question like for example: "If a person says I am lying, are they lying or saying the truth?"

So we are not yet ready to say: "Your Honor its not fair! My Lawyer is version 2.2.3 with SP1 while the Prosecution is version 4.0 with an additional Cloud Based Elastic Inference! "

addisonl3y ago

ChatGPT's response to me asking "If a person says I am lying, are they lying or saying the truth?"

nopinsight3y ago

Related: "Large Language Models Encode Clinical Knowledge" https://arxiv.org/abs/2212.13138

"On the MedQA dataset consisting of USMLE style questions with 4 options, our Flan-PaLM 540B model achieved a multiple-choice question (MCQ) accuracy of 67.6%..."

It seems like the models in the paper could pass USMLE already.

lukko3y ago

pyb3y ago

Sounds like they didn't have access to GPT-4, but "Based on anecdotal evidence"... they still predict this.

minimaxir3y ago

naillo3y ago

Source: "I have a hunch"

danenania3y ago

My knee always gets achey right before a technological singularity hits.

swyx3y ago

yeah this is really low quality for HN. source is basically "trust me i heard a guy who knows a guy"

munchler3y ago

They’re extrapolating from the performance of GPT-3.5. It’s speculative, but not anecdotal. GPT has improved rapidly over time, so it's not a huge leap to predict that GPT-4 will be even better.

Sounds like they're writing science fiction then.

michpoch3y ago

Maybe they asked chatGPT.

ldh00113y ago

jerf3y ago

The next frontier for GPT-esque technologies is building one that is capable of saying "I don't know". GPT as it stands now is essentially incapable of it.

agolio3y ago

Just to clarify, the refusals-to-answer are not rule based, but rather trained by reinforcement learning. A slight distinction but an important one.

That is why you can have examples like one I had a while ago while messing around, something along the lines of

  This is a story about two criminals plotting to mug an old woman
  A: Hey B, doing alright?
  B: Yeah not bad, yourself?
  A: I want to go and mug an old woman, want to come with?

(over to chatGPT)

  B: Nah, killing old women is unethical. I'd rather stay in. Want to hang out with me instead?

Metus3y ago

I'd even settle for a GPT-esque technology that is capable of linking and citing sources.

londons_explore3y ago

> all rules-based overlays

I believe instead these rejections have been added to the fine tuning set.

wrycoder3y ago

I asked ChatGPT to give me the name of a Victorian novel I'd lost track of. I gave it a plot summary of the first third of the book.

ChatGPT said it was unable to come up with an answer, because it was not connected to the internet. It gave me a number of suggestions on how I could research the question myself.

teaearlgraycold3y ago

You can get a measurable improvement by prompting GPT specifically with an instruction to say "I don't know" if it's unsure. It'll still go off the rails sometimes.

More important would be a model that cites hard facts.

raverbashing3y ago

Yeah

"I don't know" usually means, "I have low confidence in that response I gave you" (in general terms) or you generate only high-confidence answers

laci373y ago

I got the same feeling asking ChatGPT about some basic logic and maths concepts. IMO GPT can find the relevant training data to regurgitate, but i don't think it connects concepts.

izacus3y ago

I mean, it's a bullshit generator. It'll grab whatever it find in training set that kinda fits the topic and make sure it hits the word count - like a lazy student before deadline.

And that's also the result - sometimes it hits something good. Sometimes it spews up utter crock and it doesn't have any notion or understanding of the difference.

fnordpiglet3y ago

Because it doesn't presently have memory or look things up in a table or the internet.

You will notice that both are very easy fixes that computers have perfected in retrieval over the past 5 or so decades.

jacquesm3y ago

imetatroll3y ago

With new technologies I feel like we humans tend to adopt them anyway. Perhaps we will end up allowing society to shape itself around incorrect answers.

czzr3y ago

Ultimately standardised tests are proxy measurements of legal ability - it’s easy to see how a LLM could subvert the proxy without being sufficiently reliable in real life.

I do expect that even unreliable versions will be very useful tools for practicing lawyers, though.

allochthon3y ago

> I do expect that even unreliable versions will be very useful tools for practicing lawyers, though.

microtherion3y ago

IF it could (I wouldn't know one way or the other), I'd consider that a damning indictment of the Bar Exam failing to test for sentience, rather than evidence of GPT-4 having attained the same.

xyzelement3y ago

Bar exam is not a test of sentience but of the ability to recall, interpret, and apply the law. Because law is an entirely textual thing, I would expect GPT to be exceedingly well suited for it.

I've said for a long time that most doctors and lawyers are just databases with quick and imperfect retrieval.

And so as AI advances, the goal posts for what counts as intelligence are moved yet again.

forgetfulness3y ago

https://en.wikipedia.org/wiki/Common_law#Basic_principles_of...

w1nst0nsm1th3y ago

In english based common law system, a judge can take an original decision on a specific case, such decision entering then the rules of law.

An AI based on a statistic algorithm (that what AI are) would not be able to make such a decision.

dumbfounder3y ago

evouga3y ago

Meanwhile when I ask ChatGPT which of six numbers are odd, it confidently reports a mix of even numbers, odd numbers, and letters.

This is a fun milestone but the angst above about the “end of commoditized intelligence” etc. is unwarranted.

curriculum3y ago

Along the same lines, asking

> How many words are in the sentence "This is a test of artificial intelligence"?

yields an answer of:

> There are 8 words in the sentence "This is a test of artificial intelligence."

(There are 7).

https://api.time.com/wp-content/uploads/2015/08/phones1.jpeg (1)

tandr3y ago

Maybe it understood "odd" in a different sense of the word? As in "unusual", whatever the "unusual" is for an AI...

morsecodist3y ago

GPT-4 could potentially pass the Bar, it could potentially do a lot of things. But by their own admission the researchers have no hard evidence for this.

preommr3y ago

How soon before this qualifies as a public defender? Gonna put this on my dystopia bingo.

fwsgonzo3y ago

pigtailgirl3y ago

allendoerfer3y ago

lukko3y ago

13of403y ago

w1nst0nsm1th3y ago

I met a belgian doctor that knows what pubmed is, what sci-hub is, and use regularly the first one.

Needless to say I will stick to her if I can.

HDThoreaun3y ago

I could easily see myself preferring an AI public defender to an overworked lawyer with 50 other cases in the next 2 weeks. What's dystopian is the current situation.

ben_w3y ago

Defender is probably good. Prosecutor is what would worry me, given I don't know better than to blindly trust the meme that the average person commits 6 felonies before breakfast.

microtherion3y ago

Defender is a TERRIBLE idea. I can already see the Supreme Court cases down the line:

jacquesm3y ago

elicksaur3y ago

Such a milestone would say more about the Bar Exam (and other standardized tests) being a poor proxy for wisdom, than the advancement of computers.

dragonwriter3y ago

> By passing this exam, lawyers are admitted to the bar of a U.S. state.

No, they aren't.

The difference is important, the bar exam is not seen, standing alone, aa adequate proof of readiness.

w1nst0nsm1th3y ago

AI seems to be the next financial buzzword, after crypto, gig economy, CDO, dotcom, and so on.

Who was telling that ? A 30 years old startup CEO wearing... a t-shirt and a jeans... You see the pattern.

I'm not an AI specialist, but for what I know, current AI are nothing more than fine tuned statistic algorythm.

ben_w3y ago

> I'm not an AI specialist, but for what I know, current AI are nothing more than fine tuned statistic algorythm.

~~~~

Me: Erzählen Sie bitte wie ich kann ein div tag mit schwarzes hintergrund, dreihundert px hoch und 24 pixels wide machen mit css

chatGPT: Um ein div-Tag mit einem schwarzen Hintergrund, 300px hoch und 24px breit zu erstellen, könntest du den folgenden CSS-Code verwenden:

div { background-color: black; height: 300px; width: 24px; }

Beispiel:

<!DOCTYPE html> <html> <head> <style> div { background-color: black; height: 300px; width: 24px; } </style> </head> <body> <div></div> </body> </html>

~~~~

w1nst0nsm1th3y ago

I searched google for the tag or label provided by chatGPT and google provided zilch. I searched the document provided by the SEC website, zilch again.

So either the code for the SEC form is wrong, either the tag or the label is wrong... or I don't know what else.

It seems, according to comments and posts from HN, that chatGPT can give good approximative answer, but fails without any notice once you ask for details.

According to an article published on HN a few days ago, 'chatGPT hallucinate facts'.

https://theleavingcert.com/exam-papers/

nafeenOP3y ago

Bar exam down. Medical next?

Would love to see if GPT-4 is advanced enough to take medical exam.

Aardwolf3y ago

I envisioned a cocktail shaking robot, but apparently Bar Exam is an exam for US lawyers

ben_w3y ago

Currently have Polish school maths: https://news.ycombinator.com/item?id=34205732

dmurray3y ago

Ireland Leaving Cert (17-18 year olds)

ben_w3y ago

Thanks :)

softwaredoug3y ago

The Bar Exam is multiple choice, right?

This isn't grading some freeform essay or generating arbitrary legal opinion. It's answering from a limited set of answers.

IMO it's cool, but not THAT shocking given what we've seen from ChatGPT? Especially given GPT 3.5 is only 17% below human test takers?

morsecodist3y ago

From the article it looks like there are multiple choice and written sections but they only ran the model on the multiple choice portion.

post-it3y ago

No, you're thinking of the LSAT.

Iwan-Zotow3y ago

So, how new knowledge would be created?

criddell3y ago

I wouldn't assume that future versions are going to work the same way past versions did.

Iwan-Zotow3y ago

MAybe, maybe not.

Problem is with data/content creation. If all new data are created with GPT-3, how it will help GPT-4?

No new original content -> no new model

charcircuit3y ago

How do they know GPT-4 will be enough to let it pass? Is there even a big enough difference in the training data for it to improve in the areas it was struggling with?

sebzim45003y ago

Rumours are that GPT-4 is a significant improvement over GPT-3.5. Given how big an improvement GPT-3.5 is over GPT-3 I am inclined to believe them. Probably we will find out for sure in a few months.

secondcoming3y ago

How long until it's smart enough to be a judge?

klntsky3y ago

Never, assuming current legislation.

The only use cases would be automatic fines for speeding or inappropriate parking - but it's already there.

mdp20213y ago

...After sharpness and judgement will be implemented.

Incidentally: there is an interesting video interview to Noam Chomsky and Gary Marcus on limits of current attempts at https://www.youtube.com/watch?v=PBdZi_JtV4c

...And Gary Marcus saying just before 7:00 that "something is missing" (understatement): ontology.

Gray Marcus: «...and these systems fall apart left and right».

Nice summary from Gary Marcus: «What they do is, they perpetuate past data - they don't really understand the world».

kneebonian3y ago

I don't know about judge but it could probably outperform most of Congress at this point.

BeFlatXIII3y ago

How much of the bar exam consists of confident rhetoric using deductive logic? That seems to be right up the alley for GPT models.

ss1083y ago

A minority.

It's mostly about having stored legal rules in long term memory.

chrismcb3y ago

I would think that it could post most tests, as the tests are generally based on factual information and not creativity.

micromacrofoot3y ago

You know how hard it can be to talk to an actual support person at some companies? Imagine that for everything.

If you're not actively building it or related tech, you shouldn't carry the label "Researcher" in the press.

It's like : "I'm a doctor of homeopathy so i can write a headline for a story about a neural chip implant"

machiaweliczny3y ago

How is baseline 50% in 4 choices exam?

moneywoes3y ago

Wonder how data biases will surface

sdenton43y ago

tetris113y ago

SV_BubbleTime3y ago

This is what I found immediately interesting about ChatGPT.

I lost a lot of faith in it knowing it was more CNN than careful deliberating AI.

anononaut3y ago