
j_w 2mo ago

It's not that the oral format should be dismissed; it's that having your exam consist of speaking to a machine that judges the merit of your time in a course is dystopian. Talking to another human is fine.


makeitdouble 2mo ago

How different is it, in essence, from checking boxes to be scanned by a machine and auto-evaluated to get a one-dimensional numerical score?

Have exams ever been about humanity and the optics of it?

sarchertech 2mo ago

Very different. A scantron machine is deterministic and non-chaotic.

In addition to being non-deterministic, LLMs can produce vastly different output from very slightly different input.

That’s ignoring how vulnerable LLMs are to prompt injection, and if this becomes common enough that exams aren’t thoroughly vetted by humans, I expect prompt attacks to become common.

Also, if this is about avoiding in-person exams, what prevents students from just letting their AI talk to the test AI?

makeitdouble 2mo ago

I saw this piece as the start of an experiment, and the use of a "council of AI", as they put it, to average out the variability sounds like a decent path to standardization to me. (Prompt injection would not be impossible, but getting something past all the steps sounds like a pretty tough challenge.)

They mention getting 100% agreement between the LLMs on some questions and lower rates on others, so if an exam was composed of only questions where there is near 100% convergence, we'd be pretty close to a stable state.
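To make the filtering idea concrete, here is a rough sketch of keeping only the questions where the graders converge. Everything in it is an assumption for illustration: the pass/fail labels, the sample data, and the function names `agreement_rate` and `stable_questions` are hypothetical and not taken from the piece.

```python
from collections import Counter


def agreement_rate(grades):
    """Fraction of graders that gave the most common grade."""
    if not grades:
        raise ValueError("need at least one grade")
    _, count = Counter(grades).most_common(1)[0]
    return count / len(grades)


def stable_questions(grades_by_question, threshold=1.0):
    """Return the question ids whose agreement rate meets the threshold."""
    return [
        qid
        for qid, grades in grades_by_question.items()
        if agreement_rate(grades) >= threshold
    ]


# Hypothetical data: three graders scoring each answer pass/fail.
sample = {
    "q1": ["pass", "pass", "pass"],
    "q2": ["pass", "fail", "pass"],
    "q3": ["fail", "fail", "fail"],
}

print(stable_questions(sample))  # ['q1', 'q3']
```

Lowering `threshold` trades stability for coverage: at 0.6, `q2` (two of three graders agreeing) would be kept as well.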

I agree it would be reassuring to have a human somewhere in the loop, or perhaps allow the students to appeal the evaluation (at cost?) if there is evidence of a disconnect between the exam and the other criteria. But depending on how the questions and format are tweaked, we could IMHO end up with something reliable for very basic assessments.

PS:

> Also, if this is about avoiding in-person exams, what prevents students from just letting their AI talk to the test AI?

Nothing indeed. The arms race hasn't started here, and will keep going IMO.

