When is this ever a problem that can't be solved by positioning yourself with a wall behind you, or by going somewhere private? This feels like overkill for the stated use case. I can imagine someone thinking they need this to do private stuff in a public space (a coffee shop?), but they'd just end up paranoid from everyone who passes by and glances around.
Also, is this a realistic threat model anywhere? People snooping by standing behind you tend to be colleagues or totally random passers-by, not people actually interested in gleaning private information. Anything more serious than logging into your Facebook account calls for proper OpSec procedures anyway (like: "only do this in private").
All I can think of is employee monitoring, where such tools will just end up making people insecure in their workplace. Also less productive, because gazing out of a window or into nothingness actually helps when you're doing work that requires pondering; and less healthy, because looking away from your screen into the distance is recommended for anyone with working eyes.
They created this CNN for exactly this task, autism diagnosis in children. I suppose this model would work for babies too.
Edit: ah, I see your point. In the paper they diagnose autism via eye contact, but what you describe is a task closer to what my model does. It could definitely be adapted for such a task; we'd just need to improve the accuracy. The only issue I see is that sourcing training data might be tricky, unless I partner with some institution researching this. If you know of anyone in this field I'd be happy to speak with them.
Put a tablet in front of a baby. Left half has images of gears and stuff, right half has images of people and faces. Does the baby look at the left or right half of the screen? This is actually pretty indicative of autism and easy to put into a foolproof app.
The linked GitHub project records a video of an older child's face while they look at a person wearing a camera or something, and judges whether or not they make proper eye contact. This is thematically similar but actually really different: it requires an older kid, both for the model and the method, and is hard to actually use. Not that useful.
Intervening when still a baby is absolutely critical.
P.S., deciding which half of a tablet a baby is looking at is MUCH, MUCH easier than gaze tracking. Make the tablet screen bright white around the edges and turn the brightness up. Use off-the-shelf iris-tracking software to locate the reflection of the iPad in the baby's iris: is it on the right half or the left half of the iris? Adjust a bit for their position in the FOV and their face pose, and bam, that's very accurate. Full, robust gaze tracking is a million times harder, believe me.
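A rough sketch of that reflection trick (everything here is assumed, not a tested implementation: in practice the iris crop and centre would come from off-the-shelf iris tracking, and the threshold is a guess):

```python
def reflection_half(eye_crop, iris_center_x):
    """Which half of the iris contains the bright screen reflection?

    eye_crop: 2-D list of grayscale values cropped to one iris (as an
    iris tracker would hand you); iris_center_x: x of the iris centre
    in that crop. Mapping 'half of iris' to 'half of screen' still
    needs a one-time sign check, since the camera image is mirrored.
    """
    peak = max(max(row) for row in eye_crop)
    # Treat near-peak pixels as the reflection of the brightened screen.
    xs = [x for row in eye_crop
          for x, v in enumerate(row) if v >= peak - 10]
    reflection_x = sum(xs) / len(xs)
    return "right" if reflection_x > iris_center_x else "left"
```

The face-pose and FOV adjustments mentioned above would go on top of this; the point is just that a single bright-blob centroid is a far easier signal than a full gaze vector.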
I'm honestly skeptical this will work at all; the FOV of most webcams is so small that it can barely capture the shoulder of someone sitting beside me, let alone their eyes.
Then what you're basically looking for is calibration from the eye position/angle to the screen rectangle. You want to shoot a ray from each eye and see if it intersects the laptop's screen.
This is challenging because most webcams are pretty low resolution, so each eyeball will probably be like ~20px. From these 20px, you need to estimate the eyeball->screen ray. And of course this varies with the screen size.
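The ray test itself is simple geometry once you have an eye position and gaze direction; the hard part is estimating that direction from ~20 px. A minimal sketch, in a camera-centred frame I'm assuming here (webcam at the origin, screen in the z = 0 plane hanging below it, units in metres; a real setup would need the webcam-to-screen offset measured):

```python
def ray_hits_screen(eye_pos, gaze_dir, screen_w, screen_h):
    """Does a ray from the eye along the gaze direction cross the screen?

    Assumed frame: webcam at the origin on the top bezel, screen in the
    z = 0 plane spanning x in [-screen_w/2, screen_w/2] and y in
    [0, screen_h] below the camera.
    """
    ex, ey, ez = eye_pos
    dx, dy, dz = gaze_dir
    if dz == 0:          # gaze parallel to the screen plane: no hit
        return False
    t = -ez / dz         # ray parameter where the ray reaches z = 0
    if t <= 0:           # screen plane is behind the viewer
        return False
    x, y = ex + t * dx, ey + t * dy
    return -screen_w / 2 <= x <= screen_w / 2 and 0 <= y <= screen_h
```

With noisy eyeball estimates you'd fire this per frame and smooth the hit/no-hit signal over time rather than trust any single frame.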
TLDR: Decent idea, but should've done some napkin math and/or a quick bounds check first. Maybe a $5 privacy protector is better.
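For the napkin math, with assumed but typical numbers (78° horizontal FOV, 1080p sensor, snooper about 1.5 m behind you), a bystander's iris covers only a handful of pixels:

```python
import math

fov_deg = 78        # horizontal FOV of a common laptop webcam (assumed)
width_px = 1920     # 1080p sensor width
dist_m = 1.5        # shoulder surfer ~1.5 m behind the screen (assumed)
iris_m = 0.012      # human iris diameter, roughly 12 mm

# Width of the scene the webcam sees at that distance.
scene_width_m = 2 * dist_m * math.tan(math.radians(fov_deg / 2))
# Pixels the iris projects onto.
px = iris_m / scene_width_m * width_px
print(round(px, 1))  # single-digit pixels per iris at this range
```

So even the ~20 px figure is optimistic once the person is standing at shoulder-surfing distance rather than sitting at the keyboard.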
Here's an idea:
Maybe start by seeing if you can train a primary-user gaze tracker first, and how accurate you can get it with modeling plus calibration. Once you've solved that problem, you can use it as your upper bound on expected performance, and then transform the problem to detecting the gaze of people nearby instead of the primary user.
Perhaps I have been jaded by the Mac webcam. I agree it won't be great on most old webcams, but I have had success on newer ones.
I did try a calibration approach, but it's simply too fragile for in-the-wild deployment. Calibration works great if you only care about one user, but once you start looking at other people it doesn't work so well.
Good idea, it may be more fruitful to do that. At least then we can be much more certain about the primary user.
A privacy protector solves a different problem: it prevents people from extracting information from the screen, rather than merely informing you of a possible infraction.
That being said, it's useful in the sense that if I saw anything like that in a contract, it wouldn't just be a red flag. It'd be a red flashing GT*O alarm ;)
Privacy screens are still useful, and I recommend people use both EyesOff and a screen protector: a privacy screen won't stop someone shoulder surfing from directly behind you, etc.
There are also better ways to do this sort of task when all you care about is tracking the main user: https://arxiv.org/abs/2504.06237, https://pmc.ncbi.nlm.nih.gov/articles/PMC11019238/
Interesting problem anyway. I'm surprised the accuracy is so low.
Any tips on improving accuracy? A lot of it might be due to a lack of diverse images plus labelling errors, since I did all the labelling manually.
I remember a guy watching a video, then looking up, and it paused, etc.