Show HN: YoHa – A practical hand tracking engine (opens in new tab)

(handtracking.io)

292 pointsb-3-n4y ago64 comments

64 comments

The demo really sells it here [1]. It's amazingly intuitive and easy to use, it should be a part of video-conferencing software.

[1] https://handtracking.io/draw_demo/

b-3-nOP4y ago

Thank you for the feedback. Such an integration would be nice indeed.

cloudking4y ago

You could integrate it with all video-conferencing software building it as a virtual camera plug-in that works with OBS

Graffur4y ago

It's like an initial beta of the software - it's not production ready. I can't imagine this adding value to a meeting _yet_. Seems promising though.

tomcooks4y ago

This is a GREAT website, I can understand what it does with zero clicks, zero scrolls.

Really great, congratulations, I hope that I can find a way to apply this lesson to my SaaS.

SV_BubbleTime4y ago

Agreed, but also nature of the beast. It's really easy to explain hand tracking software in a single media element. It's a lot harder to explain some crypto AEAD encapsulation format the same way.

I assume YoHa means Your Hands... I don't think I could have resisted OhHi for hand tracking.

smoyer4y ago

I've been working on a couple of chording keyboard designs and was thinking I might be able to create a virtual keyboard using this library. It would be nice to also be able to recognize the hand from the back. A keyboard would also obviously be necessary to track two hands at a time.

How does the application deal with different skin-tones?

b-3-nOP4y ago

That's an interesting idea. I have not tried to build something similar but a humble word of caution that I want to put out is that no matter what kind of ML you use the mechanical version of the instrument will always be more precise (you likely are aware of it, just want to make sure). However, you might be able to approximate precision of the mechanical version.

Two hand support would be nice and I would love to add it in the future.

The engine should work well with different skin tones as the training data was collected from a set of many and diverse individuals. The training data will also grow further over time making it more and more robust.

canada_dry4y ago

Similar thought here. I'd like to track two hands above a real keyboard (size/position established via 2 X QR stickers) to feed into a VR piano training simulation.

The tech is all there, really it's just having the time and effort to get all the pieces together!

programmarchy4y ago

Was wondering how easy it'd be to port to native mobile, so went looking for the source code, but doesn't appear to actually be open source. The meat is distributed as binary (WASM for "backend" code and a .bin for model weights).

Aside from being a cool hand tracker, it's a very clever way to distribute closed source JavaScript packages.

b-3-nOP4y ago

Thank you for the feedback. You are right that the project is not open source right now. It's "only" MIT licensed. That's why I also don't advertise it as open source (if you see the word open source somewhere it would be a mistake on my end, feel free to tell me if you see it somewhere). I wanted to start out from just an API contract so that it is easier to manage and get started. In general I have no problem open sourcing the JS part. But first there is some refactoring to do so it is easier to maintain upon open sourcing. Stay tuned!

As a side note: The wasm files are actually from the inference engine (tfjs).

Please let me know if you have any more questions in that regard.

boxfire4y ago

This architecture was also used in the link referenced when bringing up alternative implementations:

https://github.com/google/mediapipe/issues/877#issuecomment-...

phailhaus4y ago

An "undo" gesture seems necessary, it was a bit too easy to accidentally wipe the screen. Aside from that, this is fantastic! Love to see what WASM is enabling these days on the web.

b-3-nOP4y ago

Thank you for the feedback. Indeed such a functionality would be nice. One could solve this via another hand pose or in some way also with the existing hand poses. E.g. make a fist for say 2 seconds to clear the whole screen. Anything shorter will just issue an "undo".

YoHa uses tfjs.js which provides several backends for computation. One indeed uses WASM, the other one is WebGL based. The latter one is usually the more powerful one.

iainctduncan4y ago

Hi, I'm not sure if you've looked into this or not, but another area that is interested in this sort of thing and might be very excited is musical gesture recognition.

b-3-nOP4y ago

Hey, I believe there are multiple things you could have meant. From the top of my head one thing that might be interesting would be an application that allows conductors to conduct a virtual orchestra. But there are other possibilities in this space too I'm sure! If you had something else in mind feel free to share.

I have not explored this space much so far as my focus is rather to build the infrastructure that enables such applications rather than building the applications myself.

hondadriver4y ago

Also look at leap motion. https://www.ultraleap.com/product/leap-motion-controller/ (tip: mouser has them in stock and usually the best price) with midipaw http://www.midipaw.com/ (free)

Latency is very low which is very important for this use case. Look on YouTube for demos.

layer84y ago

What would be nice is a version that can be used to paint on the screen with your fingers, such that the lines are visible on a remotely shared screen. The use-case is marking up/highlighting on a normal desktop monitor (i.e. non-touch) while screen-sharing, which is awkward using a mouse or touchpad (think circling stuff in source code and documents, drawing arrows etc.). That would mean (a) a camera from behind (facing the screen), so that the fingers can touch (or almost touch) the screen (i.e. be co-located to the screen contents you want to markup), and (b) native integration, so that the painting is done on a transparent always-on-top OS window (so that it's picked up by the screen-sharing software); or just as a native pointing device, since such on-screen painting/diagramming software already exists.

b-3-nOP4y ago

Thank you for sharing this creative idea. "so that the fingers can touch (or almost touch) the screen" I think this is a big advantage of this approach since you can only achieve this with the back facing camera. On the flip side, with a back facing camera you either have to place the camera in between yourself and the screen which might be awkward or you have to ensure a placement of the camera behind you that isn't prone to occlusions (e.g. your head or chair might occlude your hands from the cameras point of view). The latter might also make calibration more difficult or impact precision since you might have to mount the camera with some elevation causing a less optimal camera angle.

stavros4y ago

This looks great! Recently I've been wanting to make a hand-tracking library for video editing. I'd make a hand gesture like an OK with my index and thumb to begin recording, and when I was done I'd make a thumbs up to keep the take or thumbs down to delete a bad take. That way, I could very easily record stuff while only keeping the good takes, to sort out later.

Hell, the library could even stitch the takes together, omitting the times when my hand started/finished doing the gestures.

tjchear4y ago

This reminds me of TAFFI [0], a pinching gesture recognition algorithm that is surprisingly easy to implement with classical computer vision techniques.

[0] https://www.microsoft.com/en-us/research/publication/robust-...

brundolf4y ago

Bit of feedback: the home page is pretty sparse. The video is great, but it wasn't obvious how to find the repo or where to get the package (or even what language it can be used with). I had to open the Demo, wait for it to load, and then click the Github link there, and then the readme told me it was available on NPM.

Otherwise looks pretty impressive! I've been looking for something like this and I may give it a whirl

b-3-nOP4y ago

Thank you for the feedback. You are right, the home page should probably be enriched with more information and maybe I can make the information you were looking for stand out better. As a side note: There is a link to GitHub in the footer. The language ("TypeScript API") is also mentioned in the body of the page. But I see that these two can quickly go unnoticed.

eminence324y ago

The demo doesn't seem to work on my chromebook. Maybe it's too underpowered?

Web page doesn't say anything after `Warming up...` and the latest message in the browser console is:

    Setting up wasm backend.

I expected to see a message from my browser along the lines of "Do you want to let this site use your camera", but I saw no such message.

b-3-nOP4y ago

Thank you for the feedback. I would like to fix this but I neither own a Chromebook nor does it seem like I can use a platform like browserstack to reproduce the issue (didn't find Chromebook as available device there). If you would like to help debugging the issue you can open a GitHub issue here: https://github.com/handtracking-io/yoha/issues

jakearmitage4y ago

I wish there was a nice open-source model for tracking hands and arms with multiple viewpoints (multiple cameras), similar to commercial software like this: https://www.ipisoft.com/

WalterGR4y ago

Awesome.

Just note that in the demo video, the user is 'writing' everything mirrored.

22c4y ago

The video itself could be mirrored.

WalterGR4y ago

My god, you're right. Unless he's wearing a women's shirt.

tomcooks4y ago

BTW this would be great for spaced repetition foreign character learning (Chinese, Arabic, Japanese, Korean, etc.): if the drawn figure is similar enough to the character the student is learning mark it as studied.

Congrats again

b-3-nOP4y ago

Thank you for your feedback and for sharing this potential use case. I think it is a very creative idea.

inetsee4y ago

My first question is whether this has the capability of being adapted to interpret/translate American Sign Language (ASL)?

b-3-nOP4y ago

Thank you for this inspiring question. For interpreting sign language you need multi-hand support which YoHa is currently lacking. Apart from that you likely also need to account for the temporal dimension which YoHa also does not do right now. If those things were implemented I'm confident that it would produce meaningful results.

rafamct4y ago

It's worth noting that movements of the mouth are extremely important in ASL (and other sign languages) and so this probably isn't as useful as it might seem at first.

b-3-nOP4y ago

Thank you for pointing this out. I overlooked this. I presume that on top of that what also could be relevant is the movements of arms, facial expressions and maybe also general body posture. Please correct me if I'm wrong as I'm not too familiar with sign language.

bjackman4y ago

Signs also tend to be expressed by the hands' position/movement in relation to _other_ body parts.

Edit: OTOH fingerspelling (https://en.m.wikipedia.org/wiki/Fingerspelling) might be a more feasible usecase!

netgusto4y ago

Indeed! TIL. https://en.wikipedia.org/wiki/Mouthing

borplk4y ago

Very impressive.

I want something like this so I can bind hand gestures to commands.

For example scroll down on a page by a hand gesture.

b-3-nOP4y ago

One can build this pretty easily for a website that you are hosting with the existing API (https://github.com/handtracking-io/yoha/tree/master/docs).

However, you likely want this functionality on any website that you are visiting for which you probably need to build a browser extension. I haven't tried incorporating YoHa into a browser extension but if somebody were to try I'd be happy to help.

borplk4y ago

That's nice but I'd also want it for general desktop stuff.

So I guess it would have to be sitting on my machine.

For example hand gestures to switch the desktop workspace.

Swipe left/right motion to switch desktop workspace. That would be the dream :)

kseistrup4y ago

Could it be used to translate sign language signs to written or spoken words, I wonder.

b-3-nOP4y ago

Thank you for the question. Since a similar question was asked already let me refer you to this comment: https://news.ycombinator.com/item?id=28830943

kseistrup4y ago

Thank you for the pointer that I had overlooked.

lost-found4y ago

Demo keeps crashing on iOS.

b-3-nOP4y ago

Thank you for the feedback. I can confirm there seems to be an issue with iOS/Chrome and created a GitHub issue for it (https://github.com/handtracking-io/yoha/issues/5).

Note that if you were trying iOS/Safari and not iOS/Chrome there is nothing that can be done due to a limitation that is documented in the section "Discussion" here: https://developer.apple.com/documentation/webkitjs/canvasren... Will document this.

adnanc4y ago

Great idea which is brilliantly executed.

So many educational uses, well done.

b-3-nOP4y ago

Thank you for the feedback.

karxxm4y ago

Would you provide the related paper to this approach?

b-3-nOP4y ago

In contrast to similar works there is no dedicated paper that presents e.g. the neural network or the training procedure. Of course ideas from many papers influenced this work and I can't list them all here. Maybe it helps that the backbone of the network is very similar to MobileNetV2 (https://arxiv.org/abs/1801.04381). Let me know if you have any more questions in that regard.

karxxm4y ago

Thanks for your reply! I just thought that SIGCHI is around the corner and it will be presented there! Awesome work!

tcper4y ago

Great work!

b-3-nOP4y ago

Thank you for the feedback.

rglover4y ago

Great demo.

b-3-nOP4y ago

Thank you for your feedback.

itake4y ago

I think these tools are super interesting, but I tools like this marginalize users with non-standard number of limbs or fingers.

rpmisms4y ago

So does the real world. Things are hard to do with disabilities. That's what the word means. This has great potential, and it's not worth shutting down because some people aren't able to use it.

I can also see this being very helpful for people who have cerebral palsy, for example. Larger movements are easier, this might help someone use the web more easily.

itake4y ago

What if a bank used this for authentication and disable people can't use their custom interface devices? Does that mean that disabled people shouldn't access to their bank accounts?

Maybe if this was the input device that interacts with the standard web, then there is potential here, but it would be unfortunate if a company used this as a primary means of input.

mkl4y ago

That's the bank's mistake, not this library's.

pasabagi4y ago

I feel the opposite. The more computer interfacing takes place in software, the better for disabled users. If you have a device that expects a keyboard scancode to respond, then you need to build a physical keyboard to talk to it. Building a physical keyboard that doesn't suck is expensive, and so disabled people pay crazy prices for gear tailored to them.

Tailoring software that can use very general-purpose input equipment is much cheaper. Training a neural net to recognize one-handed gestures, for instance, could be done by one developer then deployed worldwide. Making a decent one-hand keyboard is way less easy and way harder to scale.

colordrops4y ago

What do you suggest be done about it?

itake4y ago

I'm fine with using them, as long as alternatives are available for people with disabilities are able to participate as well.

Imagine if your bank started using these to access your account and suddenly disabled customers could no longer use their adaptive input devices to interact with their account.

voakbasda4y ago

So we can't give people nice things unless we can give everyone nice things?

pantulis4y ago

This is a very valid point, but as a counter argument the technique implemented here could be adapted to help users with other needs like say, a browser extension that can help you navigate back and forward with the blink of an eye.

itake4y ago

This all gets complicated, because not everyone has 2 eyes :-/.

You end up with complicated systems trying to cover all of the edge cases.

j / k navigate · click thread line to collapse

64 comments

gitgud4y ago

The demo really sells it here [1]. It's amazingly intuitive and easy to use, it should be a part of video-conferencing software.

[1] https://handtracking.io/draw_demo/

b-3-nOP4y ago

Thank you for the feedback. Such an integration would be nice indeed.

cloudking4y ago

You could integrate it with all video-conferencing software building it as a virtual camera plug-in that works with OBS

Graffur4y ago

It's like an initial beta of the software - it's not production ready. I can't imagine this adding value to a meeting _yet_. Seems promising though.

tomcooks4y ago

This is a GREAT website, I can understand what it does with zero clicks, zero scrolls.

Really great, congratulations, I hope that I can find a way to apply this lesson to my SaaS.

SV_BubbleTime4y ago

Agreed, but also nature of the beast. It's really easy to explain hand tracking software in a single media element. It's a lot harder to explain some crypto AEAD encapsulation format the same way.

I assume YoHa means Your Hands... I don't think I could have resisted OhHi for hand tracking.

smoyer4y ago

How does the application deal with different skin-tones?

b-3-nOP4y ago

Two hand support would be nice and I would love to add it in the future.

canada_dry4y ago

Similar thought here. I'd like to track two hands above a real keyboard (size/position established via 2 X QR stickers) to feed into a VR piano training simulation.

The tech is all there, really it's just having the time and effort to get all the pieces together!

programmarchy4y ago

Aside from being a cool hand tracker, it's a very clever way to distribute closed source JavaScript packages.

b-3-nOP4y ago

As a side note: The wasm files are actually from the inference engine (tfjs).

Please let me know if you have any more questions in that regard.

boxfire4y ago

This architecture was also used in the link referenced when bringing up alternative implementations:

https://github.com/google/mediapipe/issues/877#issuecomment-...

phailhaus4y ago

An "undo" gesture seems necessary, it was a bit too easy to accidentally wipe the screen. Aside from that, this is fantastic! Love to see what WASM is enabling these days on the web.

b-3-nOP4y ago

YoHa uses tfjs.js which provides several backends for computation. One indeed uses WASM, the other one is WebGL based. The latter one is usually the more powerful one.

iainctduncan4y ago

Hi, I'm not sure if you've looked into this or not, but another area that is interested in this sort of thing and might be very excited is musical gesture recognition.

b-3-nOP4y ago

I have not explored this space much so far as my focus is rather to build the infrastructure that enables such applications rather than building the applications myself.

hondadriver4y ago

Also look at leap motion. https://www.ultraleap.com/product/leap-motion-controller/ (tip: mouser has them in stock and usually the best price) with midipaw http://www.midipaw.com/ (free)

Latency is very low which is very important for this use case. Look on YouTube for demos.

layer84y ago

b-3-nOP4y ago

stavros4y ago

Hell, the library could even stitch the takes together, omitting the times when my hand started/finished doing the gestures.

tjchear4y ago

This reminds me of TAFFI [0], a pinching gesture recognition algorithm that is surprisingly easy to implement with classical computer vision techniques.

[0] https://www.microsoft.com/en-us/research/publication/robust-...

brundolf4y ago

Otherwise looks pretty impressive! I've been looking for something like this and I may give it a whirl

b-3-nOP4y ago

eminence324y ago

The demo doesn't seem to work on my chromebook. Maybe it's too underpowered?

Web page doesn't say anything after `Warming up...` and the latest message in the browser console is:

    Setting up wasm backend.

I expected to see a message from my browser along the lines of "Do you want to let this site use your camera", but I saw no such message.

b-3-nOP4y ago

jakearmitage4y ago

I wish there was a nice open-source model for tracking hands and arms with multiple viewpoints (multiple cameras), similar to commercial software like this: https://www.ipisoft.com/

WalterGR4y ago

Awesome.

Just note that in the demo video, the user is 'writing' everything mirrored.

22c4y ago

The video itself could be mirrored.

WalterGR4y ago

My god, you're right. Unless he's wearing a women's shirt.

tomcooks4y ago

Congrats again

b-3-nOP4y ago

Thank you for your feedback and for sharing this potential use case. I think it is a very creative idea.

inetsee4y ago

My first question is whether this has the capability of being adapted to interpret/translate American Sign Language (ASL)?

b-3-nOP4y ago

rafamct4y ago

It's worth noting that movements of the mouth are extremely important in ASL (and other sign languages) and so this probably isn't as useful as it might seem at first.

b-3-nOP4y ago

bjackman4y ago

Signs also tend to be expressed by the hands' position/movement in relation to _other_ body parts.

Edit: OTOH fingerspelling (https://en.m.wikipedia.org/wiki/Fingerspelling) might be a more feasible usecase!

netgusto4y ago

Indeed! TIL. https://en.wikipedia.org/wiki/Mouthing

borplk4y ago

Very impressive.

I want something like this so I can bind hand gestures to commands.

For example scroll down on a page by a hand gesture.

b-3-nOP4y ago

One can build this pretty easily for a website that you are hosting with the existing API (https://github.com/handtracking-io/yoha/tree/master/docs).

borplk4y ago

That's nice but I'd also want it for general desktop stuff.

So I guess it would have to be sitting on my machine.

For example hand gestures to switch the desktop workspace.

Swipe left/right motion to switch desktop workspace. That would be the dream :)

kseistrup4y ago

Could it be used to translate sign language signs to written or spoken words, I wonder.

b-3-nOP4y ago

Thank you for the question. Since a similar question was asked already let me refer you to this comment: https://news.ycombinator.com/item?id=28830943

kseistrup4y ago

Thank you for the pointer that I had overlooked.

lost-found4y ago

Demo keeps crashing on iOS.

b-3-nOP4y ago

Thank you for the feedback. I can confirm there seems to be an issue with iOS/Chrome and created a GitHub issue for it (https://github.com/handtracking-io/yoha/issues/5).

adnanc4y ago

Great idea which is brilliantly executed.

So many educational uses, well done.

b-3-nOP4y ago

Thank you for the feedback.

karxxm4y ago

Would you provide the related paper to this approach?

b-3-nOP4y ago

karxxm4y ago

Thanks for your reply! I just thought that SIGCHI is around the corner and it will be presented there! Awesome work!

tcper4y ago

Great work!

b-3-nOP4y ago

Thank you for the feedback.

rglover4y ago

Great demo.

b-3-nOP4y ago

Thank you for your feedback.

itake4y ago

I think these tools are super interesting, but I tools like this marginalize users with non-standard number of limbs or fingers.

rpmisms4y ago

So does the real world. Things are hard to do with disabilities. That's what the word means. This has great potential, and it's not worth shutting down because some people aren't able to use it.

I can also see this being very helpful for people who have cerebral palsy, for example. Larger movements are easier, this might help someone use the web more easily.

itake4y ago

What if a bank used this for authentication and disable people can't use their custom interface devices? Does that mean that disabled people shouldn't access to their bank accounts?

Maybe if this was the input device that interacts with the standard web, then there is potential here, but it would be unfortunate if a company used this as a primary means of input.

mkl4y ago

That's the bank's mistake, not this library's.

pasabagi4y ago

colordrops4y ago

What do you suggest be done about it?

itake4y ago

I'm fine with using them, as long as alternatives are available for people with disabilities are able to participate as well.

Imagine if your bank started using these to access your account and suddenly disabled customers could no longer use their adaptive input devices to interact with their account.

voakbasda4y ago

So we can't give people nice things unless we can give everyone nice things?

pantulis4y ago

itake4y ago

This all gets complicated, because not everyone has 2 eyes :-/.

You end up with complicated systems trying to cover all of the edge cases.

j / k navigate · click thread line to collapse