Lots of people care, especially in Asia (Chinese and Japanese). It's just that the problem is incredibly hard.
We put five very smart people on it for a year, and meeting people's expectations was totally impossible, especially people like doctors taking notes fast (and ugly).
We thought the market was instead in creating mind maps or something similar, since people could write slower and more neatly.
But people write a double-u and expect the computer to see an "m". With deep learning it's possible, but extremely flimsy.
> The program was efficient enough to run in real-time on an IBM System/360 computer, and robust enough to properly identify 90 percent of the symbols drawn by first-time users.
I just want to point out that 90% accuracy is, from a user's point of view, awful handwriting recognition performance. It means you will be correcting on average about 10 words per paragraph! Even 99% accuracy is not nearly good enough to give people a sense that the computer is good at handwriting recognition.
I also want to point out the difficulty and danger in interpreting strokes when doing handwriting recognition.
In the last demo box, try writing a capital Y without lifting the pen. You'll have to go "up and down" one or both upper branches. Because of this, the recognizer will call it a K, A, or N even though it is obviously a Y when you're done.
This demo is constrained to only using one stroke per letter, but systems that permit multiple strokes still get into trouble when the strokes don't match what they are expecting--for example if you draw an X using 4 individual strokes outward from a central point.
This also happens with words. In Microsoft's handwriting recognition in Office in the early 2000s, writing the letters of a word out of order completely borked the recognition. For example writing "xample" and then going back and adding an "e" at the beginning would not produce a recognized word of "example."
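One way around this kind of stroke-order sensitivity, as a sketch of my own rather than how GRAIL or Microsoft's recognizer actually worked, is to compare the finished ink as an unordered point cloud instead of a sequence of strokes. Then an X drawn with two crossing strokes and an X drawn with four outward strokes from the center look identical:

```python
# Sketch: compare two letters as unordered point clouds, so stroke order and
# stroke direction don't matter. Uses a symmetric nearest-neighbor distance.
# This deliberately throws away timing information -- a trade-off, and
# illustrative only, not any real product's algorithm.
import math

def nn_dist(a, b):
    """Average distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def shape_distance(a, b):
    """Symmetric point-cloud distance; 0.0 means the ink overlaps exactly."""
    return max(nn_dist(a, b), nn_dist(b, a))

# An X drawn as two crossing strokes vs. four strokes outward from the center:
# the sampled ink covers the same points, so the distance is 0 either way.
two_strokes  = [(0, 0), (1, 1), (2, 2), (2, 0), (1, 1), (0, 2)]
four_strokes = [(1, 1), (0, 0), (1, 1), (2, 2), (1, 1), (2, 0), (1, 1), (0, 2)]
print(shape_distance(two_strokes, four_strokes))  # -> 0.0
```

Of course, this is exactly the "artifact vs. behavior" trade-off discussed below: you gain stroke-order invariance but give up the per-gesture conventions that single-stroke systems exploit.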
My point with all of this is that there is a reason you probably don't do all your computing with natural handwriting. It's a surprisingly difficult problem. Users do not expect it to matter how they form letters and words on the page. And they have very low tolerance for correcting computer mistakes.
Arguably, an X drawn this way should NOT be recognized as an X--that's not how an X is spelled.
If the task is communicating with the computer, then recognition of the gesture is a valid approach. Just as there are conventions regarding the spelling of words, there are conventions involved in the formation of letters. Why not use them? It would even seem incorrect to leave these out.
A computer that interprets the behavior of writing, rather than the final symbols, is going to violate user expectations at some point.
Why? Because people do not always write as linearly as you might expect, especially when writing fast. They might drop or mis-write letters or words, then go back and fix it. Or quickly jot down just enough letters to remind themselves of what they heard, then go back and fill the rest in. A routine that interprets actions in order is going to have a hard time with actions that the user completes out of order.
Not that I think "meet me halfway" type approaches (like the Graffiti system) aren't worth using, but in this case we're talking about recognizing writing (the artifact), not writing (the verb).
You could also keep multiple interpretations of a word pending (a text search for any of them would take you there), and eventually ask the user to disambiguate if they want to. I assume this would be an acceptable solution for non-dictionary words too...
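A minimal sketch of that idea (the data shapes and names here are made up for illustration): store every candidate reading alongside the chosen one, and let search match any candidate, not just the displayed word.

```python
# Sketch: keep all candidate interpretations of each recognized word, so a
# later text search can hit any of them and the user can be asked to
# disambiguate on demand. Illustrative structure, not a real recognizer's API.
recognized = [
    {"chosen": "example", "candidates": ["example", "examgle", "exarnple"]},
    {"chosen": "meet",    "candidates": ["meet", "rneet"]},
]

def search(text_words, query):
    """Return indices of words whose candidate set contains the query."""
    return [i for i, w in enumerate(text_words) if query in w["candidates"]]

# Finds the word even if the recognizer's top guess was different:
print(search(recognized, "rneet"))  # -> [1]
```

The nice property is that a wrong top-1 guess no longer makes the document unsearchable; the cost is storing and indexing the alternatives.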
Wait, what? Doesn't that imply that a paragraph needs to have 100 words in it for 10 of them to be recognized wrong at a 90% success rate? That seems super long, anyway.
My stats are really rusty, perhaps that's just one of those unintuitive cases that confuse people like me.
For reference, the above paragraph is 78 words long.
That is to say, people have a higher tolerance for things that are within expected norms of their environment. Ideally, we want no corrections. But, having to do them constantly for a time will quickly desensitize people to this. (And yes, this is currently just an assertion of mine, I don't have data backing it. Just some anecdotes.)
I hadn't considered that it was intended to act as a warning that the content might be more error-prone.
Edit: I think where the Afterword says "inputting text with a stylus is likely slower than touch typing", they're forgetting that we still don't have a really acceptable way of inputting text on mobile devices. Swype and its ilk are close, but still hamfisted at times.
Reality check: Our machines do not yet accurately manage simple reading tasks.
It was (yay!) published as recognit.bas (VB) and I'd be really happy if someone still has a copy.
It recognized just numbers, but the basis of operation was similar to the linked article's.