Real time numbers recognition (MNIST) on an iPhone with CoreML (opens in new tab)

(liip.ch)

137 pointsuberneo7y ago19 comments

19 comments

yeldarb7y ago

Neat walkthrough!

Last year I actually made an applied-CoreML app to solve sudoku puzzles where MNIST came in very handy.

I wrote about it here: https://blog.prototypr.io/behind-the-magic-how-we-built-the-...

nothis7y ago

>After I scanned a wide variety of puzzles from each book, my server had stored about 600,000 images

600,000?!? Even divided by 81 that's over 7000! How long did this take?

yeldarb7y ago

A couple of afternoons.

I just hacked into my app's flow to upload a "scan" of the isolated puzzle to my server instead of slicing it and sending the component images to CoreML.

Then I sat there and flipped through page after page of Sudoku puzzles and scanned them from a few different angles each, sliced them in bulk on the server, and voila: data!

dangero7y ago

Sorry I’m still confused. You took roughly 7000 pictures in two afternoons? What do you mean by sliced them in bulk? If you took them from different angles how do you slice them in bulk?

1 more reply

rahimnathwani7y ago

"Apple ... provides a ... helper library called coremltools that we can use to ... convert scikit-learn models, Keras and XGBoost models to CoreML"

Awesome.

a_c7y ago

As someone with not much experience in ML, how to handle when there is no number present or if a number is present?

ericjang7y ago

Great question! This is actually a surprisingly deep problem in ML, known as "anomaly detection" or "out-of-distribution" (OoD) detection.

Another way to formulate this question: "given training data that only tells you about digits, how do you know whether something is a digit or not?" Given that the training data never actually defines what isn't a digit, how can we ensure that the model actually sees a digit at test time? If we cannot ensure this (e.g. an adversary or the real world supplies inputs), how can we "filter out" bad inputs?

A quick hack solution that works well in practice is to examine the "predictive distribution" across digit classes. Researchers have empirically found that entropy tends to be higher (i.e. more smooth) when the model sees an OoD input. However, the OoD problem is not fully solved.

Here's a nice survey paper on the topic: https://arxiv.org/abs/1809.04729

Note that methods that tie OoD to the task at hand (classification) are not actually solving OoD, they are solving "predictive uncertainty" of the task.

jononor7y ago

You mean to get either 0-9 or 'no number'? Here are two approaches:

1) Integrated. Represent 'no number' as class number 11 in the original model. Retrain it with this additional class (needs additional training data).

2) Cascading. Train a dedicated model for 'number' versus 'no number' (binary classifier), and use that in front of the original model.

Note that the MNIST data comes already extracted from original image, centered in fixed-size images of 28x28 pixels. In a practical ML application these steps would also need to be done before classification can be performed.

jononor7y ago

In the work shown in the article, the segmentation and centering of digits looks to be done by the user holding the camera. Which can be workable for some applications!

lozenge7y ago

The predictions variable has a confidence value for each digit. You can put a cutoff and say if none is above a certain confidence, assume there's no number at all.

jefft2557y ago

This could work, but it is important to note that a lot of ML algorithms trained in a closed domain (no "other" class) will be pretty bad at knowing what they don't know. This is an open problem in ML.

jononor7y ago

Choosing the threshold will be hard. And (as mentioned by other poster) the model is unlikely to generalize well to classes of data it has not seen. I suspect that this approach will get things similar to numbers wrong quite often, like handwritten characters (a,b,c). Including these into the training set is much more likely to yield a model which will successfully discriminate it.

gunzor7y ago

You can use threshold value to detect whether there is no number. If the prediction accuracy is below this threshold value you can say it as no number

zackmorris7y ago

The scrollbar distance confirms a suspicion that I've held for some time: that writing a machine learning algorithm is of similar complexity to developing an iOS app in Xcode!

saagarjha7y ago

What scrollbar distance are you talking about?

zackmorris7y ago

It was a joke - the Xcode section starts about halfway down the page. I was just illustrating that the friction we deal with today is of comparable complexity to what might be thought of as advanced programming (AI, VR, AR, physics, etc etc).

j / k navigate · click thread line to collapse

19 comments

yeldarb7y ago

Neat walkthrough!

Last year I actually made an applied-CoreML app to solve sudoku puzzles where MNIST came in very handy.

I wrote about it here: https://blog.prototypr.io/behind-the-magic-how-we-built-the-...

nothis7y ago

>After I scanned a wide variety of puzzles from each book, my server had stored about 600,000 images

600,000?!? Even divided by 81 that's over 7000! How long did this take?

yeldarb7y ago

A couple of afternoons.

I just hacked into my app's flow to upload a "scan" of the isolated puzzle to my server instead of slicing it and sending the component images to CoreML.

Then I sat there and flipped through page after page of Sudoku puzzles and scanned them from a few different angles each, sliced them in bulk on the server, and voila: data!

dangero7y ago

Sorry I’m still confused. You took roughly 7000 pictures in two afternoons? What do you mean by sliced them in bulk? If you took them from different angles how do you slice them in bulk?

1 more reply

rahimnathwani7y ago

"Apple ... provides a ... helper library called coremltools that we can use to ... convert scikit-learn models, Keras and XGBoost models to CoreML"

Awesome.

a_c7y ago

As someone with not much experience in ML, how to handle when there is no number present or if a number is present?

ericjang7y ago

Great question! This is actually a surprisingly deep problem in ML, known as "anomaly detection" or "out-of-distribution" (OoD) detection.

Here's a nice survey paper on the topic: https://arxiv.org/abs/1809.04729

Note that methods that tie OoD to the task at hand (classification) are not actually solving OoD, they are solving "predictive uncertainty" of the task.

jononor7y ago

You mean to get either 0-9 or 'no number'? Here are two approaches:

1) Integrated. Represent 'no number' as class number 11 in the original model. Retrain it with this additional class (needs additional training data).

2) Cascading. Train a dedicated model for 'number' versus 'no number' (binary classifier), and use that in front of the original model.

jononor7y ago

In the work shown in the article, the segmentation and centering of digits looks to be done by the user holding the camera. Which can be workable for some applications!

lozenge7y ago

The predictions variable has a confidence value for each digit. You can put a cutoff and say if none is above a certain confidence, assume there's no number at all.

jefft2557y ago

jononor7y ago

gunzor7y ago

You can use threshold value to detect whether there is no number. If the prediction accuracy is below this threshold value you can say it as no number

zackmorris7y ago

The scrollbar distance confirms a suspicion that I've held for some time: that writing a machine learning algorithm is of similar complexity to developing an iOS app in Xcode!

saagarjha7y ago

What scrollbar distance are you talking about?

zackmorris7y ago

j / k navigate · click thread line to collapse