Page Dewarping (opens in new tab)

(mzucker.github.io)

743 pointspietrofmaggi9y ago70 comments

70 comments

This strikes close to home for me, because I've done this "by hand" quite a number of times, using some interactive Gimp filters, with very good results. I've been able to take a perspective photo of a curved page lying on a desk, get it almost perfectly straight, and threshold it to clean black and white or sharp gray scale.

Here is my very quick and dirty manual job of the same example page:

http://www.kylheku.com/~kaz/dewarp.png

Literally less than five minutes.

First I cropped the image. Then duplicated the layer. Blurred the top layer (Gaussian, 50 radius). Then flipped to Divide mode and merged the visible layers. This leveled the lightness quite well, almost completely eliminating the shadow over he right side of the page and all other lighting differences. There is a hint of the edge of the shadow still present because it is such a sharp contrast; but that can be eliminated in an adjustment of the intensity curves. In such cases it may be helpful to experiment with smaller blur radii, too.

I then did a perspective transform in the lateral direction, squeezing the left side top-bottom and expanding the right, resulting in the warp now being approximately horizontal. (The perspective transform is not just for adding a perspective effect; it is also useful reversing perspective!)

Finally, I used the Curve Bend (with its horrible interactive interface and awful preview) to warp in a compensating way. Basically, the idea is to draw an upper and lower curve which is the opposite of the curve on the page. I made two attempts, keeping the results of the second.

If the preview of this tool wasn't a ridiculous, inscrutable thumbnail, it would be possible to do an excellent job in one attempt, probably close to perfect.

Because the page is evenly light thanks to the divide-by-blurred layer trick, it will nicely threshold to black and white, or a narrow grayscale range.

mzucker9y ago

Hence the tagline on the blog: "Why do it by hand if you can code it in just quadruple the time?" :)

digi_owl9y ago

More work up front, less work afterwards.

1 more reply

jc4p9y ago

I really enjoyed the various random thoughts in the wrapping up section of this post, thank you a good read!

kctess59y ago

Divide by blur is very crafty. Will have to give that a shot next time I have a similar need in Photoshop.

drcode9y ago

As a comic artist working with scanned images, the "divide by blur" trick is going to save me significant effort in the future.

THIS is exactly the sort of insight I wish I would get when I click one of those "one weird trick" links!

1 more reply

thibauts9y ago

Reminds me of the "frequency separation" [1] technique, used a lot in the high end retouching business.

[1] https://fstoppers.com/post-production/ultimate-guide-frequen...

thegeomaster9y ago

Echoing what the other comment said, blurring and setting as Divide has never occurred to me. I just tried it and it does a very good job at leveling. Nice one.

kazinator9y ago

The radius of 50 was too large. Reason being: I'm used to working with much higher resolution scans, and without high contrast shadows across them. A blur radius of around 15 does a good job of eliminating the shadow edge in this image after the divide. What we're basically doing is a high pass filter. The radius of 50 is too low a frequency; it lets through the shadow edge noticeably.

By the way, here is a useful trick: blurred layer in Divide mode, plus vary the opacity. You can reduce the amount of contrast between light and dark areas in an image without completely leveling it. Works well with low opacity levels (just a slight blend of the Divide mode layer). If you have a picture with areas that are too dark and too bright, a touch of this can help. High radii tend to work well, with blends of up to 20% or so.

For me, this does what I want Retinex to do! Only better, with intuitive control parameters and predictable results.

1 more reply

cooper129y ago

I love well-illustrated writeups. Even a reader without mathematics or programming knowledge can understand what steps the author took. His model actually seems to better represent the warped paper than a cylinder would. (though I don't know the actual specifics of the CTM model)

I wish he went into more details on the steps taken after dewarping. You can tweak the image levels to get good contrast, but surprisingly there aren't any shadows from underleveling or loss of detail from overleveling. I wonder if the author ran OCR on the scans after, and speaking of OCR, IIRC Leptonica is one of the dependencies of Tesseract so it must do some similar pre-processing.

Edit: reading more carefully, he mentions that he used adaptive thresholding from OpenCV.

niftich9y ago

Recently, Dropbox wrote about dewarping prior to OCR in their app: https://news.ycombinator.com/item?id=12297944

This code had the same idea, and is open-source!

marcusjt9y ago

No, what this code does is much more sophisticated - Dropbox do not dewarp (i.e. remove non-linear distortions) they only transform the image to make the document rectilinear (leaving the distortions/deformations intact), which is much simpler.

vidarh9y ago

Dewarping images for OCR isn't a new idea at all - while older systems used to do simple de-skew (tilting the image) for speed reasons, the need for warping the image to improve results has been known for a very long time, and most production quality OCR engines has done something like it for a long time.

(this isn't to slight the article - it's a great, well written presentation on how to implement it)

troymc9y ago

I use Microsoft's "Office Lens" app on my Android phone all the time, like a "smart camera" which automatically squares off and white-balances each photo of a page (usually mail or forms filled in by hand). It can't handle warped pages though, so I hope they add something like this!

claar9y ago

Yes, Office Lens is great for that. I recently switched to CamScanner, which has an even better algorithm for deskewing pages (take a pic of a receipt at an odd angle, it automatically transforms it to flat and rectangular). Pro version was on sale -- I'm not affiliated, just a happy customer and highly related to this article.

reitanqild9y ago

I have it as well, but use Office Lens or Google Drive (create a shortcut to a folder on home screen of android device and use it to scan into it).

The reason is CamScanner tried to upsell me to some monthly plan.

For all good advice on HN that you should build recurring revenue: It seriously annoys me when people tries to do that by demanding monthly payments for static features.

(Totally OK with selling license keys for new features etc.)

1 more reply

peterjmag9y ago

it came in handy whenever a student emailed me their homework as a pile of JPEGs.

Gotta admire this guy's resourcefulness—and patience. If I were a professor, I'd probably just reject the assignment outright if a student sent me a bunch of photos from their smartphone in lieu of a PDF or a "proper" scan. :)

Freak_NL9y ago

Or compromise; write up an awesome blog post about how you wrote software to turn their photos into proper scans, and reject their homework (with a link to the blog post).

vidarh9y ago

Reject it, in the form of a picture of a handwritten note on a print-out of his jpegs.

1 more reply

mzucker9y ago

Thanks for the feedback, everyone -- happy to answer questions here or in the Disqus comments on my blog.

tomn9y ago

This is really interesting, thanks!

If you decide to try to make this faster, check out ceres[0] a non-linear least squares optimisation framework that does automatic differentiation using a clever C++ template hack.

I've used it a few times to solve these kind of problems and found it to be very good!

[0] http://ceres-solver.org/

mzucker9y ago

Yep, I'm still waiting to use ceres for something - I didn't end up using it on my image approximation project https://mzucker.github.io/2016/08/01/gabor-2.html because it doesn't work well with inequality constraints.

1 more reply

owernonvl9y ago

I really enjoyed the article!

By the way, you mentioned using word boundaries in regexes to replace variable names. GNU Emacs regexes can actually include "symbol boundaries" (which are a little better for variable names than word boundaries), represented as "\_<symbol\_>". Personally, I like using the "highlight-symbol" package, which provides the "highlight-symbol-query-replace" command to basically execute M-% for the symbol at point.

mzucker9y ago

Cool - will check it out!

anilgulecha9y ago

Here's something that I think has not been done, but could be quite lucrative, building a high resolution scanner using the phone camera, multiple pictures and interpolation/noise removal.

Most phone cameras these days have good resolutions, and you could technically take a 6x4 photo, divvy it to 3x3 grid and take close up photos, and have smart algorithms interpolate the pixels to form a single image with high res. I'd even bet you'd results equal to or better than a flat bed scanner.

For better us, just open the camera preview and slowly pan over the image.

Has someone tried something like this? With FOSS apps like mosaic, hdr tools and imagemagick, it should be possible. I'm guessing opencv would be needed for interpolation and noise removal..

brownbat9y ago

I've used some free apps that turn phones into document scanners. Almost all banks I use have something like that embedded for check deposits. Maybe they're just doing deskew rather than dewarping though... loose documents aren't usually warped like book pages.

The next hard problem would be help with DIY book scanning. Like the camera could sit over my shoulder and detect when I've turned a page, then automatically take a picture of the new page. Then OCR kicks in, and conversion to EPUB, preserving graphics when pages have non-text elements. Mostly just feel like we can probably do better than the massive contraptions over at www.diybookscanner.org

Syzygies9y ago

This problem is fundamentally different with stereo images; there is a hope of reconstructing the exact 3D geometry of the page before flattening, rather than inferring from content. An iPhone app that did this would do well.

radarsat19y ago

I've always thought it should be possible using a short video of the camera orbiting the page from different angles. (Assuming a static scene of course.)

manmal9y ago

Or now, some frames of a live photo.

mzucker9y ago

That sounds like a good idea and pretty straightforward to implement -- now I almost wanna try it...

Ciantic9y ago

I wonder if someone has tried to dewarp whole book from a video when flipping through it. I imagine that could be handy way to copy the whole book.

mortenjorck9y ago

Page warp becomes but a tiny part of the problem with a page-flip video – now you have motion blur, varying exposure, resolution, noise, and depth of field to worry about. Maybe if you had a camera capable of recording extremely high frame rates at high resolution (probably north of 120fps @ 1440p), a high-intensity, diffuse light source, and a lens with a tight aperture / narrow focal length, you’d at least have something to work with.

voxic119y ago

Doesn't that describe most modern cell phone cameras?

1 more reply

gamman9y ago

Maybe close enough: http://spectrum.ieee.org/automaton/robotics/robotics-softwar...

jgable9y ago

This is great! My wife is a music teacher and often scans sheet music so that it's more portable. She has been asking me for a while for something exactly like this. I'll have to tweak it to work on sheet music, since I imagine his methods to identify lines of text won't work for the music staff out of the box.

orivej9y ago

ScanTailor [1] also implements automatic dewarping and is overall great for scan postprocessing.

[1] http://scantailor.org/

dalke9y ago

I used it for about 1000 pages I photographed with a DSLR. The automatic dewarping rarely worked for me, so I ended up doing it manually.

It likely didn't help that many of the pages were typed carbon copies or hand-written. For the former I had to put another piece of paper behind the page to prevent the next page from bleeding through.

That said, I managed to get the job done, though it took a couple of weeks. Next is to capture the metadata (author, date written, ...). There were a lot of one page letters in the documents I copied.

Would love to try out another tool that could read from the ScanTailor project file to get the page segmentation or even the warping I did manually to improve on the result.

jff9y ago

He mentions adding line detection for tables, which should be a good start for sheet music.

mzucker9y ago

Yep, might just work out of the box -- give it a shot!

kazinator9y ago

If she literally scans it with a scanner, why would it need dewarping? This is for when you use a digital camera on pages not lying flat.

If you have the option go for scanning.

Scans still benefit from cleanup, such as rotation and adjusting the levels (making everything near white perfectly white) and possibly manual cleanup (removal of specks of dust or the odd hair that was on the scanner).

jgable9y ago

She scans it in a scanner, but the music is often in large books that don't lay completely flat, so you still see the warping near the binding.

3 more replies

buovjaga9y ago

You could also try ScanTailor: http://scantailor.org/

jgable9y ago

Thanks for the suggestion! I'll check it out.

amelius9y ago

The next step would be to "depixelate" the resulting image. How could this be done? I guess OCR would not work because of the variation of the fonts (you don't want the document to end up in a single font; you want to keep the fonts). Could a deep learning approach work here, even if it has not been trained on all the specific fonts?

vidarh9y ago

Plenty of engines will do OCR and use the shapes recognised with high certainty to affect how they detect the rest.

There are many ways of doing this, and you can achieve some results even without knowing if your image is text, but just has lots of self-similarity by virtually sliding a "grid" over the image, slicing it up into n-by-n squares, running any of a number of nearest-neighbour variants over it, and then for each cluster replace all instances of the squares in the cluster by the one which minimise the overall error rate vs the others.

This will work reasonably well for very structured images such as text, as long as enough characters are near correct, and will retain custom fonts etc. but clean them up quite a bit as long as they either are different enough, or occur often enough on a page to not get "corrected".

I'm sure there are better ways of doing this too - it's been a decade since I kept up with OCR research.

renlo9y ago

How much of an effect does the camera lens make in page warping? Correct me if I'm wrong, but for shorter focal length lenses I would think it would warp the page more. If a person accounted for that, could they get a near perfect result? Or does his algorithm account for that? It seems that one would need to know where the center of the image would be.

mzucker9y ago

Pretty sure focal length would affect things as you say, but also physical dimensions matter too. My program assumes fixed focal length and then picks the page dimensions that work with the assumed focal length -- almost certainly not the correct ones.

petters9y ago

"You can see these are not exactly small optimization problems. The smallest one has 89 parameters in the model, and the largest has 600."

Those are small optmization problems. These types of problems are solved in computer vision for hundreds of thousands of variables. His problem can be solved in real-time, not tens of seconds.

mnw21cam9y ago

Compared to weather prediction, which is an optimisation problem involving hundreds of millions of variables, and occupies a hefty supercomputing cluster for a few hours.

paul_milovanov9y ago

Consider mentioning Dan Bloomberg as the author of the original work as well as Leptonica. :)

voltagex_9y ago

Thank you for the gif near the start - it really helped me to understand what was going on.

saynsedit9y ago

Is using a curve whose end points are fixed to zero to model the warping accurate? I can't see a rationale for why the end points should both be 0.

mzucker9y ago

The main rationale is removing redundant degrees of freedom -- since the page is allowed to rotate freely, the edges of the page can still move around plenty.

saynsedit9y ago

I see, so basically assume the two end points are at zero and there is some rotation accounting for the endpoint offset in real space.

It still doesn't seem fully accurate as I can imagine a non-rotated cubic curve with endpoints at an offset, but I assume your simplification works well enough.

artursapek9y ago

I've always been interested in getting into graphics programming, and stuff like this only makes me more interested. Really well written post.

nullcipher9y ago

Wrap this in a service and you have a startup!

anowell9y ago

This is solid. I'm an engineer at Algorithmia, and this caught our attention as the sort of project we love to host as a service on our algorithm marketplace. We've already made note of it for our team to consider adding (thanks to the generous MIT license), but I wanted to reach out in case you'd rather add, own, and optionally monetize it on our platform yourself. Either way, this was a great read with impressive results.

1 more reply

j / k navigate · click thread line to collapse

70 comments

kazinator9y ago

Here is my very quick and dirty manual job of the same example page:

http://www.kylheku.com/~kaz/dewarp.png

Literally less than five minutes.

If the preview of this tool wasn't a ridiculous, inscrutable thumbnail, it would be possible to do an excellent job in one attempt, probably close to perfect.

Because the page is evenly light thanks to the divide-by-blurred layer trick, it will nicely threshold to black and white, or a narrow grayscale range.

mzucker9y ago

Hence the tagline on the blog: "Why do it by hand if you can code it in just quadruple the time?" :)

digi_owl9y ago

More work up front, less work afterwards.

1 more reply

jc4p9y ago

I really enjoyed the various random thoughts in the wrapping up section of this post, thank you a good read!

kctess59y ago

Divide by blur is very crafty. Will have to give that a shot next time I have a similar need in Photoshop.

drcode9y ago

As a comic artist working with scanned images, the "divide by blur" trick is going to save me significant effort in the future.

THIS is exactly the sort of insight I wish I would get when I click one of those "one weird trick" links!

1 more reply

thibauts9y ago

Reminds me of the "frequency separation" [1] technique, used a lot in the high end retouching business.

[1] https://fstoppers.com/post-production/ultimate-guide-frequen...

thegeomaster9y ago

Echoing what the other comment said, blurring and setting as Divide has never occurred to me. I just tried it and it does a very good job at leveling. Nice one.

kazinator9y ago

For me, this does what I want Retinex to do! Only better, with intuitive control parameters and predictable results.

1 more reply

cooper129y ago

Edit: reading more carefully, he mentions that he used adaptive thresholding from OpenCV.

niftich9y ago

Recently, Dropbox wrote about dewarping prior to OCR in their app: https://news.ycombinator.com/item?id=12297944

This code had the same idea, and is open-source!

marcusjt9y ago

vidarh9y ago

(this isn't to slight the article - it's a great, well written presentation on how to implement it)

troymc9y ago

claar9y ago

reitanqild9y ago

I have it as well, but use Office Lens or Google Drive (create a shortcut to a folder on home screen of android device and use it to scan into it).

The reason is CamScanner tried to upsell me to some monthly plan.

For all good advice on HN that you should build recurring revenue: It seriously annoys me when people tries to do that by demanding monthly payments for static features.

(Totally OK with selling license keys for new features etc.)

1 more reply

peterjmag9y ago

it came in handy whenever a student emailed me their homework as a pile of JPEGs.

Freak_NL9y ago

Or compromise; write up an awesome blog post about how you wrote software to turn their photos into proper scans, and reject their homework (with a link to the blog post).

vidarh9y ago

Reject it, in the form of a picture of a handwritten note on a print-out of his jpegs.

1 more reply

mzucker9y ago

Thanks for the feedback, everyone -- happy to answer questions here or in the Disqus comments on my blog.

tomn9y ago

This is really interesting, thanks!

If you decide to try to make this faster, check out ceres[0] a non-linear least squares optimisation framework that does automatic differentiation using a clever C++ template hack.

I've used it a few times to solve these kind of problems and found it to be very good!

[0] http://ceres-solver.org/

mzucker9y ago

1 more reply

owernonvl9y ago

I really enjoyed the article!

mzucker9y ago

Cool - will check it out!

anilgulecha9y ago

Here's something that I think has not been done, but could be quite lucrative, building a high resolution scanner using the phone camera, multiple pictures and interpolation/noise removal.

For better us, just open the camera preview and slowly pan over the image.

Has someone tried something like this? With FOSS apps like mosaic, hdr tools and imagemagick, it should be possible. I'm guessing opencv would be needed for interpolation and noise removal..

brownbat9y ago

Syzygies9y ago

radarsat19y ago

I've always thought it should be possible using a short video of the camera orbiting the page from different angles. (Assuming a static scene of course.)

manmal9y ago

Or now, some frames of a live photo.

mzucker9y ago

That sounds like a good idea and pretty straightforward to implement -- now I almost wanna try it...

Ciantic9y ago

I wonder if someone has tried to dewarp whole book from a video when flipping through it. I imagine that could be handy way to copy the whole book.

mortenjorck9y ago

voxic119y ago

Doesn't that describe most modern cell phone cameras?

1 more reply

gamman9y ago

Maybe close enough: http://spectrum.ieee.org/automaton/robotics/robotics-softwar...

jgable9y ago

orivej9y ago

ScanTailor [1] also implements automatic dewarping and is overall great for scan postprocessing.

[1] http://scantailor.org/

dalke9y ago

I used it for about 1000 pages I photographed with a DSLR. The automatic dewarping rarely worked for me, so I ended up doing it manually.

Would love to try out another tool that could read from the ScanTailor project file to get the page segmentation or even the warping I did manually to improve on the result.

jff9y ago

He mentions adding line detection for tables, which should be a good start for sheet music.

mzucker9y ago

Yep, might just work out of the box -- give it a shot!

kazinator9y ago

If she literally scans it with a scanner, why would it need dewarping? This is for when you use a digital camera on pages not lying flat.

If you have the option go for scanning.

jgable9y ago

She scans it in a scanner, but the music is often in large books that don't lay completely flat, so you still see the warping near the binding.

3 more replies

buovjaga9y ago

You could also try ScanTailor: http://scantailor.org/

jgable9y ago

Thanks for the suggestion! I'll check it out.

amelius9y ago

vidarh9y ago

Plenty of engines will do OCR and use the shapes recognised with high certainty to affect how they detect the rest.

I'm sure there are better ways of doing this too - it's been a decade since I kept up with OCR research.

renlo9y ago

mzucker9y ago

petters9y ago

"You can see these are not exactly small optimization problems. The smallest one has 89 parameters in the model, and the largest has 600."

Those are small optmization problems. These types of problems are solved in computer vision for hundreds of thousands of variables. His problem can be solved in real-time, not tens of seconds.

mnw21cam9y ago

Compared to weather prediction, which is an optimisation problem involving hundreds of millions of variables, and occupies a hefty supercomputing cluster for a few hours.

paul_milovanov9y ago

Consider mentioning Dan Bloomberg as the author of the original work as well as Leptonica. :)

voltagex_9y ago

Thank you for the gif near the start - it really helped me to understand what was going on.

saynsedit9y ago

Is using a curve whose end points are fixed to zero to model the warping accurate? I can't see a rationale for why the end points should both be 0.

mzucker9y ago

The main rationale is removing redundant degrees of freedom -- since the page is allowed to rotate freely, the edges of the page can still move around plenty.

saynsedit9y ago

I see, so basically assume the two end points are at zero and there is some rotation accounting for the endpoint offset in real space.

It still doesn't seem fully accurate as I can imagine a non-rotated cubic curve with endpoints at an offset, but I assume your simplification works well enough.

artursapek9y ago

I've always been interested in getting into graphics programming, and stuff like this only makes me more interested. Really well written post.

nullcipher9y ago

Wrap this in a service and you have a startup!

anowell9y ago

1 more reply

j / k navigate · click thread line to collapse