rjzzleep 3y ago

You might be interested in https://github.com/ocrmypdf/OCRmyPDF then.

It does quite a bit of preprocessing on the PDF pages before passing them on to Tesseract.

angrygoat 3y ago

I've found ocrmypdf to be excellent: the only issue I've had is with PDFs with differing page sizes; it seems to scale everything up to the size of the largest page, which can be a bit of a pain.
