https://github.com/tesseract-ocr/tessdata
https://en.wikipedia.org/wiki/Tesseract_(software)
The demo, of course, works perfectly on a Mac, as this is already built into Ventura.
If you haven't experienced it yet, ye olde ctrl-f now seamlessly sneaks a peek into images on the page, for example; surprisingly useful.
In November 2020, Brewster Kahle from the Internet Archive praised Tesseract, saying:
Tesseract has made a major step forward in the last few years. When we last evaluated the accuracy it was not as good as the proprietary OCR, but that has changed; we have done evaluations and it is just as good, and can get better for our application because of its new architecture.
Anybody have an up-to-date breakdown of available OCR solutions?

It's command-line driven but can display the detected text as an overlay of the document.
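For what it's worth, tesseract itself can produce that kind of overlay: asking for `pdf` output writes a searchable PDF with an invisible text layer on top of the original image. A minimal sketch (`scan.png` is a stand-in for your input):

```shell
# Emit a searchable PDF: the image stays visible, and the recognized
# text sits in an invisible layer you can select and search.
tesseract scan.png out pdf   # writes out.pdf
```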
Back in the day, Cuneiform got close to Tesseract's performance, but AFAIK it wasn't developed further...
Does anyone else know other promising open-source OCR engines?
Among many other things, it offers OCR of any region on the screen.
For larger-scale OCR processing of PDFs and other files, I love how s3-ocr https://simonwillison.net/2022/Jun/30/s3-ocr/ makes working with AWS Textract OCR more accessible (though, somehow, Textract refuses to fully OCR some larger PDFs I possess...).
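From memory, the s3-ocr workflow looks roughly like this; the command names and flags here are my recollection and may be off, so check the linked post for the real interface:

```shell
# Sketch from memory of the s3-ocr workflow (names/flags may be inexact):
bucket=my-ocr-bucket                          # hypothetical bucket name
s3-ocr start "$bucket" --all                  # queue the bucket's PDFs for Textract
s3-ocr status "$bucket"                       # poll until the jobs finish
s3-ocr text "$bucket" paper.pdf > paper.txt   # fetch the extracted text
```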
Try Command+Shift+4, grab part of the screen, click the pop-up, and just select text.
It does quite a bit of preprocessing on the PDF pages before passing them on to Tesseract.
Google's was by far the best, especially for obscured or malformed characters. Azure was second and I ended up merging the results from both.
For my use case (in Spring 2019) Tesseract was not very accurate and struggled with slanted text especially. Hopefully that has changed.
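If anyone still hits the slanted-text problem, deskewing first with ImageMagick often helps; a sketch under the assumption your input is `scan.png` (this is ImageMagick's deskew, not a Tesseract feature):

```shell
# Straighten mildly rotated text, then OCR the corrected image.
convert scan.png -deskew 40% +repage deskewed.png
tesseract deskewed.png out -l eng
```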
https://learn.microsoft.com/en-us/windows/powertoys/text-ext...
Seems dishonest to me, but maybe I'm just too strict.
#!/usr/bin/env bash
langs=(eng ara fas chi_sim chi_tra deu ell fin heb hun jpn kor nld rus tur)
lang=$(printf '%s\n' "${langs[@]}" | dmenu "$@")
maim -us | tesseract --dpi 145 -l eng+${lang} - - | xsel -bi

#!/bin/bash
SRC_IMG=$(mktemp -u /tmp/ocr_XXXXXXXXX.png)
scrot --select "$SRC_IMG" -q 100
mogrify -modulate 100,0 -resize 400% "$SRC_IMG"
tesseract "$SRC_IMG" "$SRC_IMG" &> /dev/null
OCR_RESULT=$(cat "$SRC_IMG.txt")
echo "$OCR_RESULT"
notify-send "$OCR_RESULT"
xsel -bi < "$SRC_IMG.txt"

grim -g "$(slurp)" - | tesseract --dpi 145 -l eng+${lang} - - | wl-copy
Using grim to take a screenshot, slurp to mark a region on your screen, and wl-copy to copy to the clipboard.
#!/usr/bin/env bash
rm -f /tmp/screen.png
flameshot gui -p /tmp/screen.png
tesseract \
  -c page_separator="" \
  -l "eng" \
  --dpi 145 \
  /tmp/screen.png /tmp/screen
if [ "$(wc -l < /tmp/screen.txt)" -eq 0 ]; then
notify-send "ocrmyscreen" "No text was detected!"
exit 1
fi
xclip -selection clipboard /tmp/screen.txt
notify-send "ocrmyscreen" "$(cat /tmp/screen.txt)"
[0]: https://flameshot.org/

Linux: dpScreenOCR (X11 only, last I checked) and now Frog
macOS: Screenotate, Prizmo
Windows: Screenotate
I don't get all the nitpicky comments. OCR tools like this are extremely useful when excerpting text from certain websites (Slack) or taking class notes from videos.