https://github.com/tesseract-ocr/tessdata
https://en.wikipedia.org/wiki/Tesseract_(software)
The demo, of course, works perfectly on a Mac, as this is already built into Ventura.
If you haven't experienced it yet, ye olde ctrl-f now seamlessly sneaks a peek into images on the page, for example; surprisingly useful.
In November 2020, Brewster Kahle from the Internet Archive praised Tesseract, saying:
Tesseract has made a major step forward in the last few years. When we last evaluated the accuracy it was not as good as the proprietary OCR, but that has changed; we have done evaluations and it is just as good, and can get better for our application because of its new architecture.
Anybody have an up-to-date breakdown of available OCR solutions?

It's command-line driven but can display the detected text as an overlay of the document.
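For what it's worth, tesseract itself can produce that kind of overlay: asking for `pdf` output writes a searchable PDF with an invisible text layer on top of the original image. A minimal sketch (`scan.png` is a stand-in for your input):

```shell
# Emit a searchable PDF: the image stays visible, and the recognized
# text sits in an invisible layer you can select and search.
tesseract scan.png out pdf   # writes out.pdf
```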
Back in the day, Cuneiform got close to Tesseract's performance, but AFAIK it wasn't developed further...
Does anyone else know other promising open-source OCR engines?
Among many other things, it offers OCR of any region on the screen.
For larger-scale OCR processing of PDFs and other files, I love how s3-ocr https://simonwillison.net/2022/Jun/30/s3-ocr/ makes working with AWS Textract OCR more accessible (though, somehow, Textract refuses to fully OCR some larger PDFs I possess...).
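From memory, the s3-ocr workflow looks roughly like this; the command names and flags here are my recollection and may be off, so check the linked post for the real interface:

```shell
# Sketch from memory of the s3-ocr workflow (names/flags may be inexact):
bucket=my-ocr-bucket                          # hypothetical bucket name
s3-ocr start "$bucket" --all                  # queue the bucket's PDFs for Textract
s3-ocr status "$bucket"                       # poll until the jobs finish
s3-ocr text "$bucket" paper.pdf > paper.txt   # fetch the extracted text
```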
Try Command+Shift+4, grab part of the screen, click the pop-up, and just select text.
It does quite a bit of preprocessing on the PDF pages before passing them on to Tesseract.
Google's was by far the best, especially for obscured or malformed characters. Azure was second and I ended up merging the results from both.
For my use case (in Spring 2019) Tesseract was not very accurate and struggled with slanted text especially. Hopefully that has changed.
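If anyone still hits the slanted-text problem, deskewing first with ImageMagick often helps; a sketch under the assumption your input is `scan.png` (this is ImageMagick's deskew, not a Tesseract feature):

```shell
# Straighten mildly rotated text, then OCR the corrected image.
convert scan.png -deskew 40% +repage deskewed.png
tesseract deskewed.png out -l eng
```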
https://learn.microsoft.com/en-us/windows/powertoys/text-ext...
Seems dishonest to me, but maybe I'm just too strict.
#!/usr/bin/env bash
langs=(eng ara fas chi_sim chi_tra deu ell fin heb hun jpn kor nld rus tur)
lang=$(printf '%s\n' "${langs[@]}" | dmenu "$@")
maim -us | tesseract --dpi 145 -l eng+${lang} - - | xsel -bi

#!/bin/bash
SRC_IMG=$(mktemp -u /tmp/ocr_XXXXXXXXX.png)
scrot --select "$SRC_IMG" -q 100
mogrify -modulate 100,0 -resize 400% "$SRC_IMG"
tesseract "$SRC_IMG" "$SRC_IMG" &> /dev/null
OCR_RESULT=$(cat "$SRC_IMG.txt")
echo "$OCR_RESULT"
notify-send "$OCR_RESULT"
xsel -bi < "$SRC_IMG.txt"

grim -g "$(slurp)" - | tesseract --dpi 145 -l eng+${lang} - - | wl-copy
Using grim to take a screenshot, slurp to mark a region on your screen, and wl-copy to copy to the clipboard.
#!/usr/bin/env bash
rm -f /tmp/screen.png
flameshot gui -p /tmp/screen.png
tesseract \
  -c page_separator="" \
  -l "eng" \
  --dpi 145 \
  /tmp/screen.png /tmp/screen
if [ "$(wc -l < /tmp/screen.txt)" -eq 0 ]; then
notify-send "ocrmyscreen" "No text was detected!"
exit 1
fi
xclip -selection clipboard /tmp/screen.txt
notify-send "ocrmyscreen" "$(cat /tmp/screen.txt)"
[0]: https://flameshot.org/

Linux: dpScreenOCR (X11 only, last I checked) and now Frog
macOS: Screenotate, Prizmo
Windows: Screenotate
I don't get all the nitpicky comments. OCR tools like this are extremely useful when excerpting text from certain websites (Slack) or taking class notes from videos.