NormCap: OCR powered screen-capture tool (github.com)

72 points · kitschyred · 3y ago · 24 comments


maxlin 3y ago

I created a similar basic app for my own use because I wanted to get an idea of what people are saying in Russian in online lobbies. WPF, Tesseract OCR and Microsoft's translation API.

https://streamable.com/ykng5u

A fun side project that I do end up using a bit. Gonna bind the capture to a hotkey so I can use it without changing app focus. The most annoying problem, though, is that Tesseract OCR often gets confused when it has to read mixed Latin and Cyrillic letters in a font Tesseract doesn't handle well, especially when there's something behind the text. Kind of disappointed that the most popular API often gives much worse results than a human would get just transcribing the letters.
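One setting that may help with mixed Latin/Cyrillic captures is asking Tesseract to load both language models at once via `-l rus+eng`. The sketch below only builds and runs a command line; it assumes the `tesseract` CLI plus the `rus` and `eng` traineddata files are installed. The helper names are hypothetical, and `--psm 6` ("assume a single uniform block of text") is a guess that may not suit every capture.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path: str, langs: tuple[str, ...] = ("rus", "eng"), psm: int = 6) -> list[str]:
    """Build a Tesseract CLI call that prints recognized text to stdout.

    Joining languages with '+' (e.g. 'rus+eng') lets Tesseract consider
    both scripts when recognizing the image.
    """
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs), "--psm", str(psm)]


def ocr(image_path: str) -> str:
    """Run Tesseract on an image file and return the recognized text."""
    # Fail early with a clear message if the binary is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Whether loading two models improves accuracy or just slows recognition down will depend on the font and background, so it is worth comparing against a single-language run.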

Wouldn't be surprised if OCR software took a big leap soon thanks to something similar to Whisper.

It comes to mind that the best possible app for this would be something like the old "Word Lens" iPhone app, but for the whole screen: it would replace text in the raw screen input with text in another language while keeping the appearance, color, scale and rotation of the original text. That would free it from needing to be built into whatever UI library produces the text, and it would work on recorded video too. Latency and performance problems come to mind immediately, but it could be a fun thing to try.

sitkack 3y ago

That is super neat.

Visual Universal Translator.

bjoli 3y ago

I had my mind blown by the same functionality in Android. Being able to select text from an app in the app-switching context is amazing.

I will definitely use this.

bigmattystyles 3y ago

On Windows with the new PowerToys, win+shift+t does the same thing - works really well!

https://learn.microsoft.com/en-us/windows/powertoys/text-ext...

stavros 3y ago

How do you do this on Android?

RobinL 3y ago

You can also do it with Google Lens on any Android phone, unless I'm misunderstanding the functionality.

tripdout 3y ago

It's only for Pixel phones.


quelltext 3y ago

That doesn't use OCR.

lazycouchpotato 3y ago

There are two options for text selection.

Overview selection is Pixel-exclusive and that is definitely OCR since it can detect text in images. It's not perfect however, and it doesn't seem to support non-Latin script.

The other way is to use an app like Copy [1] which analyzes app layouts.

[1] https://play.google.com/store/apps/details?id=com.weberdo.ap...

bjoli 3y ago

It can select text from images. It uses OCR.

yigitkonur35 3y ago

It is very nice to have free alternatives to this kind of software. I wanted to ask because I already use Tesseract: the most basic feature I want in this type of software is being able to edit the text in an image I screenshot, so I can use it when guiding designers. I think Tesseract supports this, but I have not seen it actively used in any project other than Project Naptha, which is no longer actively developed. If you know of a project that does this, I would like to hear about it.

csdvrx 3y ago

Is there anything on Linux for OCR that isn't based on Tesseract?

It's not very good. I miss being able to copy/paste from blurry or deformed screenshots of YouTube on Windows.

ElectricalUnion 3y ago

Tesseract might be "not very good" but it is still state-of-the-art, often available, with many languages supported.

The special sauce, meaning what you need to get a better result, is good adaptive thresholding (something more advanced than the naive global binary thresholding you get by feeding raw color/grayscale images to the OCR).

As far as I know, once you get that nailed it doesn't matter that much what OCR you use - as long as it's available and supports your target language.
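The adaptive-thresholding point can be made concrete with a small example. Below is a minimal pure-Python sketch of mean-based adaptive thresholding (the same basic idea as OpenCV's `ADAPTIVE_THRESH_MEAN_C`, though border handling differs), assuming the capture is already grayscale and stored as a list of rows with values 0 to 255. The function name and the `block`/`c` defaults are illustrative, not values any particular OCR tool is known to use.

```python
def adaptive_mean_threshold(gray, block=15, c=10):
    """Binarize a grayscale image with a per-pixel local-mean threshold.

    Each pixel is compared with the mean of the block x block window centred
    on it (clipped at the image borders) minus the constant c. Pixels above
    that local threshold become 255 (background), the rest 0 (ink).
    """
    h, w = len(gray), len(gray[0])
    # Integral image with one row/column of zero padding:
    # ii[y][x] is the sum of gray[0..y-1][0..x-1].
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    r = block // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            # Window sum in O(1) from four integral-image lookups.
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 255 if gray[y][x] > mean - c else 0
    return out
```

On a 5x10 image whose background brightens from 40 to 220 left to right, with one pixel darkened by 80, `adaptive_mean_threshold(gray, block=3, c=15)` marks only that pixel as ink, whereas a single global cut-off at 128 would also blacken the five darkest columns. That uneven-background case is roughly the "something behind the text" problem mentioned earlier in the thread.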

holbue 3y ago

As others mentioned, Tesseract is SOTA in FOSS OCR. It is also still being developed, improving slowly but steadily.

The main issue for a use-case like NormCap are the trained models: they are optimized for images of _printed_ text and layouts, which is different from on-screen-text in many aspects. Unfortunately, I don't have the resources to train my own models.

Cuneiform was a long-time competitor, but afaik development there has stalled.
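One workaround sometimes suggested for the printed-text vs. on-screen-text mismatch is to enlarge screen captures before OCR. Tesseract's own documentation says it works best on images of at least 300 DPI, while screens are commonly treated as roughly 96 DPI. This is an assumption about what helps, not something NormCap is stated to do. The helper names below are hypothetical, and nearest-neighbour scaling is used only to keep the sketch dependency-free.

```python
import math


def upscale_factor(screen_dpi: float = 96.0, target_dpi: float = 300.0) -> int:
    """Smallest integer factor that brings screen_dpi up to at least target_dpi."""
    return max(1, math.ceil(target_dpi / screen_dpi))


def upscale_nearest(gray, factor: int):
    """Nearest-neighbour enlargement of a grayscale image stored as a list of rows.

    Every pixel is repeated `factor` times horizontally, and every resulting
    row is repeated `factor` times vertically.
    """
    return [
        [px for px in row for _ in range(factor)]
        for row in gray
        for _ in range(factor)
    ]
```

With the defaults, `upscale_factor()` gives 4 (300 / 96 is about 3.1, rounded up). A real pipeline would more likely use bicubic or Lanczos resampling, which produces smoother glyph edges than the blocky nearest-neighbour result.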

m-p-3 3y ago

Is there any development on Tesseract, or at least on updating the trained models out there? Just curious.

chrispogeek 3y ago

I was just using tesseract.js and the repo looks active. Tesseract is still crap, but it's the free crap, so I'll just put up with it. Converting to grayscale seems to improve the OCR. I'm sure there are tons of other techniques to improve the result.
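The grayscale step is simple to sketch. The example below is in Python rather than tesseract.js and assumes the image is available as rows of `(r, g, b)` tuples. It uses the ITU-R 601-2 luma weights (0.299, 0.587, 0.114), the same weights Pillow documents for `Image.convert("L")`, although Pillow's integer rounding may differ by one in some cases. The function name is hypothetical.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples (0-255) into rows of 0-255 luma values.

    Green is weighted most heavily because it contributes most to perceived
    brightness; blue contributes least.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]
```

Grayscale output is also the usual input for a thresholding step such as the adaptive one ElectricalUnion describes.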

mmcwilliams 3y ago

I can't find anything backing this up at the moment but I was under the impression that Google had been upstreaming some development to the project. Open Sans recognition in particular got noticeably more reliable sometime in the last few years.

nicodjimenez 3y ago

Why not use a proprietary OCR tool like mathpix.com?

denimboy 3y ago

Keras-ocr

holbue 3y ago

Author here, excited to find my tool on HN! Happy to answer any questions.

PS: People looking for (FOSS) alternatives, look here: https://github.com/dynobo/normcap#similar-open-source-tools

tough 3y ago

Mac only, but I am a happy user and can recommend it:

https://github.com/schappim/macOCR

Just rediscovered Shortcuts a couple of days ago while installing it on a friend's Mac.

villgax 3y ago

I thought this would use macOS's native text-extraction API, which is far better than Tesseract (what this tool uses) for text in the wild.
