Which involves taking some rolling papers, a pouch of loose tobacco (or whatever), and perhaps a little device if you're rich. As opposed to manufactured cigarettes, you're just doing some manual assembly for the end-product.
You don't need to cultivate the plants or pulp any trees to roll your own.
Not quite. Serverless means you can run a server permanently, but you need to pay someone else to manage the infrastructure for you.
https://github.com/zai-org/GLM-OCR
(Shameless plug: I also maintain a simplified version of GLM-OCR without dependency on the transformers library, which makes it much easier to install: https://github.com/99991/Simple-GLM-OCR/)
I do agree with the use of "serverless" though. I feel like we agreed long ago that serverless just means you're not spinning up a physical or virtual server yourself, but simply asking some cloud infrastructure to run your code, without having to care about how it's run.
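The "just run my code" idea looks roughly like this in practice: a bare handler function that the platform invokes per request (AWS Lambda-style signature shown; the event shape here is my assumption, not any particular provider's contract):

```python
import json

def handler(event, context=None):
    # The platform owns the process and calls this per request;
    # you never see the server underneath. The "name" field in
    # the event payload is a made-up example.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

if __name__ == "__main__":
    # Locally we fake the platform's invocation.
    print(handler({"name": "serverless"}))
```

The point of the sketch: your deliverable is the function, not the machine it runs on.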
'Serverless' has become a term of art: https://en.wikipedia.org/wiki/Serverless_computing
> Serverless is a misnomer
But this caught me for a bit as well. :-)
I use carless transportation (taxis).
ocrarena.ai maintains a leaderboard, and a number of other open source options like dots [1] or olmOCR [2] rank higher.
#!/usr/bin/env bash
# requires: tesseract-ocr imagemagick maim xsel
IMG=$(mktemp)
trap 'rm -f "$IMG"*' EXIT
# --nodrag means click twice instead of dragging
maim -s --nodrag --quality=10 "$IMG.png"
# grayscale and upscale; should increase detection rate
mogrify -modulate 100,0 -resize 400% "$IMG.png"
tesseract "$IMG.png" "$IMG" &>/dev/null
xsel -bi < "$IMG.txt"
notify-send "Text copied" "$(cat "$IMG.txt")"
exit

My client's use case was specific to scanning medical reports, but since there are thousands of labs in India with slightly different formats, I built an LLM agent that runs only after the pdf/image-to-text step, to double-check the medical terminology. And even then, only if our code can't already process each text line through simple string/regex matches.
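The regex-first, LLM-as-fallback routing described above can be sketched like this (the patterns and the `llm_check` hook are hypothetical placeholders, not the actual client code):

```python
import re

# Hypothetical patterns for a couple of common lab-report lines.
PATTERNS = [
    re.compile(r"(?P<test>Hemoglobin)\s*[:\-]?\s*(?P<value>\d+(\.\d+)?)\s*(?P<unit>g/dL)", re.I),
    re.compile(r"(?P<test>Glucose)\s*[:\-]?\s*(?P<value>\d+(\.\d+)?)\s*(?P<unit>mg/dL)", re.I),
]

def parse_line(line, llm_check=None):
    """Try cheap regex matches first; defer to the LLM only on failure."""
    for pat in PATTERNS:
        m = pat.search(line)
        if m:
            return {"source": "regex", **m.groupdict()}
    if llm_check is not None:
        # Expensive path: only lines no pattern recognized get here.
        return {"source": "llm", "raw": llm_check(line)}
    return None

print(parse_line("Hemoglobin: 13.5 g/dL"))
```

The design choice is cost control: the LLM never sees the (majority of) lines that a few regexes already handle.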
There are probably far more efficient tools for much of the work we throw at LLMs.
> In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a single A100-40G).
That... doesn't sound legal
I like to push as much as I can into the image. So in the image build, I would run a command to trigger downloading the model, then in the app just point to the locally downloaded model. Bigger image, but no need to re-download on startup.
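That bake-it-into-the-image pattern can be sketched as a download step that becomes a no-op once the model file is already present (`MODEL_URL` and `MODEL_PATH` are placeholders, not a real endpoint or layout):

```python
import os
import urllib.request

MODEL_URL = "https://example.com/model.bin"   # placeholder URL
MODEL_PATH = "/models/model.bin"              # path baked into the image

def ensure_model(url=MODEL_URL, path=MODEL_PATH):
    """Run once during the image build; at app startup it's a no-op."""
    if os.path.exists(path):
        return path                            # already baked in, skip download
    os.makedirs(os.path.dirname(path), exist_ok=True)
    urllib.request.urlretrieve(url, path)      # one-time fetch during build
    return path
```

Calling `ensure_model()` from the image-build step pays the download cost once; the app then just opens `MODEL_PATH` on every cold start.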
I have 4 of these now, some are better than others. But all worked great.
step 1 draw a circle
step 2 import the rest of the owl