Meta's Segment Anything written with C++ / GGML

(github.com)

233 points by ariym 2y ago | 31 comments

ariymOP2y ago

This is a port of Meta's Segment Anything computer vision model, which allows easy segmentation of shapes in images. Originally written in Python, it has been ported to C++ by Yavor Ivanov using the GGML library created by Georgi Gerganov, which is optimized for CPU rather than GPU, specifically Apple Silicon M1/M2. The repo is still in its early stages.

dekhn2y ago

Do you know how long the image embedding takes? In SAM, most of the time is spent generating a very expensive embedding (prohibitive for real-time object detection). From the timing on your page it looks like yours is similarly slow, but I'm curious how it compares to Meta's PyTorch implementation.

yavorgiv2y ago

I am the creator of the repo.

It depends on the machine, the number of threads selected, and the model checkpoint used (ViT-B, ViT-L or ViT-H). The attached video demo is running on an Apple M2 Ultra using the ViT-B model. Generating the image embedding takes ~1.9 s there, and each subsequent mask segmentation takes ~45 ms.
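To show why the one-time embedding dominates, here is a back-of-the-envelope sketch using only the ~1.9 s embedding and ~45 ms per-mask figures quoted above for the M2 Ultra / ViT-B setup. The constants are those reported numbers, not new measurements, and the function names are hypothetical.

```python
# Figures quoted above: Apple M2 Ultra, ViT-B checkpoint. Other machines will differ.
EMBED_SECONDS = 1.9   # one-time image embedding per image
MASK_SECONDS = 0.045  # each subsequent mask decoded from that embedding


def total_seconds(n_masks: int) -> float:
    """Wall time to embed one image once, then decode n_masks masks from it."""
    return EMBED_SECONDS + n_masks * MASK_SECONDS


def seconds_per_mask(n_masks: int) -> float:
    """Average cost per mask once the one-time embedding is amortized."""
    if n_masks <= 0:
        raise ValueError("n_masks must be positive")
    return total_seconds(n_masks) / n_masks


for n in (1, 10, 100):
    print(f"{n:>3} masks: {total_seconds(n):.3f} s total, {seconds_per_mask(n):.3f} s/mask")
```

With these numbers, 10 masks cost about 2.35 s in total (about 0.235 s each) and 100 masks about 6.4 s (about 0.064 s each), so it pays to compute the embedding once per image and reuse it for every prompt.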

However, I am now focusing on improving the inference speed by making better use of ggml and trying out quantization. Once I make some progress in this direction I will compare to other SAM alternatives and benchmark more thoroughly.
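For readers unfamiliar with quantization, the sketch below shows block-wise symmetric 4-bit quantization: weights are split into blocks, and each block stores one float scale plus small signed integers. It is similar in spirit to ggml's 4-bit block formats such as Q4_0, but it is not ggml's actual encoding or API; the block size of 32 and the [-7, 7] integer range are assumptions chosen for illustration.

```python
from typing import List, Tuple

Block = Tuple[float, List[int]]  # (scale, quantized ints)


def quantize(weights: List[float], block_size: int = 32, bits: int = 4) -> List[Block]:
    """Split weights into blocks; keep one float scale plus small signed ints per block."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, so ints lie in [-7, 7]
    blocks: List[Block] = []
    for start in range(0, len(weights), block_size):
        chunk = weights[start:start + block_size]
        amax = max(abs(w) for w in chunk)
        scale = amax / qmax if amax > 0 else 1.0  # avoid division by zero on all-zero blocks
        q = [max(-qmax, min(qmax, round(w / scale))) for w in chunk]
        blocks.append((scale, q))
    return blocks


def dequantize(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate weights as scale * q."""
    out: List[float] = []
    for scale, q in blocks:
        out.extend(scale * x for x in q)
    return out


weights = [0.1 * i - 3.2 for i in range(64)]
restored = dequantize(quantize(weights))
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs error: {max_err:.4f}")
```

Rounding keeps each weight's error within half its block's scale. If the scale were stored as a 16-bit float, this layout would cost (32 * 4 + 16) / 32 = 4.5 bits per weight instead of 32.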

billrobertson42 2y ago

This is amazing. Thank you!

unshavedyak2y ago

Well... damn. Is there a framework like this (or this directly?) which can run object detection? People, car types, makes, animals, etc?

yeldarb2y ago

Yes, GroundingDINO is an open-set object detector. There are some others (e.g. DETIC and OWL-ViT) as well.

We’ve been working on using them (often in conjunction with SAM) for auto-labeling datasets to train smaller faster models that can run in real-time at the edge: https://github.com/autodistill/autodistill
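One step an auto-labeling pipeline like this might need is turning a SAM-style binary mask into a box label that a detector can train on. The sketch below assumes the mask is a list of 0/1 rows and emits a normalized (center x, center y, width, height) box in the style of YOLO label files; the helper names are hypothetical and this is not autodistill's API.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 pixels, indexed mask[y][x]


def mask_to_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tight (x_min, y_min, x_max, y_max) box around nonzero pixels, inclusive; None if empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return min(xs), ys[0], max(xs), ys[-1]


def bbox_to_yolo(box: Tuple[int, int, int, int], width: int,
                 height: int) -> Tuple[float, float, float, float]:
    """Inclusive pixel box -> (center_x, center_y, w, h), each normalized to [0, 1]."""
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0 + 1, y1 - y0 + 1
    return ((x0 + bw / 2) / width, (y0 + bh / 2) / height, bw / width, bh / height)


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = mask_to_bbox(mask)
print(box, bbox_to_yolo(box, width=5, height=4))
```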

fiddlerwoaroof2y ago

Would this be suitable for labeling images to search by keyword (think Apple Photos-like “car” searches to pull up photos of cars)?

lulurennt2y ago

I think you would want to use something like CLIP embeddings for image search.

Really enjoyed using this app for iOS: https://github.com/mazzzystar/Queryable HN discussion: https://news.ycombinator.com/item?id=34686947

Or explore the dataset Stable Diffusion was trained on: https://news.ycombinator.com/item?id=32655497
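The search step with CLIP-style embeddings boils down to ranking stored image vectors by cosine similarity to a query vector. A minimal sketch, assuming the embeddings were already computed by some CLIP model; the 3-dimensional vectors and file names below are made-up toy values (real CLIP embeddings have hundreds of dimensions).

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query: List[float], index: Dict[str, List[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (image name, score) pairs, best match first."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for real CLIP image vectors.
index = {
    "beach.jpg": [0.1, 0.9, 0.2],
    "car.jpg": [0.9, 0.1, 0.1],
    "truck.jpg": [0.8, 0.2, 0.3],
}
query = [1.0, 0.0, 0.1]  # stand-in for the text embedding of "car"
for name, score in search(query, index):
    print(f"{name}: {score:.3f}")
```

In practice the text query is embedded with the same model as the images, so both land in one shared vector space.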

Tostino2y ago

I am looking for a model similar to this, but for text. I want to group text with different labels that apply to subsets of the text. Think of being able to quickly pull out related segments from a large body of text. Take, for instance, a sales contract that specifies discounted prices for various goods. If you select the label "data rows", the system should be able to extract all the text pertaining to the table that specifies which SKUs are being purchased, and at what discounted price. Moreover, this model should be capable of segmenting the content into semantically relevant chunks. One example: each row in the aforementioned table would be tagged with multiple labels. One label would simply mark it as a row, and the data in the first column would be labeled with what it represents, e.g. "product number". Another example: if there's a section discussing the terms of delivery or warranty conditions, selecting the respective label would instantly extract that specific information, regardless of where it's located within the document. It would be great if it could segment into some controllable range of tokens/characters, so those chunks can be pulled into a vector database along with the relevant tags.
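The "controllable range of tokens" part of this wish (not the semantic labeling, which would need a model) can be sketched as fixed-size windows with overlap, each carrying caller-supplied tags before it goes into a vector database. Assumptions: whitespace splitting stands in for a real tokenizer, and the "data rows" tag is attached by the caller rather than inferred.

```python
from typing import Dict, List, Sequence


def chunk_text(text: str, max_tokens: int, overlap: int = 0,
               tags: Sequence[str] = ()) -> List[Dict]:
    """Split text into windows of at most max_tokens tokens that share `overlap` tokens."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    tokens = text.split()  # crude whitespace "tokenizer"; real tokenizers differ
    step = max_tokens - overlap
    chunks: List[Dict] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        chunks.append({"start": start, "text": " ".join(window), "tags": list(tags)})
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


rows = "SKU-1 10.00 SKU-2 12.50 SKU-3 9.99"
for c in chunk_text(rows, max_tokens=4, overlap=2, tags=["data rows"]):
    print(c)
```

Overlap keeps a row that straddles a window boundary intact in at least one chunk, at the cost of storing some tokens twice.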

artninja1988 2y ago

Big fan of your work, GGML friends

lelag2y ago

Another GGML model port that I'm pretty excited about is https://github.com/PABannier/bark.cpp.

The Bark Python model is very compute-intensive and requires a powerful GPU to get bearable inference speed. I really hope that bark.cpp with GPU/Metal support and a quantized model can bring useful inference speed to a laptop in the near future.

accurrent2y ago

Hmm, I wonder how this compares to stuff like FastSAM and MobileSAM. Is a quantized SAM better, or are those knock-off architectures more performant?
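Answering this fairly means timing each model the same way on the same machine. A minimal sketch of such a harness: untimed warm-up calls, then the median of several timed runs. `fake_decode` is a hypothetical stand-in workload; in a real comparison it would be a call into SAM, FastSAM or MobileSAM.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> float:
    """Median wall-clock seconds of fn() over `runs` timed calls, after untimed warm-ups."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # untimed: lets caches, allocators and thread pools settle
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


def fake_decode() -> int:
    """Hypothetical placeholder for one model inference call."""
    return sum(i * i for i in range(10_000))


print(f"median: {benchmark(fake_decode) * 1e3:.2f} ms")
```

The median is less sensitive than the mean to one-off stalls, which matters when comparing runs in the tens of milliseconds.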

fzaninotto2y ago

Bravo, the demonstration is genuinely impressive!

Next Step: Incorporate this library into image editors like Photopea (via WebAssembly) to boost the speed of common selection tasks. The magic wand is a tool of the past.

I'd pay for such a feature.

farhanhubble2y ago

While I love the efficiency of these Python-to-C++ ports, I can't stop thinking about the long tail of subtle bugs that will likely infest these libraries forever. But then, the Python versions also sit atop C/C++ cores.
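One common way to catch subtle porting bugs in a segmentation model is to run the reference (Python) and ported (C++) versions on the same image and prompt, then compare the output masks by intersection-over-union. A minimal sketch, assuming both masks have already been exported as same-sized 0/1 grids; the 0.99 threshold is an arbitrary illustration, not a recommended value.

```python
from typing import List

Mask = List[List[int]]  # rows of 0/1 pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two same-shaped binary masks (1.0 if both are empty)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for ra, rb in zip(a, b):
        for x, y in zip(ra, rb):
            inter += bool(x) and bool(y)
            union += bool(x) or bool(y)
    return inter / union if union else 1.0


def masks_match(reference: Mask, ported: Mask, min_iou: float = 0.99) -> bool:
    """Flag a port's output as matching the reference if IoU clears the threshold."""
    return mask_iou(reference, ported) >= min_iou


reference = [[0, 1, 1], [0, 1, 0]]
ported = [[0, 1, 0], [0, 1, 1]]
print(mask_iou(reference, ported), masks_match(reference, ported))
```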

wmf2y ago

Good news! Deep learning inherently has a long tail of subtle bugs (SolidGoldMagikarp anyone?) so no one will care if C++ introduces a few more.

hoseja2y ago

Just because Python silently ignores the bugs doesn't mean they're not there.

OccamsMirror2y ago

Just wait until they’re ported to C++ using AI!

IshKebab2y ago

I'm so glad the AI community is finally starting to ditch Python. It has held progress back for far too long.

lelag2y ago

The AI community is nowhere close to ditching Python. Most model development and training still uses Python-based toolchains (torch, tf...). The new trend is for popular and useful models to be ported to more efficient stacks like C++/GGML for easier usage and faster inference on consumer hardware.

Another popular optimisation is to port models to WASM + GPU, because it makes it easy to support a variety of platforms (desktop, mobile...) with a single API while still offering great performance (see Google's MediaPipe as an example of that).

IshKebab2y ago

That's why I said "starting to" not "close to".

fsloth2y ago

In general if you don't know what you are doing, it's much faster to first figure out a good strategy for a solution in a language that does not suffer from all of the encumbrance C++ brings in.

Python is really great for fast prototyping. It can be argued that most AI products so far are the result of fast prototyping. So I'm not sure there is anything wrong with that.

As practical models emerge, at that point it indeed makes sense to port them to C++. But I would not in my wildest dreams suggest prototyping a data model in C++ unless absolutely necessary.

dbmikus2y ago

How has Python held it back? Most of the heavy computational lifting is done by C extensions/bindings, and the models are compiled to run on CUDA, etc. What am I missing?

lhl2y ago

Presumably what you're missing is that IshKebab probably doesn't work in AI/ML at all (no links in his profile, but you can judge his post history yourself). Anyone can voice an opinion, but that doesn't mean it's particularly well informed.

IshKebab2y ago

I worked for an AI startup for 5 years until recently. Nice try though.

IshKebab2y ago

Setting up and deploying models in production or on edge devices is much much more complex if you have to deal with Python and Conda and whatnot.

dbmikus2y ago

You can compile the models to something that runs on edge though, right? For example, TensorFlow is a C++ framework that has Python bindings and a Python library, but when the models are served they are running on C++.

Maybe the act of compilation is an extra step, but I'd much rather have my development be in a high level language that is very suited to experimentation, probing, and testing, and then compile the final result down to something performant.

EDIT: I don't know much about the IoT world, and TensorFlow is likely a bad example as it's not designed to run on edge. So I can understand that things like llama.cpp, GGML and GGUF are making strides towards easier runtimes. But I still think that for dev-time, Python makes sense!

Havoc2y ago

I’d say discovery and innovation would be slower in a less relaxed language. And speed ends up comparable thanks to the compiled parts of Python.

jebarker2y ago

This is exactly the wrong way around. We've seen the progress we've seen because of the adoption of Python. Even now there are relatively few people that can write code like this and have the ML and math experience to push forward the research.