This is colossal. It can create embeddings for pretty much any type of format: video, audio, documents. The context window is still a bit small compared to what we are used to in text, but this seems major.
How does it compare with Qwen's open-weight multimodal embedding model? Anyone know? This seems lesser from what I read, with the added drawback of being behind an API/model I don't have control over. Qwen gives great embeddings out of the gate while also being steerable, i.e. you can supply a prompt to focus the embedding on specific tasks with higher resolution, which in my tests has been mind-blowingly good. Not seeing the value add here.
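For context on the steering point: instruction-aware embedding models typically take a task description prepended to the query text, while documents are embedded as-is. A minimal sketch of that prompt format (based on the convention Qwen's embedding models document; the task and query strings here are made-up examples):

```python
def build_query(task: str, text: str) -> str:
    # The task instruction is prepended to the *query* only;
    # documents/passages are embedded without any instruction.
    return f"Instruct: {task}\nQuery: {text}"

q = build_query(
    "Given a caption, retrieve matching microscopy images",  # example task
    "mitochondria under fluorescence staining",              # example query
)
print(q)
```

Swapping the task string changes what the resulting embedding emphasizes, which is the "steering" being described.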
The steerability point is interesting. Have you tried using task-specific prompts for cross-modal retrieval, though? Like searching images with text queries. Curious whether Qwen's prompt-based steering actually helps there or if it mainly improves same-modality tasks. The 3072-dim space seems tight for encoding all those modalities well.
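For what it's worth, once everything lives in one shared space, cross-modal retrieval is just nearest-neighbor search over unit vectors, whichever model produced them. A toy sketch with random stand-in embeddings (not real model outputs; `DIM = 3072` only mirrors the dimension mentioned above, and the query is deliberately nudged toward one image so the example has a known answer):

```python
import numpy as np

DIM = 3072  # shared embedding dimension discussed above
rng = np.random.default_rng(0)

def normalize(v):
    # Unit-normalize so a dot product equals cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-ins for image embeddings from a multimodal encoder.
image_embs = normalize(rng.normal(size=(5, DIM)))

# Stand-in for a text-query embedding, nudged toward image 3
# so the toy retrieval has a predictable result.
noise = normalize(rng.normal(size=DIM))
query = normalize(0.9 * image_embs[3] + 0.1 * noise)

scores = image_embs @ query    # cosine similarities, shape (5,)
best = int(np.argmax(scores))  # index of the best-matching image
print(best)                    # -> 3
```

Whether a steering prompt helps here comes down to whether it moves the *text* embedding closer to the right region of the image side of the space, which is exactly what's hard to predict without testing.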
It does well in my tests, limited as they were. In particular it held up on zero-shot tasks in niche domains that have historically been (and possibly still are) underrepresented in training data, e.g. microscopy.