https://www.behance.net/gallery/29122113/Pelican-on-bikes-wi...
There are other such images. Not an image model? How do we know that they don't convert all images to SVG and train an LLM on them? How do we know that they don't cheat on this benchmark by routing the query to an image model first?