I'd recommend the Spotlight paper by Google[1]. They created some very interesting datasets for this purpose, and they mention an in-house screen-action-screen dataset that they don't appear to be releasing. Maybe owning Android has its advantages.
There's also a recent release by Hugging Face called IDEFICS[2] that claims to be an open-source reproduction of Flamingo (an earlier DeepMind model for few-shot multimodal task understanding), and I think this space will be heating up soon.
[1] https://research.google/pubs/pub52171/
[2] https://huggingface.co/blog/idefics