undefined | Better HN

0 pointsjskherman2y ago0 comments

What kind of approach did you take? I was thinking along the lines of requiring something like rewind.ai or some program that autoscreenshots your screen at a set interval (or originally a recorded video split into several images later) and having a vision-capable model (particularly specialized in UIs) describe these set of images in order to build a dataset of images-tags-description and the like.

0 comments

jskhermanOP2y ago

There's also libraries like trafilatura in Python featured here in HN some time ago that could extract content from websites to help augment the data.

j / k navigate · click thread line to collapse

0 comments

jskhermanOP2y ago

There's also libraries like trafilatura in Python featured here in HN some time ago that could extract content from websites to help augment the data.

j / k navigate · click thread line to collapse