Ask HN: How do AI companies collect data to train models?
1 point
piotrke
2y ago
From the Internet, obviously, but how? Are they crawling every website out there, enumerating them by IP address or domain name? Or do they piggyback on Google? Or is there an all-Internet data store from which they can just download the latest 'Internet data' dump?
1 comment
richardjam73
2y ago
They use datasets like Common Crawl.
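To make that concrete: Common Crawl publishes its crawls as WARC archives plus a URL index, so nobody has to re-crawl the web to get a dump. Below is a minimal sketch of how one might look up a site in that index and work out which bytes to download. It only builds the request URL and the `Range` header and makes no network calls. The crawl ID `CC-MAIN-2023-50`, the helper names, and the sample record are placeholders chosen for illustration; the record fields (`filename`, `offset`, `length`) are assumed to match what the index returns as JSON.

```python
from urllib.parse import urlencode

# Public Common Crawl endpoints (index lookups and archive downloads).
INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def index_query_url(crawl_id, url_pattern):
    """Build an index query URL for one crawl (hypothetical helper)."""
    query = urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{query}"


def record_request(record):
    """Turn one index record into (download URL, headers).

    Assumes the record carries 'filename', 'offset' and 'length',
    which locate a single compressed WARC record inside a larger file.
    """
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # HTTP byte ranges are inclusive
    url = f"{DATA_HOST}/{record['filename']}"
    return url, {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    # Placeholder crawl ID; substitute a crawl that actually exists.
    print(index_query_url("CC-MAIN-2023-50", "example.com/*"))

    # Made-up record, shaped like one JSON line from the index.
    sample = {"filename": "crawl-data/sample.warc.gz", "offset": "100", "length": "50"}
    print(record_request(sample))
```

Fetching the returned URL with that `Range` header would yield one gzip-compressed WARC record, which can then be decompressed and parsed. Bulk training pipelines more likely download whole archive files rather than querying per URL, but the lookup shows how the data is laid out.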