Silent Data Corruptions: The Boogeyman of LLM Training
(adept.ai)
31 points
jmintz
2y ago
5 comments
auraham
2y ago
Interesting post. It would be much better if the author included a few code snippets to show how to identify the failing GPU during training.
ejro
2y ago
Interesting. This is probably a universal problem for large model training, but it is not being discussed enough.
adeptlo
2y ago
Super interesting problem that's affecting more people than they probably realize.
osavant
2y ago
Super interesting, thanks for putting this together.
ibeitia
2y ago
Fascinating read!