Usually you can trivially run a model on CPU or GPU just by calling `.cpu()` (or `.cuda()`) in a few specific places, so he's wondering why that isn't the case here.
that's literally all I did (plus switching the tensor type). I'd imagine people are posting and upvoting this not because it's actually interesting code, but because it runs unexpectedly fast on consumer CPUs, which isn't something they considered feasible before.
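For context, the change being described is roughly the following (a minimal sketch; the toy model here is a stand-in, not the actual code being discussed):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; the real code just does the same two things
# to its own model: move it to CPU and switch the tensor type.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# GPU inference code often uses half precision (float16); on CPU,
# plain float32 is the usual choice, so convert alongside .cpu().
model = model.cpu().float()

x = torch.randn(1, 8)  # input tensor, created on CPU by default
with torch.no_grad():
    y = model(x)

print(y.device)  # cpu
print(y.dtype)   # torch.float32
```

That really is the whole trick: every tensor and parameter just has to live on the same device with a dtype the CPU kernels support.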
That vastly underestimates how tricky it is to get a novel piece of software running. There's a huge fringe of people who know how to click things but not how to use a terminal, and another large fringe who know how to run "./execute.bat" but not how to write syntactically correct Python.
PyTorch is probably heavily optimized for x86; it's likely using lots of SIMD and whatnot. I'm sure similar performance is possible on M1 Macs, but not with the current version of PyTorch.
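You can actually check at runtime whether a given PyTorch build links the SIMD-accelerated x86 backends (MKL and oneDNN); a quick sketch:

```python
import torch

# On x86, PyTorch typically links against MKL (BLAS) and oneDNN (mkldnn),
# which provide the vectorized SIMD kernels in question. These flags
# report whether the current build includes them.
print("MKL available:   ", torch.backends.mkl.is_available())
print("oneDNN available:", torch.backends.mkldnn.is_available())

# The full build configuration, including the BLAS backend and the CPU
# capabilities it was compiled for, is available as a string:
print(torch.__config__.show())
```

On an M1 Mac these backends would report differently (or be absent), which is consistent with the performance gap described above.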