Edge TPU – Run Inference at the Edge (opens in new tab)

(cloud.google.com)

43 pointsobulpathi7y ago12 comments

12 comments

From https://en.wikipedia.org/wiki/Edge_computing:

Edge computing is a method of optimizing cloud computing systems "by taking the control of computing applications, data, and services away from some central nodes (the "core") to the other logical extreme (the "edge") of the Internet" which makes contact with the physical world.[1] In this architecture, data comes in from the physical world via various sensors, and actions are taken to change physical state via various forms of output and actuators; by performing analytics and knowledge generation at the edge, communications bandwidth between systems under control and the central data center is reduced. Edge Computing takes advantage of proximity to the physical items of interest also exploiting relationships those items may have to each other.

TeMPOraL7y ago

Basically, how such computing should be done? Preferably with little to none of the data reaching the central server.

Funny how computing got centralized, and now is slowly getting decentralized again. I'm happy to see tech for that developing, but I worry that data ownership will continue to be centralized.

obulpathiOP7y ago

Eagerly waiting for Dev kits - to be released in couple of months!

deepnotderp7y ago

Looking at the image, looks like the die size is around 10-36 mm2,so presumably it's meant to be an ultra low performance chip?

slivym7y ago

Hey. We don't say low performance. We say efficient, or low cost.

deepnotderp7y ago

Sorry, yes, I didn't mean that as a negative, I just meant wrt its target market.

Also smaller != more efficient

dna_polymerase7y ago

Int8 and Int16? I never worked with quantized models, anyone mind sharing their experience? Do such models achieve state-of-the-art performance?

ur-whale7y ago

a) that's just for inference, you don't train with that.

b) a fully float-trained model "quantized" to int16 typically loses overall precision, but often works well enough. It's also usually faster (if implemented properly).

c) there's a version where you go all the way down to int1 (bits) and binary ops instead of addmuls on floats and ints. It can solve some problems. And properly compiled, it's wicked fast.

DoofusOfDeath7y ago

> there's a version where you go all the way down to int1 (bits)

There's also a Zen version that uses just 0.5 bits. </joke>

dekhn7y ago

We lose a tiny bit of accuracy (quantizing for Android Tensorflow Lite), that's about it. I was pretty impressed.

knorker7y ago

It's an AI chip for mobile devices out on missions. Skynet... err... I mean Google.. set it to read-only when out on these missions.

Inference only, no ML training. Only the Cloud has löööörning capabilities.

I bet you can just unscrew the head and flip a dip-switch, and it'll start combining insults in no time.

brootstrap7y ago

instead of cloud-to-butt, now we need edge-to-butt plugin :p

j / k navigate · click thread line to collapse

12 comments

grumpopotamus7y ago

From https://en.wikipedia.org/wiki/Edge_computing:

TeMPOraL7y ago

Basically, how such computing should be done? Preferably with little to none of the data reaching the central server.

Funny how computing got centralized, and now is slowly getting decentralized again. I'm happy to see tech for that developing, but I worry that data ownership will continue to be centralized.

obulpathiOP7y ago

Eagerly waiting for Dev kits - to be released in couple of months!

deepnotderp7y ago

Looking at the image, looks like the die size is around 10-36 mm2,so presumably it's meant to be an ultra low performance chip?

slivym7y ago

Hey. We don't say low performance. We say efficient, or low cost.

deepnotderp7y ago

Sorry, yes, I didn't mean that as a negative, I just meant wrt its target market.

Also smaller != more efficient

dna_polymerase7y ago

Int8 and Int16? I never worked with quantized models, anyone mind sharing their experience? Do such models achieve state-of-the-art performance?

ur-whale7y ago

a) that's just for inference, you don't train with that.

b) a fully float-trained model "quantized" to int16 typically loses overall precision, but often works well enough. It's also usually faster (if implemented properly).

c) there's a version where you go all the way down to int1 (bits) and binary ops instead of addmuls on floats and ints. It can solve some problems. And properly compiled, it's wicked fast.

DoofusOfDeath7y ago

> there's a version where you go all the way down to int1 (bits)

There's also a Zen version that uses just 0.5 bits. </joke>

dekhn7y ago

We lose a tiny bit of accuracy (quantizing for Android Tensorflow Lite), that's about it. I was pretty impressed.

knorker7y ago

It's an AI chip for mobile devices out on missions. Skynet... err... I mean Google.. set it to read-only when out on these missions.

Inference only, no ML training. Only the Cloud has löööörning capabilities.

I bet you can just unscrew the head and flip a dip-switch, and it'll start combining insults in no time.

brootstrap7y ago

instead of cloud-to-butt, now we need edge-to-butt plugin :p

j / k navigate · click thread line to collapse