The Tesla Dojo Chip Is Impressive, but There Are Some Major Technical Issues (opens in new tab)

(semianalysis.com)

82 pointstobijkl4y ago107 comments

107 comments

As Hamming suggested in "Art of doing science and engineering", when you want to make something autonomous, you usually have to build a completely different device that solves the same problem, rather than automating the same device.

I wonder. For all the money thrown into self-driving cars research, could we have had an autonomous rail system by now? The technology for mostly-autonomous rail is well understood. Most of the financial cost is in infrastructure to support the system. Seems to me self-driving cars try to short-circuit that infrastructure build-up. They try to "automate the device" rather than "producing an automated system that solves the problem of moving people and goods".

Specifically, I wonder if, for the cost and time spent on CPU-and-engineer-driven research and development of autonomous cars, if we could have had nationwide autonomous rail rolled out by now.

dragontamer4y ago

> could we have had an autonomous rail system by now?

We already have autonomous rail systems. Its called positive train control and was fully implemented like a year or two ago (mandated in 2009, but you know how government works, lol) https://en.wikipedia.org/wiki/Positive_train_control

The train conductor has become more-and-more automated to remove the chance of human error. It works with a system of very reliable sensors that indicate where every train engine is on the rails.

Given the huge amount of cargo any particular train has, I don't think there's any intent on cutting the last two humans (the conductor + engineer) out of their job. Their salary costs are miniscule compared to the safety value they deliver, even if the job of driving a train has been almost entirely automated away by now.

llsf4y ago

Paris got its first autonomous metro line in 1998. It was developed in ADA. http://archive.adaic.com/projects/atwork/paris.html

Using the B method: https://en.wikipedia.org/wiki/B-Method

https://link.springer.com/content/pdf/10.1007%252F3-540-4811...

https://arxiv.org/pdf/2005.07190.pdf

All this was developed in the 80's and 90's. It would be interesting to see how this evolved. Obviously with a ML/AI approach it would be different now. Although there might be some ways to specify some constraints or boundaries to an AI system, for safety, comfort, physics, etc.

dragontamer4y ago

> Obviously with a ML/AI approach it would be different now.

Is it? The theory of block signals and path signals don't change because AI was invented. And I have my doubts that AI could do better than path-signal algorithms.

https://en.wikipedia.org/wiki/Railway_signalling

OpenTTD players, wassup? You can run these block signals / path signals in video games. Its a bit complicated to setup and the terminology is arcane... but the algorithms aren't so advanced that they require "AI" or anything. (If at block signal, wait until exit signal is ready. Etc. etc.). Its actually pretty fun (OpenTTD is a videogame built around these signals!!), because if you mess up your signals, you get deadlocks and/or "race conditions" (aka: risk of a deadly crash). But once you get used to the methodology, you can build incredibly complex junctions that automatically and safely routes trains everywhere.

That's why you had legions of railway nerds playing with trains in their basement all day: they're trying to get their path signals / block signals / chain signals correct so that those toy trains traverse their toy-tracks automatically without crashing. These concepts were like, from the 1930s or something (Path signals are newer)

EDIT: Apparently "before 1923": https://www.amazon.com/gp/product/117705681X. Definitely an ancient and arcane art of old wizards. Railway engineers were dealing with semaphores, race conditions, and deadlocks __LONG__ before us programmers even existed!!

jvanderbot4y ago

I wonder if, for the cost spent on CPU-and-engineer-driven research and development of autonomous cars, if we could have had nationwide autonomous rail rolled out.

dimitrios14y ago

Nope, requires a sane society that gathers around and does whats best for the collective whole. What we have, however, is a kleoptocracy, and being as such, the opportunity to extract more wealth, faster, and easier, existed with selling the promise of autonomous vehicles. If the same incentives existed for autonomous rail, it would have happened already.

1 more reply

dnautics4y ago

Not for passengers. You still need conductors, if for no other reason to be the equivalent of flight attendants (there for extremely rare safety incidents -- deaths per passenger mile are higher for trains than planes -- to manage unruly passengers, etc.)

steveBK1234y ago

No because its a political and labor relations problem not a cost problem

dragontamer4y ago

https://railroads.dot.gov/train-control/ptc/positive-train-c...

> On December 29, 2020, FRA announced that PTC technology is in operation on all 57,536 required freight and passenger railroad route miles, prior to the December 31, 2020 statutory deadline set forth by Congress.

We got that. We literally got that.

1 more reply

foota4y ago

Probably not, that's likely a hundreds of billions of dollars investment, whereas I imagine self driving investment totals a couple tens at the most.

samstave4y ago

Remember that derailment/near derailment that happened when someone cut the electrical circuit connector cable on some rails, which then prevented the system for knowing where the train was?

dragontamer4y ago

PTC, from my understanding, is that the train gets "positive orders" from a central server. The central server says "It is safe to move through X", and the train then moves through X.

If the sensor in X is broken, then the server will say "I cannot prove it is safe to move through X", and the train will stop.

I'm not a transportation engineer however, so I'll defer to anyone with actual experience in the field. But the idea is that our sensors are so effective today, that it is better to "prove" each leg of the journey is safe with "positive train control", rather than the opposite normal approach. (Ex: Sensor detects a problem, which stops the train).

That is to say: all sensors in the USA are in positive-train control mode. Trains will therefore stop if any sensor malfunctions. We're now in a "default stop" state for all trains, unless those sensors are working, deployed across USA.

1 more reply

zdragnar4y ago

We could have possibly automated some existing rail, but I am not altogether certain that doing so would have lead to any significant improvements in cost or efficiency.

Actually laying the infrastructure for mass transit via rail is an entirely different league of cost from what has been dumped into self driving cars.

We have a hard enough time agreeing on how to do light rail transit in places that want it, and then actually getting it done.

jvanderbot4y ago

I'm not altogether convinced that the money and effort spent on self-driving cars has led or will lead to any significant improvements in cost or efficiency either.

Even if it does succeed, it seems to be about convenience anyway.

zdragnar4y ago

If self driving succeeds, it is because it is safer than a human driver. If nothing else, it'll cut down on insurance premiums, and increase fuel efficiency by reducing speeding and unnecessary hard acceleration.

There are also a lot more people currently driving than there are existing train drivers (which you can see I was comparing existing infrastructure) so the time tradeoff could easily lead to increased productivity, however you choose to measure it.

All of this, of course, assumes fully autonomous driving. I'm somewhat sceptical that we'll actually get there, myself.

jeffbee4y ago

A self-driving streetcar, that only has 1 degree of freedom, might be a practical problem to solve and maybe if it existed we would not have had to go through the 90-day outage of VTA light rail service that the Bay Area just experienced, an experience which in all likelihood killed VTA light rail forever.

mynameisvlad4y ago

The VTA control building/center was badly damaged in the shooting that caused the outage you're talking about. Surely a fully automated light rail line would require everything in said building to likely be running at full capacity to work.

Also, it was a result of a mass shooting. Have some tact.

wumpus4y ago

VTA restarted service a couple of days ago.

deeviant4y ago

Rail is already highly autonomous with external monitoring systems integrated with vehicle control systems, but I'll just go on as if the distinction between what it is now and what are calling "autonomous" is significant.

What problem does autonomous rail solve? The single driver is already a rounding error in total costs. Also, rail is already a controlled environment where collisions are much less likely to happen than road, so the fruit is much higher up the tree on that aspect too.

It seems to me that bringing autonomy to rail would have little on it's bottom line.

vishnugupta4y ago

> build a completely different device that solves the same problem

I realised this while discussing self-driving cars with my friends.

I used example of Uber Eats. The problem statement is "I don't want to cook" and a reasonably acceptable solution IMO is cloud kitchens + delivery. As opposed to building a cooking robot.

Cloud kitchens could automate 80% of repeatable stuff because it makes sense to solve that problem at scale.

stcredzero4y ago

Why not optimize Blue Apron and HelloFresh style meal kits so that they involve even less labor, then distribute the production to local commercial kitchens? There are already businesses that package this kind of prep work for restaurants.

dragontamer4y ago

I'm pretty sure you're just reinventing Domino's pasta line.

Standardize the ovens you distribute around the world. Simplify food cooking so that its just "put this pre-cut prepared block of aluminum foil into the standard-temperature oven for 15 minutes", and "deliver".

1 more reply

whatshisface4y ago

The labor saving advantages of carrying 100 people on the same vehicle are so enormous that there is little motivation if any to quit paying conductors and engineers.

samstave4y ago

The profit motivation of having 100 people being required to buy gas is the anti-motivation in capitalist oligarchies

the_third_wave4y ago

That's not a valid argument given that it is just as feasible to price individual transport out of the market while simultaneously raising the price for public transport in such a "capitalist oligarchy". Fossil fuel companies will again go on a public transport buying spree, this time around with the intention of keeping the trains rolling while raising prices to the point of maximum profit.

samstave4y ago

I've always wondered if TCP and networking design could be applied to autonomous traffic... basically think of every car/train as a packet and ensure no collisions...

Which networking protocol best maps to this?

And what if we had smart traffic lights that were aware of every car in an surrounding area of an intersection...

I mean FFS certain tech companies track all vehicles that drive by/near their corporate campuses and report that back to the city...

And that's almost a decade old now...

So apply the same but report the data back to the traffic management system which is also trained on all the traffic patterns for a given intersection to best optimize for their patterns...

hamiltonkibbe4y ago

For the most part the MAC just looks for another signal on the wire (another train on the same section of rail) and when it looks clear, starts transmitting (driving). As you can imagine, there will be cases when 2 MACs start talking at the same time, at which point they detect the collision, wait a random delay and try again. I wouldn’t want to be on that train, I’d prefer plain old serial with hardware flow control

fanf24y ago

Ethernet has not worked like that since the 1990s.

Even WiFi avoids relying on collision detection by routing client-to-client traffic via the AP instead of being peer-to-peer.

rini174y ago

Large part of TCP is dealing with dropped packets. Presumably you don't want that to carry that over to rail/road traffic :)

samstave4y ago

HAHAH yes, I know -- I just didnt have a better example protocol... TCP is "ensuring the traffic gets there" -- not so much "ensuring no collisions" that was why I chose that one.

Maybe BGP instead? "Route failed - go this way instead, as a backup"

1 more reply

fanf24y ago

The London Docklands Light Railway (DLR) has had fully automatic running since 1987 https://en.wikipedia.org/wiki/Docklands_Light_Railway

HPsquared4y ago

We could go the other way and have humanoid robot drivers that can get in and drive any car. Now that'd be difficult!

dragontamer4y ago

Somehow, I'm reminded of the Tsar tank from WW1. The Russians knew that a new weapon of war: an armored car, was necessary to break the stalemate of trench warfare.

This hypothetical armored car needed many features: the most important was that it must be able to move across the muddy no man's land reliably.

Tests have shown that regular sized wheels would get stuck in the mud. A bigger wheel has more surface area and greater contact area. So the Russians built an armored car with the largest wheels possible. Russian tests were outstanding, the Tsar tank rolled over a tree !!!!

https://en.m.wikipedia.org/wiki/Tsar_Tank

The French design was to use caterpillar tracks. We know what works now since we have a century of hindsight.

--------

Spending the most money to make the biggest wheel isn't necessarily the path to victory. I think it's more likely that the tech (aka, caterpillar track equivalent) hasn't been invented yet for robotaxis. Hitting the problem with bigger and more expensive neural network computers doesn't seem to be the right way to solve the problem.

zaptrem4y ago

I agree with your points on the robotaxi front, but there are many other problems that will totally benefit from a bigger training computer.

dragontamer4y ago

But Tesla isn't a cloud-provider company, nor is it a hardware company. None of the technical specs, assembly language, API, SDKs or whatnot have been released for Dojo.

The model of "someone will find this training computer useful" is... fine. Google TPUs, NVidia DGX, Intel Xe-HPC, AMD MI100, Cerebras wafer scale AI. These are computers that nominally are aiming for the market of selling computers / APIs / SDKs that will make training easier.

Its a pretty crowded field. Someone probably has struck gold (NVidia has a lead but... its still anyone's game IMO)

-------

If Tesla's goal is to compete against everyone else (or make a chip that's cost-competitive with everyone else), Tesla needs more volume than (allegedly) 3000 chips (quoted from the article: I dunno where they got this figure but... there's no way in hell 3k chips is cost-effective).

That's the name of the game: volume. The reason why NVidia leads is because NVidia sells the most GPUs right now, which means their R&D costs are applied to the broadest base, which means those company's engineering costs (aka: CUDA training) is spread across the widest number of programmers, leading to a self-reinforcing cycle of better hardware, lower costs, with a larger community of programmers to learn from.

jeffbee4y ago

You forgot the Hauwei Ascend 910. 4096 of them in a rack is an easy exaflop.

1 more reply

baybal24y ago

> many other problems that will totally benefit from a bigger training computer.

I don't really think it's that many.

The industry collectively sank untold billions into the blind belief that neural algorithms will somehow turn into "AI."

10 years later, no "AI," and not even a single money making niche use.

Right now the industry is deep in sank cost falacy, and people who promised this, and that to investors are now desperate, and doubling their bets in hopes that "at least something will come out of it...," a casino mode basically.

gpm4y ago

> not even a single money making niche use.

There are tons of money making niche uses of neural networks. From the branch predictor on your CPU, to trading on the stock market, to image-search engines.

1 more reply

atty4y ago

As someone who works in the field, if your only conception of AI is a truly sentient intelligence, then yes, we are, as far as I can tell, nowhere near that. However, if by AI, you mean the use of sub domains of AI like machine learning or deep learning (where all the money has been spent), it’s quite literally all around us now, whether you realize it or not. At my company our deployment of “AI” techniques is ramping up year over year, not slowing down, and we’re seeing very good results. (And I work at a very old company, not a tech native. I can only imagine how much of Google’s or Facebook’s workflows and products include some sort of machine learning)

stcredzero4y ago

The industry collectively sank untold billions into the blind belief that neural algorithms will somehow turn into "AI."

The huge qualitative differences between GPT-2 and GPT-3 seem to suggest that they will, if you just keep adding orders of magnitude more connections and more data.

https://www.youtube.com/watch?v=_8yVOC4ciXc

deepnotderp4y ago

What? Pretty much every major Internet company extensively uses NNs. Just recently there was a great published use case by Google using GNNs for travel time predictions

justapassenger4y ago

> Of this competition, only Google and Nvidia have supercomputers that stand toe to toe with the Tesla’s

Even assuming that it's true (which I very much doubt - anyone that's willing to spend enough money with Nvidia, can have powerful supercomputer fairly quickly), it's very dishonest statement. It's comparing deployed system with a lab prototype of a single competent of potential supercomputer, that may be fully operational in few years (software is a really, really, really big deal here).

solidasparagus4y ago

I assumed they're talking about Tesla's A100 cluster, which is huge - https://blogs.nvidia.com/blog/2021/06/22/tesla-av-training-s...

Tesla's compute-to-researcher ratio is definitely rare

jeffbee4y ago

It is really unreasonable to compare Tesla's photoshop mocks with hardware already deployed in the field today. Google already has a TPUv4 cluster that can train ResNet-50 in 13 seconds, which is ridiculous. Until Tesla publishes actual MLPerf benchmarks, you can assume that their ASIC game is at least as far behind Google's as their self-driving game is behind Waymo's: 5 years at a minimum.

https://github.com/mlcommons/training_results_v1.0/tree/mast...

thesausageking4y ago

The Q&A section on their compiler and software that the author links to is very interesting:

https://www.youtube.com/watch?v=j0z4FweCy4M&t=8047s

It sounds like they're going to have write a ton of custom software in order to use this hardware at scale. And, based on the team being speechless when asked a follow up question, it doesn't sound like they know (yet) how they're going to solve this.

Nvidia gets a lot of credit for their hardware advances, but what really what their chips work so well for deep learning was the huge software stack they created around CUDA.

Underestimating the software investment required has plagued a lot of AI chip startups. It doesn't sound like Tesla is immune to this.

ggoo4y ago

Tesla's claim to delivery ratio is abysmal. I'm not sure why anybody even bothers deconstructing these presentations anymore, they're just fluff.

snorrah4y ago

I would argue it’s always useful to see their tech deconstructed and explained. If nothing else, so we get an idea what the reality is to counter possible outlandish claims from overly-enthusiastic followers of the company (and its CEO)

stcredzero4y ago

Tesla's claim to delivery ratio is abysmal.

Can you substantiate this concretely? How about a list, with direct sources? (Not opinion pieces.)

gooseus4y ago

> The full system is scheduled for some time in 2022. Knowing Tesla’s timing on Model 3, Model Y, Cyber Truck, Semi, Roadster, and Full Self Driving, we should automatically assume we can pad this timing here.

That's just from the article; off the top of my head:

* NYC to LA fully autonomous drive by 2017.

* 1M Robotaxis on the road by 2021.

* Hyperloop.

* Solar roof tiles.

* All superchargers will be solar-powered.

* Tesla Semi.

Sure, some of these things may be "just around the corner" or "ramping up now", but some of these are claims going back almost 5 - 10 years where Elon says "2 weeks", "next year", "2 years", really whatever it takes to be just believable enough to get enough people to buy into a future where Tesla is worth 10x what it is today.

stcredzero4y ago

So, to sum up, the criticisms are mostly about "Elon Time" and not about whether Tesla is actually trying to do things. Hyperloop is not something Tesla is trying to do, and Elon is only cheerleading that effort.

There's something highly off-kilter with the relative mildness of the above and the vitriol of the criticism directed at it. This actually makes me feel really good about Elon Musk and Tesla's prospects!

ggoo4y ago

https://en.wikipedia.org/wiki/Criticism_of_Tesla,_Inc.

michelpp4y ago

Clearly a a shot across the bow for Cerebras and another excellent target for the GraphBLAS.

Dense numeric processing for image recognition is a key foundation for what Tesla is trying to do, but that tagging of the object is just the beginning of the process, what is the object going to do? What are its trajectories, what is the degree of belief that a unleashed dog vs a stationary baby carriage is going to jump out?

We are just beginning to scratch the surface of counterfactual and other belief propagation models which are hypersparse graph problems at their core. This kind of chip, and what Cerebras are working on, are the future platforms for the possibility of true machine reasoning.

2bitencryption4y ago

from the article:

> but the short of it is that their unique system on wafer packaging and chip design choices potentially allow an order magnitude advantage over competing AI hardware in training of massive multi-trillion parameter networks.

I kind of wonder if Tesla is building the Juicero of self-driving. [0]

Beautifully designed. An absolute marvel of engineering. The result of brilliant people with tons of money using every ounce of their knowledge to create something wonderful.

Except... you could just squeeze the bag. You could just use LIDAR. You could just use your hands to squish the fruit and get something just as good. You could just (etc etc).

No doubt future Teslas will be supercomputers on wheels. But what if all those trillions of parameters spent trying to compose 3D worlds out of 2D images is pointless if you can just get a scanner that operates in 3D space to begin with??

[0] https://www.theguardian.com/technology/2017/sep/01/juicero-s...

arnaudsm4y ago

The Juicero comparison doesn't hold up. LIDAR is 10x more expensive than RGB, but neither reach lvl5 at the moment. I'm glad multiple companies try multiple paths, it's the best way to avoid a research dead-end.

KaiserPro4y ago

> LIDAR is 10x more expensive than RGB

but pure RGB needs $millions to make a reliable realtime depth sensor, plus custom silicon and a massive annotated dataset.

It might just be that one company can do it, but its a hefty gamble.

nightski4y ago

Everyone acts like LIDAR is the holy grail but then why isn't there someone destroying Tesla with that tech? Waymo is not much farther along than Tesla, maybe even behind as far as miles driven.

If that was all that was needed then it would be done.

modeless4y ago

I'm glad people are exploring the design space. To some extent the training techniques and neural net architectures need to be tailored to the hardware. Nvidia isn't on top just because they're good at chip design, but because people have chosen to focus research effort on techniques that work well on Nvidia hardware. New hardware may allow new techniques to shine.

New hardware architectures can't really be used to their full potential without years of research into techniques that are suited for them. The more people who have access to the hardware, the faster we can discover those techniques. If Tesla is serious about their hardware project, they need to offer it to the public as some kind of cloud training system. They don't have enough people internally to develop everything themselves in a short enough time to remain competitive with the rest of the industry.

cr4zy4y ago

Trillion parameter networks are mentioned a few times, but Tesla is deploying much smaller networks than that (like tens of millions IMU). Trillion param networks are mostly transformers like GPT-3 (actually 175B) etc... that are particularly heavy vs Conv as they have no weight sharing. Tesla is definitely starting to use transformers though, e.g. for camera fusion and evidenced by their focus on matrix multiply in dojo asic's vs the conv asics they have in the on-vehicle chips.

zozbot2344y ago

Yup, there's plenty of ML architectures that try to save on parameters size, achieving better generalization (less overfitting) at the expense of slightly costlier training and inference. The memory constraints on Tesla Dojo might not be a big deal after all.

m3kw94y ago

All I see is [techno terms].. impressive engineering.. lots of problems need to be solved first..2022..on paper toe to toe with Nvidia.. calm the hype.

thunkshift14y ago

What a bs fanboy article.. the author is going gaga over something that isnt even out in silicon yet, and has no credible plans of software ecosystem coming on top the hw( if it materializes). Unbelievable hype.

Const-me4y ago

> they have 1.25MB of SRAM and 1TFlop of FP16/CFP8… This is woefully unequipped for the level of performance they want to achieve.

Any idea how OP made that conclusion?

My GeForce 1080Ti has 1.3MB of in-core L1 caches (28 streaming multiprocessors, 48kb L1 each). It also has L2 but not too large, slightly under 3MB for the whole chip.

The GPU delivers about 10 TFlops of FP32 which needs 2x the RAM bandwidth of FP16. I’m generally OK with the level of performance, at least until the GPU shortage is fixed.

neolefty4y ago

> This chip is not Tesla designing something that is better than everyone else all by themselves. We are not at the liberty to reveal the name of their partner(s), but the astute readers will know exactly who we are talking about when we reference the external SerDes and photonics IP.

Any "astute readers" here who know who the partner would be?

Melkman4y ago

I'd guess Broadcom.See also https://www.broadcom.com/info/optics/silicon-photonics

wumpus4y ago

Looks like the tech that Intel has been attempting to get to work and be cost effective for several decades.

eddanger4y ago

My astute guess is ASML.

TomVDB4y ago

That astute guess is so out of left field that it needs at least a bit of justification.

To my limited ASML knowledge, ASML is a fab equipment maker, not a silicon IP procider.

gimmeThaBeet4y ago

Correct, that's a bit of an odd guess. You are definitely working closely with your foundry on something like this, but unless you are building some truly exotic device, I don't think their vendor is getting involved.

Like Melkman says, Broadcom is a good guess. Not only for past rumors, but iirc they also did work with google's TPU (could never figure out if that was actually confirmed?). Interconnect IP like that is definitely in their wheelhouse.

_nalply4y ago

My curiosity got piqued at the mention of CFP8 (configurable floating point 8), but googling this didn't yield usable information.

What exactly is CFP8? How many bits does one instance of CFP8 use? What mathematical operations are supported? How does one configure the floating point?

_nalply4y ago

I found about posits.

https://www.johndcook.com/blog/2018/04/11/anatomy-of-a-posit...

Perhaps CFP8 are parameterized 8-bit posits where the parameter is the value es. The larger es is, the greater the dynamic range is at the expense of precision. Two examples:

posit<8, 0> (es = 0) has as largest positive number 64 and the smallest positive number 1/64.

posit<8, 1> (es = 1) has as largest positive number 4012 and the smallest positive number 1/4012.

The formula for the largest positive number for 8-bit posits is:

2 ^ 2 ^ es ^ 6.

posits don't have NaNs and only one infinity (±∞), so they can use more of the 8 bit values for numbers than floating point numbers.

I wonder: is CFP8 = posit<8, es>?

scardycat4y ago

This is a step in the right direction. I witnessed the semiconductor industry abandoning their own designs in favor of Intel/x86. Better diversity in chip design is always a good thing, even if its in closed ecosystems (Google TPU, Tesla Dojo)

rektide4y ago

some discussion yesterday, https://news.ycombinator.com/item?id=28361807

it's interesting because it's clearly exciting & leading edge tech. unlike most Tesla tech which ultimately has consumers using it, where we all get to assess strengths & weaknesses, this tech is going to remain inside the Tesla castle, unviewable, unassessable. we'll probably never know what real strengths or weaknesses it has, never understand all the ways it doesn't well, or as well as competitors. it's going to remain an esoteric dollop of computing.

1 more reply

danso4y ago

I confess I have a reflexive skepticism to the idea that Tesla's achievements (and struggles) in car manufacturing would translate to any kind of lead in chip design and manufacturing. How long did it take Apple from planning to rollout for M1? And the Tesla chip seems to be making bigger revolution-sized claims?

boardwaalk4y ago

Tesla is already shipping their own chips in every car. And that’s a better comparison (it’s an end user thing you can buy) than this data center processor. It’s hard to compare vs, say, Nvidia’s car computers because it’s all locked down. But I believe the energy efficiency is fairly good.

lpapez4y ago

2022 will surely be the year of Linux on Desktop and fully self driving cars.

immmmmm4y ago

not fully related but i was doing some reading on various "new sustainable ways of transportation" and, since they're building the biggest hyperloop test track near my place, i found this interesting video of some of problems one might get trying to put vacuum in a pipe:

https://youtu.be/Zz95_VvTxZM

CasillasQT4y ago

"We believe it makes sense for Tesla to pour as much capital as needed into winning the Robotaxi race and catch up to these two". That has to be a joke right?

rvz4y ago

At this point, everyone knows the whole thing is a joke. The robotaxi race was supposed to be already finished by 2020 alongside with FSD at Level 5 - today it is still Level 2.

So, where are the so-called robotaxis?

CasillasQT4y ago

Did you see the presentation from Karpathy? Tesla goes for a general vision only end-to-end deep AI model that could in theory get rolled out everywhere on earth with enough training and a good approach for fast edge-case solving, which they showed how this can be accomplished. All the other players try to solve this with lidar and cars that cost around 500k to build and they have pretty much 0 data except for the maps they generate themselves. This approach will never solve L5. Tesla may need another 10 years, but they are so far out of reach of the other players that you cant even call them competition at this point.

michaelt4y ago

Waymo have all the StreetView data - practically every road in the country, complete with lidar and high-precision GPS. All labelled in detail by recaptcha 'volunteers'

And lidar wouldn't be expensive if manufactured in automotive volumes. Certainly less, per vehicle, than Musk charges people for "full self driving" at the moment.

California allows autonomous vehicles to be tested on the road, so long as every disengagement is reported (along with total miles driven etc). Waymo is testing, reporting mileage and disengagements. So are Toyota, Nvidia, Mercedes, BMW, Cruise, Lyft and Apple.

Guess who's too shy to have driven a single autonomous mile in California, where faults have to be reported? That's right, Tesla!

Tesla might be able to make vision-only driving work. But Musk has been promising deadlines then failing to achieve them for years. They've put all their chips on 'no lidar' and they've had a bunch of problems that lidar could trivially solve - such as detecting a fire truck or concrete barrier right in front of the vehicle. So it's far from obvious to me that they've got a winning approach.

josefx4y ago

> with enough training and a good approach for fast edge-case solving

Apparently they have neither as they have missed their deadline three years ago and continue to miss it every year since.

> All the other players try to solve this with lidar and cars that cost around 500k to build

Citation needed?

> Tesla may need another 10 years

What has you this pessimistic? Tesla promises full self driving by the end of the year every year. Are you saying a random commenter on the internet knows more about the state of their AI then they do?

rvz4y ago

So catching up to Comma.ai then (which is also Level 2), except that there is no driver monitoring that makes sure that the driver has their eyes on the road. That can't be good, but I guess once again Comma.ai and similar systems were right again.

> This approach will never solve L5.

Again, Tesla advertised FSD as Level 5 and ready for completion with the robotaxis for 2020. Sounds like it was falsely advertised right?

jowday4y ago

Except they’re nowhere near using an end-to-end model - their perception and planning stack still looks the same as every other self driving company, but with incredibly degraded performance since they’re using nothing but cameras and flakey maps. Tesla reverse engineers have uncovered the internals of the FSD beta - you can check it for yourself.

“they have pretty much 0 data except for the maps they generate themselves”

What do you mean by this? You realize the bottleneck for training data generation is always human labeling, not raw amount of data, right?

alpaca1284y ago

> All the other players try to solve this with lidar and cars that cost around 500k to build

Comma has been using that approach from the start with a cheap smartphone-like device.

baybal24y ago

> Tesla goes for a general vision only end-to-end deep AI model that could in theory get rolled out everywhere on earth with enough training and a good approach for fast edge-case solving,

No amount of "training" can fix the problem of "AI" not being AI

These people have very poor idea what they are talking about when they say the phrase "artificial intelligence." It's a clear misuse

1 more reply

eddanger4y ago

Folks understand L1 and L5. The levels between are such a blur and mix-match of things that I don't think anything is even accomplished having these levels. I agree with you, from what I have seen Tesla is far ahead and rapidly progressing using the right approach of training NNs while having a human behind the wheel.

2 more replies

plutonorm4y ago

So happy to meet someone else who gets it on hacker news.

1 more reply

j / k navigate · click thread line to collapse

107 comments

jvanderbot4y ago

Specifically, I wonder if, for the cost and time spent on CPU-and-engineer-driven research and development of autonomous cars, if we could have had nationwide autonomous rail rolled out by now.

dragontamer4y ago

> could we have had an autonomous rail system by now?

The train conductor has become more-and-more automated to remove the chance of human error. It works with a system of very reliable sensors that indicate where every train engine is on the rails.

llsf4y ago

Paris got its first autonomous metro line in 1998. It was developed in ADA. http://archive.adaic.com/projects/atwork/paris.html

Using the B method: https://en.wikipedia.org/wiki/B-Method

https://link.springer.com/content/pdf/10.1007%252F3-540-4811...

https://arxiv.org/pdf/2005.07190.pdf

dragontamer4y ago

> Obviously with a ML/AI approach it would be different now.

Is it? The theory of block signals and path signals don't change because AI was invented. And I have my doubts that AI could do better than path-signal algorithms.

https://en.wikipedia.org/wiki/Railway_signalling

jvanderbot4y ago

I wonder if, for the cost spent on CPU-and-engineer-driven research and development of autonomous cars, if we could have had nationwide autonomous rail rolled out.

dimitrios14y ago

1 more reply

dnautics4y ago

steveBK1234y ago

No because its a political and labor relations problem not a cost problem

dragontamer4y ago

https://railroads.dot.gov/train-control/ptc/positive-train-c...

We got that. We literally got that.

1 more reply

foota4y ago

Probably not, that's likely a hundreds of billions of dollars investment, whereas I imagine self driving investment totals a couple tens at the most.

samstave4y ago

Remember that derailment/near derailment that happened when someone cut the electrical circuit connector cable on some rails, which then prevented the system for knowing where the train was?

dragontamer4y ago

PTC, from my understanding, is that the train gets "positive orders" from a central server. The central server says "It is safe to move through X", and the train then moves through X.

If the sensor in X is broken, then the server will say "I cannot prove it is safe to move through X", and the train will stop.

1 more reply

zdragnar4y ago

We could have possibly automated some existing rail, but I am not altogether certain that doing so would have lead to any significant improvements in cost or efficiency.

Actually laying the infrastructure for mass transit via rail is an entirely different league of cost from what has been dumped into self driving cars.

We have a hard enough time agreeing on how to do light rail transit in places that want it, and then actually getting it done.

jvanderbot4y ago

I'm not altogether convinced that the money and effort spent on self-driving cars has led or will lead to any significant improvements in cost or efficiency either.

Even if it does succeed, it seems to be about convenience anyway.

zdragnar4y ago

All of this, of course, assumes fully autonomous driving. I'm somewhat sceptical that we'll actually get there, myself.

jeffbee4y ago

mynameisvlad4y ago

Also, it was a result of a mass shooting. Have some tact.

wumpus4y ago

VTA restarted service a couple of days ago.

deeviant4y ago

It seems to me that bringing autonomy to rail would have little on it's bottom line.

vishnugupta4y ago

> build a completely different device that solves the same problem

I realised this while discussing self-driving cars with my friends.

I used example of Uber Eats. The problem statement is "I don't want to cook" and a reasonably acceptable solution IMO is cloud kitchens + delivery. As opposed to building a cooking robot.

Cloud kitchens could automate 80% of repeatable stuff because it makes sense to solve that problem at scale.

stcredzero4y ago

dragontamer4y ago

I'm pretty sure you're just reinventing Domino's pasta line.

1 more reply

whatshisface4y ago

The labor saving advantages of carrying 100 people on the same vehicle are so enormous that there is little motivation if any to quit paying conductors and engineers.

samstave4y ago

The profit motivation of having 100 people being required to buy gas is the anti-motivation in capitalist oligarchies

the_third_wave4y ago

samstave4y ago

I've always wondered if TCP and networking design could be applied to autonomous traffic... basically think of every car/train as a packet and ensure no collisions...

Which networking protocol best maps to this?

And what if we had smart traffic lights that were aware of every car in an surrounding area of an intersection...

I mean FFS certain tech companies track all vehicles that drive by/near their corporate campuses and report that back to the city...

And that's almost a decade old now...

So apply the same but report the data back to the traffic management system which is also trained on all the traffic patterns for a given intersection to best optimize for their patterns...

hamiltonkibbe4y ago

fanf24y ago

Ethernet has not worked like that since the 1990s.

Even WiFi avoids relying on collision detection by routing client-to-client traffic via the AP instead of being peer-to-peer.

rini174y ago

Large part of TCP is dealing with dropped packets. Presumably you don't want that to carry that over to rail/road traffic :)

samstave4y ago

HAHAH yes, I know -- I just didnt have a better example protocol... TCP is "ensuring the traffic gets there" -- not so much "ensuring no collisions" that was why I chose that one.

Maybe BGP instead? "Route failed - go this way instead, as a backup"

1 more reply

fanf24y ago

The London Docklands Light Railway (DLR) has had fully automatic running since 1987 https://en.wikipedia.org/wiki/Docklands_Light_Railway

HPsquared4y ago

We could go the other way and have humanoid robot drivers that can get in and drive any car. Now that'd be difficult!

dragontamer4y ago

Somehow, I'm reminded of the Tsar tank from WW1. The Russians knew that a new weapon of war: an armored car, was necessary to break the stalemate of trench warfare.

This hypothetical armored car needed many features: the most important was that it must be able to move across the muddy no man's land reliably.

https://en.m.wikipedia.org/wiki/Tsar_Tank

The French design was to use caterpillar tracks. We know what works now since we have a century of hindsight.

--------

zaptrem4y ago

I agree with your points on the robotaxi front, but there are many other problems that will totally benefit from a bigger training computer.

dragontamer4y ago

But Tesla isn't a cloud-provider company, nor is it a hardware company. None of the technical specs, assembly language, API, SDKs or whatnot have been released for Dojo.

Its a pretty crowded field. Someone probably has struck gold (NVidia has a lead but... its still anyone's game IMO)

-------

jeffbee4y ago

You forgot the Hauwei Ascend 910. 4096 of them in a rack is an easy exaflop.

1 more reply

baybal24y ago

> many other problems that will totally benefit from a bigger training computer.

I don't really think it's that many.

The industry collectively sank untold billions into the blind belief that neural algorithms will somehow turn into "AI."

10 years later, no "AI," and not even a single money making niche use.

gpm4y ago

> not even a single money making niche use.

There are tons of money making niche uses of neural networks. From the branch predictor on your CPU, to trading on the stock market, to image-search engines.

1 more reply

atty4y ago

stcredzero4y ago

The industry collectively sank untold billions into the blind belief that neural algorithms will somehow turn into "AI."

The huge qualitative differences between GPT-2 and GPT-3 seem to suggest that they will, if you just keep adding orders of magnitude more connections and more data.

https://www.youtube.com/watch?v=_8yVOC4ciXc

deepnotderp4y ago

What? Pretty much every major Internet company extensively uses NNs. Just recently there was a great published use case by Google using GNNs for travel time predictions

justapassenger4y ago

> Of this competition, only Google and Nvidia have supercomputers that stand toe to toe with the Tesla’s

solidasparagus4y ago

I assumed they're talking about Tesla's A100 cluster, which is huge - https://blogs.nvidia.com/blog/2021/06/22/tesla-av-training-s...

Tesla's compute-to-researcher ratio is definitely rare

jeffbee4y ago

https://github.com/mlcommons/training_results_v1.0/tree/mast...

thesausageking4y ago

The Q&A section on their compiler and software that the author links to is very interesting:

https://www.youtube.com/watch?v=j0z4FweCy4M&t=8047s

Nvidia gets a lot of credit for their hardware advances, but what really what their chips work so well for deep learning was the huge software stack they created around CUDA.

Underestimating the software investment required has plagued a lot of AI chip startups. It doesn't sound like Tesla is immune to this.

ggoo4y ago

Tesla's claim to delivery ratio is abysmal. I'm not sure why anybody even bothers deconstructing these presentations anymore, they're just fluff.

snorrah4y ago

stcredzero4y ago

Tesla's claim to delivery ratio is abysmal.

Can you substantiate this concretely? How about a list, with direct sources? (Not opinion pieces.)

gooseus4y ago

That's just from the article; off the top of my head:

* NYC to LA fully autonomous drive by 2017.

* 1M Robotaxis on the road by 2021.

* Hyperloop.

* Solar roof tiles.

* All superchargers will be solar-powered.

* Tesla Semi.

stcredzero4y ago

ggoo4y ago

https://en.wikipedia.org/wiki/Criticism_of_Tesla,_Inc.

michelpp4y ago

Clearly a a shot across the bow for Cerebras and another excellent target for the GraphBLAS.

2bitencryption4y ago

from the article:

I kind of wonder if Tesla is building the Juicero of self-driving. [0]

Beautifully designed. An absolute marvel of engineering. The result of brilliant people with tons of money using every ounce of their knowledge to create something wonderful.

Except... you could just squeeze the bag. You could just use LIDAR. You could just use your hands to squish the fruit and get something just as good. You could just (etc etc).

[0] https://www.theguardian.com/technology/2017/sep/01/juicero-s...

arnaudsm4y ago

KaiserPro4y ago

> LIDAR is 10x more expensive than RGB

but pure RGB needs $millions to make a reliable realtime depth sensor, plus custom silicon and a massive annotated dataset.

It might just be that one company can do it, but its a hefty gamble.

nightski4y ago

Everyone acts like LIDAR is the holy grail but then why isn't there someone destroying Tesla with that tech? Waymo is not much farther along than Tesla, maybe even behind as far as miles driven.

If that was all that was needed then it would be done.

modeless4y ago

cr4zy4y ago

zozbot2344y ago

m3kw94y ago

All I see is [techno terms].. impressive engineering.. lots of problems need to be solved first..2022..on paper toe to toe with Nvidia.. calm the hype.

thunkshift14y ago

Const-me4y ago

> they have 1.25MB of SRAM and 1TFlop of FP16/CFP8… This is woefully unequipped for the level of performance they want to achieve.

Any idea how OP made that conclusion?

My GeForce 1080Ti has 1.3MB of in-core L1 caches (28 streaming multiprocessors, 48kb L1 each). It also has L2 but not too large, slightly under 3MB for the whole chip.

The GPU delivers about 10 TFlops of FP32 which needs 2x the RAM bandwidth of FP16. I’m generally OK with the level of performance, at least until the GPU shortage is fixed.

neolefty4y ago

Any "astute readers" here who know who the partner would be?

Melkman4y ago

I'd guess Broadcom.See also https://www.broadcom.com/info/optics/silicon-photonics

wumpus4y ago

Looks like the tech that Intel has been attempting to get to work and be cost effective for several decades.

eddanger4y ago

My astute guess is ASML.

TomVDB4y ago

That astute guess is so out of left field that it needs at least a bit of justification.

To my limited ASML knowledge, ASML is a fab equipment maker, not a silicon IP procider.

gimmeThaBeet4y ago

_nalply4y ago

My curiosity got piqued at the mention of CFP8 (configurable floating point 8), but googling this didn't yield usable information.

What exactly is CFP8? How many bits does one instance of CFP8 use? What mathematical operations are supported? How does one configure the floating point?

_nalply4y ago

I found about posits.

https://www.johndcook.com/blog/2018/04/11/anatomy-of-a-posit...

Perhaps CFP8 are parameterized 8-bit posits where the parameter is the value es. The larger es is, the greater the dynamic range is at the expense of precision. Two examples:

posit<8, 0> (es = 0) has as largest positive number 64 and the smallest positive number 1/64.

posit<8, 1> (es = 1) has as largest positive number 4012 and the smallest positive number 1/4012.

The formula for the largest positive number for 8-bit posits is:

2 ^ 2 ^ es ^ 6.

posits don't have NaNs and only one infinity (±∞), so they can use more of the 8 bit values for numbers than floating point numbers.

I wonder: is CFP8 = posit<8, es>?

scardycat4y ago

rektide4y ago

some discussion yesterday, https://news.ycombinator.com/item?id=28361807

1 more reply

danso4y ago

boardwaalk4y ago

lpapez4y ago

2022 will surely be the year of Linux on Desktop and fully self driving cars.

immmmmm4y ago

https://youtu.be/Zz95_VvTxZM

CasillasQT4y ago

"We believe it makes sense for Tesla to pour as much capital as needed into winning the Robotaxi race and catch up to these two". That has to be a joke right?

rvz4y ago

At this point, everyone knows the whole thing is a joke. The robotaxi race was supposed to be already finished by 2020 alongside with FSD at Level 5 - today it is still Level 2.

So, where are the so-called robotaxis?

CasillasQT4y ago

michaelt4y ago

Waymo have all the StreetView data - practically every road in the country, complete with lidar and high-precision GPS. All labelled in detail by recaptcha 'volunteers'

And lidar wouldn't be expensive if manufactured in automotive volumes. Certainly less, per vehicle, than Musk charges people for "full self driving" at the moment.

Guess who's too shy to have driven a single autonomous mile in California, where faults have to be reported? That's right, Tesla!

josefx4y ago

> with enough training and a good approach for fast edge-case solving

Apparently they have neither as they have missed their deadline three years ago and continue to miss it every year since.

> All the other players try to solve this with lidar and cars that cost around 500k to build

Citation needed?

> Tesla may need another 10 years

rvz4y ago

> This approach will never solve L5.

Again, Tesla advertised FSD as Level 5 and ready for completion with the robotaxis for 2020. Sounds like it was falsely advertised right?

jowday4y ago

“they have pretty much 0 data except for the maps they generate themselves”

What do you mean by this? You realize the bottleneck for training data generation is always human labeling, not raw amount of data, right?

alpaca1284y ago

> All the other players try to solve this with lidar and cars that cost around 500k to build

Comma has been using that approach from the start with a cheap smartphone-like device.

baybal24y ago

> Tesla goes for a general vision only end-to-end deep AI model that could in theory get rolled out everywhere on earth with enough training and a good approach for fast edge-case solving,

No amount of "training" can fix the problem of "AI" not being AI

These people have very poor idea what they are talking about when they say the phrase "artificial intelligence." It's a clear misuse

1 more reply

eddanger4y ago

2 more replies

plutonorm4y ago

So happy to meet someone else who gets it on hacker news.

1 more reply

j / k navigate · click thread line to collapse