Yet here I am: I recently won an AI competition by reviving an old 2005 algorithm and simply exploiting the fact that compute power has increased ~7000x since then (from 5 GFLOPS on a Pentium 4 to 35 TFLOPS on an RTX 3090). No AI was needed.
And now I'm building custom electronics for a new type of 3D camera, because even after years of AI and deep-learning research, structured-light and/or stereoscopic 3D depth estimation is still unusable. Try training a NeRF from "only" 3 UHD images and you'll know what I mean.
Seriously, please share the deets on this 2005 "junker" algo.
It was super unsatisfying to read your post with so much interesting information glossed over.
Will be sad if we never hear from you.
By now I'm down to 5th place on the Sintel Clean rankings: http://sintel.is.tue.mpg.de/quant?metric_id=6&selected_pass=... but my entry H-v3 was 1st place when I submitted it. The algorithm is:
Mota C., Stuke I., Aach T., Barth E. Divide-and-Conquer Strategies for Estimating Multiple Transparent Motions. In: Jähne B., Mester R., Barth E., Scharr H. (eds) Complex Motion. IWCM 2004.
https://doi.org/10.1007/978-3-540-69866-1_6
(So I misremembered the year: it was the end of 2004, not 2005.)
I did tweak it in a few details, such as using a 5x5 px ICA instead of the constant-brightness assumption, but mainly I replaced the Gauss-Seidel iteration (Eq. 12) with brute-forcing (Eq. 10), so in effect I'm approximating the c* with Monte Carlo sampling on the GPU. Then, as the last step, I use LUTs to fill gaps in the prediction with their maximum-likelihood prior, as memorized from a large collection of real-world flow maps.
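To make the brute-force idea concrete, here's a minimal sketch of the Monte Carlo step on a single patch. Everything here is my illustrative reconstruction, not the actual GPU implementation: the function name, patch size, sample count, and the SSD cost under a plain brightness-constancy residual are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_flow_estimate(patch_a, patch_b, n_samples=2000, max_disp=4):
    """Randomly sample integer shifts (u, v) and keep the one that best
    explains patch_b as a shifted copy of patch_a -- a Monte Carlo stand-in
    for iterating a linear solver over the motion parameters."""
    best_cost, best_uv = np.inf, (0, 0)
    for _ in range(n_samples):
        # Draw a candidate displacement instead of solving for it.
        u, v = rng.integers(-max_disp, max_disp + 1, size=2)
        shifted = np.roll(np.roll(patch_b, -v, axis=0), -u, axis=1)
        cost = np.mean((patch_a - shifted) ** 2)  # brightness-constancy residual
        if cost < best_cost:
            best_cost, best_uv = cost, (int(u), int(v))
    return best_uv

# Synthetic check: patch_b is patch_a circularly shifted by (u=2, v=1).
a = rng.random((16, 16))
b = np.roll(np.roll(a, 1, axis=0), 2, axis=1)
print(mc_flow_estimate(a, b))  # → (2, 1)
```

On a GPU you would evaluate all candidates for all patches in parallel, which is what makes trading the solver for sampling affordable; the LUT gap-filling step would then overwrite low-confidence estimates with the memorized prior.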
BTW, as luck would have it, we are currently leading Bomberland (team CloudGamepad) with a deep-learning AI trained for more than 200 million simulation steps. Yet JFB (the 2nd-ranked team) uses handcrafted C++ rules, and they beat us every time. It's just that against other opponents our probabilistic AI is random enough to confuse them, which is why we're still barely in 1st place. But unless we can improve things significantly soon, I expect us to lose the tournament later this month, because we will not be able to beat JFB in a fair duel. I bet on deep learning here, and I'm already regretting it.
I'll reply about the camera to TaylorAlexander
I would love to know more. I am working on an open source farming robot and vision is an important component. Are you able to share more?
The result is that the Stereolabs AI needs to be extremely lenient when doing the stereo matching, because objects will almost never look exactly the same in both images, be it due to noise or rolling-shutter skew. If I see a pattern repeat in both images at 5% RGB intensity, then on the Stereolabs ZED I need to ignore it, because it's most likely just sensor noise. If the image were almost noise-free, I could treat that pattern as a reliable correspondence and triangulate depth from it.
Also, tracking fast movements at 30 fps is really difficult due to the large movement offsets. If you scan for them, you need lots of compute power, and you risk mistaking repetitive patterns for fast movement.
If you upgrade the hardware from 1080p to 4K, from 30 FPS to 120 FPS, from "really noisy" to "practically noise-free", and from rolling shutter to hardware-synchronized global shutter, then suddenly you have 4x the data to base each decision on, all your offsets are 4x smaller thanks to the higher frame rate, and you can treat much weaker patterns as reliable.
And all of that together means that surfaces like a reflective wooden floor are now doable, whereas before most of the visible patterns would drown in sensor noise.
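To make the noise argument concrete, here's a toy acceptance test, with my own illustrative numbers and thresholds (not the actual Stereolabs or camera pipeline): a correspondence only counts if its matching cost sits below the sensor noise floor and beats the runner-up candidate by more than noise alone could explain.

```python
def reliable_match(cost_best, cost_second, noise_sigma, patch_px):
    """Accept a stereo correspondence only if it stands out from the
    sensor noise floor. Costs are mean-SSD over a patch of patch_px
    pixels; noise_sigma is the per-pixel intensity noise (0..1 scale)."""
    # Rough expected SSD contribution of zero-mean noise in both images:
    noise_floor = 2 * noise_sigma**2 * patch_px
    return cost_best < noise_floor and (cost_second - cost_best) > noise_floor

# Noisy sensor (sigma ~ 5% intensity): a weak pattern drowns in the floor.
print(reliable_match(cost_best=0.04, cost_second=0.05,
                     noise_sigma=0.05, patch_px=25))   # → False
# Near-noise-free sensor: the same faint pattern becomes a usable match.
print(reliable_match(cost_best=0.001, cost_second=0.05,
                     noise_sigma=0.005, patch_px=25))  # → True
```

The point of the sketch: the decision threshold scales with the noise variance, so cutting sensor noise by 10x shrinks the floor by 100x and lets far weaker patterns survive as reliable correspondences.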
EDIT: And maybe one more comment: our camera uses USB 3 at 10 Gbit/s with a high-speed FPGA, and it was designed entirely in the excellent open-source KiCad. I even forked KiCad to make things look nicer and more like Altium: https://forum.kicad.info/t/kicad-schematics-font-is-a-deal-b...
The problem with the Itanium was the hardware itself. Finding sufficient ILP in general-purpose loads for a VLIW like Itanium is an unsolved problem in compiler design. Saying the problem with Itanium was software would be like entering a drag racer in Formula 1 and saying the problem was that the drivers weren't good enough at steering.
To me, there is a spectrum of parallelism on the desktop:
multi-server,
multi-process,
multi-threaded (shared mem),
<Itanium would go here>,
SIMD instructions
Yeah, Itanium might have required assembly to exercise that niche, and maybe new programming languages would've come about. There has to be some middle ground between Verilog/VHDL and C, right? Maybe a CUDA-like language could've done the trick (it certainly works for GPUs). I think it's a shame Itanium failed, and I think it failed for the wrong reasons. At the time, I remember everyone criticizing it for not running legacy x86 applications very well. As though word processor, spreadsheet, and presentation software wasn't fast enough. Saying legacy apps in existing languages don't make it easy to find the ILP seems like a slight generalization of that.
The AMD64 ISA (which is what really killed Itanium) was a blessing and a curse. It made x86 just better enough to not be awful, but it killed desktop/server alternatives for at least 25 years. Maybe ARM will make inroads, but it isn't that much better either.
DSPs (which have great perf/watt for the numerical algorithms you mention) have used VLIW for decades, so of course there is a place for it. GPUs have moved in for all of those operations at this point though. The bet with Itanium was that compilers could be made sufficiently smart to make VLIW work for non-numeric workloads, and that bet failed to pay off. Intel and HP had hundreds of smart people trying to solve the "software problem" of Itanium and they did not succeed.
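To illustrate the compiler-side problem, here's a toy static list scheduler (purely illustrative; not how the Intel/HP compilers actually worked, and real Itanium bundles have templating constraints this ignores). It packs operations into fixed-width issue bundles while honoring data dependencies: independent numeric ops fill the machine, while a dependent chain, typical of general-purpose code, leaves most slots empty.

```python
def schedule(ops, width=3):
    """ops: list of (name, set_of_dependency_names), in program order.
    Greedily packs ready ops into bundles of `width` issue slots."""
    done, remaining, bundles = set(), list(ops), []
    while remaining:
        # An op is ready once all of its dependencies have been issued.
        bundle = [name for name, deps in remaining if deps <= done][:width]
        bundles.append(bundle)
        done |= set(bundle)
        remaining = [(n, d) for n, d in remaining if n not in bundle]
    return bundles

# Independent ops (numeric/DSP-style kernel): the machine stays busy.
print(schedule([("a", set()), ("b", set()), ("c", set()), ("d", set())]))
# → [['a', 'b', 'c'], ['d']]
# Dependent chain (pointer-chasing server code): one op per bundle, 2/3 idle.
print(schedule([("a", set()), ("b", {"a"}), ("c", {"b"}), ("d", {"c"})]))
# → [['a'], ['b'], ['c'], ['d']]
```

An out-of-order x86 core hides the second case at runtime by finding independent work across the chain; a VLIW has to find it at compile time, and for irregular code there is often nothing there to find.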
> I think it's a shame Itanium failed, and I think it failed for the wrong reasons. At the time, I remember everyone criticizing it for not running legacy x86 applications very well. As though word processor, spreadsheet, and presentation software wasn't fast enough. Saying legacy apps in existing languages don't make it easy to find the ILP seems like a slight generalization of that.
Desktop applications are a red herring, given that Itanium was targeted primarily at the workstation and server market. There was also a bad-timing issue: it arrived at about the same time that PC hardware was displacing dedicated workstation and server hardware.
There are already DSPs for this purpose, but typical server workloads don't generally use those algorithms. Perhaps Itanium would have made a good DSP, but it wasn't really aimed at that market.
If you've been around for a swing or two, this is nothing new. If not, it's earth-shattering.
Anyone remember thick clients, then thin clients, and now thick clients again? Anyone want to guess when mobile-first starts becoming web-first?
I mean, what's the closest we have to a good example? Maybe ARM hardware running Linux? As for mobile, we have the Android Open Source Project, but it's a bit early to see what that will amount to. I still hope and wait, but I wouldn't bet on it.
Meanwhile, Google, Facebook, and Amazon are building hardware offload engines, because they've figured out there's a limit to the performance of general-purpose CPUs, and that they waste a lot of power.
You can't have it both ways: efficiency and speed, or flexibility. Choose one.
Yeah, I think it will be the opposite in the medium-term future.
Moore's law can't last forever; the slowdown has already occurred. Then, for a couple of generations, you'll need two things to keep getting better:
1) code optimization / stack reduction / api efficiency / less abstraction
2) moving software to hardware to get that sweet speedup and efficiency
The true hardware work is done by those who design the boards (PCBs), which, before COVID, was mostly outsourced to China. I'm unsure whether this "hardware" will ever move back.