Yet here I am: I recently won an AI competition by reviving an old 2005 algorithm and simply exploiting the fact that compute power has increased ~7000x since then (from 5 GFLOPS on a Pentium 4 to 35 TFLOPS on an RTX 3090). No AI was needed.
And now I'm building custom electronics for a new type of 3D camera, because even after years of AI and deep-learning research, structured-light and/or stereoscopic 3D depth estimation is still unusable. Try training a NeRF from "only" 3 UHD images and you'll know what I mean.
Seriously, please share the deets on this 2005 "junker" algo.
It was super unsatisfying to read your post with so much interesting information glossed over.
Will be sad if we never hear from you.
By now I'm down to 5th place on the Sintel Clean rankings: http://sintel.is.tue.mpg.de/quant?metric_id=6&selected_pass=... but my entry H-v3 was 1st place when I submitted it. The algorithm is:
Mota C., Stuke I., Aach T., Barth E. Divide-and-Conquer Strategies for Estimating Multiple Transparent Motions. In: Jähne B., Mester R., Barth E., Scharr H. (eds) Complex Motion. IWCM 2004.
https://doi.org/10.1007/978-3-540-69866-1_6
(So I misremembered the year: it was the end of 2004, not 2005.)
I did tweak it in a few details, such as using a 5x5 px ICA instead of the constant-brightness assumption, but mainly I replaced the Gauss-Seidel iteration (Eq. 12) with brute-forcing (Eq. 10), so in effect I'm approximating the c* with Monte Carlo sampling on the GPU. Then, as the last step, I use LUTs to fill gaps in the prediction with their maximum-likelihood prior, as memorized from a large collection of real-world flow maps.
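To make the brute-force idea concrete, here's a minimal sketch of the Monte Carlo step on a single patch. Everything here is my illustrative reconstruction, not the actual GPU implementation: the function name, patch size, sample count, and the SSD cost under a plain brightness-constancy residual are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_flow_estimate(patch_a, patch_b, n_samples=2000, max_disp=4):
    """Randomly sample integer shifts (u, v) and keep the one that best
    explains patch_b as a shifted copy of patch_a -- a Monte Carlo stand-in
    for iterating a linear solver over the motion parameters."""
    best_cost, best_uv = np.inf, (0, 0)
    for _ in range(n_samples):
        # Draw a candidate displacement instead of solving for it.
        u, v = rng.integers(-max_disp, max_disp + 1, size=2)
        shifted = np.roll(np.roll(patch_b, -v, axis=0), -u, axis=1)
        cost = np.mean((patch_a - shifted) ** 2)  # brightness-constancy residual
        if cost < best_cost:
            best_cost, best_uv = cost, (int(u), int(v))
    return best_uv

# Synthetic check: patch_b is patch_a circularly shifted by (u=2, v=1).
a = rng.random((16, 16))
b = np.roll(np.roll(a, 1, axis=0), 2, axis=1)
print(mc_flow_estimate(a, b))  # → (2, 1)
```

On a GPU you would evaluate all candidates for all patches in parallel, which is what makes trading the solver for sampling affordable; the LUT gap-filling step would then overwrite low-confidence estimates with the memorized prior.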
BTW, as luck would have it, we are currently leading Bomberland (team CloudGamepad) with a deep-learning AI trained for more than 200 million simulation steps. Yet JFB (the 2nd-ranked team) uses handcrafted C++ rules, and they beat us every time. It's just that against other opponents our probabilistic AI is random enough to confuse them, which is why we're still barely in 1st place. But unless we can improve things significantly soon, I expect us to lose the tournament later this month, because we will not be able to beat JFB in a fair duel. I bet on deep learning here, and I'm already regretting it.
I'll reply about the camera to TaylorAlexander
I would love to know more. I am working on an open source farming robot and vision is an important component. Are you able to share more?
The result is that the Stereolabs AI needs to be extremely lenient when doing the stereo matching, because objects will almost never look exactly the same in both images, be it due to noise or rolling-shutter skew. If I see a pattern repeat in both images at 5% RGB intensity, then on the Stereolabs ZED I need to ignore it, because it's most likely just sensor noise. If the image were almost noise-free, I could treat that pattern as a reliable correspondence and triangulate depth from it.
Also, tracking fast movements at 30 fps is really difficult due to the large movement offsets. If you scan for them, you need lots of compute power, and you risk mistaking repetitive patterns for fast movement.
If you upgrade the hardware from 1080p to 4K, from 30 FPS to 120 FPS, from "really noisy" to "practically noise-free", and from rolling shutter to hardware-synchronized global shutter, then suddenly you have 4x the data to base each decision on, all your offsets are 4x smaller thanks to the higher frame rate, and you can treat much weaker patterns as reliable.
And all of that together means that surfaces like a reflective wooden floor are now doable, whereas before most of the visible patterns would drown in sensor noise.
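To make the noise argument concrete, here's a toy acceptance test, with my own illustrative numbers and thresholds (not the actual Stereolabs or camera pipeline): a correspondence only counts if its matching cost sits below the sensor noise floor and beats the runner-up candidate by more than noise alone could explain.

```python
def reliable_match(cost_best, cost_second, noise_sigma, patch_px):
    """Accept a stereo correspondence only if it stands out from the
    sensor noise floor. Costs are mean-SSD over a patch of patch_px
    pixels; noise_sigma is the per-pixel intensity noise (0..1 scale)."""
    # Rough expected SSD contribution of zero-mean noise in both images:
    noise_floor = 2 * noise_sigma**2 * patch_px
    return cost_best < noise_floor and (cost_second - cost_best) > noise_floor

# Noisy sensor (sigma ~ 5% intensity): a weak pattern drowns in the floor.
print(reliable_match(cost_best=0.04, cost_second=0.05,
                     noise_sigma=0.05, patch_px=25))   # → False
# Near-noise-free sensor: the same faint pattern becomes a usable match.
print(reliable_match(cost_best=0.001, cost_second=0.05,
                     noise_sigma=0.005, patch_px=25))  # → True
```

The point of the sketch: the decision threshold scales with the noise variance, so cutting sensor noise by 10x shrinks the floor by 100x and lets far weaker patterns survive as reliable correspondences.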
EDIT: And maybe one more comment: our camera uses USB 3 at 10 Gbit/s with a high-speed FPGA, and it was designed entirely in the excellent open-source KiCad. I even forked KiCad to make things look nicer and more like Altium: https://forum.kicad.info/t/kicad-schematics-font-is-a-deal-b...
The problem with the Itanium was the hardware itself. Finding sufficient ILP in general-purpose loads for a VLIW like Itanium is an unsolved problem in compiler design. Saying the problem with Itanium was software would be like entering a drag racer in Formula 1 and saying the problem was that the drivers weren't good enough at steering.
To me, there is a spectrum of parallelism on the desktop:
multi-server,
multi-process,
multi-threaded (shared mem),
<Itanium would go here>,
SIMD instructions
Yeah, Itanium might have required assembly to exercise that niche, and maybe new programming languages would've come about. There has to be some middle ground between Verilog/VHDL and C, right? Maybe a CUDA-like language could've done the trick (it certainly works for GPUs). I think it's a shame Itanium failed, and I think it failed for the wrong reasons. At the time, I remember everyone criticizing it for not running legacy x86 applications very well. As though word processor, spreadsheet, and presentation software wasn't fast enough. Saying legacy apps in existing languages don't make it easy to find the ILP seems like a slight generalization of that.
The AMD64 ISA (which is what really killed Itanium) was a blessing and a curse. It made x86 just better enough to not be awful, but it killed desktop/server alternatives for at least 25 years. Maybe ARM will make inroads, but it isn't that much better either.
DSPs (which have great perf/watt for the numerical algorithms you mention) have used VLIW for decades, so of course there is a place for it. GPUs have moved in for all of those operations at this point though. The bet with Itanium was that compilers could be made sufficiently smart to make VLIW work for non-numeric workloads, and that bet failed to pay off. Intel and HP had hundreds of smart people trying to solve the "software problem" of Itanium and they did not succeed.
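To illustrate the compiler-side problem, here's a toy static list scheduler (purely illustrative; not how the Intel/HP compilers actually worked, and real Itanium bundles have templating constraints this ignores). It packs operations into fixed-width issue bundles while honoring data dependencies: independent numeric ops fill the machine, while a dependent chain, typical of general-purpose code, leaves most slots empty.

```python
def schedule(ops, width=3):
    """ops: list of (name, set_of_dependency_names), in program order.
    Greedily packs ready ops into bundles of `width` issue slots."""
    done, remaining, bundles = set(), list(ops), []
    while remaining:
        # An op is ready once all of its dependencies have been issued.
        bundle = [name for name, deps in remaining if deps <= done][:width]
        bundles.append(bundle)
        done |= set(bundle)
        remaining = [(n, d) for n, d in remaining if n not in bundle]
    return bundles

# Independent ops (numeric/DSP-style kernel): the machine stays busy.
print(schedule([("a", set()), ("b", set()), ("c", set()), ("d", set())]))
# → [['a', 'b', 'c'], ['d']]
# Dependent chain (pointer-chasing server code): one op per bundle, 2/3 idle.
print(schedule([("a", set()), ("b", {"a"}), ("c", {"b"}), ("d", {"c"})]))
# → [['a'], ['b'], ['c'], ['d']]
```

An out-of-order x86 core hides the second case at runtime by finding independent work across the chain; a VLIW has to find it at compile time, and for irregular code there is often nothing there to find.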
> I think it's a shame Itanium failed, and I think it failed for the wrong reasons. At the time, I remember everyone criticizing it for not running legacy x86 applications very well. As though word processor, spreadsheet, and presentation software wasn't fast enough. Saying legacy apps in existing languages don't make it easy to find the ILP seems like a slight generalization of that.
Desktop applications are a red herring, given that Itanium was targeted primarily at the workstation and server market. There was also a bad-timing issue: it arrived at about the same time that PC hardware was displacing dedicated workstation and server hardware.
There are already DSPs for this purpose, but typical server workloads don't generally use those algorithms. Perhaps Itanium would have made a good DSP, but it wasn't really aimed at that market.
If you've been around for a swing or two, this is nothing new. If not, it's earth-shattering.
Anyone remember thick clients, then thin clients, and now thick clients again? Anyone want to guess when mobile-first starts becoming web-first?
I mean, what's the closest we have to a good example? Maybe ARM hardware running Linux? As for mobile, we have the Android Open Source Project, but it's a bit early to see what that will amount to. I still hope and wait, but I wouldn't bet on it.
Meanwhile, Google, Facebook, and Amazon are building hardware offload engines, because they've figured out there's a limit to the performance of general-purpose CPUs, and that they waste a lot of power.
You can't have it both ways: efficiency and speed, or flexibility. Choose one.
Yeah, I think it will be the opposite in the medium-term future.
Moore's law can't last forever; the slowdown has already occurred. Then, for a couple of generations, you'll need two things to keep getting better:
1) code optimization / stack reduction / api efficiency / less abstraction
2) moving software to hardware to get that sweet speedup and efficiency
The true hardware work is done by those who design the boards (PCBs), which, before COVID, was mostly outsourced to China. I'm unsure whether this "hardware" will ever move back.