My desk is currently set up with a large monitor in the middle. I'd like to look at the center of the screen when taking calls, but have it appear as though I am looking straight into the camera and the camera is pointed at my face. Obviously, I cannot physically place the camera right in front of the monitor, as that would be seriously inconvenient. Some laptops solve this, but I don't think their methods apply here, as the top of my monitor ends up quite a bit higher than what would look "good" with simple eye correction.
I have multiple webcams that I can place around the monitor to my liking. I would like something similar to what is seen when you open this webpage, but for video, and hopefully at higher quality since I'm not constrained to a monocular source.
I've dabbled a bit with OpenCV in the past, but the most I've done is a little camera calibration for de-warping fisheye lenses. Any ideas on what work I should look into to get started with this?
In my head, I'm picturing two camera sources: one above and one below the monitor. The "synthetic" projected perspective would be in the middle of the two.
Is capturing a point cloud from a stereo source and then reprojecting with splats the most "straightforward" way to do this? Any and all papers/advice are welcome. I'm a little rusty on the math side but I figure a healthy mix of Szeliski's Computer Vision, Wolfram Alpha, a chatbot, and of course perseverance will get me there.
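To make the above/below idea concrete, here is a minimal numpy sketch of the reprojection step, assuming a pinhole model with intrinsics K shared by both cameras, a depth map already recovered from the stereo pair, and pure translation between views (all values are toy placeholders, not a calibrated setup):

```python
import numpy as np

def backproject(depth, K):
    """Lift each pixel (u, v) with depth z to the 3D point z * K^-1 [u, v, 1]."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pix          # 3 x N rays on the z = 1 plane
    return rays * depth.reshape(1, -1)     # scale by depth -> 3 x N points

def project(points, K, t):
    """Project 3D points into a camera translated by t (rotation omitted)."""
    cam = points - t.reshape(3, 1)         # express points in the new camera frame
    cam = cam[:, cam[2] > 0]               # drop points behind the camera
    uv = K @ cam
    return (uv[:2] / uv[2]).T              # N x 2 pixel coordinates

# Toy example: a flat plane 1 m from the top camera, 10 cm vertical
# baseline, so the synthetic view sits 5 cm below the top camera.
K = np.array([[100.0,   0.0, 2.0],
              [  0.0, 100.0, 2.0],
              [  0.0,   0.0, 1.0]])
depth = np.ones((4, 4))
pts = backproject(depth, K)
mid = project(pts, K, t=np.array([0.0, 0.05, 0.0]))
# Each pixel shifts up by f * ty / z = 100 * 0.05 / 1 = 5 pixels.
```

In a real setup you'd get the depth from stereo matching (e.g. OpenCV's block matchers plus `reprojectImageTo3D`), and the projected points would then be splatted into the synthetic image with hole-filling, which is where the hard part lives.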
If you want your head to actually be centered, there are also "center-screen webcams" that sit in the middle of your screen during a call. There are a few types: thin webcams that drape down, and clear "webcam holders" that hold your webcam at the center of the screen, which are a bit less convenient.
Nvidia also has a software package you can use, but I believe it is a bit fiddly to get set up.
I appreciate the pragmatism of buying another thing to solve the problem but I am hoping to solve this with stuff I already own.
I'd be lying if I said the nerd cred of overengineering a solution wasn't attractive as well.
Estimating a depth map from a monocular camera is now possible, so that may help you get further with this.
It should be doable in real time, but might be stuck in the uncanny valley.
Also maybe look at what Meta and Apple's Vision Pro are doing to create their avatars.
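Once you have a per-pixel depth map from any of these routes (monocular or stereo), the "reproject with splats" step can start as something as simple as a nearest-pixel forward warp with a z-buffer. A hypothetical numpy sketch of just that z-buffer idea (not actual Gaussian splatting; all names and values here are made up for illustration):

```python
import numpy as np

def splat(colors, uv, z, h, w):
    """Forward-warp colored points into an h x w image: nearest pixel, closest depth wins."""
    out = np.zeros((h, w, 3))
    zbuf = np.full((h, w), np.inf)
    px = np.rint(uv).astype(int)           # snap each point to its nearest pixel
    for (u, v), zi, c in zip(px, z, colors):
        if 0 <= u < w and 0 <= v < h and zi < zbuf[v, u]:
            zbuf[v, u] = zi                # keep only the nearest surface per pixel
            out[v, u] = c
    return out

# Two points land on the same pixel; the nearer one (z = 1, green) wins.
img = splat(colors=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
            uv=np.array([[1.0, 1.0], [1.0, 1.0]]),
            z=np.array([2.0, 1.0]), h=3, w=3)
```

Real systems replace the single-pixel writes with soft, overlapping splats and fill the disocclusion holes, but the depth-ordered compositing is the same idea.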
This is all well and good when you're just using it for a pretty visualization, but it appears Gaussians have the same weakness as point clouds processed with structure from motion: you need lots of camera angles to get good surface reconstruction accuracy.
The paper actually suggests the opposite: that Gaussian splats outperform point clouds and other methods when given the same amount of data. And not just by a little, but ridiculously so.
Their Gaussian-splatting-based SLAM variants with RGB-D and RGB (no depth) camera input both outperform essentially everything else and are SOTA (state-of-the-art) for the field. RGB-D obviously outperforms RGB, but RGB data used with Gaussian splatting performs comparably to, or beats, the competition even when the competition is using depth data.
And not only that: their metrics outperform everything else except systems operating on literal ground-truth data, and even then they come within a few percent of those ground-truth models.
And importantly, where most other models run at ~0.2-3 fps, this model runs several orders of magnitude faster, at an average of 769 fps. While higher fps doesn't mean much past a certain point, it does mean you can do SLAM on much weaker hardware while still guaranteeing a WCET (worst-case execution time) below the frame time.
So this actually is a massive advancement in the SOTA, since Gaussians let you very quickly and cheaply approximate a lot of information in a way you can efficiently compare and refine against the current sensor inputs.
Direct paper link for ref: https://arxiv.org/pdf/2312.06741
Are there any examples or algorithms that can turn this into 3D objects that could be used in a video game? Any examples of someone doing that?
[0]: https://www.unrealengine.com/marketplace/en-US/product/luma-...