- select 3+ pairs of matching points in each cloud (tops of trees, edges of a building, etc.)
- compute the centroid of each set of points and each point's offset from its centroid
- use SVD to find the rotation that minimizes the distance from the source points to the target points
- rotate and translate the source cloud onto the target
I did this using the Rust nalgebra^1 crate after reading a very helpful paper^2 detailing the process. I had planned to compile the Rust lib to WASM so the process could run alongside the browser-based point cloud visualiser we were using, but with limited time I instead used Neon^3 to build a native binding for our Node.js server to use.
^1 https://docs.rs/nalgebra/latest/nalgebra/linalg/struct.SVD.h...
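The steps above (centroids, cross-covariance, SVD) are the classic Kabsch/Procrustes alignment. A minimal sketch in Python with NumPy rather than nalgebra — `rigid_align` is a hypothetical helper name, and the determinant guard against reflections is a standard addition not spelled out in the bullet list:

```python
import numpy as np

def rigid_align(source, target):
    """Estimate rotation R and translation t mapping source -> target
    from matched point pairs (Kabsch algorithm via SVD).
    source, target: (N, 3) arrays of corresponding points."""
    src_c = source.mean(axis=0)
    tgt_c = target.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (source - src_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    # Guard against an improper rotation (reflection)
    if np.linalg.det(R) < 0:
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = tgt_c - R @ src_c
    return R, t
```

With exact correspondences and no noise this recovers the transform exactly; with noisy picks (tree tops, building edges) it gives the least-squares best fit.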
> We can decompose a given image into the three color channels red, green and blue. Each channel can be represented as a (m × n)‑matrix with values ranging from 0 to 255. We will now compress the matrix A representing one of the channels.
I wonder if the author considered converting to YCbCr colorspace first. The luminance component (Y) is substantially more important to the human visual system than the Cb/Cr components. Some subsampling of the chrominance components would probably work well in these schemes.
SVD also works on complex matrices. I imagine there's value in compressing the subsampled Cb/Cr channels as the real and imaginary components of a single complex matrix.
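The conversion and subsampling suggested here can be sketched with the full-range BT.601 matrix that JFIF/JPEG uses (the article itself doesn't do this; the function names are hypothetical):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr as used by JFIF/JPEG.
    rgb: float array of shape (H, W, 3), values in 0..255."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample_420(chan):
    """4:2:0-style subsampling: average each 2x2 block
    (H and W assumed even)."""
    h, w = chan.shape
    return chan.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

Each chroma channel then carries a quarter of the samples before any SVD truncation is applied to it.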
It's worse than just about any "real" image compression algorithm, but it works! (Plus you get lossless compression if your image is low-rank.)
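The scheme from the article amounts to keeping the top-k singular triplets per channel; a minimal sketch (storage drops from m·n values to k·(m + n + 1), and reconstruction is exact whenever rank(A) ≤ k):

```python
import numpy as np

def svd_compress(A, k):
    """Keep the top-k singular triplets of one channel matrix A.
    Stores k*(m + n + 1) values instead of m*n."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def svd_decompress(U, s, Vt):
    # (U * s) scales each kept column of U by its singular value
    return (U * s) @ Vt
```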
When you’re trying to compress an image you’re trying to optimize something quite different, so it is actually not too surprising.
In general it doesn't make sense to compress images this way: the algorithm is not invariant under 2D image rotation, a very relevant operation for realistic images, but it is invariant under row/column permutations, which are not a relevant operation for realistic images.
Maybe using YUV would be better than RGB?
Maybe values could be represented as 0..1, and the values themselves could be stored as 0.16 fixed-point numbers.
Use some generic compression on top.
Is there a smart way to store the basis vectors? Something about angles somehow (analogous to quaternions)? Also, once U is known, isn't the inverse implied? Also, if U is approximated and fixed, perhaps the singular values can be adjusted again to minimize errors.
Consider, instead, compressing small blocks of the image (let's say 8x8) rather than a whole image plane at once.
Over such a small region, pixels are usually very highly correlated. You could model them statistically as an order-1 autoregressive process with a high correlation coefficient (e.g., x[i + 1] = x[i]*rho + noise, with rho >= 0.95 usually, separately in both the horizontal and vertical directions).
Then, computing the eigendecomposition of the 8x8 cross-correlation matrix C_{ij} = rho^|i - j| produces a set of common eigenvectors that you can use for all blocks (recall that the left and right singular vectors U and V of a matrix M are just the eigenvectors of M.M^T and M^T.M, respectively). So now instead of several very large vectors, you only need a single 8x8 matrix of basis vectors (because it's square and orthonormal, the inverse is just the transpose).
But wait. Let's take the limit as rho -> 1. It just so happens that the resulting eigenvectors converge to the Type 2 Discrete Cosine Transform [0]. So, in fact, you don't need to transmit the basis at all.
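This convergence is easy to check numerically. A sketch (under the assumption that pairing eigenvectors with DCT rows by decreasing eigenvalue / increasing frequency is the right matching, which holds for an AR(1) process with positive rho):

```python
import numpy as np

def dct2_basis(n):
    """Orthonormal Type-2 DCT basis; rows are basis vectors."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    B = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    B[0] *= np.sqrt(1.0 / n)
    B[1:] *= np.sqrt(2.0 / n)
    return B

def ar1_eigvecs(n, rho):
    """Eigenvectors (as rows) of C_ij = rho^|i-j|,
    sorted by decreasing eigenvalue."""
    C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    w, V = np.linalg.eigh(C)       # ascending eigenvalues
    return V[:, ::-1].T            # rows, largest eigenvalue first
```

Comparing `ar1_eigvecs(8, 0.99)` row-by-row with `dct2_basis(8)` (up to sign) shows the KLT basis of the AR(1) model is already very close to the DCT at rho near 1.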
You can store the output of the transform as fixed point numbers with some arbitrary base (the choice controls the amount of compression: anything too small just gets rounded to zero). Maybe use run-length encoding as your generic compression, because you expect to have a lot of zeroes (visit them in a zig-zag order, to get the big values first and maximize the length of the runs of zeroes), with some Huffman coding of the actual symbols. Add a simple predictor for the (0,0) coefficient of each block (it winds up being the average of the pixel values, so it should be pretty similar from block to block).
Congratulations, you just invented JPEG.
[0] http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1672...
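The zig-zag scan and run-length step described above can be sketched like this (a toy version: real JPEG packs run/size pairs into Huffman symbols, which is omitted here, and the function names are made up):

```python
def zigzag_order(n=8):
    """Indices of an n x n block in JPEG zig-zag order: walk the
    anti-diagonals, alternating direction on each one."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def rle_zeros(coeffs):
    """(zero_run, value) pairs for the nonzero coefficients,
    with a (0, 0) end-of-block marker, JPEG-style."""
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            out.append((run, c))
            run = 0
    out.append((0, 0))
    return out
```

Because quantization pushes most high-frequency coefficients to zero and the zig-zag scan visits them last, the runs of zeroes become long and the encoding compact.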
> Maybe values could be represented as 0..1, and the values themselves could be stored as 0.16 fixed point numbers. Use some generic compression on top.
Codecs like JPEG use a variable amount of quantization. You quantize by scaling the 0..1 float by some scalar k, putting it in the range 0..k, then truncating it to an integer and encoding that integer with an entropy coder like Huffman. The parameter k must also be encoded somehow, or fixed.
Look up “JPEG coefficient quantization” for how JPEG does it.
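A minimal sketch of the scale-and-truncate step just described (the function names are made up; JPEG proper rounds rather than truncates and quantizes DCT coefficients per-frequency, but the principle is the same — larger k means finer steps and less compression):

```python
import numpy as np

def quantize(x, k):
    """Scale 0..1 floats into 0..k and truncate to integers."""
    return (x * k).astype(int)

def dequantize(q, k):
    """Map the integers back into 0..1."""
    return q / k
```

Truncation bounds the reconstruction error by 1/k, which is what the entropy coder trades against bit rate.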
Codecs for audio and images are often made up of understandable parts that fit together: transformations, quantization, entropy coding. If you come up with a new transformation, you can make a whole codec by putting together the remaining pieces—but something like a new transformation is itself interesting, because somebody else can always assemble it into a codec if it shows promise.
The reason quantization works so well in JPEG is because of the DCT step and its energy compaction properties. This gets most of the coefficients near zero. I think without this transform you would be introducing a lot more noise in the final result.
At some point, we are going to end up re-implementing something approximating JPEG here: colorspace conversion, subsampling, and DCT plus quantization are most of the magic sauce.
1. http://tensorly.org/stable/auto_examples/decomposition/plot_...
In data science, most traditional use cases for SVD have been superseded by other algorithms (UMAP is especially popular these days).