- select 3+ pairs of matching points in each cloud (tops of trees, edges of a building, etc.)
- compute the centroid of each set of points and each point's offset from its centroid
- use SVD to find the rotation that minimizes the distance from the source points to the target points
- rotate and translate the source cloud onto the target
I did this using the Rust nalgebra^1 crate after reading a very helpful paper^2 detailing the process. I had planned to compile the Rust lib to WASM so the process could run alongside the browser-based point cloud visualiser we were using, but with limited time I instead used Neon^3 to build a native binding for our Node.js server to use.
^1 https://docs.rs/nalgebra/latest/nalgebra/linalg/struct.SVD.h...
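The steps above (centroids, cross-covariance, SVD) are the classic Kabsch/Procrustes alignment. A minimal sketch in Python with NumPy rather than nalgebra — `rigid_align` is a hypothetical helper name, and the determinant guard against reflections is a standard addition not spelled out in the bullet list:

```python
import numpy as np

def rigid_align(source, target):
    """Estimate rotation R and translation t mapping source -> target
    from matched point pairs (Kabsch algorithm via SVD).
    source, target: (N, 3) arrays of corresponding points."""
    src_c = source.mean(axis=0)
    tgt_c = target.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (source - src_c).T @ (target - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    # Guard against an improper rotation (reflection)
    if np.linalg.det(R) < 0:
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = tgt_c - R @ src_c
    return R, t
```

With exact correspondences and no noise this recovers the transform exactly; with noisy picks (tree tops, building edges) it gives the least-squares best fit.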
> We can decompose a given image into the three color channels red, green and blue. Each channel can be represented as a (m × n)‑matrix with values ranging from 0 to 255. We will now compress the matrix A representing one of the channels.
I wonder if the author considered converting to YCbCr colorspace first. The luminance component (Y) is substantially more important to the human visual system than the Cb/Cr components. Some subsampling of the chrominance components would probably work well in these schemes.
SVD also works on complex matrices. I imagine there's value in compressing the subsampled Cb/Cr channels as the real and imaginary components of a single complex matrix.
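The conversion and subsampling suggested here can be sketched with the full-range BT.601 matrix that JFIF/JPEG uses (the article itself doesn't do this; the function names are hypothetical):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr as used by JFIF/JPEG.
    rgb: float array of shape (H, W, 3), values in 0..255."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample_420(chan):
    """4:2:0-style subsampling: average each 2x2 block
    (H and W assumed even)."""
    h, w = chan.shape
    return chan.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```

Each chroma channel then carries a quarter of the samples before any SVD truncation is applied to it.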
It's worse than just about any "real" image compression algorithm, but it works! (Plus you get lossless compression if your image is low-rank.)
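The scheme from the article amounts to keeping the top-k singular triplets per channel; a minimal sketch (storage drops from m·n values to k·(m + n + 1), and reconstruction is exact whenever rank(A) ≤ k):

```python
import numpy as np

def svd_compress(A, k):
    """Keep the top-k singular triplets of one channel matrix A.
    Stores k*(m + n + 1) values instead of m*n."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def svd_decompress(U, s, Vt):
    # (U * s) scales each kept column of U by its singular value
    return (U * s) @ Vt
```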
When you’re trying to compress an image you’re trying to optimize something quite different, so it is actually not too surprising.
In general it doesn't make sense to compress images this way: the algorithm is not invariant under 2D image rotation, a very relevant operation for realistic images, but it is invariant under row/column permutations, which are not a relevant operation for realistic images.
Maybe using YUV would be better than RGB?
Maybe values could be represented as 0..1, and the values themselves could be stored as 0.16 fixed-point numbers.
Use some generic compression on top.
Is there a smart way to store the basis vectors? Something about angles somehow (analogous to quaternions)? Also, once U is known, isn't the inverse implied? Also, if U is approximated and fixed, perhaps the singular values can be adjusted again to minimize errors.
Consider, instead, compressing small blocks of the image (let's say 8x8) rather than a whole image plane at once.
Over such a small region, pixels are usually very highly correlated. You could model them statistically as an order-1 autoregressive process with a high correlation coefficient (e.g., x[i + 1] = x[i]*rho + noise, with rho >= 0.95 usually, separately in both the horizontal and vertical directions).
Then, computing the eigendecomposition of the 8x8 cross-correlation matrix C_{ij} = rho^|i - j| produces a set of common eigenvectors that you can use for all blocks (recall that the left and right singular vectors U and V of a matrix M are just the eigenvectors of M.M^T and M^T.M, respectively). So now instead of several very large vectors, you only need a single 8x8 matrix of basis vectors (because it's square and orthonormal, the inverse is just the transpose).
But wait. Let's take the limit as rho -> 1. It just so happens that the resulting eigenvectors converge to the Type 2 Discrete Cosine Transform [0]. So, in fact, you don't need to transmit the basis at all.
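This convergence is easy to check numerically. A sketch (under the assumption that pairing eigenvectors with DCT rows by decreasing eigenvalue / increasing frequency is the right matching, which holds for an AR(1) process with positive rho):

```python
import numpy as np

def dct2_basis(n):
    """Orthonormal Type-2 DCT basis; rows are basis vectors."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    B = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    B[0] *= np.sqrt(1.0 / n)
    B[1:] *= np.sqrt(2.0 / n)
    return B

def ar1_eigvecs(n, rho):
    """Eigenvectors (as rows) of C_ij = rho^|i-j|,
    sorted by decreasing eigenvalue."""
    C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    w, V = np.linalg.eigh(C)       # ascending eigenvalues
    return V[:, ::-1].T            # rows, largest eigenvalue first
```

Comparing `ar1_eigvecs(8, 0.99)` row-by-row with `dct2_basis(8)` (up to sign) shows the KLT basis of the AR(1) model is already very close to the DCT at rho near 1.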
You can store the output of the transform as fixed point numbers with some arbitrary base (the choice controls the amount of compression: anything too small just gets rounded to zero). Maybe use run-length encoding as your generic compression, because you expect to have a lot of zeroes (visit them in a zig-zag order, to get the big values first and maximize the length of the runs of zeroes), with some Huffman coding of the actual symbols. Add a simple predictor for the (0,0) coefficient of each block (it winds up being the average of the pixel values, so it should be pretty similar from block to block).
Congratulations, you just invented JPEG.
[0] http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1672...
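The zig-zag scan and run-length step described above can be sketched like this (a toy version: real JPEG packs run/size pairs into Huffman symbols, which is omitted here, and the function names are made up):

```python
def zigzag_order(n=8):
    """Indices of an n x n block in JPEG zig-zag order: walk the
    anti-diagonals, alternating direction on each one."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def rle_zeros(coeffs):
    """(zero_run, value) pairs for the nonzero coefficients,
    with a (0, 0) end-of-block marker, JPEG-style."""
    out, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            out.append((run, c))
            run = 0
    out.append((0, 0))
    return out
```

Because quantization pushes most high-frequency coefficients to zero and the zig-zag scan visits them last, the runs of zeroes become long and the encoding compact.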
> Maybe values could be represented as 0..1, and the values themselves could be stored as 0.16 fixed point numbers. Use some generic compression on top.
Codecs like JPEG use a variable amount of quantization. You quantize by scaling the 0..1 float by some scalar k, putting it in the range 0..k, then truncating it to an integer and encoding that integer with an entropy coder like Huffman. The parameter k must also be encoded somehow, or fixed.
Look up “JPEG coefficient quantization” for how JPEG does it.
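A minimal sketch of the scale-and-truncate step just described (the function names are made up; JPEG proper rounds rather than truncates and quantizes DCT coefficients per-frequency, but the principle is the same — larger k means finer steps and less compression):

```python
import numpy as np

def quantize(x, k):
    """Scale 0..1 floats into 0..k and truncate to integers."""
    return (x * k).astype(int)

def dequantize(q, k):
    """Map the integers back into 0..1."""
    return q / k
```

Truncation bounds the reconstruction error by 1/k, which is what the entropy coder trades against bit rate.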
Codecs for audio and images are often made up of understandable parts that fit together: transformations, quantization, entropy coding. If you come up with a new transformation, you can make a whole codec by putting together the remaining pieces—but something like a new transformation is itself interesting, because somebody else can always assemble it into a codec if it shows promise.
The reason quantization works so well in JPEG is because of the DCT step and its energy compaction properties. This gets most of the coefficients near zero. I think without this transform you would be introducing a lot more noise in the final result.
At some point, we are going to end up re-implementing something approximating JPEG here: colorspace conversion, subsampling, and DCT plus quantization are most of the magic sauce.
1. http://tensorly.org/stable/auto_examples/decomposition/plot_...
In data science, most traditional use cases for SVD have been superseded by other algorithms (UMAP is especially popular these days).