
0 points · Maxious · 20h ago · 0 comments

https://mesuvash.github.io/blog/2026/turboquant-interactive/ has a little visualisation


Geee 7h ago

Is there an error in the visualization? It shows that every vector is rotated the same amount. My understanding was that they are randomized with different values, which results in a predictable distribution, which is easier to quantize.

mesuvash 2h ago

That's actually correct and intentional. TurboQuant applies the same rotation matrix to every vector. The key insight is that any unit vector, when multiplied by a random orthogonal matrix, produces coordinates with a known distribution (Beta/arcsine in 2D, near-Gaussian in high-d). The randomness is in the matrix itself (generated once from a seed), not per-vector. Since the distribution is the same regardless of the input vector, a single precomputed quantization grid works for everything. I've updated the description to make this clearer.
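A minimal sketch of the point above, not the post's or the paper's code: one orthogonal matrix is drawn once from a seed and applied to two very different unit vectors. The helper names (`random_orthogonal`, `rotate`), the dimension, and the Gram-Schmidt construction are assumptions chosen for illustration. If the claim holds, both rotated vectors should keep norm 1, have no dominant coordinate, and put a similar share of coordinates within one standard deviation (1/sqrt(d)) of zero.

```python
import math
import random

def random_orthogonal(d, seed):
    """Draw a d x d orthogonal matrix by Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # remove the components along rows already chosen
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows

def rotate(matrix, x):
    """Matrix-vector product: the same matrix is used for every input."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

d = 128
R = random_orthogonal(d, seed=0)       # generated once from a seed

spike = [1.0] + [0.0] * (d - 1)        # all energy in one coordinate
flat = [1.0 / math.sqrt(d)] * d        # energy already spread evenly

for name, x in [("spike", spike), ("flat", flat)]:
    y = rotate(R, x)
    norm = math.sqrt(sum(c * c for c in y))
    largest = max(abs(c) for c in y)
    within = sum(abs(c) <= 1.0 / math.sqrt(d) for c in y) / d
    print(name, round(norm, 6), round(largest, 3), round(within, 2))
```

Because the per-coordinate statistics do not depend on which vector went in, one precomputed set of quantization levels can serve every input.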

Geee 1h ago

Thanks. However, from this visualization it's not clear how the random rotation is beneficial. I guess it makes more sense on higher dimensional vectors.

fc417fc802 2h ago

I believe they are all rotated by the same random matrix, the purpose being (IIUC) to distribute the signal evenly across all dimensions. So effectively it drowns any structure that might be present in noise. That's essential for data efficiency, in addition to avoiding bias-related issues during the initial quantization step. However, there are still some other issues due to bias that are addressed by a second quantization step involving the residual.

That said, I don't believe the visualization is correct. The grid, for one, doesn't seem to match what's described in the paper.

Also it's entirely possible I've misunderstood or neglected to notice key details.

Rapzid 11h ago

Awesome! So it nudges the vectors into stepped polar rays. It's effectively angle snapping? Plus a sort of magnitude clustering.

pstoll 15h ago

Good post, but the link at the end is broken.

"For the full technical explanation with equations, proofs, and PyTorch pseudocode, see the companion post: TurboQuant: Near-Optimal Vector Quantization Without Looking at Your Data."

mesuvash 2h ago

Author here. Sorry, still working on refining the post. Will share once it's ready.

spencerflem 20h ago

I like the visualization, but I don’t understand the grid quantization. If every point is on the unit circle, aren’t all the center grid coords unused?

mesuvash 2h ago

Yes. Great catch. I simplified the grid just for visualization purposes.

I've updated the visualization. The grid is actually not uniformly spaced. Each coordinate is quantized independently using optimal centroids for the known coordinate distribution. In 2D, unit-circle coordinates follow the arcsine distribution (concentrating near ±1), so the centroids cluster at the edges, not the center.
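A rough sketch of how such centroids could be computed in the 2D case, assuming a plain Lloyd-Max iteration as a stand-in for whatever the paper actually solves. The function names and the choice of 4 levels are illustrative. For the arcsine density 1/(π·sqrt(1−x²)) the mean of a cell [a, b] has a closed form, so no sampling is needed to update the centroids.

```python
import math

def arcsine_cell_mean(a, b):
    """Mean of x under density 1/(pi*sqrt(1-x^2)) restricted to [a, b]."""
    mass = math.asin(b) - math.asin(a)                       # times 1/pi
    first_moment = math.sqrt(1 - a * a) - math.sqrt(1 - b * b)  # times 1/pi
    return first_moment / mass

def lloyd_max_arcsine(levels, iterations=200):
    """Alternate between midpoint cell edges and per-cell means."""
    c = [-1 + (2 * i + 1) / levels for i in range(levels)]   # uniform start
    for _ in range(iterations):
        edges = [-1.0] + [(c[i] + c[i + 1]) / 2 for i in range(levels - 1)] + [1.0]
        c = [arcsine_cell_mean(edges[i], edges[i + 1]) for i in range(levels)]
    return c

def quantization_mse(centroids, samples=20000):
    """MSE of nearest-centroid rounding for x = cos(theta), theta uniform."""
    total = 0.0
    for k in range(samples):
        x = math.cos((k + 0.5) * 2 * math.pi / samples)
        total += min((x - c) ** 2 for c in centroids)
    return total / samples

centroids = lloyd_max_arcsine(4)
uniform = [-0.75, -0.25, 0.25, 0.75]   # evenly spaced grid for comparison

print("lloyd-max:", [round(c, 3) for c in centroids], quantization_mse(centroids))
print("uniform:  ", uniform, quantization_mse(uniform))
```

Under these assumptions the outermost centroids should end up closer to ±1 than the uniform grid's ±0.75, with a lower mean squared error, which is the "centroids shift toward the edges" effect in miniature.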

fc417fc802 14h ago

Yeah that's odd. It seems like you'd want an n-1 dimensional grid on the surface of the unit sphere rather than an n dimensional grid within which the sphere resides.

Looking at the paper (https://arxiv.org/abs/2504.19874) they cite earlier work that does exactly that. They object that grid projection and binary search perform exceptionally poorly on the GPU.

I don't think they're using a regular grid as depicted on the linked page. Equation 4 from the paper is how they compute centroids for the MSE optimal quantizer.

Why specify MSE optimal, you ask? Yeah, so it turns out there are actually two quantization steps, a detail also omitted from the linked page. They apply QJL quantization to the residual of the grid-quantized data.

My description is almost certainly missing key details; I'm not great at math and this is sufficiently dense to be a slog.
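For what it's worth, here is a hedged sketch of the two-step idea as I read it, not the paper's algorithm. Stage 1 is a crude uniform grid standing in for the MSE-optimal quantizer. Stage 2 keeps only the sign bits of random Gaussian projections of the residual, which is the general idea usually described as QJL. All names and sizes (`d`, `m`, `step`) are assumptions, and `m` is set far larger than a practical bit budget so the estimate is visibly tight. The sqrt(π/2) factor follows from E[(s·q)·sign(s·r)] = sqrt(2/π)·(q·r)/‖r‖ for Gaussian s.

```python
import math
import random

rng = random.Random(0)
d, m = 16, 4000                      # dimension and number of sign bits

def unit(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

x = unit([rng.gauss(0, 1) for _ in range(d)])   # vector to compress
q = unit([rng.gauss(0, 1) for _ in range(d)])   # query, kept in full precision

# Stage 1 (stand-in): snap each coordinate to a coarse uniform grid.
step = 0.25
x_hat = [step * round(a / step) for a in x]
r = [a - b for a, b in zip(x, x_hat)]           # residual left by stage 1
r_norm = math.sqrt(dot(r, r))

# Stage 2: store only the signs of m Gaussian projections of the residual.
S = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
bits = [1 if dot(s, r) >= 0 else -1 for s in S]

# Unbiased estimate of <q, r> from the sign bits and the residual norm.
scale = math.sqrt(math.pi / 2) * r_norm / m
r_est = scale * sum(dot(s, q) * b for s, b in zip(S, bits))

exact = dot(q, x)
stage1_only = dot(q, x_hat)          # biased by whatever stage 1 threw away
two_stage = stage1_only + r_est      # stage 1 plus the residual correction
print(exact, stage1_only, two_stage)
```

The point of the second step in this sketch is that the inner-product estimate becomes unbiased: the error left after stage 1 is estimated rather than ignored.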

vincnetas 19h ago

I think the grid can be on the surface of the unit sphere.
