Consider the following square matrix:
        TSLA  AAPL  GOOG  MSFT
Alice |  100     5     0     1
Bob   |    0    30   100     5
Carol |    2     2     2     2
Dan   |    0     0     0  1000
An input vector of stock prices gives an output vector of net worths. However, that is about the only way you can use this matrix. You cannot transform the table arbitrarily and still have it make sense, such as applying a rotation matrix -- it is nonsensical to speak of a rotation from Tesla-coordinates to Google-coordinates. The input and output vectors lack tensor transformation symmetries, so they are not tensors. This is also why Principal Component Analysis and other data-science notions in the same vein are pseudoscience (unless you work with the logarithms of the quantities, but nobody seems to recognize the significance of unit dimensions and of multiplicative vs. additive quantities).
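The one sensible use of the table really is a plain matrix-vector product. A minimal numpy sketch (the share prices here are made-up illustrative numbers):

```python
import numpy as np

# Rows: Alice, Bob, Carol, Dan; columns: TSLA, AAPL, GOOG, MSFT.
holdings = np.array([
    [100,   5,   0,    1],
    [  0,  30, 100,    5],
    [  2,   2,   2,    2],
    [  0,   0,   0, 1000],
])

prices = np.array([200.0, 150.0, 100.0, 300.0])  # hypothetical prices

net_worths = holdings @ prices  # one net worth per person
print(net_worths)
```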
1. Technically, the table you shared is better thought of as a two-dimensional tensor, rather than a "graph-like matrix" -- which as you point out must be a linear map from a (vector) space to itself.
2. While not technically "Principal Component Analysis", one could do "Singular Value Decomposition" for an arbitrarily shaped 2-tensor. Further, there are other decomposition schemes that make sense for more generic tensors.
3. (Rotations / linear combinations in such spaces) Given a table of stock holdings, it can be sensible to talk about linear combinations / rotations etc. Eg: The "singular vectors" in this space could give you a decomposition in terms of companies held simultaneously by people (eg: SAAS, energy sector, semiconductors, entertainment, etc). Likewise, singular vectors on the other side would tell you the typical holding patterns among people (and clustering people by those, eg. retired pensioner invested for steady income stream, young professional investing for long-term capital growth, etc). As it turns out, this kind of approximate (low-rank) factorization is at the heart of recommender systems.
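Point 3 can be sketched directly with numpy's SVD on the tiny holdings table above (the entries are the invented numbers from the original comment, so the "factors" are only illustrative):

```python
import numpy as np

# Rows are people, columns are stocks; entries are shares held.
holdings = np.array([
    [100.,   5.,   0.,    1.],
    [  0.,  30., 100.,    5.],
    [  2.,   2.,   2.,    2.],
    [  0.,   0.,   0., 1000.],
])

U, s, Vt = np.linalg.svd(holdings, full_matrices=False)

# Rows of Vt: "co-held baskets" of stocks (sector-like factors).
# Columns of U: holding patterns among people.
# s: how much of the table each factor explains, largest first.
rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])  # best rank-1 approximation
print(np.round(s, 2))
```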
By the way, by changing the graph representation we can give meaning even to non-square matrices, as described in this article: https://www.math3ma.com/blog/matrices-probability-graphs
If you had a lot of people and a lot of stocks, a low-rank representation of the matrix (probably not PCA per se with that particular matrix, but something closely related) could convey a lot of information about, e.g., submarkets and how they're valued together. Or not, depending on how those prices covary over time.
For instance, you can normalize along the columns, and build a "recommender system" using matrix factorization.
With that, when a new person comes along with a portfolio, the system will output a probability for that person to acquire each of the assets they don't already hold.
That's the (very basic) idea of how Netflix recommends movies.
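A toy version of that pipeline: normalize the columns, take a low-rank SVD factorization, and score the assets a person doesn't hold by the reconstructed values. All numbers and the rank choice are illustrative, not a real recommender:

```python
import numpy as np

holdings = np.array([
    [100.,   5.,   0.,    1.],
    [  0.,  30., 100.,    5.],
    [  2.,   2.,   2.,    2.],
    [  0.,   0.,   0., 1000.],
])

# Normalize each column so no single expensive stock dominates.
norm = holdings / holdings.max(axis=0)

# Rank-2 factorization: the "taste" model.
U, s, Vt = np.linalg.svd(norm, full_matrices=False)
k = 2
scores = (U[:, :k] * s[:k]) @ Vt[:k, :]

# For a given person, score the assets they don't already hold.
person = 0                           # Alice
unheld = holdings[person] == 0
print("scores for unheld assets:", scores[person, unheld])
```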
Really, if your conclusions change depending on whether you measure in inches or centimeters, there’s something wrong with the analysis!
> When I try to get this point across about techniques like the PCA, I like to show that the measurement units strongly affect the inference.
In such a case the problem is not with PCA but with application. PCA is just a rotation of the original coordinate system that projects the data on new axes which are aligned with the directions of highest variability. It is not the job of PCA to parse out the origin of that variability (is it because of different units, or different effects).
> Really, if your conclusions change depending on whether you measure in inches or centimeters, there’s something wrong with the analysis!
To get a statistical distance one should: subtract the mean if the measurements differ in origin; divide by standard deviation if the measurements differ in scale; rotate (or equivalently compute Mahalanobis distance) if the measurements are dependent (co-vary). PCA itself is closely related to Mahalanobis distance: Euclidean distance on PCA-transformed (whitened) data is equivalent to Mahalanobis distance on the original data. So, saying that something is wrong with PCA because it doesn't take units of measurement into account is close to saying that something is wrong with dividing by standard deviation because it doesn't subtract the mean.
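That equivalence is easy to check numerically. A sketch on synthetic data, where the PCA transform includes the whitening step (each component divided by the square root of its eigenvalue):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated, differently-scaled synthetic data.
X = rng.normal(size=(500, 3)) @ np.array([[2., 1., 0.],
                                          [0., 1., 0.],
                                          [0., 0., 5.]])
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# PCA: eigendecomposition of the covariance, then whiten.
evals, evecs = np.linalg.eigh(cov)
Z = (Xc @ evecs) / np.sqrt(evals)   # whitened PCA coordinates

# Mahalanobis distance between the first two points...
d = Xc[0] - Xc[1]
mahalanobis = np.sqrt(d @ np.linalg.solve(cov, d))
# ...equals the Euclidean distance in whitened coordinates.
euclidean = np.linalg.norm(Z[0] - Z[1])
print(mahalanobis, euclidean)
```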
For example, the transform from (Alice, Bob, Carol, Dan) to (Male, Female) is linear -- it's another matrix that you can compose with the individual-ownership one you have here.
Or, call your individual-ownership matrix A, and say that P is the covariance of daily changes to prices of the four stocks listed. Then A P A' is the covariance of daily changes to the peoples' wealths. The framing as linear algebra hasn't been useless.
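The A P A' claim is easy to sanity-check by simulation (the holdings and the price covariance below are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[100.,   5.,   0.,    1.],
              [  0.,  30., 100.,    5.],
              [  2.,   2.,   2.,    2.],
              [  0.,   0.,   0., 1000.]])

# A made-up covariance of daily price changes (PSD by construction).
M = rng.normal(size=(4, 4))
P = M @ M.T

# Simulated daily price changes with covariance P...
dprices = rng.multivariate_normal(np.zeros(4), P, size=200_000)
dwealth = dprices @ A.T                     # daily wealth changes
empirical = np.cov(dwealth, rowvar=False)

# ...whose covariance matches A P A' up to sampling noise.
expected = A @ P @ A.T
rel_err = np.linalg.norm(empirical - expected) / np.linalg.norm(expected)
print(f"relative error: {rel_err:.4f}")
```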
I kinda get what you're saying though. Like, why would powers of this matrix be useful? It only makes sense if there's some implicit transform between prices and people, or vice versa, that happens to be an identity matrix.
You can make up a story. Say the people can borrow on margin some fraction of their wealth. Then say that they use that borrowing to buy stock, and that that borrowing affects prices. Composing all these transforms, you could get from price to price, and then ask what the dynamics are as the function is iterated.
But, ok, "I'm just going to do an SVD of the matrix and put it in a slide" isn't going to tell anybody much.
Maybe there's a use for a rank-one approximation to this system? Like, "this is pretty close to a situation where there's a single ETF with those stocks in these proportions, and where the people own the following numbers of shares in the ETF"? Maybe if you have millions of people and millions of stocks and wanted to simulate this "stock market" at 100Hz on a TI-83?
I dunno. You can make up stories.
"RedisGraph is the first queryable Property Graph database to use sparse matrices to represent the adjacency matrix in graphs and linear algebra to query the graph."
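The "linear algebra to query the graph" idea is concrete: a breadth-first search is just repeated sparse matrix-vector multiplication, with visited vertices masked out each step. A sketch with scipy on a made-up little digraph (GraphBLAS-style systems do this over semirings, which this plain-arithmetic version only approximates):

```python
import numpy as np
import scipy.sparse as sp

# Adjacency matrix of a small directed graph: 0->1, 1->2, 2->3, 0->4.
# Vertex 5 is isolated, so BFS from 0 should never reach it.
edges = [(0, 1), (1, 2), (2, 3), (0, 4)]
rows, cols = zip(*edges)
A = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(6, 6))

frontier = np.zeros(6)
frontier[0] = 1.0                    # start BFS at vertex 0
visited = frontier.astype(bool)
while frontier.any():
    # One BFS step: push the whole frontier across all edges at once,
    # then drop vertices we have already seen.
    reached = (frontier @ A) > 0
    frontier = (reached & ~visited).astype(float)
    visited |= reached
print(visited)                       # reachability from vertex 0
```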
Justify Linked Data; https://5stardata.info/
W3C RDF-star and SPARQL-star > 2.2 RDF-star Graph Examples: https://w3c.github.io/rdf-star/cg-spec/editors_draft.html#rd...
def to_matrices(g: rdflib.MultiDiGraph) -> Union[Matrix, Tensor]: ...
rdflib.MultiDiGraph:
https://networkx.org/documentation/stable/reference/classes/...
Multigraph: https://en.wikipedia.org/wiki/Multigraph :
> In mathematics, and more specifically in graph theory, a multigraph is a graph which is permitted to have multiple edges (also called parallel edges[1]), that is, edges that have the same end nodes. Thus two vertices may be connected by more than one edge ... [which requires multidimensional matrices, netcdf (pydata/xarray,), tensors, or a better implementation of a representation; and edge reification in RDF]
From "Why tensors? A beginner's perspective" https://news.ycombinator.com/item?id=30629931 :
> https://en.wikipedia.org/wiki/Tensor
... Tensor product of graphs: https://en.wikipedia.org/wiki/Tensor_product_of_graphs
Hilbert space: https://en.wikipedia.org/wiki/Hilbert_space :
> The inner product between two state vectors is a complex number known as a probability amplitude.
A little off topic, but this is just one more example of beautifully done content on Substack. I have seriously considered setting my freedom.to settings to only allow accessing HN, FB, Twitter, Mastodon, etc., 1 or 2 mornings a week, and that time would mostly be for ensuring that I was always subscribed to a few good Substack channels.
With the explosion of ‘tech stuff I should read’, I think I need a more extreme culling of what I spend my time on.
I wonder what the continuous form of a graph is... Some sort of a manifold perhaps?
Exactly!
The correspondence between manifolds and graphs is very beautiful. What many folks call today "graph signal processing" has traditionally been called "discrete differential geometry". Scalar fields are functions defined on vertices, vector fields are functions defined on edges, the incidence matrix is the gradient operator, its transpose is the divergence, the Laplacian is the divergence of the gradient, integrals and fluxes are scalar products by indicator functions, the boundary operator is minus the gradient, Green's formula is just matrix transposition, etc.
You can even go further in the analogy and define p-forms as functions defined on the p-cliques of the graph, and from that rebuild a whole discrete Hodge theory. The correspondence is almost perfect, except for the fact that you cannot write easily the product rule for derivatives (because you cannot multiply pointwise scalar fields with vector fields).
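The first few of those correspondences can be checked numerically on a path graph. A sketch (the orientation of the edges is an arbitrary choice; flipping it flips signs in the gradient but leaves the Laplacian unchanged):

```python
import numpy as np

# Path graph 0 - 1 - 2 - 3, edges oriented low -> high vertex.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4

# Oriented incidence matrix B: one row per edge; B @ f is the
# discrete gradient of a scalar field f defined on vertices.
B = np.zeros((len(edges), n))
for k, (i, j) in enumerate(edges):
    B[k, i], B[k, j] = -1.0, 1.0

f = np.array([0.0, 1.0, 4.0, 9.0])   # a scalar field on vertices
grad_f = B @ f                        # a "vector field" on edges

# Divergence is the transpose; the Laplacian is div(grad(.)),
# and B.T @ B reproduces the usual graph Laplacian D - A.
L = B.T @ B
print(grad_f)
print(L)
```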
A graphon?
Edit: This was already mentioned by meindnoch.
https://en.wikipedia.org/wiki/Spectral_graph_theory
https://en.wikipedia.org/wiki/Algebraic_graph_theory
Pretty much every field in math has a related field where you try to turn problems from the former into linear algebra problems.
The "spectral theorem" (https://en.wikipedia.org/wiki/Spectral_theorem) is an important theorem in linear algebra that gives conditions for when a matrix can be diagonalized, which is closely related to what its eigenvalues/eigenvectors look like.
The simplest version of the spectral theorem says that a symmetric matrix with real-number entries has real-number eigenvalues. The eigenvalues of a matrix are called the "spectrum of the matrix", hence "spectral theorem" and "spectral graph theory".
The adjacency matrix of any undirected graph is real symmetric, so its eigenvalues are all real numbers and it's natural to ask whether they say anything about the underlying graph.
Lucky for us, there are lots of surprising connections!
For example, say G is a finite undirected graph. The chromatic number of G, denoted χ(G), is the smallest number of colors needed to color its vertexes so that no two adjacent vertexes have the same color.
If λ₁ is the largest eigenvalue of G's adjacency matrix then there's a theorem (Wilf's theorem) that says
χ(G) ≤ 1 + ⌊λ₁⌋
That is, you can always color a graph with 1 + ⌊λ₁⌋ colors, where ⌊x⌋ is the floor of x. And there are some (finite, undirected) graphs that require exactly 1 + ⌊λ₁⌋ colors, so we're not doing any better unless we can say something more specific about the graph.
Wilf's Theorem:
https://www2.math.upenn.edu/~wilf/website/Eigenvalues%20of%2...
https://www2.math.upenn.edu/~wilf/website/Inequality%20for%2...
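A quick numerical check of the bound on the complete graph K₄, one of the graphs where it is tight (χ(K₄) = 4 and λ₁ = 3):

```python
import numpy as np

# Adjacency matrix of the complete graph K4.
A = np.ones((4, 4)) - np.eye(4)

# A is real symmetric, so its spectrum is real (spectral theorem).
eigenvalues = np.linalg.eigvalsh(A)
lam1 = eigenvalues.max()

chi = 4                          # K4 needs one color per vertex
bound = 1 + int(np.floor(lam1))  # Wilf: chi <= 1 + floor(lam1)
print(lam1, bound)
```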
see https://mathworld.wolfram.com/CharacteristicPolynomial.html
In HPC, there are multiple frameworks for distributed sparse linear algebra, as well as more graph- and GPU-focused frameworks that use semirings as the abstraction layer (CombBLAS [2], GraphBLAST [3]).
References:
[1] Kepner, Jeremy, and John Gilbert, eds. Graph algorithms in the language of linear algebra. Society for Industrial and Applied Mathematics, 2011.
[2] Buluç, Aydın, and John R. Gilbert. "The Combinatorial BLAS: Design, implementation, and applications." The International Journal of High Performance Computing Applications 25.4 (2011): 496-509.
[3] Yang, Carl, Aydın Buluç, and John D. Owens. "GraphBLAST: A high-performance linear algebra-based graph framework on the GPU." ACM Transactions on Mathematical Software (TOMS) 48.1 (2022): 1-51.
[4] D. Kőnig. "Graphen und Matrizen" (Graphs and Matrices). Matematikai Lapok 38 (1931): 116–119.
[5] D. Kőnig. Theorie der endlichen und unendlichen Graphen (Theory of Finite and Infinite Graphs). Leipzig: Akademie Verlag M.B.H., 1936.
https://ocw.mit.edu/courses/18-085-computational-science-and...
He says start with the highest order, which makes me think the neighborhood with order 3 would get the smaller node labels, and the neighborhoods with order 0 would get the highest.
It looks like the opposite was done.
The most important thing is that graph algorithms and concepts have a strong relation to numerical aspects of linear algebra and can be used to accelerate computation.
(You could of course argue that the fact that graphs can be represented as diagrams helps humans come up with such algorithms, but that's basically equivalent to saying that you can represent a matrix as a block of numbers and that helps humans look at it.)
Diagrams are how you encode categorical models of semantics, which naturally can be represented as graphs. Those graphs can in turn be encoded as matrices. So you have a way to encode semantic foundations as matrices — which you can then use graph algorithms to analyze.
Being able to move your semantic models (eg, diagrams) into a computational framework (eg, linear algebra) is neat.
Please show us the graph that "encodes" the fundamental theorem of algebra.
The fundamental theorem of algebra is a fact about the diagram which relates polynomials via division by monomials.
Every polynomial of degree n is n divisions of a monomial away from the empty product.
- - - -
This is probably easier to see the other direction:
Polynomials of degree n are isomorphic to n-products of monomials, and you can build a graph of the assembly where each arrow represents a multiplication by a particular monomial. (Then reverse the arrows, to get my original diagram.)