Unfortunately I'm one of those people who tends to reject the process until I understand why it works.
If it wasn't for Strang's thoughtful and sometimes even entertaining lectures via OCW, I probably would have failed the course. Instead, as the material became considerably more abstract and actually required understanding, I had my strongest exam scores. I didn't even pay attention in class. I finished with an A. Although my first exam was a 70/100, below the class average, the fact that I got an A overall suggests how poorly the rest of the class must have done on the latter material, where I felt my strongest thanks to the videos.
So anyway, thank you Gilbert Strang.
After reading your comment and ansible's reply [0] I wanted to pause and comment on this.
A study at the United States Air Force Academy found that cadets who took their first calculus class with a professor who focused on conceptual understanding developed a durable and flexible understanding of the math [1].
The kicker is that the cadets got worse scores in Calculus I and gave professors who taught in this way worse ratings.
Ansible's anecdotal reply is what a lot of students experience. A feeling of initial success with the material, but they later find that their knowledge of it was fleeting and inflexible. What the Air Force Academy study found was that professors who taught in the manner ansible described, that resulted in fleeting and inflexible knowledge, were rated higher by their students. Those students got better initial scores in Calculus I, but went on to do worse in later calculus courses and related courses.
I encourage you to read the study. It is as good a study design and execution as you can get in the social sciences.
David Epstein also discusses the study in Chapter 4 of his book, Range [2].
[0] https://news.ycombinator.com/item?id=23154241 [1] http://faculty.econ.ucdavis.edu/faculty/scarrell/profqual2.p... [2] https://www.goodreads.com/book/show/41795733-range
The very best students loved it, but most of the people didn't like it at all.
With mathematics, like with gym, you gain when you put in effort. Most people don't enjoy either.
I had a similar, though sort of opposite experience.
In high school, I breezed through the material, and started teaching myself calculus during the summer to prepare for university. Other than being a lazy student, I had no problems taking the 2nd-semester advanced calc 2 and 3 courses my freshman year. I totally got what was being taught. There weren't a ton of practical examples, but I could easily see (for example) what the purpose of integration was, and how and why you'd do it in two or more dimensions. I could work the equations, no problem. Everything was great.
Along comes sophomore year, and still thinking I am hot stuff, I take advanced linear algebra and differential equations. More of the same, I thought.
Well... we seemed to spend the entire semester just solving different kinds of equations. No explanations given as to what they are for, where they are used, or what the point of any of it was. I struggled, for the very first time.
I either got a D or F for the mid-term exam, which was shocking to me.
We had one chapter where we were doing something practical. This is where you have a water tank with a hole in the bottom. Because the pressure lessens as the tank empties, the flow rate is not constant. However, you can solve this via differential equations, and I really grokked it. I finally saw the point of some of what we had been doing. But it was just that one chapter; we skipped any other practical aspects of what we were studying.
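The tank problem described here is presumably Torricelli's law, where the outflow speed scales with the square root of the water height, giving dh/dt = -k·sqrt(h). A quick sketch with made-up constants, checking the separable closed-form solution against a crude numerical integration:

```python
import numpy as np

# Torricelli's law for a draining tank: dh/dt = -k*sqrt(h).
# Separating variables gives h(t) = (sqrt(h0) - k*t/2)**2,
# valid until the tank runs empty.
k, h0, dt, T = 0.1, 4.0, 1e-4, 10.0    # made-up constants
t, h = 0.0, h0
while t < T:                           # crude forward-Euler check
    h = max(h - dt * k * np.sqrt(h), 0.0)
    t += dt

exact = (np.sqrt(h0) - k * T / 2) ** 2
print(round(h, 3), exact)              # both come out around 2.25
```

The non-constant flow rate is exactly the sqrt(h) factor: as h drops, each step drains less water.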
I did end up pulling out a 'C' with that class, to my relief. Sure, most of the blame for my lousy performance must rest with me, because of my poor study habits. And a little blame can go to the TA, who wasn't a good communicator, so that hour every week was kind of useless. But I also blame the material and how it was presented.
Some places use a rigorous "proof-theoretic" approach in math curricula. It's much harder and takes more time, but it's better than merely grinding on hundreds of easy calc-101/diff-eq problems, because students gain an understanding that doesn't erode as easily once they forget "the tricks".
More CS, engineering and science students, IMHO, should dabble in math department courses beyond the usual "required" sequence for their majors. It can be eye-opening and provide long-lasting benefit to take a hardcore real-analysis course, abstract algebra or a number of other courses in math.
I came in for the first exam, sat there for maybe 15 minutes reading the questions, and realized I had no idea how to solve any of them.
Luckily it was before the drop date! That was a turning point where I decided to only take classes that seemed fun. For me that was discrete math, number theory, abstract algebra, etc.
My only regret is that I took the class as a six week short course. I think my recall would be better if I had taken the full semester. We covered all the material, but missed out on the longer spaced repetition. Linear Algebra was by far my favorite pure math course, I hope to revisit it soon. Maybe Strang's lectures are the way to do that.
I particularly like his videos because he breaks them down into small bites that are easy to work into your day and he's a great teacher.
Second exam was 85/100, the highest between C.S. and Automation Engineering (both lectured by that first professor). While I do agree that a good teacher can pave the way for a good student, I think most of the work you have to do yourself, as if your life depended on it (mine did).
At some point it's like "Wait, is linear algebra really just about heaps of multiplication and addition? Like every dimension gets multiplied by values for every dimension, and values 0 and 1 are way more interesting than I previously appreciated. That funny identity matrix with the diagonal 1s in a sea of 0s, that's just an orthonormal basis where each corresponding dimension's axis is getting 100% of the multiplication like a noop. This is ridiculously simple yet unlocks an entire new world of understanding, why the hell couldn't my textbooks explain it in these terms on page 1? FML"
I'm still a noob when it comes to linear algebra and 3D stuff, but it feels like all the textbooks in the world couldn't have taught me what some hands-on 3D graphics programming impressed upon me rather quickly. Maybe my understanding is all wrong, feel free to correct me, as my understanding on this subject is entirely self-taught.
I wouldn't say it is all wrong. Just that the stuff you are talking about is a very tiny fraction of LA. I took a graduate class in LA, based on Strang's book. I have the book right here in front of me. So the stuff you allude to, i.e. rotation matrix, reflection matrix & projection matrix, is on p130 of Chapter 2. We got to that in the 1st month of the semester, & it got about 1 hour of classtime total. That's it. An LA class is like 4 months, or 50 hours. Is the point of LA to derive those matrices so one can do 3D computer graphics with scaling, rotation & projection? No, that stuff is too basic. We got 1 homework problem on that, that's it.
The stuff that most of the class struggled with (& still struggle with, because Strang goes over it rather quickly in his book) is function spaces (chapter 3, p182), Gram-Schmidt for functions (p184), FFTs (p195), Fibonacci & Lucas numbers (p255), the whole stability-of-differential-equations chapter (he gives these hard and fast rules, like a differential equation is stable if the trace is negative & the determinant is positive, but it's not too clear why), quadratic forms & minimum principles - that whole 6th chapter glosses over too much material imo.
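For what it's worth, the "trace negative & determinant positive" rule is easy to check numerically: for a 2x2 system x' = Ax, the eigenvalues solve λ² - tr(A)·λ + det(A) = 0, so both have negative real part exactly when the trace is negative and the determinant is positive. A random-matrix sanity check (my own sketch, not from the book):

```python
import numpy as np

# For x' = Ax with A 2x2, the eigenvalues solve λ² - tr(A)·λ + det(A) = 0,
# so both roots have negative real part (stable system) exactly when
# tr(A) < 0 and det(A) > 0 -- Strang's "hard and fast rule".
def rule_says_stable(A):
    return np.trace(A) < 0 and np.linalg.det(A) > 0

def eigs_say_stable(A):
    return all(ev.real < 0 for ev in np.linalg.eigvals(A))

rng = np.random.default_rng(0)
matches = sum(
    rule_says_stable(A) == eigs_say_stable(A)
    for A in rng.normal(size=(1000, 2, 2))
)
print(matches)  # 1000: the rule agrees with the eigenvalue test every time
```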
Overall, Strang's book is a solid A+ on how to get stuff done, but maybe a B- on why stuff works the way it works. Like, why should I find the Rayleigh quotient if I want to minimize one quadratic divided by another? Strang just says, do it & you'll get the minimum. How to find a quadratic over [-1,1] that is the least distance away from a cubic in that same space? Again, Strang gives a method but the why part of it is quite mysterious.
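On the Rayleigh quotient, the "why" is that for a symmetric A the quotient xᵀAx/xᵀx attains its minimum exactly at the eigenvector of the smallest eigenvalue, and the minimum value is that eigenvalue. A quick numerical spot-check with a random symmetric matrix (my own sketch):

```python
import numpy as np

# The Rayleigh quotient R(x) = x.T A x / x.T x of a symmetric A is
# minimized at the eigenvector for the smallest eigenvalue, and the
# minimum value IS that eigenvalue.
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                      # random symmetric matrix

evals, evecs = np.linalg.eigh(A)       # eigh sorts eigenvalues ascending
x_min = evecs[:, 0]                    # eigenvector of smallest eigenvalue

R = lambda x: (x @ A @ x) / (x @ x)
print(abs(R(x_min) - evals[0]) < 1e-9)   # True: the quotient hits λ_min

samples = rng.normal(size=(1000, 4))     # no random direction beats it
best = min(R(x) for x in samples)
print(best >= evals[0] - 1e-9)           # True
```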
So does LA get substantially more involved than just lots of multiplications and additions or is it always at the end of the day still just bags of floats getting multiplied and summed? Is it just a fantastic rabbit hole describing what values you put where in those bags of numbers?
One advantage of linear algebra is that it is, well, linear. Linear is nice. It means you can decompose things into their independent elements, and put them all together again, without loss. The monad interface, as simple as it is, is not linear; specific implementations of it can have levels of complexity more like a Turing machine.
You don't even have to go 3D, just starting with the points of a rectangle in 2D and asking, "how do you put the edge points of this rectangle 10px to the left, rotate them 45° and stretch them 200% vertically?" and you've applied a matrix. Even if you're not using the fancy brackets, you're using a matrix, and understanding it.
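One wrinkle: the "10px to the left" part is a translation, which isn't linear on its own, so graphics code usually rides it along with the rotation and scaling in homogeneous coordinates (3x3 matrices for 2D points). A sketch of that rectangle-corner example with made-up numbers:

```python
import numpy as np

# Homogeneous 3x3 matrices let the translation ride along with the
# rotation and scaling (a plain 2x2 matrix can't translate).
t = np.array([[1.0, 0.0, -10.0],   # move 10px to the left
              [0.0, 1.0,   0.0],
              [0.0, 0.0,   1.0]])
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
r = np.array([[c, -s, 0.0],        # rotate 45° counterclockwise
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])
k = np.array([[1.0, 0.0, 0.0],     # stretch 200% vertically
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])

M = k @ r @ t                      # one matrix applies all three steps
corner = np.array([10.0, 0.0, 1.0])   # a rectangle corner at (10, 0)
print(M @ corner)                  # lands on the origin: [0. 0. 1.]
```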
That's just one of dozens of things LA is "about".
> why the hell couldn't my textbooks explain it in these terms on page 1
Because you wouldn't have understood terms like
> orthonormal
and because it would have been unhelpful to everyone else who went into LA for the exact same reason you did.
Something being obvious in retrospect doesn't mean it was obvious beforehand. You had to learn the material first.
If you've never written a standalone software-rendered ray tracer, I found that to be a very useful exercise early on. There are plenty of tutorials for those on the interwebs.
I'm really thankful to MIT OCW for putting his lectures out for free -- in fact, I think I'll go donate to them now.
I hope you’re donating and actively contributing to many non-profit projects and that your comment comes from being tired of the world’s injustices rather than from callous impertinence, although I suspect it does not.
If you look at the order of topics in his book "Introduction to Linear Algebra", you will find the topic "Linear Transformations" way back in chapter 8! Even after the chapters on eigenvalue decomposition and singular value decomposition. But understanding that a matrix is just the representation of a linear transformation in a particular basis is probably the most important and first thing you should learn about matrices ...
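That "matrix = linear transformation in a particular basis" point is easy to demo in code: if the columns of B are a new basis, the same map that A represents in the standard basis is represented by B⁻¹AB in the new one. A small check with made-up matrices:

```python
import numpy as np

# The same linear map looks like a different matrix in a different basis:
# if A represents the map in the standard basis and the columns of B are
# the new basis vectors, the map's matrix in the new basis is B^-1 A B.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])        # the map, in the standard basis
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # columns = a made-up new basis

A_new = np.linalg.inv(B) @ A @ B  # the same map in the new basis

# Sanity check: applying the map gives the same answer whether we
# work in old coordinates or new ones.
v = np.array([1.0, 2.0])          # a vector in standard coordinates
v_new = np.linalg.inv(B) @ v      # its coordinates in the new basis
print(np.allclose(np.linalg.inv(B) @ (A @ v), A_new @ v_new))  # True
```

Here B's columns happen to be eigenvectors of A, so A_new comes out diagonal, which is the whole point of eigendecomposition: pick the basis in which the map's matrix is as simple as possible.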
You are onto something though. Strang is coming from a direction of numerical computations and algorithms for solving real-world problems. Pure mathematics departments for at least the past maybe 80 years often look down on numerical analysis, statistics, engineering, and natural science, and adopt a position that education of students should be optimized in the direction of helping them prove the maximally general results using the most abstract and technical machinery, with an unfortunate emphasis on symbol twiddling vs. examining concrete examples. By contrast, in the 19th century there was much more of a unified vision and more respect for computations and real-world problems. Gauss himself was employed throughout his career as an astronomer / geodesist, rather than as a mathematician, and arguably his most important work was inventing the method of least squares, which he used for interpreting astronomical observations.
With the rise of electronic computers, it is possible that the dominant 2050 vision of linear algebra and the dominant 1900 vision of linear algebra will be closer to each other than either one is to a 1950 vision from a graduate course in a pure math department.
Take Hilbert spaces for example. They are based on linear algebra. They are quite general and you might argue that there's a lot of symbol twiddling there. However, Hilbert spaces are/were essential in the study of Quantum Mechanics, which we can argue is a very important topic.
And if you only stick with matrices and numerics, you're bound to get stuck in the numbers and details and miss the big picture. A lot of results are much cleaner to obtain once you divorce yourself from the concrete world of matrix representation.
Of course, we should probably have the best of both worlds. I'm not saying applications are unimportant. Take something like signal processing, which relies heavily on both numerics and general theory.
So I'd like to add something to your point. Math departments optimize the education of math students towards the more general, and perhaps students not interested in pursuing pure math should have course-work that reflects that.
I had this view when I took linear algebra as an undergraduate, but I have gradually changed on the subject over time. I took a standard "linear algebra for scientists and engineers" course but I found it too abstract at the time. The instructor rarely concentrated on examples and applications despite the more applied focus in the course title. Later I came to appreciate the abstraction, since it helped me understand more advanced mathematical topics unrelated to the "number-crunching" I originally associated the topic with. I now think the instructor had a more "unified" approach, but I didn't realize it at the time.
He mentions this sentiment in a lot of interviews and things too.
Then please do. I took several online linear algebra courses from sources I trust and they were pretty bad. Or let's put it another way: I'm a pretty clever guy and I was still left confused. Strang is excellent in the classroom, and I almost even like videos for learning now thanks to him (x1.5 speed is your friend). His videos should not be your only learning source, but judging his course only by the book might result in a lot of learners skipping what I found to be the best course by far. If you want to learn linear algebra, give Strang a try first and you might save a lot of time.
While this view certainly helps intuition at initial stages of learning, it is not "just" that, and computational methods involving matrices are of much more practical importance (similar to being able to add and multiply numbers which we are taught early in life) which is probably why the stress is on them first and foremost. Someone said, "learn to calculate, understanding will come later."
Even though I also use Linear Algebra mostly computationally today, the origin of it is in the geometry and I think this connection should come first. Also, "number crunching" is a boring way to learn things.
Though, "matrix way" can be good for engineers.
I found going through Linear Algebra Done Right to provide a good counterbalance to Strang’s book+lectures.
The rest was effectively preaching to the choir: those who already knew linear algebra nodded their heads, and idiots like me were still flummoxed.
Books have pictures that do a pretty good job.
Animations are pretty and interesting, but that isn't the same as teaching all the math.
Link: https://link.springer.com/book/10.1007/978-3-319-11080-6
> You are probably about to begin your second exposure to linear algebra. Unlike your first brush with the subject, which probably emphasized Euclidean spaces and matrices, this encounter will focus on abstract vector spaces and linear maps.
But, I do foresee some difficulties. One thing that I find really difficult, for example, is that I take undergrads who have had linear algebra and ask "what is the determinant?" and seldom get back the "best" conceptual answer, "the determinant is the product of the eigenvalues." Like, this is math, so the best answer should not be the only one, but it should ideally be the most popular. I would consider it a failure if the most popular explanation of the fundamental theorem of calculus was not some variation of "integrals undo derivatives and vice versa". I don't see this approach solving that. Furthermore there is a lot of focus from day one on this CR decomposition, which serves to say that a linear transform from R^m to R^n might map to a subspace of R^n with smaller dimension r < min(m,n). While in some sense this is true, it is itself quite "unphysical": if a matrix contains noisy entries then it will generally only be degenerate in this way with probability zero. (You need perfect noise cancelation to get degeneracies, which amounts to a sort of neglected underlying conserved quantity which is pushing back on you and demanding to be conserved.) In that sense the CR decomposition is kind of pointless and is just working around some "perfect little counterexamples". So it seems weird to see someone say "hold this up as the most important thing!!"
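"The determinant is the product of the eigenvalues" is at least cheap to verify numerically; the eigenvalues of a real matrix may be complex, but they pair up so their product comes out real. A quick check with a random matrix:

```python
import numpy as np

# For any square matrix, det(A) equals the product of its eigenvalues
# (counted with multiplicity). Complex eigenvalues of a real matrix come
# in conjugate pairs, so the product is real up to rounding.
rng = np.random.default_rng(42)
A = rng.normal(size=(5, 5))

det = np.linalg.det(A)
prod = np.prod(np.linalg.eigvals(A))
print(abs(det - prod.real) < 1e-9, abs(prod.imag) < 1e-9)  # True True
```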
I found that the "best conceptual" answer depends a lot on taste, and what concepts you are familiar with.
In this case:
- Calculating exact eigenvalues of matrices larger than 4x4 is impractical, since it requires you to solve a polynomial of degree >4.
- Eigenvalues in general exist only in an algebraically closed field (the complex numbers), while the determinant itself lives in the base field (rationals, reals).
How about:
- [Geometric Determinant] The determinant is the signed volume of the parallelepiped spanned by the column vectors of the matrix.
- [Coordinate Free Determinant] The determinant is the map induced between the highest exterior powers of the source and target vector spaces (https://en.wikipedia.org/wiki/Exterior_algebra)
- I think there is also a representation theoretic version, that characterizes the determinant as invariant under the Symmetric group acting by permutation on the columns/rows of the matrix.
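The geometric version is the easiest to check in code: for a 3x3 matrix, the determinant equals the scalar triple product u · (v × w) of the columns. A sketch with made-up vectors:

```python
import numpy as np

# [Geometric determinant] For a 3x3 matrix, det(A) is the signed volume
# of the parallelepiped spanned by its columns -- the same number as the
# scalar triple product u · (v × w).
u = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 2.0, 0.0])
w = np.array([0.0, 1.0, 3.0])
A = np.column_stack([u, v, w])

volume = np.dot(u, np.cross(v, w))
print(np.linalg.det(A), volume)  # both are 6.0 (up to rounding)
```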
The permanent [1] is the matrix function which is fully symmetric, so permuting any rows or columns leaves it invariant. It emerges from the identity representation.
Finally, partially symmetric matrix functions are known as immanants [2], defined using the other irreps of the symmetric group.
[1] https://en.wikipedia.org/wiki/Permanent_%28mathematics%29
This view also motivates the concept of vector bundles and vector spaces at a point.
My country curriculum introduces linear algebra through group theory and vector spaces. Matrices come later.
I was also taught linear algebra this way, by an applied mathematician with a background in chemical engineering:
- start by solving Ax=b with row reduction
- develop theorems about linear independence and spanning sets of vectors based on these exercises
- introduce the determinant from the perspective of linear systems (rather than eg geometry or group theory)
- eigenvectors and eigenvalues
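The first step in that sequence, solving Ax = b by row reduction, amounts to row-reducing the augmented matrix [A | b]. A small made-up example (not from the course) using sympy:

```python
from sympy import Matrix

# Solve Ax = b by Gauss-Jordan elimination on the augmented matrix.
# Made-up system: x + 2y = 5, 3x + 4y = 11.
A = Matrix([[1, 2], [3, 4]])
b = Matrix([5, 11])

aug = A.row_join(b)        # augmented matrix [A | b]
rref, pivots = aug.rref()  # row-reduce to reduced row echelon form
print(rref)                # Matrix([[1, 0, 1], [0, 1, 2]]) -> x = 1, y = 2
```

The theorems about linear independence and spanning sets then fall out of asking when this procedure can fail or leave free variables.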
Later I switched from physics to math and TAed a more “algebraic” approach involving groups/rings/fields. But the matrix-first approach was more helpful for both my physics coursework and later courses in numerical linear algebra.
I took like 3-4 courses in the US involving the engineering approach, starting in high school and continuing through the college as a CS major. That was all that was required.
But I also like algebra, so I happened to take a 400-level course that only math majors take my senior of college. And then I got the group theory / vector space view on it. I don't think 95% of CS majors got that.
I don't think one is better than the other, but they should have tried to balance it out more. It helps to understand both viewpoints. (If you haven't seen the latter, then picture a 300-page text on linear algebra that doesn't mention matrices at all. It's all linear transformations and spaces.)
What country were you taught in? Wild guess: France?
A book I enjoyed is Axler's Linear Algebra Done Right[0], which, if I remember correctly, doesn't contain a single matrix.
[0]https://zhangyk8.github.io/teaching/file_spring2018/linear_a...
It does have plenty of matrices. The main thing it really does is avoid determinants until the very end. The determinant is certainly something I remember learning as a kind of rote operation, without really understanding any intuition behind why you'd multiply and add these numbers in this particular way. I still feel lacking in "feel" here, which is why I suppose I'm going through Axler now.
For example, I remember looking at the linear algebra book my department had used previously. Early on, it introduced the concept of the transpose of a matrix:
https://en.wikipedia.org/wiki/Transpose
Superficially, it looks like something good to introduce. It is fodder for easy homework exercises, and there is a satisfyingly long list of formal properties satisfied.
But why? What does the transpose mean? For what sort of problem would you want to compute it?
There are good answers to these questions (see the "transpose of a linear map" section of the Wikipedia article I linked), but they are not easy for a beginner to the subject to appreciate.
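One of those answers boils down to a one-line identity: the transpose is the unique matrix that moves A across an inner product, ⟨Ax, y⟩ = ⟨x, Aᵀy⟩ for all x and y. Easy to spot-check numerically:

```python
import numpy as np

# The defining property of the transpose as the adjoint of a linear map:
# <Ax, y> = <x, A^T y> for every x and y.
rng = np.random.default_rng(7)
A = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y = rng.normal(size=3)

print(abs((A @ x) @ y - x @ (A.T @ y)) < 1e-9)  # True
```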
> You are probably about to begin your second exposure to linear algebra. Unlike your first brush with the subject, which probably emphasized Euclidean spaces and matrices, this encounter will focus on abstract vector spaces and linear maps.
It's not universal.
The US is a very big place. I doubt there is an American approach to linear algebra. We really don't have a single approach to anything. Different schools and majors probably approach the topic differently. My college had a linear algebra course specifically crafted for CS majors and engineers. I took that and it did focus on matrices. It was also the only math class that required programming. I believe math majors had their own linear algebra course.
> My country curriculum introduces linear algebra through group theory and vector spaces. Matrices come later.
Different strokes for different folks. If it worked out for you that's all that matters.
Some of the concepts made sense, especially solving for linear systems of equations.
Recently, I decided to brush up on my math skills via Youtube videos, and came across this series: https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw
It explains Linear Algebra concepts using 2D and 3D vector manipulation, and the animations help me visualize the underlying maths.
In my time I had picked LA from Ben Noble, Halmos and Axler and the computation side of things from Golub & van Loan.
Ben Noble's book was my entry to LA. I was an undergraduate and involved in a research activity that demanded a lot of knowledge of the eigenvalue problem. The concrete approach in that book helped a lot.
It was only later on that I took a class based on G&vL (implementing a bunch of basic LA factorizations in Matlab), and in my spare time read Halmos's book. I understand the coordinate-free algebraic approach, but I work on applications and that viewpoint has not stuck with me. The stuff on numerical accuracy in GvL really did stick, OTOH.
From the comments here, and Strang's book's table of contents, I gather that his book (which has a lot of fans) has a concrete geometric approach.
Having said that, he explains many things really well and helps a lot to build intuition. He is always careful to flag things that are computationally inefficient and to suggest alternatives.
Exercises are too hard for me personally. I'd prefer a more laborious set of exercises helping to cement the material, (as in calculus or usual algebra) and then have one or two problem solving puzzles at the end.
I hope it's not just that; that would be very limiting considering what linear algebra is about and capable of.
I started doing LA on Khan academy, and checked out Linear Algebra Done Right. LADR was a little too much into the deep end for me. KA seemed to be good. One nice thing about KA is that when I didn't quite remember something (i.e. how exactly to multiply a matrix) I could just go to an earlier pre-LA lesson, pick it up, and then go back to LA where I left off. I'm a few lessons in.
What do you all recommend for someone like me?
I feel like I don't really understand his explanation, because it's kind of vague. But I think that might be because you've seen the equations dozens of times, and I haven't seen them at all, so you were prepared to understand the video.
https://www.tandfonline.com/doi/abs/10.1080/00029890.2018.14...
I also really like the applied linear algebra book by Boyd Vandenberghe: https://web.stanford.edu/~boyd/vmls/ Free PDF is available on their website. There is Julia and Python code companions for the book and lecture slides from both Profs their websites. Also check out their other books, many of which have free PDF's available.
I can also recommend Data-Driven Science and Engineering by Brunton and Kutz. http://databookuw.com/ There used to be a free preprint PDF of the book but I can't find it now. Book is totally worth picking up... MATLAB and Python code available. Steve Brunton's lectures on YouTube are pretty damn good and complement the book well: https://www.youtube.com/channel/UCm5mt-A4w61lknZ9lCsZtBw/fea...
Another really cool book is Algorithms for Optimization by Mykel Kochenderfer and Tim Wheeler: https://mitpress.mit.edu/books/algorithms-optimization. Julia code used in book.
I waited after the lecture to personally thank him and have him autograph the textbook; very glad I did in retrospect.
One of the interesting new ways of thinking in these lectures is the A = CR decomposition for any matrix A, where C is a matrix that contains a basis for the column space of A, while R contains the non-zero rows in RREF(A) — in other words a basis for the row space, see https://ocw.mit.edu/resources/res-18-010-a-2020-vision-of-li...
Example you can play with: https://live.sympy.org/?evaluate=C%20%3D%20Matrix(%5B%5B1%2C...
Thinking of A as CR might be a little intense as first-contact with linear algebra, but I think it contains the "essence" of what is going on, and could potentially set the stage for when these concepts are explained (normally much later in a linear algebra course). Also, I think the "A=CR picture" is a nice justification for where RREF(A) comes about... otherwise students always complain that the first few chapters on Gauss-Jordan elimination is "mind-numbing arithmetic" (which is kind of true...) but maybe if we present the algorithm as "finding the CR-decomposition which will help you understand dozens of other concepts in the remainder of the course" it would motivate more people to learn about RREFs and the G-J algo.
    def crd(A):
        """
        Computes the CR decomposition of a sympy Matrix A.
        """
        rrefA, licols = A.rref()   # RREF(A) and the pivot-column indices
        C = A[:, list(licols)]     # linearly indep. cols of A
        r = len(licols)            # = rank(A)
        R = rrefA[0:r, :]          # non-zero rows in RREF(A)
        return C, R
Test to check it works: https://live.sympy.org/?evaluate=A%20%3D%20Matrix(%5B%0A%20%...
On another note, he is such a nice guy. 10/10.
If you have a vector v=(1,0) that points to the right, you can scale this vector infinitely in that direction by multiplying it by a positive scalar.
5v = (5,0)
62.1v = (62.1,0)
Similarly, you can scale that vector infinitely in the opposite direction (i.e. left) by multiplying it by a negative scalar:
-987v = (-987,0)
If we call this scalar c, the expression cv allows us to represent any point along the X axis simply by varying c, meaning that cv defines a line along that axis.
Similarly, we can do the same for a vector w=(0,1) along the Y axis, scaling it by d.
Now we have a method for moving to any point on the XY plane simply by varying c and d in the linear combination: cv + dw, meaning that we've defined a plane using two vectors.
Two caveats:
- this won't work if v and w are parallel; for example, if v = -w (and neither are zero) then we can only move along a line instead of a plane
- it also won't work if either of the vectors are zero, because no matter what you multiply by, a zero vector can only represent a single point
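Finding the c and d that reach a given target point p is itself a little linear system: put v and w in as the columns of a matrix and solve. A small sketch with the vectors above and a made-up target:

```python
import numpy as np

# With v and w not parallel (and neither zero), every target point p
# is reached by exactly one linear combination p = c*v + d*w: solve
# the 2x2 system whose columns are v and w.
v = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])
p = np.array([3.5, -2.0])             # an arbitrary target point

c, d = np.linalg.solve(np.column_stack([v, w]), p)
print(c, d)                            # 3.5 -2.0
print(np.allclose(c * v + d * w, p))   # True
```

If v and w were parallel, the matrix would be singular and `solve` would fail, which is exactly the first caveat above.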
If you do the same for another 1x3 vector, and it is not parallel to the first, you get all the points on a different line.
These two lines define a plane (and the cross product of the two vectors defines its normal vector)
Also, if you have 3 dimensional vectors you were always in 3D.
That certainly got our attention. I’ve always found linear algebra to be kind of ... almost soothing.
Well, every time, we can make the so-called "guess" that the solution looks like e^(rt). Why? We know that because professors will only give well-behaved systems on final exams, because it's hard to grade the other kind.
So we know characteristic polynomials look like so (because of course they do, you can just memorize this) ... so now we lift the coefficients out into a nifty thing called a _matrix_ and now follow these easy four steps to get the roots, plug back in, and incidentally these are "eigenvalues", we'll talk about this later ...
Bam. Done. A-, easy. No sweat."
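Incidentally, the payoff of the "lift the coefficients into a matrix" step is checkable: rewriting, say, y'' - 3y' + 2y = 0 (my example, not from the course) as a first-order system x' = Ax, the eigenvalues of the companion matrix A are exactly the roots r of the characteristic polynomial r² - 3r + 2, i.e. the r's in the e^(rt) guess.

```python
import numpy as np

# Companion matrix for y'' - 3y' + 2y = 0 with state x = (y, y'):
#   y'  = y'
#   y'' = -2y + 3y'
A = np.array([[0.0, 1.0],
              [-2.0, 3.0]])

# The eigenvalues are the roots of r^2 - 3r + 2 = (r - 1)(r - 2).
print(np.sort(np.linalg.eigvals(A).real))  # [1. 2.]
```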
That book discusses the actual algorithms used for computation. It is a bit more advanced, but amazingly clear.
I am aware of his course on OCW, but wondering is there something more interactive and/or newer than those lectures that has similar quality.
[1] https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQ...
Just to have a taste of use cases: compression, filters(image filters for de-noising, HP & LP filters for audio), encoding/decoding, computer vision techniques, cryptography, neural nets, computer graphics (this is where most people learn how to use it in real computer programs)
- root-finding algorithms with more than one variable
- graph problems like Google's PageRank
- statistical analysis
- 3d rendering (projecting a 3d scene onto a 2d image)
- solving systems of equation (also see linear programming)
Linear algebra is very basic and fundamental to physics and math.
From the course description:
> These six brief videos, recorded in 2020, contain ideas and suggestions from Professor Strang about the recommended order of topics in teaching and learning linear algebra.