Part of my learning journey this year:
SVD - Singular Value Decomposition
What is it? Incredibly, every matrix can be factored this way, into 3 matrices (not often in math you can say the words every, all, no exceptions). The 3 matrices are the left and right singular vectors, with a diagonal matrix of singular values in the middle. The singular values themselves are always unique, and the singular vectors are unique (up to sign) when the singular values are distinct. The SVD of a matrix is X = USV* (U - left singular vectors, S - diagonal of singular values, V - right singular vectors).
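Here is a minimal sketch in NumPy, using a small made-up matrix, showing the factorization and confirming that the three pieces rebuild X:

```python
import numpy as np

# A small example matrix (stand-in for pixels, ratings, embeddings...).
X = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 0.0],
              [2.0, 0.0, 5.0],
              [0.0, 2.0, 1.0]])

# Full SVD: X = U @ diag(S) @ Vt  (Vt plays the role of V*).
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# The singular values come back already sorted largest-first.
print(S)

# Multiply the three factors back together to recover X.
X_rebuilt = U @ np.diag(S) @ Vt
print(np.allclose(X, X_rebuilt))  # True
```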
Why do this? It helps to have a context for the matrix X - what is it? It could be a 10-megapixel photo, a Hulu/Netflix matrix of users by items filled with ratings or viewing data, or a matrix of word embeddings. All three are complex data representations of the world, snapshots in high resolution perhaps... millions of pixels from an iPhone photo, millions of movie ratings, or documents converted to data for language models.
What is amazing about this? The singular values in the diagonal capture signal, using "signal to noise" loosely. Since they come sorted largest-first, if you keep only the first few - say 100 out of thousands in that diagonal - you keep the most signal and get the best approximate version of the matrix at that rank. (No kidding - the Eckart-Young theorem says the truncated SVD is the best approximation of that rank.) So for image compression, X = USV* is approximated by U'S'V*', and if S' holds the first 100 singular values, this is the best rank-100 approximation of the matrix.
So - keeping 100 singular values of a 2000x1500 iPhone image means storing roughly 100x(2000+1500) numbers instead of 3 million - about a 90% reduction - and it may look almost as crisp as the original photo. And it will be the best rank-100 version of that photo. Even more amazing, the idea extends to sparse data: a ratings or watch-history matrix is very sparse, since each person only watches a few movies out of the library. With the missing entries handled (filled in or modeled), a truncated SVD at a chosen rank approximates the matrix well and tames the computational nightmare of the full-size matrix.
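A quick sketch of the compression math, using a random matrix as a stand-in for a grayscale photo (the sizes here are illustrative, not from a real image):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a small grayscale photo: 200x150 pixel intensities.
X = rng.random((200, 150))

U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 20  # keep only the k largest singular values
X_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Storage cost: the full matrix vs. the three truncated factors.
full = X.size
compressed = U[:, :k].size + k + Vt[:k, :].size
print(full, compressed)  # 30000 vs 20*(200+150) + 20 = 7020

# Eckart-Young: X_k is the best rank-k approximation (Frobenius norm).
err = np.linalg.norm(X - X_k)
```

Dialing k up shrinks the error; dialing it down shrinks the storage.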
So - what is SVD? A matrix factorization into 3 parts (USV*) that provides an ordered diagonal of singular values, allowing us to compress while retaining the bulk of the information in the original. The truncation is the best approximation at the rank we select, and we can dial the rank up or down to be as accurate as we desire.
For a photo, it can provide compression or even watermarking (we can embed a watermark at the compressed level by modifying the singular values/vectors). For a ratings matrix, it can be used to predict ratings and consumption patterns for unwatched movies (or word patterns), and it sets the stage for the concepts that drive the language models that are all the rage today (conceptually a start here only).
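To make the ratings idea concrete, here is a toy sketch with a hypothetical 4-user x 5-movie matrix (the data and the mean-fill trick are illustrative - real recommenders use more careful handling of missing entries):

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are movies,
# 0 marks "unwatched". All values here are made up.
R = np.array([[5, 4, 0, 1, 0],
              [4, 0, 0, 1, 1],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]], dtype=float)

# Simple (not production-grade) trick: fill missing entries with the
# mean of the observed ratings before factoring.
mask = R > 0
filled = np.where(mask, R, R[mask].mean())

U, S, Vt = np.linalg.svd(filled, full_matrices=False)

k = 2  # keep two latent "taste" factors
pred = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Predicted scores for the unwatched (user, movie) pairs:
print(np.round(np.where(mask, np.nan, pred), 2))
```

The low-rank factors are where the "latent factors" live: each user and each movie gets a k-dimensional taste vector, and their dot product is the predicted rating.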
Oh well - a start of a new concept. Let me know if I did not lay this out accurately - there are also cool nuances like latent factors I did not explore. The Python code is simple - just a line or 2 and voilà! SVD done. :)
#Datascience #AIML #mathisamazing