Information in science, mathematics and business is often arranged into rows and columns, forming a rectangular structure called a matrix (plural: matrices). Matrices can be viewed as tables that hold physical observations from the real world or quantities arising in purely mathematical contexts. For example, to solve the system of linear equations $$ 5x + y = 3 $$ $$ 2x + 2y = 6 $$ the coefficients can be collected in the matrix $$\begin{bmatrix} 5 & 1 \\ 2 & 2 \end{bmatrix}$$
This is just one example; matrices are more than a tool for solving equations, and are mathematical objects in their own right. The branch of mathematics that studies matrices and related topics is called linear algebra.
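As a quick numerical illustration, the same small system can be solved from its coefficient matrix and right-hand side; the following is a minimal sketch using NumPy, with the numbers taken from the example above.

```python
import numpy as np

# Coefficient matrix and right-hand side of
#   5x +  y = 3
#   2x + 2y = 6
A = np.array([[5.0, 1.0],
              [2.0, 2.0]])
b = np.array([3.0, 6.0])

solution = np.linalg.solve(A, b)  # solves A @ [x, y] = b
print(solution)                   # [0. 3.]  i.e. x = 0, y = 3
```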
Generally, a finite collection of linear equations in a finite number of unknowns $x, y, z, w, \dots$ is
called a system of linear equations, or just a linear system.
A system of linear equations with $m$ equations and $n$ unknowns has the following form
$$ a^{1}_{1} x_{1} + a^{1}_{2} x_{2} + \cdots + a^{1}_{n} x_{n} = b^{1} $$
$$ a^{2}_{1} x_{1} + a^{2}_{2} x_{2} + \cdots + a^{2}_{n} x_{n} = b^{2} $$
$$ \vdots $$
$$ a^{m}_{1} x_{1} + a^{m}_{2} x_{2} + \cdots + a^{m}_{n} x_{n} = b^{m} $$
We want to determine whether such a system has a solution, whether that solution is unique, and how to find the solution(s). The system can be written in the abbreviated form
$$ \large{Ax = b} $$
$$ \large{A = }\begin{bmatrix} a^1_1 & a^1_2 & \cdots & a^1_n \\
a^2_1 & a^2_2 & \cdots & a^2_n \\
\vdots & \vdots & \ddots & \vdots \\
a^m_1 & a^m_2 & \cdots & a^m_n\end{bmatrix} $$
is an $ m \times n $ matrix and
$$ \large{b = }\begin{bmatrix} b^1 \\
b^2 \\
\vdots \\
b^m \end{bmatrix} \ \text{and} \
\large{x = }\begin{bmatrix} x_1 \\
x_2 \\
\vdots \\
x_n \end{bmatrix} \ \ \text{are column vectors} $$
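In NumPy terms, $A$, $x$ and $b$ map directly onto arrays and the system is just a matrix–vector product; the sketch below reuses the small 2×2 example from the beginning of the section.

```python
import numpy as np

A = np.array([[5.0, 1.0],
              [2.0, 2.0]])   # m x n coefficient matrix (here m = n = 2)
x = np.array([0.0, 3.0])     # vector of unknowns (here a known solution)
b = np.array([3.0, 6.0])     # vector of right-hand sides

print(np.allclose(A @ x, b))  # True: this x satisfies A x = b
```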
A homogeneous linear equation in the unknowns $x_1, \dots, x_n$ has the form $$ \large{a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0} $$
A common approach to formalizing intuitive concepts is to construct a set of objects together with a set of rules for manipulating those objects. Linear algebra is the study of vectors and of certain rules for manipulating them. Vectors can be geometric vectors, polynomials, audio signals, or elements of $ \mathbb{R}^{n} $.
Matrices can be used to compactly represent systems of linear equations, but they also represent linear functions (linear mappings).
A vector space is a set of elements, known as vectors, that may be added together and multiplied (scaled) by numbers.
A (real) vector space is a set V that has the following properties:
1. We can add elements of V, i.e., for any elements $u \in V$ and $v \in V$ there is an element in V called the sum $u + v$.
2. Addition is commutative, i.e. for all $u, v \in V$, $u + v = v + u$.
3. Addition is associative, i.e. for all $u, v, w \in V$, $(u + v) + w = u + (v + w)$.
4. There is an element $0$ (called the zero vector) in V such that, for all $u \in V$, $u + 0 = u$.
5. For any $u \in V$ there is an element $-u \in V$ such that $u + (-u) = 0$.
6. We can scale elements by real numbers, i.e. for any scalar $\alpha \in \mathbb{R}$ and any $u \in V$
there is an element $\alpha u \in V$.
7. Scaling is distributive (with respect to vector addition), i.e. $\alpha(u + v) = \alpha u + \alpha v$.
8. Scaling is distributive (with respect to addition of the scalars), i.e. $(\alpha + \beta)u = \alpha u + \beta u$.
9. Scaling is associative, i.e. $(\alpha \beta)u = \alpha (\beta u)$.
10. Scaling by 1 does not change u, i.e. $1 \cdot u = u$.
The elements of the vector space V are called vectors and the elements of $\mathbb{R}$ are called scalars.
It is possible to find a set of vectors with which we can represent every vector in the vector space by scaling them and adding them together. Such a set of vectors is a basis of the vector space. Given a vector space V with $ k \in \mathbb{N}$ and $ x_1, x_2, \dots, x_k \in V $: if there is a non-trivial linear combination such that $ 0 = \sum^{k}_{i=1} \lambda_{i}x_{i}$ with at least one $\lambda_i \neq 0$, the vectors $ x_1, x_2, \dots, x_k $ are linearly dependent. If only the trivial solution ($\lambda_1 = \dots = \lambda_k = 0$) exists, then they are linearly independent.
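One practical way to test linear (in)dependence numerically is to stack the vectors as columns of a matrix and compare its rank with the number of vectors; the sketch below assumes NumPy and uses made-up example vectors.

```python
import numpy as np

# Candidate vectors are stacked as the columns of a matrix
X_dependent = np.array([[1.0, 2.0, 3.0],
                        [2.0, 4.0, 0.0]])   # second column = 2 * first column
X_independent = np.array([[1.0, 0.0],
                          [0.0, 1.0]])

def linearly_independent(X):
    # Independent exactly when the rank equals the number of columns
    return np.linalg.matrix_rank(X) == X.shape[1]

print(linearly_independent(X_dependent))    # False
print(linearly_independent(X_independent))  # True
```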
Basic vector operations and the squared Euclidean norm: $$ \vec{r} + \vec{s} = \vec{s} + \vec{r} $$ $$ 2\vec{r} = \vec{r} + \vec{r} $$ $$ \lVert r \rVert^{2} = \sum_i r^{2}_{i} $$
A norm on a vector space V is a function which assigns to each vector x its length $\lVert x \rVert$.
The dot product: $$ \large{r \cdot s = \sum_i r_{i}s_{i}} $$
commutative : $ r \cdot s = s \cdot r $
distributive over addition : $ r \cdot (s + t) = r \cdot s + r \cdot t $
associative over scalar multiplication : $ r \cdot (a s) = a(r \cdot s) $
$ r \cdot r = \lVert r \rVert^{2} $
The projection of $s$ onto $r$: $ \Large{\frac{r \cdot s}{r \cdot r} r} $
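These identities are easy to check numerically; the following NumPy sketch (with arbitrarily chosen vectors) computes the dot product, the squared norm, and the projection of $s$ onto $r$.

```python
import numpy as np

r = np.array([3.0, 4.0])
s = np.array([1.0, 2.0])

print(np.dot(r, s))                             # sum_i r_i s_i = 11
print(np.dot(r, r))                             # ||r||^2 = 25
print(np.isclose(np.dot(r, s), np.dot(s, r)))   # commutativity
print((np.dot(r, s) / np.dot(r, r)) * r)        # projection of s onto r
```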
A basis is a set of n vectors that:
(i) are not linear combinations of each other
(ii) span the space
The space is then n-dimensional.
Matrix Operations (matrix–vector product): $ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} ae + bf \\ ce + df \end{bmatrix} $
Identity : $ I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} $
Clockwise rotation by $\theta $ : $ \begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix} $
determinant of 2x2 matrix: $ \begin{vmatrix} a & b \\ c & d \end{vmatrix}=ad-bc $
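The operations above translate directly into NumPy; the sketch below uses an arbitrary matrix and a 90° angle purely for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

print(A @ v)                      # matrix-vector product: [17., 39.]
print(np.eye(2))                  # 2x2 identity matrix

theta = np.pi / 2                 # 90-degree clockwise rotation
R = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
print(R @ np.array([0.0, 1.0]))   # (0, 1) rotates to (1, 0), up to rounding

print(np.linalg.det(A))           # ad - bc = 1*4 - 2*3 = -2
```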
Change of basis: change from an original basis to a new, primed basis. The columns of the transformation matrix B are the new basis vectors expressed in the original coordinate system. So $$ B\vec{r}' = \vec{r} $$ where $\vec{r}'$ is the vector in the B-basis and $\vec{r}$ is the vector in the original basis, or equivalently $$ \vec{r}' = B^{-1}\vec{r} $$
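A small numerical sketch of the change-of-basis relations; the basis matrix B here is an example of my own choosing.

```python
import numpy as np

# Columns of B are the new basis vectors in the original coordinate system
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
r = np.array([3.0, 2.0])            # vector in the original basis

r_prime = np.linalg.solve(B, r)     # r' = B^{-1} r, coordinates in the B-basis
print(r_prime)                      # [1. 2.]
print(np.allclose(B @ r_prime, r))  # B r' recovers r
```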
Let V be an inner product space, $ x \in V $ and $ U \subseteq V $ be a subspace with basis $ e_1, \dots, e_k $. Then $ x \perp U $ if and only if $ x \perp e_j $ for $ 1 \leq j \leq k $.
Orthonormal matrix: if a matrix A is orthonormal (all of its columns are of unit length and orthogonal to each other), then the transpose of A equals the inverse of A: $$ A^{T} = A^{-1} $$
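This property is easy to verify numerically; the sketch below uses a rotation matrix, whose columns are orthonormal.

```python
import numpy as np

theta = 0.7                                       # arbitrary angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthonormal columns

print(np.allclose(A.T, np.linalg.inv(A)))   # True: A^T equals A^{-1}
print(np.allclose(A.T @ A, np.eye(2)))      # equivalently, A^T A = I
```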
Gram-Schmidt is an orthogonalization process. It produces an orthonormal basis from an
arbitrary basis of a vector space, and such an orthonormal basis is desirable because
it simplifies computations considerably. Orthonormal bases are the bedrock of feature
extraction when the dataset is expressed in terms of a vector space and is hence amenable to
linear-algebraic computation. The steps of the Gram-Schmidt orthogonalization process are as follows.
Step 1. Choose an arbitrary basis $ u_1, \dots, u_n $ of V. Set $v_1 = u_1$.
Step 2. Modify $u_2$ by subtracting a suitable multiple of $v_1$, so that the resulting vector is orthogonal to $v_1$:
$$ v_2 = u_2 - \frac{\langle u_2, v_1 \rangle}{\langle v_1, v_1 \rangle}v_1 = u_2 - \mathrm{pr}_{v_1} u_2 $$
Here $ \langle v_1, v_1 \rangle \neq 0 $ because the inner product is positive definite. We check that $v_2 \perp v_1$:
$ \langle v_2, v_1 \rangle = \left\langle u_2 - \frac{\langle u_2, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1, v_1 \right\rangle $
$ = \langle u_2, v_1 \rangle - \langle u_2, v_1 \rangle \frac{\langle v_1, v_1 \rangle}{\langle v_1, v_1 \rangle} = 0 $
Construct $v_{k+1}$ by modifying $u_{k+1}$ as follows:
$$ v_{k+1} = u_{k+1} - \frac{\langle u_{k+1}, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \cdots - \frac{\langle u_{k+1}, v_k \rangle}{\langle v_k, v_k \rangle} v_k = u_{k+1} - \mathrm{pr}_{v_1} u_{k+1} - \cdots - \mathrm{pr}_{v_k} u_{k+1}$$
Then $ v_{k+1} \perp \mathrm{span}(v_1, \dots, v_k) $, since for each $j \leq k$
$ \langle v_{k+1}, v_j \rangle = \left\langle u_{k+1} - \frac{\langle u_{k+1}, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \cdots - \frac{\langle u_{k+1}, v_k \rangle}{\langle v_k, v_k \rangle} v_k, v_j \right\rangle $
$ = \langle u_{k+1}, v_j \rangle - \langle u_{k+1}, v_j \rangle \frac{\langle v_j, v_j \rangle}{\langle v_j, v_j \rangle} = 0 $
It follows by induction that
$ v_{k+1} = u_{k+1} + \alpha_{k} u_{k} + \cdots + \alpha_{1} u_1 $ for some scalars $\alpha_1, \dots, \alpha_k$.
Last step: if required, normalise the modified vectors to obtain unit vectors
$ \large{e_{j} = \frac{1}{\lVert v_{j} \rVert} v_{j}} $
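The whole procedure fits in a few lines of NumPy; the sketch below assumes the standard dot product on $\mathbb{R}^n$ and linearly independent input columns.

```python
import numpy as np

def gram_schmidt(U):
    """Orthonormalise the columns of U (assumed linearly independent)."""
    basis = []
    for u in U.T:                              # take u_1, ..., u_n column by column
        v = u.astype(float)
        for q in basis:                        # subtract projections onto earlier vectors
            v = v - np.dot(u, q) * q
        basis.append(v / np.linalg.norm(v))    # normalise to unit length
    return np.column_stack(basis)

U = np.array([[1.0, 1.0],
              [0.0, 1.0]])                     # columns u_1, u_2
Q = gram_schmidt(U)
print(Q)
print(np.allclose(Q.T @ Q, np.eye(2)))         # the new columns are orthonormal
```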
An eigenvector $\vec{v}$ satisfies $ T(\vec{v}) = \lambda \vec{v} $ for the transformation T, and $\lambda$ is the eigenvalue associated with the eigenvector $\vec{v}$. The transformation T is a linear transformation that can be represented as $T(\vec{v}) = A\vec{v}$.
Eigenvectors $\vec{v}$ do not change direction when a transformation such as T is applied to them. So if we apply T to a (non-zero) vector $\vec{v}$ and the result $T(\vec{v})$ is parallel to the original $\vec{v}$, then $\vec{v}$ is an eigenvector.
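NumPy's `eig` returns exactly such eigenvalue/eigenvector pairs; the sketch below (with an example matrix chosen for illustration) verifies $A\vec{v} = \lambda\vec{v}$ for each pair.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)    # eigenvectors are the columns
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(lam, v, np.allclose(A @ v, lam * v))  # A v = lambda v holds for each pair
```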
Diagonalization enables us to find powers of matrices such as $A^{100}$. In this process the matrix A is factorized into three matrices, one of which is a diagonal matrix. The reason for seeking a diagonal matrix is that calculations are much easier with a diagonal matrix.
For a diagonalizable square matrix A, there is an invertible matrix P (P has an inverse) such that $P^{-1}AP$ is a diagonal matrix. The process of converting such an $ n \times n $ matrix into a diagonal matrix is called diagonalization.
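A short numerical sketch of this idea, using a small symmetric (hence diagonalizable) example matrix and computing a matrix power via the diagonalization.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                      # symmetric, hence diagonalizable

eigenvalues, P = np.linalg.eig(A)               # columns of P are eigenvectors
D = np.diag(eigenvalues)

print(np.allclose(np.linalg.inv(P) @ A @ P, D))             # P^{-1} A P is diagonal

A_10 = P @ np.diag(eigenvalues ** 10) @ np.linalg.inv(P)    # A^10 via the diagonalization
print(np.allclose(A_10, np.linalg.matrix_power(A, 10)))     # matches direct computation
```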
Singular Value Decomposition (SVD) is a useful matrix factorization technique. This decomposition can be applied not only to symmetric matrices but to any matrix A, where A is m by n with $ m \geq n $. SVD breaks the matrix down into useful parts such as orthogonal matrices.
We can decompose any given matrix A of size m by n with singular values $ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_k \geq 0$, where $k \leq n$, into $\large{UDV^{T}}$, that is $$ \large{A = UDV^{T}} $$
where U is an m by m orthogonal matrix, D is an m by n matrix with the singular values on its diagonal, and V is an n by n orthogonal matrix. The values of m (rows) and n (columns) are the dimensions of the given matrix A.
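The sketch below (NumPy, with an arbitrary 3×2 example matrix) computes the factorization and reconstructs A from U, D and $V^{T}$.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])                      # m = 3 rows, n = 2 columns

U, singular_values, Vt = np.linalg.svd(A)       # U is m x m, Vt is n x n
D = np.zeros_like(A)                            # m x n matrix for the singular values
np.fill_diagonal(D, singular_values)

print(singular_values)                          # sigma_1 >= sigma_2 >= 0
print(np.allclose(U @ D @ Vt, A))               # A = U D V^T
```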