
Using Vector Notation

19.11.2020 | Regression/Linear

Contents/Index

1. Introduction
@2. Using Vector Notation

We can extend the former approach by using vector notation. Instead of finding $$\hat{w}_0 = \bar{y} - w_1 \bar{x}$$ and $$\hat{w}_1 = \frac{\overline{yx} - \bar{y} \bar{x}}{\overline{x^2} - \bar{x}^2}$$ in $$f(x) = w_0 + w_{1}x$$ we can combine the variables into vectors and matrices. This eases the work when we are doing regression with more than a couple of variables.

First off we rewrite our individual variables as vectors $$\textbf{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$$ and $$\textbf{x}_n = \begin{bmatrix} 1 \\ x_n \end{bmatrix}$$ where the leading 1 in $\textbf{x}_n$ is there so that $w_0$ is not cancelled when the two vectors are multiplied; it acts as the intercept term. Thus our model can be expressed as $$f(x_n) = w_0 + w_{1}x_n = \textbf{w}^\top \textbf{x}_n$$ using ordinary vector multiplication. Following this idea we can write the squared error of a single observation as $$sqrErr = (y_n - \textbf{w}^\top \textbf{x}_n)^2$$ and the mean of sqrErr over all $N$ observations as $$meanErr = \frac{1}{N}(\textbf{y} - \textbf{X} \textbf{w})^\top (\textbf{y} - \textbf{X} \textbf{w})$$ where $$\textbf{X} = \begin{bmatrix}1 & x_1\\1 & x_2\\ \vdots & \vdots \\1 & x_N\end{bmatrix}$$ and $$\textbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$ and where the matrix and vector operations are the usual ones from linear algebra.
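To make the vectorized error concrete, here is a minimal NumPy sketch (not part of the original derivation) that builds $\textbf{X}$ and $\textbf{y}$ from the observations used later in this article and evaluates meanErr for a candidate $\textbf{w}$; the candidate values are placeholders chosen only for illustration.

import numpy as np

xs = np.array([1.2, 2.0, 3.3, 4.0, 4.9])
ys = np.array([2.5, 3.5, 4.0, 4.5, 5.0])

# design matrix X with a leading column of ones, and target vector y
X = np.array([[1, x] for x in xs])
y = ys
N = len(ys)

# an arbitrary candidate w = [w_0, w_1], used only to show the formula
w = np.array([1.0, 0.5])

# meanErr = (1/N) (y - Xw)^T (y - Xw)
residual = y - np.dot(X, w)
meanErr = np.dot(residual, residual) / N
print(meanErr)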

What is remarkable is that given the vector $$\textbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$ we can calculate $\textbf{x}^\top \textbf{x}$ as the sum $$x_{1}^2 + x_{2}^2$$ which is exactly the squaring we need for the squared error.
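As a quick sanity check, with a toy vector picked only for illustration, the inner product of a vector with itself is indeed the sum of its squared entries:

import numpy as np

x = np.array([3.0, 4.0])
print(np.dot(x, x))       # 25.0
print(x[0]**2 + x[1]**2)  # 25.0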

By differentiating the vectorized expression with respect to the whole vector $\textbf{w}$ we get $$\frac{\partial meanErr}{\partial \textbf{w}} = \frac{2}{N} \textbf{X}^\top \textbf{X} \textbf{w} - \frac{2}{N} \textbf{X}^\top \textbf{y}$$ Setting this equal to zero and rearranging gives $$\textbf{X}^\top \textbf{X} \textbf{w} = \textbf{X}^\top \textbf{y}$$ and lastly, multiplying both sides by $(\textbf{X}^\top \textbf{X})^{-1}$, we get $$\textbf{w} = (\textbf{X}^\top \textbf{X})^{-1} \textbf{X}^\top \textbf{y} $$ the key expression for doing linear regression in matrix form.
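For completeness, the gradient above can be obtained by first expanding the quadratic form and then using the standard identities $\frac{\partial}{\partial \textbf{w}} \textbf{w}^\top \textbf{a} = \textbf{a}$ and $\frac{\partial}{\partial \textbf{w}} \textbf{w}^\top \textbf{A} \textbf{w} = 2 \textbf{A} \textbf{w}$ for symmetric $\textbf{A}$: $$meanErr = \frac{1}{N}\left(\textbf{y}^\top \textbf{y} - 2 \textbf{w}^\top \textbf{X}^\top \textbf{y} + \textbf{w}^\top \textbf{X}^\top \textbf{X} \textbf{w}\right)$$ so that $$\frac{\partial meanErr}{\partial \textbf{w}} = \frac{1}{N}\left(-2 \textbf{X}^\top \textbf{y} + 2 \textbf{X}^\top \textbf{X} \textbf{w}\right) = \frac{2}{N} \textbf{X}^\top \textbf{X} \textbf{w} - \frac{2}{N} \textbf{X}^\top \textbf{y}$$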

Matrix multiplication can be quite tedious for us humans. Fortunately Python has NumPy. By adding import numpy as np at the top of the Python script, matrix algebra suddenly becomes as easy as anything. So let's redo the regression with this new approach. We are given the observations, and we get:

import numpy as np

xs = np.array([1.2, 2.0, 3.3, 4.0, 4.9])
ys = np.array([2.5, 3.5, 4.0, 4.5, 5.0])

# add value to avoid cancel
X = np.array([[1, x] for x in xs])

w = np.dot(X.transpose(), X)
w = np.linalg.inv(w)
w = np.dot(w, X.transpose())
w = np.dot(w, ys)

print(w)
# result is [1.94993264 0.63313875]
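As a quick usage check, continuing the script above, the fitted w can be used to predict with the same vector form $f(x) = \textbf{w}^\top \textbf{x}$; the input value 3.3 is just one of the observations, picked for illustration.

# predict f(3.3) with the fitted parameters w = [w_0, w_1]
x_new = np.array([1, 3.3])  # leading 1 for the intercept
print(np.dot(w, x_new))     # roughly 4.04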

The cool thing is that this method scales: we can add as many w-parameters as we want. The result is shown in the comment above; we have $[w_0,w_1]$, hence the resulting model is $$ f(x) = 1.94993264 + 0.63313875 \cdot x $$ The plot can be found in the former article, in Figure 1.
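To illustrate the scaling claim, here is a minimal sketch, not taken from the original article, that adds an $x^2$ column to the design matrix so the same closed-form expression $\textbf{w} = (\textbf{X}^\top \textbf{X})^{-1} \textbf{X}^\top \textbf{y}$ now produces three parameters from the same observations:

import numpy as np

xs = np.array([1.2, 2.0, 3.3, 4.0, 4.9])
ys = np.array([2.5, 3.5, 4.0, 4.5, 5.0])

# design matrix with three columns: 1, x and x^2, giving three parameters
X = np.array([[1, x, x**2] for x in xs])

# same closed form: w = (X^T X)^{-1} X^T y
w = np.dot(np.linalg.inv(np.dot(X.transpose(), X)), np.dot(X.transpose(), ys))
print(w)  # [w_0, w_1, w_2]

In practice np.linalg.solve(np.dot(X.transpose(), X), np.dot(X.transpose(), ys)) is usually preferred over computing the inverse explicitly, but the explicit form matches the derivation above.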
