# Using Vector Notation

## 19.11.2020 | Regression/Linear

### Contents/Index

1. Introduction
2. Using Vector Notation

Given the formula for a linear model $$f(x) = w_0 + w_1 x$$ we can, instead of the former approach, use linear algebra to determine $w_0$ and $w_1$. First we define the two vectors $$\textbf{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$$ and $$\textbf{x}_n = \begin{bmatrix} 1 \\ x_n \end{bmatrix}$$ where the leading $1$ in $\textbf{x}_n$ is there so that the intercept $w_0$ survives the multiplication. With this notation we can write $f$ as $$f(x_n) = \textbf{w}^{\top} \textbf{x}_n$$ using ordinary vector multiplication (the dot product).

Now we can write the squared error as $$SqrErr = \sum_{i=1}^{N} (y_i - \textbf{w}^{\top} \textbf{x}_i)^2$$ or equivalently as $$SqrErr = (\textbf{y} - \textbf{X} \textbf{w})^{\top} (\textbf{y} - \textbf{X} \textbf{w})$$ where $$\textbf{X} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix}$$ and $$\textbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$ (with more features, $\textbf{X}$ simply gains one column per feature). Note that given a vector $$\textbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix}$$ we have that $$\textbf{x}^{\top} \textbf{x} = x_1^2 + x_2^2 + \cdots$$ Now we have the mean squared error as: $$MSqrErr = \frac{1}{N} (\textbf{y} - \textbf{X} \textbf{w})^{\top} (\textbf{y} - \textbf{X} \textbf{w})$$
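To see that the summation form and the matrix form really are the same quantity, we can check both on a small data set (a quick sketch; the arrays below are made-up illustrative values, not from the example later in this article):

```python
import numpy as np

# Made-up illustrative data and an arbitrary candidate w = [w_0, w_1]
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 2.5, 4.0])
w = np.array([0.5, 1.0])

# Design matrix: the leading column of ones pairs with the intercept w_0
X = np.column_stack([np.ones_like(xs), xs])

# Summation form: sum_i (y_i - w^T x_i)^2
sq_err_sum = sum((y - w @ x) ** 2 for x, y in zip(X, ys))

# Matrix form: (y - Xw)^T (y - Xw)
r = ys - X @ w
sq_err_mat = r @ r

print(sq_err_sum, sq_err_mat)  # the two values agree
```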

Now, as in the last article, we start by differentiating the mean squared error. Differentiating with respect to the vector $\textbf{w}$ gives $$\frac{\partial MSqrErr}{\partial \textbf{w}} = \frac{2}{N} \textbf{X}^{\top} \textbf{X} \textbf{w} - \frac{2}{N} \textbf{X}^{\top} \textbf{y}$$ Setting this equal to $0$ and rearranging, we get $$\textbf{X}^{\top} \textbf{X} \textbf{w} = \textbf{X}^{\top} \textbf{y}$$ and lastly, multiplying both sides by $(\textbf{X}^\top \textbf{X})^{-1}$ (assuming $\textbf{X}^\top \textbf{X}$ is invertible), we get $$\textbf{w} = (\textbf{X}^\top \textbf{X})^{-1} \textbf{X}^\top \textbf{y}$$ This last result is central to linear regression.
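As a quick sketch of the closed-form solution in code (using made-up data chosen to lie exactly on a line, so the answer is easy to check): rather than forming the explicit inverse $(\textbf{X}^\top \textbf{X})^{-1}$, numpy lets us solve the linear system $\textbf{X}^{\top} \textbf{X} \textbf{w} = \textbf{X}^{\top} \textbf{y}$ directly, which is generally more numerically stable.

```python
import numpy as np

# Made-up data lying exactly on y = 1 + 2x
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 3.0, 5.0, 7.0])

X = np.column_stack([np.ones_like(xs), xs])

# Solve X^T X w = X^T y for w instead of inverting X^T X explicitly
w = np.linalg.solve(X.T @ X, X.T @ ys)
print(w)  # ≈ [1., 2.], i.e. w_0 = 1, w_1 = 2
```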

## Example

Doing the example from the former article again, we have $$xs = [1.2, 2.0, 3.3, 4.0, 4.9]$$ and $$ys = [2.5, 3.5, 4.0, 4.5, 5.0]$$ Now we build the model using numpy with the following code:

```python
import numpy as np

xs = np.array([1.2, 2.0, 3.3, 4.0, 4.9])
ys = np.array([2.5, 3.5, 4.0, 4.5, 5.0])

# Design matrix: one row [1, x] per data point
X = np.array([[1, x] for x in xs])

# w = (X^T X)^{-1} X^T y, built up step by step
w = np.dot(X.transpose(), X)
w = np.linalg.inv(w)
w = np.dot(w, X.transpose())
w = np.dot(w, ys)
print(w)
```

For which we get the vector $$\textbf{w} \approx [1.94993264, 0.63313875]$$ i.e. an intercept $w_0 \approx 1.95$ and a slope $w_1 \approx 0.63$.
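As a sanity check (not part of the original derivation), the same fit can be reproduced with numpy's built-in least-squares helper `np.polyfit`; for degree 1 it returns the coefficients highest power first, so the slope comes before the intercept:

```python
import numpy as np

xs = np.array([1.2, 2.0, 3.3, 4.0, 4.9])
ys = np.array([2.5, 3.5, 4.0, 4.5, 5.0])

# polyfit with deg=1 returns [w_1, w_0] (highest degree first)
w1, w0 = np.polyfit(xs, ys, 1)
print(w0, w1)  # ≈ 1.9499 0.6331, matching the normal-equations result
```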