
Using Vector Notation

19.11.2020 | Regression/Linear

Contents/Index

1. Introduction
@2. Using Vector Notation

Given the formula for a linear model $$ f(x) = w_0 + w_1 x $$ we can, instead of the approach from the former article, use linear algebra to determine $w_0$ and $w_1$. First we define the following two vectors $$ \textbf{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} $$ and $$ \textbf{x}_n = \begin{bmatrix} 1 \\ x_n \end{bmatrix} $$ where the leading $1$ in $\textbf{x}_n$ is there so that the intercept $w_0$ is kept when we multiply the two vectors. With this notation we can write $f$ as $$ f(x_n) = \textbf{w}^{\top} \textbf{x}_n $$ using ordinary vector multiplication. Now we can write the squared error as $$ SqrErr = \sum_{i=1}^{N} (y_i - \textbf{w}^{\top} \textbf{x}_i)^2 $$ or equivalently as $$ SqrErr = (\textbf{y} - \textbf{X} \textbf{w})^{\top} (\textbf{y} - \textbf{X} \textbf{w}) $$ where $$ \textbf{X} = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} & \cdots \\ 1 & x_{1,2} & x_{2,2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \\ 1 & x_{1,N} & x_{2,N} & \cdots \end{bmatrix} $$ and $$ \textbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} $$ Note that given a vector $$ \textbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix} $$ we have that $$ \textbf{x}^{\top} \textbf{x} = x_1^2 + x_2^2 + \cdots $$ This gives the mean squared error as $$ MSqrErr = \frac{1}{N} (\textbf{y} - \textbf{X} \textbf{w})^{\top} (\textbf{y} - \textbf{X} \textbf{w}) $$
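As a quick sanity check, the summed form and the vectorized form of the squared error can be compared numerically. The sketch below uses a small made-up data set and an arbitrary weight vector (placeholders for illustration only, not the example from the former article) and shows that the two forms agree:

import numpy as np

# Made-up data and an arbitrary candidate w, purely for illustration.
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 2.5, 3.5])
X = np.array([[1, x] for x in xs])   # design matrix with a leading column of ones
w = np.array([1.0, 0.5])

# Summed form: sum_i (y_i - w^T x_i)^2
sqr_err_sum = sum((y - w @ x) ** 2 for x, y in zip(X, ys))

# Vectorized form: (y - Xw)^T (y - Xw)
residual = ys - X @ w
sqr_err_vec = residual @ residual

print(sqr_err_sum, sqr_err_vec)   # the two values are identical
print(sqr_err_vec / len(xs))      # dividing by N gives the mean squared error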

Now, as in the last article, we start by differentiating the mean squared error. Differentiating with respect to the vector $\textbf{w}$ we get $$ \frac{\partial MSqrErr}{\partial \textbf{w}} = \frac{2}{N} \textbf{X}^{\top} \textbf{X} \textbf{w} - \frac{2}{N} \textbf{X}^{\top} \textbf{y} $$ Setting this equal to $0$ and rearranging we get $$ \textbf{X}^{\top} \textbf{X} \textbf{w} = \textbf{X}^{\top} \textbf{y} $$ and lastly, multiplying both sides by $(\textbf{X}^\top \textbf{X})^{-1}$ (assuming this inverse exists), we get $$ \textbf{w} = (\textbf{X}^\top \textbf{X})^{-1} \textbf{X}^\top \textbf{y} $$ This last result is central in linear regression.
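For reference, the gradient above follows from first multiplying out the quadratic form, $$ MSqrErr = \frac{1}{N} \left( \textbf{y}^{\top} \textbf{y} - 2 \textbf{w}^{\top} \textbf{X}^{\top} \textbf{y} + \textbf{w}^{\top} \textbf{X}^{\top} \textbf{X} \textbf{w} \right) $$ and then differentiating term by term using the identities $\frac{\partial}{\partial \textbf{w}} \textbf{w}^{\top} \textbf{a} = \textbf{a}$ and $\frac{\partial}{\partial \textbf{w}} \textbf{w}^{\top} \textbf{A} \textbf{w} = 2 \textbf{A} \textbf{w}$ for a symmetric matrix $\textbf{A}$ (here $\textbf{A} = \textbf{X}^{\top} \textbf{X}$, which is always symmetric).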

Example

Let us do the example from the former article again. We have $$ xs = [1.2, 2.0, 3.3, 4.0, 4.9] $$ and $$ ys = [2.5, 3.5, 4.0, 4.5, 5.0] $$ Now we build the model using numpy with the following code:

import numpy as np

xs = np.array([1.2, 2.0, 3.3, 4.0, 4.9])
ys = np.array([2.5, 3.5, 4.0, 4.5, 5.0])

# Design matrix with a leading column of ones for the intercept.
X = np.array([[1, x] for x in xs])

# w = (X^T X)^{-1} X^T y
w = np.dot(X.transpose(), X)
w = np.linalg.inv(w)
w = np.dot(w, X.transpose())
w = np.dot(w, ys)
print(w)

Running this we get the vector $$ \textbf{w} = [1.94993264, 0.63313875] $$
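As a side note on the implementation: forming the explicit inverse is fine for a small example like this, but the same weights can also be obtained with a linear solver, which is generally the more numerically stable choice. A minimal sketch of this alternative (same data as above) could look like:

import numpy as np

xs = np.array([1.2, 2.0, 3.3, 4.0, 4.9])
ys = np.array([2.5, 3.5, 4.0, 4.5, 5.0])
X = np.array([[1, x] for x in xs])

# Solve the normal equations X^T X w = X^T y without forming the inverse.
w = np.linalg.solve(X.T @ X, X.T @ ys)
print(w)

# Or let numpy solve the least squares problem directly.
w_lstsq, *_ = np.linalg.lstsq(X, ys, rcond=None)
print(w_lstsq)

Both calls print the same weight vector as the code above.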
