y(x, w) = w0 + w1x1 + ... + wDxD
w0: bias parameter; w = (w0, ..., wD) is the full parameter vector
x = (x1, ..., xD): input variables (assumed independent of each other)
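A minimal sketch of evaluating the model above, assuming the parameters are stored as one array with the bias w0 first. The function name `linear_model` and the example numbers are illustrative, not from the notes.

```python
import numpy as np

def linear_model(x, w):
    """Evaluate y(x, w) = w0 + w1*x1 + ... + wD*xD.

    x: array of D inputs; w: array of D + 1 parameters, bias w0 first
    (this storage layout is an assumption made for the example).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return w[0] + w[1:] @ x

# Illustrative values: w0 = 0.5, w1 = 1.0, w2 = 2.0 and x = (1, 2).
y_example = linear_model([1.0, 2.0], [0.5, 1.0, 2.0])
```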
The model is simple.
The goal is to minimize the squared error, which has an analytic (closed-form) solution.
convex losses and regularizers
View X as a matrix: rows are data points, columns are input dimensions.
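A small sketch of the analytic least-squares fit with X laid out this way (rows are data points, columns are input dimensions). The helper name `fit_least_squares` and the toy data are assumptions for illustration. A column of ones is prepended so that w0 acts as the bias.

```python
import numpy as np

def fit_least_squares(X, t):
    """Return w = (w0, ..., wD) minimizing the squared error ||Xb w - t||^2,
    where Xb is X with a leading column of ones for the bias."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    # lstsq solves the least-squares problem without forming an explicit inverse.
    w, *_ = np.linalg.lstsq(Xb, t, rcond=None)
    return w

# Toy data: 4 data points (rows), 2 input dimensions (columns),
# with noise-free targets generated from w = (1, 2, -3).
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
t = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
w_fit = fit_least_squares(X, t)
```

Using `np.linalg.lstsq` rather than inverting X^T X directly is a design choice made here for numerical stability; both give the same minimizer when X has full column rank.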
Infinite magnitude. Extrapolates poorly.
Magnitude bounded. Would not vanish.
Hyperbolic tangent.
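The three notes above can be compared numerically. This sketch assumes that "infinite magnitude" refers to polynomial basis functions and "magnitude bounded" to sigmoidal ones; the notes do not name them, so that mapping is a guess. It also checks the identity tanh(a) = 2*sigmoid(2a) - 1, which relates the hyperbolic tangent to the logistic sigmoid.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Columns are x**0, x**1, ..., x**degree (unbounded as |x| grows)."""
    x = np.asarray(x, dtype=float)
    return np.stack([x**j for j in range(degree + 1)], axis=-1)

def sigmoid(a):
    """Logistic sigmoid, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([-10.0, 0.0, 10.0])
poly = polynomial_basis(x, 3)      # the cubic column reaches +/-1000 at |x| = 10
sig = sigmoid(x)                   # stays strictly between 0 and 1
tanh_from_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0  # equals np.tanh(x)
```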
beta: precision (inverse variance)
Sum-of-squares error function: E_D(w)
Maximise Likelihood function = Minimise Error function
Then find the stationary point.
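A sketch of that last step, assuming Gaussian noise with precision beta, so that the log likelihood is (N/2) ln(beta) - (N/2) ln(2*pi) - beta * E_D(w) with E_D(w) = (1/2) * sum_n (t_n - w^T phi_n)^2. The design matrix `Phi`, the synthetic data, and all function names are illustrative assumptions. Setting the gradient with respect to w to zero gives the normal equations.

```python
import numpy as np

def sum_of_squares_error(w, Phi, t):
    """E_D(w) = 1/2 * sum_n (t_n - w^T phi_n)^2."""
    r = t - Phi @ w
    return 0.5 * (r @ r)

def log_likelihood(w, beta, Phi, t):
    """Gaussian log likelihood; maximizing over w is minimizing E_D(w)."""
    n = t.shape[0]
    return (0.5 * n * np.log(beta) - 0.5 * n * np.log(2.0 * np.pi)
            - beta * sum_of_squares_error(w, Phi, t))

# Synthetic data (assumed for the example): bias column plus two inputs.
rng = np.random.default_rng(0)
Phi = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
w_true = np.array([0.5, -1.0, 2.0])
t = Phi @ w_true + rng.normal(scale=0.1, size=50)

# Stationary point: Phi^T (t - Phi w) = 0, i.e. the normal equations.
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
grad = Phi.T @ (t - Phi @ w_ml)  # numerically zero at the stationary point

# Maximizing over beta as well gives 1/beta = mean squared residual.
beta_ml = 1.0 / np.mean((t - Phi @ w_ml) ** 2)
```

Because E_D(w) is convex in w, this stationary point is the global minimum of the error and hence the maximum of the likelihood for any fixed beta.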
Batch learning