This semester I'm taking Machine Learning. The course has a free version on Udacity, along with detailed notes. The notes here are a heavily condensed version, intended for my own use.
Week 01 tasks
- Lectures: Decision Trees, Regression and Classification, and Neural Networks.
- Reading: Chapters 1, 3, and 4 of Mitchell
SL1 Decision Trees

- Classification is the process of taking some kind of input x and mapping it to some discrete label.
- Regression is mapping from some input space to some real number.


- Instance: an input (a vector of attribute values).
- Concept: a function that maps instances to labels.
- Target concept: the actual concept we are trying to find (the "answer").
- Hypothesis class: the set of all concepts the learner is willing to consider.
- Sample: the training set (instances paired with correct labels).
- Candidate: a concept that might be the target concept.
- Testing set: held-out labeled instances used to judge the candidate.
- The testing set should never overlap the training set; otherwise we can't measure generalization.

- Start at the root node.
- Edges represent the different values (choices) of the attribute tested at a node.
- A leaf is the final output (the label); a toy sketch follows.
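To make the structure concrete, here is a minimal sketch of a tree as nested Python dicts (my own toy representation and attribute names, not code from the course):

```python
# A decision tree as nested dicts: an internal node maps an attribute
# name to {attribute value -> subtree}; a leaf is just a label string.
tree = {"outlook": {
    "sunny": {"humidity": {"high": "no", "normal": "yes"}},
    "overcast": "yes",
    "rain": "no",
}}

def classify(node, instance):
    # Walk from the root: follow the edge matching the instance's
    # value for this node's attribute until a leaf (a label) is hit.
    while isinstance(node, dict):
        attribute = next(iter(node))
        node = node[attribute][instance[attribute]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```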

- Operations like AND and OR are commutative in A and B, so swapping the positions of A and B in the trees above yields an equivalent tree.

- Linear tree: the number of nodes grows linearly with the number of attributes (e.g., n-way OR, "any").
- Exponential tree: the number of nodes grows exponentially with the number of attributes (e.g., n-way XOR, i.e., parity).

- The number of possible trees can be huge.
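A quick worked count, using the standard fact about boolean functions (the arithmetic is mine, not copied from the lecture): with n binary attributes there are 2<sup>n</sup> distinct instances, and a concept may label each instance 0 or 1 independently, so there are 2<sup>2<sup>n</sup></sup> possible concepts. For n = 6 that is already 2<sup>64</sup> ≈ 1.8 × 10<sup>19</sup>, so enumerating trees is hopeless and a greedy criterion like information gain is needed.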

- Information gain: Gain(S, A) = Entropy(S) − ∑<sub>v</sub> (|S<sub>v</sub>|/|S|) · Entropy(S<sub>v</sub>), where S<sub>v</sub> is the subset of S with value v for attribute A. ID3 splits on the attribute with the highest gain.
- What is entropy? A measure of randomness: Entropy(S) = −∑<sub>v</sub> P(v) log<sub>2</sub> P(v).
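A minimal sketch of these two formulas in plain Python (the helper names are my own):

```python
# Entropy and information gain for a discrete attribute,
# standard library only.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_v P(v) * log2(P(v)) over label values v."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(attribute_values):
        subset = [y for a, y in zip(attribute_values, labels) if a == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# A perfectly informative attribute has gain equal to Entropy(S):
print(information_gain(['a', 'a', 'b', 'b'], [0, 0, 1, 1]))  # 1.0
```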

- The inductive bias of ID3 (a preference bias):
- it prefers good splits near the top,
- it prefers trees that are correct over ones that are incorrect,
- it prefers shorter trees.

- For continuous-valued attributes, we can split on ranges (e.g., 20 ≤ age < 30).
- It does not make sense to repeat a discrete-valued attribute along a path, but a continuous attribute can be repeated if a different question is asked (e.g., a different threshold), as in the sketch below.
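One way to pick such a question: scan candidate thresholds at the midpoints between sorted values and keep the one with the highest information gain. This sketch is my own helper (reusing `information_gain` from above), not the course's code:

```python
def best_threshold(values, labels):
    """Pick the midpoint threshold with the highest information gain."""
    candidates = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(candidates, candidates[1:])]
    # Each threshold t turns the attribute into a binary question: value <= t?
    return max(midpoints,
               key=lambda t: information_gain([v <= t for v in values], labels))

print(best_threshold([18, 22, 25, 31, 40], [0, 0, 1, 1, 1]))  # 23.5
```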

- When does ID3 stop? One candidate rule: stop when all training examples are correctly classified — but this can't happen if the data is noisy.
- Another: stop when we run out of attributes — but with continuous attributes (and new thresholds) we never run out.
- To avoid overfitting: stop early, or grow the full tree and prune.

SL2 Regression and Classification
Recap: supervised learning means learning from pairs of inputs and outputs, then predicting the output for a new input — mapping input to output. If the output is discrete, it's classification; if the output is continuous, it's regression.

- Originally, "regression" meant regression to the mean (Galton's observation that children of unusually tall parents tend back toward average height).

- In machine learning, regression now means finding a function that best represents the relationship between variables.

- In the lecture's plot, the green line is the best-fit line — but is a line the best choice of function at all?


- The best constant function (in the squared-error sense) is the mean of y, as the short derivation below shows.
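A one-line derivation of that claim: the squared error of a constant prediction c is E(c) = ∑<sub>i</sub>(y<sub>i</sub> − c)<sup>2</sup>; setting dE/dc = −2∑<sub>i</sub>(y<sub>i</sub> − c) = 0 gives c = (1/n)∑<sub>i</sub> y<sub>i</sub>, i.e., the mean of y.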

- In that example, the training error for the order-8 polynomial is zero: a polynomial of order n − 1 can pass through n points exactly. Zero training error here signals overfitting, not a better model.


Polynomial Regression


- Polynomial regression can be written with matrices and vectors: stack each input x as a row [1, x, x<sup>2</sup>, …, x<sup>k</sup>] of a matrix X, so the model is Xw ≈ y.
- The coefficients are computable in closed form: w = (X<sup>T</sup>X)<sup>−1</sup>X<sup>T</sup>y minimizes the squared error (see the sketch below).
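A minimal sketch of this closed-form fit, assuming numpy is available (using `lstsq` rather than an explicit matrix inverse for numerical stability):

```python
import numpy as np

def fit_polynomial(x, y, order):
    # Rows of X are [1, x, x^2, ...]; lstsq solves the least-squares
    # problem, equivalent to w = (X^T X)^{-1} X^T y when X has full rank.
    X = np.vander(x, N=order + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w  # coefficients, lowest order first

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x                      # exactly linear data
print(fit_polynomial(x, y, order=1))   # ~[2. 3.]
```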
Errors

- Sensor error: The actual reading was 10, but a moth landed on the sensor, so it read 0 instead.
- Malicious error: An intelligent, malicious agent got between the measurement and the receiver of the data and edited the data to say what they wanted, rather than what it actually was.
- Transcription error: A machine copied a number from one place to another and flattened all of the E-notation floats to bare integers. Or a program cast a UTF-16 hieroglyphic to a Unicode pile of poo.
- Unmodeled influences: Suppose we are predicting house prices from square footage and the number of bathrooms. A house sold for a very low price because of an unmodeled influence — mold in the attic and walls — so the model failed to predict the low price.
Cross Validation

- The goal is to generalize to the world, not to fit a particular training or testing data set perfectly.
- We need the training and testing data to be IID: independent and identically distributed (the fundamental assumption).
- We can split the training data into k folds, train while leaving one fold out, and use that fold for testing; then average the error over all k combinations. Pick the model (e.g., the polynomial order) with the lowest cross-validation error, as in the sketch below.
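A sketch of k-fold cross-validation in plain numpy (the helper names are my own, and it reuses `fit_polynomial` from the sketch above; scikit-learn's `KFold` does the same bookkeeping):

```python
import numpy as np

def cross_val_error(x, y, order, k=3):
    # Shuffle once so the folds look IID, then split into k folds.
    indices = np.random.default_rng(0).permutation(len(x))
    errors = []
    for fold in np.array_split(indices, k):
        train = np.setdiff1d(indices, fold)          # everything but this fold
        w = fit_polynomial(x[train], y[train], order)
        pred = np.vander(x[fold], N=order + 1, increasing=True) @ w
        errors.append(np.mean((pred - y[fold]) ** 2))  # held-out squared error
    return np.mean(errors)  # average test error over the k splits
```

Picking the `order` with the lowest `cross_val_error` is exactly the model-selection step the notes describe.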



SL3 Neural Networks

- A neuron receives inputs; if the combined inputs reach the firing threshold, the neuron fires.

- X<sub>1</sub>, X<sub>2</sub>, … are inputs
- w<sub>1</sub>, w<sub>2</sub>, … are weights
- The activation is the weighted sum ∑<sub>i</sub> w<sub>i</sub>X<sub>i</sub>; if the activation reaches a threshold θ (activation ≥ θ), the output is y = 1; otherwise y = 0.


- The weights matter a lot: they determine the line that splits the plane (the decision boundary).

- When X<sub>1</sub>=0 and X<sub>2</sub>=0, y=0
- When X<sub>1</sub>=0 and X<sub>2</sub>=1, y=0
- When X<sub>1</sub>=1 and X<sub>2</sub>=0, y=0
- When X<sub>1</sub>=1 and X<sub>2</sub>=1, y=1; so y computes AND (e.g., with w<sub>1</sub>=w<sub>2</sub>=1/2 and θ=3/4 — see the sketch below)
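A minimal sketch of such a threshold unit computing AND; w<sub>1</sub> = w<sub>2</sub> = 1/2 with θ = 3/4 is one valid weight choice among many:

```python
def perceptron_output(inputs, weights, theta):
    # Fire (output 1) iff the weighted sum of inputs reaches theta.
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_output((x1, x2), (0.5, 0.5), 0.75))
# 0 0 0 / 0 1 0 / 1 0 0 / 1 1 1  -> the AND truth table
```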



Perceptron Training


- The perceptron rule updates each weight by Δw<sub>i</sub> = η(y − ŷ)X<sub>i</sub>: the learning rate η times the difference between the target y and the output ŷ, times the input X<sub>i</sub>.
- If the data is linearly separable, the perceptron rule will find a separating line in finitely many iterations (but it is hard to know in advance how many); see the training sketch below.
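A sketch of the perceptron rule as a training loop (keeping θ fixed for simplicity — the lecture instead folds θ in as a bias weight; it reuses `perceptron_output` from the sketch above):

```python
def train_perceptron(samples, eta=0.1, theta=0.5, max_iters=1000):
    weights = [0.0] * len(samples[0][0])
    for _ in range(max_iters):
        mistakes = 0
        for inputs, target in samples:
            output = perceptron_output(inputs, weights, theta)
            # Perceptron rule: w_i += eta * (y - y_hat) * x_i.
            for i, x in enumerate(inputs):
                weights[i] += eta * (target - output) * x
            mistakes += target != output
        if mistakes == 0:          # every example classified correctly
            return weights
    return weights  # may not have converged if the data isn't separable

# Learn AND from its truth table:
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_data))  # e.g. [0.3, 0.3] with theta = 0.5
```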

Preference Bias

- A preference bias tells you something about the learning algorithm you are using: which hypotheses it prefers.
- Prefer simpler explanations (Occam's razor).
- Do not multiply entities unnecessarily — don't add complexity just to fit the data slightly better.
Summary

This content should have been finished between Jan 11 – 17, 2016, but interview preparation pushed it back a week, so I had to catch up on two weeks of material in one. Now on to this week's content... Don't procrastinate next time — the pressure spikes as soon as I fall behind.
2016-01-21: reached Cross Validation in SL2.
2016-01-22: continued SL3; first draft done.