Week 06 tasks:
- Lectures: Randomized Optimization.
- Reading: Chapter 9 of Mitchell, as well as the overview on the No Free Lunch Theorem.
data:image/s3,"s3://crabby-images/15807/158079d34164f8a90e95f9162485f056ea708cbb" alt="Randomized Optimization"
data:image/s3,"s3://crabby-images/69e7a/69e7a5182b8d22944684fdebec2e6a2f163d3ffe" alt="Optimization"
- optimization is to find a x<sup>*</sup> in input space X to make the objective function has the maximum (or close to maximum).
Optimize Me
data:image/s3,"s3://crabby-images/6ca5c/6ca5c07489b1ed62700d773fe72433f2fc248bb7" alt="Quiz: best x"
- The first question, since the input space is small, we can plot the function and find the largest f(x).
- the second the functions, we can try to use calculus: Set the derivative f(x) to 0 and solve the derivative function ( which will be hard to solve since it's a cubic function). Alternatively, we can try to plot the data using subgroup of x.
Optimization Approaches
data:image/s3,"s3://crabby-images/fd3cd/fd3cdcb02b459996e03f6875407552d9d12c36e2" alt="Optimization Approaches"
- Three approaches:
- generate & test: when we have a small input space but a complex function
- using calculus: when the function has derivative and the derivative function is solvable when its value is set to 0
- Newton's method: when the function has derivative and it can be improved iteratively.
- When these approaches does not work, then Randomized Optimization.
data:image/s3,"s3://crabby-images/4dc64/4dc64cedeaede64d844738300c0be53260297293" alt="Hill Climbing"
- the hill climbing algorithm will stuck at a local maxima.
data:image/s3,"s3://crabby-images/bdc29/bdc29620220cabc8ea1c804f1b28998b96eb21ba" alt="Quiz 2: find my word"
Random Restart Hill Climbing
data:image/s3,"s3://crabby-images/a06c9/a06c94abf2ea160f1f096991a21ceae77f54d860" alt="Random Restart Hill Climbing"
- how to avoid the unlucky starting point?
- checking and make sure the starting point is not visited before
- make sure the starting point is faraway from known local maxima
data:image/s3,"s3://crabby-images/09fe3/09fe370e16cbfced9330e8d433204351059d971e" alt="Quiz3: randomized Hill climbing"
- The six points under the maxima hill are special
- 4 points under the global maxima will take average 4 steps to get to the global maxima. and that's the 4/28*(4)
- the 2 basin points, half of the time will lead up in global maxima, and half of the time will lead to local maxima (and restart).
- all the other 22 points will lead to local maxima and then start a new round with a randomized starting point.
- if keep track of the points visited, the algorithm might do better than 28 on average.
- if the attraction basin under the local maxima is large, hill climbing will do better.
Annealing algorithm
data:image/s3,"s3://crabby-images/4cb03/4cb03440b07ac5f7d55077c3c2eaf9b95224feef" alt="Simulated Annealing"
- One step further than hill climbing: allow downwards to "explore"
data:image/s3,"s3://crabby-images/33d2f/33d2f7ff857ce60a5d4035e4482be834c3f2be60" alt="Annealing algorithm"
- if f(x<sub>t</sub>) and f(x) is very close, then f(x<sub>t</sub>) -f(x) is close to 0 and p will be 1.
- If T is really big, say infinity, then f(x<sub>t</sub>) -f(x) doesn't matter then. P will be 1, and x will jump to x<sub>t</sub>
- if T is really small, close to 0, p will climb, not jump.
data:image/s3,"s3://crabby-images/20898/20898f7bd7fbbc54a63f8ef72818556ecba4544d" alt="Properties of Simulated Annealing"
- T -> 0: like hill climbing
- T-> ∞: random walk
- Desearse T slowly: As T decreases, it's harder and harder to jump over local maxima, and algorithm will reach global maxima
Genetic Algorithms
data:image/s3,"s3://crabby-images/86698/86698710032e2d05b2e7539682afe568f009089e" alt="GA"
- GA is like random restart but doing the restart in parallel.
- Crossover - population holds information.
data:image/s3,"s3://crabby-images/39c75/39c75b5595345d2baedbc012d6886aaf75544c57" alt="GA Skeleton"
- Define "most fit"
- top half or truncate selection:the half individuals with the largest score
- weighted prob. or roulette wheel: weighted fitness function with probability.
- what's crossover?
data:image/s3,"s3://crabby-images/80e25/80e25007c2b8a37ac47854f846424bed3fa48dda" alt="Crossover Example"
- Uniform crossover: randomization at each position
data:image/s3,"s3://crabby-images/39174/39174e5bebbe150faa7fd597e429620c70f88257" alt="Recap"
MIMIC
data:image/s3,"s3://crabby-images/dd211/dd2114843b1755af8600df404f996cee7262b3ad" alt="MiMIC"
data:image/s3,"s3://crabby-images/adc1e/adc1e43506b849941f66d16c7bc1ef78c6d3343f" alt="Quiz 4:"
- when θ is min, then P is the uniform distribution of X
- when θ is max, we get the maxima or optima of the distribution
data:image/s3,"s3://crabby-images/20a72/20a7291f8627e39f887ca42b68417baccf8ef76c" alt="Pseudo code"
- Given a threshold θ<sub>t</sub>, generate samples from P<sup>θ<sub>t</sub></sup>(x), then set θ<sub>t+1</sub> to be n<sup>th</sup> percentage and retain only those samples, s.t. f(x)>=θ<sub>t_1</sub>
- instead of using these samples, we estimate P<sup>θ<sub>t+1</sub></sup>(x), and then generate new samples from it. then repeat the process again and again until converge.
- Two assumptions:
- P<sup>θ<sub>t</sub></sup>(x) is estimable
- P<sup>θ<sub>t+1</sub></sup> is close enough to P<sup>θ<sub>t</sub></sup>
- this is a lot like Genetic algorithm but P<sup>θ</sup> and P<sup>θ+ ϵ</sup> can preserve the structure.
data:image/s3,"s3://crabby-images/d3540/d354023275ddfa3ccf83d1906aacaea2a018d447" alt="Estimating Distributions"
- P(x) is the combination of the p of each x in X=[x1 x2 .... x<sub>n</sub>] given others in X. To estimate P(x) this way, we will need huge amount of data ( exponential to n).
- Dependent tree: each node only have one parent.
- it can represent relationships
- and sampling dependent tree is easy
- it's a general form of crossover and it does not require locality.
Finding Dependency Trees
data:image/s3,"s3://crabby-images/b0467/b0467309c87ad7304642f080d8ab200214d9caef" alt="Finding Dependency Trees"
- Minimize Divergence of P and P_hat<sub>π</sub>, Divergency can be written as entropy.
- build a cost function by remove -h(p) and add entropy -h(x<sub>i</sub>), then we get the mutual information between x and π
- Finding Dependency Trees is now converted to a problem to maximize mutual information between x and it's parent. or in another word, to find the parent who knows the most about the node.
data:image/s3,"s3://crabby-images/c0567/c05671a3f326a3128cb2ac3c3f9f7e578dea2b64" alt="Finding Dependency Trees"
- This can be solved by solving the maximum spanning tree. Using Prim.
- The prim algorithms is fairly fast algorithm ( polynomial to the number of edges of a graph) and it's good for dense connected graph.
data:image/s3,"s3://crabby-images/ab418/ab41885468685a4ba7b9ac2e191e156445b6e70f" alt="back to pseudo code"
- dependent tree is used to estimate P given the samples from the previous step.
data:image/s3,"s3://crabby-images/23e86/23e861e0031d839bdeae714e66872b8c77813006" alt="Quiz 5: Probability Distribution"
- what distribution can represent the maxima of the problems on left column.
data:image/s3,"s3://crabby-images/b41a9/b41a923b7242b855f6cb0799eab11213d4c16666" alt="Practical Matters"
- MIMIC evaluate cost function less when comparing to Simulated Annealing, but it usually takes longer to run.
- MIMIC will get more information in and out
2016-02-17 完成 Genetic Algorithms
2016-02-19 初稿