data:image/s3,"s3://crabby-images/0a266/0a266154bc48e8e2bca87a9aa15446ae233e44de" alt="Reason for Feature selection"
- Feature selection aims to discover knowledge in the data and to reduce the dimensionality of the data.
- With fewer features, it is easier to interpret the data and gain insight from it.
- The amount of data needed to solve an ML problem grows exponentially with the number of features (the curse of dimensionality), so it pays to reduce the number of features.
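The exponential growth can be made concrete with a back-of-the-envelope sketch. It rests on an assumption that is not in the notes: each feature's range is cut into a fixed number of bins, and we want a fixed number of training samples in every cell of the resulting grid.

```python
def samples_needed(n_features: int, bins_per_feature: int = 10,
                   samples_per_cell: int = 1) -> int:
    """Samples needed so that every cell of a grid over the feature space
    holds `samples_per_cell` examples (an illustrative assumption)."""
    cells = bins_per_feature ** n_features  # one cell per combination of bins
    return cells * samples_per_cell


for n in (1, 2, 3, 5, 10):
    print(n, samples_needed(n))
```

With 10 bins per feature, going from 3 to 10 features raises the requirement from 1,000 to 10,000,000,000 samples.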
data:image/s3,"s3://crabby-images/f74d0/f74d01e81be2362cbf7c127b9fd4acd418b4d560" alt="Quiz 1: How hard is the feature selection problem?"
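- It is NP-hard: choosing among n features means searching 2<sup>n</sup> possible subsets (or "n choose m" subsets if the target size m is fixed), which is exponential in n.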
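A brute-force search would have to score every subset. This small sketch (the feature names are hypothetical) enumerates the subsets with `itertools.combinations` and checks that there are 2<sup>n</sup> of them.

```python
from itertools import combinations
from math import comb


def all_subsets(features):
    """Yield every subset of `features`, from the empty set to the full set."""
    for m in range(len(features) + 1):
        yield from combinations(features, m)


features = ["a", "b", "c", "d"]  # hypothetical feature names
count = sum(1 for _ in all_subsets(features))
print(count)  # 16 == 2**4

# Subsets of exactly m features number comb(n, m); summed over m they give 2**n.
n = 30
print(sum(comb(n, m) for m in range(n + 1)) == 2 ** n)  # True: about a billion subsets
```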
Filtering and Wrapping
data:image/s3,"s3://crabby-images/354d6/354d69625625dae120ff7e7abb3dab9aaf3bcee8" alt="Filtering and Wrapping"
- Filtering is a forward flow: a search algorithm picks the features and hands them to the learning algorithm, with no feedback from learning back to the search.
- Wrapping places the search algorithm in a loop with the learning algorithm, so the learner's performance is fed back to guide the search.
data:image/s3,"s3://crabby-images/aef94/aef94af04f22534f78797074b046fd982761bf35" alt="Filtering example"
- Filtering
  - Pros: fast
  - Cons: 1. looks at features in isolation, so it can miss features that are only useful in combination; 2. ignores the learning problem
- Wrapping
  - Pros: 1. takes model bias into account; 2. takes the learning problem into account
  - Cons: very slow
- Example of filtering: use a decision tree (DT) to select the important features, then pass them to another learning algorithm (e.g., kNN).
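A minimal filter sketch on hypothetical binary data. Instead of growing a full decision tree, it ranks features by information gain (the criterion ID3-style trees use to choose splits) and keeps the top k. The selected features are then handed to a 1-nearest-neighbour classifier. The learner never influences which features are chosen, which is what makes this a filter. The function names and the toy data are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(X, y, j):
    """Entropy reduction from splitting on discrete feature j."""
    gain = entropy(y)
    for v in {row[j] for row in X}:
        subset = [label for row, label in zip(X, y) if row[j] == v]
        gain -= len(subset) / len(y) * entropy(subset)
    return gain


def filter_select(X, y, k):
    """Filter step: keep the k features with the highest information gain.
    No learner is consulted, so there is no feedback from learning."""
    ranked = sorted(range(len(X[0])), key=lambda j: information_gain(X, y, j),
                    reverse=True)
    return sorted(ranked[:k])


def knn_predict(X_train, y_train, x, features, k=1):
    """k-NN majority vote using Hamming distance on the selected features only."""
    def dist(row):
        return sum(row[j] != x[j] for j in features)
    nearest = sorted(zip(X_train, y_train), key=lambda p: dist(p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical data: label = a AND b; feature c is pure noise.
X = [list(p) for p in product([0, 1], repeat=3)]  # columns: a, b, c
y = [a & b for a, b, _ in X]
selected = filter_select(X, y, k=2)
print(selected)                                # [0, 1], i.e. a and b
print(knn_predict(X, y, [1, 1, 0], selected))  # 1
```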
data:image/s3,"s3://crabby-images/c9a18/c9a18113ba75010658c43b84a0e49cdf9e709548" alt="How to do filtering and wrapping"
Criteria for filtering:
- Information gain
- Variance, entropy
- Independent/non-redundant features
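Two of these criteria, variance and non-redundancy, can be sketched as a simple filter. It drops (nearly) constant features and any feature that is highly correlated with one already kept. Pearson correlation as the redundancy measure and the threshold values are assumptions chosen for illustration, not something the notes specify.

```python
def variance(col):
    """Population variance of one feature column."""
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)


def correlation(u, v):
    """Pearson correlation; assumes neither column is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def filter_by_variance_and_redundancy(X, min_var=1e-9, max_corr=0.95):
    """Keep features that vary and are not near-duplicates of a kept feature.
    The thresholds are arbitrary illustrative defaults."""
    cols = list(zip(*X))
    kept = []
    for j, col in enumerate(cols):
        if variance(col) <= min_var:
            continue  # (nearly) constant: carries no information on its own
        if any(abs(correlation(col, cols[k])) >= max_corr for k in kept):
            continue  # redundant with a feature already kept
        kept.append(j)
    return kept


# Hypothetical data: column 1 is 2x column 0, column 2 is constant.
X = [[1, 2, 5, 0],
     [2, 4, 5, 1],
     [3, 6, 5, 0],
     [4, 8, 5, 1]]
print(filter_by_variance_and_redundancy(X))  # [0, 3]
```

Constant columns are rejected before any correlation is computed, so `correlation` is never asked to divide by a zero spread.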
How to do wrapping:
- Hill climbing
- Randomized optimization
- Forward search: first find the single best feature. Then try combining each remaining feature with it and keep the one that gives the best score. Repeat, each time adding the remaining feature that scores best in combination with those already selected.
- Backward search: start with all features. Try removing each one and keep the remaining subset that scores best. Repeat until removing another feature would lower the score too much.
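A sketch of forward and backward search as wrappers, using hypothetical toy data. The score of a feature subset is the leave-one-out accuracy of a 1-NN learner trained on it, so the learner's performance feeds back into the search. The choice of 1-NN, the stopping rules, and the `max_drop` tolerance are assumptions for illustration. Both procedures are greedy hill climbs over feature subsets, so neither is guaranteed to find the best subset.

```python
from itertools import product


def loo_1nn_accuracy(X, y, features):
    """Wrapper score: leave-one-out accuracy of 1-NN (Hamming distance on `features`)."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [(X[k], y[k]) for k in range(len(X)) if k != i]
        nearest = min(others, key=lambda p: sum(p[0][j] != xi[j] for j in features))
        correct += nearest[1] == yi
    return correct / len(X)


def forward_search(n_features, score):
    """Greedy forward selection: repeatedly add the feature that scores best
    together with those already chosen; stop when the score stops improving."""
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        scores = {j: score(selected + [j]) for j in remaining}
        cand = max(scores, key=scores.get)  # ties go to the lowest index
        if scores[cand] <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return sorted(selected), best


def backward_search(n_features, score, max_drop=0.0):
    """Greedy backward elimination: drop the feature whose removal hurts least;
    stop when every removal would lower the score by more than `max_drop`."""
    selected = list(range(n_features))
    best = score(selected)
    while len(selected) > 1:
        scores = {j: score([f for f in selected if f != j]) for j in selected}
        cand = max(scores, key=scores.get)
        if scores[cand] < best - max_drop:
            break
        selected.remove(cand)
        best = scores[cand]
    return selected, best


# Hypothetical data: label = a AND b; feature c is noise.
X = [list(p) for p in product([0, 1], repeat=3)]  # columns: a, b, c
y = [a & b for a, b, _ in X]


def score(features):
    return loo_1nn_accuracy(X, y, features)


print(forward_search(3, score))   # ([0, 1], 1.0)
print(backward_search(3, score))  # ([0, 1], 1.0)
```

Both searches end with features a and b. Keeping the noise feature c lowers the 1-NN leave-one-out accuracy from 1.0 to 0.75, and it is the learner's own score that exposes this.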
data:image/s3,"s3://crabby-images/401a8/401a8afc84d5eded848e3c8fa0a5ee9b61a5d82b" alt="Quiz2: using filtering, choose the features to get zero training error"
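- For a DT, it is easy: when a = 0, the label is -; when a = 1, split on b: if b = 0, the label is -, and if b = 1, the label is +. This is a AND b.
- For the perceptron (w<sup>T</sup>x > 0, with no separate bias term), the answer is less obvious. With only a and b, the problem cannot be solved: (a, b) = (1, 0) must be negative, so w<sub>a</sub> ≤ 0; (0, 1) must be negative, so w<sub>b</sub> ≤ 0; but then (1, 1) gives w<sub>a</sub> + w<sub>b</sub> ≤ 0 and is classified negative instead of positive. Adding c with a weight of -1 (and weights of 1 on a and b) solves the problem. Although c offers no information about the label, it is still useful here because it effectively acts as a bias term.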
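The two answers can be checked on the truth table of a AND b. This sketch assumes that - is encoded as 0 and + as 1. It also assumes that c is the constant 1 in every example, which fits the statement that c carries no information. The quiz table itself is not reproduced here. The grid search over weights for a and b alone is only an illustration, not a proof.

```python
from itertools import product


def tree_predict(a, b):
    """The decision tree from the text: split on a, then on b."""
    if a == 0:
        return 0                  # '-'
    return 1 if b == 1 else 0     # '+' only when b == 1


def perceptron_predict(weights, x):
    """Perceptron with no bias term: output 1 ('+') iff w^T x > 0."""
    return int(sum(w * xi for w, xi in zip(weights, x)) > 0)


C = 1  # assumption: c takes the same value, 1, in every example
cases = list(product([0, 1], repeat=2))  # all (a, b) pairs; label is a AND b

tree_ok = all(tree_predict(a, b) == (a & b) for a, b in cases)
perceptron_ok = all(perceptron_predict([1, 1, -1], [a, b, C]) == (a & b)
                    for a, b in cases)

# Illustration only (a finite grid, not a proof): no weights on a and b alone work.
grid = [w / 4 for w in range(-20, 21)]
ab_only_ok = any(all(perceptron_predict([wa, wb], [a, b]) == (a & b)
                     for a, b in cases)
                 for wa in grid for wb in grid)

print(tree_ok, perceptron_ok, ab_only_ok)  # True True False
```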
data:image/s3,"s3://crabby-images/89d7d/89d7d2038753a14c6e52755e9cd62eab3f61b86e" alt="Relevance"
- B.O.C.: Bayes optimal classifier. Relevance is defined only in terms of the B.O.C.
- Strongly relevant: x is strongly relevant if removing it degrades the B.O.C.
- Weakly relevant: x is weakly relevant if it is not strongly relevant and there exists a subset of features such that adding x to that subset improves the B.O.C.
- Irrelevant: x is irrelevant if it is neither strongly nor weakly relevant.
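The three definitions can be applied mechanically when the joint distribution is known exactly. Under that assumption, the accuracy of the B.O.C. restricted to a feature subset S is the sum, over each value of x<sub>S</sub>, of the probability of its most likely label. The sketch below uses a small hypothetical distribution. In it, x1, x3 and x4 are uniform bits, x2 is a copy of x1, and y = x1 AND x4. Exact `Fraction` arithmetic avoids floating-point ties.

```python
from fractions import Fraction
from itertools import combinations, product


def boc_accuracy(dist, subset):
    """Accuracy of the Bayes optimal classifier that sees only `subset`.
    `dist` lists (feature_tuple, label, probability) for the whole joint distribution."""
    mass = {}  # (projected x, label) -> probability
    for x, y, p in dist:
        key = tuple(x[j] for j in subset)
        mass[(key, y)] = mass.get((key, y), 0) + p
    best = {}  # projected x -> probability of its most likely label
    for (key, _), p in mass.items():
        best[key] = max(best.get(key, 0), p)
    return sum(best.values())


def relevance(dist, n):
    """Classify each feature index as strongly, weakly, or not relevant."""
    everything = list(range(n))
    full = boc_accuracy(dist, everything)
    result = {}
    for i in everything:
        others = [j for j in everything if j != i]
        if boc_accuracy(dist, others) < full:
            result[i] = "strongly"    # removing x_i degrades the B.O.C.
        elif any(boc_accuracy(dist, list(s) + [i]) > boc_accuracy(dist, list(s))
                 for m in range(len(others) + 1) for s in combinations(others, m)):
            result[i] = "weakly"      # some subset S improves when x_i is added
        else:
            result[i] = "irrelevant"
    return result


# Hypothetical distribution over (x1, x2, x3, x4) -> indices 0..3.
dist = [((x1, x1, x3, x4), x1 & x4, Fraction(1, 8))
        for x1, x3, x4 in product([0, 1], repeat=3)]
print(relevance(dist, 4))
# {0: 'weakly', 1: 'weakly', 2: 'irrelevant', 3: 'strongly'}
```

x1 and x2 are each only weakly relevant because either one can stand in for the other. Dropping x4 leaves the B.O.C. unable to separate the cases where x1 = 1, so x4 is strongly relevant. The noise feature x3 never helps.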
data:image/s3,"s3://crabby-images/8e2cd/8e2cd4ea223f1e322d86f27ad299ccfc0c7455e6" alt="Usefulness"
data:image/s3,"s3://crabby-images/8e627/8e6275b1daa2c0279fa5964819182c58836d8c07" alt="Wrap up"
2016-03-16