I had no time at all for lectures in week 10, so I could only catch up during the week 11 spring break.
This week: going over Feature Transformation and starting on Information Theory.
data:image/s3,"s3://crabby-images/57630/57630eb969e7ca48f90434e47737d89756f7cdcc" alt="Defination of Feature Ransformation"
- Feature selection is a subset of feature transformation
- Here the transformation operator is linear: each new feature is a linear combination of the original features
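A small numpy sketch of why feature selection is a special case of linear feature transformation. The toy matrix `X` and the operators `P_transform` and `P_select` are made up for illustration. A general operator mixes features with arbitrary weights, while a selection operator only has 0/1 columns that each pick one original feature.

```python
import numpy as np

# Toy data: 4 examples (rows) with n = 3 original features (columns).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])

# A linear transformation operator P (n x m) maps each example x to P^T x,
# so every new feature is a linear combination of the original features.
P_transform = np.array([[0.5, 1.0],
                        [0.5, 0.0],
                        [0.0, -1.0]])

# Feature selection: each column of P is a 0/1 indicator that picks exactly
# one original feature (here features 0 and 2).
P_select = np.array([[1.0, 0.0],
                     [0.0, 0.0],
                     [0.0, 1.0]])

X_transformed = X @ P_transform   # new features: 0.5*x0 + 0.5*x1 and x0 - x2
X_selected = X @ P_select         # identical to X[:, [0, 2]]
print(X_transformed[0])           # [ 1.5 -2. ]
```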
Why do Feature Transformation?
data:image/s3,"s3://crabby-images/aee69/aee699156383021996460dbb034e14b3f020001b" alt="Example of words"
- Things we have already seen, such as XOR, kernel methods, and neural networks, already do FT.
- Ad hoc information retrieval problem: finding documents within a corpus that are relevant to an information need specified using a query (the query is not known in advance).
- Problems of information retrieval:
  - Polysemy: a word can have multiple meanings, which causes false positives.
  - Synonymy: a meaning can be expressed by multiple words, which causes false negatives.
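This toy example shows how exact word matching runs into both problems. The corpus and the `keyword_search` helper are hypothetical, and retrieval is reduced to plain "document contains every query word" matching. Feature transformation aims to find combinations of words that capture meaning better than single words do.

```python
# Hypothetical mini-corpus; retrieval is naive exact keyword matching.
docs = {
    "d1": "the new car has a powerful engine",
    "d2": "this automobile gets great mileage",
    "d3": "apple pie with fresh apple slices",
    "d4": "apple released a new phone",
}

def keyword_search(query, corpus):
    """Return the sorted ids of documents that contain every query word verbatim."""
    terms = query.lower().split()
    return sorted(doc_id for doc_id, text in corpus.items()
                  if all(t in text.split() for t in terms))

# Synonymy -> false negative: d2 is about cars but says "automobile".
print(keyword_search("car", docs))    # ['d1']
# Polysemy -> false positive if the user meant the fruit: d4 is about the company.
print(keyword_search("apple", docs))  # ['d3', 'd4']
```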
PCA
This paper does a fantastic job of building the intuition behind PCA and walking through its implementation.
An eigenproblem is a computational problem that can be solved by finding the eigenvalues and/or eigenvectors of a matrix. In PCA, we solve the eigenproblem for the covariance matrix of the data (see the paper for details).
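Here is a minimal numpy sketch of PCA as an eigenproblem on the covariance matrix. The `pca` helper and the synthetic data are illustrative assumptions, not the paper's code. `np.linalg.eigh` is used because a covariance matrix is symmetric. It returns eigenvalues in ascending order, so they are re-sorted from largest to smallest.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components.

    Returns (projected data, components as columns, eigenvalues sorted descending).
    """
    X_centered = X - X.mean(axis=0)               # PCA works on zero-mean data
    cov = np.cov(X_centered, rowvar=False)        # n_features x n_features covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]             # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]
    return X_centered @ components, components, eigvals

rng = np.random.default_rng(0)
# 200 points stretched along the direction (1, 1): most of the variance lies there.
t = rng.normal(size=200)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(200, 2))
Z, W, eigvals = pca(X, k=1)
print(W[:, 0])   # close to +/-(0.707, 0.707)
```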
PCA Features
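- Maximizes variance
- Mutually orthogonal: all components are perpendicular to each other
- Global algorithm: the resulting components share a global constraint, namely that they must be mutually orthogonal
- It gives the best reconstruction: projecting onto the top k components minimizes the squared (L2) reconstruction error among all k-dimensional linear projections
- Eigenvalues are monotonically non-increasing, and a component with eigenvalue 0 can be ignored (it is irrelevant, and maybe not useful either)
- It is well studied and fast to run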
- It can be used like a filter method for feature selection: rank the dimensions by eigenvalue and keep the top ones
- PCA is about finding
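The next sketch checks several of the properties listed above on synthetic data, where the third feature is an exact linear combination of the first two. The data is made up for illustration. In the output, the eigenvalues are sorted, one of them is (numerically) 0, and the components are orthogonal. Dropping the zero-eigenvalue component loses nothing. With `np.cov`'s n - 1 normalization, the squared reconstruction error equals the sum of the discarded eigenvalues, which is why keeping the largest ones gives the best reconstruction.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=(2, 500))
# The third feature is an exact linear combination of the first two,
# so the data really lives in a 2-D plane inside 3-D space.
X = np.column_stack([a, b, a + 2 * b])
mean = X.mean(axis=0)
Xc = X - mean

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(np.round(eigvals, 6))  # non-increasing; the last one is (numerically) 0

# Components are mutually orthogonal: W^T W is the identity.
assert np.allclose(eigvecs.T @ eigvecs, np.eye(3))

# Keep k = 2 components: projecting and mapping back reconstructs X.
W = eigvecs[:, :2]
X_reconstructed = Xc @ W @ W.T + mean

# Keep only k = 1: the error equals the variance of the dropped components.
W1 = eigvecs[:, :1]
err_k1 = np.sum((Xc - Xc @ W1 @ W1.T) ** 2) / (len(X) - 1)
print(np.isclose(err_k1, eigvals[1:].sum()))  # True
```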
ICA
ICA has also been applied to the information retrieval problem, in a paper written by Charles himself.
data:image/s3,"s3://crabby-images/23bc7/23bc70725b5d0ea4e0de7aae2bf3a85b5aaab190" alt="ICA"
data:image/s3,"s3://crabby-images/4e870/4e870116072d1e600c8afdf5395610389e8cca45" alt=""
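- Finds components that are statistically independent of each other, i.e. the mutual information between any pair of new features is (ideally) zero.
- Designed to solve the blind source separation (BSS) problem, e.g. the cocktail party problem.
- Model: given the observed variables, find the hidden variables (sources) that generated them.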
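Below is a from-scratch sketch of ICA on a toy blind source separation problem: two signals mixed as if recorded by two microphones. It uses the FastICA fixed-point algorithm (deflation, g = tanh). FastICA is one of several ICA algorithms and not necessarily the one from the lecture. It looks for maximally non-Gaussian projections, a standard proxy for minimizing mutual information between the components. The `fast_ica` helper, the signals, and the mixing matrix `A` are all hypothetical. Note that the whitening step is itself PCA.

```python
import numpy as np

def fast_ica(X, n_components, n_iter=200, tol=1e-6, seed=0):
    """Minimal FastICA sketch (deflation, g = tanh). X: n_samples x n_signals."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whitening: decorrelate and rescale to unit variance (this step is PCA).
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ eigvecs / np.sqrt(eigvals)
    W = np.zeros((n_components, Z.shape[1]))
    for i in range(n_components):
        w = rng.normal(size=Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wz = Z @ w
            g, g_prime = np.tanh(wz), 1 - np.tanh(wz) ** 2
            # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w
            w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
            # Deflation: stay orthogonal to the components already found.
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return Z @ W.T   # estimated independent sources, n_samples x n_components

n = 2000
time = np.linspace(0, 8, n)
S = np.column_stack([np.sin(2 * time),              # source 1: sine wave
                     np.sign(np.sin(3 * time))])    # source 2: square wave
A = np.array([[1.0, 0.5], [0.5, 2.0]])              # hypothetical mixing matrix
X = S @ A.T                                         # what the "microphones" record
S_est = fast_ica(X, n_components=2)
# ICA recovers sources only up to order, sign, and scale, so compare |correlation|.
corr = np.abs(np.corrcoef(S.T, S_est.T)[:2, 2:])
print(np.round(corr, 2))  # each row and column should contain one value near 1
```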
data:image/s3,"s3://crabby-images/c6c8c/c6c8ceed1cb423ea50a70192163dcf8d1e311992" alt="quize 1: defining features for PAC and ICA"
data:image/s3,"s3://crabby-images/3512f/3512f384889e0af51c90a6a601eba6d1756adbb9" alt="More PCA vs ICA"
- ICA is better suited to BSS problems, and it is directional.
- Example: on face images, PCA picks up brightness and the average face, whereas ICA finds features such as noses and mouths, which are the basic components of a face.
Alternatives:
data:image/s3,"s3://crabby-images/a7cea/a7cea6f00310eeddcaabeb96de46c3bce7e8d2ed" alt="RCA"
- Random Components Analysis (RCA): projects the data onto randomly generated directions
  - Can project to fewer dimensions (m << n), but in practice it often needs more dimensions than PCA.
  - Can also project to more dimensions (m > n).
  - It works, and it works very fast.
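This is a numpy sketch of random projection. It assumes Gaussian random directions scaled by 1/sqrt(m), which is one common choice; the lecture does not fix the distribution. The `random_projection` helper is hypothetical. The same code handles m < n and m > n. By the Johnson-Lindenstrauss lemma, pairwise distances are approximately preserved when projecting down, and this is part of why such a cheap method works.

```python
import numpy as np

def random_projection(X, m, seed=0):
    """Project X (n_samples x n) onto m random directions (m may be < n or > n)."""
    rng = np.random.default_rng(seed)
    # Gaussian random matrix; the 1/sqrt(m) scaling keeps squared lengths
    # unchanged in expectation.
    P = rng.normal(size=(X.shape[1], m)) / np.sqrt(m)
    return X @ P

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 1000))          # 100 points in n = 1000 dimensions

X_low = random_projection(X, m=200)       # m << n
X_high = random_projection(X, m=2000)     # m > n is also allowed

# Pairwise distances are approximately preserved (Johnson-Lindenstrauss).
d_orig = np.linalg.norm(X[0] - X[1])
d_low = np.linalg.norm(X_low[0] - X_low[1])
print(d_low / d_orig)   # close to 1
```

Generating `P` costs only a random draw and a matrix multiply, with no eigendecomposition. This is what makes RCA so fast compared with PCA or ICA.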
data:image/s3,"s3://crabby-images/f3a9d/f3a9d1e6bfb7d3ef632a362ff7c9cbbad8429196" alt="LDA"
- Linear Discriminant Analysis (LDA): finds a projection that discriminates between classes based on the labels
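This sketch uses Fisher's linear discriminant, the classic two-class form of LDA, with w proportional to S_w^{-1}(mu_1 - mu_0). The `fisher_lda_direction` helper and the data are hypothetical. The data is built so that the largest variance lies along x, which is the direction PCA would pick. The labels, however, differ only along y, so LDA picks y because it uses the labels.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA: w is proportional to S_w^{-1} (mu_1 - mu_0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two classes' scatter matrices.
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S_w, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Two elongated classes: large shared variance along x; the labels differ along y.
cov = np.array([[10.0, 0.0], [0.0, 0.5]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([0.0, 3.0], cov, size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

w = fisher_lda_direction(X, y)
print(np.round(w, 2))   # close to (0, 1): the high-variance x axis is ignored
```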
Wrap up
data:image/s3,"s3://crabby-images/7bdcc/7bdccf2f413585ee7658de2c08d61a818faad646" alt="Wrap up"
This excellent paper is a great resource on the feature transformation methods from this course and beyond.
2016-03-17 first draft
2016-03-26 completed