PCA (principal component analysis) is the process of computing the principal components (the orthogonal directions along which the data has the highest variance, in other words the most descriptive directions) and using them to perform a change of basis on the data. Often only the first few principal components are kept and the rest are ignored, which reduces the dimensionality of the data.
PCA generates *new* features that may be highly descriptive of the data, as opposed to mere feature selection, which only keeps a subset of the existing features. - ASU CSE 575
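The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca`, the synthetic data, and the choice of an eigendecomposition of the covariance matrix (instead of, say, an SVD) are all assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components.

    Hypothetical helper for illustration; assumes n_features >= 2.
    """
    # Center the data so that variance is measured around the mean
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (columns are variables)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the highest-variance direction comes first
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    components = eigvecs[:, order[:n_components]]
    # Change of basis: coordinates of each sample along the kept components
    return X_centered @ components, components, eigvals

# Synthetic data (an assumption): three features with very different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])

Z, components, eigvals = pca(X, n_components=2)
print(Z.shape)                   # the data expressed in the first two components
print(eigvals / eigvals.sum())   # fraction of total variance along each component
```

Each column of `Z` is a new feature built as a linear combination of all the original features, which is what distinguishes this from selecting a subset of columns of `X`.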
Disadvantages:
Assumes the data has linear structure, although non-linear extensions (such as kernel PCA) exist.
Ignores class labels. When the classes are separated mainly along a low-variance direction that PCA discards, they overlap after projection onto the principal components and misclassification increases. In such cases, Linear Discriminant Analysis (LDA) may be a better approach.
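The class-overlap problem can be reproduced with a small synthetic experiment. The sketch below is illustrative only: the data, the hypothetical helper `nearest_mean_accuracy`, and the two-class Fisher discriminant used as a stand-in for LDA are assumptions, and accuracy is measured on the same data used to fit the directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Both classes spread widely along x (std 5) but differ only in y (means -1 and +1, std 0.3)
class0 = rng.normal(loc=[0.0, -1.0], scale=[5.0, 0.3], size=(n, 2))
class1 = rng.normal(loc=[0.0, 1.0], scale=[5.0, 0.3], size=(n, 2))
X = np.vstack([class0, class1])
y = np.repeat([0, 1], n)

def nearest_mean_accuracy(z, y):
    """Accuracy of assigning each 1-D projection to the class with the nearer projected mean."""
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    pred = (np.abs(z - m1) < np.abs(z - m0)).astype(int)
    return (pred == y).mean()

# PCA direction: top eigenvector of the pooled covariance (labels are not used)
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
w_pca = eigvecs[:, -1]  # eigh sorts ascending, so the last column has the largest variance

# Fisher discriminant direction for two classes: w proportional to Sw^{-1} (mu1 - mu0)
mu0, mu1 = class0.mean(axis=0), class1.mean(axis=0)
Sw = np.cov(class0, rowvar=False) + np.cov(class1, rowvar=False)
w_lda = np.linalg.solve(Sw, mu1 - mu0)

acc_pca = nearest_mean_accuracy(X @ w_pca, y)
acc_lda = nearest_mean_accuracy(X @ w_lda, y)
print(f"1-D accuracy after PCA projection: {acc_pca:.2f}")
print(f"1-D accuracy after LDA projection: {acc_lda:.2f}")
```

In this setup the first principal component lies almost entirely along x, where the classes are indistinguishable, while the discriminant direction follows y, where they separate cleanly.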