"There is nothing more deceptive
than an obvious fact."
Sherlock Holmes, Sir Arthur Conan Doyle
_________
Part II:
A cluster of data is an important region
of interest (ROI) in machine learning. It has a location, an orientation, and individual
points. To describe its characteristics and perform mathematical manipulation, the cluster
is usually converted to a matrix.
In 2D, the location is the centroid
(X-mean, Y-mean) of all the points (Xi, Yi) of the cluster. From that center, the eigenvectors
largely define the orientation of the data. Every eigenvector
can be scaled to a unit vector. A unit vector also implies an angle associated with it.
The angle is not often pointed out in many texts; the fact is so obvious that sometimes the angle
is neglected and not mentioned at all. A vector should be associated with an
angle to complete its description in a coordinate system. The vector angle is the important missing link to a better
understanding of eigenvectors. Arcsine, arccosine, and arctangent are the common
formulas to find this angle from the (x, y) components of the vector.
Figure-5
The unit vector (r = 1) of Figure-5 is the eigenvector starting at the
center (u, v) with its tip at (x, y), following the parametric equations
x = u + r*cos(theta), y = v + r*sin(theta).
It also illustrates the Pythagorean identity
cos(theta)**2 + sin(theta)**2 = 1. The same applies to an
N-dimensional eigenvector (more features or variables):
the norm of an eigenvector is always equal to 1.
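As a minimal sketch (the eigenvector values below are made up for illustration), the angle can be recovered from the (x, y) components with numpy, and the unit norm and Pythagorean identity checked directly:

import numpy as np

v = np.array([0.731, 0.6824])                 # hypothetical 2D unit eigenvector
theta = np.arctan2(v[1], v[0])                # angle from the (x, y) components
print(np.degrees(theta))                      # orientation angle in degrees
print(np.linalg.norm(v))                      # norm of a unit eigenvector ~ 1.0
print(np.cos(theta)**2 + np.sin(theta)**2)    # cos^2 + sin^2 = 1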
---------------------------- <(0_0)>’ -------------------------
The information age comes with massive amounts of data and features
to consider. Typically, there are too many input variables for useful analysis. PCA
is a technique used to reduce a high-dimensional dataset to a manageable number of
dimensions while still preserving the original structure and relationships, with minimal
loss of information from the original data.
Most of the time the ROI is an oval
shape. We will define a random data set and lay out the process of PCA analysis
using Python. First, let's go over some matrix properties:
Q, U, V : orthogonal matrices        A_mxn : rectangular matrix
S, D : diagonal matrices             I : identity matrix
S = S^T,  D = D^T
Q^-1 = Q^T,  Q Q^T = Q^T Q = I,  U^T U = I_mxm,  V^T V = I_nxn
A A^T / (n-1) : covariance in matrix form
A_mxn = U_mxm S_mxn V^T_nxn   (SVD - singular value decomposition)
Some properties of SVD:
A = U S V^T,  A^T = (U S V^T)^T = (V^T)^T (U S)^T = V S^T U^T = V S U^T
A^T A = V S U^T U S V^T = V S^2 V^T
A^T A V = V S^2 V^T V = V S^2        ==>  A^T A v_i = σ_i^2 v_i
A A^T U = U S V^T V S U^T U = U S^2  ==>  A A^T u_i = σ_i^2 u_i
A V = U S V^T V = U S                ==>  A v_i = σ_i u_i
A^T A v_i = σ_i^2 v_i and A A^T u_i = σ_i^2 u_i (the Ax = λx form) follow from the decomposition, and A V = U S is equivalent to A v_i = σ_i u_i. Calculating the SVD consists of finding the eigenvalues or variances (λ_i) and the eigenvectors (u_i, v_i) of A A^T and A^T A. u_i is called the left singular vector (a unit eigenvector of A A^T), and v_i is called the right singular vector (a unit eigenvector of A^T A). The singular values (sigma, or standard deviations) of A_mxn are the square roots of the eigenvalues (σ_i = √λ_i) of A A^T or A^T A. The sigma values are the diagonal entries of the S (or D) matrix, arranged in descending order.
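As a minimal sketch (the small matrix below is made up for illustration), numpy can confirm that the singular values of A are the square roots of the eigenvalues of A^T A:

import numpy as np

A = np.array([[2., 0.], [1., 1.], [0., 2.]])   # hypothetical 3x2 data matrix

U, S, VT = np.linalg.svd(A)                    # A = U S V^T
eigVal, eigVec = np.linalg.eig(A.T @ A)        # eigen-decomposition of A^T A

print(S)                                       # singular values, descending order
print(np.sqrt(np.sort(eigVal)[::-1]))          # sqrt of eigenvalues - same values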
When a data matrix can be decomposed into eigenvectors, the location (centroid) and orientation (angle)
of the data cluster are defined. The largest eigenvalue and its corresponding
eigenvector define the 1st principal component axis. The 2nd
principal component axis is defined by the 2nd largest eigenvalue and
its corresponding eigenvector, and so on.
The singular value, i.e. the square root of the
eigenvalue/variance, is the standard deviation (one sigma) in the corresponding
eigenvector direction. We can map out the three-sigma region (ROI - Region Of Interest) to further
analyze the data cluster.
Multiplying any (Xi, Yi) cluster data point by
the unit vectors or eigenvectors (dot product) gives the new (Xi, Yi)' in the principal component axes.
Basically, the unit vectors are used to project the data onto the 1st and 2nd
principal axes. In 2D, cos(theta) and sin(theta) are the loadings or weight
factors for the new eigenvector directions, i.e. the principal component axes.
Another way of looking at it:
when the data cluster is rotated by negative theta, we arrive at the principal
component axes. The frame of reference changes from the existing (Xi, Yi)
to the new (Xi, Yi)' of the principal component axes.
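As a minimal sketch (the angle and point below are made up for illustration), rotating a centered point by negative theta gives the same coordinates as taking its dot product with the column eigenvectors:

import numpy as np

theta = np.pi/6                                   # hypothetical eigenvector angle
p = np.array([2., 1.])                            # hypothetical centered data point

V = np.array([[np.cos(theta), -np.sin(theta)],    # unit eigenvectors as columns
              [np.sin(theta),  np.cos(theta)]])
rot_neg = np.array([[np.cos(-theta), -np.sin(-theta)],
                    [np.sin(-theta),  np.cos(-theta)]])

print(rot_neg @ p)     # rotation by negative theta
print(p @ V)           # projection onto the eigenvectors - same result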
For a 3D dataset (Xi, Yi, Zi), let Ud be any column eigenvector:
U1 = [C1, C2, C3], U2 = [C4, C5, C6]   # some scalars/constants
Xi' = [Xi, Yi, Zi] @ U1 = Xi*C1 + Yi*C2 + Zi*C3
Yi' = [Xi, Yi, Zi] @ U2 = Xi*C4 + Yi*C5 + Zi*C6
Xi' lies on the 1st principal axis PC1, and Yi' lies on the 2nd principal axis PC2
(d = 1, 2, 3, ... k and i = 1, 2, 3, ... n, with k <<< n).
(@ is the dot product or matrix multiplication operator in Python 3.x.)
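As a minimal sketch (the 3D points are random and the eigenvectors come from their covariance matrix), the same projection can be written with numpy:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))                  # hypothetical (Xi, Yi, Zi) rows, n = 10

Ac = A - A.mean(axis=0)                       # zero-mean data
eigVal, eigVec = np.linalg.eig(np.cov(Ac.T))  # column eigenvectors of covariance
order = np.argsort(eigVal)[::-1]              # sort by descending eigenvalue
U = eigVec[:, order[:2]]                      # keep U1, U2 (d = 2 columns)

A_new = Ac @ U                                # Xi' = row @ U1, Yi' = row @ U2
print(A_new[:3])                              # PC1, PC2 coordinates of first rows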
In general, the data transformation is a
linear combination of the weights/loadings (the eigenvector values of Ud)
and the data matrix A_mxn. Depending on the Python library and the data layout, you may need to
transpose a matrix to make it work. Applying the dot product, we have:
A'_nxd = A_nxm @ U_mxd    or    A'_dxn = U_dxm @ A_mxn
--------------------------- (*^_^*)’ ---------------------------
When using Python, the data matrix
columns and rows should be labeled correctly according to the parameters of the
module, which means the data may sometimes have to be transposed. We have to be aware of
each library's usage. In this case (m) is the number of feature variables, (n) is the number of
observations, and (d) is the reduced dimension. The data should be centered at (0, 0),
or zero-mean. The numpy covariance and sklearn PCA routines subtract the mean from the data matrix by default.
Principal component or eigenvector:
PC_mxd = (V^T)^T = V    from    A_nxm = U_nxn S_nxm V^T_mxm
Feature reduction formulas –
The new feature matrix is the dot product of the data matrix and the
eigenvectors (unit vectors). Later, the scree plot (part III) will help decide
how many eigenvectors to include. The transformed cluster data are the projection of the
existing (Xi, Yi, ...) points onto the principal component axes:
F_nxd = A_nxm @ PC_mxd
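As a minimal sketch (random data, assuming scikit-learn is available), F_nxd computed by hand this way matches what PCA's fit_transform() returns:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))                 # hypothetical n x m data matrix

pca = PCA(n_components=2)
F_sklearn = pca.fit_transform(A)              # projected data, n x d

Ac = A - A.mean(axis=0)                       # zero-mean data
F_manual = Ac @ pca.components_.T             # F_nxd = A_nxm @ PC_mxd
print(np.allclose(F_sklearn, F_manual))       # True (same fit, so same signs)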
# ----------------------------------------------------------------
PYTHON SESSION BELOW EXPLAINED:
Initially, a simple random data cluster of x-y variables analyzed with the
covariance matrix is compared against PCA from the sklearn library.
This shows the matrix transformation by eigenvector and rotation
matrix in action. After that, the iris dataset will be utilized to
demonstrate using PCA in production (explained in part III).
Figure-6 Figure-7
Figure-7 shows the final position of the data sample after the matrix transformation
(translation - derotation) from Figure-6. The steps are reversible (a major application in data compression).
The ellipse is mapped out at 3 sigma (a-b, major-minor axes) to further analyze the cluster's ROI.
After getting Figure-7 from the FINISHED PYTHON SESSION below,
we can verify the manual process of using the covariance and rotation matrices (translation-derotation)
against sklearn PCA. The projected data on the principal component axes should be
the same in both cases.
Before shearing, scaling, or rotating
a cluster matrix, the data center should be moved to the origin (zero-mean) to prevent
unwanted distortion. In Python, both the numpy covariance and sklearn PCA
methods have built-in removal of the mean.
Same covariance result with/without zero-mean:
>>> np.cov(np.array([x1,y1]))    # as is, no mean removal
array([[37.6662, 30.2462],
       [30.2462, 33.5037]])
>>> np.cov(np.array([x1c,y1c]))  # zero-mean matrix sample
array([[37.6662, 30.2462],
       [30.2462, 33.5037]])
Same PCA-transform or projected data result with/without zero-mean:
>>> pca = PCA()              # instantiate PCA class
>>> x1y1_pca = pca.fit_transform(np.array([x1,y1]).T)        # PC1_2 data
>>> x1y1_mean_pca = pca.fit_transform(np.array([x1c,y1c]).T)
>>> x1y1_pca[:3]             # no mean removal of matrix sample, first 3 rows
array([[-4.4831,  2.4537],
       [ 1.6845,  1.5764],
       [-0.9238,  0.0164]])
>>> x1y1_mean_pca[:3]        # zero-mean matrix sample
array([[-4.4831,  2.4537],
       [ 1.6845,  1.5764],
       [-0.9238,  0.0164]])
Now, comparing the PCA() projected data with the matrix translation-derotation result:
>>> (rot_M @ (x1c, y1c))[:,:3].T
array([[-4.4831, -2.4537],
       [ 1.6845, -1.5764],
       [-0.9238, -0.0164]])
They are the same except for occasional sign flipping, because an eigenvector can point
in the opposite direction (180 degrees out, yet still on the same principal component axis line).
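As a minimal sketch (using the three rows shown above, stored in hypothetical arrays a and b), a comparison that tolerates the per-column sign flip can be written as:

import numpy as np

a = np.array([[-4.4831,  2.4537], [ 1.6845,  1.5764], [-0.9238,  0.0164]])
b = np.array([[-4.4831, -2.4537], [ 1.6845, -1.5764], [-0.9238, -0.0164]])

signs = np.sign((a * b).sum(axis=0))   # +1 or -1 per principal component column
print(np.allclose(a, b * signs))       # True once the flipped column is aligned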
# ---------------------------------------------------------------
One way of getting the eigenvectors is
from numpy.linalg.svd() or scipy.linalg.svd(). This method expects zero-mean
data with the variables/features in columns.
# sample matrix factorized into U, S, VT - features in column
>>> U, S, VT = np.linalg.svd(np.array([x1c,y1c]).T)
>>> PC = VT.T                # transpose to compare with eigVec (column format)
>>> PC
array([[ 0.731 , -0.6824],
       [ 0.6824,  0.731 ]])
>>> eigVec                   # from covariance matrix factorization, np.linalg.eig()
array([[ 0.731 , -0.6824],
       [ 0.6824,  0.731 ]])
>>> pca.fit(np.array([x1,y1]).T)
>>> pca.components_.T        # column eigenvectors
array([[ 0.731 ,  0.6824],
       [ 0.6824, -0.731 ]])
We can compute the projected data using either VT.T, eigVec, or pca.components_.T:
>>> F = (np.array([x1c,y1c]).T) @ PC   # same values as derotation
>>> F[:3]
array([[-4.4831, -2.4537],
       [ 1.6845, -1.5764],
       [-0.9238, -0.0164]])
PCA() from scikit-learn and svd() from the numpy/scipy libraries are fast ways to get
the result. It is amazing that with a few lines of Python code we can get the answer
quickly; that is much harder in many other programming languages.
# ------------------------------------------------------------------
Most of the time we can solve for one unknown in terms of the other variables in an equation,
and the independent and response variables can be switched around (a reversible process).
This generally holds true in matrix equations as well. Consider the equations:
A'_nxp = A_nxm @ U_mxp  -->  A'_nxp @ (U_mxp)^T = A_nxm @ U_mxp @ (U_mxp)^T  ==>  A_nxm = A'_nxp @ U_pxm
A'_pxn = U_pxm @ A_mxn  -->  (U_pxm)^T @ A'_pxn = (U_pxm)^T @ U_pxm @ A_mxn  ==>  A_mxn = U_mxp @ A'_pxn
That is how data compression works: the more eigenvectors or principal components we use, the better the quality of the retrieved data. We can find the balance, or "good choice", of how many eigenvectors to use with a scree plot (part III).
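As a minimal sketch (random data, assuming scikit-learn is available), keeping only the first p principal components and multiplying back gives an approximate reconstruction of the original matrix:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 5))                    # hypothetical n x m data matrix

pca = PCA(n_components=2)                        # keep p = 2 eigenvectors
A_proj = pca.fit_transform(A)                    # A'_nxp, the compressed data

A_back = A_proj @ pca.components_ + pca.mean_    # A_nxm ~ A'_nxp @ U_pxm + mean
print(np.allclose(A_back, pca.inverse_transform(A_proj)))   # True
print(np.abs(A - A_back).max())                  # reconstruction error, not zero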
By switching rows and columns and moving the eigenvector matrix around, you can achieve some remarkable results with matrix multiplication.
Let's try it out with the existing data:
>>> A = x1y1_pca @ eigVec    # transformed matrix multiplied by eigenvector
>>> A[:3]                    # list first three rows of zero-mean data
array([[-1.6026,  4.8529],
       [ 2.3071,  0.0028],
       [-0.6641,  0.6423]])
Verify:
>>> (xyDF - xyDF.mean(axis=0))[:3]   # original data minus mean
         x1       y1
0 -1.602601 -4.85292
1  2.307078 -0.00283
2 -0.664081 -0.64234
Try the x1y1_mean_pca and pca.components_.T matrices:
>>> ( x1y1_mean_pca @ pca.components_.T )[:3]
array([[-1.6026, -4.8529],
       [ 2.3071, -0.0028],
       [-0.6641, -0.6423]])
Because most methods remove the mean, we have to subtract the mean from the original data to compare. Again, the data changes sign due to the eigenvector being flipped 180° out.
When working with matrices, it helps to start thinking of vectors as the columns or rows of a matrix. Multiplication between matrices is the dot product of row vectors with column vectors. The features or variables must be aligned the way the Python library method expects, and they must match up correctly with the other matrix during multiplication. Each eigenvector is composed of individual weight values for the variables and must be paired accordingly with the other matrix's features when taking the dot product.
# ----- FIGURE-5-6-7 DISPLAY AFTER FINISHED PYTHON SESSION ------
# continuation from Part-I with same libraries/modules
plt.figure(5)
#fig, ax = plt.subplots()               # figure and axis
plt.xlim(-.5, 5)                        # set x-axis limit
plt.ylim(-.5, 5)
#ax.set_aspect('equal')                 # set aspect ratio of axis
plt.axis('equal')
# initialize center (u, v) values and others
u, v, a, b, rot, r = 2.5, 2.5, 1, .5, np.pi/6, 2
# default 50 points or angles in radian
theta = np.linspace(0, 2*np.pi)
plt.axhline(ls='--', color='grey')      # line thru origin
plt.axvline(ls='--', color='grey')
# draw a magenta circle at (u, v) of r=2
c3 = plt.Circle([u, v], 2, color='m', fill=False, lw=3)
plt.gca().add_artist(c3)                # add circle c3 to figure
plt.plot(np.array([0, 0.5*2, .5*2, 0]) + u,
         np.array([0, .866*2, 0, 0]) + v, lw=3)   # triangle
plt.plot([u, u], [0, 5], color='g')     # vertical line thru center
plt.plot([0, 5], [v, v], color='g')     # horizontal line thru center
# text formatting of figure 5
plt.text(u + .5*2.1, v + .866*2.5, 'Eigenvector', size=16)
plt.text(u + .5*2.1, v + .866*2.1, '(x, y)', size=16)
plt.text(u + .1, v - .4, r'cos($\theta$)', size=18)
plt.text(u + .5*2.1, v + .5, r'sin($\theta$)', size=18)
plt.text(1.6, v + .1, '(u, v)', size=20, color='r')
plt.text(u + .25, v + 1, '1', size=20, color='r')
plt.text(u + .15, v + .1, r'$\theta$', size=20, color='r')
plt.figure(6)
sp.random.seed(15)                      # reproducible, to get same sample
# x1 mean at 20 with spread of 6 std of 100 points, multiply
# by tan to get y1, add 3 std for small variation in y1
x1 = sp.random.normal(20, 6, 100)
y1 = x1*np.tan(sp.pi/5) + sp.random.normal(0, 3, 100)
# convert x1 & y1 to pandas dataframe, use xyDF.x1.values,
# xyDF.y1.values for calculation - less likely to cause error later
xyDF = pd.DataFrame({'x1': x1, 'y1': y1})   # view labeled data easier
# covariance of x1 & y1, notice transpose to match method
covXY = np.cov(xyDF.T)                  # method default: row as variable
# unpacking covariance into eigenvalues and eigenvectors
eigVal, eigVec = np.linalg.eig(covXY)
sig1, sig2 = eigVal**0.5                # singular value or sigma
# initialize new center (u, v) and 3 sigma values of a & b
u, v, a, b = x1.mean(), y1.mean(), 3*sig1, 3*sig2
# default 50 points for parametric equations
theta = np.linspace(0, 2*np.pi)
# angle of 1st eigenvector or principal axis
angle = np.degrees(np.arccos(eigVec[0,0]))
# define an ellipse using parametric eq. at new center (u, v)
# with zero angle as reference, draw as green ellipse later
ell_M = np.array([u + a*np.cos(theta), v + b*np.sin(theta)])
# red ellipse of a & b at three sigma value from the
# data center, rotated at angle of 1st eigenvector
e1 = mp.Ellipse([u, v], 2*a, 2*b, angle=angle, lw=3, fill=False, color='r')
plt.axhline(ls='--', color='grey')      # grey line thru origin
plt.axvline(ls='--', color='grey')
plt.scatter(x1, y1, 100, 'm', edgecolor='y')    # plot sample data
plt.plot(ell_M[0,:], ell_M[1,:], 'g--', lw=3)   # draw green ellipse
plt.gca().add_artist(e1)                # add red ellipse to figure
# draw three sigma lines of data sample (ROI - red ellipse)
sigma3 = np.array([[u, v]]).T + (3*eigVal**0.5) * eigVec
plt.plot([u, sigma3[0,0]], [v, sigma3[1,0]], lw=3, color='b')
plt.plot([u, sigma3[0,1]], [v, sigma3[1,1]], lw=3, color='b')
# move sample center to origin and derotation of red ellipse
plt.figure(7)
plt.axhline(ls='-', color='b', lw=2)
plt.axvline(ls='-', color='b', lw=2)
x1c = x1 - x1.mean()                    # removed mean from x1 matrix sample
y1c = y1 - y1.mean()                    # removed mean from y1 matrix sample
# negative rotation angle of eigenvector
rotNeg = -np.arccos(eigVec[0,0])
# negative rotation matrix
rot_M = np.array([[np.cos(rotNeg), -np.sin(rotNeg)],
                  [np.sin(rotNeg),  np.cos(rotNeg)]])
# transformation of data sample by rotation matrix
el1, el2 = rot_M @ (x1c, y1c)
# plot data sample at new center (0, 0)
plt.scatter(el1, el2, 100, 'm', edgecolor='y')
# green ellipse at origin, after translation & derotation
# of red ellipse
e2 = mp.Ellipse([0, 0], 2*a, 2*b, angle=0, lw=3, fill=False, color='g')
plt.gca().add_artist(e2)                # add green ellipse to figure 7
plt.text(1, 19, 'PC_2', size=18, color='purple')    # label x-y axes
plt.text(23.5, 3, 'PC_1', size=18, color='purple')
#plt.xlabel('PC_1', color='purple', fontsize=18)
#plt.ylabel('PC_2', color='purple', fontsize=18)
plt.axis('equal')                       # set aspect=1 for xy-axis
plt.show()
================================================================