Friday, April 17, 2020

________________ PCA - SVD and Machine Learning II ________________


"There is nothing more deceptive than an obvious fact."

       Sherlock Holmes, Sir Arthur Conan Doyle


Part II:


A cluster of data is an important region of interest (ROI) in machine learning. It has a location, an orientation, and individual points. To better describe these characteristics and perform mathematical manipulation, a cluster is usually converted to a matrix.

In 2D, the location is the centroid (X-mean, Y-mean) together with all the points (Xi, Yi) of the cluster. From its center, the eigenvectors largely define the orientation of the data. Every eigenvector can be scaled to a unit vector. This also implies an angle associated with a vector of unit length 1. The angle is not often pointed out in many texts; the fact is so obvious that it is sometimes neglected and not mentioned at all. A vector should be associated with an angle to complete its description in a coordinate system. The vector angle is the important missing link for a better understanding of eigenvectors. Arcsine, arccosine, and arctangent are the common formulas for finding this angle from the (x, y) components of the vector.

                  Figure-5

The unit vector r=1 of figure-5 is the eigenvector starting at center (u, v) with its tip at (x, y), and the parametric equations are x = u + r*cos(theta), y = v + r*sin(theta).

It also illustrates the Pythagorean identity cos(theta)**2 + sin(theta)**2 = 1. The same applies to an N-dimensional eigenvector (more features or variables): the norm of an eigenvector is always equal to 1.

sin(theta) and cos(theta) are considered the loadings in PCA. Think of them as how much each feature/variable (x and y, 2D in this case) contributes, at the angle theta of the unit vector, to the principal component axes (the biplot illustrates this point more in part III).
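
As a quick check, here is a minimal sketch with a made-up unit vector (not part of the python session below): the angle can be recovered from the (x, y) components with arctangent, and the components themselves behave as the cos/sin loadings.

# hypothetical 2-D unit eigenvector; recover its angle and verify the loadings
import numpy as np

v = np.array([0.731, 0.6824])               # example eigenvector components
v = v / np.linalg.norm(v)                   # force unit length (norm = 1)
theta = np.arctan2(v[1], v[0])              # angle from (x, y) components
print(np.degrees(theta))                    # ~43 degrees
print(np.cos(theta), np.sin(theta))         # equal to the vector components
print(np.cos(theta)**2 + np.sin(theta)**2)  # 1.0, the Pythagorean identity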



----------------------------  <(0_0)>  -------------------------


The information age comes with massive amounts of data and features to consider. Typically, the number of input variables is too large for useful analysis. PCA is a technique used to reduce high-dimensional datasets to a manageable number of dimensions while still preserving the original structure and relationships. Information loss from the original data is kept to a minimum.

Most of the time the ROI is an oval shape. We will define a random data set and lay out the process of analyzing PCA using python. Let’s go over some matrix properties:

       Q, U, V : orthogonal matrices      A_mxn : rectangular matrix
       S, D    : diagonal matrices        I     : identity matrix
       S = S^T              D = D^T
       Q^-1 = Q^T       Q Q^T = Q^T Q = I       U^T U = I_mxm       V^T V = I_nxn

       A A^T / (n-1)  -  covariance in matrix form (A zero-mean)

       A_mxn = U_mxm S_mxn V^T_nxn   (SVD) - singular value decomposition

Some properties of SVD:

A = U S V^T   ,   A^T = V S U^T      <==   (U S V^T)^T = (V^T)^T (U S)^T = V S^T U^T = V S U^T
A^T A = V S U^T U S V^T
A^T A = V S^2 V^T
A^T A V = V S^2 V^T V = V S^2        ==>   A^T A v_i = σ_i^2 v_i
A A^T U = U S V^T V S U^T U = U S^2  ==>   A A^T u_i = σ_i^2 u_i
A V = U S V^T V = U S                ==>   A v_i = σ_i u_i


A^T A v_i = σ_i^2 v_i and A A^T u_i = σ_i^2 u_i (the Ax = λx form) come from the equations above, and A V = U S V^T V = U S is equivalent to A v_i = σ_i u_i. Calculating the SVD consists of finding the eigenvalues or variances (λ_i) and eigenvectors (u_i, v_i) of A A^T and A^T A. u_i is called a left singular vector (a unit eigenvector of A A^T); v_i is called a right singular vector (a unit eigenvector of A^T A). The singular values (sigma, or standard deviations) of A_mxn are the square roots of the eigenvalues (σ_i = sqrt(λ_i)) of A A^T or A^T A. The sigma values are the diagonal entries of the S or D matrix, arranged in descending order.
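
These relations are easy to verify numerically. Below is a minimal sketch on random zero-mean data (separate from the python session later in this post), assuming features in columns:

# verify σ_i^2 = λ_i and A v_i = σ_i u_i on random data
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
A = A - A.mean(axis=0)                          # zero-mean columns (features)

U, S, VT = np.linalg.svd(A, full_matrices=False)
eigVal, eigVec = np.linalg.eig(A.T @ A)
order = np.argsort(eigVal)[::-1]                # descending, same order as S

print(np.allclose(S**2, eigVal[order]))                     # True: σ_i^2 = λ_i
print(np.allclose(np.abs(VT.T), np.abs(eigVec[:, order])))  # True, up to sign
print(np.allclose(A @ VT.T, U * S))                         # True: A v_i = σ_i u_i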

When a data matrix can be decomposed into eigenvectors, the location (centroid) and orientation (angle) of the data cluster are defined. The largest eigenvalue and its corresponding eigenvector define the 1st principal component axis. The 2nd principal component axis is defined by the 2nd largest eigenvalue and its corresponding eigenvector, and so on.

The singular value, or square root of the eigenvalue/variance, is the standard deviation or one sigma in the corresponding eigenvector direction. We can map out the three-sigma region (ROI - Region Of Interest) to further analyze the data cluster.

Any (Xi, Yi) cluster data point multiplied by the unit vectors or eigenvectors (dot product) becomes the new (Xi, Yi)’ of the principal component axes. Basically, the unit vectors are used to project the data onto the 1st and 2nd principal axes. In 2D, cos(theta) and sin(theta) are the loadings or weight factors for the new eigenvector directions or principal component axes.

Another way of looking at it: when the data cluster is rotated by negative theta, we arrive at the principal component axes. The frame of reference changes from the existing (Xi, Yi) to the new (Xi, Yi)’ of the principal component axes.
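
In 2D this can be seen directly: the rotation matrix for -theta is exactly the transpose of the eigenvector matrix whose first column is [cos(theta), sin(theta)]. A minimal sketch with a hypothetical angle:

# rotating by -theta equals multiplying by the transposed eigenvector matrix
import numpy as np

theta = np.radians(43.0)                                # hypothetical PC1 angle
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])         # column eigenvectors
R_neg = np.array([[np.cos(-theta), -np.sin(-theta)],
                  [np.sin(-theta),  np.cos(-theta)]])   # rotation by -theta
print(np.allclose(R_neg, V.T))                          # True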

For a 3D (Xi, Yi, Zi) dataset, let Ud be any column eigenvector:

  U1 = [C1, C2, C3], U2 = [C4, C5, C6]   # some scalars/constants
 
  Xi’ = [Xi, Yi, Zi] @ U1  =  Xi * C1 + Yi * C2 + Zi * C3
  Yi’ = [Xi, Yi, Zi] @ U2  =  Xi * C4 + Yi * C5 + Zi * C6

  Xi’ is on 1st principal axis PC1, Yi’ is on 2nd principal axis PC2.

  ( d = 1, 2, 3, … k   and  i = 1, 2, 3, … n    k <<< n ).
  ( @ is the dot product or matrix multiplication in python 3.x ).
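
A minimal numeric sketch of the 3D case above, where C1..C6 are hypothetical values chosen so that U1 and U2 are orthogonal unit vectors:

import numpy as np

U1 = np.array([0.6, 0.6, 0.52915])          # hypothetical unit eigenvector
U2 = np.array([0.70711, -0.70711, 0.0])     # hypothetical unit eigenvector
point = np.array([2.0, 1.0, 3.0])           # one (Xi, Yi, Zi) data point

Xi_new = point @ U1                         # Xi' on PC1
Yi_new = point @ U2                         # Yi' on PC2
# same value written out as the linear combination Xi*C1 + Yi*C2 + Zi*C3
print(Xi_new, point[0]*U1[0] + point[1]*U1[1] + point[2]*U1[2])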

In general, the data transformation is a linear combination of weights/loadings (the eigenvector values of Ud) and the data matrix Amxn. Depending on the python library and the data layout, you may need to transpose a matrix to make the shapes match. Applying the dot product, we have:

    A’_nxd = A_nxm @ U_mxd        or        A’_dxn = U_dxm @ A_mxn
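
The two layouts give the same result up to a transpose. A minimal sketch with random 3-feature data (separate from the session below):

# features-in-columns (n x m) versus features-in-rows (m x n) projection
import numpy as np

rng = np.random.default_rng(1)
A_nxm = rng.normal(size=(10, 3))            # 10 observations, 3 features in columns
A_nxm = A_nxm - A_nxm.mean(axis=0)          # zero-mean

_, _, VT = np.linalg.svd(A_nxm, full_matrices=False)
U_mxd = VT.T[:, :2]                         # keep d = 2 column eigenvectors

A1 = A_nxm @ U_mxd                          # A'_nxd = A_nxm @ U_mxd
A2 = U_mxd.T @ A_nxm.T                      # A'_dxn = U_dxm @ A_mxn
print(np.allclose(A1, A2.T))                # True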



---------------------------  (*^_^*)’  ---------------------------   


When using python, the data matrix columns and rows should be laid out correctly for the parameters of the module, which means the data may have to be transposed sometimes; we have to be aware of how each library is used. In this case (m) is the number of feature variables, (n) is the number of observations, and (d) is the reduced dimension. The data should be centered at (0, 0), or zero-mean. Python subtracts the mean from the data matrix by default when using the covariance matrix or PCA.

    Principal component or eigenvector:

    PC_mxd  =  (V^T)^T  =  V                 from                 A_nxm = U_nxn S_nxm V^T_mxm

    Feature reduction formula: the new feature matrix is the dot
    product of the data matrix and the eigenvector (unit vector)
    matrix. Later, the scree plot (part III) will help decide how
    many eigenvectors to include. The transformed cluster data are
    the projection of the existing (Xi, Yi, ...) points onto the
    principal component axes:

                       F_nxd = A_nxm  @  PC_mxd
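
For example, keeping only d = 1 of the m = 2 components (a minimal sketch reusing the x1c, y1c and PC variables from the finished python session below):

>>> F1 = np.array([x1c, y1c]).T @ PC[:, :1]  # n x 1, coordinate on PC1 only
>>> F1[:3]

    array([[-4.4831],
           [ 1.6845],
           [-0.9238]])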

# ----------------------------------------------------------------


                    PYTHON SESSION BELOW EXPLAINED:

    Initially, a simple random data cluster of x-y variables using
    the covariance matrix is compared with PCA from the sklearn
    library. This shows the matrix transformation by eigenvector and
    rotation matrix in action. After that, the iris dataset will be
    used to demonstrate PCA in production (explained in part III).


                Figure-6                     Figure-7


Figure-7 shows the final data sample positions after the matrix transformation from Figure-6 (translation - derotation). The steps are reversible (a major application in data compression). The ellipse is mapped out at 3 sigma (a-b, major-minor axes) to further analyze the cluster's ROI.

Before shearing, scaling, or rotating a cluster matrix, the data center should be moved to the origin (zero-mean) to prevent unwanted distortion. In python, the numpy covariance and sklearn PCA methods both have built-in removal of the mean.

After getting Figure-7 from the FINISHED PYTHON SESSION below, we can verify the manual process of using the covariance and rotation matrices (translation-derotation) against sklearn PCA. The projected data on the principal component axes should be the same in both cases.

Same covariance result with/without zero-mean:
 
>>> np.cov(np.array([x1,y1])) # as is, no mean removal

    array([[37.6662, 30.2462],
           [30.2462, 33.5037]])

>>> np.cov(np.array([x1c,y1c])) # zero-mean matrix sample


    array([[37.6662, 30.2462],
           [30.2462, 33.5037]])

Same PCA-transform or projected data result with/without zero-mean:

>>> pca = PCA() # instantiate PCA class

>>> x1y1_pca = pca.fit_transform(np.array([x1,y1]).T) # PC1_2 data

>>> x1y1_mean_pca = pca.fit_transform(np.array([x1c,y1c]).T)

>>> x1y1_pca[:3] # no mean removal of matrix sample, first 3 rows

    array([[-4.4831, 2.4537],
           [ 1.6845, 1.5764],
           [-0.9238, 0.0164]])

>>> x1y1_mean_pca[:3] # zero-mean matrix sample

    array([[-4.4831, 2.4537],
           [ 1.6845, 1.5764],
           [-0.9238, 0.0164]])

Now, comparing PCA() projected data with matrix translation-derotation result:

>>> (rot_M @ (x1c, y1c))[:,:3].T

    array([[-4.4831, -2.4537],
           [ 1.6845, -1.5764],
           [-0.9238, -0.0164]])

They are the same except for occasional sign flips, because an eigenvector can point in the opposite direction (180 degrees out), yet it still lies on the same principal component axis line.


# ---------------------------------------------------------------

One way of getting eigenvectors is from numpy.linalg.svd() or scipy.linalg.svd(). For PCA, pass zero-mean data with the variables/features in columns.

# sample matrix factorized into U, S, VT - features in column

>>> U, S, VT = np.linalg.svd(np.array([x1c,y1c]).T)
>>> PC = VT.T # transpose to compare with eigVec (column format)
>>> PC

    array([[ 0.731 , -0.6824],
           [ 0.6824,  0.731 ]])

>>> eigVec # from covariance matrix factorization, np.linalg.eig()

    array([[ 0.731 , -0.6824],
           [ 0.6824,  0.731 ]])

>>> pca.fit(np.array([x1,y1]).T)
>>> pca.components_.T # column eigenvectors

    array([[ 0.731 ,  0.6824],
           [ 0.6824, -0.731 ]]) 

We can compute the projected data using either VT.T, eigVec, or pca.components_.T:

>>> F = (np.array([x1c,y1c]).T) @ PC # same values as derotation
>>> F[:3]

    array([[-4.4831, -2.4537],
           [ 1.6845, -1.5764],
           [-0.9238, -0.0164]])


PCA() from scikit-learn and svd() from the numpy/scipy libraries are fast ways to get the result. It is amazing that with a few lines of python code we can get the answer so quickly; that is not as easy in some other programming languages.

# ------------------------------------------------------------------

Most of the time we can solve for one unknown from the other variables in an equation. Independent and response variables can also be switched around (a reversible process). This generally holds true in matrix equations. Consider the equations:

  A’_nxp = A_nxm @ U_mxp   -->   A’_nxp @ (U_mxp)^T = A_nxm @ U_mxp @ (U_mxp)^T

  ==>  A_nxm = A’_nxp @ U_pxm

  A’_pxn = U_pxm @ A_mxn   -->   (U_pxm)^T @ A’_pxn = (U_pxm)^T U_pxm @ A_mxn

  ==>  A_mxn = U_mxp @ A’_pxn

That is how data compression works. The more eigenvectors or principal components we use, the better the quality of the retrieved data. We can find the balance, or “good choice”, of how many eigenvectors to use with a scree plot (part III).
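
As a minimal sketch of the lossy case (reusing x1c, y1c and PC from the session above, and keeping only the 1st eigenvector), the reconstruction from a reduced number of components only approximates the zero-mean data:

>>> A0 = np.array([x1c, y1c]).T        # n x 2 zero-mean data
>>> Ud = PC[:, :1]                     # keep only d = 1 principal component
>>> A_back = (A0 @ Ud) @ Ud.T          # lossy n x 2 reconstruction
>>> err = np.abs(A0 - A_back).mean()   # small but nonzero average error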

By switching rows/columns and moving the eigenvector matrix around, you can achieve some amazing results with matrix multiplication.

Let’s try out with existing data:

>>> A = x1y1_pca @ eigVec # transformed matrix multiply eigenvector

>>> A[:3]                 # list first three rows of zero-mean data

    array([[-1.6026,  4.8529],
           [ 2.3071,  0.0028],
           [-0.6641,  0.6423]])

Verify:

>>> (xyDF-xyDF.mean(axis=0))[:3] # original data minus mean

          x1       y1
 0 -1.602601 -4.85292
 1  2.307078 -0.00283
 2 -0.664081 -0.64234

Try the x1y1_mean_pca and pca.components_.T matrices:

>>> ( x1y1_mean_pca @ pca.components_.T )[:3]

      array([[-1.6026, -4.8529], 
             [ 2.3071, -0.0028],  
             [-0.6641, -0.6423]])


Because most methods remove the mean, we have to subtract the mean from the original data to compare. Again, the data change sign due to the eigenvector flipping 180° out.

When working with matrices, it helps to start thinking of vectors as the columns or rows of a matrix. Multiplication between matrices is the dot product of row vectors with column vectors. Features or variables must be aligned with the python library method, and they should match up correctly with the other matrix during multiplication. Each eigenvector is composed of individual weight values for the variables and must be paired accordingly with the other matrix's features when using the dot product.


# ----- FIGURE-5-6-7 DISPLAY AFTER FINISHED PYTHON SESSION ------



# continuation from Part-I with same libraries/modules
plt.figure(5)
#fig, ax = plt.subplots() # figure and axis
plt.xlim(-.5, 5) # set x-axis limit
plt.ylim(-.5, 5)
#ax.set_aspect('equal') # set aspect ratio of axis
plt.axis('equal')

# initialize center(u, v) values and others
u, v, a, b, rot, r = 2.5, 2.5, 1, .5, np.pi/6, 2
# default 50 points or angles in radian
theta = np.linspace(0, 2*np.pi)

plt.axhline(ls='--', color='grey') # line thru origin
plt.axvline(ls='--', color='grey')

# draw a magenta circle at (u, v) of r=2
c3 = plt.Circle([u,v], 2, color='m', fill=False, lw=3)

plt.gca().add_artist(c3) # add circle c3 to figure
plt.plot(np.array([0, 0.5*2, .5*2, 0]) + u,
         np.array([0, .866*2, 0, 0]) + v, lw=3) # triangle

plt.plot([u, u], [0, 5], color='g') # vertical line thru center
plt.plot([0, 5], [v, v], color='g') # horizontal line thru center

# text formatting of figure 5
plt.text(u + .5*2.1, v + .866*2.5, 'Eigenvector', size=16)
plt.text(u + .5*2.1, v + .866*2.1, '(x, y)', size=16)
plt.text(u + .1, v -.4, r'cos($\theta$)', size=18)
plt.text(u + .5*2.1, v + .5, r'sin($\theta$)', size=18)
plt.text(1.6 , v + .1, '(u, v)', size=20, color='r')

plt.text(u + .25, v + 1, '1', size=20, color='r')
plt.text(u + .15, v + .1, r'$\theta$', size=20, color='r')

plt.figure(6)
np.random.seed(15) # reproducible, to get same sample

# x1 mean at 20 with spread of 6 std of 100 points, multiply
# by tan to get y1, add 3 std for small variation in y1
x1 = np.random.normal(20, 6, 100)
y1 = x1*np.tan(np.pi/5) + np.random.normal(0, 3, 100)

# convert x1 & y1 to pandas dataframe, use xyDF.x1.values ,
# xyDF.y1.values for calculation - less likely to cause error later
xyDF = pd.DataFrame({'x1': x1, 'y1': y1}) # view labeled data easier

# covariance of x1 & y1, notice transpose to match method
covXY = np.cov(xyDF.T) # method default row as variable

# unpacking covariance into eigenvalues and eigenvectors
eigVal, eigVec = np.linalg.eig(covXY)
sig1, sig2 = eigVal**0.5 # singular value or sigma

# initialize new center(u, v) and 3 sigma values of a & b
u, v, a, b = x1.mean(), y1.mean(), 3*sig1, 3*sig2

# default 50 points for parametric equations
theta = np.linspace(0, 2*np.pi)

# angle of 1st eigenvector or principal axis
angle = np.degrees(np.arccos(eigVec[0,0]))

# define an ellipse using parametric eq. at new center(u, v)
# with zero angle as reference, draw as green ellipse later
ell_M  = np.array([u + a*np.cos(theta), v + b*np.sin(theta)])

# red ellipse of a & b at three sigma value from the
# data center, rotated at angle of 1st eigenvector
e1 = mp.Ellipse([u,v], 2*a, 2*b, angle=angle, lw=3, fill=False, color='r')

plt.axhline(ls='--', color='grey') # grey line thru origin
plt.axvline(ls='--', color='grey')

plt.scatter(x1, y1, 100, 'm', edgecolor='y') # plot sample data
plt.plot(ell_M[0,:],ell_M[1,:], 'g--', lw=3) # draw green ellipse
plt.gca().add_artist(e1) # add red ellipse to figure

# draw three sigma lines of data sample (ROI - red ellipse)
sigma3 = np.array([[u,v]]).T + (3*eigVal**0.5) * eigVec
plt.plot([u, sigma3[0,0]], [v, sigma3[1,0]], lw=3, color='b')
plt.plot([u, sigma3[0,1]], [v, sigma3[1,1]], lw=3, color='b')

# move sample center to origin and derotation of red ellipse
plt.figure(7)
plt.axhline(ls='-', color='b', lw=2)
plt.axvline(ls='-', color='b', lw=2)

x1c = x1-x1.mean() # removed mean from x1 matrix sample
y1c = y1-y1.mean() # removed mean from y1 matrix sample

# negative rotation angle of eigenvector
rotNeg = - np.arccos(eigVec[0,0])
# negative rotation matrix
rot_M = np.array([[np.cos(rotNeg), -np.sin(rotNeg)],
                  [np.sin(rotNeg),  np.cos(rotNeg)]])

# transformation of data sample by rotation matrix
el1, el2 = rot_M @ (x1c, y1c)
# plot data sample at new center (0, 0)
plt.scatter(el1, el2, 100, 'm', edgecolor='y')

# green ellipse at origin, after translation & derotation
# of red ellipse
e2  = mp.Ellipse([0,0], 2*a, 2*b, angle=0, lw=3,
                             fill=False, color='g')
plt.gca().add_artist(e2) # add green ellipse to figure 7

plt.text(1, 19, 'PC_2', size=18, color='purple') # label x-y axes
plt.text(23.5, 3, 'PC_1', size=18, color='purple')

#plt.xlabel('PC_1', color='purple', fontsize=18) 
#plt.ylabel('PC_2', color='purple', fontsize=18)
 
plt.axis('equal') # set aspect=1 for xy-axis
plt.show()


================================================================