\n", "**(b)**$\\quad$ The covariance of the data is then calculated, this is an $m\\times m$ symmetrical matrix\n", "\n", "$$\\boldsymbol C=\\frac{1}{n-1}\\boldsymbol{XX^T}\\qquad\\tag{52}$$\n", "\n", "The superscript $T$ is the transpose, and the matrix multiplication has dimensions $(m \\times n)\\cdot(n \\times m) = m \\times m)$. (The $n-1$ makes an unbiased estimate). The entries in this matrix would be diagonal if the data were random, because each value is then independent of every other. However, this is very unlikely to be the case unless the data contains only random noise; therefore, off-diagonal terms are not zero. The covariance describes how much the data is spread among the axes. If the data were to lie only along each axis the covariance would be diagonal, the rotation of axes accompanying the principal components reduces the off-diagonal component in the covariance.\n", " \n", "**(c)**$\\quad$ The m eigenvalues $\\lambda$ and (column) eigenvectors $V$ ($m\\times m$ matrix) of the covariance matrix $C$ are calculated and the eigenvalues are sorted from largest to smallest and the eigenvectors are placed in the same order see chapter 7.12.3. The eigenvalue-eigenvector equation is $\\boldsymbol{CV} = \\boldsymbol{\\Lambda V}$ where $\\Lambda$ is the diagonal matrix of the eigenvalues $\\lambda$, i.e. a vector. It is assumed that the calculation produces eigenvectors that are normalized, if not they should be normalized by dividing each column by its vector length $\\vec V_i\\to \\vec V_i/\\sqrt{\\vec V_i\\cdot \\vec V_i}$ where $\\vec V_i$ is one column vector. The eigenvectors form the new principal component basis set because they are orthogonal one to another.\n", "\n", "The eigenvector of the largest eigenvalue forms the principal component of the data, smaller values the other principal components. The largest contains most of the variance or spread of the data and best describes any trend in the data. 
The smaller eigenvalues do so to lesser extents depending on their size. The largest $s$ eigenvalues are then chosen to approximate the data. The eigenvalues correspond to variances along the principal axes, thus the fractional value of each eigenvalue relative to the sum of them all is\n", "\n", "$$\\displaystyle \\lambda_i/ \\sum_i^m\\lambda_i$$\n", "\n", "and this is the fraction that eigenvalue $i$ contributes to the data. A measure of the error made by approximating the data by $s$ principal components instead of the total number $m$ is the residual variance,\n", "\n", "$$\\displaystyle \\sigma^2=1-\\frac{\\sum_i^s \\lambda_i}{\\sum_i^m \\lambda_i}$$\n", " \n", "For example, if the first two eigenvalues describe $85$% of the data, it may be appropriate to ignore all the others. As a rule of thumb, when selecting data to eliminate, eigenvalues less than $\\approx 1$ can be ignored.\n", "\n", "**(d)**$\\quad$ To project the data points onto the new axes, the principal components, the dot product of the data $\\boldsymbol X$ is made with each eigenvector $\\boldsymbol V$,\n", "\n", "$$\\displaystyle \\boldsymbol Y = \\boldsymbol V^T\\boldsymbol X^T\\qquad\\tag{53}$$\n", "\n", "The matrix dimensions are $(m \\times m)\\cdot(m \\times n) = (m \\times n)$. Plotting one row of matrix $Y$ against another produces the data along the various principal component axes, as shown on the right of figure 16. The first row is $p_1$, the second, $p_2$, etc. Because the eigenvector (modal) matrix $V$ is square and orthogonal, the relationship $\\boldsymbol V^{-1} = \\boldsymbol V^T$ can be used to reform the original data. 
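The eigenvalue fractions, the residual variance, and the projection of equation 53 can be sketched as follows (a minimal sketch on invented mean-centred data; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 10
X = rng.normal(size=(m, n))            # invented data, variables in rows
X -= X.mean(axis=1, keepdims=True)     # mean-centre each variable

C = X @ X.T / (n - 1)                  # covariance, equation 52
lam, V = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]          # largest eigenvalue first
lam, V = lam[order], V[:, order]

frac  = lam / lam.sum()                # fraction each eigenvalue contributes
s = 1
resid = 1 - lam[:s].sum() / lam.sum()  # residual variance with s components

Y = V.T @ X                            # equation 53 (X here already has variables in rows)
```

The rows of `Y` are the data along the principal axes; their covariance is diagonal, which is the point of the rotation.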
The transformation steps are to left multiply both sides of the equation with $\\boldsymbol V$ giving \n", "\n", "$$\\displaystyle \\boldsymbol{VV}^T\\boldsymbol X^T = \\boldsymbol{VY}$$\n", "\n", "Simplifying the left-hand side produces \n", "\n", "$$\\displaystyle \\boldsymbol{VV}^T\\boldsymbol X^T = \\boldsymbol{VV}^{-1}\\boldsymbol X^T = \\boldsymbol X^T = \\boldsymbol{VY}$$\n", "\n", "and then $\\boldsymbol{VY}$ is transposed to reform $\\boldsymbol X$, or\n", "\n", "$$\\displaystyle \\boldsymbol X=(\\boldsymbol{VY})^T, \\qquad \\boldsymbol D = \\boldsymbol X + \\boldsymbol\\mu\\qquad\\tag{54}$$\n", "\n", "and $\\boldsymbol\\mu$ restores the mean to the data $\\boldsymbol D$.\n", "\n", "**(e)**$\\quad$ When only $s$ of the eigenvalues and eigenvectors are being used, the $V$ matrix is reduced by ignoring all but the first $s$ columns and equation 54 is used to reconstruct the data.\n", "\n", "A Python PCA algorithm is detailed below." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "eigenvalues and eigenvectors\n", "1 4.886993943985875 [ 0.28976365 0.9562666 -0.03989011]\n", "2 0.11490529483430316 [ 0.94367801 -0.2784974 0.17863651]\n", "3 0.08368965006871082 [-0.15971484 0.08940578 0.98310619]\n" ] } ], "source": [ "# Algorithm: Principal component analysis \n", "\n", "import numpy as np\n", "import numpy.linalg as lina\n", "\n", "filename='PCA data.txt' # 3 sets of data 10 values each\n", "# data is at end of book in 'Appendix, some basic Python instructions'\n", "\n", "xv = []\n", "yv = []\n", "zv = []\n", "with open(filename) as ff: # length not known so read in all data and make list of each\n", " for line in ff:\n", " new_str = ' '.join(line.split())\n", " vals = new_str.split(' ')\n", " xv.append(vals[0]) \n", " yv.append(vals[1]) \n", " zv.append(vals[2]) \n", "# file closed automatically by the with statement\n", "\n", "n = len(xv) # get length of data \n", "m = 3 # 3 axes of data\n", "data = np.zeros((n,m),dtype=float) \n", "for i in range(n): \n", " data[i,0] = 
float(xv[i]) # make lists into arrays, data set 0,1,2 with n values each\n", " data[i,1] = float(yv[i])\n", " data[i,2] = float(zv[i])\n", " \n", "# finished reading \n", "\n", "S = 1 # select first S principal components\n", "means = np.zeros((n,m),dtype=float) \n", "X = np.zeros((n,m),dtype=float) \n", "VS = np.zeros((m,m),dtype=float)\n", "Sev = np.zeros(m,dtype=float)\n", "\n", "for k in range(m): # calculate mean value\n", " s = 0\n", " for i in range(n):\n", " s = s + data[i,k]\n", " means[:,k] = s/n \n", "X = data - means # subtract mean \n", "C = (np.transpose(X) @ X)/(n-1) # make covariance matrix ( @ is matrix multiply)\n", "(eigval,eigvec) = lina.eig(C) # eigvals, eigvects of covariance. eigvects normalised\n", "\n", "indx = np.argsort(eigval) \n", "\n", "Sev[:] = eigval[indx[::-1]] # reverse indx array to get largest first\n", "VS[:,:]= eigvec[:,indx[::-1]]\n", "\n", "print('eigenvalues and eigenvectors')\n", "for i in range(m):\n", " print(i+1,Sev[i], VS[:,i]) \n", "\n", "V = np.zeros((m,S)) # do PCA here, DR is the result \n", "V[:,:] = VS[:,0:S] # select S sorted eigenvectors\n", "\n", "Y = np.transpose(V) @ np.transpose(X) # project onto S new axes, equation 53\n", "\n", "DR = np.transpose(V @ Y) + means # DR is the data reconstructed from the first S components" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Drawing](analysis-fig17.png)\n", "\n", "Figure 17. Left: Data plotted with mean values subtracted together with the principal axes. The major principal axis is (1). Right: Data partly reconstructed (also with mean values subtracted for comparison) with the first eigenvector (red) and then the first two eigenvectors combined (light blue).\n", "_______\n", "\n", "The figure shows the data (centred at zero by subtracting the mean values), and the three principal axes. The first eigenvector, (1), the major axis, is drawn as a red line. The lengths are arbitrary. The eigenvalues are $4.9, 0.11, 0.084$. On the right of the figure, two plots are superimposed. 
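The reconstruction of equation 54 can be checked numerically (a minimal sketch with invented data laid out as in the code above, observations in rows; `reconstruct` is an illustrative helper, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 10, 3
D = rng.normal(size=(n, m))            # invented data: n observations, m variables
mu = D.mean(axis=0)
X = D - mu                             # mean-centred

C = X.T @ X / (n - 1)                  # covariance matrix
lam, V = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]          # largest eigenvalue first
V = V[:, order]

def reconstruct(s):
    Vs = V[:, :s]                      # keep only the first s principal components
    Y = Vs.T @ X.T                     # project, equation 53
    return (Vs @ Y).T + mu             # reform and restore the mean, equation 54
```

With `s = m` the original data is recovered exactly; smaller `s` flattens the data onto the retained principal axes, as in figure 17.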
The light blue symbols are the data reconstructed from the first two eigenvectors using equations 53 and 54. The original data is flattened onto the plane of the first two principal axes because the third coordinate is ignored. The second set of data (red symbols) is reconstructed using only the first eigenvector and is a set of points lying along the principal axis at their relative positions when projected from the other axes. If all three eigenvectors are used the original data is reformed exactly.\n", "\n", "Related to PCA is the method of singular value decomposition (SVD). Formally, it is a technique in which one matrix is split into three: one is diagonal and the other two are row and column orthogonal, as are eigenvector modal matrices (Press et al. 1986). The reason for expanding a matrix in this way is that it can then be inverted even when it is close to being singular, i.e. when inverting it would normally produce infinity or a number that approximates this, so that normal inversion methods fail. This method could be used to invert the $A$ matrix in non-linear least squares, in the example given. The interest in SVD, however, lies not primarily in the mathematics but in unravelling complex information, just as in PCA. SVD can be used either directly on the raw data, not the covariance as in PCA, or as a mathematical method to perform PCA (Jolliffe 2002)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 2 }