% Ella Bingham, Feb 2006. Partially based on Heikki Mannila's old codes.
%
% Computes the similarity matrix and the Laplacian matrix and its eigenvalue
% decomposition. Returns the eigenvector corresponding to the 2nd smallest
% eigenvalue, and the similarity and Laplacian matrices.
%
% Inputs:
%
% data - data matrix, rows=observations, columns=attributes.
%
% similarity_measure - 'dot' for plain dot product similarity, 'wdot' for
% weighted dot product similarity (takes into account the total number of
% attribute appearances in the whole data), 'cos' for cosine similarity (not
% recommended as we don't know its behaviour well enough).
%
% Outputs:
%
% spcoeff - spectral coefficients; eigenvector corresponding to the 2nd
% smallest eigenvalue of the Laplacian matrix
%
% simm - similarity matrix of rows of data
%
% lapm - Laplacian matrix
%
function [spcoeff,simm,lapm] = laplacian(data,similarity_measure)

simm = data*data';      %dot products of rows of data
x = size(simm,1);       %number of observations (simm is x-by-x)
sc = sum(simm);         %column sums of the unnormalised similarity matrix
rowsum = sum(data,2)';  %number of taxa per site

for i=1:x
  for j=1:x
    switch similarity_measure
      case 'cos'
        %cosine similarity of data rows: dot product divided by row
        %lengths. rowsum equals the squared row length only when data is
        %binary (0/1). Not recommended as we don't know its behaviour
        %well enough
        simm(i,j) = simm(i,j)/sqrt(rowsum(i)*rowsum(j));
      case 'wdot'
        %weighted dot product of data rows: dot product divided by
        %weighted row lengths; weight = total number of appearances of
        %those mammals that appear at the row. Used in the Spectral paper.
        simm(i,j) = simm(i,j)/sqrt(sc(j)*sc(i));
      case 'dot'
        %plain dot product similarity: simm(i,j) is left unchanged
      otherwise
        error('Unknown similarity_measure; use ''dot'', ''wdot'' or ''cos''.');
    end
  end
end

% The diagonal elements of the Laplacian are chosen so that the rows sum to 0.
% Otherwise they do not matter, and neither do the diagonal elements of
% simm.
lapm = diag(sum(simm,2))-simm;

% eig does not always return eigenvalues in sorted order, so sort them
% explicitly before picking the eigenvector of the 2nd smallest one.
[veig,val] = eig(lapm);
[sortedval,order] = sort(diag(val));
spcoeff = veig(:,order(2));
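
% Example usage: a minimal sketch, kept as comments so that this file
% remains a valid function file. The 4-by-4 presence/absence matrix below
% is made up for illustration only and does not come from any real data
% set; the variable name 'order' is likewise only illustrative.
%
%   data = [1 1 0 0; 1 1 1 0; 0 0 1 1; 0 1 1 1];
%   [spcoeff,simm,lapm] = laplacian(data,'wdot');
%   [sortedcoeff,order] = sort(spcoeff); %rows ordered by spectral coefficient
%
% The sign of an eigenvector is arbitrary, so the resulting ordering may
% come out reversed from one run or platform to another.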