% Ella Bingham, May 2005 & Feb 2006.
%
% Discard the disconnected rows of the similarity matrix of data rows. Also
% discard the corresponding entries in siteinfo.
%
% Inputs:
% data - a data matrix of sites=rows, taxa=columns.
% siteinfo - a structure containing auxiliary information about the sites.
% similarity_measure - how the similarity matrix is formed. 'dot' for plain
%   dot product similarity, 'wdot' for weighted dot product similarity (takes
%   into account the total number of attribute appearances in the whole data).
%
% Outputs:
% outputdata, outputsiteinfo - same as the input variables, except for the
%   disconnected rows.
%
function [outputdata,outputsiteinfo] = findconnecteddata(data,siteinfo,similarity_measure)

% Only the similarity matrix simm is used here.
[spcoeff,simm,lapm] = laplacian(data,similarity_measure);
n = size(simm,1);

% Square and binarize T repeatedly. After m = ceil(log2(n)) rounds,
% T(i,j) > 0 when sites i and j are joined by a walk of length 2^m >= n.
% If simm has a nonzero diagonal (each site similar to itself), shorter
% walks are retained too, so T then marks all mutually reachable sites.
m = ceil(log2(n));
T = simm;
for i = 1:m
    T = T^2;
    T = (T>0);
end

% Keep the sites that reach at least the average number of sites.
connectedsites = find(sum(T) >= mean(sum(T)));
outputdata = data(connectedsites,:);

% Update the contents of siteinfo correspondingly. Each field is indexed
% by site, so it must hold one entry per row of data.
outputsiteinfo = siteinfo;
names = fieldnames(siteinfo);
if ~isempty(names)
    for field = 1:numel(names)
        fieldcontents = getfield(siteinfo,char(names(field)));
        outputsiteinfo = setfield(outputsiteinfo,char(names(field)),fieldcontents(connectedsites));
    end
end