The Matlab codes in this directory are written for the spectral ordering of
palaeontological data (sites and taxa).

Spectral ordering of palaeontological data is described in:

Mikael Fortelius, Aristides Gionis, Jukka Jernvall and Heikki Mannila:
"Spectral Ordering and Biochronology of European Fossil Mammals".
Paleobiology 2006 (in press).

The Matlab codes are written by Ella Bingham and Heikki Mannila, 2004-2006.


HOW TO USE THE CODES:

spectral_analysis.m is the main program which invokes other programs,
except for sort_taxa.m which can be invoked thereafter if desired.
These two programs are described in the following.


spectral_analysis.m
===================

Spectral ordering of data rows. It is called in Matlab as

[new_data,new_taxoninfo,new_siteinfo,sorted_data,sorted_siteinfo,spcoeff] = spectral_analysis(data,taxoninfo,siteinfo,similarity_measure,outlier_criterion);

Input variables:
---------------
data - a data matrix with sites as rows and taxa as columns.

taxoninfo, siteinfo - structures containing auxiliary information about
    those taxa and sites that are present in the data matrix, in the same
    order as in the data matrix. Can be user-defined or empty; if empty
    please use
        taxoninfo = struct([]), siteinfo = struct([]).
    Note: it is advisable to index the taxa and sites somehow, to keep
    track of what happens, instead of using empty taxoninfo and siteinfo.
    A simple way is
        taxoninfo = struct('taxonnumber',1:size(data,2)),
        siteinfo = struct('sitenumber',1:size(data,1)).

similarity_measure - how the similarity matrix is formed.
    'cos' for cosine similarity,
    'wcos' for weighted cosine similarity,
    'dot' for plain dot product similarity.

outlier_criterion - a coefficient c such that an observation is discarded
    if its value in the 2nd smallest eigenvector (called "spcoeff" here)
    of the Laplacian matrix is more than c*std(spcoeff) apart from
    mean(spcoeff).

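Illustration (not part of the distributed codes): the following is a minimal
sketch of how the inputs described above fit together, namely a cosine
similarity matrix, its Laplacian, the eigenvector of the 2nd smallest
eigenvalue, and the outlier criterion. It is not the code in
spectral_analysis.m. The toy matrix, the variable names and the use of plain
'cos' similarity (the weighting behind 'wcos' is not described in this file)
are assumptions made for the example.

```matlab
% Toy presence/absence matrix (hypothetical): 6 sites (rows) x 5 taxa (columns).
data = [1 1 0 0 0;
        1 1 1 0 0;
        0 1 1 0 0;
        0 1 1 1 0;
        0 0 1 1 1;
        0 0 0 1 1];

% Cosine similarity between sites: scale every row to unit length,
% then take dot products between rows.
rownorms  = sqrt(sum(data.^2, 2));
unit_rows = data ./ repmat(rownorms, 1, size(data, 2));
S = unit_rows * unit_rows';

% Graph Laplacian L = D - S, where D is the diagonal matrix of row sums of S.
L = diag(sum(S, 2)) - S;
L = (L + L') / 2;              % remove round-off asymmetry before eig

% Eigenvector belonging to the 2nd smallest eigenvalue of L.
[V, E] = eig(L);
[~, idx] = sort(diag(E));      % sort eigenvalues in ascending order
spcoeff = V(:, idx(2));

% Outlier criterion: keep a site only if its coefficient lies within
% c standard deviations of the mean coefficient.
c = 3;
keep = abs(spcoeff - mean(spcoeff)) <= c * std(spcoeff);

% Sort the remaining sites by their spectral coefficients.
kept = find(keep);
[~, order] = sort(spcoeff(kept));
sorted_data = data(kept(order), :);
```

The sign of an eigenvector is arbitrary, so the resulting order may come out
reversed from one run or platform to another.
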
Output variables:
----------------
new_data, new_taxoninfo, new_siteinfo - reduced versions of the
    corresponding input arguments, resulting from outlier removal.
    Not sorted according to spcoeff.

sorted_data, sorted_siteinfo - reduced versions of the corresponding input
    arguments, resulting from outlier removal. Rows (sites) of sorted_data
    and entries of sorted_siteinfo are sorted according to the spectral
    coefficients of the sites.

spcoeff - vector of spectral coefficients of sites; eigenvector
    corresponding to the 2nd smallest eigenvalue of the Laplacian matrix.

Example data:
------------
Files datag1010NOW.mat, taxoninfog1010NOW.mat and siteinfog1010NOW.mat
give examples of input data (124 sites, 139 taxa) and input structures
containing the names of taxa and sites in the input data.

The data are from the NOW database (public release 030717):
Fortelius, M. (coordinator). Neogene of the Old World. Database of Fossil
Mammals (NOW). University of Helsinki. http://www.helsinki.fi/science/now/.

Example:

[new_data,new_taxoninfo,new_siteinfo,sorted_data,sorted_siteinfo,spcoeff] = spectral_analysis(datag1010NOW,taxoninfog1010NOW,siteinfog1010NOW,'wcos',3);


sort_taxa.m
===========

Ordering of data columns (taxa), according to a user-chosen method.
It is called in Matlab as

[columnsorted_data,sorted_taxoninfo] = sort_taxa(data,taxoninfo,sorting_method,similarity_measure)

Input variables:
----------------
data - a data matrix with sites as rows and taxa as columns. It is
    advisable (but not necessary) to have the rows sorted, as the columns
    will be sorted here, and a figure of sorted data will be drawn.

taxoninfo - auxiliary information about taxa present in the data matrix.

sorting_method - 'spectral' for spectral ordering of taxa,
    'firstocc' for ordering based on the first occurrence of the taxon in
    the input data (assumes the rows of input data are sorted somehow).
    NOTES:
    (1) In spectral ordering, some taxa may be outliers due to an
        almost-disconnected nature of the data matrix; the outliers are
        not removed here.
    (2) 'spectral' and 'firstocc' may return opposite orderings, as the
        spectral ordering does not distinguish the direction of time.

similarity_measure - 'wdot' or 'dot' if "sorting_method" was chosen as
    'spectral'; otherwise it is ignored and you can use
    similarity_measure = [].

Output variables:
-----------------
columnsorted_data, sorted_taxoninfo - data and auxiliary information, in
    the same format as in the input, but taxa (columns of data; entries of
    taxoninfo) are sorted.

Example: (uses the outputs of spectral_analysis.m)

[columnsorted_data,sorted_taxoninfo] = sort_taxa(sorted_data,new_taxoninfo,'spectral','wdot')

====================================================================
March 2006, Ella Bingham, ella@iki.fi
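
Illustration (not part of the distributed codes): the 'firstocc' ordering
described for sort_taxa.m can be pictured with the sketch below, which sorts
the columns of a small matrix by the first row in which each taxon occurs.
It is not the code in sort_taxa.m. The toy matrix, the variable names, the
tie handling (ties keep their input order) and the treatment of taxa that
never occur (placed last) are assumptions made for the example.

```matlab
% Toy matrix (hypothetical): 3 sites (rows, assumed already sorted) x 4 taxa.
data = [0 1 0 1;
        1 1 0 0;
        1 0 1 0];

[nsites, ntaxa] = size(data);

% For every taxon, find the first row (site) in which it occurs.
firstocc = zeros(1, ntaxa);
for j = 1:ntaxa
    r = find(data(:, j), 1);       % index of the first non-zero entry
    if isempty(r)
        r = nsites + 1;            % assumption: taxa that never occur go last
    end
    firstocc(j) = r;
end

% Sort the columns by first occurrence; equal values keep their input order.
[~, order] = sort(firstocc);
columnsorted_data = data(:, order);
```

With this toy matrix, taxa 2 and 4 first occur at site 1, taxon 1 at site 2
and taxon 3 at site 3, so the columns are rearranged in the order 2, 4, 1, 3.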