1. The most widely used program package for easy phylogeny construction is the MEGA software. Read the paper that describes its latest version: Tamura, K. et al., 2007. Be prepared to give a short (~5 minutes) presentation about the paper. The answer to this question, when sending notes to Laura, is a statement that you have read the paper: yes, I have read the paper.
* Tamura K, Dudley J, Nei M & Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24, p. 1596-1599. 2007.
________________________________________________________________________________
2. Download the MEGA4 software to your computer from http://www.megasoftware.net/mega.html and find out what kinds of phylogenies can be constructed with this program package. A rough general classification of the methods is: parsimony methods, distance matrix methods, maximum likelihood methods and Bayesian methods. Phylogenetic trees have topologies and branching patterns, but these are uncertain. What is MEGA's method for inferring statistical confidence or credibility (i.e. how is the uncertainty assessed)?
* MEGA's help menu -> Appendix B -> Main MEGA Window -> Phylogeny Menu -> Construct Phylogeny: Phylogenies can be constructed by Neighbor-Joining (NJ), Minimum Evolution (ME), Maximum Parsimony (MP), and the Unweighted Pair Group Method with Arithmetic Means (UPGMA). NJ is a simplified version of the ME method. For constructing an MP tree, only sites at which there are at least two different kinds of nucleotides or amino acids, each represented at least twice, are used (the parsimony-informative sites). MEGA estimates MP tree branch lengths by using the average pathway method for unrooted trees. UPGMA assumes that the rate of nucleotide or amino acid substitution is the same for all evolutionary lineages. UPGMA is an algorithm that builds the tree from a pairwise distance matrix; it is also known as agglomerative hierarchical clustering with average linkage.
* Parsimony methods employ the idea that the tree requiring the fewest mutations to relate the sequences is the preferred one. The total parsimony cost of a tree is the sum of the parsimony scores of all the sites. The method in its simplest form computes the parsimony cost of all possible trees and chooses the minimum-cost tree. For larger trees, a variety of heuristic search methods are used to try to identify the best trees without examining all of them. (R.C. Deonier, S. Tavare, M.S. Waterman: Computational Genome Analysis. Springer, 2005, Chapter 12)
* MP is a parsimony method.
* ME, NJ, and UPGMA are distance matrix methods.
* MEGA's help menu -> Appendix B -> Main MEGA Window -> Phylogeny Menu: Bootstrap Test of Phylogeny: One of the most commonly used tests of the reliability of an inferred tree. A new set of sequences is constructed by randomly sampling the sites of the original sequences with replacement. A tree is then reconstructed from these new sequences using the same tree-building method as before. Next, the topology of this tree is compared to that of the original tree. Each interior branch of the original tree that partitions the sequences differently from the bootstrap tree is given a score of 0; all other interior branches are given the value 1. This procedure of resampling the sites and reconstructing the tree is repeated several hundred times, and the percentage of times each interior branch is given a value of 1 is noted. This is known as the bootstrap value. As a general rule, if the bootstrap value for a given interior branch is 95% or higher, then the topology at that branch is considered "correct". The Bootstrap Test is available for NJ, ME, MP, and UPGMA.
* MEGA's help menu -> Appendix B -> Main MEGA Window -> Phylogeny Menu: Interior Branch Test of Phylogeny: A t-test, which is computed using the bootstrap procedure, is constructed based on the interior branch length and its standard error.
  If the confidence probability is greater than 95% for a given branch, then the inferred length for that branch is considered significantly positive. The Interior Branch Test is available only for NJ and ME trees.
________________________________________________________________________________
3. When you inspect the contents of MEGA4, you will notice that it includes a set of example datasets. Select one of them and perform neighbor-joining, UPGMA and parsimony phylogeny analyses, both without bootstrapping and with 1000 bootstrap replicates. Please read the MEGA tutorial ("help"). Skip most of the options, e.g. nucleotide substitution models, patterns etc., as they are outside the scope of this Introduction to bioinformatics course; just use the default settings. Describe what kind of differences you find between the parsimony, neighbor-joining and UPGMA phylogenies, as well as between no bootstrapping and 1000 x bootstrapping.
* Example datasets: C:\Program Files\MEGA 4\Examples
* see Drosophila.pdf, Chloroplast.pdf, and Crab.pdf
________________________________________________________________________________
4. M27325 refers to a gene sequence in a cow. Collect a dataset from ~15-20 animal species, perform phylogeny analyses with MEGA, and describe the results. Which animals (on the basis of this gene) form clear and statistically supported clusters, and which animals are clustered together but not in a statistically supported way? Note that once you have collected the data, you must align the sequences and construct a file in a format accepted by MEGA. You will find all the instructions in the MEGA tutorial. Note also that MEGA4 includes an alignment facility. You can also do the alignment by other means (Clustal as a web service, or Clustal on your own computer if you have already downloaded it for other purposes). Aligning with the Clustal included in MEGA is sometimes difficult in practice, although it should be easy.
* NCBI -> Nucleotide -> "GH growth hormone" with limits: field=title, molecule=mRNA
* save into one FASTA file
* see GH_mod.fasta
* pull file into MEGA
* align by ClustalW
* save alignment in MEGA format
* pull alignment into MEGA
* perform phylogeny analyses
* see GH.pdf
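
Whether a cluster counts as "statistically supported" in question 4 depends on the bootstrap values, so the resampling idea behind the Bootstrap Test may be easier to follow with a small sketch. The Python code below is only an illustration under stated assumptions, not MEGA's implementation: the toy alignment and the names taxon_A to taxon_D are hypothetical, and instead of rebuilding a full tree for each replicate it only checks whether the closest pair of sequences (by p-distance) is recovered, which is a deliberately simplified stand-in for comparing interior branches.

```python
import random
from itertools import combinations

# Hypothetical toy alignment (equal-length sequences); not the GH data.
ALIGNMENT = {
    "taxon_A": "ACGTACGTACGTACGTACGT",
    "taxon_B": "ACGTACGTACGTACGTACGA",
    "taxon_C": "ACGAACGTTCGTACCTACGA",
    "taxon_D": "TCGAACGTTCGTACCTAGGA",
}


def bootstrap_replicate(alignment, rng):
    """Build one pseudo-alignment by sampling sites (columns) with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which the two sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def closest_pair(alignment):
    """Pair of names with the smallest p-distance (the first pair wins ties)."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


def pair_support(alignment, n_replicates=1000, seed=0):
    """Fraction of replicates in which the original closest pair is recovered."""
    rng = random.Random(seed)
    original = closest_pair(alignment)
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == original
        for _ in range(n_replicates)
    )
    return original, hits / n_replicates


if __name__ == "__main__":
    pair, support = pair_support(ALIGNMENT)
    print(pair, f"{100 * support:.1f}%")
```

The printed percentage plays the role of a bootstrap value for that one pair. In MEGA the same counting is done for every interior branch of the tree rebuilt from each replicate, so the numbers from this sketch should not be expected to match MEGA's output.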