1. In PubMed, search for articles published during the last three months about whole genome sequencing.

   o How many articles discuss the human genome?
     Fewer than 14 (search 'whole genome sequencing' with limits: published in the last 90 days, humans). For the exact number, check whether each of the 14 articles is really about whole genome sequencing in humans.

   o You may notice that articles concerning the human genome focus on human diseases. What kinds of diseases?
     Leukemia, rotavirus infection, colorectal cancer, liver abscess, meningitis, diabetes, papillomavirus infection.

   o List organisms other than human for which whole genome sequencing has been published during the last three months. (You don't have to give a complete list; just show that you understand the question.)
     Red fox, American mink, wild fish, Drosophila melanogaster, Schistosoma mansoni.

________________________________________________________________________________

2. By using NCBI facilities, find out:

   o For how many eukaryotic species have genome sequences been completed?
     all databases -> genome project -> statistics -> 22
     all databases -> genome project -> eukaryotic projects -> 24, but there are two projects for Drosophila melanogaster and two projects for Oryza sativa Japonica Group.

   o In addition, there are eukaryote genome sequences under "assembly" or "in progress". How many?
     Assembly: 187 (277 projects)
     In progress: 173 (333 projects)

   o Explain briefly the difference between completed, under assembly, and in progress.
     See http://www.ncbi.nlm.nih.gov/genomes/static/gprj_help.html#seq_info (all databases -> genome project -> help -> Properties of Eukaryotic Genome Sequencing Projects Table -> Sequence Information -> Status).
     The Status property refers to the current stage of the sequencing project.
     Possible values for this property are Complete, which typically means that each chromosome is represented by a single scaffold of very high sequence quality; Assembly, which typically means that scaffolds have been constructed that are not yet at the chromosome level and/or are of draft sequence quality; and In Progress, which indicates that either the sequencing project is at the pre-assembly stage or the assembled/completed sequences have not yet been submitted to GenBank/EMBL/DDBJ.

________________________________________________________________________________

3. By using NCBI, find out and explain briefly the statuses of the cattle (Bos taurus) and horse (Equus caballus) genome projects. Are they published?

   Bos taurus: assembly, published, release date: 01.10.2004
   Equus caballus: assembly, published, release date: 12.01.2007

________________________________________________________________________________

4. The accession number of the dog insulin gene sequence is NM_001130093. On the basis of this information, construct a FASTA file consisting of dog, cat, swine, human, chimpanzee and rabbit insulin sequences.

   Nucleotide -> insulin (INS)
   dog: Canis lupus familiaris, NM_001130093
   cat: Felis catus, NM_001009272
   swine: Sus scrofa domestica, NM_001109772
   human: Homo sapiens, NM_000207
   chimpanzee: Pan troglodytes, NM_001008996
   rabbit: Oryctolagus cuniculus, NM_001082335

   Align the sequences by using Clustal. One possibility is to use one of the free servers, for example http://www.ebi.ac.uk/Tools/clustalw2/index.html; another is to download Clustal (X or W) to your own computer from http://www.clustal.org/. Your answer should include only a brief explanation of what you have done and a general description of the alignment, for example how many nucleotide (base) differences there are between human, chimpanzee, rabbit, etc. Please do not send the alignment by e-mail, but be prepared to show it in class.
   When you perform the alignment, use the default settings and do not pay attention to the various scoring, gap penalty and other options. These will be taught during the lectures. In the present exercise you are just familiarizing yourself a little with alignment as a practical concept.
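
The base differences asked for in exercise 4 can also be counted with a short script instead of by eye. The following is only a minimal sketch, assuming the Clustal alignment has been saved or converted to aligned FASTA format (equal-length sequences with "-" for gaps). The function names and the toy sequences are hypothetical illustrations, not real insulin data, and a column with a gap in one sequence is counted here as a difference, which is one possible convention among several.

```python
from itertools import combinations


def parse_fasta(text):
    """Parse FASTA text into a dict: first word of each header -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def count_differences(seq_a, seq_b):
    """Count alignment columns in which the two aligned sequences differ.

    Assumption: a gap ("-") opposite a base counts as one difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_differences(aligned):
    """Return {(name1, name2): number of differing columns} for every pair."""
    return {
        (m, n): count_differences(aligned[m], aligned[n])
        for m, n in combinations(aligned, 2)
    }


# Toy aligned sequences for illustration only (NOT real insulin sequences).
TOY_ALIGNMENT = """>human
ATGGCCCTGTGG
>chimpanzee
ATGGCCCTGTGG
>rabbit
ATGGCCTCC-GG
"""

if __name__ == "__main__":
    aligned = parse_fasta(TOY_ALIGNMENT)
    for (first, second), diffs in pairwise_differences(aligned).items():
        print(f"{first} vs {second}: {diffs} differing columns")
```

To use it on the real data, the contents of the aligned FASTA file would be read into a string and passed to parse_fasta in place of TOY_ALIGNMENT.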