Fragment Identificator 1.20 beta (Jul 2006)
http://www.cs.helsinki.fi/group/sysfys/software/fragid/


DESCRIPTION
-----------

Fragment Identificator (FiD) is a tool that assists mass spectrometrists in
determining the structures that correspond to the peaks of a tandem mass
spectrum of a known molecule. FiD incorporates an algorithm which computes all
possible fragment structures for each peak without using predefined
fragmentation rules.


REQUIREMENTS
------------

- Windows 2000 or newer
- (Windows 98 not tested, might work)
- Internet Explorer 6.0 or newer


INSTALLATION
------------

To install, unzip the package into a folder and run 'fragmenter.exe'.

The package includes the KEGG ligand compound database (~14k molecules) as a
single file (~11MB) instead of separate mol files.


FILES
-----

+ FiD/
  - fragmenter.exe
  - leda_md.dll        = Leda 5.1 library
  - atomic_weights.txt = exact masses of atoms
  - bondenergies.txt   = standard covalent energies for bonds
  - compoundparser     = KEGG compound file in FiD-trimmed format
  - readme.txt         = this help file
  - preferences.txt    = help file explaining preferences
  + pidc/              = folder for PIDC-compatible result fragment files
    - place-holder.txt
  + Images/
    - dot.exe          = Graphviz' dot 2.8
  + MS files/          = spectrum peak definition files
    - C00041.txt       = example ms/ms peaks for amino acids
    - C00047.txt       = ...
    - ...


QUICK USE CASE
--------------

To get acquainted with FiD quickly, follow this example procedure:

1) Choose 'L-Lysine' (C00047) from the molecule list, either by browsing the
   list or by using the search
2) Load Lysine's peaks from the file 'MS files/C00047.txt' using the
   'Load from file' button
3) Start the fragment search with the 'Calculate fragments' button
4) Browse the generated fragments
5) Start the solution search with the 'Calculate solution' button
6) Visualize the resulting fragmentation tree by first generating an image of
   it with the 'Draw Tree' button and then clicking 'View Tree'


INPUT
-----

FiD takes as input a molecule and its accompanying mass spectrum.
Spectrum data is tied to the selected molecule, so a molecule has to be
selected first.

A molecule can be entered:

- By choosing it from the molecule list. The list contains release 39 of the
  KEGG ligand compound database (around 14k organic molecules). It includes a
  search function which looks for the search string in all fields of each
  molecule (name, synonyms, CAS, ligand id number). The search uses AND logic
  whenever several words are given. The 'Clear' button clears the search
  results and restores the whole list. The molecule list can be customized
  from Preferences (F5) to include or exclude any of the four fields: name,
  synonyms, CAS number, KEGG id.

- From external MOL files (File->Open MOL file). The mol file can be either in
  the standard MDL format or in the slightly different mol format used by
  KEGG. Properties blocks (blocks and lines starting with "M") are not
  supported. FiD reads only the header, atom and bond blocks.

A mass spectrum can be entered:

- By inserting the masses and intensities by hand using the 'Add' button. The
  'Remove' button removes masses.

  NOTE: Masses (m/z ratios) can be entered with arbitrary accuracy. By default
  the application uses integer accuracy, which means a tolerance of 0.5 around
  the given mass in both directions. E.g. for a given mass of 120.34, all
  fragments with a mass in the range
  [120.34-0.5; 120.34+0.5] = [119.84; 120.84] are accepted as candidates.
  Mass accuracy can be changed from Preferences (F5) using the 'Mass accuracy'
  toggle.

  The Intensity Threshold in the preferences excludes from computations all
  mass/intensity pairs whose intensity is below the threshold. This is useful
  when inserting complete spectrum data with many low peaks.

- From a text file using the 'Load from file' button. The file format is
  simple:
    - Blank rows and rows starting with '#' (without apostrophes) are ignored.
    - All other rows are treated as (mass, intensity) pairs, which can be
      separated by space(s) or tab(s).

Information about the selected molecule is shown in the information panel.
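
The peak file format and the default integer mass accuracy described above can
be illustrated with a short Python sketch. This is not FiD's own code: the
function names (parse_peaks, mass_window, matches) are hypothetical, the sample
peak values are made up, and it is an assumption that a peak whose intensity
equals the threshold is kept.

```python
def parse_peaks(text, intensity_threshold=0.0):
    """Parse peak-file text into a list of (mass, intensity) tuples.

    Blank rows and rows starting with '#' are skipped; the two values on
    every other row may be separated by spaces or tabs.
    """
    peaks = []
    for row in text.splitlines():
        row = row.strip()
        if not row or row.startswith("#"):
            continue
        fields = row.split()  # splits on any run of spaces or tabs
        mass, intensity = float(fields[0]), float(fields[1])
        # Assumption: peaks strictly below the threshold are dropped.
        if intensity >= intensity_threshold:
            peaks.append((mass, intensity))
    return peaks


def mass_window(mass, tolerance=0.5):
    """Return the (low, high) range of fragment masses accepted for a peak."""
    return (mass - tolerance, mass + tolerance)


def matches(peak_mass, fragment_mass, tolerance=0.5):
    """Check whether a fragment mass falls inside the peak's mass window."""
    low, high = mass_window(peak_mass, tolerance)
    return low <= fragment_mass <= high


# Made-up sample rows, not real measurements.
sample = "# mass intensity\n\n84.1 100\n130.0\t45.5\n147.1   3\n"
peaks = parse_peaks(sample, intensity_threshold=5)
```

With the default tolerance, mass_window(120.34) corresponds to the
[119.84; 120.84] range given in the note above.
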
Molecules are visualized based on the coordinates defined in the mol file. The
visualization can be customized in the Preferences (F5) by selecting 'Show
Carbon ID's' to show carbon indices or by ticking 'Show Hydrogen index' to show
how many implicit hydrogens each atom contains.


OUTPUT
------

Action->Submit saves the chosen fragments in a PIDC-compatible format.
http://www.cs.helsinki.fi/group/sysfys/software/pidc/


USAGE
-----

After choosing a molecule and inserting its peaks, there are two ways to
proceed:

- 'Calculate fragments' button
- 'Calculate solution' button

Both actions are available from the menu, from the toolbar and from the buttons
in the bottom right corner.

Fragment calculation computes all possible fragments matching the given peaks
and the peak accuracy value. This might take a while.

The user can select the correct fragments for each peak in the fragment list
by hand, or automatically with the 'Calculate solution' button, which
constructs the MILP model selected in the Preferences (F5) and solves it using
the lpsolve module. The MILP solver finds the optimal set of fragments covering
all peaks according to the model's cost function.

There are three MILP models to choose from:

- Energy Threshold, where the bond cost function is shared: a cleaved bond can
  be freely cut in every fragment
- Multistep, where a fragmentation graph is formed and the least-cost subgraph
  that explains the whole spectrum is chosen
- H-Optimized Multistep, which optimizes the hydrogen atoms. (Recommended)

If 'H-Optimized Multistep' is computationally too expensive, the 'Energy
Threshold' model is recommended instead.

If the fragments have not been calculated yet, they are computed automatically
as a preliminary step. All calculations can be stopped with the Stop button.

After calculating or choosing the fragments for the peaks, the fragmentation
graph can be rendered as an image with the 'Draw Tree' and 'View Tree' buttons.
'Draw Tree' uses Graphviz to draw the fragmentation tree into a file.
'View Tree' opens this file in the default Windows image viewer application.


PUBLICATIONS
------------

- Heinonen, M., Rantanen, A., Mielikäinen, T., Pitkänen, E., Kokkonen, J.,
  Rousu, J. Ab Initio Prediction of Molecular Fragments from Tandem Mass
  Spectrometry Data. German Conference on Bioinformatics 2006 (GCB'06),
  accepted.


CREDITS
-------

FiD was produced at the University of Helsinki, Finland, in the SYSFYS project
(Experimental and computational analysis of physiological regulation at
transcriptome, proteome and metabolome level).

Development, testing
- Markus Heinonen

Ideas and support
- Ari Rantanen
- Taneli Mielikäinen
- Esa Pitkänen
- Juho Rousu

Chemical expertise
- Juha Kokkonen
- Jari Kiuru
- Virpi Tarkiainen


CONTACT
-------

Comments, suggestions, bug reports, etc. are eagerly received at
markus.heinonen@cs.helsinki.fi