Fragment Identificator 1.20 beta (Jul 2006)

  http://www.cs.helsinki.fi/group/sysfys/software/fragid/



DESCRIPTION
-----------

Fragment Identificator (FiD) is a tool that assists mass spectrometrists in
assigning structures to the peaks of a tandem mass spectrum of a known
molecule. FiD incorporates an algorithm that computes all possible fragment
structures for each peak without using predefined fragmentation rules.



REQUIREMENTS
------------

- Windows 2000 or newer
- (Windows 98 not tested, might work)
- Internet Explorer 6.0 or newer



INSTALLATION
------------

To install, unzip the package into a folder and run 'fragmenter.exe'.

The package includes the KEGG ligand compound database (~14k molecules) as a
single file (~11MB) instead of separate mol files.



FILES
-----

+ FiD/
     - fragmenter.exe
     - leda_md.dll                = Leda 5.1 library
     - atomic_weights.txt         = exact masses of atoms
     - bondenergies.txt           = standard covalent energies for bonds
     - compoundparser             = KEGG compound file in FiD-trimmed format
     - readme.txt                 = this help file
     - preferences.txt            = help file explaining preferences
     + pidc/                      = folder for PIDC-compatible result fragment
                                    files
          - place-holder.txt
     + Images/
          - dot.exe               = Graphviz' dot 2.8
     + MS files/                  = spectrum peak definition files
          - C00041.txt            = example ms/ms peaks for amino acids
          - C00047.txt            = ...
          - ...


QUICK USE CASE
--------------

To quickly get acquainted with FiD, follow this example procedure:

1) Choose 'L-Lysine' (C00047) from the molecule list, using the list or search
2) Input lysine's peaks from the file 'MS files/C00047.txt' using the 'Load
   from file' button
3) Start the fragment search with the 'Calculate fragments' button
4) Browse the generated fragments
5) Start the solution search with the 'Calculate solution' button
6) Visualize the resulting fragmentation tree by first generating an image of
   it with the 'Draw Tree' button and then clicking 'View Tree'



INPUT
-----

FiD takes as input a molecule and its accompanying mass spectrum.

Spectrum data is tied to the selected molecule, so a molecule has to be
selected first.

A molecule can be input:
 - By choosing from the molecule list. It contains release 39 of the KEGG
 ligand compound database (around 14k organic molecules). The list includes a
 search function which looks for the search string in all fields of each
 molecule (name, synonyms, CAS, ligand id number). The search uses AND logic
 whenever several words are given. The 'Clear' button clears the search results
 and restores the whole list.

 The molecule list can be customized from Preferences (F5) to include/exclude
 any of the four fields: name, synonyms, CAS number, KEGG id.

 - From external MOL files (File->Open MOL file). The MOL file can be in either
 the standard MDL format or the slightly different MOL format used by KEGG.
 Properties blocks (blocks and lines starting with "M") aren't supported; FiD
 only reads the header, atom and bond blocks.
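
As an illustration of what reading only the header, atom and bond blocks can
look like, here is a minimal Python sketch for the standard MDL (V2000) layout.
It is not FiD's own reader: the function names and the sample molecule are
hypothetical, and the KEGG variant of the format is not handled.

```python
def parse_molfile(text):
    """Read the header, atom block and bond block of a V2000 MOL file."""
    lines = text.splitlines()
    header = lines[0:3]              # title, program/timestamp, comment
    counts = lines[3]                # fixed-width counts line
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x = float(line[0:10])
        y = float(line[10:20])
        z = float(line[20:30])
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y, z))

    bonds = []
    first_bond = 4 + n_atoms
    for line in lines[first_bond:first_bond + n_bonds]:
        # 1-based atom indices followed by the bond type
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))

    # Anything after the bond block (e.g. "M  ..." property lines) is ignored.
    return header, atoms, bonds


def make_sample():
    """Build a tiny, made-up three-atom MOL text with exact column widths."""
    atom_rows = [(0.0, 0.0, "C"), (1.5, 0.0, "C"), (2.25, 1.3, "O")]
    bond_rows = [(1, 2, 1), (2, 3, 1)]
    out = ["sample", "  hypothetical", ""]
    out.append(f"{len(atom_rows):3d}{len(bond_rows):3d}  0  0  0  0  0  0  0  0999 V2000")
    for x, y, sym in atom_rows:
        out.append(f"{x:10.4f}{y:10.4f}{0.0:10.4f} {sym:<3} 0  0  0  0")
    for a, b, order in bond_rows:
        out.append(f"{a:3d}{b:3d}{order:3d}  0")
    out.append("M  END")
    return "\n".join(out)


header, atoms, bonds = parse_molfile(make_sample())
```

The sketch slices fixed-width columns instead of splitting on whitespace,
because MOL fields can run together when atom counts reach three digits.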


A mass spectrum can be input:
 - By inserting the masses and intensities by hand using the 'Add' button. The
 'Remove' button removes masses.

 NOTE: Masses (m/z ratios) can be input with arbitrary accuracy. By default the
 application uses integer accuracy, meaning a tolerance of 0.5 around the given
 mass in both directions. E.g. for a given mass 120.34 all fragments with mass
 in the range [120.34-0.5; 120.34+0.5] = [119.84; 120.84] are accepted as
 candidates.

 Mass accuracy can be changed from Preferences (F5) using the 'Mass accuracy'
 toggle.

 The Intensity Threshold in the Preferences excludes from computations all
 mass/intensity pairs whose intensity is lower than the threshold. This is
 useful when inserting complete spectrum data with many low peaks.

 - From a text file using the 'Load from file' button.
 The file format is simple:
  - Rows starting with '#' (without the quotes) and blank rows are ignored.
  - All other rows are treated as (<m/z>,<intensity>) pairs, whose two values
  can be separated by space(s) or tab(s).
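
The peak file format and the intensity threshold described above can be
sketched in a few lines of Python. This is only an illustration, not FiD's own
loader: the function name and the sample values are made up, and ignoring
leading whitespace before '#' is an assumption.

```python
def parse_peaks(text, intensity_threshold=0.0):
    """Return (m/z, intensity) pairs from peak-file text."""
    peaks = []
    for row in text.splitlines():
        row = row.strip()
        if not row or row.startswith("#"):
            continue                      # blank rows and comments are skipped
        fields = row.split()              # splits on spaces and tabs alike
        mz, intensity = float(fields[0]), float(fields[1])
        if intensity >= intensity_threshold:
            peaks.append((mz, intensity)) # keep peaks at or above the threshold
    return peaks


# Invented example data, not taken from the bundled 'MS files' folder.
SAMPLE = "# m/z  intensity\n147.1  100\n\n130.1\t45.5\n84.1   3\n"

all_peaks = parse_peaks(SAMPLE)
strong_peaks = parse_peaks(SAMPLE, intensity_threshold=5.0)
```

With a threshold of 5.0 the low peak at 84.1 is dropped, while the other two
rows are kept.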

Information about the selected molecule is shown in the information panel.
Molecules are drawn using the coordinates defined in the mol file.

The visualization can be customized in the Preferences (F5) by selecting 'Show
Carbon ID's' to show carbon indices or by ticking 'Show Hydrogen index' to show
how many implicit hydrogens each atom has.



OUTPUT
------

Action->Submit saves the chosen fragments in a PIDC-compatible format.

  http://www.cs.helsinki.fi/group/sysfys/software/pidc/



USAGE
-----

After choosing a molecule and inserting its peaks, there are two ways to
proceed:

- Calculate fragments button
- Calculate solution button

Both actions are available from the menu, the toolbar and the buttons in the
bottom right corner.

Fragment calculation computes all possible fragments that match the given peaks
within the configured mass accuracy. This might take a while.
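
How a fragment mass is matched against a peak under the default integer
accuracy (0.5 tolerance, see INPUT) can be sketched as follows. The function
names are hypothetical and the sketch covers only the default tolerance, not
the other 'Mass accuracy' settings.

```python
def mass_window(peak_mass, tolerance=0.5):
    """Return the (low, high) mass range accepted for a peak."""
    return peak_mass - tolerance, peak_mass + tolerance


def matches(fragment_mass, peak_mass, tolerance=0.5):
    """True if the fragment mass lies inside the peak's accepted range."""
    low, high = mass_window(peak_mass, tolerance)
    return low <= fragment_mass <= high


# For the peak 120.34 the accepted range is roughly [119.84; 120.84].
low, high = mass_window(120.34)
candidate_ok = matches(120.1, 120.34)    # inside the window
candidate_bad = matches(121.0, 120.34)   # outside the window
```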

The user can select the correct fragments for each peak in the fragment list by
hand or automatically with the 'Calculate solution' button, which constructs
the MILP model selected in the Preferences (F5) and solves it using the lpsolve
module. The MILP solver finds the optimal set of fragments covering all peaks
according to the model's cost function.

There are three MILP models to choose from:

- Energy Threshold, where the bond cost is shared: a cleaved bond can be
freely cut in every fragment
- Multistep, where a fragmentation graph is formed and the least-cost subgraph
that explains the whole spectrum is chosen
- H-Optimized Multistep, which optimizes the hydrogen atoms. (Recommended)

If 'H-Optimized Multistep' is computationally too expensive, the 'Energy
Threshold' model is recommended instead.
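
To give a feel for what "an optimal set of fragments covering all peaks" means,
the Python sketch below picks one candidate fragment per peak so that the total
cost of the cleaved bonds is minimal, counting each cleaved bond only once.
This is a loose reading of the shared-cost idea and does not reproduce any of
FiD's three models. It uses exhaustive enumeration instead of a MILP solver,
and all names, bond labels and energies are hypothetical.

```python
from itertools import product


def best_assignment(candidates, bond_cost):
    """candidates maps each peak to a list of cleaved-bond sets, one set
    per candidate fragment. Returns (peak -> chosen set, total cost)."""
    peaks = sorted(candidates)
    best, best_cost = None, float("inf")
    # Try every combination of one candidate fragment per peak.
    for choice in product(*(candidates[p] for p in peaks)):
        cleaved = set().union(*choice)          # shared bonds count once
        cost = sum(bond_cost[b] for b in cleaved)
        if cost < best_cost:
            best, best_cost = dict(zip(peaks, choice)), cost
    return best, best_cost


# Hypothetical bond energies and candidate fragments for two peaks.
bond_cost = {"b1": 3.0, "b2": 4.0, "b3": 1.0}
candidates = {
    84.0: [frozenset({"b1"}), frozenset({"b2", "b3"})],
    130.0: [frozenset({"b1", "b3"}), frozenset({"b2"})],
}
assignment, total = best_assignment(candidates, bond_cost)
```

Enumeration grows exponentially with the number of peaks, which is one reason
a real tool would hand this kind of problem to a MILP solver.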

If fragments have not been calculated, they are computed automatically as a
preliminary step.

All calculations can be stopped with the Stop button.


After calculating or choosing the fragments for the peaks, the fragmentation
graph can be rendered as an image with the 'Draw Tree' and 'View Tree' buttons.
'Draw Tree' uses Graphviz to draw a fragmentation tree into a file. 'View Tree'
opens this file in the default Windows image viewer.
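
For readers unfamiliar with Graphviz, the Python sketch below shows the kind of
input the bundled dot program accepts: a plain-text graph description. The
exact file FiD writes is not documented here, so the function name, node labels
and file names are assumptions made for illustration.

```python
def tree_to_dot(edges):
    """Turn (parent, child) pairs into Graphviz DOT text."""
    lines = ["digraph fragmentation {"]
    for parent, child in edges:
        lines.append(f'    "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical tree: the molecule M yields fragment F1, which yields F2.
dot_text = tree_to_dot([("M", "F1"), ("F1", "F2")])

# Saving dot_text as tree.dot and running
#     dot -Tpng tree.dot -o tree.png
# would render the tree as a PNG image.
```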



PUBLICATIONS
------------

- Heinonen, M., Rantanen, A., Mielikäinen, T., Pitkänen, E., Kokkonen, J.,
Rousu, J. Ab Initio Prediction of Molecular Fragments from Tandem Mass
Spectrometry Data. German Conference on Bioinformatics 2006 (GCB'06), accepted.



CREDITS
-------

FiD was produced at the University of Helsinki, Finland, in the SYSFYS project
(Experimental and computational analysis of physiological regulation at
transcriptome, proteome and metabolome level).


Development, testing
- Markus Heinonen

Ideas and support
- Ari Rantanen
- Taneli Mielikäinen
- Esa Pitkänen
- Juho Rousu

Chemical expertise
- Juha Kokkonen
- Jari Kiuru
- Virpi Tarkiainen



CONTACT
-------

Comments, suggestions, bugs, etc. are eagerly received at

   markus.heinonen@cs.helsinki.fi