MOODS - Perl extension for finding significant matches of position weight matrices.
Title : search
Usage : @results = MOODS::search(-seq => Bio::Seq->new(..), -matrix => [[1,0],[0,1]], -threshold => 0.1)
Function: Finds position weight matrix matches in a DNA sequence.
Returns : An array of references to result arrays. There is one result array
for each matrix: (matrix1_results, matrix2_results, ...).
Each result array is a flat list of positions and scores:
(pos1, score1, pos2, score2, ...)
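Because each result array is flat, a caller typically walks it two elements at a time. A minimal sketch; result_pairs is a hypothetical helper for illustration, not part of MOODS:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MOODS): split one flat result list
# (pos1, score1, pos2, score2, ...) into [position, score] pairs.
sub result_pairs {
    my ($flat) = @_;
    my @pairs;
    for (my $i = 0; $i + 1 < @$flat; $i += 2) {
        push @pairs, [ $flat->[$i], $flat->[$i + 1] ];
    }
    return @pairs;
}

# With real output one might write:
#   for my $hit (result_pairs($results[0])) { my ($pos, $score) = @$hit; ... }
```

Each element of @results is passed separately, so the hits of the second matrix would come from result_pairs($results[1]), and so on.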
Args :
Obligatory
-seq BioPerl sequence object
-matrix or -matrices
A matrix or a list of matrices. One matrix is represented
as a typical Perl multidimensional array: a reference to an array of
four references to arrays of numbers (one number per matrix column),
holding the frequencies or scores of the nucleotides A, C, G and T, respectively
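For illustration, a count matrix in this layout could be built from aligned binding sites as sketched below. count_matrix is a hypothetical helper, not a MOODS function, and it assumes the sites are already aligned and of equal length:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MOODS): build a count matrix from
# equal-length aligned sites. Row order is A, C, G, T; element [r][j]
# counts how often base r occurs at position j.
sub count_matrix {
    my @sites = @_;
    my %row = (A => 0, C => 1, G => 2, T => 3);
    my $len = length $sites[0];
    my @m = map { [ (0) x $len ] } 0 .. 3;
    for my $site (@sites) {
        die "sites must have equal length\n" unless length($site) == $len;
        my @bases = split //, uc $site;
        for my $j (0 .. $len - 1) {
            my $r = $row{ $bases[$j] };
            next unless defined $r;    # skip N and other ambiguous symbols
            $m[$r][$j]++;
        }
    }
    return \@m;
}
```

The returned reference can be passed directly as -matrix; since it holds counts, it matches the default -count_log_odds => 1.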
-threshold or -thresholds
A number or a list of numbers, used as threshold values for
matrix scanning. If a single number is given, it is used
for all matrices; otherwise, there should be as many
threshold values as there are matrices.
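The pairing rule between thresholds and matrices can be sketched as follows. thresholds_for is a hypothetical helper that only mirrors the documented behaviour; it is not MOODS code:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MOODS): a single threshold is reused
# for every matrix; a list must contain exactly one entry per matrix.
sub thresholds_for {
    my ($threshold, $n_matrices) = @_;
    return ($threshold) x $n_matrices unless ref($threshold) eq 'ARRAY';
    die "expected $n_matrices thresholds, got " . scalar(@$threshold) . "\n"
        unless @$threshold == $n_matrices;
    return @$threshold;
}
```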
Optional
-bg Background distribution: an array of four doubles. If neither
-bg nor -flatbg is given, the background is estimated from
the sequence.
-flatbg
If 1, the background distribution is set to a distribution
giving equal probability to all characters. Not compatible
with -bg. If neither -bg nor -flatbg is given, the background
is estimated from the sequence.
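A sketch of estimating the background from the sequence, using the pseudocount described under -pseudocount. It assumes the pseudocount is added once to each of the four symbol counts before normalising; the exact formula inside MOODS may differ. estimate_bg is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sketch (not MOODS code): background frequencies of A, C, G, T
# estimated from a sequence, with the pseudocount added to every count.
# Symbols other than A/C/G/T (e.g. N) are ignored.
sub estimate_bg {
    my ($seq, $pseudocount) = @_;
    $pseudocount = 1 unless defined $pseudocount;    # documented default
    my %idx = (A => 0, C => 1, G => 2, T => 3);
    my @counts = ($pseudocount) x 4;
    for my $base (split //, uc $seq) {
        $counts[ $idx{$base} ]++ if exists $idx{$base};
    }
    my $total = 0;
    $total += $_ for @counts;
    return [ map { $_ / $total } @counts ];
}
```

For comparison, -flatbg => 1 corresponds to [0.25, 0.25, 0.25, 0.25].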
-count_log_odds
If 1, assumes that the input matrices are frequency or
count matrices, and converts them to log-odds scoring
matrices; otherwise, treats them as scoring matrices.
Default 1.
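One common way to turn counts into log-odds scores is sketched below. It assumes the pseudocount is spread over the four nucleotides in proportion to the background; this documentation does not state the exact formula MOODS uses, and counts_to_log_odds is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sketch (not MOODS code) of a count-to-log-odds conversion:
#   score[i][j] = log( ((count[i][j] + ps * bg[i]) / (N_j + ps)) / bg[i] )
# where N_j is the total count of column j and ps the pseudocount.
sub counts_to_log_odds {
    my ($matrix, $bg, $pseudocount, $log_base) = @_;
    $pseudocount = 1 unless defined $pseudocount;
    # Natural logarithm by default; dividing by log(base) changes the base.
    my $log_div = defined $log_base ? log($log_base) : 1;
    my $cols = scalar @{ $matrix->[0] };
    my @out = map { [ (0) x $cols ] } 0 .. 3;
    for my $j (0 .. $cols - 1) {
        my $col_total = 0;
        $col_total += $matrix->[$_][$j] for 0 .. 3;
        for my $i (0 .. 3) {
            my $p = ($matrix->[$i][$j] + $pseudocount * $bg->[$i])
                  / ($col_total + $pseudocount);
            $out[$i][$j] = log($p / $bg->[$i]) / $log_div;
        }
    }
    return \@out;
}
```

The pseudocount keeps zero counts from producing log(0); a nucleotide never observed in a column simply gets a strongly negative score.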
-threshold_from_p
If 1, assumes that thresholds are p-values and computes
the corresponding absolute threshold based on the matrix;
otherwise the threshold is used as a hard cut-off.
Default 1.
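To make the p-value interpretation concrete, the sketch below finds, by exhaustive enumeration, the smallest achievable score s with P(score >= s) <= p when words are drawn from the background. This brute-force method is an illustration only (a matrix with m columns has 4**m words) and is not the algorithm MOODS uses; threshold_from_p_bruteforce is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical brute-force sketch (not MOODS code). $scores is a scoring
# matrix (rows A, C, G, T), $bg the background, $p the target p-value.
sub threshold_from_p_bruteforce {
    my ($scores, $bg, $p) = @_;
    my $m = scalar @{ $scores->[0] };
    # Score distribution of all words built so far: [score, probability].
    my @dist = ([0, 1]);
    for my $j (0 .. $m - 1) {
        my @next;
        for my $d (@dist) {
            for my $i (0 .. 3) {
                push @next, [ $d->[0] + $scores->[$i][$j], $d->[1] * $bg->[$i] ];
            }
        }
        @dist = @next;
    }
    # Walk from the highest score down, adding whole groups of equal scores,
    # so that the tail probability P(score >= threshold) never exceeds $p.
    my @sorted = sort { $b->[0] <=> $a->[0] } @dist;
    my ($tail, $threshold, $k) = (0, undef, 0);
    while ($k < @sorted) {
        my $top = $sorted[$k][0];
        my ($group, $end) = (0, $k);
        while ($end < @sorted && $sorted[$end][0] > $top - 1e-9) {
            $group += $sorted[$end][1];
            $end++;
        }
        last if $tail + $group > $p;
        $tail += $group;
        $threshold = $sorted[$end - 1][0];
        $k = $end;
    }
    return $threshold;    # undef if even the best score is too probable
}
```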
-log_base
Base for logarithms used in log-odds computations. Relevant
if using -count_log_odds => 1 and -threshold_from_p => 0.
Defaults to the natural logarithm if the parameter is not given.
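The base only rescales every score by a constant factor, which is why it matters only for hard cut-offs. Writing s_e for a natural-log score, s_b for the same score in base b (assumed b > 1) and t for a hard threshold:

```latex
\log_b x = \frac{\ln x}{\ln b}
\quad\Longrightarrow\quad
s_b = \frac{s_e}{\ln b},
\qquad
s_e \ge t \iff s_b \ge \frac{t}{\ln b}
```

A threshold derived from a p-value is computed on the rescaled scores and shifts accordingly, so the set of hits does not depend on the base; a fixed absolute threshold does not shift, so changing the base changes which windows pass.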
-pseudocount
Pseudocount used in log-odds conversion and added to
sequence symbol counts when estimating the background
from sequence. Default 1.
Tuning parameters:
(Optional. These do not affect the results, but can give minor
speed-ups in some cases. You can pretty much ignore them.)
-algorithm Selects the algorithm to use for scanning
"naive" naive algorithm
"pla" permutated lookahead algorithm
"supera" super alphabet algorithm.
- Good for long matrices (more than 20 columns).
"lf" lookahead filtration algorithm.
- Default algorithm in most cases.
- The sequence can be searched with multiple matrices
simultaneously.
- You should use this when you have a large number of matrices.
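The difference between naive scanning and lookahead pruning can be sketched for a single scoring matrix as below. This is an illustration, not MOODS' implementation: it omits the column permutation of "pla" and the multi-matrix filtering of "lf", and it assumes 0-based positions. scan is a hypothetical function:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sketch (not MOODS code). $scores: scoring matrix with rows
# A, C, G, T. Returns a flat list (pos1, score1, pos2, score2, ...).
# With $lookahead false every window is scored in full (naive); with it true
# a window is abandoned once even the best possible remaining columns could
# not lift its partial score to $threshold.
sub scan {
    my ($scores, $seq, $threshold, $lookahead) = @_;
    my %idx = (A => 0, C => 1, G => 2, T => 3);
    my $m = scalar @{ $scores->[0] };
    # $best_rest[$j] = maximum score obtainable from columns $j .. $m - 1
    my @best_rest = (0) x ($m + 1);
    for my $j (reverse 0 .. $m - 1) {
        $best_rest[$j] = $best_rest[$j + 1] + max map { $scores->[$_][$j] } 0 .. 3;
    }
    my @bases = split //, uc $seq;
    my @hits;
    WINDOW: for my $pos (0 .. @bases - $m) {
        my $s = 0;
        for my $j (0 .. $m - 1) {
            my $r = $idx{ $bases[$pos + $j] };
            next WINDOW unless defined $r;    # skip windows containing N etc.
            $s += $scores->[$r][$j];
            next WINDOW if $lookahead && $s + $best_rest[$j + 1] < $threshold;
        }
        push @hits, $pos, $s if $s >= $threshold;
    }
    return @hits;
}
```

Both modes return the same hits, because lookahead only discards windows that provably cannot reach the threshold; this matches the note above that the choice of algorithm affects speed, not results.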
-q An integer used for fine-tuning the "supera" and "lf" algorithms.
The default value 7 should be fine in almost all cases, but it can
be tuned to possibly increase performance slightly.
-combine
Determines whether the "lf" algorithm combines all
matrices into a single scanning pass.
-buffer_size
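use strict;
use warnings;
use Bio::Perl;
use Bio::Seq;
use MOODS;
use MOODS::Tools qw(printResults);
# A position weight matrix (here a count matrix): one row per
# nucleotide (A, C, G, T), one column per motif position
my $matrix = [ [10,0,0],
               [0,10,0],
               [0,0,10],
               [10,10,10]];
# A BioPerl sequence object to search
my $seq = Bio::Seq->new(-seq => 'actgtggggacgtcagtagcaggcatag',
                        -alphabet => 'dna' );
# With the default -threshold_from_p => 1, 0.3 is interpreted as a p-value
my @results = MOODS::search(-seq => $seq, -matrix => $matrix, -threshold => 0.3);
# Print the hits of the first (and here only) matrix
printResults($results[0]);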
See also: BioPerl documentation.
Authors: Petri J Martinmaki, Janne H Korhonen
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/