MOODS - Perl extension for finding significant matches of position weight matrices.
Title    : search

Usage    : @results = MOODS::search(-seq => Bio::Seq(..), -matrix => [[1,0],[0,1]], -threshold => 0.1)

Function : Finds position weight matrix matches in a DNA sequence.

Returns  : An array of references to result arrays. There is one result array for each matrix: (matrix1_results, matrix2_results, ...). Each result array is a flat list of positions and scores: (pos1, score1, pos2, score2, ...).

Args     :

  Obligatory

    -seq
        BioPerl sequence object.

    -matrix or -matrices
        A matrix or a list of matrices. One matrix is represented as a typical Perl multidimensional array: a reference to an array of references to arrays of numbers, corresponding to the frequencies or scores of the nucleotides A, C, G and T, respectively.

    -threshold or -thresholds
        A number or a list of numbers, used as threshold values for matrix scanning. If a single number is given, it is used for all matrices; otherwise, there should be as many threshold values as there are matrices.

  Optional

    -bg
        Background distribution: an array of four doubles. If neither -bg nor -flatbg is given, the background is estimated from the sequence.

    -flatbg
        If 1, the background distribution is set to a distribution giving equal probability to all characters. Not compatible with -bg. If neither -bg nor -flatbg is given, the background is estimated from the sequence.

    -count_log_odds
        If 1, assumes that the input matrices are frequency or count matrices and converts them to log-odds scoring matrices; otherwise, treats them as scoring matrices. Default 1.

    -threshold_from_p
        If 1, assumes that thresholds are p-values and computes the corresponding absolute threshold based on the matrix; otherwise, the threshold is used as a hard cut-off. Default 1.

    -log_base
        Base for logarithms used in log-odds computations. Relevant if using -count_log_odds => 1 and -threshold_from_p => 0. Defaults to the natural logarithm if the parameter is not given.

    -pseudocount
        Pseudocount used in log-odds conversion and added to sequence symbol counts when estimating the background from the sequence. Default 1.

  Tuning parameters (optional; they do not affect the results, but can give minor speed-ups in some cases, and can usually be ignored)

    -algorithm
        Selects the algorithm to use for scanning:
          "naive"   naive algorithm
          "pla"     permutated lookahead algorithm
          "supera"  super alphabet algorithm. Good for long matrices (> 20).
          "lf"      lookahead filtration algorithm. Default algorithm in most cases. The sequence can be searched with multiple matrices simultaneously. Use this when you have a large number of matrices.

    -q
        An integer, used for fine-tuning the "supera" and "lf" algorithms. The default value 7 should almost always be fine, but it can be tuned for a possible slight performance gain.

    -combine
        Determines whether the "lf" algorithm combines all matrices into a single scanning pass.

    -buffer_size
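To make the -bg, -pseudocount, -count_log_odds and -log_base options more concrete, here is a minimal pure-Perl sketch of one common way to estimate a background distribution and turn a count matrix into log-odds scores. It does not call MOODS, and the helper names (estimate_background, counts_to_log_odds) are hypothetical. The exact formula MOODS uses internally, in particular how the pseudocount is distributed across nucleotides, is an assumption here and may differ.

```perl
use strict;
use warnings;

# Estimate the background from a sequence: count a, c, g, t, add the
# pseudocount to each count, and normalise. Other characters are ignored.
sub estimate_background {
    my ($seq, $pseudocount) = @_;
    my %count = map { ($_ => $pseudocount) } qw(a c g t);
    for my $ch (split //, lc $seq) {
        $count{$ch}++ if exists $count{$ch};
    }
    my $total = 0;
    $total += $_ for values %count;
    return [ map { $count{$_} / $total } qw(a c g t) ];
}

# Convert a count matrix (rows A, C, G, T; one column per position) into
# log-odds scores against the background $bg.
# ASSUMPTION: the pseudocount is spread over the nucleotides in proportion
# to the background; this is one common choice, not necessarily MOODS's.
sub counts_to_log_odds {
    my ($matrix, $bg, $pseudocount, $log_base) = @_;
    my $cols = scalar @{ $matrix->[0] };
    my @scores;
    for my $j (0 .. $cols - 1) {
        my $col_total = 0;
        $col_total += $matrix->[$_][$j] + $pseudocount * $bg->[$_] for 0 .. 3;
        for my $i (0 .. 3) {
            my $p = ($matrix->[$i][$j] + $pseudocount * $bg->[$i]) / $col_total;
            my $s = log($p / $bg->[$i]);               # natural logarithm
            $s /= log($log_base) if defined $log_base; # change of base
            $scores[$i][$j] = $s;
        }
    }
    return \@scores;
}

my $matrix  = [ [10,0,0], [0,10,0], [0,0,10], [10,10,10] ];
my $flat_bg = [0.25, 0.25, 0.25, 0.25];
my $scores  = counts_to_log_odds($matrix, $flat_bg, 1, 2);
printf "score of A at the first position: %.3f\n", $scores->[0][0];
```

Nucleotides that are over-represented in a column relative to the background get positive scores, and under-represented ones get negative scores. The pseudocount keeps zero counts from producing log(0).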
    use Bio::Perl;
    use Bio::Seq;
    use MOODS;
    use MOODS::Tools qw(printResults);

    # We need a position weight matrix.
    my $matrix = [ [10,0,0], [0,10,0], [0,0,10], [10,10,10] ];

    # We also need a BioPerl sequence object.
    my $seq = Bio::Seq->new(-seq      => 'actgtggggacgtcagtagcaggcatag',
                            -alphabet => 'dna');

    my @results = MOODS::search(-seq       => $seq,
                                -matrix    => $matrix,
                                -threshold => 0.3);

    printResults($results[0]);
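Each result array is documented as a flat list (pos1, score1, pos2, score2, ...). If you want to process hits yourself rather than print them, a small helper can split that list into position/score pairs. The sketch below is self-contained and does not call MOODS: result_pairs is a hypothetical helper name, and the values in $matrix_results are made-up placeholders, not real search output.

```perl
use strict;
use warnings;

# Split a flat (pos1, score1, pos2, score2, ...) list into hash refs.
# A trailing unpaired element, if any, is ignored.
sub result_pairs {
    my ($flat) = @_;
    my @pairs;
    for (my $i = 0; $i + 1 < @$flat; $i += 2) {
        push @pairs, { pos => $flat->[$i], score => $flat->[$i + 1] };
    }
    return @pairs;
}

# Placeholder data in the documented layout (NOT real MOODS output).
my $matrix_results = [3, 4.2, 17, 5.0];

for my $hit (result_pairs($matrix_results)) {
    printf "position %d, score %.2f\n", $hit->{pos}, $hit->{score};
}
```

With real output you would pass $results[0] (or any other element of @results) in place of $matrix_results.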
BioPerl documentation.
Petri J Martinmaki, Janne H Korhonen
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/