Back to HPM and TreeDT

Haplotype Pattern Mining (HPM) algorithm for gene localization

Input

  • marker map M = (m1, ... ,mk)
  • phenotype vector Y = (Y1, ..., Yn)
  • haplotype matrix H of size n * k
  • association threshold x for chi-squared test
  • maximum pattern length l
  • maximum number of gaps g
  • maximum gap size s
  • Output

  • prediction for a DS locus: the marker with the highest marker frequency in those haplotype patterns over M that are strongly associated with the disease with respect to H, Y, and x, and that adhere to l, g, and s
  • Method

    // Initialize the set of strongly associated haplotype patterns:
    PP := {};
    // Number of case and control chromosomes:
    piA := number of disease-associated chromosomes;
    piC := number of control chromosomes;
    pi := piA + piC;
    // A lower bound for pattern frequency:
    lb := piA * pi * x / (piC * pi + piA * x);
    // Variable for iterating over different patterns:
    P = (p1, ... , pk) := ('*', ... , '*');
    for  i := 1 to k  {
        // alleles(mi) is the set of alleles of the i:th marker
        foreach  a  in alleles(mi)  {
            pi := a;
            // Test pattern P and all its extensions:
            depthFirst(P, i, i, 0, 0);
            // Reset  pi:
            pi := '*';
        }
    }
    for  i := 1 to k  { compute marker frequency  f(mi) in the set PP of patterns; }
    output the marker which has the highest marker frequency;


    // Test haplotype pattern P and all patterns that can be generated by extending P from the right:
    procedure depthFirst(P, start, i, nr_of_gaps, gap_length) {
        // Add strongly associated patterns to PP:
        if  chi-squared(P, M, H, Y) >= x  and  pi != '*'  then insert P to the set PP;
        // Return if extended patterns would be too long:
        if  i = k  or  i+1-start > l  then return;
        // Return if extended patterns can not be strongly disease-associated:
        if  frequency of P in disease-associated chromosomes is less than  lb  then return;
        // Create and test legal extensions of current pattern P (3 cases):
        // 1. Give marker i+1 all possible values:
        foreach  a  in alleles(mi+1)  {
            pi+1 := a;
            depthFirst(P, start, i+1, nr_of_gaps, 0);
        }
        // 2. Introduce a new gap starting at marker i+1:
        if  pi != '*'  and  nr_of_gaps < g and s >= 1 then {
            pi+1 := '*';
            depthFirst(P, start, i+1, nr_of_gaps+1, 1);
        }
        // 3. Extend the current gap over marker i+1:
        if  pi = '*'  and  gap_length < s then {
            pi+1 := '*';
            depthFirst(P, start, i+1, nr_of_gaps, gap_length+1);
        }
        // Before returning, reset pi+1:
        pi+1 := '*';
        return;
    }



    Back to HPM and TreeDT